National Weather Service data outage: 3 reminders for improving resiliency

    June 7, 2018

    Memorial Day Weekend 2018 was a deluge for many parts of the country. Flash floods ripped through Ellicott City, Maryland. Subtropical Storm Alberto triggered states of emergency in Florida, Mississippi and Alabama. Severe thunderstorms and tornadoes tore through the West and Midwest.

    Amid all this, the National Weather Service (NWS) experienced an outage from Sunday evening into Monday, leaving meteorologists unable to access the weather data the NWS provides.

    It turns out the NWS switched to a new system for distributing data in recent years, and AccuWeather and other consumers of that data had expressed concerns about how the system would handle spikes in requests during major storms. Those fears weren’t unfounded.

    It’s not the first time the NWS has had an outage, either. There were several in 2014, caused by firewall issues and, in one case, too many requests from an Android app. In February 2017, two of the NWS’s core routers lost power. The Network Control Facility tried to switch over to a backup site, but the switchover failed. With both the primary and the backup unavailable, forecasts, warnings and other data went dark for nearly three hours.

    There are three big takeaways from these outages.

    First, always maintain a healthy level of paranoia. As the NWS found out last year, having a Plan A and a Plan B wasn’t enough – it needed a Plan C, or even further contingencies.
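
    To make the Plan A / Plan B / Plan C idea concrete, here is a minimal sketch of an ordered failover chain. Everything in it is hypothetical: the function names, the three stand-in sites and the error messages are illustrative assumptions, not a description of how the NWS or any particular system is built.

```python
from typing import Callable, List, Sequence, Tuple

# A source is a (name, fetch function) pair. All names here are hypothetical.
Source = Tuple[str, Callable[[], str]]


def fetch_with_failover(sources: Sequence[Source]) -> Tuple[str, str]:
    """Try each source in order and return (name, data) from the first success."""
    errors: List[str] = []
    for name, fetch in sources:
        try:
            return name, fetch()
        except Exception as exc:  # record the failure and fall through to the next plan
            errors.append(f"{name}: {exc}")
    # Every plan failed: surface all the reasons rather than only the last one.
    raise RuntimeError("all sources failed: " + "; ".join(errors))


# Stand-in sites for illustration only.
def primary_site() -> str:
    raise ConnectionError("primary site unreachable")


def backup_site() -> str:
    raise ConnectionError("backup site failed to come up")


def tertiary_site() -> str:
    return "forecast data"


if __name__ == "__main__":
    used, data = fetch_with_failover([
        ("plan_a", primary_site),
        ("plan_b", backup_site),
        ("plan_c", tertiary_site),
    ])
    print(f"served by {used}: {data}")
```

    With only Plan A and Plan B in the list, the same call would raise an error. A chain like this is also only as trustworthy as the last time each fallback was actually exercised.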

    Second, test, test, test. We’ve said it before and we’ll say it again: Your DR plan is only as good as your last test. Test regularly and often, especially if you have new systems or applications that might throw a wrench in the plan.

    Third, avoid single points of failure. Systems, especially at large government organisations, are complex, and a single point of failure can bring the whole operation to a halt. Last year it seemed to be networking and communications components that took NWS systems down. This year, it might have been applications or servers failing under a surge in traffic. In both cases, contingency plans were needed.

    An organisation as crucial to safety as the NWS needs to take a close look at its resiliency. If Memorial Day Weekend was any indication, it also needs a new plan for maintaining a consistent stream of warnings, watches and forecasting data that keeps both people and organisations updated on impending storms.
