National Weather Service data outage: 3 reminders for improving resiliency

    June 7, 2018

    Memorial Day Weekend 2018 was a deluge for many parts of the country. Flash floods ripped through Ellicott City, Maryland. Subtropical Storm Alberto triggered states of emergency in Florida, Mississippi and Alabama. Severe thunderstorms and tornadoes tore through the West and Midwest.

    Amid all this, the National Weather Service (NWS) experienced an outage from Sunday evening into Monday, leaving meteorologists unable to access the weather data the NWS provides.

    The NWS had switched to a new system for distributing data in recent years, and AccuWeather and other consumers of that data had expressed concerns about how the system would handle spikes in requests during major storms. Those fears weren’t unfounded.

    It’s not the first time the NWS has had an outage, either. There were several in 2014, caused by firewall issues and, in one case, too many requests from an Android app. In February 2017, two of the NWS’s core routers lost power. The Network Control Facility tried to switch over to a backup site, but the switchover failed. With both the primary and the backup unavailable, forecasts, warnings and other data went dark for nearly three hours.

    There are three big takeaways from these outages.

    First, always maintain a healthy level of paranoia. As the NWS found out last year, having a Plan A and a Plan B wasn’t enough – it needed a Plan C, or even further contingencies.

    Second, test, test, test. We’ve said it before and we’ll say it again: Your DR plan is only as good as your last test. Test regularly and often, especially if you have new systems or applications that might throw a wrench in the plan.

    Third, avoid single points of failure. Systems, especially at large government organizations, are complex, and a single point of failure can bring the whole operation to a halt. Last year it seemed to be networking and communications components that took NWS systems down. This year, it might have been applications or servers failing under a surge in traffic. In both cases, contingency plans were needed.
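
    The first and third takeaways can be sketched as an ordered failover chain: try each data source in turn and keep going past Plan B. The Python below is a minimal illustration under stated assumptions, not a description of the NWS's actual architecture. Every name in it (fetch_with_failover, primary_site, backup_site, third_party_mirror) and every simulated error is hypothetical.

```python
from typing import Callable, List, Tuple


class AllSourcesFailed(Exception):
    """Raised when every configured source has failed."""


def fetch_with_failover(sources: List[Tuple[str, Callable[[], str]]]) -> Tuple[str, str]:
    """Try each (name, fetch) pair in order; return the first success.

    Failures are collected so the final error explains what went wrong
    at every step instead of reporting only the last one.
    """
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch()
        except Exception as exc:  # any failure moves on to the next plan
            errors.append(f"{name}: {exc}")
    raise AllSourcesFailed("; ".join(errors))


# Hypothetical sources that simulate a failed Plan A and Plan B.
def primary_site() -> str:
    raise ConnectionError("primary unreachable")


def backup_site() -> str:
    raise TimeoutError("switchover did not complete")


def third_party_mirror() -> str:
    return "severe thunderstorm watch"


if __name__ == "__main__":
    plans = [
        ("primary", primary_site),
        ("backup", backup_site),
        ("mirror", third_party_mirror),  # the Plan C the article argues for
    ]
    used, data = fetch_with_failover(plans)
    print(f"served from {used}: {data}")
```

    A chain like this only helps if the later sources do not share a failure mode with the earlier ones, and only if the fallback path itself is exercised in regular tests.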

    An organization as crucial to safety as the NWS needs to take a close look at its resiliency. If Memorial Day Weekend was any indication, it also needs a new plan for maintaining a consistent stream of warnings, watches and forecasting data that keeps both people and organizations updated on impending storms.
