Memorial Day Weekend 2018 was a deluge for many parts of the country. Flash floods ripped through Ellicott City, Maryland. Subtropical Storm Alberto triggered states of emergency in Florida, Mississippi and Alabama. Severe thunderstorms and tornadoes tore through the West and Midwest.
Amid all this, the National Weather Service (NWS) experienced an outage from Sunday evening into Monday, leaving meteorologists unable to access the weather data the NWS provides.
It turns out the NWS had switched to a new system for distributing data in recent years, and AccuWeather and other consumers of that data had expressed concerns about how the system would handle spikes in requests during major storms. Those fears weren’t unfounded.
It’s not the first time the NWS has had an outage, either. There were several in 2014, caused by firewall issues and, in one case, too many requests from an Android app. In February 2017, two of the NWS’s core routers lost power. The Network Control Facility tried to switch over to a backup site, but the switchover failed. With both the primary and the backup unavailable, forecasts, warnings and other data went dark for nearly three hours.
There are three big takeaways from these outages.
First, always maintain a healthy level of paranoia. As the NWS found out in February 2017, having a Plan A and a Plan B wasn’t enough. It needed a Plan C, or even further contingencies.
Second, test, test, test. We’ve said it before and we’ll say it again: your disaster recovery (DR) plan is only as good as your last test. Test regularly, especially after introducing new systems or applications that might throw a wrench in the plan.
Third, avoid single points of failure. Systems, especially at large government organisations, are complex, and a single point of failure can bring the whole operation to a halt. In 2017, networking and communications components appear to have taken NWS systems down. This year, it might have been applications or servers failing under a surge in traffic. In both cases, contingency plans were needed.
An organisation as crucial to safety as the NWS needs to take a close look at its resiliency. If Memorial Day Weekend was any indication, it also needs a new plan for maintaining a consistent stream of warnings, watches and forecasting data that keeps both people and organisations updated on impending storms.