By James A. Martin
Over the past two years, 93 percent of organisations experienced tech-related business disruptions, according to IDC’s 2018 “The State of IT Resilience” white paper. Of those, 17 percent said the disruption was “severe.” And 20 percent said their business suffered “major reputational damage” and permanent loss of customers resulting from a disruption.
Technology isn’t the only thing that can create havoc, of course. Along with cyberattacks and critical information infrastructure breakdowns, the World Economic Forum’s 2019 Global Risks Report lists extreme weather events, natural disasters, and man-made environmental disasters among the top 10 risks most likely to occur and to have the biggest impact.
In this post, Sungard AS thought leaders look at three high-profile examples of businesses affected by unexpected events in recent years: Atlanta’s Hartsfield-Jackson International Airport, which suffered a fire and power outage; BB&T Bank, which experienced an equipment malfunction; and Marriott International Inc., the target of a massive data breach. Each event raises important resilience questions that any organisation’s business and IT leaders should be prepared to answer confidently.
Hartsfield-Jackson Atlanta International Airport (2017)
The disruption: On December 17, 2017, an electrical fire erupted in an underground tunnel that carried seven power lines from two sources to the airport. The fire caused an hours-long power outage that plunged the airport into darkness, grounded nearly 1,000 flights, and stranded about 30,000 travelers. International flights were diverted to other airports. Planes idled on the tarmac for hours. Georgia Power, the regional utility, had backup electrical equipment. But it was located in an adjacent room that the fire also damaged.
The damage: It took airlines several days to resume normal schedules following the power outage. Delta Air Lines, which has its largest hub in Atlanta, lost an estimated $25 million to $50 million in income.
The company response: Delta dispatched about 200 additional employees to help passengers. The airline also provided fee waivers so travelers could rebook their tickets. Other carriers followed Delta’s lead.
The public reaction: Angry travelers took to social media to vent their outrage at being kept in the dark. Former U.S. Transportation Secretary Anthony Foxx tweeted: “Total and abject failure here at ATL Airport today. I am stuck on @delta flight, passengers and crew tolerating it. But there is no excuse for lack of workable redundant power source. NONE!”
Key lesson learned: “A basic rule of disaster recovery, whether it’s for power systems or IT systems, is to ensure there’s a degree of physical geographic separation between your primary and secondary/backup systems,” notes Joseph George, VP of Product Management, Global Recovery Services at Sungard AS. “Also, with cyberattacks and malware-caused outages, we must ensure separation is no longer just about physical distance. There needs to be logical network separation as well.”
Resilience questions raised for every company:
- Are you aware of all the single points of failure that can affect your organisation? What’s the plan to recover from a failure at each point?
- How reliable/vulnerable are your redundant systems?
- Are your redundant systems in close proximity to each other?
- How will you quickly determine if there’s a cyber event related to an outage caused by fire or other disruption?
- How will you bring in additional employees in the event of a major disruption (as Delta did)? What will their responsibilities be?
BB&T Bank (2018)
The disruption: On February 22, 2018, an equipment malfunction in a BB&T data centre caused a three-day service outage affecting the bank’s automated phone service and ATMs. Though services began to recover the next day, customers continued to complain of account problems nearly a week later.
The damage: $15 million in lost revenue; $5 million in expenses tied to fee waivers and other costs.
The company response: In the outage’s aftermath, BB&T issued customers refunds for overdraft, overdraft protection, foreign ATM transaction and negative account balance fees—whether those transactions were related to the outage or not.
In a video posted to Twitter, BB&T CEO Kelly King apologised to angry customers. King also announced that the bank had spent $300 million on a new data centre with duplicate, redundant data halls, an investment he said would address the outage’s root cause.
The public reaction: “Customers were infuriated as they grappled with a system shutdown that struck in time for payday and brought a range of problems, from uncertainty over whether direct deposits had been made to inability to pay bills through online banking,” The Charlotte Observer reported.
Key lesson learned: “King did an excellent job in coming out swiftly with communications for the bank’s customers,” says Kaushik Ray, SVP of Global Client Service Management for Sungard AS. “He was transparent about what happened and shared his team’s action plan with customers. Failures can always happen. But how you handle it is very important in controlling the damage a failure can cause.”
Resilience questions raised for every company:
- Are you consistently performing the necessary tests to make sure system backups work?
- How often do you—and should you—perform those tests?
- How should the tests be performed?
- Are you testing for specific disaster scenarios?
- Does your business have duplicate redundant systems? If not, what’s the plan to ensure resilience after a data centre outage?
- What is your communications plan for customers when there’s a disruption? What should your reaction be?
Marriott International Inc. (2014-2018)
The disruption: On September 8, 2018, an internal security tool alerted Marriott International Inc. officials about an attempt to access the U.S. Starwood guest reservation database. (Marriott acquired Starwood in 2016.) In an investigation, Marriott discovered unauthorised access to the Starwood network going back to 2014.
The breach affected up to 500 million people, though later estimates from Marriott were closer to 383 million. U.S. government investigators concluded that Chinese state hackers were most likely responsible. Hackers copied and encrypted personal information of people who had made reservations at Starwood properties.
The damage: The breach resulted in data-privacy-related lawsuits and investigations from the European Union and other governments. Total costs from the breach could reach as high as $1 billion, Bloomberg estimated.
The company response: Though Marriott learned of the breach in September, the company did not begin notifying customers until late November. It then sent millions of emails to customers warning of the data breach. But the emails came from “email-marriott.com,” a domain registered to a third-party firm working on the hotel chain’s behalf. That raised concerns among some that the emails weren’t legitimate and that scammers could easily spoof them to exploit the situation.
The public reaction: One analyst called the breach one of the top five worst hacks to directly impact consumers. Class-action lawsuits seeking billions in damages were filed. Lawmakers in Washington cited the breach as another reason why the U.S. needs federal privacy rules. Marriott is also facing the possibility of substantial financial penalties under the European Union’s General Data Protection Regulation (GDPR).
Key lesson learned: When one company acquires another, it should investigate the acquired company’s cybersecurity before the acquisition is completed, notes Asher de Metz, Senior Manager of Security Consulting at Sungard AS. “If breaches or risks are uncovered, it could stop the acquisition or reduce the purchase price. Either way, it gives the acquiring company a heads-up.”
Resilience questions raised for every company:
- What due diligence is required when/if our company merges its systems with other systems?
- How far back do we need to look to make sure we don’t have any unidentified vulnerabilities?
- What do we need to do to uncover possible attackers who have been lurking in our system for months or years?
- Are we at risk for state- or nation-sponsored hacker attacks?
- If so, what steps do we need to take to prevent or minimise that risk?
- How do those steps differ from steps we would normally take?
Planning to avoid downtime costs of $300,000 per hour
Businesses face many types of risks, which continue to grow in severity. At the same time, because of digital transformation, your organisation is more reliant on its information systems than ever before. Add to this the staggering cost of network downtime, which Gartner estimates at an average of $5,600 per minute, or more than $300,000 per hour.
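To make that arithmetic concrete, here is a minimal sketch in Python. It assumes only the $5,600-per-minute average cited above; the function name `downtime_cost` and the outage durations are illustrative, not part of Gartner’s methodology. The commonly quoted $300,000-per-hour figure is a rounded-down version of the same number, since $5,600 × 60 = $336,000.

```python
# Average cost of downtime per minute, as cited in this post (Gartner estimate).
# This is an industry-wide average; an individual organisation's figure may differ widely.
COST_PER_MINUTE_USD = 5_600


def downtime_cost(minutes, cost_per_minute=COST_PER_MINUTE_USD):
    """Return a rough cost estimate (USD) for an outage lasting `minutes` minutes.

    Hypothetical helper for illustration: a simple linear model that ignores
    reputational damage, fee waivers, and other knock-on costs.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


if __name__ == "__main__":
    # Illustrative outage lengths (in minutes), not taken from any incident above.
    for label, minutes in [("1 hour", 60), ("4 hours", 4 * 60)]:
        print(f"{label}: ${downtime_cost(minutes):,.0f}")
    # 1 hour: $336,000
    # 4 hours: $1,344,000
```

A linear estimate like this is only a starting point. Plugging in your own per-minute figure, derived from your own revenue and recovery costs, will give a more useful number than the industry average.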
Your approach to resilience must therefore adapt continually to protect against the full range of today’s risks. Every company should conduct a Business Impact Analysis (BIA) to determine which systems need what kind of protection, and how much downtime and data loss it can afford, Ray says.
“Your organisation must also develop and periodically exercise a comprehensive crisis management/emergency preparedness, business continuity and disaster recovery plan,” Ray continues. “While this sounds like a daunting and expensive proposition that might take you away from other priorities, having these plans ready will save your company, your employees and your clients when a crisis occurs.”
James A. Martin has written about security and other technology topics for CIO, CSO, Computerworld, PC World, and others.