By James A. Martin
It’s all too easy to spot an organization that isn’t fully resilient. Every week, it seems, a company makes headlines for struggling to recover from a power outage, cyberattack, or other unplanned event. Target Corp.’s two consecutive days of cash register outages in June is just one example. Others from recent years involved Hartsfield-Jackson Atlanta International Airport, BB&T Bank, and Marriott International, Inc.
But what, exactly, does enterprise resilience look like? What are the core ingredients that make an organization better able to avoid, withstand, and/or bounce back from unexpected and potentially damaging incidents?
At a high level, a recent survey* conducted for Sungard AS illustrates the key components of a resilient enterprise. In the survey, 33% of respondents (the largest percentage) say a resilient enterprise identifies emerging threats and understands their impact; 31% believe preparedness is essential; 30% say clear direction from leadership is key; and 29% believe strong and supportive communication among key stakeholders is also essential.
Drilling down, true enterprise resilience, especially in our technology-dependent business environments, is the result of baking resilience into your infrastructure architecture, application architecture, backup and recovery architecture, security posture, and governance and change management.
These are the five pillars of resilience, and they’re all intertwined, with application tiering underpinning how each is best leveraged, says Michelle LeVan, Vice President of Global AWS Alliances at Sungard AS.
1. Infrastructure architecture
Infrastructure architecture consists of all the servers, storage, network and other hardware upon which your applications run, whether that hardware exists in your private data center or is managed by a cloud infrastructure service such as Amazon AWS, Microsoft Azure, or Google Cloud Platform.
“From a resiliency perspective, your infrastructure architecture is the foundation of everything you build,” LeVan says.
Hardware and network failures or downtime resulting from power outages are signs that your infrastructure architecture isn’t fully resilient. “You might have local redundancy at your data center or in a specific region in a hyperscale cloud,” LeVan says. “But have you also distributed that redundancy geographically, so that your architecture isn’t completely tied to your data center or your immediate region?”
Redundancy in an infrastructure architecture can be expensive to build. And enterprises don’t always have enough team members with the expertise needed to build, manage and maintain redundant hardware. Consequently, more companies are turning to the cloud for infrastructure redundancy to overcome those challenges, LeVan adds.
2. Application architecture
Applications are the lifeblood of most enterprises today. When they fail — as when Target’s cash registers went down — your organization can lose millions of dollars, suffer brand reputation damage, and more.
To avoid this, your applications must be tiered. Tier 1 applications, for example, are critical to revenue generation and should be prioritized in terms of redundancy and resilience over less-essential Tier 2 and Tier 3 applications.
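As a rough illustration of tiering, the sketch below orders an application inventory so that Tier 1 systems come first when planning redundancy and recovery work. The application names and the `App` structure are hypothetical, invented for this example; a real inventory would come from your own asset records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    tier: int  # 1 = revenue-critical, 3 = least essential (assumed scale)


def recovery_order(apps):
    """Return apps sorted so lower tier numbers (more critical) come first."""
    return sorted(apps, key=lambda a: (a.tier, a.name))


# Hypothetical inventory, for illustration only.
inventory = [
    App("intranet-wiki", 3),
    App("point-of-sale", 1),
    App("payroll", 2),
]

for app in recovery_order(inventory):
    print(f"Tier {app.tier}: {app.name}")
```

Sorting by tier is only a starting point. In practice the tier itself would be assigned from business impact, such as revenue at risk per hour of downtime.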
Also, your applications must be architected to fully take advantage of your infrastructure’s redundancy and scalability. Here again, cloud infrastructures often make more sense than traditional mainframe-based private data centers.
Cloud infrastructures can offer redundancy and scalability far more affordably than private data centers can. “If your applications are portable and can be rearchitected to take advantage of hyperscale cloud environments, you’ll have a much more cost-effective solution in the long term,” LeVan says. “And you’ll be in a better position to take advantage of geographically diverse infrastructure redundancy to build your application redundancy.”
By default, some applications can’t immediately switch over to a secondary server, CPU or storage hardware when the primary hardware they run on becomes unavailable. These applications must be rearchitected to take advantage of local as well as geographically dispersed infrastructure redundancy.
If legacy applications can’t be rearchitected, you may need to build brand new ones to take advantage of redundant hardware. “And that’s a big deal,” LeVan explains. Rearchitecting or building new applications can be costly and time-consuming, and it can tax the skills of your in-house developers. That is another reason why it’s important to tier (or prioritize) your applications and to consider bringing in external expertise.
3. Backup and recovery architecture
In a November 2014 cyberattack, hackers made public unreleased Sony Pictures films as well as confidential information related to executive salaries — while at the same time preventing Sony from accessing its own data. The incident was a painful reminder of the need for a solid backup and recovery architecture.
Everyone needs a reliable and isolated backup to be resilient. If for no other reason, your backup system is where you archive data that may be needed for auditing and compliance purposes. Your backups help you bounce back from cyberattacks, ransomware attacks, or other events in which your data has been corrupted or becomes inaccessible and you must roll back to a predetermined recovery point to keep operations going. Ideally, your backups should be isolated from the network so that the data they contain can’t be erased or corrupted by internal or external malicious actors.
Your recovery point determines how much data you stand to lose, and your recovery time determines how quickly you get up and running again. Determining what recovery points and recovery times are acceptable will help you choose the backup and recovery solution best suited to your enterprise needs for application and data resiliency.
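To make the recovery point and recovery time idea concrete, here is a minimal sketch that checks a backup schedule against targets. It assumes simple periodic backups, so the worst-case data loss equals one backup interval. The names `rpo` and `rto` (recovery point objective and recovery time objective, the common industry terms) and all the figures are illustrative assumptions, not values from the article.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval):
    """With periodic backups, up to one full interval of data can be lost."""
    return backup_interval


def meets_targets(backup_interval, restore_duration, rpo, rto):
    """Check a backup schedule against recovery point and recovery time targets."""
    return (
        worst_case_data_loss(backup_interval) <= rpo
        and restore_duration <= rto
    )


# Hypothetical Tier 1 application: hourly backups, 30-minute restore.
ok = meets_targets(
    backup_interval=timedelta(hours=1),
    restore_duration=timedelta(minutes=30),
    rpo=timedelta(hours=4),   # assumed tolerance for lost data
    rto=timedelta(hours=1),   # assumed tolerance for downtime
)
print(ok)
```

With these assumed numbers the schedule passes. Switching to nightly backups would fail the four-hour recovery point target even though the restore time is unchanged.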
4. Security posture
Your security posture’s foundation should include basics such as firewalls and anti-virus software. Beyond that, your posture must take into account a multitude of both internal and external attack vectors and incorporate robust methods for bolstering security against those vectors. Those methods include Intrusion Detection Systems (IDS) and Intrusion Prevention Systems (IPS) as well as Identity and Access Management (IAM) systems to control at a granular level who can access your data and what they can do with it.
Data breaches in which outsiders gain access to sensitive information, such as credit card numbers, often result when organizations don’t have a robust security posture. So far this year, hackers have stolen credit card information from OXO, the Atlanta Hawks, AMC Networks, Freedom Mobile and others.
The ranks of criminal hackers seem to be constantly growing, while hiring cybersecurity talent is an ongoing challenge for many companies. Consequently, organizations may need to tap outside experts to help them build and fortify their security posture.
5. Governance and change management
How are you rolling out changes across your production and recovery environments? Do you understand the interdependencies of all your applications? If you make a change to one application, what’s the impact on other applications? How are you testing changes so that they won’t cause problems when they’re rolled out? The answers to these and other questions help you build resiliency into your governance and change management process.
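One way to approach the interdependency question is to record which applications depend on which, then walk that map before a change is approved. The sketch below is a simplified illustration using a breadth-first search. The application names and the `DEPENDENTS` map are hypothetical, and a real environment would typically pull this data from a configuration management database or service catalog.

```python
from collections import deque

# Hypothetical map: each application -> applications that depend on it.
DEPENDENTS = {
    "auth-service": ["checkout", "customer-portal"],
    "checkout": ["order-reporting"],
    "customer-portal": [],
    "order-reporting": [],
}


def impacted_by(changed, dependents):
    """Return every application downstream of the changed one, sorted by name."""
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, []):
            if dep not in seen:  # skip apps already visited
                seen.add(dep)
                queue.append(dep)
    return sorted(seen)


print(impacted_by("auth-service", DEPENDENTS))
```

In this made-up example, a change to `auth-service` touches three downstream applications, which suggests all three belong in the test plan for that change.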
Rolling out changes can trigger unexpected consequences. For example, in May, Salesforce experienced one of its biggest outages. The incident resulted from a change made to Salesforce’s production environment that “broke access permission settings across organizations and gave employees access to all of their company’s files,” ZDNet reported. The incident illustrates resiliency challenges that can arise from governance and change management processes.
In traditional governance and change management, changes are tested in a test/dev environment and released into production on a monthly schedule. For many enterprises, a better approach is Continuous Integration/Continuous Delivery (CI/CD), in which changes are built and tested automatically and validated right away, making the process of rolling out changes both more efficient and more resilient.
Building resilience into your organization via the five pillars can be complicated, time-consuming, and potentially costly. It’s also essential if you want to keep operations humming along under adverse and unexpected conditions. If resilience in these five areas seems beyond your organization’s grasp, reach out to experts who can help you put together a plan and take the first steps.
*The survey of 500 C-suite respondents at U.S. companies with 500+ employees was conducted by Censuswide on behalf of Sungard Availability Services® in March 2019.
James A. Martin has written about security and other technology topics for CIO, CSO, Computerworld, PC World, and others.