By James A. Martin
It’s all too easy to spot an organization that isn’t fully resilient. Every week, it seems, a company makes headlines for struggling to be resilient after a power outage, cyberattack, or other unplanned event.
A nationwide T-Mobile outage left customers unable to make calls or send texts for nearly a day. Three months later, computer systems for Universal Health Services began failing after one of the largest medical cyberattacks in U.S. history.
But what, exactly, does enterprise resilience look like? What are the core ingredients that make an organization better able to avoid, withstand, and/or bounce back from unexpected and potentially damaging incidents?
According to a survey conducted on behalf of Sungard AS, 33% of respondents (the largest percentage) believe a resilient enterprise identifies emerging threats and understands their impact; 31% say preparedness is essential; 30% feel clear direction from leadership is paramount; and 29% believe strong and supportive communication among key stakeholders is a must.
Drilling down, true enterprise resilience, especially in our technology-dependent business environments, is the result of baking resilience into five areas: your infrastructure architecture, application architecture, backup and recovery architecture, security posture, and governance and change management.
These are the five pillars of resilience, and they’re all intertwined, with application tiering underpinning how they are best leveraged, says Michelle LeVan, Vice President of Global Marketing and Field Engagement at Sungard AS.
1. Infrastructure architecture
Infrastructure architecture consists of all the servers, storage, network and other hardware upon which your applications run, whether that hardware exists in your private data center or is managed by a cloud infrastructure service such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform.
“From a resiliency perspective, your infrastructure architecture is the foundation of everything you build,” LeVan says.
Hardware and network failures or downtime resulting from power outages are signs that your infrastructure architecture isn’t fully resilient. “You might have local redundancy at your data center or in a specific region in a hyperscale cloud,” LeVan says. “But have you also distributed that redundancy geographically, so that your architecture isn’t completely tied to your data center or your immediate region?”
Redundancy in an infrastructure architecture can be expensive to build. And enterprises don’t always have enough team members with the expertise needed to build, manage and maintain redundant hardware. Consequently, more companies are turning to the cloud for infrastructure redundancy to overcome those challenges, LeVan adds.
2. Application architecture
Applications are the lifeblood of most enterprises today. When they fail, organizations can lose millions of dollars, suffer brand reputation damage and more.
To avoid this, your applications must be tiered. Tier 1 applications, for example, are critical to revenue generation and should be prioritized in terms of redundancy and resilience over less-essential Tier 2 and Tier 3 applications.
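As a rough illustration of tiering, the sketch below orders a hypothetical application inventory so that Tier 1 systems come first when planning redundancy work. The application names and tier assignments are invented for the example; a real inventory would come from your own business impact analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    name: str
    tier: int  # 1 = revenue-critical, 3 = least essential


def prioritize(apps):
    """Return applications ordered by tier, most critical first."""
    # sorted() is stable, so apps within a tier keep their inventory order.
    return sorted(apps, key=lambda app: app.tier)


# Hypothetical inventory; names and tiers are illustrative only.
inventory = [
    Application("internal-wiki", 3),
    Application("order-checkout", 1),
    Application("reporting", 2),
    Application("payments-api", 1),
]

for app in prioritize(inventory):
    print(f"Tier {app.tier}: {app.name}")
```

The point of the ordering is simply that redundancy budget and engineering time go to the top of the list first.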
Also, your applications must be architected to fully take advantage of your infrastructure’s redundancy and scalability. Here again, cloud infrastructures make more sense than traditional mainframe-based private data centers.
Cloud infrastructures can offer redundancy and scalability far more affordably than private data centers can. “If your applications are portable and can be rearchitected to take advantage of hybrid cloud environments across both private and hyperscale clouds, you’ll have a much more cost-effective solution in the long-term,” LeVan says. “And you’ll be in a better position to take advantage of geographically diverse infrastructure redundancy to build your application redundancy.”
By default, some applications can’t immediately switch over to a secondary server, CPU or storage hardware when the primary hardware they run on becomes unavailable. These applications must be rearchitected to take advantage of local as well as geographically dispersed infrastructure redundancy.
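To make the idea of switching over concrete, here is a minimal sketch of the selection step only: given a preference-ordered list of endpoints, pick the first one that passes a health check. The endpoint names and the simulated outage are hypothetical, and a real failover design also has to handle data replication, traffic routing, and failback, none of which is shown here.

```python
def first_available(endpoints, is_healthy):
    """Return the first endpoint that passes the health check, or None."""
    for endpoint in endpoints:
        if is_healthy(endpoint):
            return endpoint
    return None


# Hypothetical endpoints, ordered by preference: primary, local standby,
# then a standby in another region.
endpoints = ["primary.local", "standby.local", "standby.remote-region"]

# Simulated outage: everything in the local site is down.
down = {"primary.local", "standby.local"}

chosen = first_available(endpoints, lambda e: e not in down)
print(chosen)
```

With only local redundancy, the list would end after the second entry and the function would return None during a site-wide outage, which is the gap that geographic distribution is meant to close.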
If legacy applications can’t be rearchitected, you may need to build brand new ones to take advantage of redundant hardware. “And that’s a big deal,” LeVan explains. Rearchitecting or building new applications can be costly and time-consuming, and it can tax the skills of your in-house developers. That is another reason why it’s important to tier (or prioritize) your applications and to consider bringing in external expertise.
3. Backup and recovery architecture
In 2019, hackers attacked the U.S. servers of email provider VFEmail.net, resulting in the destruction of all U.S. customer data. The incident was a painful reminder that you need a solid backup and recovery architecture.
Everyone needs a reliable and isolated backup to be resilient. If for no other reason, your backup system is where you archive data that may be needed for auditing and compliance purposes. Your backups help you bounce back from cyberattacks, ransomware attacks, or other events in which your data has been corrupted or becomes inaccessible and you must roll back to a predetermined recovery point to keep operations going. And ideally, your backups should be isolated off the network so that the data contained within them can’t be erased or corrupted by either internal or external malicious actors.
Your recovery point determines how much data you stand to lose, and your recovery time determines how quickly you get up and running again. Deciding which recovery points and recovery times are acceptable will help you choose the backup and recovery solution best suited to your enterprise needs for application and data resiliency.
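Recovery point and recovery time targets (commonly called RPO and RTO) can be checked with simple arithmetic. The sketch below assumes periodic backups, where a failure just before the next backup loses up to one full interval of data. The function names and all the numbers are illustrative assumptions, not recommendations.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval):
    """With periodic backups, a failure just before the next backup
    loses up to one full interval of data."""
    return backup_interval


def meets_objectives(backup_interval, restore_duration, rpo, rto):
    """Check a backup schedule against recovery point and time targets."""
    return (worst_case_data_loss(backup_interval) <= rpo
            and restore_duration <= rto)


# Illustrative numbers only: a 4-hour target for both objectives.
nightly = meets_objectives(
    backup_interval=timedelta(hours=24),
    restore_duration=timedelta(hours=2),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=4),
)
hourly = meets_objectives(
    backup_interval=timedelta(hours=1),
    restore_duration=timedelta(hours=2),
    rpo=timedelta(hours=4),
    rto=timedelta(hours=4),
)
print(nightly, hourly)
```

Under these assumed targets, the nightly schedule fails because up to 24 hours of data could be lost, while the hourly schedule passes on both counts.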
4. Security posture
Your security posture’s foundation should include basics such as firewalls and anti-virus software. Beyond that, your posture must take into account a multitude of both internal and external attack vectors and incorporate robust methods for bolstering security against those vectors. Those methods include intrusion detection systems (IDS) and intrusion prevention systems (IPS) as well as identity and access management (IAM) systems to control – at a granular level – who can access your data and what they can do with it. Data breaches in which outsiders gain access to sensitive information, such as credit card numbers, often result when organizations don’t have a robust security posture.
The ranks of criminal hackers seem to be constantly growing, while hiring cybersecurity talent is an ongoing challenge for many companies. Consequently, organizations may need to tap outside experts to help them build and fortify their security posture.
5. Governance and change management
How are you rolling out changes across your production and recovery environments? Do you understand the interdependencies of all your applications? If you make a change to one application, what’s the impact on other applications? How are you testing changes so that they won’t cause problems when they’re rolled out? The answers to these and other questions help you build resiliency into your governance and change management process.
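One way to approach the interdependency question is to record which applications depend on which, then walk that map to find everything a change could touch. The sketch below does this with a breadth-first search over a hypothetical dependency map; the application names are invented, and real dependency data would have to come from your own inventory or monitoring tools.

```python
from collections import deque


def impacted_by(changed, depends_on):
    """Return every application that directly or transitively
    depends on `changed`."""
    # Invert the map: for each app, which apps depend on it?
    dependents = {}
    for app, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(app)

    impacted = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for app in dependents.get(current, ()):
            if app not in impacted:
                impacted.add(app)
                queue.append(app)
    return impacted


# Hypothetical dependency map: app -> apps it depends on.
depends_on = {
    "checkout": ["payments", "inventory"],
    "payments": ["auth"],
    "inventory": ["auth"],
    "reporting": ["inventory"],
    "auth": [],
}

print(sorted(impacted_by("auth", depends_on)))
```

In this made-up map, a change to the authentication service reaches every other application, which is the kind of result that should widen the test plan before rollout.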
Rolling out changes can trigger unexpected consequences. For example, a Facebook code change in June prevented users from accessing apps like Spotify, Waze and Pinterest. This wasn’t the first time, as many iOS apps crashed a few months prior because of problems with Facebook’s SDK. These incidents illustrate resiliency challenges that can arise from governance and change management processes.
In traditional governance and change management, changes are tested in a test/dev environment and released into production on a monthly schedule. For many enterprises, a better approach is Continuous Integration/Continuous Delivery (CI/CD), in which changes are automated and then immediately validated, making the process of rolling out changes more efficient — and resilient.
Enterprise resilience: complicated but essential
Building resilience into your organization via the five pillars can be complicated, time-consuming and potentially costly. It’s also essential if you want to keep operations humming along under adverse and unexpected conditions.
If resilience in these five areas seems beyond your organization’s grasp, reach out to experts who can help you put together a plan and take the first steps.
James A. Martin has written about security and other technology topics for CIO, CSO, Computerworld, PC World, and others.