Far too many organizations’ business continuity (BC) plans failed to anticipate a global pandemic, and it cost them dearly. Chris Butler, Lead Principal Consultant, Resilience and Security at Sungard Availability Services (Sungard AS), joins IT Availability Now to dissect the state of preparedness both before and after the pandemic, along with how companies are now rethinking resilience and what steps they can take to achieve their goals.
Oliver Lomer is a Senior Solutions Marketing Manager at Sungard AS, where he illustrates business challenges and how technology can solve them through engaging content. Oliver has six years’ experience in marketing, commercial and sales enablement roles in global technology organizations.
A former Lieutenant Colonel and helicopter pilot in the British Army, Chris Butler now provides executive advisory services in the field of organizational resilience, business challenges of the 21st century, and cyber resilience. As a resilience consultant at Sungard AS, Chris provides senior leadership teams with bespoke training sessions on crisis leadership and management, business continuity, disaster recovery, information security, IT service continuity, and crisis response capability building and coaching.
The full transcript of this episode is available below.
OLIVER LOMER (OL): What does resilience look like in a post-COVID world? Let’s just say it’s complicated. But it should be one of your top priorities right now. I’m your host, Oliver Lomer, and this is IT Availability Now, the show that tells stories of business resilience from the people who keep the digital world available. Today we’re taking a fresh look at resilience, specifically at business continuity plans, and all that entails. We’ll look at not only why so many BC plans failed to anticipate the pandemic, but what companies need to do now to refresh, rethink, and reinvigorate their resilience. I’m joined by Chris Butler. He’s Lead Principal Consultant, Resilience and Security at Sungard AS. Chris, welcome to the show.
CHRIS BUTLER (CB): Hi Ollie, thanks for having me.
OL: Obviously this year has completely transformed the way businesses think about resilience, so I wanted to start with a look back. Before 2020 and the coronavirus pandemic, what was the state of operational resilience at most companies?
CB: So I'm going to be guilty of a gross generalization here but I think it's fair to say that the business continuity industry has been seen as quite mature in recent years. Unless, of course, you've been amongst those who've experienced the disruption, personally, the hard way. You could say there's been a general decline in interest around business continuity in some sectors, but by no means all sectors, over a number of years. The types of drivers for that have been a gradual increase in the use of online platforms, like Office 365 and Zoom, and the perception that the cloud is safe and secure and resilient. Business continuity managers, themselves, have felt slightly marginalized as people have been looking to make savings in certain areas, with business continuity being one of those areas. It's also fair to say that in certain sectors there's been a bit of box ticking when it comes to business continuity. There may have been plans in place but they were never properly tested and certainly weren't challenged to any great depth. We've talked with companies who wanted a plan because they have to for compliance reasons or because their supply chain requires them to have one. But this wider idea of organizational or enterprise resilience has come to the forefront as a slightly more recent development. There is an example that goes against that, which is the financial services sector. They take operational resilience very seriously because they're heavily regulated. They are being forced down this line but they are following it quite willingly. Other sectors haven't yet devoted as much time to operational resilience or business continuity, but they've been required to go down this route of understanding their complex systems in greater depth, the impacts of disruption and tolerances to this business impact. 
But basically what we've seen now is that the pandemic has shown that business continuity plans have not really been up to scratch for many companies - they've been insufficient, and they now have to take resilience and business continuity more seriously. And those who've had some inherent resilience have succeeded.
OL: So it sounds like more organizations are taking business continuity more seriously. How has this changed perceptions of resilience since the pandemic started?
CB: I’d certainly say so. The fact that some companies have realized their plans aren't up to scratch has opened their eyes. We've certainly seen the C-suite and executives become much more engaged as companies examine their resilience. Some have been hands on in their response to the pandemic, almost marginalizing their business continuity managers at the same time, because they've decided that, due to the impacts and consequences on their business, they've had to step up their ownership of resilience and their response. To what extent this executive oversight remains long-term remains to be seen but it's certainly a good thing for now. Companies are also looking at their resilience in new ways. For example, the timing and the duration. Many organizations had plans for disruptions that were focused around days or weeks but certainly not months. Many plans didn't fully incorporate the impact of third-party vendors, suppliers or partners struggling with the same disruption over a long period of time. Cybersecurity, equally, has been a challenge, and increasingly so with an extended remote workforce.
Many companies pushed people to work from home, and had less access into their office environment. And that's another area that had to have some significant focus. Some companies are also using this to accelerate the transformation to cloud. And equally, there are some who are guilty of rushing that way as well, and that can open up a whole range of pitfalls in this particular area, including cybersecurity. But that's just sort of the higher level, if you like. At the tactical level, some companies are still working out how they'll operate in the future with a dispersed workforce for a longer period. Other companies are working out what their future BC strategies will look like, along with the BC manager’s role within that. And it's worth reminding everyone that in this pandemic, by and large, IT has remained available. There's been a real lack of IT disruption in the majority of cases, and most companies have managed to continue to operate quite happily with their IT infrastructure in normal operations.
OL: And as they work out those tactics and fundamentally reevaluate their future state of operational resilience, are there any particular areas that they should be focusing on?
CB: I think there's several. One thing, having just mentioned the idea of IT working fine. That interconnection between business continuity and IT disaster recovery, I think, is something that needs to be addressed for future operational resilience. We've seen that these areas are siloed in many companies, and obviously with people outsourcing many services, this is quite a complex environment, and the interface between those two areas is not well understood by everyone, either on the business side or indeed on the IT side. And, of course, critical business activities are always supported by critical business applications and technology, and keeping those going is an important area to bring focus back onto. But learning from the financial sector and their operational resilience work, companies need to address more deeply their tolerances. This is this idea that you need to accept that something's going to happen, and you need to understand how far you're willing to tolerate that impact. At what point does that become truly intolerable for the organization? And that's something I'd say the financial services are leading on, and I think all companies should address that in a bit more detail. Cyber is another big area. We've been recently talking to the chief executive of a bank, and he's going to be running a cyber crisis workshop as an awareness exercise for them and their board. There are big cyber issues brewing potentially. According to cybersecurity professionals, there’s a significant wave of cyberattacks on the horizon because of the vulnerabilities created by extended homeworking. We've seen an increase in the use of personal devices, which also brings in a number of cybersecurity questions. And in the rush to the cloud to outsource services as part of the COVID response, if that's being rushed, then there's likely to be misconfiguration issues, for example, security settings that aren't optimized as a consequence.
So there's a few cybersecurity issues brewing there, and it's definitely time to get your act together from an incident crisis response to a cyberattack. Just ask Garmin most recently, and Travelex earlier in the year. Have you thought about what you would do if you did suffer a ransomware attack? Would you pay? It's not a simple, straightforward answer. But the final area I think to focus on in the future is this idea of concentration risk, which is essentially having all your eggs in one basket. Do all your employees with the same role work in the same office that’s located in the same region or the same area? Or all your vendors, are they located in the same geographical area? You should look into ways to be a bit more distributed in those assets to be more resilient against disruption.
OL: So once organizations have reassessed their BC plan and their resilience, and they made some changes to put some of these things in place, what’s the next level? What else should they be re-evaluating?
CB: Yeah, to really go to the next level I think this idea of supply chain resilience is really important, and analyzing critically what that supply chain resilience is. You know, this idea of forging stronger partnerships rather than having a zero-sum game, commercially competitive approach. If you have that support network or a collaborative ecosystem, I think everyone can win from that. Where suppliers up and down the supply chain support each other like friends with a coffee shop in a crisis, for example. So reviewing third-party suppliers is important. To further develop the idea of concentration risk that I mentioned earlier, you need look no further than the automotive industry and their supply chain. Back in February, you could see Tesla stock fell 30 percentage points in one day due to the sudden inability to manufacture and ship key components from their factories in China. You know, you can mitigate this through multiple sourcing, but equally all supply chains have got several layers. For an example of that, you just need to go to the construction industry. There's outsourcing all the way down the supply chain, and what companies need to understand is their risk level below those direct vendor relationships, your sort of first-order relationships. So again, supply chain due diligence is important. Ask to see their plans and critically review them as well. And I think the third area to go on to the next level comes back to this cloud and security aspect. If companies are indeed being driven by the pandemic to accelerate their cloud transformation journey, then this is something that could open up some real vulnerabilities for them. If they have been forced into rushing plans, they're creating potentially costly vulnerabilities. You know, cloud itself does have inherent resilience, if it's managed correctly. But, of course, cloud means different things to different people and most companies have different hybrid approaches to cloud.
It is complex, there are shared responsibilities, and I think it just feels like there's a bit of an unseemly rush and there's too many assumptions made behind it, so it's definitely a third area to focus in on.
OL: As companies take these steps right now in pursuit of greater resilience, the pandemic is still top of mind. What can organizations do to make sure they don’t fall into old habits, get complacent, and find themselves back where they were pre-COVID?
CB: That is a very good question. The first thing, and for me the most important thing, is this idea of capturing and then learning lessons from the experience of the last six months. I'd stress that you really learn a lesson once you've decided what the change needs to be and then implement that change. It's no good just saying “lessons must be learned,” you actually have to make a change in order to learn it properly. So that means going through a robust lessons capture process to review what they've experienced and see how their organization needs to change as a consequence. But equally you've got to look forward to potential other disruptions and conduct your war games. To use the military analogy, you need to wargame various scenarios - you could take the pandemic and make it worse, you could look at a ransomware attack, all the way down to reimagining your more mundane IT outages. But you've got to be creative about the scenarios. You can think about if you were in the current situation with an extended working from home arrangement and then suffered a serious cyberattack, how would you respond to that? You’ve got to be creative about it, and you have to make sure that people can't say “that couldn't happen here.” There's also something around people's assumptions. I think they need to really reexamine their initial assumptions behind business continuity plans. Particularly, for example, around things like single points of failure. Organizations before the pandemic never considered that they could have potentially had 60% of their workforce unavailable. That was too extreme. So actually, you need to look at the various options generically and what you would do if you had 10% available, 40% available, or 70% available, for example. So look at those points of failure. The office being a point of failure came to light with one customer, because the only place they could conduct laptop builds for working from home staff was in the office.
If they couldn’t get into the office, they couldn't do that. So what are your single points of failure? That leads into the idea of being able to make sure you manage your IT estate properly. Not everyone has been able to patch or update systems easily when they've had a remote workforce, some of whom are using their own devices. If your staff, who are working from home, have to remote back into their office, and then you lose the office, that's another example of an apparent vulnerability. So, not everyone is necessarily updating their assumptions based on those sorts of experiences. I think that's fundamental and needs to be redressed when they start to consider what their future resilience posture is going to be, once they've worked out what their future business-as-usual posture is going to be.
OL: Well Chris, thanks very much. And obviously we’re not out of the pandemic just yet, but you’ve given us a lot to think about in terms of improving resilience. It sounds like the next steps are key - you know what is your office going to look like, how much of your workforce remains remote, etc. - but at the same time it's addressing elements like cybersecurity, concentration risk in third-party supply chains to eliminate single points of failure, etc. And then moving forward is about testing and challenging those plans against a variety of disruptions or combination of disruptions.
CB: Yeah that's right. I think the idea of ongoing resilience has to involve continuous improvement, and there's no doubt that this is at the forefront of many companies’ minds right now.
OL: Chris, thanks very much for joining us. I appreciate your time.
CB: Thanks Ollie. It’s been great talking with you.
OL: You can find the show notes for this episode at SungardAS.com/ITAvailabilityNow. Please subscribe to the show on your podcast platform of choice to get new episodes as soon as they’re available. IT Availability Now is a production of Sungard Availability Services. I’m your host, Oliver Lomer, and until next time, stay available.