As organizations re-evaluate the state of their resilience, many are quickly discovering a serious blind spot when it comes to their third-party partners. Tom Holloway, Principal Consultant, Business Resilience at Sungard Availability Services (Sungard AS) joins IT Availability Now to discuss the importance of identifying third-party risk and how failing to do so can put your business in serious jeopardy.
- Steps to limit third-party risk
- Specific questions to ask about third-party suppliers’ business continuity and disaster recovery capabilities
- The biggest missteps companies make when evaluating third-parties
Oliver Lomer is a Senior Solutions Marketing Manager at Sungard AS, where he illustrates business challenges and how technology can solve them through engaging content. Oliver has six years’ experience in marketing, commercial and sales enablement roles in global technology organizations.
Tom Holloway is a Principal Business Resilience Consultant at Sungard AS, specializing in helping board-level executives improve and develop their crisis management capabilities through leadership and communication. Tom works with companies to create strategies and frameworks for organizational and operational resilience, as well as full lifecycle resilience and business continuity implementation. He consults across a broad range of sectors, including media and broadcasting, market research, financial services and insurance.
The full transcript of this episode is available below.
Oliver Lomer (OL): Every company is taking a fresh look at the state of resilience in the wake of the pandemic, and one of the weakest links for many businesses is their third-party partners. I'm your host Oliver Lomer, and this is IT Availability Now, the show that tells stories of business resilience from the people who keep the digital world available.
Third-party providers offer businesses everything from products to services to support. Some perform essential functions that keep your business running. But when your company relies so heavily on another business, you need to know how that business will handle a disruption or disaster. How can you make sure your partners are just as resilient as you are? To find out, we're talking to Tom Holloway, Principal Consultant of Business Resilience at Sungard AS. Tom, welcome.
Tom Holloway (TH): Hey Ollie, good to be here.
OL: So Tom, why should organizations be taking a second look at their third-party products and services?
TH: In recent decades supply chains have been getting increasingly efficient and, by implication, more complex. With this complexity, the ability to quantify and mitigate supply chain risks throughout the procurement, manufacturing, transportation, and sales lifecycle is paramount. So organizations need to identify the critical risks to minimize disruption and help protect their operational, financial, and, importantly, reputational exposures. And whilst this is more obvious in manufacturing, and the automotive industry provides some great examples of this, most modern organizations have become heavily dependent on the high availability of information for transactions and other interactions, and rely on others to provide critical business activities. There is simply no longer such a thing as an isolated incident, and even relatively minor disruptions now have the potential to upset the money-making capability of the wider and often delicately balanced commercial ecosystem in which organizations operate. The COVID-19 pandemic brought to light the impact of disruption to the physical supply chain, although you should also consider the impacts of network outages, delays in payment on your cash flow and the corruption of data critical to your business. You should also consider concentration risk. The fact that more than 50% of Fortune 500 businesses have a presence in Wuhan, China is an often-quoted statistic, demonstrating the dangers of geographical concentration risk. And this could extend to your fourth-parties, your suppliers’ suppliers, who may place great reliance on a particular product or service, such as a public cloud provider. Now in some sectors, such as financial services here in the UK, regulation is driving increased scrutiny of third-party suppliers.
OL: So it sounds like there are lots of risks. Are there any recent topical examples where organizations didn't take into account third-party risk and things went quite wrong?
TH: Yeah. So, again, back to my earlier points about complexity and the fourth-parties, that's your suppliers’ suppliers. Supply chains introduce increasingly interconnected attack surfaces outside your direct control. And whilst this vulnerability has existed for many years - and think back to the cyberattack on the American outlet store, Target, in 2013 through an air conditioning supplier - this has very much been a feature of the past nine months during the pandemic. For example, T-Mobile suffered a data breach over the last year in the U.S., with unauthorized access to emails via its email vendor. At the beginning of this year, Travelex in the UK was hacked through a known VPN server vulnerability. They suffered weeks of downtime, loss of customer data, and they received a ransomware demand. The point here is that Travelex was disrupted, but its disruption hit a large number of banks who outsourced their currency transactions to Travelex. Now, Finablr, Travelex’s parent company, has since had trading of its shares suspended on the London Stock Exchange, so you've always got to consider the bigger impact out there. Looking at more operational issues, in the U.S., General Mills has had to add manufacturers and suppliers in order to continue to cope with the demand of people eating more at home.
This has obviously increased their costs. And most recently, Pennsylvania's voter services website failed after a third-party data center experienced an equipment failure. So that just gives you a few examples of issues in the last couple of months.
OL: Thanks Tom. These are examples with really significant business consequences. So, how would an organization get started evaluating its third-party suppliers and their associated risks?
TH: So, traditionally third-party, or vendor if you prefer, due diligence and risk management focused on things like compliance, privacy, identity, fourth-parties, and financial matters, such as whether the business was a going concern. Now, COVID-19 aside, we've seen an increased interest in recent years around an organization's business continuity (BC) and their resilience credentials. Now, there is no easy way of doing this. Vendor risk management in itself is a time-consuming activity, and with the number of suppliers quickly running into the hundreds for many organizations, this presents a significant challenge for most businesses. Consider also that some of the controls-based assessments often run into more than 1,000 questions across a range of disciplines. To reduce the size of the challenge, consider tiering suppliers based on their criticality to your business and their access to your confidential information. Self-assessment responses are often creatively written, so ensure that you ask very specific questions regarding resilience, BC and disaster recovery controls. We would recommend that where key partners have been identified, you actively review their arrangements. Do consider inviting your partners to participate in activities such as exercises and tests where appropriate and where relevant. Also, do consider that survey fatigue and decreasing response rates for questionnaires amongst recipients are very likely, particularly where we're using these lengthy controls-based questionnaires.
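To make the tiering idea concrete, here is a minimal sketch in Python. The 1-to-3 criticality scale, the `Supplier` fields, the `assign_tier` rule and the supplier names are all illustrative assumptions, not something specified in the episode; a real scheme would reflect your own risk appetite and regulatory obligations.

```python
from dataclasses import dataclass


@dataclass
class Supplier:
    name: str
    criticality: int  # assumed scale: 1 = low, 2 = medium, 3 = high
    handles_confidential_data: bool


def assign_tier(supplier: Supplier) -> int:
    """Return a review tier, where tier 1 receives the most scrutiny.

    The thresholds below are hypothetical and only illustrate combining
    business criticality with access to confidential information.
    """
    if supplier.criticality >= 3:
        return 1
    if supplier.criticality == 2 and supplier.handles_confidential_data:
        return 1
    if supplier.criticality == 2 or supplier.handles_confidential_data:
        return 2
    return 3


# Hypothetical suppliers, for illustration only.
suppliers = [
    Supplier("payments-processor", 3, True),
    Supplier("email-vendor", 2, True),
    Supplier("facilities-contractor", 2, False),
    Supplier("stationery-shop", 1, False),
]
tiers = {s.name: assign_tier(s) for s in suppliers}
```

One way to use the result would be to reserve active reviews, site visits and joint exercises for tier 1 suppliers and send shorter, more specific questionnaires to the lower tiers, which may also help with survey fatigue.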
OL: So you mentioned performing tests and exercises, but what are some other methods, tools or strategies that one can take to evaluate third-parties?
TH: Well, as a business resilience consultant, you might expect me to say a couple of words on the subject. So, obviously it's always good to start with a definition. Now, the Bank of England defines operational resilience as the ability of firms to prevent, adapt, respond to, recover, and learn from operational disruptions. And any definition will generally include those components, but there are two distinct aspects here. There's the preventative bit - the things you can do before it's gone wrong - and that really is about improving your capacity to resist a disruption in the first place, and then delay and reduce the impact of that disruption when it happens, all achieved by putting in place the necessary measures ahead of time. And the definition also has a sort of reactive element to it, and that's the ability to respond to and recover from a disruption. This is much more difficult to measure, as it really relates to the leadership's ability to manage their way out of trouble, and also the cultural norms inside an organization. Simply put, resilience is positive outcomes despite negative stress. So, when faced with known risks, you should be investing in those preventative measures ahead of time. And then when faced with uncertainty, you need to invest in reactive measures. So, looking at both of those, on the preventative side, it's the surveys, it's the controls-based questionnaires, adherence to recognized standards such as ISO 22301, which is the international standard for business continuity, and ISO/IEC 27031, which covers the suitability of IT to support business continuity. Use market data from the many organizations that provide it. Perhaps use some benchmarking information provided by Shared Assessments and other businesses. Use risk profiling tools. Go on site visits and validate with your own eyes what they are telling you. And do assess high-risk suppliers more frequently.
On the reactive side, for me this is all about really building a partnership, a cooperative partnership, and that's done through joint exercising, sharing of information, and frankly, the leadership of both parties getting engaged and understanding the vulnerabilities and the risks on both sides.
OL: Could we just briefly zoom in on the types of questions that should be asked when organizations are reviewing their suppliers’ BC and DR arrangements?
TH: Yeah, so for both business continuity and disaster recovery, there are four categories of questions to ask. You're looking at the scope of the documentation - i.e., which departments are included, which products are included, which systems are included. You're looking at the detail of plans - i.e., are the other departments that produce what you consume covered by those plans, or is there just one plan for the business? Have they validated the plan - so have they conducted exercises, put them to the test, conducted remedial training where appropriate? And are the people ready? Are they trained and have they got the necessary tools to support them? With regard to disaster recovery, there are a couple of additional categories. You want to look at the recovery objectives, and here we're talking about recovery time objectives, RTOs, and recovery point objectives, RPOs. But also, would they be able to recover data after a successful cyberattack? We’ve very much seen in recent months and years that ransomware, the corruption of data, is a real issue. And old school disaster recovery was very much focused on what to do should your data center catch fire or your hard drives do similarly. So really, consideration of recovering data is important for DR.
OL: And let's take a closer look at BC first. What are those specific things that you should know about your vendor’s BC plan?
TH: Okay, so taking each of those areas I touched on just now. If you're looking at scope - are all the departments that support the products and services you receive from that organization covered by business continuity plans? And these are the particular questions you need to be asking the organization. With regard to those plans, do the plans include specific response strategies for the unavailability of specific resources? And by those I mean the workplace, the equipment, the workforce, IT services, data, and third-parties and services. Also, have they got detailed actions to implement and sustain each strategy, and will those strategies be effective when covering the unavailability of those resources beyond 30 days? Obviously with the pandemic, this has gone on for months, and many people only considered that it would go on for a maximum of a few weeks, so their plans only covered a two-week period. Do your BC plans include explicit response strategies for the possible permanent loss of data from a successful cyberattack? In terms of the validation, have you defined specific exercises to validate the effectiveness of each of the resource-specific response plans, such as working from home, which I think is all pretty well ticked in the current era, but have you tried to exist without the main building having power, do your backup IT systems kick in as they should, or are you very reliant on there being power to the PCs in your main office? And, really important, has at least one exercise been conducted annually within each department? Too often, that's not the case. They might think about it but it's not being done. And finally, on the people readiness side of things, has everyone been trained on the plan applicable to their department, and has everyone been trained on the communication and the collaboration tools to be employed? There are numerous tools available for communication and collaboration.
And if you consider that your primary business systems might be out of commission because of a cyberattack, you need to have out-of-band communications tools to work with.
OL: Thanks Tom, and can we just do the same closer look at DR, and in particular, which questions you should ask your vendor about their DR plans and capabilities?
TH: They're similar. So for scope, are the systems that support the products and services you receive covered under your program? This is where things begin to diverge, with the new area of recovery objectives. So, have the recovery time objectives, the RTOs, and the recovery point objectives, the RPOs, been defined for all systems? Was the backup architecture defined to meet all the RPOs, and was the DR environment designed to meet all RTOs? And do these objectives transcend all the environments operated in? By this I mean on-prem and all the cloud and SaaS applications that your vendor is using. In terms of the plans, do you have a DR management and control plan? Do they have scripts established for the DR infrastructure, and scripts to recover each application system? In terms of validation - again, it's about the testing. Have they conducted recent tests that validate that all RTOs and RPOs can be met? Does the test regime include all compute environments, such as SaaS, cloud, hybrid and on-prem? Do you conduct DR testing for each environment, including SaaS, platform-as-a-service, and infrastructure-as-a-service, as well as your own on-prem compute? And are you looking across all production environments? In terms of the people, this is a really important one. Can the recovery team work effectively remotely? Too often, businesses rely on physical access to computer rooms to be able to failover. It needs to be able to work remotely because you might not be able to get into that primary site for a variety of reasons. And secondly, on the people side of things, have all primary and alternate recovery team members participated in at least one DR test? The point here is, too often businesses have one person who's worked there for decades and knows it all, but if they're not available, then you're going to face a real problem. And finally, with DR, there is data recovery following a successful cyberattack.
Do you have a plan focused exclusively on recovering compromised data in order to restore those crown jewels and get back in business quickly? Too often, businesses that have had data corrupted as a result of a ransomware attack have had to go and pay the ransom because they have no way of recovering that data.
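The validation question above (have recent tests shown that every RTO and RPO can be met?) can be expressed as a simple check. The following Python sketch is illustrative only: the `SystemRecovery` fields, the `recovery_gaps` helper, the hour-based units and the sample figures are assumptions, not data or tooling from the episode.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SystemRecovery:
    name: str
    rto_hours: float  # recovery time objective: maximum tolerable downtime
    rpo_hours: float  # recovery point objective: maximum tolerable data loss
    tested_recovery_hours: Optional[float] = None   # measured in the latest DR test
    tested_data_loss_hours: Optional[float] = None  # measured in the latest DR test


def recovery_gaps(system: SystemRecovery) -> List[str]:
    """List the reasons a system's DR evidence falls short of its objectives."""
    if system.tested_recovery_hours is None or system.tested_data_loss_hours is None:
        return ["no test evidence"]
    gaps = []
    if system.tested_recovery_hours > system.rto_hours:
        gaps.append("RTO missed")
    if system.tested_data_loss_hours > system.rpo_hours:
        gaps.append("RPO missed")
    return gaps


# Hypothetical vendor systems, for illustration only.
systems = [
    SystemRecovery("order-portal", rto_hours=4, rpo_hours=1,
                   tested_recovery_hours=3, tested_data_loss_hours=0.5),
    SystemRecovery("billing", rto_hours=8, rpo_hours=2,
                   tested_recovery_hours=12, tested_data_loss_hours=1),
    SystemRecovery("saas-crm", rto_hours=24, rpo_hours=4),
]
report = {s.name: recovery_gaps(s) for s in systems}
```

Treating a missing test result as a gap in its own right reflects the point made above that an objective which has never been tested has not been validated.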
OL: Thanks Tom. A very comprehensive set of questions there, but when you actually get answers to those questions, how would you know that the vendor is telling you the truth?
TH: Well, that's the tricky one, Ollie, because fundamentally you've got to trust and verify. You've got to trust that the answers you've been given are full and valid, but then you've got to go in and verify, speak to the SMEs on-site, review the documentation yourself. Now, every question that you ask should be in the context of the products and services that you're using from that vendor, rather than vague comments about yes we do have a BC plan - well, does it cover it or not. So you need to get specific. And I'd recommend, rather than having simple yes/no answers, which is all too binary, go for detailed narrative responses where it's appropriate.
OL: Is there anything that we've missed that companies typically overlook when they're evaluating third-party risk?
TH: Well, I mean, the problem here is there's an awful lot to do, and the devil is in the detail. So, what period of time does the plan cover? For example, often when we're writing plans, it might be what are your actions in the first 24 hours, rising up to the first 30 days? Do they extend beyond that? Of course pandemic plans, as we've seen, really need to consider very long periods of time. What are their response strategies for the permanent loss of data? What sort of exercises are they conducting - work from home exercises, loss of site exercises, loss of key supplier and impacts on their business processes - and have people been trained? But most importantly, I think that we're going to be moving to more of a partnership relationship between vendor and the organization concerned - from a simple transactional one, which probably has existed up to this point, to developing enduring business relationships that go well beyond simple SLAs.
OL: Is there anything else that companies should know about third-party risk? Any closing remarks from you, Tom?
TH: It is a life-and-death business, as I touched on there with the case study with Finablr. With the survival of your brand possibly depending on factors well outside your control, you really need to take the matter seriously and be prepared to act quickly when it goes wrong. In manufacturing, in the last few months, we've seen a shift from that just-in-time production philosophy to a just-in-case philosophy, with a sharp increase in the uptake of storage space as businesses stockpile increased levels of inventory. During the COVID-19 lockdown, a study commissioned by Sungard AS in North America found that consumers' use of digital services more than doubled on average. Hand in hand with this leap towards online services, the survey also found rising levels of expectation by consumers around availability and much less tolerance for outages, with 55% of respondents changing service providers because of problems. So you're looking, I think, for four things here. You're looking for leadership engagement, oversight from boards and senior leadership - it's a really strong predictor of practice maturity. As with most things related to business continuity and resilience, all-level engagement tends to lead to appropriate governance, resource allocation, and internal coordination between functional areas. This leadership engagement is also critical in fostering trusted partnerships with key suppliers on one hand, and preparing the organization on the other, both in terms of robust yet flexible IT infrastructure and developing the teams to manage the incidents and emergencies and crises that will arise. You need to be continually monitoring suppliers' business continuity and disaster recovery arrangements. As I said, this is more of a partnership than a once-a-year activity. No longer is it simple box checking - you need to ask more in-depth questions related to resilience, BC and DR. And expect tough questions to be asked by your customers.
Your key partners will want to see that you have an established culture and commitment to resilience. So strengthening the integrity of your business continuity and disaster recovery plans ultimately increases the confidence your customers have in your ability to deliver the products and services that you've committed to provide them in times of trouble. Indeed, I think it'll be viewed as a competitive advantage where done properly.
OL: Every company should be taking a close look, not only at their own business resilience, but at their third-party partners as well. You want to ask specific questions in the context of the products and services you receive from that partner. You should ask about the scope, plans, validation, and people readiness of their BC plans. For DR plans, ask about those same areas, plus recovery objectives and data recovery after a successful cyberattack. With the increased focus on resilience, all organizations should expect a greater degree of scrutiny from their customers and prospects about their business continuity and their disaster recovery programs. And as you ask these questions of your vendors, make sure you have good answers when your own customers start to ask you the same questions. Tom, thanks very much for your time, it's always a pleasure talking to you.
TH: Ollie, a huge pleasure speaking with you and thanks for having me.
OL: Tom Holloway is Principal Consultant of Business Resilience at Sungard AS. You can find the show notes for this episode at SungardAS.com/ITAvailabilityNow. Please subscribe to the show on your podcast platform of choice to get new episodes as soon as they're available. IT Availability Now is a production of Sungard Availability Services. I'm your host Oliver Lomer, and until next time, stay available.