IT Resilience: Lessons from the pandemic
The response to the Covid-19 pandemic has demonstrated the importance of information technology (IT) in operational resilience. The value of flexible IT arrangements that enable effective and sustainable remote working has been proven, with over half of the working population working fully or partially from home at points in the pandemic. As a result of this experience, many companies have indicated that they will seek to increase their investment in IT solutions with the aim of enhancing their operational resilience. Before they do so, they would do well to look more closely at this particular facet of resilience.
The relatively generous time window initially afforded by the pandemic, as opposed to a no-notice disruption, enabled many organisations to cobble together rudimentary work-from-home solutions. It’s worth noting that some 50% had no such plans to begin with. The universal impact of the virus meant organisations were initially perceived as victims rather than at fault, which provided a reputational cushion for the inadequacies of many of these extemporised arrangements, including increased exposure to cyber risks, lack of hardware, limited connectivity, staff unfamiliarity with procedures and diminished service quality. As the pandemic progressed, this lived experience, captured by research, shone a light on a lamentable, and fundamental, failure to coordinate IT factors effectively within pre-pandemic resilience planning.
When it came to operational resilience and the function of IT, numerous organisations seem to have made the error of conflating Disaster Recovery and Business Continuity. This isn’t just semantics, or splitting hairs over names, but a fundamental weakness in the resilience approach. Disaster Recovery planning addresses technology reliability and availability through measures such as systems redundancy, virtualisation, geographical spread, removal of single points of failure, automatic failover and data backups. These ‘proactive’ measures are designed to reduce the prospect of a significant disruption to the systems and data that support the delivery of services / products by engineering out failure. However, these arrangements are separate from the ‘reactive’ response required should the proactive measures fail and unacceptable disruption threaten to occur. This reactive capability requires technology-attentive Business Continuity, but recent BCI reporting suggests this is all too often an incomplete component of resilience, and it has identified a number of gaps we recognise only too well:
1. Firstly, too often there is a disconnect between IT departments and the Business Continuity discipline, so that planning occurs in isolation. IT resilience planning is not fully informed by the requirements of the critical services / products supported, namely their Minimum Business Continuity Objectives (MBCOs) and Recovery Time Objectives (RTOs), with IT teams estimating what their customers require or, in the worst case, mandating what they have to accept. The resulting capability is often focused on individual systems (the technology components) rather than the IT services relied upon by functions and departments, each of which may incorporate several systems. Consequently, IT resilience planning can fail to match business priorities or reflect risk tolerances, meaning that continuing to deliver services / goods in the event of a disruption may be more a matter of chance than something assured by joined-up planning.
2. Secondly, and linked to a lack of integration, functional or departmental Business Continuity plans may be based upon unproven assumptions as to the resilience of technology and the support available from IT departments. Frequently this results in little, if any, provision to cover the gap between an IT service failing completely and continuity being provided, further placing in doubt the ability to meet service / product MBCOs and RTOs.
3. Thirdly, IT departments often overlook their wider services, such as fault reporting, service desks, patching and other cyber security activities, configuration management and supplier liaison. These vital, and often people-dependent, critical processes can fall through the cracks of resilience planning.
4. Fourthly, critical IT suppliers are largely unsighted on the business’s continuity requirements. Although contracts and SLAs exist to cover issues like BAU reliability or fault response times, they often omit any mention of (a) the continuity requirements of the supplied organisation or (b) ensuring the supplier’s resilience conforms to standards of acceptability (beyond simply having a plan).
So, what can be done to avoid these pitfalls? In the first instance, ensure good communications and integrated planning between the Business Continuity discipline and the IT department. This requires an appropriately structured planning architecture, a clear lexicon that establishes precisely what is meant by Business Continuity and Disaster Recovery, and the meaningful exchange of knowledge covering requirements, risks (likelihoods and impacts) and capabilities. Secondly, make sure it is clear who is responsible for setting the continuity requirements and how they are to influence IT resilience planning. Thirdly, establish the process and criteria, linked to impact thresholds, that govern how risk-based enhancement decisions will be taken where current ‘reactive’ provisions are judged to be insufficient (that is, where continuity requirements cannot be delivered when IT ‘proactive’ measures fail). Test the reactive arrangements ultimately settled upon, doing so coherently and involving IT and organisational functions and departments, to prove they are fit for purpose. Fourthly, and finally, ensure that IT departments build and exercise Business Continuity plans for their own department, addressing all the critical services they provide and not just the technology systems they manage.
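One way to make the exchange of requirements and capabilities concrete is a simple gap check at the level of the IT service rather than the individual system. The Python sketch below is illustrative only: the names (Service, System, recovery_gap, services_at_risk), the example services and all the hour figures are hypothetical, and it assumes that a service is only restored once the slowest of its supporting systems is back. A real assessment would also need to consider MBCOs, dependencies between systems and any manual workarounds.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class System:
    """A technology component with a tested (not assumed) recovery time."""
    name: str
    recovery_hours: float


@dataclass(frozen=True)
class Service:
    """An IT service as the business sees it, with the RTO set by its owner."""
    name: str
    rto_hours: float
    systems: tuple[System, ...]


def recovery_gap(service: Service) -> float:
    """Hours by which the service is expected to miss its RTO (0.0 if none).

    Assumption: the service needs every supporting system, so it recovers
    only when the slowest system has been restored.
    """
    slowest = max(s.recovery_hours for s in service.systems)
    return max(0.0, slowest - service.rto_hours)


def services_at_risk(services: list[Service]) -> list[str]:
    """Names of services whose supporting systems cannot meet the RTO."""
    return [s.name for s in services if recovery_gap(s) > 0]


# Hypothetical example data, for illustration only.
payroll = Service(
    name="Payroll",
    rto_hours=24,
    systems=(System("HR database", 8), System("Payments gateway", 48)),
)
helpdesk = Service(
    name="Service desk",
    rto_hours=4,
    systems=(System("Telephony", 2), System("Ticketing", 4)),
)

for svc in (payroll, helpdesk):
    print(f"{svc.name}: gap of {recovery_gap(svc)} hours against RTO")
```

In this made-up example the payroll service would miss its RTO even though one of its two systems recovers well within it, which is the kind of shortfall a system-by-system view tends to hide. Any gap flagged this way is a candidate for the risk-based enhancement decisions described above, or for a reactive workaround in the relevant Business Continuity plan.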