Reflections on Resilience
Resilience is such a great word, one that has been in the English language since the seventeenth century. In general use, it is associated with toughness: the capacity to recover quickly from difficulties. In engineering, the meaning is more specific: the ability of an object to spring back into shape after being deformed. Of course, there is a point at which the object goes beyond its elastic limit and ultimately fails.
This is what the regulators want to know about banks and other financial services. How resilient are they? How tough? What is that elastic limit?
In this piece, Andrew Frith shares his reflections on what resilience means in practice.
In the UK, the FCA and the PRA have both introduced regulations for banks and other financial services firms covering operational resilience and material outsourcing, prompted by high-profile incidents and the increasing threat of cyber attacks. This is a key focus area for regulators, with the European Central Bank and the Federal Reserve both looking at similar approaches.
The first phase of the regulation requires financial services firms to undertake a number of activities:
- Define the firm’s Important Business Services (IBS)
- Define Impact Tolerances for those IBSs, beyond which customers would suffer intolerable harm
- Test their IBSs with severe but plausible scenarios
- Submit a self-assessment to the regulators
As we near the end of this first phase, I want to reflect on some of the challenges firms are seeing in this area.
Understanding Important Business Services
Some firms already understand their critical services from previous exercises or internal resilience work. Where some struggle is in tracing the critical path through their internal systems and processes to determine what is material to providing the service. This can be especially difficult in large legacy estates, particularly while major change programmes are in flight. A detailed understanding of how data flows around the organisation helps to determine the critical path that enables each IBS.
This is important in a number of areas:
- Understanding the utilisation patterns of IBSs to inform the definition of Impact Tolerances. Impact Tolerances must have a time element, but can also consider the volumes and values passing through services
- Informing scenario testing activities to ensure the right questions are being asked during paper-based scenarios
- Assisting in prioritising remediation activities by targeting the systems with the largest impacts
- Feeding into incident management and recovery planning, allowing teams to focus recovery efforts on the critical systems and their dependencies
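To make the idea of a critical path concrete, here is a minimal sketch in Python of walking a dependency map to find every system an IBS relies on, directly or indirectly. The service and system names are hypothetical and purely illustrative; a real estate would draw this map from a configuration database or architecture repository rather than a hand-written dictionary.

```python
from collections import deque

# Hypothetical dependency map (illustrative names only):
# each key depends directly on the systems listed against it.
DEPENDS_ON = {
    "make_a_payment": ["payments_gateway", "customer_portal"],
    "payments_gateway": ["core_ledger", "fraud_screening"],
    "customer_portal": ["identity_service"],
    "fraud_screening": ["core_ledger"],
    "core_ledger": [],
    "identity_service": [],
    "marketing_emails": ["identity_service"],
}


def systems_supporting(service, depends_on):
    """Return every system the service relies on, directly or indirectly."""
    seen = set()
    queue = deque(depends_on.get(service, []))
    while queue:
        system = queue.popleft()
        if system in seen:
            continue  # already visited via another route
        seen.add(system)
        queue.extend(depends_on.get(system, []))
    return seen


critical = systems_supporting("make_a_payment", DEPENDS_ON)
print(sorted(critical))
```

Anything outside the returned set (here, the hypothetical marketing_emails system) is not material to the service, which is the kind of distinction that helps target remediation and recovery effort.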
Challenge Your View of Resilience
Many firms have approached resilience in long-standing ways, using high availability (HA) to recover from server or application failures. The regulators are pushing firms to consider severe but plausible scenarios, up to and including widespread, catastrophic failures across their estate. The increasing threat of ransomware is bringing this into focus as well.
Firms must consider the end-to-end recovery of their systems, and how long that recovery takes, if they are to avoid breaching Impact Tolerances. Scenarios that some firms have not previously considered as seriously, such as restoring data to recover from corruption or compromise, introduce new challenges once timelines are taken into account.
Unfortunately, for many firms the gap between recovering through high availability and recovering multiple inter-related systems is large, and a full recovery is likely to take longer than even the most generous Impact Tolerances allow.
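The size of that gap can be illustrated with a rough calculation. The sketch below, in Python, assumes that a system can only be rebuilt once the systems it depends on are back, and that independent systems are restored in parallel. All names, durations and the tolerance figure are invented for illustration and are not drawn from any firm or regulation.

```python
# Hypothetical rebuild-and-restore durations, in hours (illustrative only).
RESTORE_HOURS = {
    "core_ledger": 10,
    "fraud_screening": 3,
    "payments_gateway": 2,
    "identity_service": 4,
    "customer_portal": 1,
}

# Hypothetical dependencies; assumed to contain no cycles.
DEPENDS_ON = {
    "payments_gateway": ["core_ledger", "fraud_screening"],
    "fraud_screening": ["core_ledger"],
    "customer_portal": ["identity_service"],
    "core_ledger": [],
    "identity_service": [],
}

IMPACT_TOLERANCE_HOURS = 8   # assumed tolerance for the service
HA_FAILOVER_HOURS = 0.25     # assumed time for a routine HA failover


def recovery_hours(system):
    """Hours until a system is back, waiting for its slowest dependency first."""
    wait = max((recovery_hours(d) for d in DEPENDS_ON.get(system, [])), default=0)
    return wait + RESTORE_HOURS[system]


# The service is only back once every system it is delivered through is back.
service_systems = ["payments_gateway", "customer_portal"]
full_rebuild_hours = max(recovery_hours(s) for s in service_systems)

print("HA failover within tolerance:", HA_FAILOVER_HOURS <= IMPACT_TOLERANCE_HOURS)
print("Full rebuild within tolerance:", full_rebuild_hours <= IMPACT_TOLERANCE_HOURS)
```

With these invented figures, the HA failover sits comfortably inside the tolerance while the full rebuild takes 15 hours against a tolerance of 8, because the longest chain of dependent restores sets the overall timeline rather than any single system.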
Another area that firms are grappling with is working out what level of testing is appropriate. The regulations require all firms to undertake Scenario Testing of their IBSs, but organisations interpret this requirement differently.
Some firms are leveraging their existing HA testing of components, alongside paper-based testing of more severe scenarios. Others are regularly testing system rebuilds and data recovery to build further confidence. Regular, realistic testing isn’t just about the systems; it’s also about training and rehearsing people and recovery processes.
Firms with complex, intertwined environments may shy away from live testing, but they need to ask themselves the following:
If it’s too risky to test, how do you have confidence it will work when you need it?
Regardless of the approach taken by firms, they need to have confidence that their testing is robust enough when they stand in front of the FCA and the Bank of England.
At the end of this month, all firms in scope of the regulations must submit their self-assessments to the regulators. While it will take time for the regulators to review and comment on each individual submission, we would expect early themes to become apparent. It will be interesting to see how the regulators respond to the variety of positions that firms put forward. While firms are making progress with their existing approaches, it is likely that course corrections will be required to satisfy additional regulator requirements.
This is the first stage of the Operational Resilience journey. From the end of March, firms have three years to remediate identified vulnerabilities, but they must think beyond this time frame. This is not a one-off exercise for the regulators; it is the start of ongoing resilience activities. There needs to be a cultural shift to embed Operational Resilience throughout organisations as it becomes part of business as usual.