Risk and the Technology Architect
What was it that first drew you into a career in technology? Like you, perhaps, I was excited by the technology itself and the possibilities it created. The prospect of working with leading-edge software and hardware, and of applying them, has always been a thrill.
As an architect, you are expected to have a greater overview of the elements and interactions of a system or sub-system than perhaps anyone else in the technology domain. Is it possible to have this higher-level view and have enough knowledge of the detail to understand the potential vulnerabilities or potential consequences of any given decision?
In recent years, the profound, literally life-and-death impact of technology has been prominent in the headlines, from self-driving cars to the issues with the Boeing 737 Max.
Less a matter of life and death, yet still very impactful, have been episodes ranging from bank outages in the UK to wide-scale theft of personal data. The usual response is to refer to “computer issues”, as if the responsibility lay with an unknown and unexpected gremlin that had somehow found its way into the company.
Whatever the impact, technology is at the heart of our lives, and its role will only increase. So too will the consequential risks, and with them the responsibility of the technology architect to fully understand and manage those risks.
Within the world of technology, the risks usually at the forefront of our minds relate to budgets, timelines, downtime, service and capability. In a technology-enabled world it may be useful to revisit the risk categories associated with the software and systems that you may have oversight of: –
The risks associated with Quality of Service are perhaps the most familiar. The system does not operate as expected in obvious ways. The consequences of such failures have a relatively low level of impact: late or incorrect delivery of information or products, slow response times, inconsistent behaviour, increased cost or poorer returns.
The risks associated with Disruption of Daily Life are almost certainly a fast-growing and complex area. Examples range from the unforeseen consequences of operating systems, personal data leaks and trading in personal data to systemic biases in AI systems and higher-level decisioning. A number of high-profile stories have recently shown cognitive systems to be biased along dimensions of race and gender, with a real-world impact on the financial health of individuals.
One, perhaps infamous, incident was the theft of personal data from the Ashley Madison website. As well as exploiting a security vulnerability, the incident highlighted the lack of security and protection around sensitive personal data.
At the highest level of consequential risks, there is Threat to Life. It is more than likely you have read about the consequences a failure can have. Fly-by-wire systems have little or no physical connection between the control device and the outcome. There are cars with fly-by-wire throttle pedals. The throttle pedal drives a sensor, and it is the interpretation of the sensor input that creates the engine response. A failure of the electronic control unit could create a scenario in which the car does not respond as intended, a failure that is less likely in a car with a mechanical connection.
As more and more vehicles are tested with driver assists such as self-parking or lane-change modes, or with fully self-driving modes, there is both an increased risk from failure and a set of complex ethical questions raised by the design of the self-driving algorithms.
Ultimately, the questions associated with risk pose growing and deep ethical challenges for the architects of the overall system.
Another dimension in considering risk is that ‘systems’ as a whole are now significantly more complex and interlinked than they have been in the past.
Mainstream systems have generally been architected to respond to predictable risks based on the probability of threat, an approach suited to orderly problems with predictable outcomes.
As a response to operating in more complex environments, systems need to be architected as complex adaptive systems (CAS). A CAS has properties that are designed to respond to unpredictable as well as predictable events.
Whilst this is an old problem, new approaches to systems design are evolving to address these factors in response to the rise of socio-technical systems (systems that have a direct impact on people and the environment). A variety of techniques that have been used successfully in heavy manufacturing, battlefield command and aerospace are now being considered as a basis for better systems architecture.
The Role of the Architect
So, what is the role and responsibility of the technology architect in this world of real-life impacts on the one hand and increased complexity on the other?
There is a range of specific dimensions that, as a minimum, an architect should ensure are covered as part of the role: –
- It is imperative that the ‘system’ is designed holistically, and all aspects of the use-case are fully defined.
- What are the stressors on the potential system and when will they come into play?
- Architect solutions rather than rely on testing – for example, use a Zero Trust approach to architecting for security.
- The architect has a key role to play in working with systems assurance and testing to sign off the overall testing plans.
- Confirming that test data or AI training data is appropriate, so that real-world outcomes are understood.
- Validation of input parameters to ensure they are within expected limits.
- Robust and graceful recovery from failures. Employ a tool such as Chaos Monkey to create unexpected failures.
- Monitoring and logging principles and standards to identify and track issues/errors across systems/components.
- Consider how the system will behave rather than function under different conditions, especially external stressors.
- Is the system decoupled to a degree that will limit impacts of component failure?
- Does the system have diverse paths that allow it to continue operating around a component failure? Can the system perform exaptation in the face of the unexpected?
- Seeking independent review and deep critique at the design stage.
- An understanding of the factors external to your systems that can indirectly impact the system, such as human, geographical, legal and policy factors.
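Two of the points above, validating input parameters against expected limits and continuing to operate around a component failure, can be illustrated with a small sketch. This is only one possible shape, not a recommended implementation: the names Limits, call_with_fallbacks, primary_pricing, secondary_pricing and quote_fee, the 2% fee and the limits themselves are all hypothetical and chosen purely for illustration.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Sequence

logger = logging.getLogger("architecture_sketch")


@dataclass(frozen=True)
class Limits:
    """Expected operating range for one input parameter (hypothetical)."""
    low: float
    high: float

    def validate(self, name: str, value: float) -> float:
        # Reject anything outside the expected range. NaN fails both
        # comparisons, so it is rejected as well.
        if not (self.low <= value <= self.high):
            raise ValueError(f"{name}={value!r} outside [{self.low}, {self.high}]")
        return value


def call_with_fallbacks(paths: Sequence[Callable[[float], float]],
                        value: float,
                        safe_default: float) -> float:
    """Try each independent path in order; degrade to a safe default."""
    for path in paths:
        try:
            return path(value)
        except Exception:
            # Log with traceback so the failure can be tracked later.
            logger.exception("path %s failed, trying next", path.__name__)
    return safe_default


def primary_pricing(amount: float) -> float:
    # Stand-in for a component that is failing (a deliberately injected fault).
    raise ConnectionError("primary service unavailable")


def secondary_pricing(amount: float) -> float:
    # Simpler, independent path: an assumed flat 2% fee.
    return amount * 0.02


AMOUNT_LIMITS = Limits(low=0.0, high=10_000.0)


def quote_fee(amount: float) -> float:
    # Validate first, then route around whichever component has failed.
    checked = AMOUNT_LIMITS.validate("amount", amount)
    return call_with_fallbacks([primary_pricing, secondary_pricing],
                               checked, safe_default=0.0)
```

Calling quote_fee(100.0) here logs the failure of the primary path and answers from the secondary one, while quote_fee(-1.0) is rejected before any component is called. Whether a silent safe default is acceptable, or the failure should instead be surfaced to the caller, is itself a risk decision for the architect.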
Risk is a very real part of being an architect and the unintended consequences of not effectively managing risks can have far-reaching and life-changing impacts.