Alternate Title: Will We Require A Few OT Security Controls And Claim Victory?
Nassim Taleb described the “Narrative Fallacy” in his book The Black Swan.
We like stories, we like to summarize and we like to simplify… what I call the narrative fallacy.
Something happens and our human minds create a psychologically satisfying story as to why it happened and how we can prevent it from happening again.
The story, the narrative fallacy, emerging from the Colonial Pipeline incident, the JBS incident and similar incidents is that we need a set of regulation-mandated cybersecurity controls to address the risk that a cyber incident could pose to a region or the nation. An example of the neat and clean story: the Colonial Pipeline incident was initiated by remote access via a VPN that did not require two-factor authentication. So now we have Congress, Biden Administration officials and experts saying we need to mandate two-factor authentication.
Which leads to the second term, from Richard Clarke and R.P. Eddy in their book Warnings: Finding Cassandras to Stop Catastrophes:
Herbert Simon coined the term “satisficing” in 1947 to describe searching through the available remedies until an acceptable alternative is found that “addresses” the problem, but doesn’t solve it. This alternative is usually easy, not requiring significant resources or disruption.
Satisficing is common in governments and other bureaucracies. Information sharing and public/private partnership efforts are a great example of satisficing that has been pushed every year in OT for nearly two decades. The critical infrastructure cybersecurity controls and related regulation that the current Congress will try to pass in the US are another.
Many of these mandated security controls, including two-factor authentication for remote access, are not wrong. They do address the problem, reducing the likelihood of an incident and the corresponding risk. They don't solve the problem of a cyber incident against critical infrastructure resulting in an unacceptably large consequence.
The recent incidents that have brought the government action / regulatory push to this point have been compromises of the enterprise network, not the OT / ICS. Do we believe that any set of security controls will eliminate breaches of the enterprise network? Even with the perfect set of controls, they are managed and maintained by humans, so there will be an error rate. These systems reside on networks with Internet access to most websites, email with attachments and other open attack paths. Security controls can reduce the likelihood of an incident and shorten the detection time, but they are not going to prevent all incidents.
We need to avoid the narrative fallacy of:
- this is how the adversary got into the system
- bad asset owner for allowing this
- we must mandate a security control to prevent this
- now we can sleep soundly
To be fair, many of the people pushing for regulations are also pushing to punish those who attack US systems by launching commensurate cyber attacks. However, even if you combine the two, we are only reducing the likelihood of an attack that causes unacceptably large consequences. We are not going to stop cyber incidents from occurring, particularly on the enterprise networks of critical infrastructure companies.
Solving the problem means ensuring that critical services, and the ability to provide critical products, can be made available within an acceptable period of time in the event of a cyber incident at a critical infrastructure entity. From an OT perspective, a starting point would be restoration of the service / product within an acceptable time period if the enterprise network were compromised, since this is the case we are seeing and will continue to see with the highest frequency. A second step could be restoration within an acceptable time period if the OT system were compromised.
Restoration doesn’t mean the cyber attack has been eliminated and all cyber assets have been restored. It means that the key product / service is delivered in the minimum required amount to the critical customers. This could be an alternate supply source, limited manual operations, or a number of other options that tend to be sector specific.
We have a great case study with Colonial Pipeline. What would have happened if pipeline operations using the SCADA system couldn't have been restored in a week? What if the attacker had gotten into the SCADA system and bricked all of the PLCs / Level 1 devices by uploading bad firmware, and SCADA restoration would have taken three months? Again, the answers shouldn't be limited to getting the SCADA system up and running. Are alternate means of transportation available, even at a more costly and degraded but minimally acceptable level? Are alternate sources available? The answers are going to be found in redundancy and resiliency, not cybersecurity controls.
Hopefully we can hold more than one thought in our head. Yes, the use of good practice security controls should be improved, and it may take regulatory action to make this happen. However, we should not create the narrative fallacy that the right security controls will prevent the cyber attacks that today can result in high consequence events. The problem is solved when a cyber incident related to the delivery of critical products and services does not cause an unacceptable impact on society.
Note: the Clarke and Taleb terms were brought to mind by Niall Ferguson's book Doom: The Politics of Catastrophe, which I'm halfway through. Readers might also be interested in Richard Clarke's keynote at S4x18, where he addresses the "it's never happened before" challenge and Cassandras.