If you will forgive yet another article inspired by the Colonial Pipeline incident … it does represent the oldest of the three must-have OT Incident Response Playbooks.
Playbook 1 – Enterprise Network Compromised
Pending additional details (this is written Monday afternoon) this may be the playbook needed for the Colonial Pipeline incident. Scenario: The enterprise network has been compromised, and there is currently no evidence of a related compromise in OT.
For almost two decades many OT systems have had a formal or informal process to remove network connections between the enterprise and OT. This can be pulling network cables, powering down firewalls, or other measures. Problems can arise when this is an informal, almost casual, response to “what would you do if the enterprise is compromised?” There are important questions to answer before being faced with this disconnection decision, such as:
- What are the factors to be considered in the decision to disconnect OT from the enterprise?
- Who has the authority to approve the disconnection?
- Is every network connection identified that if removed would contain the incident to the enterprise / isolate OT?
- How long can your ICS operate in this disconnected mode? and importantly,
- Have you tried this disconnection?
Other containment actions could be disconnecting the backup control center, moving failover servers to cold standby, and other actions that would prevent a cyber attack from destroying the benefits of redundancy.
OT isolation is a first step, but it is not where the unique parts of this playbook end. Many asset owners who have had compromises on the enterprise network have found that while their ICS was unaffected, they could not continue operations. Why? Required supporting systems on the enterprise were unavailable, systems such as scheduling, recipes, mobile fleet, shipping, and billing. In the creation of this playbook you need to identify whether any systems on the enterprise are required for ICS operations and how you will adapt if they are unavailable for a prolonged period. Again, there is no right answer to this. It is a business decision, and one that should be thought out and tested before it is needed.
Another critical step is determining if the enterprise compromise has already spread to OT. If OT systems are crashing and ransomware notes are found, it is a simple yes, and incident response likely shifts to another playbook, since this analysis phase is similar for many OT cyber incidents. Determining what, if anything, was compromised can be very difficult. Even if you don’t find any evidence, it is hard to state in a certain, declarative sentence that there is no possibility that OT was compromised. I’ll defer to the incident response experts, such as FireEye, which was brought in at Colonial Pipeline, to explain how to do this.
Playbook 2 – Ransomware in OT
The reality of ransomware in OT, not just on the enterprise, makes this a great playbook to use for the generic scenario of recovering from an OT compromise that takes out all of your computers. It also makes a great tabletop exercise (TTX). You can point to the headlines if someone protests that it’s not possible.
Stepping through this scenario in playbook creation and TTX will highlight the key decision of when to start recovery and how to recover with confidence. The knee-jerk reaction in Operations when asked is often “immediately”, since availability is viewed as the primary goal. This raises the question of how, without analysis, you know what to recover and which clean backup can be used. Trying to recover what is obviously compromised as fast as possible is one common, if perhaps unwise, approach.
This playbook also often highlights that the recovery capability is untested and unlikely to meet the time commitments made to management. The asset owner community has stepped up recovery scenarios post-Ukraine, but most are still based on a minor cyber incident where a few computers need to be rebuilt. The ransomware playbook assumes a much larger breach.
The playbook should also address the possibility that the computers hit with obvious ransomware are not all of the cyber assets compromised, similar to the analysis phase discussed in the Enterprise Network Compromised playbook.
Playbook 3 – Operations Down … Is It A Cyber Incident?
The most publicized examples of when this playbook was needed are Stuxnet and Triton. The attacks caused outages, and the cyber incident cause was not identified in the early outages. This is a challenging playbook, and the one we see least.
It would be a bad allocation of resources to launch an OT cyber incident response every time there is an issue with the physical system being monitored and controlled or with an OT cyber asset. To address this when we have been involved in creating this playbook, we have added a key decision point early in the detection phase.
- If the consequence is high and the cause unknown, then an OT cyber incident is declared.
- If the consequence is medium, and it is a repeated incident, then an OT cyber incident is declared.
OT cyber incidents initiated with this playbook will often be closed in the Analysis phase when no evidence of a cyber incident is found.
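The two-rule decision point above can be sketched in a few lines of Python. This is only an illustration, not a prescribed implementation: the names `Consequence`, `OperationalEvent`, and `declare_ot_cyber_incident` are hypothetical, and treating every case not covered by the two rules as "do not declare" is an assumption, since the rules above only say when an incident is declared.

```python
from dataclasses import dataclass
from enum import Enum


class Consequence(Enum):
    """Consequence level of the operational event (hypothetical scale)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class OperationalEvent:
    """Facts known early in the detection phase (hypothetical structure)."""
    consequence: Consequence
    cause_known: bool   # has a non-cyber cause been identified?
    repeated: bool      # has this event happened before?


def declare_ot_cyber_incident(event: OperationalEvent) -> bool:
    """Return True if an OT cyber incident should be declared."""
    # Rule 1: high consequence and unknown cause.
    if event.consequence is Consequence.HIGH and not event.cause_known:
        return True
    # Rule 2: medium consequence and a repeated incident.
    if event.consequence is Consequence.MEDIUM and event.repeated:
        return True
    # Assumption: anything else is handled as a normal operations issue.
    return False


if __name__ == "__main__":
    outage = OperationalEvent(Consequence.HIGH, cause_known=False, repeated=False)
    print(declare_ot_cyber_incident(outage))
```

Writing the rule down this explicitly, in code or in a table, makes it easier to agree on definitions of "high", "medium", and "repeated" before the playbook is needed.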
Please let me know if there are other OT cyber incident response playbooks that you believe need to be developed and periodically tested.