Since the early days of NERC CIP I have been unable to identify what I would do for OT Critical Infrastructure Cyber Security Regulations if I were omnipotent and could specify and enforce whatever I thought would work. After spending a week in Singapore this July watching government and industry grapple with this problem, and having listened to other experts, I have enough ego to finally propose a solution.
1. Identify the country’s critical infrastructure companies / organizations where the OT is necessary for the critical product or service.
The key here is to avoid the temptation to say everything is critical. This is a more difficult task, with political ramifications, in a large country. For example, the US has over 140,000 water facilities. It would be difficult to find enough talent to handle this number. Which 100 or 1,000 are critical?
2. Determine minimal effective operations required for each company identified in Step 1.
In a perfect world, everything would run at 100% of capacity at all times. We know this doesn’t happen, so there is usually excess capacity. In addition, the community can live in a degraded state without having an unacceptable quality of life, impact to the economy, or environmental damage.
The government and the critical infrastructure entity shall determine the minimal effective operations criteria. How much drinkable water is required? How much product must flow through the pipeline to which locations? How much product must be manufactured in what timeframe? How much power must be produced?
Two things are key: the ability in Step 1 to limit the number of regulated critical infrastructure entities, and the political will to approve degraded operations. Remember, though, I’m omnipotent.
3. Determine required recovery time objective (RTO) for minimal effective operations in the event of a cyber incident.
Again, in a perfect world the critical infrastructure never goes down, but in the real world it does go down for a variety of non-cyber reasons. For water, pipelines, refineries, manufacturing, and many other sectors, the RTO may be days. The government and the entity need to discuss what the RTO needs to be, with the government regulator making the final decision.
It will take maturity and political will to not take the easy way out and set the RTO so short that no one is ever inconvenienced. For example, Colonial Pipeline was required to get operations running in seven days. They did it in six days. There were lines and panic, and the outage resulted in a loss of efficiency and commerce. Still, it was not catastrophic. Seven days may be the right RTO.
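To make the RTO comparison concrete, here is a minimal sketch of checking a restoration time against a regulator-set RTO. The class and function names are hypothetical, invented for illustration, and the only figures used are the seven-day requirement and six-day restoration mentioned above.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RtoRequirement:
    """Hypothetical record of a regulator-set RTO for one entity."""
    entity: str
    rto: timedelta  # time allowed to restore minimal effective operations


def meets_rto(requirement: RtoRequirement, time_to_restore: timedelta) -> bool:
    """True if minimal effective operations were restored within the RTO."""
    return time_to_restore <= requirement.rto


def margin(requirement: RtoRequirement, time_to_restore: timedelta) -> timedelta:
    """Time left before the RTO would have been missed (negative if missed)."""
    return requirement.rto - time_to_restore


# Figures taken from the example in the text: seven days required, six achieved.
colonial = RtoRequirement(entity="Colonial Pipeline", rto=timedelta(days=7))
actual = timedelta(days=6)

print(meets_rto(colonial, actual))    # True
print(margin(colonial, actual).days)  # 1
```

The same check applies to an exercise or tabletop estimate, which is arguably the more useful input for an annual sign-off than waiting for a real incident.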
4. Require the CEO to sign off annually that a high confidence plan to meet the RTO is in place that addresses a cyber incident.
Assume an adversary has administrative access and complete control of the ICS and OT environment, and the IT environment. This includes the ability to compromise all cyber assets and to cause any physical damage that this access and control allows. The entity needs to have a plan to meet the RTO if this happens, and the CEO needs to sign off that the plan will meet the RTO. (Note: this is why I see orchestration becoming more critical in OT.)
The plan could be full manual operations, partial manual operations, partial restoration of cyber assets, spares strategy, business partner arrangements, and a variety of other approaches.
5. Require the CEO to sign off annually on current status of recommended OT security controls and risk assessment on any security controls not met.
Steps 1–4 are unrelated to protective or preventive security controls. In these steps we assume the adversary has defeated the security controls and affected the ability to provide minimal effective operations. This does not mean that security controls to reduce the likelihood of a cyber incident are unimportant.
The regulator provides a list of recommended security controls. Annually the entity has to report on whether each security control is met, and this report must be signed by the CEO. A risk-based rationale must be given for any controls that are not met.
This could be taken a step further where the high priority security controls are identified, e.g. multi-factor authentication for remote access and patching the externally accessible attack surface. The regulator could be required to sign off on any rationale for these high priority controls not being met.
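The reporting rules in Step 5 are simple enough to sketch as a completeness check. This is only an illustration under my own assumptions: the field names, the `report_problems` function, and the sample entity are hypothetical and not drawn from any existing regulation or tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ControlStatus:
    """One line of the annual report (hypothetical structure)."""
    name: str
    met: bool
    high_priority: bool = False
    rationale: Optional[str] = None   # risk-based rationale if not met
    regulator_approved: bool = False  # regulator sign-off on the rationale


@dataclass
class AnnualReport:
    entity: str
    ceo_signed: bool
    controls: List[ControlStatus]


def report_problems(report: AnnualReport) -> List[str]:
    """List the reasons a report would be rejected as incomplete."""
    problems = []
    if not report.ceo_signed:
        problems.append("report is not signed by the CEO")
    for control in report.controls:
        if control.met:
            continue
        if not (control.rationale and control.rationale.strip()):
            problems.append(f"{control.name}: unmet control has no risk-based rationale")
        elif control.high_priority and not control.regulator_approved:
            problems.append(f"{control.name}: high priority rationale lacks regulator sign-off")
    return problems


report = AnnualReport(
    entity="Example Water Utility",  # made-up entity
    ceo_signed=True,
    controls=[
        ControlStatus("Multi-factor authentication for remote access", met=False,
                      high_priority=True, rationale="Rollout in progress; jump host in place"),
        ControlStatus("Patching the externally accessible attack surface", met=True,
                      high_priority=True),
        ControlStatus("Network segmentation", met=False),
    ],
)

for problem in report_problems(report):
    print(problem)
```

Note that the check only tests whether the report is complete. Whether a rationale is actually acceptable remains a human judgment for the CEO and, for high priority controls, the regulator.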
That’s what I would do. What do you think? What would you do?