Last week I wrote that creating an asset inventory typically isn’t among the early actions of an OT security program prioritized by efficient risk reduction. I received a number of questions about what is on the short list. I’m not going to provide a list because it can vary greatly, and you need to do the work to create your own prioritized list.
One area that doesn’t get early attention and should be considered is setting and meeting a Recovery Time Objective (RTO).
The RTO is a business decision and should be based on recovering the ability to deliver your products or services to your customers. This typically will require recovering some, but not all, cyber assets, systems, or networks. You likely can operate without the normal redundancy. There may be parts of the systems that can run manually. There may be parts of the systems that provide nice-to-have information that helps with efficiency but is not strictly necessary.
Operations with smaller or monitoring-only ICS often can continue with manual operations. I’ve encountered even medium-size Operations and ICS that retained the ability to run manually for weeks. If this is the case, your RTO could be weeks. Make sure you ask a lot of questions to verify this is not a theoretical ability to operate manually. The best way is to see it, whether it is caused by an event or demonstrated through an exercise.
The advice to set and meet an RTO can be approached in stages using an efficient risk reduction approach. Below is a three-phase process. Phase 1 is an early action for almost every asset owner. Phase 2 is not far behind in a typical prioritized list. Phase 3 is expensive and for a mature OT security program. I rarely see asset owners that can meet an RTO if multiple Level 1 devices are bricked.
- Set and take the necessary steps to meet an RTO for a compromise of everything with an IP address on the enterprise (IT) network. The Waterfall / ICSSTRIVE report data shows that ransomware on IT was the primary cause of outages in 2022 and 2023, and this appears to be the case for 2024. If you can’t say ransomware on IT will not cause an unacceptable outage in delivering your product or service to customers, why are you spending time or money on anything else?
- Next, set and take actions to meet an RTO for a compromise of everything on OT with an IP address. This is the case where the OT security perimeter has been breached. Either everything is obviously compromised or you are in an uncertain state. The latter is likely since forensics in ICS is minimal. It is easy for an attacker with moderate skills to hide in a PLC. Assume that everything with an IP address needs to be rebuilt from the OS / firmware up, including the applications, logic/program, and some of the data.
- Finally, as your program becomes more advanced, set and take the necessary steps to meet an RTO when your PLCs, controllers, protocol adapters, and other ICS-specific hardware are bricked and need to be returned to the factory. Unfortunately, bricking these devices is not difficult for an attacker with access. Meeting this RTO often requires having sufficient spares for the essential systems and a contractual relationship with vendors to provide replacements within the set timeframe.
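If it helps to make the three phases concrete, here is a minimal Python sketch of checking recovery estimates against an RTO for each scenario. Everything in it is an assumption for illustration: the asset names, the hour estimates, the scenario labels, and the choice to add recovery times serially (a conservative guess that assumes one recovery crew). Only essential assets count, since the RTO is about delivering the product or service, not restoring everything.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical scenario labels for the three phases described above.
SCENARIOS = ["it_compromise", "ot_compromise", "hardware_bricked"]


@dataclass
class Asset:
    name: str
    essential: bool                   # needed to deliver product or service?
    recovery_hours: Dict[str, float]  # scenario -> estimated hours; missing = unaffected


def outage_hours(assets: List[Asset], scenario: str) -> float:
    """Conservative estimate: essential assets are recovered one after another."""
    return sum(
        a.recovery_hours.get(scenario, 0.0)
        for a in assets
        if a.essential
    )


def meets_rto(assets: List[Asset], scenario: str, rto_hours: float) -> bool:
    """True if the estimated outage for this scenario fits within the RTO."""
    return outage_hours(assets, scenario) <= rto_hours


# Illustrative inventory; all names and numbers are made up.
assets = [
    Asset("order_system", True, {"it_compromise": 24}),
    Asset("scada_server", True, {"ot_compromise": 10}),
    Asset("hmi", True, {"ot_compromise": 6}),
    Asset("plc_1", True, {"ot_compromise": 4, "hardware_bricked": 72}),
    Asset("historian", False, {"ot_compromise": 12}),  # nice to have, not essential
]

RTO_HOURS = 36  # set by the business, not by the security team

results = {s: meets_rto(assets, s, RTO_HOURS) for s in SCENARIOS}
for scenario, ok in results.items():
    print(f"{scenario}: {outage_hours(assets, scenario):.0f}h, meets RTO: {ok}")
```

In this made-up example the first two scenarios fit inside the RTO and the bricked hardware scenario does not, which is where spares and vendor replacement contracts come in. The hour estimates are only as good as the exercise or event that produced them.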
The beauty of setting and having confidence in your RTO is you can answer executive questions without hedging or hand waving. Yes, Ms. CEO, we are implementing security controls to stop our cyber adversaries, and we have a recovery capability so the maximum outage if we are hit with ransomware is 36 hours, as you specified.