I’m finishing my 25th year focused on OT security (called SCADA security when I started, then ICS security, and now OT security). So many failures, successes, changed analysis, and lessons learned over that time. Here are 3 lessons that I wish hadn’t taken me so long to learn. Maybe this will speed your learning curve.
Consequence Reduction Is Key
I didn’t embrace this until ~year 15, and every year since it has risen in importance in my OT cyber risk reduction strategy and tactics.
Security professionals should listen to themselves when they are talking about security controls to reduce likelihood of incidents. Will this expenditure of time and money prevent all OT cyber incidents? The answer is almost always no. If the likelihood is already very small, it might reduce it to very, very small, but not to zero.
Consequence reduction efforts can reduce the maximum impact with certainty. Even if the adversary has complete control of OT, they can’t do X (dump dangerous levels of chemical into the water supply, cause something to explode, …).
Here’s where consequence reduction fits in a three-step approach:
- Basic security controls (effectively segmenting OT from IT and 3rd party networks, removable media / portable computing controls for walk-around perimeter risk, patching the OT attack surface exposed to IT, MFA for remote access) to reduce incident likelihood from high to very low.
- Consequence reduction in case the very low likelihood incident occurs. Eliminate the high and catastrophic consequences that an OT cyber incident could cause.
- Mature analysis of likelihood and consequence reduction actions prioritized by efficient risk reduction.
I mostly skipped step two for my first 15 years.
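The difference between steps one and two can be sketched with a toy risk model. This is only an illustration under stated assumptions: the `Scenario` class, the scenario name, the likelihood and cost figures, and the cap value are all made up, and real OT risk analysis is not a single multiplication. The point it shows is that a likelihood control leaves the worst case untouched, while a consequence cap bounds it no matter how likely the incident is.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scenario:
    """Hypothetical incident scenario; all numbers are illustrative only."""
    name: str
    annual_likelihood: float  # chance of the incident in a year
    consequence: float        # worst-case impact in arbitrary cost units

    @property
    def expected_loss(self) -> float:
        return self.annual_likelihood * self.consequence


def apply_likelihood_control(s: Scenario, factor: float) -> Scenario:
    """Step 1: a security control scales likelihood down; worst case is unchanged."""
    return replace(s, annual_likelihood=s.annual_likelihood * factor)


def apply_consequence_cap(s: Scenario, cap: float) -> Scenario:
    """Step 2: consequence reduction bounds the worst case regardless of likelihood."""
    return replace(s, consequence=min(s.consequence, cap))


# Made-up starting point: a high-likelihood, high-consequence scenario.
baseline = Scenario("chemical overfeed", annual_likelihood=0.2, consequence=1_000_000)
step1 = apply_likelihood_control(baseline, factor=0.05)  # basic security controls
step2 = apply_consequence_cap(step1, cap=10_000)         # hypothetical engineered limit

for s in (baseline, step1, step2):
    print(
        f"likelihood={s.annual_likelihood:.3f} "
        f"worst_case={s.consequence:,.0f} "
        f"expected_loss={s.expected_loss:,.0f}"
    )
```

After step one the worst case is still 1,000,000 units; only after step two does the maximum impact itself drop.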
Insecure By Design OT Protocols And Level 1 Devices Greatly Impact Security Control Effectiveness
This lesson only took 10 years to learn, and we went public with it with Project Basecamp at S4x12. I don’t remember if it was K. Reid Wightman or I who coined the term back then. The term is often misused to mean a lack of secure by design (however you define that) practices.
Insecure by design: The design team intentionally chose to allow anyone with access to the device the ability to manage and operate the device. The attacker does not need to exploit a bug. They use the documented features and functions to achieve their goal.
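To make that definition concrete, here is a minimal sketch that builds a Modbus/TCP "Write Single Coil" request. Modbus is my choice of example here, not a protocol singled out above, and the helper name `build_write_single_coil` is hypothetical. The code only constructs the bytes and sends nothing. What it shows is that every field in the request is addressing or payload: there is no credential, session, or signature field to fill in.

```python
import struct


def build_write_single_coil(
    transaction_id: int, unit_id: int, coil_address: int, on: bool
) -> bytes:
    """Build a Modbus/TCP Write Single Coil (function code 0x05) request frame."""
    function_code = 0x05
    value = 0xFF00 if on else 0x0000  # the two coil values defined for this function
    # PDU: function code (1 byte), coil address (2 bytes), value (2 bytes), big-endian.
    pdu = struct.pack(">BHH", function_code, coil_address, value)
    # MBAP header: transaction id, protocol id (0 for Modbus),
    # length of the bytes that follow (unit id + PDU), unit id.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


frame = build_write_single_coil(transaction_id=1, unit_id=1, coil_address=0, on=True)
print(frame.hex())  # 00010000000601050000ff00
```

Twelve bytes, all of them documented protocol fields. Anyone who can reach the device on the network can form this request.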
Most people in the OT security community understand that the protocols and level 1 devices (and yes JW, level 0 devices) are insecure by design. The lesson that many have yet to learn is that this significantly reduces the risk reduction achieved by many good security practices in OT zones.
Ask yourself, will this good practice security control markedly reduce the likelihood of an attacker achieving an end goal of impacting the availability or integrity of the ICS given the insecure by design issue?
Insist On Two Metrics Before Allocating Any Resources
Almost everyone believes in metrics, and yet very few OT security projects and programs have them. I’m embarrassed that it took me 20 years to learn this lesson: insist on having two different types of metrics before spending any resources.
Metric 1: How will we measure if we have implemented and are maintaining the security control or consequence reduction action properly?
This is the easy metric, and it is often ignored. Security products are deployed, hopefully properly, and then languish.
A great example of this metric is Jim Miller’s Rating Deployed OT Firewalls’ Effectiveness. Segmenting OT from IT is a basic likelihood reduction security control, at or near the top of everyone’s list. Do it. Check it off. Yet so many of these OT perimeter firewalls are not patched, have bad administrative settings, and have rulesets that make them little more than speed bumps. Jim not only shows this type of metric, but he also shows how they automated it for a large number of sites.
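A Metric 1 check like this can be very simple. The sketch below is not Jim Miller’s method; it is a hypothetical scoring of the three failure modes mentioned above (patching, administrative settings, rulesets). The `FirewallSnapshot` fields, the 180-day patch threshold, and the sample sites are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FirewallSnapshot:
    """Hypothetical per-site facts pulled from an OT perimeter firewall."""
    site: str
    last_patched: date
    admin_reachable_from_it: bool  # management interface exposed to the IT network
    default_credentials: bool
    any_any_rules: int             # count of permit any/any rules


def score(fw: FirewallSnapshot, today: date, max_patch_age_days: int = 180) -> dict:
    """Return the fraction of checks passed and the names of the failed checks."""
    checks = {
        "patched_recently": (today - fw.last_patched).days <= max_patch_age_days,
        "admin_not_exposed": not fw.admin_reachable_from_it,
        "no_default_credentials": not fw.default_credentials,
        "no_any_any_rules": fw.any_any_rules == 0,
    }
    failed = [name for name, ok in checks.items() if not ok]
    return {"site": fw.site, "score": sum(checks.values()) / len(checks), "failed": failed}


# Made-up sample data, with a fixed date so the result is repeatable.
today = date(2025, 1, 1)
fleet = [
    FirewallSnapshot("site-a", date(2024, 11, 1), False, False, 0),
    FirewallSnapshot("site-b", date(2022, 3, 1), True, False, 3),
]
results = [score(fw, today) for fw in fleet]
for r in results:
    print(r)
```

Run on a schedule across every site, even a crude score like this shows which firewalls were deployed and then left to languish.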
Metric 2: How much risk reduction is being achieved / has been achieved through this resource allocation?
This is where consequence reduction shines because you can easily come up with hard, concrete numbers. It’s a bigger challenge for security controls for likelihood reduction, and it’s likely you won’t be happy with your metric. Don’t let this stop you. Create and track this metric, and improve it over time.
- How many OT protocol connection requests were blocked at my perimeter? How many pivots were stopped by our internal microsegmentation?
- How many remote access attempts by attackers were stopped due to MFA?
- How many legitimate attacks or malware were detected by EDR? Stopped by EDR?
- How many rogue control or administrative actions on level 1 devices were detected by our monitoring solutions?
- By how long have we extended our ability to produce without IT as a result of a resilience project?
These metrics scare some in OT security because they might not like the resulting data. Do it anyway. Don’t wait 20 years.
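Tracking can start as a plain tally per quarter. This is a minimal sketch with made-up event records and hypothetical metric names; in practice the events would come from whatever your firewall, MFA, EDR, and monitoring tools can export.

```python
from collections import defaultdict


def tally_by_quarter(events):
    """Count events per metric per quarter from (quarter, metric) records."""
    table = defaultdict(lambda: defaultdict(int))
    for quarter, metric in events:
        table[metric][quarter] += 1
    return {metric: dict(quarters) for metric, quarters in table.items()}


def resilience_gain_hours(before_hours: float, after_hours: float) -> float:
    """Extra hours of production without IT gained from a resilience project."""
    return after_hours - before_hours


# Made-up sample records; the metric names are illustrative only.
events = [
    ("2025-Q1", "perimeter_ot_protocol_blocked"),
    ("2025-Q1", "perimeter_ot_protocol_blocked"),
    ("2025-Q1", "mfa_stopped_remote_access"),
    ("2025-Q2", "perimeter_ot_protocol_blocked"),
    ("2025-Q2", "edr_detected"),
]

counts = tally_by_quarter(events)
for metric, quarters in counts.items():
    print(metric, quarters)

# Hypothetical figures: 8 hours without IT before the project, 72 after.
print("resilience gain (hours):", resilience_gain_hours(8, 72))
```

A metric that reads zero quarter after quarter is still data. It tells you either that the control is not seeing anything or that it is not stopping much.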
One overarching lesson learned and hope. We need the growing number of talented and experienced OT security professionals to use their judgment, not the lengthening list of good OT security practices, to recommend, develop, and implement OT cyber risk management at each asset owner.
The best list of prioritized actions varies greatly by sector, size of asset owner, and even individual site. If you are recommending the same thing in the same order everywhere, take a look at your risk-based decision process. We need you, not a checklist.