The System Average Interruption Duration Index (SAIDI) is a reliability metric used in the electric sector. It’s a measure of the average annual outage time per customer. It can be measured by company, state, or country. US data is available from the U.S. Energy Information Administration (EIA). (btw, the US compares poorly to most other developed countries on SAIDI.)
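As a minimal sketch, assuming the standard IEEE 1366 definition (total customer-minutes of interruption divided by the number of customers served), SAIDI for one reporting period can be computed as below. The `Interruption` type and the outage figures are hypothetical, chosen only to make the arithmetic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class Interruption:
    customers_affected: int   # number of customers who lost power
    duration_minutes: float   # how long those customers were out


def saidi(interruptions: list[Interruption], customers_served: int) -> float:
    """Average outage minutes per customer served over the reporting period.

    SAIDI = sum(customers_affected * duration_minutes) / customers_served
    """
    if customers_served <= 0:
        raise ValueError("customers_served must be positive")
    customer_minutes = sum(i.customers_affected * i.duration_minutes for i in interruptions)
    return customer_minutes / customers_served


# Hypothetical year for a utility serving 10,000 customers
events = [
    Interruption(customers_affected=2_000, duration_minutes=90),
    Interruption(customers_affected=500, duration_minutes=240),
]
print(saidi(events, customers_served=10_000))  # 30.0
```

Here 2,000 × 90 + 500 × 240 = 300,000 customer-minutes, spread over 10,000 customers, gives 30 minutes.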
The SAIDI number typically presented is SAIDI Without Major Event Days. A Major Event Day is identified by comparing a day against a threshold calculated from past data. Simply stated, it is a day when a large percentage of customers were without power for a long duration.
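The threshold defined in IEEE 1366 is the "2.5 beta" method: take the natural log of each day's SAIDI over the previous five years, compute the mean (alpha) and standard deviation (beta) of those logs, and set T_MED = e^(alpha + 2.5 beta). Any day whose SAIDI exceeds T_MED is a Major Event Day. The sketch below is an illustration, not a reference implementation: the function names and the sample history are hypothetical, and using the sample (rather than population) standard deviation is an assumption.

```python
import math
import statistics


def med_threshold(daily_saidi_history: list[float]) -> float:
    """Sketch of the IEEE 1366 "2.5 beta" Major Event Day threshold T_MED.

    daily_saidi_history: daily SAIDI values (minutes), normally five years of them.
    Zero-SAIDI days are dropped because ln(0) is undefined.
    """
    logs = [math.log(x) for x in daily_saidi_history if x > 0]
    alpha = statistics.mean(logs)   # mean of ln(daily SAIDI)
    beta = statistics.stdev(logs)   # assumption: sample standard deviation
    return math.exp(alpha + 2.5 * beta)


def is_major_event_day(day_saidi: float, threshold: float) -> bool:
    # A day counts as a Major Event Day when its SAIDI exceeds T_MED.
    return day_saidi > threshold


# Hypothetical history; a real one would hold about 1,826 daily values.
history = [0.0, 0.4, 1.2, 0.8, 2.5, 0.3, 15.0, 0.6]
t_med = med_threshold(history)
print(f"T_MED = {t_med:.2f} minutes")
print(is_major_event_day(60.0, t_med))
```

Because the threshold comes from the utility's own history, a string of bad years raises T_MED, so future bad days are less likely to be excluded.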
It seems like cheating to remove Major Event Day outages from a reliability metric. One of the main reasons it’s done is that no one can control the weather (see the related Port of Nagoya article last week), and most Major Event Days are the result of a weather event. It can be argued that a utility’s performance in a bad weather year cannot be expected to match its performance in a normal or good weather year.
Source: U.S. Energy Information Administration, Form EIA-861
In the chart above, there is a 180-minute (3-hour) difference in average outage time per customer between 2019 and 2021. With Major Event Days removed, the difference is less than 4 minutes.
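Assuming the number of customers served stays constant through the year, daily SAIDI values add up to the annual figure, so reporting "without Major Event Days" just means leaving out the days above T_MED. The sketch below uses two hypothetical years (not the EIA data) that differ only by one storm day, to show how a large gap in total SAIDI can all but vanish once Major Event Days are removed. The function name and the 30-minute threshold are assumptions for illustration.

```python
def annual_saidi_split(daily_saidi: list[float], t_med: float) -> tuple[float, float]:
    """Return (SAIDI over all days, SAIDI excluding Major Event Days).

    Assumes a constant customer count, so daily SAIDI values are additive.
    """
    total = sum(daily_saidi)
    without_meds = sum(d for d in daily_saidi if d <= t_med)
    return total, without_meds


year_a = [0.5] * 365              # hypothetical calm year: small outages every day
year_b = [0.5] * 364 + [180.5]    # same, except one storm day
T_MED = 30.0                      # hypothetical threshold, in minutes

a_all, a_excl = annual_saidi_split(year_a, T_MED)
b_all, b_excl = annual_saidi_split(year_b, T_MED)
print(b_all - a_all, b_excl - a_excl)  # 180.0 -0.5
```

In this toy case the storm adds 180 minutes to the all-days figure, while the MED-excluded figures differ by half a minute, because the storm day simply drops out and replaces an ordinary 0.5-minute day.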
The self-referential definition of a Major Event Day still seems like cheating. Because the threshold is based on the utility’s own past data, your SAIDI will never show a major increase unless you have a much larger number of smaller outages. Maybe large-scale outages are self-evident and are analyzed differently. Still, I think there is something the OT security community can learn from this.
Perhaps we should separate metrics, when we have them, based on the realistic ability of a defender to stop the attack, the unique nature of the attack, or something else. Despite the common “sophisticated attacker” refrain, most attacks that have had an impact on OT operations are simple, well-known attacks.
The recent Waterfall / ICSSTRIVE report said that 42 of the 57 publicly disclosed incidents in 2022 that had an impact on OT physical operations were criminal ransomware, mostly infecting only IT systems. These should have been foreseen, and at a minimum their consequences should have been reduced by an incident response playbook for a common scenario. These attacks and their consequences belong in any OT cyber risk metric.
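One way to carry the SAIDI idea over to OT security is to report an incident metric twice, once with every incident and once excluding incidents judged exceptional, just as SAIDI is reported with and without Major Event Days. The sketch below is hypothetical: the `OTIncident` type and its `exceptional` flag stand in for whatever judgment call an organization makes, and the incident names are placeholders. The counts come from the Waterfall / ICSSTRIVE figures above; as a worst case for the argument, every non-ransomware incident is treated as exceptional.

```python
from dataclasses import dataclass


@dataclass
class OTIncident:
    name: str
    # Hypothetical analyst judgment, analogous to the Major Event Day test:
    # True if a realistic defender could not be expected to stop it.
    exceptional: bool


def incident_metric(incidents: list[OTIncident]) -> tuple[int, int]:
    """Return (all incidents, incidents excluding exceptional ones),
    mirroring SAIDI reported with and without Major Event Days."""
    total = len(incidents)
    routine = sum(1 for i in incidents if not i.exceptional)
    return total, routine


# 57 incidents with OT physical impact in 2022, 42 of them criminal ransomware.
# Worst case: treat all 15 non-ransomware incidents as exceptional.
incidents = (
    [OTIncident(f"ransomware-{n}", exceptional=False) for n in range(42)]
    + [OTIncident(f"other-{n}", exceptional=True) for n in range(15)]
)
total, routine = incident_metric(incidents)
print(total, routine, f"{routine / total:.1%}")  # 57 42 73.7%
```

Even under that generous exclusion, roughly three quarters of the incidents remain in the metric.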
Stuxnet is the most prominent example on the other side. Can an asset owner be expected to do well against a highly resourced, long-running attack from the US, Israel, and likely some other participants? I would say no.
The Triton incident in Saudi Arabia is a tougher call. Some of the basic security deficiencies that let the attacker in argue against excluding it from metrics. The advanced nature of the attack on the SIS PLC argues for excluding it, especially if you believe the attacker would have found a more creative way in had the easy way been unavailable.
Companies affected by the SolarWinds incident, which did not affect OT, would in my judgment be warranted in excluding it from metrics.
What cyber incidents affecting OT should be excluded from metrics? I don’t have the answer, and I’m pondering the question.