The community is very hungry for threat data. So little is available that we crave and devour any bit. Last year saw the resurrection of the BCIT incident database, or some facsimile of it, as the Repository of Industrial Security Incidents [RISI]. This is one of the best sources of threat data the community has, and hats off to Mark Fabro and Eric Byres for resurrecting it.

That said, we have to be very careful of the statistical significance of any trend calculations based on RISI data. A scary article on Dark Reading is a great example. Here is an excerpt:

Cybersecurity incidents in petroleum and petrochemical control systems have declined significantly over the past five years–down more than 80 percent– but water and wastewater have increased 300 percent, and power/utilities by 30 percent, according to the 2009 Annual Report on Cyber Security Incidents and Trends Affecting Industrial Control Systems.

There are many problems with this analysis. One basic problem is that the database was defunct for three years, from 2006 to 2008, so any trend calculated across that period is flawed. A more fundamental problem is that the data is ad hoc: it comes from whoever decides to submit it. If we had a set of thirty companies in a sector committed to submitting data year over year, we could begin to draw some statistics.

If we look back in history, Byres and the BCIT team were very active with the oil/gas community in the 2000 to 2004 timeframe with the early work on Achilles. So he had more access to, and interest in, oil/gas incidents then than he had while the database was defunct or has now with his broader focus with Tofino. That alone could account for much of the apparent decline in petroleum incidents.

There are many opportunities to fall into this bad-analysis trap with limited, ad hoc data. The 300% increase in water incidents could simply reflect one or two people from the water industry who decided to become active in RISI. Or let's say it was an effective presentation by RISI at a water sector event that got a bunch of water utilities excited about submitting incidents in order to participate and gain access to the data.
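
A small worked example shows how reporting participation alone can produce such a number. This is only a sketch: the utility counts and the per-utility incident rate are hypothetical values chosen for illustration, not RISI figures, and `reported_incidents` is a made-up helper.

```python
def reported_incidents(reporting_utilities: int, incidents_per_utility: int) -> int:
    """Incidents that reach the database: only reporting utilities contribute."""
    return reporting_utilities * incidents_per_utility


# Hypothetical assumption: every utility suffers the same number of
# incidents each year, so the real threat level never changes.
INCIDENTS_PER_UTILITY = 1

# Hypothetical assumption: 2 utilities report at first, 8 report later.
before = reported_incidents(2, INCIDENTS_PER_UTILITY)
after = reported_incidents(8, INCIDENTS_PER_UTILITY)

percent_change = 100 * (after - before) / before
print(f"Reported incidents: {before} -> {after} ({percent_change:.0f}% increase)")
# Reported incidents: 2 -> 8 (300% increase)
```

The reported count rises 300% while the underlying incident rate stays exactly the same. Only the number of contributors changed.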

Add to this that the total number of incidents per year is usually too small to be statistically significant in aggregate, let alone when it is subdivided by sector. There were only 175 confirmed incidents in the entire database at the end of last year. To draw useful statistics we would need either a much broader base of incident contributors, with a correspondingly larger number of incidents, or a consistent sample for a sector or sectors.
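
To put a rough number on the small-sample problem, here is a minimal Python sketch of an exact test for a change between two periods. The counts are hypothetical, not taken from RISI. The sketch also assumes equal-length periods and incidents that arrive as independent Poisson events, in which case an unchanged rate means the later count, given the total n, follows a Binomial(n, 0.5) distribution. The function name `two_period_p_value` is invented for this illustration.

```python
from math import comb


def two_period_p_value(before: int, after: int) -> float:
    """Two-sided exact p-value for the null 'the incident rate did not change'.

    Assumes equal-length periods and Poisson counts, so that given the
    total n, the split between periods is Binomial(n, 0.5) under the null.
    """
    n = before + after
    if n == 0:
        return 1.0
    k = max(before, after)
    # Probability of a split at least as lopsided as the observed one,
    # in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double it for a two-sided test and cap at 1.0.
    return min(1.0, 2 * tail)


# Hypothetical counts, NOT RISI figures: 1 incident, then 4 (a "300% increase").
print(two_period_p_value(1, 4))  # 0.375

# The same 300% increase at a much larger hypothetical volume.
print(f"{two_period_p_value(25, 100):.1e}")
```

With a jump from 1 to 4 incidents, the p-value of 0.375 is nowhere near a conventional 0.05 threshold, so the "300% increase" is indistinguishable from chance. The same percentage change on counts of 25 and 100 gives a far smaller p-value, which is why a broader base of contributors matters. Even then, the test says nothing about whether the rise comes from more attacks or from more reporting.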

RISI is highly useful for showing possible attack vectors, real-world examples of impact, and other good awareness information. It is a great effort for the community that we fully support. But as a community we need to be very careful about the validity of statistical conclusions based on this data that could drive decisions.