Someone else's problem

This post is part of a coordinated series of blog posts examining the details of version 5 of the NERC Critical Infrastructure Protection (CIP) standards. These posts, written by various individuals having direct experience with these standards, will point out security gaps, ambiguities, and areas that could prove challenging to audit. The purpose of the posts is to highlight areas for future improvement, and to draw attention to issues for which entities may wish to apply greater diligence than is currently required by regulation.

There’s a wonderful line in every NERC CIP regulation located in the Exemptions section, stating that the following are exempt from the NERC CIP standards:

Cyber Assets associated with communication networks and data communication links between discrete Electronic Security Perimeters.

This simple statement allows entities to automatically remove from NERC consideration a set of assets that could otherwise be considered critical to the operation of the BES. The initial push for this exemption during the development of the original CIP standards came from the fact that entities often didn’t have any control over telecommunications networks; they only used them for communication. This is a valid concern, though poorly implemented. It was mainly about contracts that would require renegotiation, and about ensuring the standards weren’t regulating non-electric entities.

The basic problem I have with this statement is simple: With the sweep of a pen, it removes an entire swath of potentially critical assets and communications from consideration without requiring commensurate protections to make up for the resulting vulnerability. It’s been years since the original CIPs, and the exemption has simply been imported without examining the underlying risks and vulnerabilities it causes.

And making the BES vulnerable is exactly what this gap does. Attacks such as man-in-the-middle, IP spoofing, traffic rerouting, and several others are possible if parts of the ‘non-critical’ networks are controlled by malicious actors. The use of automation protocols between the discrete ESPs makes it even worse; the lack of authentication and authorization mechanisms in these protocols leaves them open to attacks that can mislead both sides of the communication. INL was demonstrating this in late 2006, and the capability to carry out these attacks on real systems has only increased in the 7 years since. If you doubt me on this, take a look at modbus-vcr, a plugin for Ettercap that conducts MITM on MODBUS connections. A lot of entities recognize this issue and take appropriate steps to counter the risk, but many don’t. Others even use this clause as a way of slicing and dicing their systems to avoid compliance on certain assets, or to justify not doing any encryption because it’s “not required”.
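To make the missing authentication concrete, here is a minimal Python sketch of a Modbus/TCP "Write Single Register" request (function code 0x06). The helper names are mine and purely illustrative, and this is not a description of how any particular attack tool works internally. The frame is just a 7-byte MBAP header followed by the request itself; there is no signature, MAC, or credential anywhere in it, so whoever sits on the path can rewrite the value and the receiver has nothing in the frame to check it against.

```python
import struct

WRITE_SINGLE_REGISTER = 0x06  # Modbus function code


def build_write_single_register(transaction_id, unit_id, address, value):
    """Build a Modbus/TCP frame: MBAP header (7 bytes) + PDU (5 bytes)."""
    pdu = struct.pack(">BHH", WRITE_SINGLE_REGISTER, address, value)
    # MBAP: transaction id, protocol id (always 0), length, unit id.
    # The length field counts the unit id byte plus the PDU.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def parse_write_single_register(frame):
    """Return (address, value) after the only checks the framing allows."""
    _, protocol_id, length, _ = struct.unpack(">HHHB", frame[:7])
    function_code, address, value = struct.unpack(">BHH", frame[7:12])
    if protocol_id != 0 or length != len(frame) - 6:
        raise ValueError("malformed MBAP header")
    if function_code != WRITE_SINGLE_REGISTER:
        raise ValueError("unexpected function code")
    return address, value


def tamper_value(frame, new_value):
    """What an on-path attacker can do: overwrite bytes 10-11 (the value)."""
    return frame[:10] + struct.pack(">H", new_value) + frame[12:]


if __name__ == "__main__":
    original = build_write_single_register(1, 1, address=100, value=50)
    forged = tamper_value(original, 9999)
    print(parse_write_single_register(original))
    print(parse_write_single_register(forged))
```

Both frames parse cleanly. Nothing in the protocol lets the receiving side tell the operator's setpoint from the attacker's.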

The CIP-005 protections for access points help mitigate risk, but only against attacks that don’t take advantage of this exemption. Ingress/egress policies by host and protocol aren’t effective if the access point can’t trust the exempted side of the communication. As an example, an attacker who compromises a switch sitting outside the firewall can generate traffic that the firewall trusts and that appears to originate from anywhere. This nullifies any security gained from having strict ingress/egress policies.
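A toy model of that failure, in Python. The addresses, the rule table, and the `permits` function are all hypothetical and stand in for a real access point's policy; the only thing the sketch is meant to show is that a policy keyed on source host and protocol sees nothing but header fields, which the exempted network is free to forge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int
    payload: bytes


# Hypothetical ingress policy: only the control center front end may
# reach the substation gateway, and only on the Modbus/TCP port (502).
ALLOWED_FLOWS = {("10.1.0.5", "10.2.0.9", 502)}


def permits(packet):
    """Decide using only what a host/protocol policy can see: the headers."""
    return (packet.src_ip, packet.dst_ip, packet.dst_port) in ALLOWED_FLOWS


if __name__ == "__main__":
    legitimate = Packet("10.1.0.5", "10.2.0.9", 502, b"operator command")
    # Injected from a compromised switch outside the firewall, with the
    # source address set to the trusted front end.
    forged = Packet("10.1.0.5", "10.2.0.9", 502, b"attacker command")
    print(permits(legitimate), permits(forged))
```

The policy is doing exactly what it was configured to do. It simply has no way to know that the second packet never came from the front end.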

That being said, simply removing the exemption causes its own problems. Fundamentally, these networks can include hundreds of assets spread across many physical and environmental areas, may be subject to third-party control, and so on. So, simply using the NERC approach of identifying the assets and protecting them wouldn’t give us efficient risk reduction, and it’s generally not done in other industries either. Banking doesn’t have dedicated networks for wire transfers anymore, and neither do credit card transactions, but those industries also don’t implicitly trust that the network they are using is safe.

So, what I’d propose to close this gap is that the NERC CIP-005 V5 (or V6, or whatever) standards require entities to implement technical mechanisms that eliminate the possibility that communications between discrete ESPs could be modified in transit. I’m basically requiring encryption, the most basic security control for communications across untrusted networks, and one that the NERC standards mention only in the context of information protection and Interactive Remote Access.
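As a rough illustration of what such a mechanism has to do, here is a minimal Python sketch using only the standard library. Each message carries a sequence number and an HMAC-SHA256 tag computed with a key shared by the two ESP endpoints, so a modified or replayed message is rejected. The function names, the message layout, and the pre-shared key are my assumptions for the example. It covers tamper detection only and does not hide the contents; in practice an entity would far more likely deploy a vetted protocol such as TLS or IPsec than hand-roll a wrapper like this.

```python
import hashlib
import hmac
import struct

SEQ_LEN = 8   # 64-bit sequence number
TAG_LEN = 32  # HMAC-SHA256 output size


def protect(key, seq, payload):
    """Return seq || payload || HMAC-SHA256(key, seq || payload)."""
    body = struct.pack(">Q", seq) + payload
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return body + tag


def verify(key, last_seq, message):
    """Return (seq, payload), or raise ValueError if tampered or replayed."""
    if len(message) < SEQ_LEN + TAG_LEN:
        raise ValueError("message too short")
    body, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    (seq,) = struct.unpack(">Q", body[:SEQ_LEN])
    if seq <= last_seq:
        raise ValueError("replayed or out-of-order message")
    return seq, body[SEQ_LEN:]


if __name__ == "__main__":
    key = b"\x01" * 32  # placeholder; a real key must be random and secret
    message = protect(key, 1, b"write register 100 = 50")
    print(verify(key, 0, message))
    forged = message[:SEQ_LEN] + b"write register 100 = 99" + message[-TAG_LEN:]
    try:
        verify(key, 0, forged)
    except ValueError as exc:
        print("rejected:", exc)
```

The extra cost per message is one hash computation and 40 bytes on the wire, which is the kind of number that can be measured against a specific link rather than argued about in the abstract.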

Warning, personal opinion incoming: In the early days of the NERC CIP standards, there was a major fight over requiring the use of encryption over networks. Entities were concerned that encryption could cause delays and latency issues that would inhibit the reliable operation of a SCADA system. That is a valid concern, one that should have been investigated and quantified, with appropriate measures taken so that entities could find a balance between security and operations while still meeting compliance obligations. But rather than do that, the need for encryption was simply written out of the regulations as untenable for entities.

Over the years, I’ve yet to hear a fully thought-out, data-driven, scientific argument that latency and delay due to encryption are a major concern for all electric power SCADA networks. Every network is different, with its own tolerance for latency and delay. Most of the arguments I have heard are based on personal experience or an overabundance of caution.

As a young engineer, I remember thinking how convoluted this sounded; fundamentally, if latency and delay could cause serious problems, that was a problem in and of itself, and not just a cyber security concern. I couldn’t imagine a system imposing such strict latency requirements on networks that were vulnerable to a host of natural physical issues that could cause latency and delay. It seemed like a bad practice simply because of Murphy’s Law. As a slightly older engineer, I find that sensitivity to even slight changes in latency and delay makes me concerned that there is too little intelligence at the end site and far too much control required from the head end, which is a risk by itself.

Am I saying that delay and latency aren’t a concern? No, I’m not saying that. What I am saying is that we need to remove the carte blanche exemption we’ve put in the standards and replace it with case-by-case evaluations where latency and delay are a concern. Encryption is good practice, backed up by science and data. The decision NOT to use encryption should be subject to intense scrutiny, not the other way around.