A lot to cover here so I’ll break this into parts.
Part 1 – Why Protocol Stack Testing
Achilles is a black box testing platform. For those new to testing, the term black box means the tester and tools have no internal knowledge of the device being tested. Achilles sends data to the device under test’s interface, evaluates the response, and verifies proper operation is maintained.
The initial Achilles Certification focuses on controllers: PLCs, RTUs, IEDs and other field devices. Controllers have a bad history of falling over when any unexpected traffic is sent their way. We have seen this numerous times in assessments. In fact, many clients tell us not to bother testing the controller at all, because they know it will fail and they have horror stories of broadcast or other abnormal traffic causing problems in the past.
Simply stated, the bar is set pretty low for controller security, and we see Achilles Certification as a near-term way to raise it significantly.
Achilles Certification tests the protocol stack, such as Ethernet, TCP, HTTP, or Modbus TCP. A secure and reliable protocol stack is only one part of a secure implementation. The controller must be deployed with an appropriate security perimeter, support necessary security functions, be configured correctly, implement a least privilege methodology enforced by authentication and authorization, practice physical security and change control, and much more.
The factors in the previous sentence are what most of the security guideline and standards documents are attempting to specify. Look at NERC CIP, SP99 Part 4, IEEE P1686 and many others for examples. Once one or more standards are developed with enough technical detail, it may be possible for a standards body to develop a certification program around these efforts. Developing these standards is not as easy as it seems, as evidenced by the difficulty the IEEE P1686 team is having agreeing on a minimal set of security functions for IEDs. Probably the most promising technical standard and certification effort is ISA’s SP99 Part 4 and the Automation Standards Compliance Institute, although both are in the early stages.
Even with proper configuration, change control, policy and so on, an attacker who can send a malformed packet that busts the protocol stack will be able to either crash or take complete control of the device. This is why, in the drawing below, I have a secure and reliable protocol stack as the foundation of the many factors in device security.
Contributing Factors To Device Security
Developing a secure and reliable protocol stack is difficult, and performing quality assurance (QA) testing on the stack is very difficult. This is why controller vendors have been sending their products for Achilles testing for years now. Automated tools are needed to perform this QA, and there is a group of products going after this market.
Now put yourself in the asset owner’s shoes. How is the asset owner going to evaluate the security and reliability of the protocol stack? They could purchase expensive test tools and take man-weeks to evaluate the protocol stack in each potential product, but this is unlikely. They could evaluate each vendor’s QA program in this area. This is more likely, but it still happens rarely, and it is easy for a vendor to finesse these discussions if the asset owner is unwilling to spend many days reviewing QA records and results.
So an independent, third party certification of the protocol stack seems like a great first step to Digital Bond. It addresses an area with a history of serious security problems in controllers and other products, and it allows asset owners to focus on issues more under their control such as security features, configuration settings, policy and architecture.
I will go into more detail on the test cases for the various Achilles Controller Certification Levels in Part 3 as well as how an asset owner and vendor would use and benefit from the protocol stack Certification effort.
Digital Bond is a Wurldtech Partner
Part 2 – Testing Methodology and Coverage
As a developer, you look at the requirements and design specification and this dictates what the product or device must do. In the case of protocols, these specifications are in the form of standards issued by standards bodies or industry groups recognized as authoritative for that protocol.
Once the product is complete, it goes through QA testing to verify the implementation is secure and reliable. The question is: of the total population of possible test cases, how many and which test cases should be performed?
The simplified diagram below shows the possible test coverage area. The small circle in the middle represents all possible test cases that verify the protocol works as specified. This is often called positive testing and most vendors perform positive testing.
One caveat on the state of current positive testing in the controller market: many controllers implement only the subset of the protocol that they need for proper operation, so a portion of valid protocol messages is neither supported nor tested. The simplest and most common example in our experience is a controller that does not support broadcast traffic, and that hangs and requires a reboot when it receives broadcast packets. Nothing says a controller needs to support broadcast traffic, but it should continue to operate properly when it receives it, which leads us to negative testing.
The larger rectangle around the valid data circle represents all possible negative test cases. Negative testing sends data to the device that violates the protocol, and each test determines whether the device properly processes (usually discards) the data and continues to operate properly. Mishandling of malformed packets and protocol fields is one of the most common causes of vulnerabilities in IT and control systems. For example, the recent ICCP vulnerabilities identified by Digital Bond and disclosed by US CERT were the result of malformed protocol messages.
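To make negative testing concrete, here is a minimal Python sketch that builds one well-formed Modbus/TCP request and derives a handful of malformed or out-of-range variants from it. The function names are made up for this illustration, and this is not how Achilles generates its tests; it only shows what data that violates the protocol can look like on the wire. Sending the frames and checking that the device keeps operating is deliberately left out.

```python
import struct


def build_read_holding_registers(transaction_id=1, unit_id=1, start=0, quantity=1):
    """Build a well-formed Modbus/TCP Read Holding Registers (0x03) request."""
    pdu = struct.pack(">BHH", 0x03, start, quantity)
    # The MBAP length field counts the unit id byte plus the PDU.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def negative_variants(frame):
    """Yield (description, bytes) pairs, each breaking the request in one way.

    Byte layout of the 12-byte frame: 0-1 transaction id, 2-3 protocol id,
    4-5 length, 6 unit id, 7 function code, 8-9 start address, 10-11 quantity.
    """
    yield "protocol id not zero", frame[:2] + b"\xff\xff" + frame[4:]
    yield "length field larger than payload", frame[:4] + b"\xff\xff" + frame[6:]
    yield "length field zero", frame[:4] + b"\x00\x00" + frame[6:]
    yield "quantity zero", frame[:10] + b"\x00\x00"
    yield "quantity above the 125 register limit", frame[:10] + struct.pack(">H", 126)
    yield "truncated PDU", frame[:9]


if __name__ == "__main__":
    good = build_read_holding_registers()
    print("valid:", good.hex())
    for description, bad in negative_variants(good):
        print(f"{description}: {bad.hex()}")
```

A real test would send each variant to the device and then confirm that it still answers the well-formed request. Six hand-picked variants are exactly the kind of point coverage discussed in the rest of this part.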
The challenge is that the population of potential negative tests is very large. There are a few common ways to approach this. Researchers like Digital Bond will typically analyze the protocol and device and estimate where design and implementation mistakes were likely to be made, a classic low-hanging-fruit approach to black box testing. This may be one of the best approaches to identifying ‘a vulnerability’, and because it mimics the hacker approach it perhaps identifies the vulnerabilities most likely to be found. The problem is that we get a very small amount of coverage.
As an example, let’s look at the approach Lluis Mora from Neutralbit took to OPC testing, as presented at S4. Lluis identified and coded up 24 different test cases. These test cases found implementation errors leading to security vulnerabilities in one-third of the OPC servers tested, so it was certainly effective from an awareness standpoint. Since he responsibly disclosed these to US CERT, we should see patches this year and a higher level of security.
But let’s look at the test coverage of Lluis’s approach in the diagram below. The dots represent individual test cases and the lines indicate paths where the researcher is trying a large number of illegal values for one or more fields. There is not much coverage, and therefore not much assurance, in this type of black box testing. In fact, these simplified diagrams grossly overstate the coverage provided by this ad hoc testing, because the rectangle would be much larger if drawn to scale.
The other problem with expert or ad hoc black box testing, besides limited coverage, is that it requires a great deal of work to develop the test cases, and the tests must be automated to be effective in comparing products.
Wurldtech’s Achilles platform takes a different and unique mathematical approach using grammars. Grammar is a computer science term for a formalized and complete description of a protocol. The formal and complete description provided by the Achilles protocol grammars allows for provable claims of test and assurance coverage.
In fact, the Achilles grammar test case components use a special type of grammar called an attribute grammar. Achilles’ attribute grammars use attribute values to describe the protocol, and the protocol tests are created from the attribute grammar. The test cases produced from the attribute grammar can be proven to cover a defined portion of the protocol attack space.
I’m not going to try to explain the mathematics behind attribute grammars or the Chomsky hierarchy, but one of the items I am working with Wurldtech on is to have this explained in a series of white papers at different technical levels.
The important point is attribute grammars create large numbers of test cases, over 30 million in the entry level certification, with quantifiable and much larger coverage than other techniques as shown in the diagram below.
The test cases now cover an area, everything within that polygon, rather than points or lines, and the attribute grammar is a concise and mathematical way of describing this coverage area. This large and quantifiable coverage is ideally suited for a certification effort as opposed to a set of ad hoc ‘expert determined’ tests.
In-depth testing of new protocols can be added to the tool in a matter of weeks rather than months because all that is required is a new grammar file and the Achilles engine generates the test cases.
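The idea of a generic engine driven by a protocol description can be sketched in a few lines of Python. To be clear, this is a toy under my own assumptions: a flat table of fields is far weaker than an attribute grammar, the `Field` class and the two-field ‘protocol’ are invented for illustration, and none of this reflects the Achilles grammar format. It only shows that once the protocol is described as data, the test cases fall out mechanically and can be counted.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Field:
    name: str
    width: int    # size of the field in bytes
    valid: tuple  # values the (hypothetical) specification allows

    def candidates(self):
        """Return valid values plus boundary values of the raw field."""
        top = (1 << (8 * self.width)) - 1
        boundary = {0, 1, top - 1, top}
        # Neighbours of each valid value tend to expose off-by-one handling.
        for v in self.valid:
            boundary.update(x for x in (v - 1, v + 1) if 0 <= x <= top)
        return sorted(set(self.valid) | boundary)


def generate(fields):
    """Yield (values, encoded bytes, is_valid) for every candidate combination."""
    for combo in product(*(f.candidates() for f in fields)):
        encoded = b"".join(v.to_bytes(f.width, "big") for f, v in zip(fields, combo))
        is_valid = all(v in f.valid for f, v in zip(fields, combo))
        yield combo, encoded, is_valid


# Hypothetical two-field protocol, used only for illustration.
TOY_PROTOCOL = [
    Field("version", 1, (4,)),
    Field("length", 2, (20, 40, 60)),
]

if __name__ == "__main__":
    cases = list(generate(TOY_PROTOCOL))
    positive = sum(1 for _, _, ok in cases if ok)
    print(f"{len(cases)} cases: {positive} positive, {len(cases) - positive} negative")
```

Supporting another protocol in this toy means writing another list of `Field` entries; `generate` does not change. That is the property the paragraph above attributes to grammar files, here in a much cruder form.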
Part 3 – Achilles Certification Levels
Level 1 Controller Certification is the base level certification and covers the common protocols in layers 2 to 4 in the OSI stack. These include Ethernet, ARP, IP, ICMP, TCP and UDP. There are over 30 million tests in level 1, and each test can consist of multiple packets. A failure of a single test causes the controller to fail the certification test. So a controller that passes Level 1 testing has proven a significant degree of security and reliability that is not found in the typical controller today.
In addition to the layer 2 to 4 Achilles testing, Level 1 Certification also includes a Nessus and nmap scan. Nessus and nmap are two of the scanning tools most frequently used by both IT Departments and hackers. The purpose of these scans is to determine whether the controller under test can maintain proper operation while being scanned by these tools. There have been many examples where a controller has crashed during a well-meaning scan by IT Department staff.
You may have noticed that no control system specific protocols were included in Level 1. They were not included because controllers support a variety of control protocols. It would not be fair to compare a DNP3 implementation to an Ethernet/IP implementation. However, testing these control protocols is essential because they are much less likely to have undergone the same level of testing as an Ethernet or IP stack.
Achilles currently offers three Controller + certifications: standard Modbus/TCP, standard DNP3/IP, and the proprietary Vnet/IP. Test cases for additional control system protocols are under development and will be released in stages throughout 2007 and 2008.
A controller that has passed the Modbus/TCP test case family will be certified as Controller Level 1 + Modbus/TCP. An asset owner considering a new controller should look for a model that passed the core certification + the control system protocols they will be using.
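Here is a small sketch of how an asset owner might check a published certification against a procurement requirement. The `Certification` class, the label format for more than one control protocol, and the rule that a higher level satisfies a lower requirement are all my assumptions for illustration; they are not part of the certification program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Certification:
    level: int
    control_protocols: frozenset = frozenset()

    def label(self):
        """Render the certification in the 'Level N + protocol' style."""
        base = f"Controller Level {self.level}"
        if not self.control_protocols:
            return base
        return base + " + " + ", ".join(sorted(self.control_protocols))


def meets_requirement(cert, required_level, protocols_in_use):
    """True if the core level and every control protocol in use are covered."""
    # Assumption: a higher level satisfies a requirement for a lower one.
    level_ok = cert.level >= required_level
    protocols_ok = frozenset(protocols_in_use) <= cert.control_protocols
    return level_ok and protocols_ok


if __name__ == "__main__":
    candidate = Certification(1, frozenset({"Modbus/TCP"}))
    print(candidate.label())
    print(meets_requirement(candidate, 1, {"Modbus/TCP"}))
    print(meets_requirement(candidate, 1, {"DNP3/IP"}))
```

The check only looks at the protocols the owner will actually use, so extra certified protocols neither help nor hurt a candidate.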
The Level 1 Certification test cases and procedure were set in January 2007 and Achilles Controller Certification testing began on February 1. There are a number of products that have already achieved Level 1 Certification, but Certified Controllers will not be named until May 2007. This delay is to prevent any one company from having the only Achilles Certified Controller, and to give the early adopter vendors an opportunity to be in the first set of Certified Controllers. That said, I’m thrilled by the May date and see this as a big step forward for the community.
Work has begun on a Level 2 Controller Certification with a goal of specifying the test cases in 2007 and beginning certification in early 2008. We are looking to include more complex storm (denial of service) testing at layers 2 to 4 and to cover the protocols commonly used to manage controllers such as ftp, telnet, http(s), snmp, and ssh. We would like feedback on what protocols should be covered in Level 2, so please either comment on the blog or send me an email.
In talking with some of the asset owners who have been long time supporters of Achilles and have required Achilles testing as part of their procurement process, it was interesting to hear that they would require Level 1 certification for most installations and perhaps Level 2 certification for their more critical installations.
Publishing and Publicizing Achilles Certifications
A list of Controllers passing the various Achilles Certification Levels and + control protocols will be published on the Wurldtech site. The vendor, model, and firmware version tested will be included on the site. An MD5 hash of any modified configuration file required to achieve certification will also be included on the site. An example of the level of detail of the certification information that will be made public is shown in the screen shot below.
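An asset owner could use the published hash to confirm that the configuration file on hand is the one that was certified. Below is a minimal sketch using Python’s standard `hashlib`; the function names and the file name in the usage example are hypothetical, and how the hash will actually be presented on the site is not something I am specifying here.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the MD5 digest of a file as a lowercase hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # Read in chunks so large configuration files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare a local file against a published MD5 value, ignoring case and whitespace."""
    return md5_of_file(path) == published_hex.strip().lower()
```

Usage would look like `matches_published_hash("controller.cfg", value_copied_from_the_site)`, where both names are placeholders.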
Results will be published only for Controllers that have passed the Achilles Certification test. Failed test results will remain strictly between Wurldtech and the vendor or entity that submitted the Controller for testing.
An awareness and recognition program is being developed so asset owners, vendors and others in the community will understand what Achilles Certification means and what controllers have achieved Achilles Certification.
Early Feedback and Questions For You
We had a great opportunity to get some feedback on the Achilles Certification Program at PCSF. Here are a couple of points:
How to handle the situation where a protocol is not present or disabled?
For example, some controllers may not support ARP, which is in the Level 1 Certification. Our initial plan was to put an N/A with an explanation that N/A means the protocol was not present. The ARP tests would still be run, but there would be no protocol to test.
A more interesting case is in the management protocols. As an example, let’s say a controller supported ftp, http and telnet. Suppose, however, that the vendor considered its http implementation weak and disabled it, or that http is turned off by default. This is a case where N/A could be misleading. So we are looking to differentiate between a protocol that is not present in the controller and a protocol that is present in the controller but disabled in the test configuration.
Many of the asset owners requested that all protocols in the box be tested for certification, and this would be my recommendation to a vendor. In the end, it is the vendor’s decision on what is submitted for certification testing.
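One way to picture the distinction is as three separate result states rather than a single N/A. The sketch below is only an illustration: the enum, its member names and the wording of each status are hypothetical and are not the presentation Wurldtech has chosen.

```python
from enum import Enum


class ProtocolStatus(Enum):
    PASSED = "Passed"
    NOT_PRESENT = "N/A (protocol not present in the controller)"
    DISABLED = "Not tested (present but disabled in the test configuration)"


def report_lines(results):
    """Format one line per protocol, keeping the two N/A-like cases distinct."""
    return [f"{name}: {status.value}" for name, status in results.items()]


if __name__ == "__main__":
    example = {
        "ARP": ProtocolStatus.NOT_PRESENT,
        "HTTP": ProtocolStatus.DISABLED,
        "TCP": ProtocolStatus.PASSED,
    }
    print("\n".join(report_lines(example)))
```

With separate states, a reader of the published results can tell a controller that has no http server from one that ships with an untested http server switched off.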
Possible Confusion with the + Control Protocol Certifications
A number of the attendees did not like the term Level 1 + Control Protocol such as Level 1 + Modbus TCP. They felt it might be misleading. A controller with Level 1 + Modbus/TCP and Vnet/IP (a proprietary protocol) would not necessarily be better than a controller certified as Level 1 + Modbus/TCP that did not support Vnet/IP.
From a pure linguistic viewpoint the + is accurate because it is Level 1 and additional tests. And we do want asset owners to look at the Achilles Certification and say we require Level 1 + the control protocols we use. That said, enough people have had the same comment so we are looking for alternative terms or presentation structure for the +.
Neither of the two issues above affects the testing. They affect the public presentation of the certification results, which is extremely important.
There are a number of areas where we would like your feedback:
- Any suggestions on the two issues above.
- What protocols or other tests should be included in Level 2?
- Where should we focus the Achilles Certification awareness plan? Which trade publications, industry or vendor events, etc.?
- What would it take for you, as an asset owner, to include Achilles Certification in your next RFI or RFP?
You can comment on the blog for all to read or send comments directly to me at firstname.lastname@example.org.
The first Achilles Certified Controllers will be announced in May, and there will be controllers from multiple vendors in this first announcement.