Mining Malware - Generating Data For Searches - Dale Peterson: ICS Security Catalyst

The idea for mining malware for evidence of targeting automation came out of reading several papers on Stuxnet that discussed the methods used to intercept calls to the S7 PLC. To summarize, Stuxnet replaced the stock Siemens s7otbxdx.dll with a new version that watched the PC-PLC interactions and either allowed an interaction to go through unmodified or altered it. All in all, I thought this was a rather clever method for one major reason: an attacker didn’t need to write some complex library of commands to fully reprogram the S7; s/he could simply extend the existing functionality of the system.

The approach appealed to me for its simplicity and reliability, and I wondered what other approaches might be similar. Would an attacker seek to register a malicious COM/DCOM automation control in place of a valid one? What about sitting on an automation port, watching for automation traffic to meet certain conditions before firing off an ‘open all breakers’ command? Were there data points (literally, things like MW, MVAR, pressure, etc.) that were basically standard across systems, which a piece of automation-specific malware could test for and use?

Unfortunately, this is a huge sample space for an individual researcher to go through, so I settled on demonstrating the concept under a specific set of conditions. After a little thought, I chose the various OPC interfaces for a specific legacy generation DCS, the INFI-90 system. I pulled down from the internet a few OPC servers that are used in power generation, specifically those that interface with the INFI-90, an older ABB/Bailey system that is still in common use today on 1970s-80s era coal-fired units.

The INFI-90 system is unique. REALLY unique. Developed in the 80s, it had a proprietary interface between servers and devices over SCSI, though it could also use a lower-throughput RS232 interface. Originally, this interface used a DEC VAX/VMS, but upgrades and cost cutting eventually put the functionality into a set of drivers called ‘SEMApi’ on Windows 95/98/NT/2000. The interface is low level and no standard Windows drivers can handle it, so interactions with the DCS using modern OPC servers must go through either the proprietary ABB/Bailey SEMApi drivers or another set of custom-built drivers/APIs that still use SCSI or serial.

So, what we have is a crazy interface that you’re not likely to see outside of a power plant, coupled to a technology (OPC) that isn’t used much outside of automation, and locked down to a specific set of vendors who support it. I pulled down three OPC servers I knew were in use in generation. Two use the SEMApi method of interfacing; one uses an alternate interface developed by the vendor. I installed the software on a virtual machine, pulled all the EXEs and DLLs out of the installed software, and then ran them through a ‘strings()’ parser. With all the different options available, I settled on looking for DLLs that were mentioned in the strings() data, on the assumption that at some point they would reference a common set of DLLs or interfaces.
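A minimal sketch of the strings-and-filter step described above, written in Python. This is not the tooling used for the post; the function names, the four-character minimum run length, and the DLL-name pattern are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Runs of 4+ printable ASCII bytes (the minimum length is an assumption).
ASCII_RUN = re.compile(rb"[\x20-\x7e]{4,}")
# The same idea for UTF-16LE text, which Windows binaries use heavily.
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}")
# Anything that looks like a DLL file name (illustrative pattern only).
DLL_NAME = re.compile(r"[A-Za-z0-9_.\-]+\.dll", re.IGNORECASE)


def extract_strings(data: bytes) -> list[str]:
    """Return printable ASCII and UTF-16LE strings found in a binary blob."""
    found = [m.group().decode("ascii") for m in ASCII_RUN.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16_RUN.finditer(data)]
    return found


def referenced_dlls(data: bytes) -> set[str]:
    """Return the lower-cased DLL names mentioned anywhere in the blob."""
    names = set()
    for text in extract_strings(data):
        for match in DLL_NAME.finditer(text):
            names.add(match.group().lower())
    return names


def scan_tree(root: str) -> dict[str, set[str]]:
    """Map every EXE/DLL under root to the DLL names it mentions."""
    result = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in {".exe", ".dll"}:
            result[str(path)] = referenced_dlls(path.read_bytes())
    return result
```

A string match only shows that a name appears somewhere in the file; it does not prove the DLL is ever loaded, so the output is a starting list rather than a dependency graph.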

I pulled all the referenced DLLs together into a single list, and ranked them in order of uniqueness on a 1 to 5 scale. If a DLL was extremely common (like a standard Windows component), it was a 1. If the DLL was very unique, such as a proprietary DLL, it was ranked a 5, with other DLLs falling somewhere in between. This gave me a good idea of which DLLs could be loaded by my OPC programs at runtime, and might be useful in determining whether a malicious process was interfering with the function of those DLLs.
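The merge-and-rank step can be sketched as follows, assuming a mapping from each scanned file to the set of DLL names found in it. The ranking in the post was a judgement call made by hand, so the sketch only automates the bookkeeping: the `COMMON_WINDOWS` set, the `manual` overrides, and the default score of 3 for unreviewed names are hypothetical.

```python
from collections import Counter

# Hypothetical, hand-maintained set of standard Windows components (score 1).
COMMON_WINDOWS = {"kernel32.dll", "user32.dll", "advapi32.dll", "ole32.dll"}


def merge_references(per_file: dict[str, set[str]]) -> Counter:
    """Count how many scanned files mention each DLL name."""
    counts = Counter()
    for names in per_file.values():
        counts.update(names)
    return counts


def score(dll: str, manual: dict[str, int]) -> int:
    """Uniqueness score from 1 (very common) to 5 (proprietary)."""
    if dll in manual:
        return manual[dll]      # a human decision always wins
    if dll in COMMON_WINDOWS:
        return 1
    return 3                    # assumption: unreviewed names sit in the middle


def ranked_list(per_file: dict[str, set[str]],
                manual: dict[str, int]) -> list[tuple[str, int, int]]:
    """Return (dll, score, files_mentioning_it), most unique first."""
    counts = merge_references(per_file)
    rows = [(dll, score(dll, manual), n) for dll, n in counts.items()]
    return sorted(rows, key=lambda row: (-row[1], row[0]))
```

Keeping the manual overrides in their own dictionary means the list can be regenerated after scanning more software without losing earlier rankings.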

While I focused on DLLs in this search, there is a chunk of data that might be useful (people I’ve talked to that routinely look through virus data say “far more useful than DLLs”):

Registered OCX, COM, and other objects referenced by CLSID – CLSIDs are unique, and serve as a portable method of referencing objects between many systems. Malware will often register and use specific objects by CLSID.
MD5/SHA Hashes of Important Files – Malware will often use other files, some that it downloads and others that are already resident on the system.
Common IP traffic – Many of the newer virus searching platforms are using limited dynamic analysis through sandboxing (à la Cuckoo Sandbox), so they can capture network traffic. This isn’t searchable right now though.
Simple Hashes of Automation Files – While not generally infected themselves, they are often submitted in bundles which can contain malware. Looking through files uploaded at the same time might be beneficial.
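Two of the items above, CLSIDs and file hashes, are easy to collect from the same installed software. The following is a rough sketch under assumptions: the function names are made up for illustration, the CLSID pattern is the usual braced GUID text form, and SHA-256 is picked as the SHA variant.

```python
import hashlib
import re

# Braced GUID text form, e.g. {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}.
CLSID = re.compile(
    r"\{[0-9A-Fa-f]{8}-(?:[0-9A-Fa-f]{4}-){3}[0-9A-Fa-f]{12}\}"
)


def find_clsids(text: str) -> set[str]:
    """Return every CLSID-shaped token in text, upper-cased for comparison."""
    return {m.group().upper() for m in CLSID.finditer(text)}


def file_hashes(path: str) -> dict[str, str]:
    """Return the MD5 and SHA-256 hex digests of a file, read in chunks."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return {"md5": md5.hexdigest(), "sha256": sha256.hexdigest()}
```

`find_clsids` works on text such as strings output or an exported registry file; it says nothing about whether the object is actually registered on a given machine.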

On a whim while writing this post, I entered all of the DLLs ranked 5 (14 in total, of 117) into malwr.com. Malwr.com is a site similar to VirusTotal, but it let me search, without a private account, for specific files that a sample interacted with when run in a sandbox. Only one returned a hit, asycfilt.dll, which was a DLL involved in a June 2010 security update.

This goes to a single point: even when you pull together a large amount of data that you think will show something, you might still not find anything. Making searches quick, simple, and easy is a prerequisite to doing this type of research; my ‘whim’ search is valuable only if anyone can do it quickly.

title image by Jeffrey Beall
