Identifying Protocol Parsers and Cryptographic Algorithms in Firmwares and Binaries

In my talk way back at S4x15 I briefly mentioned a few techniques at identifying interesting parts of an application for reverse engineers.

A lot of times we as reverse engineers load up a firmware or DLL or executable. We want to get hunting for bugs as quickly as possible. Hunting for bugs usually means rapidly identifying the data entrypoints into an application. In the case of networked equipment and networked software, data entrypoints mean network protocol parsers.

One of the techniques that I talked about was using CRC/hashing/crypto algorithm precomputed tables for identifying interesting functions. I thought a little post about this technique was worth writing.

Protocol Parsers

Obviously there is an easy button for some application software: searching for OS functions like recv() is a sure way to find your protocol receiver (and perhaps your parser as well). For deeply embedded systems this is a bit trickier though, as a disassembler will not necessarily know what function in the disassembly constitutes recv() or any other C library call.

Most protocols in the ICS space have a lot of legacy cruft hanging around, though, which can help us a lot. The ‘cruft’ parts of protocols are those that are no longer necessary in the Ethernet and IP worlds, but are left in the protocol to simplify the lives of implementers. Usually, the IP variants of protocols are just wrapped-up versions of the serial protocols. Serial protocols usually included CRCs, since the serial physical layer provides for no error detection or correction.

One of the neat tricks that was invented long ago was to partially precompute CRCs using a table. The algorithm, known as the Rocksoft Model CRC Algorithm, is documented in a nice paper, which includes the source code needed to generate the precomputation table given any CRCs exponents.

A precomputation table is generated, and uses (256*CRC_OUTPUT_SIZE) bytes of memory. In exchange for this little bit of storage, CRC calculation save a lot of processing time that is involved in multiplying and adding individual bits to build the CRC output — instead, they just look up a precomputed value for the current byte, then XOR with the running tab (bitshifting the current value up one byte).

This fact is also really helpful for reverse engineers. For example, searching for the byte sequence 00 00 36 5e 6c bc …. can help locate a protocol parser. These particular bytes make up the beginning of the Rocksoft Algorithm table for CRC-16-DNP (note: on some systems these sequences of bytes may be endian-swapped, so search for 00 00 5e 36 bc 6c, or even 5e 36 00 00 …). Assuming that you have a disassembly generated for your binary in a modern disassembler, e.g. IDA Pro, you can search for the byte sequence, and then quickly identify the code that refers to that byte sequence via a data cross-reference.

Now you’ve identified the portion of the code responsible for generating the CRC of a packet. The CRC generator will be called by the packet parsing functions (as well as the packet builder functions), so identifying which functions are part of the packet parser should be a lot easier in your big binary blob.

Most CRC implementations are these ‘fast’ versions. Building up a list of common CRC tables can help you quickly identify what CRC is being used in a proprietary protocol, as well as where the protocol parser is located in a binary when reversing it.

Crypto Implementations

Similar tricks can be used to identify cryptographic implementations.

Oftentimes you’ll be looking at a product that’s “got Crypto(tm)”, but has no key management, or has nonce generation that is odd and appears vulnerable to replay. You /just know/ that it is using a hardcoded key or other poor implementation, and you’d like to identify the crypto algorithm and the key itself.

Sadly, a lot of ICS software that supports cryptography actually ‘rolls their own,’ even when an already-installed OS library would provide a better implementation. Locating these cryptographic functions is something that we want to do.

The popular binwalk tool can locate keys that are formatted, for example if they are stored in an application in a known format like PEM. Binwalk can even identify areas of a file that contain high entropy, which is usually indicative of a hardcoded key that is just stored as a byte array.

Sometimes, hardcoded keys are poorly selected, and are either ASCII-printable, or simply lack entropy. Ironically it can be harder to quickly identify these keys without first locating the encryption algorithm. Fortunately there are similar techniques to the CRC trick above.

Quite a few cryptographic algorithm implementations also make use of tables. DES for example uses input permutation tables and substitution boxes. Both parts of the algorithm get implemented in the same way in every application that Labs has encountered to date: hardcoded tables of values representing bitmasks.

Scouring a binary for the sequence 0e, 04, 0d, 01, 02, 0f, … for example (corresponding to the first DES S-Box) is a great way to find a DESCrypt implementation.

AES is is also implemented, often using a precomputed S-Box (although, AES is still rare on ICS devices which support encryption; antiquated DCS still wins for adoption rate) . Even the blowfish algorithm uses substitution boxes, the constants of which can be found here.

These cheap tricks can save the beginner reverse engineer hours of effort in locating interesting parts of code, and let you focus on learning the assembler and finding bugs in the parser or crypto implementation. I hope that they help all of us find more bugs and poor software implementations, so that can continue building more robust control systems protocols.

Image by rubygoes

Identifying Protocol Parsers and Cryptographic Algorithms in Firmwares and Binaries

Recent Posts

Podcast: Pwn2Own Miami

ICS Security Architecture