Adding new security systems and updating the control system in the name of cyber security tends to have a ripple effect. Operational processes that were once nearly bulletproof gain new or unknown steps, recovery efforts that previously succeeded may no longer work, and troubleshooting turns up failure modes that operations personnel haven’t encountered before. Ironing out all the new wrinkles can take time, and that time investment should include documenting the problem, the root cause, and the associated fix or mitigation. Without an understanding of the problem and the solution, you are likely to encounter the same issue again. Solving the same problem over and over adds cost and uses time that could be much better spent.
The problem with these types of issues is that they are difficult to explain in generic language: the sample space of problems is too large to reduce to a simple cause and effect. This is magnified in control systems, where the individuals with specific knowledge of the control system application are far fewer than the Windows, Cisco, etc. experts out there. As a result, many of these issues end up being “Intellectual Currency” in interactions with consultants, vendors, and other interested parties. Those with the experience should tell those without it what is wrong and what to do. So, in the interest of donating some of this currency, here is an experience I’ve had with control system and security interactions, and a few possible fixes and mitigations.
I once found myself in a position where I had to restore a backup to a system that had been irrevocably damaged (no, I’m not telling you why and whose fault it was). This was on a relatively new control system, one that had a domain controller being used for some basic authentication (or so we thought). That backup was older than I was used to working with, since the system had not received updates in a while, and the practice at the time was to make a backup image only when a change had been made. The process of restoring the backup was swift, no issues there. However, I wasn’t able to log into the system; it gave me some generic error about not being able to contact the Domain. I logged in with a local administrator account, as the fallback practice was to use that account to restore functionality or troubleshoot. Certain programs ran, others didn’t. I was definitely having issues with anything that required interaction with the Domain, which included Windows file shares, OPC, and a few others.
After a bit of investigation, I found the root cause of the problem. Active Directory utilizes a ‘machine account’, which is basically like a user account, only it’s associated with a specific computer. It authenticates the client system to the Domain, and an incorrect password will cause communications with the Domain, and certain communications with Domain members, to fail. This account password is typically changed by the client system every 30 days, and the client system had likely changed its password a few times since the backup had been taken. Once the problem was understood, I could fix it. I removed the client machine from the Domain and re-added it. After the reboot, testing showed that the problems with the programs were gone, as I had full Domain access again.
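To make the timing concrete, here is a minimal Python sketch that estimates how many machine account password changes may have happened between a backup and a restore. It assumes a fixed 30-day change interval, as described above; the function names are hypothetical, and real rotation timing depends on how the Domain and client are configured. It deliberately treats even one estimated change as a reason to expect trouble, which is a conservative assumption rather than a statement of how Active Directory behaves.

```python
from datetime import date

# Assumption: the client changes its machine account password on a fixed
# interval (30 days, per the text). Actual timing can differ per site.
ROTATION_DAYS = 30


def estimated_rotations(backup_date: date, restore_date: date,
                        rotation_days: int = ROTATION_DAYS) -> int:
    """Rough count of password changes between the backup and the restore."""
    if restore_date < backup_date:
        raise ValueError("restore_date is earlier than backup_date")
    return (restore_date - backup_date).days // rotation_days


def backup_may_be_stale(backup_date: date, restore_date: date,
                        rotation_days: int = ROTATION_DAYS) -> bool:
    """Conservative flag: True if at least one change has probably occurred."""
    return estimated_rotations(backup_date, restore_date, rotation_days) >= 1


if __name__ == "__main__":
    backup = date(2024, 1, 1)
    restore = date(2024, 4, 10)  # 100 days after the backup
    print(estimated_rotations(backup, restore))   # 3
    print(backup_may_be_stale(backup, restore))   # True
```

A check like this could sit in a restore checklist as a prompt to plan for the remove and re-add step before the restore starts, instead of discovering the problem at the login screen.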
There are many ways to keep this issue from happening again. Which ones make sense depends entirely on the level of risk the organization is willing to accept and the cost of the mitigation. Some of the options I considered were:
- Update the restore procedure for non-Domain Controllers to include the extra step of removing the system from the Domain and re-adding it.
  - Pros: Simple, cost effective, anyone could follow this extra step.
  - Cons: Takes extra time ($$$), and what about multiple restores or restoration of a Domain Controller (which could bring every client system down)?
- Automate backups with a better backup system.
  - Pros: Solves the problem by keeping client backups in sync with the Domain Controller; less personnel time.
  - Cons: More expensive, adds complexity for site personnel to support, and the network-based parts could cause problems with control system devices.
- Make more frequent backups, on the order of at least once a month.
  - Pros: Less chance of the client failing to authenticate to the Domain because of password changes.
  - Cons: Not effective enough at preventing the condition, and extremely costly: many places had 30 or more systems being backed up at 1 hour per backup. You could keep a person busy for a week simply performing backups.
- Change the machine account password change period, or disable the change altogether.
  - Pros: Likely fixes the problem with minimal effort.
  - Cons: Way less secure, and might violate one of those pesky security regulations.
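The labor estimate behind the monthly backup option is easy to check with quick arithmetic. The Python sketch below uses the figures from the list (30 systems, 1 hour per backup); the 40-hour work week and the function name are assumptions added for illustration.

```python
def monthly_backup_hours(systems: int, hours_per_backup: float,
                         backups_per_month: int = 1) -> float:
    """Total hands-on hours per month spent taking backups."""
    return systems * hours_per_backup * backups_per_month


if __name__ == "__main__":
    WORK_WEEK_HOURS = 40  # assumption: one person's standard work week

    hours = monthly_backup_hours(systems=30, hours_per_backup=1.0)
    print(hours)                    # 30.0
    print(hours / WORK_WEEK_HOURS)  # 0.75
```

At 30 systems that is three quarters of a work week every month, and a site with more systems quickly reaches a full week or more.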
Be aware of ripple effects when you add cyber security to your control system. I won’t tell you which or how many of those options I would pick, but I’d be interested in hearing what you’d do and why. We also accept donations of Intellectual Currency in the comments.
image by ka3vo