A few years back, the traditional IT world was debating the merits of virtualization. There were concerns about performance, security, vendor support, and a host of other issues. Fast-forward to today, however, and you’ll find virtual machines in nearly every data center. The number one reason virtual machines have revolutionized server-side computing, I believe, is cost savings: I can deploy a server in a fraction of the time it used to take and, from a power consumption standpoint, operate it much more cheaply. And then there are the business continuation benefits; I can quickly fail over or recover to a virtual machine across the city or across the globe.
So what are the implications of this in the SCADA world? I think it’s just a matter of time before we see more widespread acceptance of VMware and other virtualization platforms in production control systems. The benefit here may be less about cost savings, though, and more about increased functionality. The ability to snapshot and clone machines for backup and testing, for example, is very attractive.
We’re going to examine this subject over a series of blog posts. Hopefully we’ll cover all the major topics – security, reliability, performance, serial communication issues, vendor support, and adoption rate, to name a few.
I look forward to your comments and opinions.
Before we get too deep into this discussion, perhaps we should cover the basics for those who may not be familiar with the concept of virtualization. Broadly, virtualization is the abstraction of some computing resource, and the idea has been around for forty-plus years. The most common use of the term today, however, refers to the practice of running one or more independent “virtual” machines (VMs) on a single physical machine. Generally speaking, a host operating system runs the virtualization software, which in turn runs any number of guest VMs. Each VM emulates a set of hardware (memory, disk, processor, network adapter, etc.) on which an operating system can be installed. The VM then functions almost identically to a physical machine.
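To make the host/guest relationship concrete, here is a minimal sketch using the open source libvirt Python bindings against a local QEMU/KVM host. This is purely illustrative; VMware and the other commercial products expose the same ideas through their own tools and APIs.

```python
import libvirt  # open source virtualization API; pip install libvirt-python

# Connect to the local hypervisor (QEMU/KVM here; other drivers exist).
conn = libvirt.open("qemu:///system")

# Each guest VM (a "domain" in libvirt terms) runs on this one physical host.
for dom in conn.listAllDomains():
    state = "running" if dom.isActive() else "stopped"
    print(f"{dom.name()}: {state}")

conn.close()
```

The point is simply that one physical box presents itself as several independent machines, each with its own emulated hardware and its own operating system.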
The most popular virtualization products are made by VMware, but there are others, such as Microsoft’s Virtual Server and the open source Xen. In my experience, VMware is by far the leader in most IT shops, as well as in the few control system applications I’ve seen or heard about.
As I mentioned in Part 1 of this series, cost savings is the major driver for IT shops implementing VM products. For control systems, though, the driver will likely be different. I believe that, approached correctly, this technology has great potential for the industry.
For Part 3 of this series, let’s dive into one of the major uses of virtualization – testing – and see if we can sort through how and where it applies to control systems. As we discuss this topic, we are assuming that the majority of servers connected to control systems are physical machines. If you’ve already embraced this concept and are running your production servers in a 100% virtualized environment, some of the conversation may not apply. For the rest of us, let’s assume that we want to use virtual machines to introduce or enhance testing ability in a system made up primarily of physical servers.
Let’s take a fairly typical system with a mixture of *nix and Windows machines performing a variety of functions: HMI, historian, SCADA server, ICCP server, etc. Using a physical-to-virtual (P2V) conversion tool such as VMware Converter, a virtual copy of each of these machines can be created and hosted, potentially, on a single server. (We’ll save vendor support and licensing for a later discussion.) This type of lab environment has been a traditional use of VMs in enterprise IT but is drastically underused in the control system world. You can use your imagination for the possibilities that exist when you have a virtualized replica of your production system, but for this discussion let’s focus on how we can use it for testing. For those in the electric industry, VMs have the potential to provide some pain relief for CIP-007 R1 and others.
We discussed the snapshot capability in an earlier post; that one small feature alone is a huge value in any testing process. Here are some more specific examples of test scenarios:
1.) Patch and upgrade testing
This is perhaps the most obvious use of a virtualized test environment. Integrating it into your change control process lets you test to a depth that may not have been practical before, thanks to the quick rollback functionality (see the first sketch following this list). I am aware that this type of testing has limitations, but I contend that at least 95% of potential issues can be identified in the VM environment, with the remainder caught in a development or backup environment. The patch management process, for example, might include testing in the VM environment, a subsequent test on development or backup servers, and finally, production server implementation.
2.) Simulation
Virtualized environments are also ideal for simulating various types of events, which is useful for everything from operator training to load testing. In the case of load testing, a new VM can be introduced that generates a large volume of SCADA protocol or other traffic (the second sketch following this list gives a crude example).
3.) Security testing
Because it offers a certain degree of isolation, a virtualized environment affords security administrators the ability to be much more liberal with security testing. Even in development and backup environments, there is often resistance to vulnerability scanning, let alone a full-bore penetration test. A virtualized environment can provide an “air-gapped” place to do this more comfortably, without fear of production problems.
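To give a feel for how mechanical the rollback step in patch testing can be, here is a minimal sketch, again using the libvirt Python bindings as a stand-in for whatever your platform provides; the guest name and snapshot name are hypothetical.

```python
import libvirt

# Minimal snapshot definition; name and description are hypothetical.
SNAPSHOT_XML = """
<domainsnapshot>
  <name>pre-patch</name>
  <description>Known-good state before patch testing</description>
</domainsnapshot>
"""

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("scada-hmi-test")  # hypothetical guest name

# Capture a known-good state before applying the patch.
snap = dom.snapshotCreateXML(SNAPSHOT_XML, 0)

# ... apply the patch inside the guest and run the test plan ...

# If the patch misbehaves, roll the whole machine back in one call.
dom.revertToSnapshot(snap, 0)
conn.close()
```

VMware’s snapshot manager accomplishes the same thing through its GUI; the point is that reverting an entire server becomes a single, fast, scriptable operation you can bake into the change control procedure.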
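For the load testing case, even a crude traffic generator can be useful in a lab. The sketch below just pushes dummy bytes over TCP at a fixed rate; a real test would replay captured SCADA protocol traffic, and the target address, port, rate, and duration here are all hypothetical.

```python
import socket
import time

TARGET = ("192.0.2.10", 20000)  # hypothetical lab SCADA front-end
PAYLOAD = b"\x00" * 64          # dummy bytes standing in for protocol frames
RATE = 200                      # messages per second
DURATION = 60                   # seconds to sustain the load

# Open one connection and push messages at a steady rate.
sock = socket.create_connection(TARGET)
deadline = time.time() + DURATION
while time.time() < deadline:
    sock.sendall(PAYLOAD)
    time.sleep(1.0 / RATE)
sock.close()
```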
These are just a few examples of testing scenarios. Virtualization is not a panacea and there are certainly things that cannot be tested well in a VM environment. That said, I believe that any negative trade-off is minimal compared to the advancements that are possible.
I’m not naïve about the security implications, either. Like many things, increased functionality can be a double-edged sword. But sometimes increased functionality can have a serendipitous benefit to security – such is the case with virtualization, in my opinion. Stay tuned for a future post dedicated to security issues as well as a discussion of how the recovery benefits of VM can be used in a production implementation.
Our last post in this series covered the benefits of virtualization for testing in a lab or development environment. Today we are going to address some of the same features but with a different twist – this time we’re talking about using VM in a production environment.
It doesn’t take much exposure to virtualization before you realize that one of its primary benefits is ease of recovery. With a little planning and the snapshot capability available in nearly all of the VM products, a last known good configuration is only a few clicks or keystrokes away if something goes awry. But that’s really just the beginning, because virtualization provides recovery benefits beyond that. In an industry obsessed with reliability, I believe we are compelled to investigate further.
Because a virtualized machine is essentially a file or a small set of files, new opportunities exist for easily replicating those files to secondary servers and remote locations. Virtualization vendors are latching onto this ability and creating mechanisms that make replication easy for administrators to configure.
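Since the entire machine is just files on disk, even a naive copy illustrates the idea. The sketch below stages a timestamped copy of a VM’s directory for shipment to a backup site; the paths are hypothetical, and in practice you would quiesce or snapshot the VM first (or use the vendor’s replication tooling) to guarantee a consistent copy.

```python
import shutil
import time
from pathlib import Path

# Hypothetical locations: the VM's files on the primary host, and a
# staging area that gets shipped to the backup site.
VM_DIR = Path("/vm-storage/scada-server")
BACKUP_ROOT = Path("/mnt/backup-site/vm-replicas")

# Timestamp each replica so several known-good copies can be retained.
stamp = time.strftime("%Y%m%d-%H%M%S")
dest = BACKUP_ROOT / f"scada-server-{stamp}"
shutil.copytree(VM_DIR, dest)  # the whole machine is just these files
print(f"Replicated {VM_DIR} -> {dest}")
```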
In a traditional redundancy scenario, the hardware has to be precisely duplicated. With the hardware independence that VMs afford, though, recovery processes can be much more flexible.
So let’s assume for a moment that we’ve overcome the obstacles and have virtualized at least some of the servers in our production system. What might this look like from a recovery perspective? We could have two physical machines at our primary location, each hosting one half of redundant pairs of a handful of Linux and Windows virtual machines. This eliminates single points of failure at the physical hardware level while preserving the logical failover ability built into most control systems. At our backup site, we could have a third physical machine that receives regular backups of the virtual machines from the primary site.
In this example, if we lose our primary site, we have a backup that does not require waiting for tape recovery. We enforce a change control process that requires taking a snapshot before any update (plus regularly scheduled snapshots), so when a database load corrupts the system, we can simply revert. If the system is attacked, we can preserve evidence in a way that was not possible before and still be back up and running in a very short period of time. This is just the beginning; use your imagination, and I challenge you to come up with a scenario where having virtualized servers does not ease the recovery process.
In the IT realm, the recovery options in virtualized systems have revolutionized business continuation planning. I believe they will have an equal impact on control system design and implementation at some point. Will there be “law of unintended consequences” repercussions? Yes, I’m sure there will be many; that’s why I want to start the conversation now.
UPDATE – Dale’s Two Cents – Recovery is what I see as the biggest benefit of virtualization, for both upgrade failures and catastrophic recovery.
As an asset owner, you wait until the vendor certifies a patch, test it in the lab, and then apply it. There have been many cases where a patch worked fine in the lab but failed under load, or with additional features or functions unique to your production system. VMware or another virtualization product provides an extremely fast and effective way to roll back.
This goes beyond patching. It applies to all change control. Rollback should be part of any change control procedure, yet many control system application upgrades offer no rollback short of reinstalling the entire system. Creating a VM for rollback could be an effective part of any change control process.
I’ve blogged before about the over-reliance on failover for recovery. Failover works fine for hardware failures but not when a worm or other cyber attack takes out all of the systems. The outage time to rebuild a system that has not been rebuilt in years could be days, and it could require vendor participation. Reverting to a VMware snapshot could recover from a catastrophic failure in a matter of minutes once the affected systems are removed from the network.
Finally, virtualization may be a good solution for a tertiary control center (or a backup control center, if you can’t afford one today). You could put a real-time server, historian, and HMI all on one physical system.