OT for IT Sec People
This introduction explains to IT security people why OT security often looks the way it does.
Work in Progress
It should link to most of the final Top 10 items, as they should be related, and should provide an IT-centric storyline of how all of those Top 10 items 'play together'.
The OT World
All professional communities have their own work culture and mindset, and people who work only in IT security may not even be aware of the cultural "waters" in which they "swim". OT is a separate culture and things work rather differently.
When IT security people look at OT scenarios, misunderstandings often occur, sometimes leading to suggestions like "why don't you just update components?", "why don't you replace these insecure protocols?" or "just buy new hardware".
Sadly, the situation and requirements within the OT world differ from typical IT requirements. Understanding those differences helps explain how some of our OT Top 10 items came to be. Some of the conceptual issues in the OT world are:
- Long lifetime of products: decades rather than years. Remember computer security thirty years ago? That is the current security level in some OT setups.
- Hard to update: it is rather hard to update a power plant. To do so, you might need a spare power plant to bridge the loss of power from the plant being updated. In other domains, e.g., life-sustaining medical devices, an update might not even be possible without endangering users.
- Lack of integrity protection or encrypted protocols: due to the long lifetimes, many OT protocols support neither encryption nor integrity protection. In some scenarios, e.g., on a manufacturing floor, encryption is actually unwanted, as it makes network monitoring more complex.
These are realities of the field that we have to deal with. Very often, we do not work in greenfield projects but have to adapt existing systems while improving security. In addition, we have security problems such as vendors not providing security updates or devices lacking proper access control semantics.
When you look into recent news, OT security incidents are on the rise. While it would be preferable to have a perfectly secure solution, we are currently only catching up, and thus any (substantial) security improvement is welcome.
Devices must be protected
As shown above, in the OT world we often have to deal with inherently insecure devices that cannot be updated or upgraded either.
Typically, this issue is mitigated by accepting that some devices are insecure: if we cannot make the devices themselves secure, we try to prevent attackers from accessing them in the first place.
We depend upon Mitigations
This is typically done through mitigations such as network separation or additional physical security controls. Being able to 'lock away' devices and their network also helps with devices that can only use network protocols without integrity or confidentiality protection.
These mitigations are not a "get out of jail free" card: they impose limitations as well as a maintenance burden.
Please note that we would prefer to have secure devices (or even zero-trust enabled devices), but until then, those mitigations are often the only way of keeping critical infrastructure operational.
As risk is typically calculated as "Risk = Likelihood x Impact" and keeping attackers away from the devices would seemingly reduce the likelihood to zero, we can close the case, right? Sadly, NO.
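To see why the answer is no, here is a minimal sketch of the formula above with one extra factor: the chance that a mitigation is bypassed. The function name `residual_risk`, the way the two probabilities are multiplied, and all numbers are illustrative assumptions, not values from a real assessment.

```python
def residual_risk(base_likelihood: float, bypass_probability: float, impact: float) -> float:
    """Risk = Likelihood x Impact, where the remaining likelihood is the
    likelihood of an attack on the unprotected device, scaled by the
    probability that the mitigation in front of it is bypassed."""
    for name, value in (("base_likelihood", base_likelihood),
                        ("bypass_probability", bypass_probability)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    return base_likelihood * bypass_probability * impact


# Hypothetical numbers: an outage costing 1,000,000 and an insecure device
# that would be attacked with likelihood 0.5 if it were freely reachable.
perfect_mitigation = residual_risk(0.5, 0.0, 1_000_000)     # mitigation never fails
realistic_mitigation = residual_risk(0.5, 0.02, 1_000_000)  # mitigation fails 2% of the time

print(perfect_mitigation, realistic_mitigation)
```

Only the first case yields a risk of zero. As soon as the mitigation can fail at all, the high impact typical for OT keeps the remaining risk well above zero.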
What happens if Mitigations fail?
What could go wrong with our mitigations? Just a couple of examples:
add incidents from the top 10 list
- human error
- infected removable media (e.g., Stuxnet spreading via USB drives)
- missing network segmentation between factory floor and office networks
- problems with physical security, e.g., people being able to get onto the factory floor and manipulate control systems
- etc.
The overall architecture of "protecting insecure devices by preventing attackers from accessing them" breaks down if attackers find a way to bypass these protections.
If this happens, we typically run into recurring major problems:
- The insecure devices are, well, insecure and easy to exploit. Communication between devices can be intercepted and altered due to the lack of encryption or integrity protection.
- Large blast zones: attackers who bypass security mechanisms and compromise devices can use those devices to propagate further and pivot into other 'secured' networks and devices.
Fragility is a concern, so it is important to consider the downstream implications if such mitigations fail. Monitoring to detect compromise and/or defense in depth may be needed in addition.
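One way to reason about such downstream implications is to treat the network as a graph and ask which devices become reachable once a single entry point is compromised. The following is a minimal sketch of that idea. The device names and the `REACHABLE` map are entirely hypothetical, and real reachability would have to be derived from firewall rules and network documentation.

```python
from collections import deque

# Hypothetical network: each device maps to the devices it can talk to.
REACHABLE = {
    "office-pc": ["historian"],
    "historian": ["hmi-1", "plc-1"],
    "hmi-1": ["plc-1", "plc-2"],
    "plc-1": [],
    "plc-2": ["safety-controller"],
    "safety-controller": [],
    "isolated-plc": [],
}


def blast_radius(graph, entry_point):
    """Return all devices an attacker can reach from entry_point,
    found with a breadth-first search over the reachability graph."""
    seen = {entry_point}
    queue = deque([entry_point])
    while queue:
        device = queue.popleft()
        for neighbour in graph.get(device, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen - {entry_point}


print(sorted(blast_radius(REACHABLE, "office-pc")))
# ['historian', 'hmi-1', 'plc-1', 'plc-2', 'safety-controller']
```

In this made-up example, a single compromised office PC puts every device except `isolated-plc` within reach, which is the kind of result that argues for additional segmentation or monitoring between zones.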
How do we react to an incident?
Compared to well-maintained traditional IT systems, OT systems often also display a lack of recoverability. Very often, compromised companies are not able to react to or recover from incidents.
add concrete examples of not being able to recover, also why can't we just reboot everything from scratch?
This is mostly due to:
- missing backups/disaster recovery
- missing configuration backups for OT devices
- undefined processes for alert reporting/handling
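As a small illustration of the configuration backup point, the sketch below flags devices in an asset inventory that have no configuration backup or only an outdated one. The inventory contents, the function name, and the 90-day threshold are assumptions made for this example. A real check would read from the actual asset management system.

```python
from datetime import date, timedelta

# Hypothetical inventory: device name -> date of the last configuration
# backup, or None if no backup was ever taken.
INVENTORY = {
    "plc-1": date(2024, 1, 10),
    "hmi-1": None,
    "rtu-7": date(2021, 6, 1),
}


def devices_without_recent_backup(inventory, today, max_age_days=90):
    """Return the sorted names of devices whose configuration backup is
    missing or older than max_age_days (an assumed threshold)."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        name for name, last_backup in inventory.items()
        if last_backup is None or last_backup < cutoff
    )


print(devices_without_recent_backup(INVENTORY, today=date(2024, 2, 1)))
# ['hmi-1', 'rtu-7']
```

A list like this does not make a device recoverable by itself, but it shows where a restore after an incident would have to start from scratch.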
Fields of Compromise
We hope that this introduction explains why we often have to compromise to get some security done. In legacy settings, this mostly means depending upon mitigations such as network segregation or physical access management. Hopefully, newer systems will alleviate these problems, but we still have to deal with legacy systems to keep existing critical infrastructure running.
Compromise has another meaning too: we often over-rely on mitigations, and if those are not upheld, our systems become compromised. If no additional defense-in-depth measures are in place, the potential fallout is immense, as further compromisable systems are often within the blast radius. This makes additional hardening even more important than in traditional IT systems.
Getting critical infrastructure secure will be quite a journey. But that is no excuse for not starting this journey right now.