OT for IT Security People

This introduction explains why OT security looks the way it often looks, specifically for IT security professionals who may be encountering operational technology environments for the first time.

Understanding the Cultural Divide

Every professional community has its own work culture and mindset, and people who work only in IT security may not even be aware of the cultural "waters" in which they "swim". OT is a separate culture, and things work rather differently there than in traditional IT environments.

When IT security people look at OT scenarios, misconceptions are common and sometimes lead to well-intentioned but impractical suggestions like:

"Why don't you just update components?"

"Why don't you replace these insecure protocols?"

"Just buy new hardware!"

While these solutions often work well in IT environments, the situation and requirements in the OT world are fundamentally different. Understanding those differences helps explain how many OT security challenges came to be.

The Reality of OT Systems

Some of the fundamental conceptual differences in the OT world include:

Decades, Not Years

Long lifetime of products: decades rather than years of operational life
Think about this: Remember computer security thirty years ago? Limited encryption, basic authentication, minimal network protections? That is the security level of some OT setups still in operation today.

The Update Paradox

Hard to update: it's rather hard to update a power plant. To do so, you might need a spare power plant to bridge the loss of generation while the facility is being updated.
In other critical domains, such as life-sustaining medical devices, updates might not even be possible without directly endangering users.
Every downtime must be meticulously planned in advance, unlike IT systems, which can usually be patched and rebooted quickly.

Protocol Limitations by Design

Lack of integrity-protected or encrypted protocols: because of the long lifetimes, many OT protocols in use today support neither encryption nor integrity protection.
Within manufacturing environments, encryption is often not even a desired feature, as it makes network monitoring and troubleshooting significantly more complex.
When a production line goes down, operators need immediate visibility into what's happening

These are operational realities we have to deal with. Very often, we do not work on greenfield projects but have to adapt existing systems while improving their security. The alternative - replacing all legacy OT systems immediately - is neither economically feasible nor operationally safe.

Additional challenges include vendors not providing security updates for legacy systems and devices lacking proper access control mechanisms by original design.

Recent news coverage suggests that OT security incidents are on the rise. While a perfectly secure solution would be preferable, we are currently only catching up, so any substantial security improvement is welcome.

The Mitigation-First Approach

As shown above, in the OT world we often have to deal with inherently insecure devices that cannot readily be updated or upgraded. The fundamental approach becomes:

If we cannot make the devices themselves secure, we try to prevent attackers from accessing them in the first place.

This is typically accomplished through mitigations such as:

Network Separation

Air-gapped networks where possible
VLANs and network segmentation
Strict firewall rules between IT and OT networks
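
To make "strict firewall rules" more concrete, here is a minimal Python sketch of a default-deny allowlist between an IT and an OT zone. It is an illustration only, not a real firewall configuration: the `AllowRule` type, the `is_allowed` helper, and every address, port and reason below are hypothetical and chosen just for this example.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class AllowRule:
    src: str     # source network in CIDR notation
    dst: str     # destination network in CIDR notation
    port: int    # destination TCP port
    reason: str  # why this flow is needed (forces a documented justification)


# Hypothetical rule set: all addresses, ports and reasons are made up.
ALLOW_RULES = [
    AllowRule("10.10.5.0/28", "192.168.100.10/32", 443,
              "reporting servers read from the historian"),
    AllowRule("10.10.5.20/32", "192.168.100.0/24", 22,
              "jump host used for scheduled maintenance"),
]


def is_allowed(src_ip: str, dst_ip: str, port: int, rules=ALLOW_RULES) -> bool:
    """Default deny: a flow passes only if an explicit rule matches it."""
    src, dst = ip_address(src_ip), ip_address(dst_ip)
    return any(
        src in ip_network(rule.src)
        and dst in ip_network(rule.dst)
        and port == rule.port
        for rule in rules
    )


if __name__ == "__main__":
    # An explicitly listed flow is permitted ...
    print(is_allowed("10.10.5.3", "192.168.100.10", 443))
    # ... while anything not listed is dropped, even from the same source.
    print(is_allowed("10.10.5.3", "192.168.100.11", 443))
```

The design point is that the rule list enumerates what is permitted and everything else is rejected, so a forgotten rule fails closed rather than open.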

Physical Security Controls

Locked control rooms and equipment cabinets
Badge access systems and restricted access to industrial floors
Being able to 'lock away' devices and their networks helps compensate for protocols without integrity or confidentiality protections

These mitigations are not a "get out of jail free" card; they impose limitations as well as a maintenance burden. We would prefer to have secure devices (or even zero-trust enabled devices), but until then those mitigations are often the only way of keeping critical infrastructure operational.

The Risk Calculation Problem

Since risk is typically calculated as "Risk = Likelihood × Impact" and these mitigations would theoretically reduce the likelihood to zero, we should be able to close the case, right?

Sadly, NO. This is where the fragility of the approach becomes apparent.

When Mitigations Fail

The overall architecture of "protecting insecure devices by preventing attackers from accessing them" has a critical weakness: it breaks down completely if attackers find a way to bypass these protections.
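
A small worked example may help show why the likelihood never really reaches zero. The sketch below splits the likelihood from "Risk = Likelihood × Impact" into two factors: the chance that the mitigations are bypassed, and the chance that a device is exploited once it is reachable. This decomposition, the function name `residual_risk`, and all numbers are assumptions made for illustration; they are not measured values.

```python
def residual_risk(p_bypass: float, p_exploit_given_access: float, impact: float) -> float:
    """Risk = Likelihood x Impact, with the likelihood split into two factors.

    p_bypass:               chance that the perimeter mitigations are circumvented
    p_exploit_given_access: chance that a reachable device is then exploited
    impact:                 cost of a compromise, in arbitrary units
    """
    for p in (p_bypass, p_exploit_given_access):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be between 0 and 1")
    likelihood = p_bypass * p_exploit_given_access
    return likelihood * impact


if __name__ == "__main__":
    impact = 1_000_000  # illustrative only

    # The theoretical case: mitigations never fail, so risk is zero.
    print(residual_risk(0.0, 0.9, impact))

    # A small bypass chance combined with an easily exploited device
    # leaves a substantial residual risk.
    print(residual_risk(0.05, 0.9, impact))

    # Same perimeter, but a hardened device behind it: defense in depth
    # lowers the second factor and with it the residual risk.
    print(residual_risk(0.05, 0.1, impact))
```

With inherently insecure devices the second factor stays close to one, so the whole calculation rests on the bypass probability alone.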

What could go wrong with our mitigations? Consider these real-world scenarios:

The Insider Threat

Your system depends upon no one introducing malware into an internal network, but users attach infected USB sticks
Stuxnet case study: even air-gapped systems proved vulnerable. Stuxnet is widely reported to have reached its isolated target network through infected removable media rather than over a network connection.

Network Segmentation Failures

Missing network segmentation between factory floor and office networks leads to numerous incidents
Many reported "OT incidents" are actually ransomware attacks that originated in attached IT infrastructure and spread due to poor network boundaries

Physical Security Breaches

Problems with physical security allow unauthorized personnel to access factory floors and directly manipulate control systems
Unlike IT systems protected by multiple authentication layers, OT devices often assume anyone with physical access is authorized

Management Interface Vulnerabilities

IT professionals use baseboard management controllers (BMCs) for out-of-band administration
Where similar remote management capabilities exist in OT networks, they can provide attackers with direct pathways from IT to OT networks, bypassing traditional security controls

The Cascade Effect

If attackers successfully compromise the perimeter, we typically encounter recurring major problems:

Inherent Device Vulnerabilities

The insecure devices are, by design, easy to exploit
Communication between devices can be intercepted and altered due to lack of encryption or integrity protection
Default credentials and weak authentication are common
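
To illustrate what integrity protection would add, the following Python sketch attaches an HMAC tag to a message so that tampering in transit becomes detectable. This is a generic illustration using the standard library, not a description of how any particular OT protocol works; the command format, the `tag` and `verify` helpers, and the shared key are all made up.

```python
import hashlib
import hmac

# Hypothetical shared secret; a real deployment would need proper key management.
SHARED_KEY = b"demo-key-not-for-production"


def tag(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Compute an HMAC-SHA256 tag that the sender transmits with the message."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(message: bytes, received_tag: bytes, key: bytes = SHARED_KEY) -> bool:
    """Recompute the tag on the receiver side and compare in constant time."""
    return hmac.compare_digest(tag(message, key), received_tag)


if __name__ == "__main__":
    original = b"SET valve_7 position=20"  # made-up command format
    original_tag = tag(original)

    # An attacker on the wire changes the value but cannot forge the tag.
    tampered = b"SET valve_7 position=90"

    print(verify(original, original_tag))  # unmodified message is accepted
    print(verify(tampered, original_tag))  # altered message is rejected
```

Without such a tag, the receiver has no way to tell the two commands apart. Note that a tag like this covers integrity only: the message stays readable, and replaying an old valid message would still need separate protection such as a counter or timestamp.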

Large Blast Radius

Attackers can use compromised devices to further propagate and pivot into other 'secured' networks and devices
Trust relationships between systems assume all connected devices are legitimate
Network protocols designed for operational efficiency facilitate rapid compromise propagation

Operational Impact

Unlike IT systems, OT compromises can affect physical processes
Safety systems may be targeted or inadvertently affected
Production downtime has immediate financial and potentially safety implications

Fragility is a major concern, so considering downstream implications when mitigations fail is critical. Monitoring to detect compromise and defense in depth become essential additional layers.

Incident Response Challenges

Compared to well-maintained traditional IT systems, OT systems often display a severe lack of recoverability. Very often, compromised organizations are not able to effectively react to or recover from incidents.

This limitation stems from several factors:

Backup and Recovery Gaps

Missing backups/disaster recovery plans for critical control systems
Missing configuration backups for OT devices, making restoration difficult or impossible
Undefined processes for alert reporting/handling
Limited testing of recovery procedures due to operational constraints
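
One low-impact way to start closing the configuration-backup gap is to check regularly that the backups exist and still match a known-good state. The Python sketch below assumes that device configurations have already been exported as files into one directory; how to export them is vendor-specific and not covered here. The function names and file names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def build_manifest(backup_dir: Path) -> dict[str, str]:
    """Map each backup file name to its hash; store this as the known-good state."""
    return {p.name: sha256_of(p) for p in sorted(backup_dir.iterdir()) if p.is_file()}


def find_problems(backup_dir: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the directory against the known-good manifest."""
    current = build_manifest(backup_dir)
    return {
        # Backups that should exist but are gone.
        "missing": sorted(set(manifest) - set(current)),
        # Backups whose content differs from the recorded hash.
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        # Files nobody registered in the manifest.
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

A possible usage: call `build_manifest` once after a verified backup run, keep the result somewhere the OT network cannot modify, and run `find_problems` on a schedule. A check like this only shows that the backups are present and unchanged; whether a device can actually be restored from them still has to be tested separately.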

The "Reboot" Problem

Traditional IT incident response relies on the ability to "easily" reboot systems, reimage machines, and restore from known-good backups. This assumption breaks down in OT environments.

You can reboot your computer or server, but rebooting a whole factory or connected power station requires:

Coordinated shutdown procedures
Safety system verification
Gradual restart processes
Extensive testing before returning to full operation

Example: A chemical plant cannot simply "restart" after a security incident. Each system must be carefully verified, safety interlocks tested, and processes gradually brought back online to prevent equipment damage or safety hazards.

The Path Forward

We hope this introduction explains why we often have to make compromises to achieve meaningful security improvements in OT environments. In legacy settings, this mostly means depending upon mitigations such as network segmentation and physical access management.

While newer systems will eventually alleviate many of these problems, we still have to deal with legacy systems to keep existing critical infrastructure running. Replacing all of them at once remains neither economically feasible nor operationally safe.

The Double Meaning of Compromise: Compromise has another meaning in this context. Often we over-rely on mitigations, and when those are not upheld, our systems become compromised. If no additional defense-in-depth measures are in place, the potential fallout is immense, because further vulnerable systems often sit within the blast radius. This makes additional hardening even more important than in traditional IT systems.

Key Takeaway for IT Security Professionals: Understanding these unique constraints is the first step toward contributing meaningfully to OT security solutions that work within operational reality rather than against it.

Getting critical infrastructure secure will be quite a journey. But that is no excuse for not starting this journey right now.

The path to OT security requires bridging two worlds - the methodical, safety-focused culture of operations and the adaptive, threat-focused mindset of cybersecurity.