Attack Emulation

Emulation of APT attacks on ICS infrastructure

Hunting Invisible Threats: Building Better Datasets for Industrial Cybersecurity

The Problem: When Critical Infrastructure Gets Hacked

Critical national infrastructure (CNI), such as power grids, water treatment plants, oil refineries, chemical facilities, and transportation systems, forms the backbone of modern society. These systems keep the lights on, ensure clean water flows from our taps, and maintain the supply chains that deliver everything from food to medicine. Governments recognize their importance: in most countries, CNI sectors are designated as strategic assets requiring special protection, because their disruption would have cascading effects on public safety, economic stability, and national security.

Yet these systems face a growing threat from sophisticated cyberattacks. Unlike traditional cybercrime targeting financial data or personal information, attacks on industrial infrastructure can cause physical damage and endanger lives. The 2015 Ukraine power grid attack left more than 200,000 people without electricity for several hours in winter. The 2021 Colonial Pipeline ransomware incident triggered fuel shortages across the US East Coast. More concerning still are attacks like Stuxnet, which physically destroyed Iranian nuclear centrifuges by tampering with industrial controllers, and TRITON, which targeted the safety systems at a Saudi petrochemical plant: the very systems designed to prevent catastrophic explosions.

These weren't conventional hacks. They were cyberattacks that crossed from the digital world into the physical one, manipulating the industrial control systems (ICS) that directly operate critical equipment. For governments and operators of critical infrastructure, this represents a fundamental security challenge: traditional cybersecurity approaches designed for IT networks don't translate well to industrial environments where availability and safety take precedence over everything else, and where operational technology has been running for decades, often without modern security controls.

The challenge? We're fighting these sophisticated threats with inadequate tools. Most cybersecurity research focuses on traditional IT networks—your office computers, servers, and databases. But industrial environments operate differently. They have three distinct layers: enterprise IT networks, operational technology (OT) that monitors industrial processes, and programmable logic controllers (PLCs)—specialized computers that directly control physical equipment like pumps, valves, and motors.

When attackers target industrial facilities, they move through all three layers. They might start with a phishing email to an office worker (IT), pivot to an engineering workstation (OT), and ultimately reprogram a PLC to cause physical damage. Understanding this complete attack chain is crucial, but here's the catch: we don't have good datasets that capture how attacks flow across these boundaries.

The Research: Creating Attack Scenarios That Tell the Whole Story

This thesis tackles a specific gap: generating high-fidelity datasets that show the complete picture of how industrial cyberattacks unfold from start to finish.

What Makes This Different?

Most existing research falls into two camps:

  • IT-focused datasets that track attack progression through enterprise networks but stop before reaching industrial systems
  • ICS-specific research that looks at individual attack techniques but lacks the complete chain of events

This project bridges that gap by simulating realistic multi-stage attacks in a controlled test-bed environment that spans all three layers—IT, OT, and PLCs.

The Technical Approach

The methodology involves three key components:

1. Realistic Test-bed Architecture
Building a miniature industrial facility using the Purdue Model, an industry-standard reference architecture that segments industrial networks into hierarchical levels, from enterprise IT down to the physical process. This includes virtual machines for enterprise systems and engineering workstations, connected to actual Siemens PLCs and industrial equipment. Think of it as a scaled-down factory where attacks can be safely executed and studied.
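The following minimal Python sketch makes the segmentation idea concrete. It assumes a simplified Purdue layout in which Levels 0 to 3 are OT, Level 3.5 is the industrial DMZ (IDMZ), and Levels 4 and above are enterprise IT. The host names, and the single rule it checks, are illustrative assumptions rather than the actual test-bed configuration. That rule is that traffic between enterprise IT and OT should terminate in the IDMZ instead of passing directly between them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Host:
    name: str
    level: float  # simplified Purdue level: 0-3 OT, 3.5 IDMZ, 4+ enterprise IT

def is_enterprise(h: Host) -> bool:
    return h.level >= 4

def is_ot(h: Host) -> bool:
    return h.level <= 3

def bypasses_dmz(src: Host, dst: Host) -> bool:
    """True if a flow links enterprise IT and OT directly instead of terminating in the IDMZ."""
    return (is_enterprise(src) and is_ot(dst)) or (is_ot(src) and is_enterprise(dst))

def violations(flows):
    """Return (source, destination) names for every flow that skips the IDMZ."""
    return [(s.name, d.name) for s, d in flows if bypasses_dmz(s, d)]

# Hypothetical test-bed hosts (names are illustrative only).
office_pc = Host("office-pc", 4)
jump_host = Host("idmz-jump", 3.5)
eng_ws = Host("eng-workstation", 2)
plc = Host("s7-plc", 1)

flows = [
    (office_pc, jump_host),  # IT -> IDMZ: allowed
    (jump_host, eng_ws),     # IDMZ -> OT: allowed
    (eng_ws, plc),           # within OT: allowed
    (office_pc, eng_ws),     # IT -> OT directly: bypasses the IDMZ
]

print(violations(flows))
```

The direct flow from an office PC to the engineering workstation is exactly the kind of IT-to-OT shortcut an attacker would look for. That is why it is the only flow flagged.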

2. Automated Attack Simulation
Using CALDERA, an adversary emulation framework developed by MITRE, to systematically execute attack techniques based on real-world malware such as Stuxnet, Industroyer, and TRITON. Rather than relying on manually run attacks, the test-bed uses CALDERA to automate the execution of standardized tactics, techniques, and procedures (TTPs), so each run can be reproduced.
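The sketch below gives a rough illustration of why automated emulation helps reproducibility. It models an emulation plan as an ordered list of steps, loosely analogous to the way a CALDERA adversary profile orders its abilities. It does not use CALDERA's API. The `Step` type, the ATT&CK-style technique names chosen, and the layer tags are all hypothetical. The code reports which IT, OT, and PLC boundaries the plan crosses and computes a stable fingerprint, so two runs can be shown to have executed identical steps.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class Step:
    tactic: str     # ATT&CK-style tactic label (illustrative)
    technique: str  # ATT&CK-style technique name (illustrative)
    layer: str      # "IT", "OT" or "PLC": where the step executes

def boundary_crossings(plan):
    """List the layer transitions (e.g. "IT->OT") in the order the plan makes them."""
    return [f"{a.layer}->{b.layer}" for a, b in zip(plan, plan[1:]) if a.layer != b.layer]

def fingerprint(plan) -> str:
    """SHA-256 over a canonical JSON encoding: identical plans give identical hashes."""
    payload = json.dumps([asdict(s) for s in plan], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Hypothetical plan spanning all three layers.
plan = [
    Step("initial-access", "Spearphishing Attachment", "IT"),
    Step("lateral-movement", "Remote Services", "OT"),
    Step("discovery", "Remote System Discovery", "OT"),
    Step("impair-process-control", "Modify Program", "PLC"),
]

print(boundary_crossings(plan))  # ['IT->OT', 'OT->PLC']
print(fingerprint(plan)[:12])    # short prefix of the plan hash, recorded with each run
```

Storing the fingerprint alongside each captured dataset is one simple way to tie a dataset back to the exact sequence of steps that produced it.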

3. Comprehensive Data Collection
This is where provenance data becomes critical. Provenance tracks the relationships between system events—which process created which file, which network connection led to which code execution. By collecting provenance data at the system level (using tools like Sysmon), network level (packet captures), and application level (PLC communications), the research captures not just what happened, but how events connected across the entire attack chain.
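The following minimal sketch shows how such provenance data can be stitched together. It assumes Sysmon-style records of three kinds:

- Event ID 1 (process creation)
- Event ID 3 (network connection)
- Event ID 11 (file creation)

It reads Sysmon's `ProcessGuid`, `ParentProcessGuid`, `Image`, `TargetFilename`, `DestinationIp`, and `DestinationPort` fields. The event values are invented for illustration, and a real pipeline would also need to join network captures and PLC logs. Walking the graph backwards from a connection to the PLC recovers the chain of processes and files that led to it.

```python
from collections import defaultdict

def build_provenance(events):
    """Map each node to the set of nodes that caused it (edges stored child -> causes)."""
    causes = defaultdict(set)
    for e in events:
        if e["EventID"] == 1:     # ProcessCreate
            child = ("proc", e["ProcessGuid"])
            causes[child].add(("proc", e["ParentProcessGuid"]))  # spawned by parent
            causes[child].add(("file", e["Image"]))              # executed from this file
        elif e["EventID"] == 3:   # NetworkConnect
            dst = ("net", f'{e["DestinationIp"]}:{e["DestinationPort"]}')
            causes[dst].add(("proc", e["ProcessGuid"]))
        elif e["EventID"] == 11:  # FileCreate
            causes[("file", e["TargetFilename"])].add(("proc", e["ProcessGuid"]))
    return causes

def backtrack(causes, start):
    """Return every node that causally precedes `start` (depth-first walk over causes)."""
    seen, stack = set(), [start]
    while stack:
        for cause in causes.get(stack.pop(), ()):
            if cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return seen

# Hypothetical trace: a document process drops a payload that then talks to a PLC.
events = [
    {"EventID": 1, "ProcessGuid": "P2", "ParentProcessGuid": "P1", "Image": r"C:\Office\WINWORD.EXE"},
    {"EventID": 11, "ProcessGuid": "P2", "TargetFilename": r"C:\Users\eng\dropper.exe"},
    {"EventID": 1, "ProcessGuid": "P3", "ParentProcessGuid": "P2", "Image": r"C:\Users\eng\dropper.exe"},
    {"EventID": 3, "ProcessGuid": "P3", "DestinationIp": "10.0.1.10", "DestinationPort": 102},
]

graph = build_provenance(events)
for node in sorted(backtrack(graph, ("net", "10.0.1.10:102"))):
    print(node)
```

Port 102 is the ISO-TSAP port used by Siemens S7 communication. In this toy trace, the backtrack therefore links a PLC connection all the way back to the document process that dropped the payload, and to whatever process launched it.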

Why This Matters

For Researchers: The datasets generated become benchmarks for testing new detection systems. Think of them like standardized test sets in machine learning—if you've developed a new intrusion detection algorithm, you need quality datasets to evaluate whether it actually works.

For Industry: Understanding complete attack patterns helps organizations better defend critical infrastructure. When you can see how attackers traverse from IT to OT to PLCs, you can design better network segmentation, monitoring strategies, and response procedures.

For Detection Systems: Many intrusion detection systems struggle with industrial environments because they generate false alarms or miss sophisticated multi-stage attacks. Provenance-based approaches that understand relationships between events (not just individual anomalies) show promise, but they need quality training data—which this research aims to provide.
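The Python sketch below shows one way a benchmark dataset gets used. It assumes that each event carries a ground-truth label and that the detector emits one alert flag per event. Standard precision, recall, and F1 capture false alarms and misses. A per-layer coverage score (a hypothetical metric here, not one defined by this project) shows whether a multi-stage attack was caught at every layer it crossed.

```python
def evaluate(labels, alerts):
    """Per-event confusion counts -> precision, recall, F1 and raw false-alarm count."""
    tp = sum(1 for y, a in zip(labels, alerts) if y and a)
    fp = sum(1 for y, a in zip(labels, alerts) if not y and a)
    fn = sum(1 for y, a in zip(labels, alerts) if y and not a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "false_alarms": fp}

def layer_coverage(layers, labels, alerts):
    """Fraction of attacked layers (IT/OT/PLC) where at least one malicious event was flagged."""
    attacked = {layer for layer, y in zip(layers, labels) if y}
    caught = {layer for layer, y, a in zip(layers, labels, alerts) if y and a}
    return len(caught) / len(attacked) if attacked else 0.0

# Toy data: one entry per event.
layers = ["IT", "IT", "OT", "OT", "PLC", "PLC"]
labels = [False, True, True, False, True, False]  # ground truth from the dataset
alerts = [True, True, False, False, True, False]  # detector output

print(evaluate(labels, alerts))
print(layer_coverage(layers, labels, alerts))
```

Here the detector scores 2/3 on precision and recall. The coverage score makes the more useful point: the OT stage of the attack went unnoticed.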

The Path Forward

The experimental phase involves executing two to three distinct attack scenarios over 10 weeks, each representing a different attack methodology:

  • Protocol manipulation attacks (mimicking Industroyer's approach to power grid systems)
  • Direct PLC logic manipulation (inspired by Stuxnet and TRITON)
  • Reconnaissance and data exfiltration campaigns (similar to Havex/Dragonfly operations)

Each scenario generates datasets that contrast normal operations with what happens during an attack. These datasets can then be fed into provenance models that learn to detect anomalous patterns spanning the IT, OT, and PLC boundaries.
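A common first-pass way to label such data is by attack time windows, sketched below. It assumes that each event has a timestamp and that the start and end of each attack scenario were recorded. The events and times are invented for illustration.

```python
from datetime import datetime

def label_by_window(events, attack_windows):
    """Label each (timestamp, description) event 'attack' if inside any window, else 'normal'."""
    return [
        (ts, desc, "attack" if any(start <= ts <= end for start, end in attack_windows) else "normal")
        for ts, desc in events
    ]

# Hypothetical recorded start/end of one attack scenario.
windows = [(datetime(2025, 3, 4, 10, 0), datetime(2025, 3, 4, 10, 30))]

events = [
    (datetime(2025, 3, 4, 9, 45), "HMI polls PLC"),
    (datetime(2025, 3, 4, 10, 12), "S7 program download from engineering workstation"),
    (datetime(2025, 3, 4, 10, 20), "historian scheduled backup"),
    (datetime(2025, 3, 4, 11, 5), "HMI polls PLC"),
]

for ts, desc, label in label_by_window(events, windows):
    print(ts.isoformat(), label, desc)
```

Window-based labels are coarse. The scheduled historian backup above is benign but is marked as an attack simply because it falls inside the window. Provenance helps here, because only events causally linked to the attacker's actions need to carry the attack label.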

Conclusion

Industrial cybersecurity faces a fundamental challenge: attacks that cross between IT networks, operational technology, and physical control systems require datasets that capture this entire journey. By systematically simulating realistic attack scenarios in a controlled testbed and collecting comprehensive provenance data, this research contributes building blocks for the next generation of industrial intrusion detection systems.

The goal isn't just academic—it's about creating practical tools and knowledge that help defend the critical infrastructure we all depend on.


This research is being conducted at the University of Glasgow's Cyber Defence Lab under the supervision of Dr. Marco Cook.
