DEF CON 2025

Contextualizing and Correlating Security Events at Scale

DEFCONConference, 650 views, 63:45, 6 months ago

This talk demonstrates a methodology for reducing false positives and improving incident response by grouping and deduplicating security logs using contextual enrichment and causal analysis. The speaker presents an approach to mapping disparate security events to the MITRE ATT&CK framework to form coherent attack stories rather than relying on isolated, noisy alerts. The method emphasizes using semi-supervised machine learning and lookup tables to provide business context, which significantly reduces data volume and improves investigation efficiency. The speaker also releases an open-source tool, the Attack Flow Detector, to implement these concepts.

Stop Chasing Ghosts: How to Build Causal Attack Stories from Noisy Logs

TLDR: Security teams are drowning in high-volume, low-fidelity alerts that lack context, leading to massive alert fatigue and missed compromises. By shifting from atomic alert detection to causal grouping and deduplication, you can reduce data volume by over 90% while simultaneously increasing investigation speed. This post breaks down how to use contextual enrichment and sequence modeling to turn fragmented logs into actionable attack stories.

Security operations centers are failing because they are fighting the wrong war. Most teams spend their time tuning atomic detection rules for individual techniques, like T1566 (Phishing) or T1059 (Command and Scripting Interpreter), only to be buried under a mountain of false positives. When an actual breach occurs, the signal is lost in the noise of thousands of daily events. The industry standard has become a race to ingest more data, paying massive premiums to vendors for storage and compute, while the actual ability to detect a multi-stage attack remains stagnant.

The Problem with Atomic Detection

Detection engineering is often treated as a search for a specific needle in a haystack. You write a rule, it triggers, and you investigate. But modern attackers do not operate in a vacuum. They move laterally, escalate privileges, and establish persistence. Each of these steps generates a different log entry in a different system. If your SIEM treats these as isolated incidents, you are not seeing an attack; you are seeing a series of disconnected, noisy events.

The real issue is that most detection logic is built on the assumption that an event is either "good" or "bad." In reality, most events are context-dependent. A PowerShell execution is benign in a dev environment but critical on a domain controller. Without business context (who the user is, what the asset is, and how the two relate), you are just guessing.
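As a minimal sketch of context-dependent scoring, the Python below rates the same PowerShell action differently depending on the asset and user involved. The host names, user names, roles, and weights are all hypothetical values chosen for illustration; they do not come from the talk or from the Attack Flow Detector.

```python
from dataclasses import dataclass

# Hypothetical context tables; in practice these might be exported
# from a CMDB or an identity provider.
ASSET_ROLE = {
    "dev-laptop-07": "developer_workstation",
    "dc-01": "domain_controller",
}
USER_ROLE = {
    "alice": "developer",
    "svc-backup": "service_account",
}

# Illustrative base weights per asset role, not tuned values.
ROLE_WEIGHT = {
    "developer_workstation": 1,
    "domain_controller": 5,
}


@dataclass(frozen=True)
class Event:
    host: str
    user: str
    action: str


def severity(event: Event) -> int:
    """Score an event by where it ran and who ran it, not by the action alone."""
    asset_role = ASSET_ROLE.get(event.host, "unknown")
    user_role = USER_ROLE.get(event.user, "unknown")
    score = ROLE_WEIGHT.get(asset_role, 3)  # unknown assets get a middle score

    if event.action == "powershell_exec":
        # A developer scripting on a developer workstation is expected.
        if asset_role == "developer_workstation" and user_role == "developer":
            score -= 1
        # Scripting on a domain controller deserves a closer look.
        if asset_role == "domain_controller":
            score += 3
    return max(score, 0)


dev_event = Event("dev-laptop-07", "alice", "powershell_exec")
dc_event = Event("dc-01", "svc-backup", "powershell_exec")
print(severity(dev_event), severity(dc_event))
```

The action string is identical in both events; only the looked-up context changes the score.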

Moving to Causal Grouping

Instead of trying to find the perfect rule, you should focus on grouping related events. The goal is to move from "this alert fired" to "this user performed these five actions in this specific sequence." This is where causal analysis comes in.

By using MITRE ATT&CK as a common language, you can map disparate events to specific tactics. When a group of events shares a common entity, such as a source IP, a username, or a process ID, and the events occur within a logical temporal window, you can collapse them into a single "story."
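One simple way to read "shared entity within a temporal window" is sessionization: bucket events by entity, sort by time, and start a new story whenever the gap between consecutive events exceeds a threshold. The sketch below assumes a 30-minute gap and a hypothetical event shape (`ts`, `user`, `tactic`); it is a stand-in for the idea, not the clustering the Attack Flow Detector actually uses.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)  # assumed gap threshold, purely illustrative

events = [
    {"ts": datetime(2025, 1, 1, 9, 0), "user": "alice", "tactic": "initial-access"},
    {"ts": datetime(2025, 1, 1, 9, 5), "user": "alice", "tactic": "execution"},
    {"ts": datetime(2025, 1, 1, 9, 7), "user": "bob", "tactic": "discovery"},
    {"ts": datetime(2025, 1, 1, 14, 0), "user": "alice", "tactic": "discovery"},
]


def group_into_stories(events, key="user", window=WINDOW):
    """Collapse events that share an entity into stories, split on time gaps."""
    by_entity = defaultdict(list)
    for ev in events:
        by_entity[ev[key]].append(ev)

    stories = []
    for entity, evs in by_entity.items():
        evs.sort(key=lambda e: e["ts"])
        current = [evs[0]]
        for ev in evs[1:]:
            if ev["ts"] - current[-1]["ts"] <= window:
                current.append(ev)  # close enough in time: same story
            else:
                stories.append({"entity": entity, "events": current})
                current = [ev]      # gap too large: start a new story
        stories.append({"entity": entity, "events": current})
    return stories


stories = group_into_stories(events)
for story in stories:
    print(story["entity"], [e["tactic"] for e in story["events"]])
```

Here the two morning events for `alice` collapse into one story, while her afternoon event and the single event for `bob` each stand alone.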

This is not just about cleaning up the dashboard. It is about cost. Many SIEMs charge based on ingestion volume, so deduplicating redundant logs at the edge, before they hit your expensive storage, directly cuts the bill. The Attack Flow Detector is a tool designed to do exactly this. It takes raw, noisy data and uses semi-supervised clustering to identify sequences that represent actual adversary behavior.
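To make edge deduplication concrete, here is a rough sketch that fingerprints each event on a few identifying fields and keeps one record per fingerprint with a count and first/last-seen times. The field names and sample data are assumptions made for illustration, and plain hashing is used in place of the semi-supervised clustering described above.

```python
import hashlib
import json

# Fields assumed to define "the same event"; timestamps are deliberately excluded.
FINGERPRINT_FIELDS = ("host", "user", "technique", "process")


def fingerprint(event: dict) -> str:
    """Hash the identifying fields so repeated events share one key."""
    material = json.dumps([event.get(f) for f in FINGERPRINT_FIELDS])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


def deduplicate(events):
    """Keep one record per fingerprint with a count and first/last seen times."""
    merged = {}
    for ev in events:
        fp = fingerprint(ev)
        if fp not in merged:
            rec = {k: v for k, v in ev.items() if k != "ts"}
            rec.update(count=1, first_seen=ev["ts"], last_seen=ev["ts"])
            merged[fp] = rec
        else:
            rec = merged[fp]
            rec["count"] += 1
            rec["first_seen"] = min(rec["first_seen"], ev["ts"])
            rec["last_seen"] = max(rec["last_seen"], ev["ts"])
    return list(merged.values())


# Synthetic sample: 50 repeats of one event plus a single distinct event.
raw = [
    {"ts": i, "host": "ws-12", "user": "alice", "technique": "T1059", "process": "powershell.exe"}
    for i in range(50)
] + [{"ts": 99, "host": "dc-01", "user": "alice", "technique": "T1021", "process": "mstsc.exe"}]

compact = deduplicate(raw)
reduction = 1 - len(compact) / len(raw)
print(f"{len(raw)} events -> {len(compact)} records ({reduction:.0%} fewer)")
```

The reduction on this synthetic sample says nothing about real-world ratios; it only shows that the count and time range survive even though the duplicates are dropped before ingestion.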

Technical Implementation at Scale

To implement this, you need to stop relying on simple regex-based rules. You need a data pipeline that performs three distinct steps:

Enrichment: Every event must be tagged with its corresponding ATT&CK technique. If you don't know what a log entry represents in the context of an attack, it is useless.
Contextual Grouping: Use a clustering algorithm to find events that are highly related. Do not just look for the same IP; look for the same user context across different log sources.
Causal Chaining: This is the most critical step. You need to model the attack lifecycle. If an event maps to "Initial Access" and is followed, for the same entity and within a plausible time window, by an event mapping to "Execution," you have a candidate causal link.
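The enrichment and chaining steps above can be sketched in a few lines of Python. The lookup table, log source names, and event names are hypothetical, and using the ATT&CK Enterprise matrix column order as a proxy for causality is a simplifying assumption: real intrusions do not always progress left to right.

```python
# Tactic order follows the ATT&CK Enterprise matrix columns
# (Reconnaissance and Resource Development omitted for brevity).
TACTIC_ORDER = [
    "initial-access", "execution", "persistence", "privilege-escalation",
    "defense-evasion", "credential-access", "discovery", "lateral-movement",
    "collection", "command-and-control", "exfiltration", "impact",
]
TACTIC_RANK = {tactic: i for i, tactic in enumerate(TACTIC_ORDER)}

# Hypothetical mapping from (log source, event name) to (technique, tactic).
TECHNIQUE_LOOKUP = {
    ("mail_gateway", "malicious_attachment"): ("T1566", "initial-access"),
    ("edr", "powershell_exec"): ("T1059", "execution"),
    ("windows_security", "rdp_logon"): ("T1021", "lateral-movement"),
}


def enrich(event):
    """Tag an event with its ATT&CK technique and tactic, or None if unmapped."""
    technique, tactic = TECHNIQUE_LOOKUP.get(
        (event["source"], event["name"]), (None, None)
    )
    return {**event, "technique": technique, "tactic": tactic}


def causal_links(events):
    """Link consecutive events for the same user when the tactic moves forward."""
    tagged = sorted(
        (e for e in map(enrich, events) if e["tactic"]), key=lambda e: e["ts"]
    )
    links = []
    last_by_user = {}
    for ev in tagged:
        prev = last_by_user.get(ev["user"])
        if prev and TACTIC_RANK[ev["tactic"]] > TACTIC_RANK[prev["tactic"]]:
            links.append((prev["technique"], ev["technique"]))
        last_by_user[ev["user"]] = ev
    return links


logs = [
    {"ts": 1, "source": "mail_gateway", "name": "malicious_attachment", "user": "alice"},
    {"ts": 2, "source": "edr", "name": "powershell_exec", "user": "alice"},
    {"ts": 3, "source": "edr", "name": "powershell_exec", "user": "bob"},
    {"ts": 4, "source": "windows_security", "name": "rdp_logon", "user": "alice"},
    {"ts": 5, "source": "dns", "name": "query", "user": "alice"},  # unmapped, dropped
]

links = causal_links(logs)
print(links)
```

Only the events for `alice` form a chain, because the single `bob` event has no predecessor and the unmapped DNS query is discarded during enrichment.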

The Attack Flow Detector uses this logic to build a graph of events. In the demo, thousands of alerts are reduced to a handful of coherent attack flows. The tool uses a lookup table approach to map entities, such as mapping a specific machine ID to a business unit, which allows the model to understand that a lateral movement attempt from a workstation to a server is more suspicious than a standard admin login.
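A lookup-table approach of this kind might look roughly like the sketch below, which resolves machine IDs to a business unit and asset class and then weights each logon edge by the classes it connects. The machine IDs, classes, and weights are invented for illustration and are not taken from the Attack Flow Detector's code or data.

```python
from collections import defaultdict

# Hypothetical lookup table: machine ID -> (business unit, asset class).
ASSET_LOOKUP = {
    "ws-114": ("finance", "workstation"),
    "srv-db-02": ("finance", "server"),
    "adm-jump-01": ("it-ops", "admin_host"),
}

# Illustrative edge weights keyed by (source class, destination class).
EDGE_WEIGHT = {
    ("workstation", "server"): 5,  # unusual path, likely lateral movement
    ("admin_host", "server"): 1,   # routine administration
}
DEFAULT_WEIGHT = 3                 # applied when either side is unknown


def asset_class(machine_id: str) -> str:
    """Resolve a machine ID to its asset class via the lookup table."""
    return ASSET_LOOKUP.get(machine_id, ("unknown", "unknown"))[1]


def build_graph(logons):
    """Build an adjacency list of logon edges, each weighted by business context."""
    graph = defaultdict(list)
    for src, dst in logons:
        weight = EDGE_WEIGHT.get((asset_class(src), asset_class(dst)), DEFAULT_WEIGHT)
        graph[src].append((dst, weight))
    return graph


logons = [("ws-114", "srv-db-02"), ("adm-jump-01", "srv-db-02")]
graph = build_graph(logons)
for src, edges in graph.items():
    print(src, edges)
```

Both logons target the same server, but the workstation-originated edge carries the higher weight, which is the distinction the lookup table is meant to provide.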

Why Pentesters Should Care

If you are a pentester or a bug bounty hunter, you need to understand how your activity is being aggregated. When you run a red team engagement, you are generating a trail of breadcrumbs. If you are testing against a mature blue team, they are likely using some form of this causal grouping. They aren't looking for your specific exploit payload; they are looking for the "story" of your movement through the network.

If you want to stay under the radar, you need to understand the "cost" of your actions. Every time you touch a new system, you are adding a new node to their attack graph. The more you can blend your activity into the "normal" noise of the environment, the harder it is for their clustering algorithms to group your actions into a single, high-fidelity alert.

The Future of Detection

We need to stop paying for the privilege of storing garbage. The industry is obsessed with "robust security postures," but we are ignoring the fact that our data pipelines are inefficient. By implementing causal grouping, you stop treating every log as a precious snowflake and start treating them as components of a larger narrative.

Stop writing more rules. Start building better stories. If you can explain the "why" behind an alert, you have already won half the battle. The next time you are in a SOC meeting and someone suggests adding another detection rule, ask them how that rule fits into the causal chain of an existing attack story. If it doesn't, it’s just more noise.

Talk Type

tool demo

Difficulty

intermediate