Security BSides 2025

The Bitter Lesson for SOCs: Let Machines Be Machines

Security BSides San Francisco · 3,188 views · 44:52 · 10 months ago

This talk demonstrates a methodology for building AI-assisted detection and response systems by leveraging LLMs as co-creators rather than black-box solutions. It focuses on integrating LLMs into existing security operations center (SOC) workflows to automate alert triage, investigation, and report generation. The speakers highlight the importance of using open-standard detection rules like Sigma and maintaining full transparency in the AI's reasoning process through investigation transcripts. The presentation showcases a prototype tool, CLUE (Claude Links Useful Evidence), which automates the investigation of security alerts.

Beyond the Hype: Building AI-Assisted Detection with CLUE

TL;DR: Security teams are drowning in alerts, and traditional SOAR playbooks often fail to handle novel, complex attack patterns. This research introduces CLUE, a framework that treats LLMs as collaborative investigators rather than black-box decision engines. By integrating LLMs with external tools and maintaining full transparency through investigation transcripts, teams can automate triage while keeping human analysts in the loop.

Security operations centers have spent the last decade chasing the dream of "lights-out" automation. We bought the SOAR platforms, we wrote the playbooks, and we spent thousands of hours debugging regex-based alerts that break the moment an attacker changes a single parameter. The reality is that most automated detection systems are brittle. They work for the known-knowns, but they crumble when faced with the creative, non-linear movements of a real adversary.

The industry is currently obsessed with shoving LLMs into every possible security workflow, usually by treating them as magic boxes that ingest logs and output a "malicious" or "benign" verdict. This is a mistake. If you don't know how your model reached a conclusion, you aren't doing security; you’re just guessing with more expensive infrastructure.

The Problem with Black-Box ML

Vendors love to sell "AI-powered" detection as a finished product. You feed it data, it gives you a score, and you hope it’s right. The problem is that your environment is unique. A model trained on generic telemetry will inevitably fail when it encounters your specific naming conventions, your custom cloud architecture, or your unique user behaviors. When these models fall over, you’re left with no visibility into their reasoning, making it impossible to tune them effectively.

Instead of trying to encode every possible human investigative step into a rigid, static playbook, we need to shift toward a model where the AI acts as a co-creator. This means giving the model the ability to interact with the outside world through tools, allowing it to perform the same actions a human analyst would take during a manual investigation.

CLUE: A Framework for Collaborative Investigation

The CLUE (Claude Links Useful Evidence) prototype demonstrates this shift. Rather than asking an LLM to simply classify an alert, the system provides the model with a set of tools—such as the ability to query a data lake, check GitHub repositories, or pull logs from Slack—and asks it to build an investigative plan.
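Tool-enabled investigation starts with declaring what the model is allowed to do. The sketch below shows what such tool definitions might look like in the JSON-schema style that most chat-completion APIs accept; the tool names (`query_data_lake`, `search_github`) and the dispatch stubs are illustrative assumptions, not CLUE's actual interface.

```python
# Hypothetical tool definitions for an LLM-driven investigator, in the
# JSON-schema style common to chat-completion tool-use APIs. Names and
# backends here are placeholders, not CLUE's real implementation.
TOOLS = [
    {
        "name": "query_data_lake",
        "description": "Run a read-only query against the security data lake.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "search_github",
        "description": "Search org repositories for a commit, file, or user.",
        "input_schema": {
            "type": "object",
            "properties": {"term": {"type": "string"}},
            "required": ["term"],
        },
    },
]

def dispatch(tool_name: str, args: dict) -> str:
    """Route a model-requested tool call to a backend (stubbed out here)."""
    handlers = {
        "query_data_lake": lambda a: f"0 rows for: {a['query']}",
        "search_github": lambda a: f"no matches for: {a['term']}",
    }
    return handlers[tool_name](args)
```

Keeping the tool layer as plain, auditable definitions like these is what lets you swap the underlying model later without rewriting your connectors.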

The mechanical process is straightforward:

  1. Detection: An alert triggers, and the raw data is passed to the model.
  2. Planning: The model proposes an investigative path based on the alert type and available tools.
  3. Execution: The model calls specific functions to gather evidence.
  4. Synthesis: The model compiles the findings into a structured triage report.
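The four steps above can be sketched as a single loop. This is a minimal illustration under assumed interfaces (an `llm` object with `plan` and `synthesize` methods, and a `dispatch` callable like the tool router described earlier), not CLUE's actual control flow.

```python
def investigate(alert, llm, dispatch, max_steps=10):
    """Detection -> Planning -> Execution -> Synthesis, with a full transcript.

    `llm` and `dispatch` are assumed interfaces: llm.plan() returns a list of
    tool-call steps, llm.synthesize() turns the transcript into a report.
    """
    transcript = [{"role": "alert", "content": alert}]

    # Planning: the model proposes an investigative path for this alert type.
    plan = llm.plan(alert)
    transcript.append({"role": "plan", "content": plan})

    # Execution: model-chosen tool calls, bounded so a loop can't run forever.
    for step in plan[:max_steps]:
        evidence = dispatch(step["tool"], step["args"])
        transcript.append(
            {"role": "evidence", "tool": step["tool"], "content": evidence}
        )

    # Synthesis: compile the findings into a structured triage report.
    report = llm.synthesize(transcript)
    transcript.append({"role": "report", "content": report})
    return report, transcript
```

Returning the transcript alongside the report is the design choice that matters: every query, result, and inference is preserved for the human reviewer.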

Crucially, the system maintains a full investigation transcript. This is the most important part of the architecture. If the model makes a mistake, you don't just see a wrong verdict; you see the exact query it ran, the data it received, and the logic it used to interpret that data. This transparency allows for rapid iteration. If the model is hallucinating or misinterpreting a specific log format, you can adjust the prompt or the tool definition, just as you would debug a script.

Practical Implementation and Tooling

For those looking to build similar systems, the focus should be on using open standards. Relying on proprietary, vendor-locked formats is a trap. Using Sigma for detection rules ensures that your logic remains portable and auditable. When it comes to the LLM’s ability to interact with your environment, the Model Context Protocol is the current gold standard for connecting models to data sources like Ghidra or internal databases.
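To make the portability argument concrete, here is a Sigma-style rule encoded as plain data with a tiny hand-rolled matcher. This is a toy that covers only a single-selection subset of Sigma semantics; real deployments would compile Sigma YAML into backend queries with the official toolchain rather than match events in Python.

```python
# A Sigma-style process_creation rule as plain data. Illustrative only --
# the point is that the detection logic is portable and auditable, not
# locked inside a vendor's proprietary format.
RULE = {
    "title": "Suspicious use of whoami",
    "logsource": {"category": "process_creation", "product": "windows"},
    "detection": {
        "selection": {"Image|endswith": "\\whoami.exe"},
        "condition": "selection",
    },
}

def matches(rule: dict, event: dict) -> bool:
    """Evaluate the single-selection subset of Sigma matching semantics."""
    for field, expected in rule["detection"]["selection"].items():
        name, _, modifier = field.partition("|")  # e.g. "Image|endswith"
        value = str(event.get(name, ""))
        if modifier == "endswith":
            if not value.endswith(expected):
                return False
        elif value != expected:
            return False
    return True
```

Because the rule is data rather than code, the same logic can be reviewed by an analyst, fed to a model as context, or translated to any SIEM backend.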

The goal isn't to replace the analyst. It’s to handle the "noise" of low-confidence signals that would otherwise be ignored. In a typical environment, non-firing detections—those that don't quite meet the threshold for a high-severity alert—can outnumber actual alerts by a factor of 100 to 1. An AI-assisted system can batch these signals, perform meta-analysis, and present a summary to a human, effectively turning a massive pile of logs into a single, actionable insight.
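The batching step described above can be sketched simply: group non-firing signals by rule and entity, and only surface clusters large enough to justify one model-assisted (or human) look. The field names and threshold are assumptions for illustration.

```python
from collections import defaultdict

def batch_signals(signals, min_count=5):
    """Group low-confidence, non-firing signals by (rule, user) and keep
    only clusters big enough to warrant a single meta-analysis pass.

    Signal shape ({"rule": ..., "user": ...}) is a simplifying assumption.
    """
    clusters = defaultdict(list)
    for s in signals:
        clusters[(s["rule"], s["user"])].append(s)
    # One cluster summary, not hundreds of individual sub-threshold events.
    return {key: sigs for key, sigs in clusters.items() if len(sigs) >= min_count}
```

Each surviving cluster becomes one prompt for the model and, ultimately, one actionable line in an analyst's queue instead of a hundred ignored log entries.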

The Defensive Reality

Defenders should stop looking for a single "AI" button that solves their detection gaps. The real value lies in building a modular system where you can swap out models as they improve, while keeping your tool definitions and data connectors stable. If you are currently relying on a vendor’s proprietary ML, start asking for the "transcript." If they can't show you exactly how the model arrived at a specific conclusion, you are operating with a massive blind spot.

Start small. Don't try to automate the entire incident response lifecycle on day one. Pick one repetitive, high-volume task—like verifying if a specific user activity is authorized—and build a tool-enabled model to handle it. Once you have a reliable, transparent process for that one task, you can expand. The future of detection isn't about building a smarter machine; it’s about building a better way for machines to work alongside the humans who actually understand the environment.
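A "start small" task like the authorization check mentioned above can begin as a deliberately narrow, deterministic tool that the model calls and the transcript records. The allowlist structure and verdict labels below are hypothetical.

```python
def is_authorized(user: str, action: str, allowlist: dict) -> dict:
    """A deliberately narrow first automation: check one user activity
    against a change-management allowlist (a hypothetical data source)
    and emit a verdict the investigation transcript can record."""
    allowed = action in allowlist.get(user, set())
    return {
        "user": user,
        "action": action,
        "verdict": "authorized" if allowed else "needs_review",
    }
```

Note that the tool never auto-closes anything: the ambiguous case is labeled `needs_review` and handed back to a human, which keeps the analyst in the loop while the process earns trust.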
