DEF CON 33 Recon Village - CTI Agent Automated Battlecards from CTI Reports - Mohamed Nabeel
Description
Mohamed Nabeel presents CTI Agent, an agentic system that automates the collection and synthesis of cyber threat intelligence from natural language reports into structured battlecards. The system leverages LLM agents and graph-based expansion to identify and update indicators of compromise (IOCs) before they are widely reported.
Automating the Front Lines: Building CTI Agentic Systems
In cybersecurity, timely information is critical. Yet the sheer volume of Cyber Threat Intelligence (CTI) reports published daily by vendors, researchers, and journalists has created a paradox: we have more data than ever, but less actionable intelligence. Most of these reports are trapped in unstructured formats such as PDFs, blog posts, and social media threads. By the time a human analyst extracts the Tactics, Techniques, and Procedures (TTPs) and updates a firewall with the listed Indicators of Compromise (IOCs), the threat actor has often already rotated their infrastructure. At DEF CON 33 Recon Village, Mohamed Nabeel presented one approach to this problem: the CTI Agent.
The Problem with Traditional CTI
Traditional CTI consumption is plagued by three main issues. First, the 'Natural Language Barrier' makes automation difficult; LLMs are needed to understand the nuance of human writing. Second, TTPs are often implicit. A report might describe a breach without explicitly tagging it with a MITRE ATT&CK ID, requiring analysts to possess deep domain expertise to 'read between the lines.' Third, and perhaps most critically, IOCs have a short shelf life. The gap between a campaign discovery and a blog post's publication means the IPs and domains listed are frequently 'dead' by the time they reach the public.
Architectural Deep Dive: The CTI Agent Workflow
Nabeel's CTI Agent is not a single script but a multi-stage pipeline. After collecting reports from trusted sources, it combines unsupervised clustering, agentic reasoning, and graph-based analysis.
1. Strategic Clustering
Before an agent can build a profile (or 'battlecard') for an actor like Lazarus or Kimsuky, it must first aggregate all relevant reports. Nabeel found that standard document embeddings often fail because reports about different actors tend to use similar technical language. The solution is to have an LLM summarize each report first, then embed and cluster the summaries. This 'Summary Embedding' approach, particularly with the Gemini embedding model, produces cleaner clusters with higher silhouette scores (a measure of how tightly grouped and well separated the clusters are), so the data fed into the agent is relevant to a specific threat actor.
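The silhouette score mentioned above can be computed by hand. The following is a minimal sketch, not the talk's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, and the summaries, actor names, and labels are invented for illustration.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / norm if norm else 0.0)

def silhouette_score(vectors, labels):
    """Mean silhouette coefficient; values near 1 mean tight, well-separated clusters."""
    scores = []
    for i, (vi, li) in enumerate(zip(vectors, labels)):
        by_cluster = {}
        for j, (vj, lj) in enumerate(zip(vectors, labels)):
            if i != j:
                by_cluster.setdefault(lj, []).append(cosine_distance(vi, vj))
        own = by_cluster.pop(li, [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        a = sum(own) / len(own)                                 # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())   # mean distance to nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return sum(scores) / len(scores)

# Invented LLM-style summaries for two hypothetical actors.
summaries = [
    "actor alpha spearphishing government targets custom backdoor",
    "actor alpha government spearphishing backdoor update",
    "actor beta ransomware retail targets help desk social engineering",
    "actor beta social engineering help desk ransomware",
]
vectors = [embed(s) for s in summaries]
good = silhouette_score(vectors, [0, 0, 1, 1])  # grouped by actor
bad = silhouette_score(vectors, [0, 1, 0, 1])   # actors mixed together
print(round(good, 2), round(bad, 2))
```

The same function could be used to compare two candidate representations (raw-document embeddings versus summary embeddings) under the same clustering algorithm, keeping whichever scores higher.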
2. The ReAct Agentic Pattern
The heart of the system is the 'Agent.' Unlike traditional automation that follows a linear path, an agentic system operates in a loop. Nabeel utilized the ReAct (Reason + Act) pattern. The process looks like this:
- Think: The LLM evaluates the current state and identifies missing information (e.g., 'I have the malware names, but I don't know which CVEs were exploited').
- Act: The LLM selects a tool, such as a Search Tool or an Extractor Tool.
- Observe: The LLM analyzes the output of that tool and updates its internal state.
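The Think/Act/Observe loop above can be sketched in a few lines. This is an illustrative skeleton only: `stub_llm` is a hard-coded stand-in for a real LLM call, both tools return canned data, and the CVE identifier and malware name are placeholders.

```python
def search_tool(query):
    """Hypothetical tool: returns canned text instead of calling a search API."""
    canned = {"cves": "Reports mention exploitation of CVE-0000-0000 (placeholder)."}
    return canned.get(query, "no results")

def extractor_tool(text):
    """Hypothetical tool: pulls CVE-like tokens out of free text."""
    return [w.strip(".,()") for w in text.split() if w.startswith("CVE-")]

def stub_llm(state):
    """Stand-in for the LLM's Think step: choose the next action from the current state."""
    if "search_result" not in state:
        return ("I have malware names but no CVEs yet.", "search", "cves")
    if "cves" not in state:
        return ("I have raw text; extract the CVE IDs.", "extract", state["search_result"])
    return ("Nothing is missing.", "finish", None)

def react_loop(state, max_steps=5):
    trace = []
    for _ in range(max_steps):                # hard cap so the loop always terminates
        thought, action, arg = stub_llm(state)            # Think
        trace.append((thought, action))
        if action == "finish":
            break
        if action == "search":                            # Act
            state["search_result"] = search_tool(arg)     # Observe: store the tool output
        elif action == "extract":
            state["cves"] = extractor_tool(arg)
    return state, trace

state, trace = react_loop({"malware": ["ExampleRAT"]})
for thought, action in trace:
    print(f"{action}: {thought}")
```

The step cap is a deliberate design choice in this sketch: a model-driven loop has no guaranteed stopping point, so a bound on iterations keeps cost and runtime predictable.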
To work around LLM output token limits, the system uses a MapReduce-style strategy. In the map step, a separate agent instance builds a battlecard for each report, and these run in parallel. In the reduce step, a 'Synthesis Agent' merges the per-report cards into a single master battlecard. Nabeel noted that the synthesis agent often writes its own Python code to merge the JSON structures, which makes the merge deterministic rather than dependent on free-form generation.
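A deterministic merge of this kind might look like the sketch below. It is an assumption-laden illustration, not the code the synthesis agent generates: `card_from_report` stands in for a per-report LLM agent, and the field names and sample values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def card_from_report(report):
    """Map step stand-in: a real system would run an LLM agent on each report."""
    return {"aliases": report["aliases"], "iocs": report["iocs"], "ttps": report["ttps"]}

def merge_cards(cards):
    """Reduce step: union each list field, dropping duplicates and keeping first-seen order."""
    master = {}
    for card in cards:
        for key, values in card.items():
            merged = master.setdefault(key, [])
            for value in values:
                if value not in merged:
                    merged.append(value)
    return master

# Hypothetical per-report extractions for the same actor.
reports = [
    {"aliases": ["Actor A"], "iocs": ["bad.example"], "ttps": ["T1566"]},
    {"aliases": ["Actor A", "Group Alpha"], "iocs": ["bad.example", "192.0.2.10"], "ttps": ["T1059"]},
]

with ThreadPoolExecutor() as pool:
    cards = list(pool.map(card_from_report, reports))  # map() preserves input order

master = merge_cards(cards)
print(master)
```

Because the merge is plain code, running it twice on the same per-report cards always yields the same master card, which is harder to guarantee when an LLM rewrites the combined JSON itself.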
3. Beating the Attacker with Graph Expansion
To solve the 'stale IOC' problem, the CTI Agent goes beyond the report. It takes the known domains and IPs from the battlecard and performs graph-based expansion. By looking for shared SSL certificates, common hosting providers, and similar registration patterns, the system builds a 'neighborhood' of related infrastructure. Using graph embeddings and clustering, the system can identify newly registered domains that haven't been seen in the wild yet but belong to the same attacker infrastructure. This allows defenders to block domains before they are used in a new campaign.
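The pivoting idea can be illustrated with a small breadth-first expansion. Note that this swaps in a plain hop-limited graph walk for the graph embeddings and clustering described in the talk, and that all domains, certificate fingerprints, and IP addresses below are invented placeholders.

```python
from collections import deque

# Hypothetical observations: domain -> infrastructure attributes seen for it.
OBSERVATIONS = {
    "seed.example": {"cert:aa11", "ip:192.0.2.10"},
    "login-portal.example": {"cert:aa11", "ip:192.0.2.44"},
    "cdn-update.example": {"ip:192.0.2.44"},
    "unrelated.example": {"cert:ff99", "ip:198.51.100.7"},
}

def expand(seeds, observations, max_hops=2):
    """Return {domain: hop_count} for domains linked to the seeds by shared attributes."""
    # Invert the index so each attribute points at the domains that use it.
    by_attr = {}
    for domain, attrs in observations.items():
        for attr in attrs:
            by_attr.setdefault(attr, set()).add(domain)

    found = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        domain = queue.popleft()
        if found[domain] >= max_hops:
            continue  # do not pivot further from domains at the hop limit
        for attr in observations.get(domain, ()):
            for neighbor in by_attr[attr]:
                if neighbor not in found:
                    found[neighbor] = found[domain] + 1
                    queue.append(neighbor)
    return {d: hops for d, hops in found.items() if d not in seeds}

candidates = expand(["seed.example"], OBSERVATIONS)
print(candidates)
```

Here `login-portal.example` is reached through a shared certificate and `cdn-update.example` through a shared IP one hop further out. In practice, shared hosting IPs link many unrelated domains, so a hop limit and a review of the candidates before blocking are sensible safeguards.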
Practical Implementation for Blue Teams
For organizations looking to implement similar systems, the battlecard format is key. A structured battlecard should contain:
- Threat Actor Identity: Aliases, suspected location, and motivation.
- Campaign History: Start/end dates and targeted industries.
- Technical Artifacts: IOCs, malware file hashes (MD5/SHA-256), and specific tools.
- TTPs: Mapped directly to the MITRE ATT&CK framework.
By converting natural language reports into this JSON-structured format, security teams can automate the 'enforcement' layer: pushing TTP-based detections to EDRs and updating SIEM correlation rules with far less manual effort.
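One possible shape for such a battlecard is sketched below as a Python dataclass serialized to JSON. The field names are assumptions drawn from the list above, not the schema used in the talk. The only validation shown is a format check on ATT&CK technique IDs (`T` plus four digits, with an optional three-digit sub-technique suffix).

```python
import json
import re
from dataclasses import dataclass, field, asdict

# ATT&CK technique IDs look like T1566 or T1566.001.
ATTACK_ID = re.compile(r"^T\d{4}(\.\d{3})?$")

@dataclass
class Battlecard:
    actor: str
    aliases: list = field(default_factory=list)
    motivation: str = ""
    targeted_industries: list = field(default_factory=list)
    campaign_start: str = ""   # ISO 8601 date string, empty if unknown
    campaign_end: str = ""
    iocs: dict = field(default_factory=dict)   # e.g. {"domains": [...], "sha256": [...]}
    tools: list = field(default_factory=list)
    ttps: list = field(default_factory=list)   # ATT&CK technique IDs

    def invalid_ttps(self):
        """Return entries that do not match the ATT&CK ID format."""
        return [t for t in self.ttps if not ATTACK_ID.match(t)]

# Hypothetical example card.
card = Battlecard(
    actor="Example Actor",
    aliases=["Group Alpha"],
    targeted_industries=["finance"],
    iocs={"domains": ["bad.example"], "sha256": []},
    tools=["ExampleRAT"],
    ttps=["T1566.001", "T1059", "phishing"],
)
card_json = json.dumps(asdict(card), indent=2)
print(card.invalid_ttps())
```

A format check like this catches free-text entries such as "phishing" that an LLM may emit in place of an ID. It does not confirm that a well-formed ID actually exists in the framework.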
Mitigation and Defense Strategies
The primary takeaway for defenders is the shift from reactive to proactive detection. While the CTI Agent automates the 'hunt,' the mitigation strategy remains grounded in robust fundamentals:
- Automated Ingestion: Use tools to pull structured data into your TIP (Threat Intelligence Platform) immediately.
- Infrastructure Tracking: Don't just block a domain; pivot on its certificate to find its siblings.
- TTP-Based Hunting: Instead of just looking for IOCs (which change), look for the behaviors (TTPs) extracted by the agent, such as specific 'Living off the Land' techniques described in the reports.
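As one way to act on the TTP-based hunting point, the sketch below compares the technique IDs on a battlecard with the techniques an existing detection rule set claims to cover, and lists the gaps. The rule names and mappings are hypothetical, and treating a rule on a parent technique as covering its sub-techniques is a simplifying assumption.

```python
def coverage_gaps(battlecard_ttps, rules):
    """Return battlecard technique IDs that no detection rule is tagged with."""
    covered = {t for techniques in rules.values() for t in techniques}

    def is_covered(technique):
        # Simplification: a rule on the parent (T1566) counts for T1566.001.
        return technique in covered or technique.split(".")[0] in covered

    return sorted(t for t in set(battlecard_ttps) if not is_covered(t))

# Hypothetical detection rules, each tagged with ATT&CK technique IDs.
rules = {
    "suspicious_powershell": ["T1059.001"],
    "office_spawns_shell": ["T1566"],
}
battlecard_ttps = ["T1566.001", "T1059.001", "T1218"]

gaps = coverage_gaps(battlecard_ttps, rules)
print(gaps)
```

The resulting list gives hunters a short, actor-specific set of behaviors to look for first, independent of whether the published IOCs are still live.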
Conclusion
The CTI Agent represents a significant leap forward in how we process threat intelligence. By offloading the 'reading and synthesizing' to LLM agents and the 'infrastructure discovery' to graph-based algorithms, human analysts are freed to focus on high-level strategy and incident response. As these models become multimodal, the next frontier will be the automated analysis of complex attack chain diagrams and video-based leak site evidence, further shrinking the window of opportunity for threat actors.
AI Summary
In this presentation from DEF CON 33 Recon Village, Mohamed Nabeel, a cybersecurity veteran from Palo Alto Networks, introduces 'CTI Agent,' an innovative agentic system designed to bridge the gap between unstructured threat intelligence reports and actionable security data. The primary challenge Nabeel addresses is the inherent difficulty in consuming public CTI reports, which are typically written in natural language, contain implicit Tactics, Techniques, and Procedures (TTPs), and often include outdated Indicators of Compromise (IOCs) due to the delay in publication cycles.
The CTI Agent workflow consists of four primary stages. First, the system collects reports from trusted sources, including security vendors, investigative journalists, and reputable GitHub or social media repositories. Second, it utilizes unsupervised machine learning to cluster these reports by threat actor. Nabeel reveals a key technical finding: clustering based on LLM-generated summaries of reports significantly outperforms clustering based on raw document embeddings. Using the Gemini embedding model, the system achieved high silhouette scores, indicating distinct and accurate groupings of threat actors like Gamaredon, Kimsuky, and Scattered Spider.
The third stage involves the core 'agentic' process. Using the ReAct (Reason + Act) pattern, the LLM agent uses custom-built tools (including a Report Collector, a Search Tool, and an Extractor Tool) to synthesize structured 'battlecards.' These battlecards include the threat actor summary, aliases, targeted industries, campaign timelines, IOCs, TTPs (mapped to the MITRE ATT&CK framework), and specific tools or malware used. To handle the output token limitations of LLMs while processing massive datasets, Nabeel implemented a MapReduce-inspired approach, where individual agents process specific reports before a final synthesis LLM combines them into a master battlecard. Interestingly, the synthesis LLM often autonomously generates code to merge these data structures deterministically.
Finally, the system addresses the 'stale IOC' problem through graph-based expansion. By taking seed IOCs from the battlecards and analyzing infrastructure reuse patterns (such as shared IP addresses, certificates, or phishing kits), the system identifies new, un-indexed malicious domains. This proactive approach was demonstrated against actors like Lazarus and Revolver Rabbit, uncovering infrastructure that had minimal detection on platforms like VirusTotal at the time of analysis. The presentation concludes by highlighting the necessity of human-in-the-loop validation and the future potential for multimodal analysis of charts and attack chain diagrams.