DEF CON2024

Threat Modeling in the Age of AI

DEFCONConference1,022 views26:25over 1 year ago

This talk explores the application of traditional threat modeling methodologies to artificial intelligence and large language model (LLM) systems. It highlights the security risks associated with LLM integration, including data leakage, insecure output handling, and model theft. The speaker emphasizes the necessity of adapting threat modeling to identify boundaries and mitigate risks in AI-driven development pipelines. The presentation also critiques the current OWASP Top 10 for LLM Applications and advocates for more specific, actionable security testing.

Why Your LLM Threat Model Is Probably Missing the Point

TLDR: Most current threat models for Large Language Models (LLMs) focus on generic risks that apply to any software, ignoring the unique architectural vulnerabilities of AI pipelines. By failing to map out specific data flows and trust boundaries, teams are leaving critical gaps in their security coverage. This post breaks down how to move beyond generic checklists and apply rigorous threat modeling to identify actual, exploitable weaknesses in your AI-driven applications.

Threat modeling in the age of AI has become a game of buzzword bingo. Everyone is talking about the "AI Apocalypse" or "existential risk," but when you sit down to actually test a production application, those high-level fears don't help you find a single bug. If you are a pentester or a researcher, you know the drill: you need to find the specific, mechanical failure points in the architecture. The reality is that most organizations are treating LLMs like black boxes, assuming that if they just slap a prompt filter on the front, they are secure. That is a dangerous assumption.

The Problem with Generic Checklists

The current OWASP Top 10 for LLM Applications is a good starting point, but it is not a substitute for a real threat model. Many of the categories listed, such as insecure output handling or broken access control, are standard software security issues. While they are critical, they don't capture the unique ways an LLM interacts with your data and your infrastructure.

When you treat an LLM as just another API endpoint, you miss the nuances of how the model processes context, how it handles system instructions, and where it pulls its training data. A real threat model for an AI system must map the data flow from the user input, through the prompt engineering layer, into the model, and finally to the downstream systems that execute commands or return data. If you don't know where your trust boundaries are, you cannot effectively test for injection or data leakage.

Mapping the Trust Boundaries

To build a useful threat model, you need to stop looking at the LLM as a single entity and start looking at it as a series of interconnected components. You have the model itself, the vector database, the application logic, and the user interface. Each of these has its own set of risks.

For example, consider the risk of Prompt Injection. This isn't just about tricking the model into saying something it shouldn't. It is about manipulating the model into executing unauthorized actions in your backend. If your LLM has access to tools or APIs, a successful injection attack is essentially a remote code execution vulnerability. You need to ask: what happens if the model receives a malicious instruction that overrides the system prompt? Does it have the permissions to query your database? Can it trigger an API call that deletes user data?

Why You Should Stop Using Generic Models

Many teams are trying to use generic threat modeling frameworks like STRIDE without adapting them to the AI context. While the principles of Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege are still relevant, the implementation is entirely different.

In an AI system, "Information Disclosure" isn't just about a database dump. It is about the model leaking sensitive information from its training data or its context window. "Denial of Service" isn't just about flooding a server with requests. It is about crafting inputs that force the model to consume excessive tokens, driving up costs or causing latency that degrades the user experience. You need to be specific. If you are testing an application that uses an LLM to summarize internal documents, your threat model should focus on the risk of the model inadvertently revealing PII or trade secrets that were included in the context window.

Practical Steps for Your Next Engagement

If you are on a red team engagement or a bug bounty hunt, start by mapping the data flow. Where does the user input go? What system instructions are appended to that input? Does the model have access to external tools? Once you have that map, look for the gaps.

One of the most effective ways to test these systems is to write specific test cases for the LLM's behavior. Don't just throw random strings at it. Create a set of test prompts that attempt to bypass the system instructions and force the model to perform actions it shouldn't. If you are testing an application that uses an LLM to generate code, test it with inputs that attempt to inject malicious libraries or insecure patterns.

Also, pay close attention to the training data. If the application allows users to upload documents that are then used to fine-tune the model or are included in the RAG (Retrieval-Augmented Generation) pipeline, you have a massive attack surface. Training Data Poisoning is a real threat, and it is often overlooked in standard security assessments.

The Future of AI Security

We are still in the early days of securing AI systems, and the tools we have today are far from perfect. The Elevation of Privilege game is a great way to get your team thinking about threats, but it needs to be adapted for the AI era. We need to develop better ways to test these systems, more specific threat modeling frameworks, and a deeper understanding of the underlying mechanics of LLMs.

Do not wait for a vendor to tell you how to secure your AI. Start building your own threat models, start testing your own assumptions, and start sharing what you find. The only way we are going to get better at this is by being honest about the limitations of our current approaches and by pushing each other to think more critically about the systems we are building and breaking. If you are not already doing this, you are already behind.

Talk Type

talk

Difficulty

intermediate