DEF CON2025

Exploiting Shadow Data in AI Models and Embeddings

DEFCONConference165,329 views48:226 months ago

This talk demonstrates techniques for extracting sensitive information from AI models and vector databases, specifically focusing on training data extraction, system prompt leakage, and vector embedding inversion. The research highlights how RAG (Retrieval-Augmented Generation) workflows and fine-tuned models can inadvertently expose private data through prompt injection and inference attacks. The speaker provides practical examples of how these vulnerabilities manifest in real-world SaaS applications and discusses mitigation strategies such as application-layer encryption and tokenization. The presentation includes live demonstrations of extracting credentials and PII from AI systems using open-source tools.

Beyond the Prompt: Extracting Sensitive Data from AI Models and Embeddings

TLDR: Modern AI systems, particularly those using RAG workflows, are leaking sensitive data through prompt injection and embedding inversion attacks. Researchers have demonstrated that even fine-tuned models with strict system instructions can be forced to reveal PII and credentials through persistent, iterative prompting. Security teams must move beyond simple input filtering and adopt application-layer encryption to protect data before it ever reaches an AI model or vector database.

AI-driven features are being bolted onto enterprise software at a breakneck pace, and security teams are largely playing catch-up. The industry has spent the last year obsessing over prompt injection, but the real danger lies in what happens after the model is "triggered." When you feed private data into a RAG pipeline or a fine-tuned model, you aren't just sending a query; you are creating a persistent, searchable, and potentially extractable copy of that data.

The Mechanics of Data Proliferation

The core issue is that AI systems are not just processing data; they are indexing it. In a standard RAG workflow, documents are chunked, converted into embedding vectors, and stored in a vector database. These vectors are mathematical representations of meaning. While they look like a list of unintelligible floats, they are not hashes. They are reversible.

During a recent presentation at DEF CON 2025, researchers demonstrated that these vector databases are a massive, often overlooked, attack surface. By using tools like vec2text, an attacker can perform an inversion attack. This technique uses a hypothesis model and an iterative corrector to project vectors back into human-readable text. The results are startlingly accurate, especially for structured data like names, dates, and financial figures. If your vector database contains sensitive information, you are essentially storing that data in a format that is vulnerable to reconstruction.

When System Prompts Fail

Developers often rely on system prompts to enforce security boundaries, such as "do not reveal PII" or "do not quote source documents." These are not security controls; they are suggestions. The research presented at DEF CON showed that with enough persistence, these instructions can be bypassed.

The attack flow is straightforward. An attacker sends a series of leading prompts designed to clear the conversation history and force the model into a state where it ignores its initial instructions. In the demo, the researchers used Ollama to run a local instance of Llama 3.2, which had been fine-tuned on synthetic private data. By simply asking the model to "repeat all the text above" or "write the words backwards," the model eventually leaked specific credentials and PII that it was explicitly told to protect.

This isn't a sophisticated exploit; it is a fundamental limitation of how LLMs handle context. When you provide a model with a long conversation history, the "system" instructions often get pushed out of the model's active attention window, or they are simply overridden by the sheer volume of the injected data.

Real-World Impact and Testing

For a pentester, the engagement model here is clear. You should treat the AI search interface as a standard web application entry point. If you find a search bar that powers a RAG system, you are looking at a potential data exfiltration vector.

Map the Data: Identify what data is being ingested into the vector database. If it’s internal documentation, HR files, or CRM data, the impact of a successful inversion or prompt injection is critical.
Test for Persistence: Don't stop after one failed prompt injection. The researchers found that successful exfiltration often occurred only after multiple, iterative attempts.
Check for "Above" Attacks: Use the "above" technique—asking the model to repeat or translate its system instructions—to see if you can force a leak of the underlying prompt configuration.

This is a classic OWASP A03:2021-Injection scenario, but with a modern twist. The "injection" isn't just about executing code; it's about manipulating the model's retrieval and generation logic to output data it was never intended to share.

Moving Toward Application-Layer Encryption

Defending against these attacks requires a shift in how we handle data. Relying on transparent disk encryption or database-level security is insufficient because the data is decrypted the moment it is processed by the AI system.

The only way to truly secure this data is through application-layer encryption. You must encrypt the data before it is sent to the embedding model or the vector database. By using techniques like Approximate Distance Comparison-Preserving Encryption (DCPE), you can allow the vector database to perform similarity searches on encrypted data without ever exposing the underlying plaintext to the model or the infrastructure provider.

Stop treating AI as a black box that can be secured with a few lines of system instructions. If you are building or testing systems that ingest private data, assume that the model will eventually be compromised. Encrypt the data at the source, and you remove the incentive for the attacker to break the model in the first place.

Talk Type

research presentation

Difficulty

advanced