Black Hat2024

SpAlware & More: Advanced Prompt Injection Exploits

Black Hat26,407 views39:2212 months ago

This talk demonstrates advanced prompt injection techniques that enable persistent data exfiltration and command-and-control (C2) capabilities within large language model (LLM) applications. The research highlights how attackers can exploit vulnerabilities in LLM integrations, such as tool invocation, memory persistence, and insecure URL handling, to manipulate model behavior. The speaker showcases practical exploits against major platforms like Microsoft Copilot, Google AI Studio, and Anthropic Claude, emphasizing the lack of a deterministic security solution. The presentation concludes with a call for developers to adopt a 'zero-trust' approach to LLM outputs and to design systems that account for the inherent risks of AI agents.

Beyond Prompt Injection: Turning LLM Agents into Persistent C2 Infrastructure

TLDR: Modern LLM agents are not just chatbots; they are autonomous systems capable of executing code, browsing the web, and accessing sensitive user data. This research demonstrates how prompt injection can be used to establish persistent command-and-control (C2) channels, exfiltrate data via hidden Unicode characters, and bypass security filters in major platforms like Microsoft Copilot and Anthropic Claude. Pentesters must treat LLM agents as high-risk internal assets rather than simple text-processing interfaces.

The industry has spent the last year obsessing over basic prompt injection, treating it as a nuisance that causes a chatbot to output "I am a pirate" or ignore its system instructions. This perspective is dangerously outdated. We are no longer dealing with simple text-in, text-out interfaces. We are dealing with autonomous agents that have access to our files, our email, and our internal networks. If you are still testing LLMs by asking them to reveal their system prompt, you are missing the real threat. The research presented at Black Hat 2024 by Johann Rehberger shifts the focus from simple instruction overrides to the creation of persistent, stealthy, and functional C2 infrastructure inside the LLM ecosystem.

The Mechanics of Agentic Exploitation

The core issue is that LLM applications are increasingly "agentic." They are designed to be helpful, which means they are designed to take action. When an LLM is given access to tools—like a web browser, a code interpreter, or a file system—it becomes a programmable execution environment.

The attack flow demonstrated in the research is elegant in its simplicity. An attacker delivers a malicious payload, often through a seemingly benign document or an email, that contains instructions for the LLM to perform a series of actions. Because the LLM cannot reliably distinguish between developer-provided system instructions and user-provided data, it executes the attacker's commands with the same authority as the system's own logic.

For example, an attacker can instruct an LLM to:

Monitor the conversation for specific keywords.
When those keywords appear, trigger a tool invocation (like a web search or file read).
Exfiltrate the resulting data to an attacker-controlled server.

This is not just a prompt injection; it is a remote code execution equivalent for the AI era.

Stealthy Exfiltration via ASCII Smuggling

One of the most impressive techniques showcased is ASCII Smuggling. Attackers can hide malicious instructions within a string of text that appears perfectly normal to a human user but is interpreted as executable code by the LLM. This is achieved by using Unicode Tag characters, which are invisible in most UI elements but are processed by the underlying model.

By encoding instructions in these tags, an attacker can bypass simple content filters that only scan for plain-text keywords. When the LLM renders this text, it decodes the hidden instructions and executes them. This creates a scenario where a user might copy and paste a block of text from a website, unknowingly carrying a payload that, when pasted into an LLM-powered tool, triggers an unauthorized action.

Establishing Persistence

The most critical finding is the ability to achieve persistence. By leveraging the "memory" features now common in tools like ChatGPT, an attacker can inject a payload that remains in the model's long-term context. Once the payload is stored in the model's memory, it can influence every future interaction the user has with that agent.

In the demo, the researcher showed how a single interaction could force the model to adopt a new persona, track a counter, and continuously exfiltrate data to an external storage account. Because the model "remembers" these instructions across different chat sessions, the attacker effectively gains a persistent foothold in the user's workspace. This is a direct violation of OWASP A03:2021-Injection, but the impact is far more severe than traditional web-based injection because the "database" being manipulated is the model's own reasoning process.

Testing Your LLM Integrations

For those of us on the offensive side, the testing methodology must evolve. You need to map the agent's capabilities. Does it have access to a browser? Can it write files? Can it send emails? If the answer is yes, you are not testing a chatbot; you are testing a privileged user.

Map the Toolset: Identify every tool the agent can invoke. If it can browse the web, test for SSRF-like behavior. If it can execute code, test for sandbox escapes.
Test for Memory Persistence: Inject instructions that require the model to "remember" a state or a persona across sessions. If the model persists these instructions, you have found a persistence vector.
Check for Invisible Payloads: Use tools like the ASCII Smuggler to see if the application's UI or the LLM itself is vulnerable to hidden instruction injection.
Assume Compromise: Design your applications with the assumption that the LLM will eventually be compromised. Implement strict human-in-the-loop requirements for any action that modifies external state, such as sending an email or deleting a file.

Defenders should focus on "Instruction Hierarchy," a concept where system instructions are cryptographically or structurally separated from user data. While no deterministic solution exists, forcing the model to verify sensitive actions through a separate, non-LLM-controlled confirmation dialog is the most effective way to mitigate the risk of automated, unauthorized tool invocation.

We are in the early days of agentic security. The tools we use to build these systems are moving faster than our ability to secure them. Stop looking for simple prompt injections and start looking for the ways these agents can be turned against the very users they are meant to assist.

Talk Type

research presentation

Difficulty

advanced