Black Hat2024

The Double (AI) Agent

Black Hat2,669 views39:3811 months ago

This talk demonstrates a novel cyber attack technique called PromptWare, which exploits Large Language Models (LLMs) integrated into AI-powered applications to trigger malicious actions. The researchers show how an attacker can manipulate AI agents by injecting a jailbreaking prompt and a malicious payload, effectively bypassing safety guardrails. The presentation highlights the risks of AI agents with excessive permissions, such as terminal access or SQL database connectivity, which can lead to RCE, SQL injection, or ransomware. The authors also introduce the Advanced PromptWare Threat (APwT), a sophisticated attack that requires no prior knowledge of the target application's implementation.

Beyond Prompt Injection: How AI Agents Turn LLMs into Attack Vectors

TLDR: Researchers at Black Hat 2024 demonstrated that AI agents, which combine LLMs with external tools like SQL databases and terminal access, are vulnerable to a new class of attack called PromptWare. By injecting malicious instructions into these agents, attackers can force them to perform unauthorized actions, including data exfiltration and system compromise. This research proves that the risk is no longer just about model output, but about the dangerous capabilities we grant these agents.

The industry is currently obsessed with prompt injection, treating it as a nuisance that causes chatbots to leak their system prompts or generate offensive content. This focus is narrow and misses the real danger. When we move from simple chatbots to AI agents—systems that can execute code, query databases, and interact with APIs—the threat model shifts from "bad output" to "arbitrary code execution."

The research presented at Black Hat 2024 on PromptWare highlights that we are effectively building remote access trojans into our production environments. An AI agent is essentially a wrapper around an LLM that has been granted specific tools to perform tasks. If that agent has access to a terminal or a database, the LLM becomes the command-and-control interface for an attacker.

The Mechanics of the Advanced PromptWare Threat

The researchers introduced the Advanced PromptWare Threat (APwT), a technique that allows an attacker to compromise an AI agent without needing prior knowledge of its internal implementation. The attack flow is elegant in its simplicity. It relies on the fact that these agents are designed to follow instructions, and they cannot distinguish between a legitimate user request and a malicious payload embedded within that request.

The attack follows a classic kill chain:

Privilege Escalation: The attacker injects a jailbreaking prompt to bypass the agent's safety guardrails.
Reconnaissance: The agent is instructed to query its own environment, identifying available tools, database schemas, and system capabilities.
Damage: The agent is directed to use its tools to perform malicious actions, such as modifying product prices in a database or exfiltrating sensitive user records.

In a demonstration, the researchers showed an e-commerce chatbot that was integrated with a SQL database. By providing a crafted prompt, they forced the agent to generate and execute an UPDATE statement that slashed product prices by 10 percent. Because the agent was designed to "interpret user requests and determine actions," it treated the malicious SQL command as a valid business operation.

Why This Matters for Pentesters

If you are performing a penetration test on an application that uses AI agents, stop looking for simple prompt injection. Instead, map the agent's capabilities. What tools does it have access to? Does it have a run_command function? Can it query a production database?

The OWASP Top 10 for LLM Applications already identifies A03:2023 - Model Denial of Service and A01:2023 - Prompt Injection as critical risks. However, the APwT research shows that these categories are insufficient when combined. An attacker can use prompt injection to trigger a denial-of-service attack by forcing the agent into an infinite loop of tool calls, consuming massive amounts of compute and API credits.

During your engagement, test the agent's ability to handle nested instructions. If you can force the agent to "repeat all text between START and END," you have the potential to exfiltrate the system prompt, which often contains the API keys or database connection strings the agent uses to function.

Defensive Strategies for AI Agents

Defending against this is difficult because the vulnerability is architectural. If an agent needs to query a database to function, it must have the credentials to do so. The solution is to apply the principle of least privilege at the tool level.

Restrict Tool Permissions: An agent should never have write access to a database if it only needs to read product information. If it must write, use a stored procedure that limits the scope of the operation.
Human-in-the-Loop: For high-impact actions, such as modifying financial data or deleting records, require manual approval. The agent should propose the action, but a human should execute it.
Input Sanitization: While you cannot sanitize natural language, you can validate the output of the LLM before it is passed to the tool. If the LLM generates a SQL query, pass that query through a static analysis tool or a database firewall before it hits the SQL server.

The research on Morris-II, the first AI worm, serves as a warning that these systems are interconnected. When an agent can read an email, process it, and then send a reply, it can propagate its own malicious instructions to other agents. We are moving toward a world where the security of an application depends on the security of the LLM's reasoning process.

Treat every AI agent as a privileged user. If you wouldn't give a random user on the internet the ability to run UPDATE queries on your database, do not give that ability to an AI agent that accepts input from the internet. The speed at which these systems are being deployed is outpacing our ability to secure them, and the next major breach will likely involve an agent that was simply doing exactly what it was told to do.

Talk Type

research presentation

Difficulty

advanced