Security BSides2025

When AI Goes Awry: Responding to AI Incidents

Security BSides San Francisco215 views47:4910 months ago

This presentation explores the unique challenges of incident response for agentic AI systems, highlighting how traditional security controls often fail against novel attack vectors like prompt injection and data poisoning. It details the risks introduced by LLM agents, including unauthorized tool usage, credential leakage, and cross-contamination of data. The speakers provide a framework for incident responders to improve visibility, logging, and governance in environments where AI agents have autonomous access to sensitive data and internal tools.

When Your AI Agent Becomes the Insider Threat

TLDR: Agentic AI systems are being deployed into production environments with broad access to internal tools and sensitive data, creating a massive, unmonitored attack surface. Researchers demonstrated that these systems fail to distinguish between developer instructions and user-supplied prompts, allowing for indirect prompt injection that can lead to data exfiltration and unauthorized system commands. Security teams must treat agentic AI as a high-risk asset, implementing strict access controls, robust logging of all agent-tool interactions, and human-in-the-loop verification for sensitive operations.

Agentic AI is no longer a research project. It is being integrated into enterprise workflows at a breakneck pace, often bypassing the security scrutiny applied to traditional software. While developers focus on the efficiency gains of LLM-driven automation, they are ignoring the fundamental shift in the threat model. When an AI agent is granted the ability to read your emails, query your databases, and execute shell commands, it becomes the most powerful insider threat in your organization.

The Failure of Control and Data Plane Separation

The core vulnerability in current agentic architectures is the inability of the LLM to differentiate between the control plane and the data plane. In a traditional application, the code defines the control plane, and the user input is strictly data. With LLM agents, the prompt is both. When an agent processes a document or an email, it treats the content of that document as instructions. If an attacker can influence that content, they can hijack the agent's execution flow.

This is not a theoretical bug. It is a design flaw in how we currently build agentic systems. The agent receives a prompt from the user, which might include a document retrieved via a Retrieval Augmented Generation (RAG) pipeline. If that document contains a malicious prompt, the agent will execute it with the same authority as the original system prompt.

Exploiting the Agentic Toolchain

The research presented at BSides 2025 highlights how attackers can weaponize the tools an agent is allowed to use. Many agents are configured with access to Model Context Protocol (MCP) servers, which provide a standardized way for agents to interact with local or remote resources. If an MCP server is misconfigured or lacks granular permissions, an attacker can use prompt injection to force the agent to perform actions it was never intended to do.

Consider a scenario where an agent has access to a file system tool. An attacker can craft a PDF containing an invisible prompt that instructs the agent to run a destructive command.

# Example of a malicious command hidden in a document
# The agent is tricked into executing this via prompt injection
rm -rf /sensitive/data/directory

Because the agent is operating with the identity and permissions of the service account, the operating system sees a legitimate request. The agent does not "know" it is being malicious; it is simply following the instructions it received in the context window.

The Visibility Gap in Incident Response

Incident response for these systems is currently broken because our existing security stack is blind to the internal logic of an LLM. When an incident occurs, you cannot simply look at network logs to see what happened. You need to reconstruct the agent's decision-making process.

Most organizations are failing to log the full interaction history between the agent and its tools. Without this, you are left with a "black box" scenario where you know data was exfiltrated, but you have no idea which prompt triggered the action or what the agent's internal state was at the time.

To effectively triage these incidents, you need to capture:

Full Request/Response Telemetry: Every prompt sent to the model and every tool output returned to the model.
Tool Usage Logs: Which specific tools were invoked, with what arguments, and by which agent instance.
Identity Context: Mapping every action back to the specific user or external trigger that initiated the agent's chain of thought.

Securing the Agentic Perimeter

Defending against these attacks requires a shift toward a zero-trust architecture for AI. You must assume that any data the agent processes is potentially malicious. This means implementing strict input validation and sanitization at the agent level, not just at the application gateway.

Furthermore, you need to implement "human-in-the-loop" controls for any action that modifies the state of your production environment. If an agent wants to delete a file, update a database record, or send an email, it should require an explicit, out-of-band approval from a human operator.

Finally, inventory your AI assets. You cannot secure what you do not know exists. Many organizations have "shadow AI" agents running on developer machines or in isolated cloud environments that have access to production credentials. Use your existing endpoint detection and response (EDR) tools to identify processes that are making frequent calls to LLM APIs or interacting with MCP servers.

The rush to deploy agentic systems is creating a massive security debt. If you are building or deploying these agents, stop and ask yourself: if this agent were compromised, what is the worst thing it could do? Then, build your controls to prevent that specific outcome. The era of "move fast and break things" is over; in the age of agentic AI, moving fast just means you are breaking your own security.

Talk Type

research presentation

Difficulty

intermediate