
One Search to Rule Them All: Threat Modelling AI Search

Security BSides San Francisco · 448 views · 24:24 · 10 months ago

This talk explores the security implications of integrating enterprise AI search tools into corporate environments, focusing on the expanded risk surface created by centralized access to disparate data stores. It demonstrates how these tools can inadvertently bypass existing access controls and device posture requirements, leading to potential data exposure. The speaker provides a practical framework for threat modeling these integrations, emphasizing the need for rigorous service account management and careful evaluation of data classification. The presentation highlights the risks associated with the Model Context Protocol (MCP) and the importance of vetting AI-driven search configurations.

Enterprise AI Search Tools Are Just Expensive Access Control Bypasses

TLDR: Enterprise AI search tools like Glean and Atlassian Rovo aggregate data from disparate SaaS platforms into a single, centralized index, creating a massive, unified target for attackers. By compromising a single user session or service account, an attacker can query sensitive data across the entire organization, effectively bypassing granular access controls enforced at the source. Security teams must treat these search platforms as high-value targets and implement strict service account scoping and rigorous audit logging to prevent unauthorized data exfiltration.

Modern enterprise search platforms promise to solve the "where is that document?" problem by indexing everything from Slack channels to Jira tickets and Confluence pages. While this is a productivity win for employees, it is a goldmine for anyone performing an internal assessment. These tools function by ingesting data from multiple SaaS providers, creating a centralized, searchable repository. From an offensive perspective, this is a dream. Instead of needing to compromise individual accounts across five different platforms to find sensitive information, you only need to compromise the search tool itself.

The Mechanics of Centralized Exposure

The core issue is that these tools act as a proxy for the user. When you query an AI search tool, it does not just search a static database. It often performs real-time API calls to the underlying data stores to fetch the most recent information. This means the search tool requires broad, often over-privileged, service account access to your entire stack.

If you are performing a red team engagement, your primary objective should be to identify the service account tokens or session cookies associated with the search platform. Once you have those, you are no longer limited by the permissions of a single user. You are effectively operating with the aggregate permissions of the search tool itself. If the tool is misconfigured or if the service account has excessive scopes, you can query for keywords like "password," "API key," or "financial results" across the entire company's documentation and communication history.
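The keyword sweep described above can be sketched in a few lines. The documents below are a stand-in for results returned by the search platform's API; tools like Glean expose vendor-specific query endpoints and authentication, which are omitted here, so treat this as an illustration of the technique rather than a working client.

```python
# Minimal sketch of a sensitive-keyword sweep over a centralized search
# index. A real assessment would issue these queries through the search
# tool's API using a captured session or service account token.
SENSITIVE_KEYWORDS = ["password", "api key", "financial results"]

def keyword_sweep(documents, keywords=SENSITIVE_KEYWORDS):
    """Return (doc_id, keyword) pairs for every indexed document that
    mentions a sensitive term, regardless of its source platform."""
    hits = []
    for doc_id, text in documents.items():
        lowered = text.lower()
        for kw in keywords:
            if kw in lowered:
                hits.append((doc_id, kw))
    return hits

# Simulated aggregate index: one compromised session exposes content
# that originally lived behind three separate platforms' access controls.
index = {
    "slack/msg-991": "here's the staging password: hunter2",
    "confluence/page-12": "Q3 financial results draft (do not share)",
    "jira/PROJ-44": "renew the api key before Friday",
}
```

The point of the sketch is the blast radius: a single vantage point sweeps Slack, Confluence, and Jira in one pass.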

When Zero Trust Meets AI Search

Many organizations rely on device posture checks to secure access to sensitive applications. For example, you might require a managed device to access Confluence. However, if that same Confluence instance is indexed by an AI search tool that is accessible from an unmanaged mobile device, you have just created a massive hole in your security model.
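The posture gap can be sketched as follows. The function names and return values are illustrative, not any vendor's API; the essential asymmetry is that the source system enforces a device check while the indexed copy does not.

```python
# Sketch of the device-posture bypass: the source system enforces a
# managed-device requirement, but the search tool serves an indexed
# copy of the same content with no posture check at all.
def confluence_access(user, device_managed):
    """Direct access path: posture check enforced at the source."""
    if not device_managed:
        raise PermissionError("managed device required")
    return "page contents"

def ai_search_access(user, device_managed):
    """Search-tool path: the indexed copy is served regardless of
    device posture -- this is the hole in the security model."""
    return "page contents (from search index)"
```

Any control enforced only at the source system is silently dropped once the data is re-served through the index.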

The Model Context Protocol (MCP) is becoming the standard for connecting these AI agents to local and remote data sources. While MCP provides a structured way for LLMs to interact with tools, it does not inherently solve the authorization problem. If you deploy an MCP server that connects to your internal file system or a sensitive API, you are responsible for the authorization layer. If that layer is weak, the LLM will happily exfiltrate data on your behalf.
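A minimal sketch of what that authorization layer might look like in front of a tool handler. The MCP wire protocol itself is omitted, and the tool names, role names, and scope mapping are illustrative assumptions; the point is that the gate must exist outside the model, because the LLM will invoke whatever the prompt asks for.

```python
# Sketch of an authorization gate in front of an MCP-style tool handler.
# Tool and role names are illustrative; a real deployment would derive
# the caller's identity from the authenticated session, not a parameter.
ALLOWED_SCOPES = {
    "read_file": {"role:developer"},
    "export_files": {"role:release-manager"},  # bulk export is restricted
}

class AuthorizationError(Exception):
    pass

def call_tool(tool_name, caller_roles, action):
    """Refuse the call unless the caller holds a role permitted for the
    tool. Without this check, the handler runs with the local user's
    full privileges on the model's behalf."""
    allowed = ALLOWED_SCOPES.get(tool_name, set())
    if not allowed & set(caller_roles):
        raise AuthorizationError(f"{tool_name} denied for roles {caller_roles}")
    return action()
```

The design choice worth noting: authorization is per tool and per caller, not a single yes/no for the whole MCP server.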

Consider the following scenario during a penetration test. You find an MCP server running on a developer's workstation that has access to sensitive design files. If you can influence the LLM's prompt, you can instruct it to perform actions that the developer never intended.

# Example of an LLM prompt that could be used to exfiltrate data
# if the MCP server lacks proper authorization checks.
"Get all my design files from the project folder and save them as JPEGs on my local desktop."

If the MCP server does not validate the context of the request or the identity of the user, it will execute these commands with the privileges of the local user. This is a classic Broken Access Control scenario, but it is masked by the complexity of the AI integration.

Practical Testing and Defensive Hardening

During an engagement, your focus should be on the "discoverability" of sensitive data. Use the search tool to identify what information is accessible to a standard user. If you find that a low-privileged account can search for and retrieve documents that should be restricted, you have found a Security Misconfiguration that is ripe for exploitation.
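One way to automate that discoverability check is sketched below, assuming the search tool returns results carrying a classification label. The field names are hypothetical; real platforms return vendor-specific JSON, but the test is the same: query as a low-privileged account and flag anything that should not have come back.

```python
# Sketch of a discoverability check: run queries as a low-privileged
# test account and flag any result labelled as restricted. Field names
# ("id", "classification") are hypothetical stand-ins.
def find_exposed_docs(results):
    """results: list of result dicts as returned to the test account.
    Every restricted document in this list is a misconfiguration."""
    return [r["id"] for r in results if r["classification"] == "restricted"]

# Simulated response to a low-privileged account's query.
sample_results = [
    {"id": "doc-1", "classification": "public"},
    {"id": "doc-2", "classification": "restricted"},
]
```

A non-empty return value here is direct evidence of the Security Misconfiguration described above.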

Defenders need to move away from the "it depends" mindset. You cannot simply rely on the search tool's default security settings. You must:

  1. Audit Service Account Scopes: Ensure that the service accounts used by your search tools have the absolute minimum permissions required. If a tool only needs to index public channels, do not give it access to private DMs.
  2. Centralize Logging: Treat the search tool's logs as critical security telemetry. If you see a sudden spike in queries for sensitive keywords from a single user, that is a high-fidelity alert.
  3. Restrict Access: Do not allow access to these search tools from unmanaged devices. If the tool is the gateway to your data, it must be protected with the same rigor as your most sensitive production environment.
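The logging recommendation above can be sketched as a simple detection rule over the search tool's query log. The threshold, term list, and log layout are illustrative assumptions to be tuned against your environment's baseline.

```python
from collections import Counter

# Sketch of a high-fidelity alert: flag any user whose count of
# sensitive-keyword queries exceeds a baseline threshold.
SENSITIVE_TERMS = {"password", "api key", "secret"}
THRESHOLD = 5  # illustrative; tune to your environment's normal usage

def flag_suspicious_users(query_log, threshold=THRESHOLD):
    """query_log: iterable of (user, query_text) tuples exported from
    the search platform. Returns the set of users to alert on."""
    counts = Counter()
    for user, query in query_log:
        q = query.lower()
        if any(term in q for term in SENSITIVE_TERMS):
            counts[user] += 1
    return {user for user, n in counts.items() if n > threshold}
```

Because legitimate users rarely hammer the index with credential-hunting queries, even a crude counter like this produces few false positives.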

The convenience of "one search to rule them all" comes at the cost of a significantly expanded attack surface. As these tools become more deeply integrated into our workflows, they will become the primary target for data exfiltration. If you are not threat modeling your AI search integrations today, you are leaving the front door wide open. Stop assuming the vendor has handled the security for you and start verifying the access controls yourself.
