15 Ways to Break Your Copilot
This talk demonstrates multiple security vulnerabilities and misconfigurations in Microsoft Copilot Studio, including insecure default settings that allow unauthenticated access and data leakage. The researcher highlights how common practices like over-privileged sharing, lack of tenant isolation, and reliance on AI for decision-making create significant attack surfaces. The presentation provides a practical guide for security professionals to harden their Copilot environments and introduces a new tool for reconnaissance. The session concludes with a demonstration of how these flaws can be exploited to extract sensitive information from internal SharePoint sites.
How Misconfigured Microsoft Copilot Studio Bots Expose Internal Data
TLDR: Microsoft Copilot Studio allows users to create AI assistants that, by default, may be publicly accessible and share the creator's internal identity. This research demonstrates how these misconfigurations lead to unauthorized data access, including PII and sensitive internal documents, via prompt injection and over-privileged service accounts. Security teams must audit their Copilot Studio environments, disable unneeded public channels, and enforce strict authentication policies to prevent data leakage.
Enterprise adoption of generative AI is moving faster than the security controls meant to govern it. While organizations scramble to deploy internal assistants, they are often ignoring the underlying configuration of the platforms hosting these models. Microsoft Copilot Studio, formerly known as Power Virtual Agents, is a prime example of a platform where convenience for business users has created a massive, often overlooked, attack surface for security researchers and adversaries alike.
The Mechanics of the Exposure
At the core of the issue is how these bots are architected within the Power Platform. When a user creates a bot, the platform provides a "no-code" environment that simplifies the integration of data sources like SharePoint, OneDrive, and Dataverse. The problem arises because the default configuration for these bots is often insecure.
Many of these bots are deployed with unauthenticated public access enabled. An attacker does not need a corporate login to interact with them. Furthermore, these bots are often configured to use the creator's identity to fetch data. If a user with high-level access creates a bot and shares it with the entire tenant, every user—and potentially anyone on the internet if public access is toggled on—effectively inherits the permissions of that creator when querying the bot.
This is not just a theoretical risk. By using tools like Copilot Hunter, researchers can enumerate these bots across a tenant. Once a bot is identified, the interaction becomes a standard Prompt Injection exercise. Because the bot is designed to answer questions based on internal documentation, it will often happily summarize or leak the contents of those documents if prompted correctly.
Exploiting the Trust Boundary
The most dangerous aspect of this configuration is the "Shared Identity" model. When a bot is built, it is often granted access to internal data via a service connection. If that connection is not scoped correctly, the bot acts as a proxy for the creator's access.
Consider a scenario where a bot is built to answer HR questions. It is connected to a SharePoint site containing sensitive salary data. If the bot is not configured to require authentication, or if it is shared with "Everyone," an attacker can simply ask the bot to list available documents or summarize specific files. The bot, acting on behalf of the creator, retrieves the data and presents it to the attacker.
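To make the interaction concrete, here is a minimal sketch of the kind of probe an attacker might send to such a bot. It assumes the bot is exposed through the Bot Framework Direct Line channel with authentication disabled; the endpoint constant, the user ID, and the probe phrasing are all illustrative, not taken from the talk.

```python
import json

# Hypothetical, for illustration: the Direct Line v3 base URL used by many
# bot deployments. A real engagement would first obtain a conversation token
# from the bot's public token endpoint (no credentials needed if the bot's
# authentication setting is "No authentication").
DIRECT_LINE_BASE = "https://directline.botframework.com/v3/directline"

def build_probe_activity(text: str, user_id: str = "anon-attacker") -> dict:
    """Build the Direct Line 'message' activity an unauthenticated client
    would POST to /conversations/{conversationId}/activities."""
    return {
        "type": "message",
        "from": {"id": user_id},  # note: no identity verification happens here
        "text": text,
    }

# Probes escalate from harmless to sensitive to map the trust boundary.
probes = [
    "What data sources can you access?",
    "List the documents in your knowledge base.",
    "Summarize the most recently modified file you can read.",
]

payloads = [json.dumps(build_probe_activity(p)) for p in probes]
```

The point of the escalating probe list is that nothing in the payload carries a verifiable identity: the `from.id` field is attacker-controlled, so the bot answers with whatever its creator's connection can reach.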
The following command demonstrates how an attacker might begin the reconnaissance phase to identify these bots:
python copilot-studio-hunter.py deep-scan --domain yourcompany.com
This tool automates the discovery of bots by leveraging the predictable URL structures used by the Power Platform. Once a bot is found, the attacker can use the chat interface to perform data exfiltration. This is a textbook case of Identification and Authentication Failures: the bot has no way of verifying that the person asking for the "2024 Layoff Plan" is actually authorized to see it.
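The enumeration idea behind such tooling can be sketched in a few lines. The URL template below is hypothetical and simplified (the real endpoint format is internal to Microsoft and changes over time), and the bot schema names are guesses an attacker would feed in from a wordlist; the sketch only illustrates deriving candidate endpoints from a tenant identifier.

```python
# Hypothetical URL template, for illustration only: bots in a tenant's
# default environment have been reachable under a hostname derived from the
# tenant GUID, with each bot addressed by its schema name.
TEMPLATE = ("https://default{env}.environment.api.powerplatform.com"
            "/powervirtualagents/botsbyschema/{bot}/canvasstartconversation")

def candidate_endpoints(tenant_id: str, bot_names: list[str]) -> list[str]:
    # The default environment suffix is commonly the tenant GUID with the
    # dashes removed (simplified here).
    env = tenant_id.replace("-", "")
    return [TEMPLATE.format(env=env, bot=b) for b in bot_names]

# Wordlist of plausible bot schema names (hypothetical examples).
wordlist = ["crabc_hrBot", "crabc_itHelpdesk", "crabc_faq"]
urls = candidate_endpoints("11111111-2222-3333-4444-555555555555", wordlist)
```

An attacker would then request each candidate URL and keep any endpoint that starts a conversation, which is why disabling unused public channels shrinks this attack surface directly.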
The Fallacy of Data Loss Prevention
Many organizations rely on the built-in Power Platform Data Loss Prevention (DLP) policies to secure these environments. However, it is critical to understand that these policies are governance tools, not security boundaries. They are designed to prevent accidental data movement between connectors, but they do not prevent a user from intentionally or accidentally misconfiguring a bot to leak data.
The "DLP" toggles are essentially governance switches. They can prevent a bot from connecting to a public website, but they do not stop a bot from reading a sensitive file on SharePoint and outputting its contents into a chat window. Relying on these policies to secure sensitive data is a mistake. You must treat the bot's configuration as a high-risk asset.
Hardening Your Environment
Defenders need to move beyond the default settings immediately. First, audit all existing bots in your tenant. If a bot does not need to be public, ensure that public access is disabled. Second, review the permissions of the service accounts used by these bots. If a bot only needs access to a specific folder in SharePoint, do not grant it access to the entire site.
Finally, monitor the audit logs for unusual query patterns. If you see a single user or an external IP address querying a bot for a large volume of information or attempting to list all available documents, that is a clear indicator of an ongoing exfiltration attempt.
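The detection logic described above can be prototyped against exported audit records. The record format, markers, and threshold below are all assumptions for illustration; real Copilot Studio telemetry would come from your Purview or Power Platform audit log export and need mapping into this shape.

```python
from collections import Counter

# Hypothetical flattened audit records (source IP, query text).
records = [
    {"ip": "203.0.113.7",  "query": "list all available documents"},
    {"ip": "203.0.113.7",  "query": "summarize 2024 salary review"},
    {"ip": "203.0.113.7",  "query": "show files in HR site"},
    {"ip": "198.51.100.2", "query": "what are the office hours?"},
]

# Phrases suggesting document enumeration rather than normal Q&A.
ENUMERATION_MARKERS = ("list all", "show files", "every document")
VOLUME_THRESHOLD = 2  # queries per source; tune to your baseline

def flag_suspicious(records):
    """Flag source IPs that are unusually chatty or enumerating documents."""
    per_ip = Counter(r["ip"] for r in records)
    noisy = {ip for ip, n in per_ip.items() if n > VOLUME_THRESHOLD}
    enumerating = {r["ip"] for r in records
                   if any(m in r["query"].lower() for m in ENUMERATION_MARKERS)}
    return noisy | enumerating

print(flag_suspicious(records))  # prints {'203.0.113.7'}
```

Even this crude combination of a volume threshold and keyword markers separates the single external address performing enumeration from ordinary users, which is the pattern to alert on.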
The speed at which these bots can be deployed is their greatest strength and their greatest weakness. As a researcher or pentester, your goal should be to identify these "shadow" bots before they are used to leak your organization's most sensitive data. Start by mapping your tenant's bot landscape and testing the authentication boundaries of every assistant you find. The path of least resistance is usually the one that leads directly to your internal data.