Your AI Assistant Has a Big Mouth: A New Side-Channel Attack
This talk demonstrates a novel side-channel attack that exploits token-length variations in streaming responses from Large Language Models (LLMs) to reconstruct encrypted traffic. By analyzing packet sizes and bandwidth patterns, an attacker can infer the content of private AI assistant interactions. The researchers show how to train a custom model to translate these token-length sequences into plain text, succeeding on more than half of first sentences. The presentation includes a proof-of-concept tool, GPT-Keylogger, and discusses effective mitigations like random padding and response buffering.
How Token-Length Side Channels Expose Encrypted AI Assistant Traffic
TLDR: Researchers at Ben-Gurion University have demonstrated a side-channel attack that reconstructs encrypted LLM responses by analyzing packet sizes and bandwidth patterns. By training a custom model to map token-length sequences to plain text, attackers can bypass TLS encryption to read private AI interactions. This vulnerability affects major AI vendors and highlights the critical need for response buffering and padding in streaming AI services.
Security researchers often assume that if a connection is wrapped in TLS, the payload is opaque to anyone sniffing the wire. We focus on the handshake, the certificate validation, and the cipher suites, trusting that the encryption handles the rest. The research presented at DEF CON 2024 by Yisroel Mirsky and his team at the Offensive AI Research Lab shatters that assumption for streaming AI assistants. They proved that even when the traffic is encrypted, the metadata—specifically the packet sizes—leaks enough information to reconstruct the entire conversation.
The Mechanics of the Leak
The vulnerability stems from how LLMs stream their output. To provide a responsive user experience, services like ChatGPT or Microsoft Copilot do not wait for the entire response to be generated before sending it to the client. Instead, they stream tokens as they are generated.
The researchers discovered that for many vendors, these tokens are sent in individual packets. Because these tokens are not padded, the size of each packet directly correlates to the length of the token being transmitted. By capturing this traffic, an attacker can build a sequence of packet sizes that represents the token-length sequence of the response.
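The core observation can be sketched in a few lines. This is a simplified illustration, not the researchers' tool: it assumes each token travels in its own TLS record and that the per-record framing overhead (record header, SSE framing, authentication tag) is a constant the attacker can estimate from the capture.

```python
# Sketch: recover token lengths from observed TLS record sizes.
# Assumes one token per record and a fixed, known framing overhead --
# both simplifications of the real attack.

def token_lengths(record_sizes, overhead):
    """Subtract the fixed framing overhead from each record size,
    leaving the length of the plaintext token it carried."""
    return [size - overhead for size in record_sizes if size > overhead]

# Example: records of 29, 31 and 34 bytes with 28 bytes of overhead
# correspond to tokens of length 1, 3 and 6.
print(token_lengths([29, 31, 34], 28))  # -> [1, 3, 6]
```

Because the overhead is constant, the *differences* between record sizes survive encryption untouched, and that is all the attack needs.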
This is not just a theoretical exercise in traffic analysis. The team built a tool called GPT-Keylogger, which automates the process of sniffing, filtering, and extracting these sequences. Once the sequence is extracted, the attacker uses a secondary model—trained on a large corpus of LLM responses—to translate those lengths back into the original text.
From Packet Sizes to Plain Text
The technical brilliance of this attack lies in the training phase. The researchers did not need to break the underlying encryption. Instead, they treated the problem as a translation task. They gathered a large dataset of LLM responses, converted them into token-length sequences, and trained a model (based on Flan-T5) to perform the inverse operation.
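Preparing that training data might look roughly like this. The `to_length_sequence` helper is hypothetical, and a naive split on spaces stands in for the vendor's actual subword tokenizer, which is what the researchers would use in practice.

```python
# Sketch of building training pairs for the inverse model:
# (token-length sequence -> original text). A whitespace split is a
# stand-in for the service's real tokenizer.

def to_length_sequence(text):
    # Keep the leading space attached to each token, mirroring how
    # subword tokenizers typically handle word boundaries.
    words = text.split(" ")
    tokens = [words[0]] + [" " + w for w in words[1:]]
    return [len(t) for t in tokens]

responses = ["I cannot help with that request."]
pairs = [(to_length_sequence(r), r) for r in responses]
print(pairs[0][0])  # -> [1, 7, 5, 5, 5, 9]
```

Trained on millions of such pairs, the inverse model learns which word sequences plausibly produce a given run of lengths.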
When the network is quiet, the correlation between packet size and token length is nearly perfect. However, real-world networks are noisy. The researchers encountered two primary challenges: message size limits (MTU) and message buffering. When a response is too large, it gets fragmented; when the network is congested, the server buffers multiple tokens into a single packet.
To handle this, the team implemented a clever heuristic. They identified that the first sentence of an LLM response often contains the core intent or "essence" of the answer. By focusing their training on these first sentences, they achieved an attack success rate of over 50 percent. Even when the model failed to get the exact wording, it often captured the semantic meaning, which is more than enough for an attacker looking to exfiltrate sensitive data.
Real-World Implications for Pentesters
If you are performing a red team engagement or a security assessment, this technique changes how you view "encrypted" traffic. During a man-in-the-middle scenario, you are no longer limited to just seeing the destination IP or the volume of data. You can now potentially infer the content of the user's interaction with an AI assistant.
Imagine a user in a corporate environment asking an AI to debug a proprietary codebase or summarize a confidential document. Even if the traffic is encrypted, an attacker on the same local network—perhaps in a coffee shop or a compromised office segment—can sniff the traffic and reconstruct the AI's response. This falls squarely under OWASP A02:2021 – Cryptographic Failures, as the implementation fails to protect the confidentiality of the data in transit despite the use of encryption.
Mitigating the Side Channel
Defending against this is surprisingly straightforward, provided the vendor is willing to trade a small amount of latency for security. The researchers worked with several vendors who have already begun implementing fixes. The most effective defense is response buffering. By forcing the server to wait and send larger, uniform chunks of data rather than individual tokens, the vendor destroys the correlation between packet size and token length.
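A server-side buffering scheme can be sketched as follows. This is an illustrative example under the assumptions stated in the comments, not any vendor's actual implementation.

```python
# Sketch of response buffering: instead of flushing every token as it
# is generated, accumulate tokens and emit only fixed-size chunks, so
# observed packet sizes no longer track individual token lengths.

def buffered_stream(tokens, chunk_size=64):
    buf = ""
    for tok in tokens:
        buf += tok
        while len(buf) >= chunk_size:
            yield buf[:chunk_size]
            buf = buf[chunk_size:]
    if buf:
        # Pad the final flush so even the tail leaks nothing.
        yield buf.ljust(chunk_size)

chunks = list(buffered_stream(["I", " am", " sorry", ","], chunk_size=8))
# Every chunk the wire sees is exactly 8 characters long.
```

The cost is latency: the user sees text arrive in chunks rather than token by token, which is exactly the trade-off the researchers describe.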
Other effective measures include:
- Random Padding: Adding junk data to packets to normalize their sizes.
- Padding to the Nearest Value: Rounding packet sizes to a fixed interval (e.g., multiples of 100 bytes), which makes the delta between packets useless for an attacker.
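The padding variants above reduce to a one-line transform. The `pad_to_bucket` helper below is a hypothetical sketch: it rounds each payload up to the next multiple of a bucket size, collapsing many distinct token lengths onto the same wire size.

```python
# Sketch of "padding to the nearest value": round each outgoing
# payload up to a fixed bucket size so record sizes no longer
# reveal token lengths.

def pad_to_bucket(payload: bytes, bucket: int = 128) -> bytes:
    padded_len = -(-len(payload) // bucket) * bucket  # ceiling division
    return payload + b"\x00" * (padded_len - len(payload))

print(len(pad_to_bucket(b" sorry", 128)))   # -> 128
print(len(pad_to_bucket(b"x" * 130, 128)))  # -> 256
```

With a 128-byte bucket, every single-token packet looks identical on the wire; an attacker learns only a coarse upper bound on length, not the length itself.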
The era of AI integration is moving faster than our ability to secure it. This research serves as a reminder that we cannot rely on transport-layer security to mask application-layer side channels. As you integrate AI into your own workflows or test applications that do, look for these patterns. If you see a stream of packets that vary in size in direct proportion to the text being generated, you are looking at a potential leak. The next time you are on an engagement, don't just look for cleartext; look for the patterns in the noise.