Security BSides 2025

Data Splicing Attacks: Breaking Enterprise DLP from the Inside Out

Security BSides San Francisco

This talk introduces 'Data Splicing' attacks, a novel technique that bypasses enterprise Data Loss Prevention (DLP) solutions by manipulating data at the browser level before it reaches network-based inspection points. The researchers demonstrate how attackers can use data sharding, ciphering, transcoding, and insertion to evade regex-based DLP policies that rely on predictable data patterns. The presentation highlights the inherent limitations of endpoint and proxy-based DLP in modern, browser-centric enterprise environments. The speakers also release the 'Angry Magpie' open-source toolkit to help security professionals test their own DLP configurations against these bypass techniques.

How Data Splicing Attacks Bypass Enterprise DLP Solutions

TLDR: Modern enterprise Data Loss Prevention (DLP) solutions often rely on regex-based inspection of network traffic, which fails to account for data manipulation occurring within the browser. By using techniques like data sharding, ciphering, and transcoding, attackers can exfiltrate sensitive information while remaining invisible to these security controls. Security professionals should use the Angry Magpie toolkit to audit their own DLP configurations and identify these blind spots.

Enterprise security teams have spent years building complex architectures to prevent data exfiltration. They deploy Secure Web Gateways (SWG), Cloud Access Security Brokers (CASB), and endpoint agents, all designed to inspect traffic for sensitive patterns such as credit card numbers or PII. The assumption is that if you can inspect the network stream or the file upload, you can stop the leak. This research demonstrates that the assumption is fundamentally flawed.

The Blind Spot in Browser-Centric Workflows

Most DLP solutions operate on a simple premise: they look for specific patterns in transit. If a user uploads a file to Google Drive or sends a message via a SaaS application, the proxy or endpoint agent intercepts the data, runs a regex check, and blocks the action if a match is found. However, this approach ignores the fact that the browser is a full-fledged compute environment.

Attackers do not need to send raw, sensitive data over the wire. They can manipulate the data within the browser's memory or via client-side scripts before the DLP solution ever sees it. This is the core of the Data Splicing attack. By transforming the data into a format that does not match the expected regex patterns, the attacker effectively blinds the security control.

Mechanics of the Bypass

The research identifies five primary techniques for data splicing: data smuggling via alternate channels, data sharding, data ciphering, data transcoding, and data insertion. Each method exploits the fact that DLP engines are looking for static, predictable patterns.

Data Sharding

Regex-based DLP policies are often configured to look for specific, contiguous strings. A policy looking for a 16-digit credit card number expects to see that sequence in a single block. If the attacker splits the file into small shards, each shorter than the pattern being matched, no single request ever contains the full pattern and the DLP engine never sees it. The attacker then reassembles the file on the destination server.
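The sketch below shows why sharding defeats a contiguous-pattern rule. It assumes a hypothetical DLP rule, `CARD_PATTERN = /\d{16}/`, that is applied separately to each outbound payload. The helper names `dlpBlocks` and `shard` are illustrative, and the card number is a widely published test number, not real data.

```javascript
// Hypothetical DLP rule: a contiguous 16-digit card number (illustrative, not any vendor's policy)
const CARD_PATTERN = /\d{16}/;

// Returns true if a single inspected payload matches the rule
function dlpBlocks(payload) {
  return CARD_PATTERN.test(payload);
}

// Split a string into fixed-size shards; each shard would travel as its own request
function shard(text, size) {
  const shards = [];
  for (let i = 0; i < text.length; i += size) {
    shards.push(text.slice(i, i + size));
  }
  return shards;
}

const secret = 'card=4111111111111111'; // well-known test card number
const shards = shard(secret, 8);        // "card=411", "11111111", "11111"

console.log(dlpBlocks(secret));          // true: the whole payload matches
console.log(shards.map(dlpBlocks));      // every shard: false
console.log(shards.join('') === secret); // true: the receiver reassembles the original
```

The shard size only has to be smaller than the shortest string the rule can match. Because the inspection is stateless per request, no shard on its own ever triggers the rule.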

Data Transcoding

This is perhaps the most elegant bypass. If the data is encoded with Base64 or another common format before the upload, a DLP engine that does not decode the payload sees a string of seemingly random characters rather than the sensitive data it is programmed to detect. A short JavaScript snippet can intercept the file upload and encode it in the browser. The receiving server then decodes it.

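```javascript
// Simple example of intercepting a file upload and Base64-encoding it in the browser
const fileInput = document.querySelector('input[type="file"]');
fileInput.addEventListener('change', (e) => {
  const file = e.target.files[0];
  if (!file) return; // the file picker was cancelled
  const reader = new FileReader();
  reader.onload = (event) => {
    // readAsDataURL yields "data:<mime>;base64,<payload>"; keep only the payload
    const dataUrl = event.target.result;
    const base64Data = dataUrl.slice(dataUrl.indexOf(',') + 1);
    // Upload base64Data instead of the raw file; the receiving server decodes it
  };
  reader.readAsDataURL(file);
});
```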
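The next sketch shows what a pattern-only inspector would see on the wire after the snippet above runs. It uses the same hypothetical rule (`CARD_PATTERN = /\d{16}/`) and a test card number, and it assumes that the DLP engine does not decode Base64 before matching.

```javascript
// Hypothetical DLP rule: a contiguous 16-digit card number (illustrative only)
const CARD_PATTERN = /\d{16}/;

const secret = 'card=4111111111111111'; // well-known test card number
const encoded = btoa(secret);           // what a non-decoding inspector sees in transit
const decoded = atob(encoded);          // what the receiving server recovers

console.log(encoded);                    // "Y2FyZD00MTExMTExMTExMTExMTEx"
console.log(CARD_PATTERN.test(secret));  // true
console.log(CARD_PATTERN.test(encoded)); // false
console.log(decoded === secret);         // true
```

Base64 maps every 3 input bytes to 4 output characters. Runs of digits therefore come out as mixed letters and digits, and the rule can no longer match.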

Data Insertion

When a DLP policy is too rigid, it can be broken simply by inserting noise. If a policy looks for a specific sequence of characters, inserting a single character or a hidden HTML tag between each character of the sensitive string will cause the regex to fail. This is particularly effective in OWASP-defined data leakage scenarios where the policy is not robust enough to handle obfuscation.
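A minimal sketch of the insertion idea follows. It uses the same hypothetical `/\d{16}/` rule. The zero-width space (`\u200B`) is one choice of "single character" made for this illustration. The helpers `insertNoise` and `stripNoise` are hypothetical names.

```javascript
// Hypothetical DLP rule: a contiguous 16-digit card number (illustrative only)
const CARD_PATTERN = /\d{16}/;

// Zero-width space: invisible when rendered, but it breaks up the digit run
const NOISE = '\u200B';

// Interleave the noise character between every character of the input
function insertNoise(text) {
  return text.split('').join(NOISE);
}

// What the receiving side does to recover the original string
function stripNoise(text) {
  return text.split(NOISE).join('');
}

const secret = '4111111111111111'; // well-known test card number
const noisy = insertNoise(secret);

console.log(CARD_PATTERN.test(secret));    // true
console.log(CARD_PATTERN.test(noisy));     // false
console.log(stripNoise(noisy) === secret); // true
```

A policy that normalizes input before matching would catch this specific variant. The attacker, however, chooses the noise, which is why rigid, single-pattern policies are the weak point.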

Real-World Applicability

For a pentester, these techniques are gold. During an engagement, you are often faced with a locked-down environment where you cannot install custom tools. However, you almost always have access to the browser's developer console.

If you are testing an organization's DLP, start by identifying the SaaS applications it uses for file sharing. Then try to upload a file containing dummy credit card numbers. If the upload is blocked, you have your baseline. Next, apply one of the splicing techniques, using the Angry Magpie toolkit to automate the process. If the spliced upload succeeds, the DLP solution never recognized the exfiltration, because it saw only an encoded or fragmented stream of data.
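For the baseline file, the dummy numbers should look real to the detector. Many card-number detectors apply a Luhn checksum to cut false positives. The sketch below assumes that such a check is in play and builds a small CSV from widely published payment test card numbers, which pass Luhn but are not tied to real accounts. `luhnValid`, `buildBaselineCsv`, and the CSV layout are hypothetical choices for illustration.

```javascript
// Luhn checksum: many card-number detectors apply it to reduce false positives
function luhnValid(digits) {
  let sum = 0;
  let double = false; // double every second digit, counting from the right
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = Number(digits[i]);
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

// Widely published payment test card numbers (Visa, Mastercard, Amex formats)
const TEST_CARDS = ['4111111111111111', '5555555555554444', '378282246310005'];

// Build CSV text for a baseline upload; refuse any number that would fail Luhn
function buildBaselineCsv(cards) {
  const rows = ['name,card_number'];
  cards.forEach((card, i) => {
    if (!luhnValid(card)) throw new Error(`not Luhn-valid: ${card}`);
    rows.push(`dummy_user_${i + 1},${card}`);
  });
  return rows.join('\n');
}

const baseline = buildBaselineCsv(TEST_CARDS);
console.log(baseline);
```

First upload `baseline` unmodified to confirm that the policy blocks it. Then upload the same content after one of the splicing transforms, so that any difference in outcome comes from the transform alone.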

The Defensive Reality

Defenders are in a tough spot. You cannot simply block all encrypted or encoded traffic, as that would break the modern web. The solution is not to build a better regex; it is to move away from pattern-based detection toward behavioral analysis.

If your DLP solution only looks for patterns, it is already behind. You need to monitor for anomalous browser behavior, such as the execution of unauthorized scripts or file transfers over channels other than ordinary HTTP uploads, like WebSockets or WebRTC data channels. These channels can bypass traditional proxies, and they are rarely inspected with the same rigor as standard HTTP POST requests.
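As one illustration of behavioral monitoring, the sketch below wraps a channel's `send` method and flags a channel once its cumulative outbound volume crosses a threshold. It assumes a hypothetical monitoring extension that can install the wrapper on `WebSocket.prototype` and `RTCDataChannel.prototype` before page scripts run. A script that captures the original `send` first would evade it. The threshold and helper names are also hypothetical.

```javascript
// Approximate outbound size of a send() payload (strings counted as UTF-8 bytes)
function payloadBytes(data) {
  if (typeof data === 'string') return new TextEncoder().encode(data).length;
  if (data instanceof ArrayBuffer || ArrayBuffer.isView(data)) return data.byteLength;
  if (typeof Blob !== 'undefined' && data instanceof Blob) return data.size;
  return 0;
}

// Wrap proto.send so every call is reported before the original send runs
function instrumentSend(proto, channelName, report) {
  const originalSend = proto.send;
  proto.send = function (data) {
    report({ channel: channelName, bytes: payloadBytes(data), at: Date.now() });
    return originalSend.call(this, data);
  };
}

// Raise one alarm per channel once its cumulative volume passes the threshold
function makeVolumeAlarm(thresholdBytes, onAlarm) {
  const totals = new Map();
  const alarmed = new Set();
  return (event) => {
    const total = (totals.get(event.channel) || 0) + event.bytes;
    totals.set(event.channel, total);
    if (total > thresholdBytes && !alarmed.has(event.channel)) {
      alarmed.add(event.channel);
      onAlarm(event.channel, total);
    }
  };
}

// In a browser, from the hypothetical extension:
// const alarm = makeVolumeAlarm(5 * 1024 * 1024, (ch, n) => console.warn(`${ch}: ${n} bytes sent`));
// instrumentSend(WebSocket.prototype, 'websocket', alarm);
// instrumentSend(RTCDataChannel.prototype, 'webrtc', alarm);
```

The wrapper ignores payload content. It therefore still works when the data has been sharded, encoded, or padded with noise, which is the behavior-over-patterns point made above.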

The browser is the new perimeter, and it is currently the most poorly defended part of the enterprise. If you are not testing your DLP against these types of client-side manipulations, you are not testing it at all. Stop relying on static patterns and start looking at what the browser is actually doing with your data.

Talk Type

research presentation

Difficulty

advanced