DEF CON 2025

Regex for Hackers

DEFCONConference | 7,261 views | 50:08 | 6 months ago

This talk demonstrates how to leverage regular expressions for effective bug hunting, specifically for identifying vulnerabilities like XSS, RCE, and SSRF. It covers the practical application of regex in code reviews and OSINT asset discovery to find sensitive information in public repositories. The presentation highlights common pitfalls in regex implementation that lead to security flaws and provides actionable patterns for pentesters. It also includes a demonstration of using regex to parse heap dumps and identify hardcoded secrets.

Regex for Hackers: Turning Pattern Matching into Bug Bounties

TLDR: Regular expressions are often treated as a chore, but they are one of the most powerful tools for finding vulnerabilities at scale. This post breaks down how to use regex to identify XSS, RCE, and SSRF by parsing source code and heap dumps. By moving beyond basic string matching, you can automate the discovery of hardcoded secrets and insecure configurations across massive codebases.

Most security researchers treat regex as a necessary evil for filtering logs or writing simple Burp match-and-replace rules. That is a mistake. When you treat regex as a search pattern language rather than just a validation tool, you unlock the ability to find complex vulnerabilities in massive, distributed codebases that would otherwise be impossible to audit manually.

The Power of Pattern-Based Auditing

Effective bug hunting requires finding the needle in the haystack. When you are staring at a repository with thousands of files, you cannot review every hit for every possible sink by hand. You need to look for patterns that indicate developer intent or, more accurately, developer failure.

Consider reflected XSS. A developer might use a simple echo statement to output user input. A search for echo alone returns too much noise, and a search for the literal string echo $_GET misses every statement where other text sits between the two. Instead, you can use regex to target specific, dangerous patterns. By constructing a regex that looks for the intersection of input sources (such as $_GET) and output sinks (such as echo), you can filter out the noise and focus on the code that actually processes user-controlled data.
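A minimal sketch of that idea in Python, assuming the target is PHP and that "source meets sink" can be approximated as an output statement and a request superglobal inside the same statement. The pattern names, the list of encoder functions, and the sample code are illustrative assumptions, not patterns taken from the talk.

```python
import re

# Assumed heuristic: echo/print followed by a superglobal before the next ';'.
# [^;]* keeps the match inside a single PHP statement.
XSS_SINK = re.compile(r"\b(?:echo|print)\b[^;]*\$_(?:GET|POST|REQUEST|COOKIE)\b")

# Calls that usually mean the value is encoded or cast before output (assumed list).
ENCODED = re.compile(r"\b(?:htmlspecialchars|htmlentities|intval)\s*\(")


def find_reflected_output(php_source):
    """Return (line number, line) pairs that print request data with no obvious encoder."""
    hits = []
    for number, line in enumerate(php_source.splitlines(), start=1):
        if XSS_SINK.search(line) and not ENCODED.search(line):
            hits.append((number, line.strip()))
    return hits


# Hypothetical PHP used only to exercise the pattern.
SAMPLE = """<?php
echo "Hello, " . $_GET['name'];
echo htmlspecialchars($_GET['name']);
echo "static text";
$x = $_POST['id'];
"""

for number, line in find_reflected_output(SAMPLE):
    print(number, line)
```

Only the second line of the sample is reported. The sketch works line by line, so it will miss multi-line statements and short echo tags; treat the output as a list of places to read, not as confirmed findings.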

The same logic applies to RCE. When auditing PHP applications, you are looking for user input reaching system-level execution functions. A pattern like system\s*\( is a starting point, but on its own it flags every call, including harmless ones with constant arguments, and it misses sibling functions such as exec, shell_exec, and passthru. A more robust approach matches the function call only when its argument is derived from a superglobal like $_GET or $_POST.
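One way to express "execution function whose argument comes from a superglobal" is sketched below. The sink list is an assumption and should be extended for the codebase under review; the lookbehind is there so that names such as curl_exec or method calls like ->exec( are not reported.

```python
import re

# Assumed list of PHP command-execution functions.
EXEC_SINKS = r"(?:system|exec|shell_exec|passthru|popen|proc_open)"
SUPERGLOBAL = r"\$_(?:GET|POST|REQUEST|COOKIE)"

# Sink call whose argument list mentions a superglobal before the statement ends.
# (?<![\w$>]) rejects longer identifiers (curl_exec), variables ($system) and methods (->exec).
RCE_DIRECT = re.compile(rf"(?<![\w$>]){EXEC_SINKS}\s*\([^;]*{SUPERGLOBAL}\b")


def find_command_injection(php_source):
    """Return the matched text for every sink call that takes request data directly."""
    return [m.group(0) for m in RCE_DIRECT.finditer(php_source)]


# Hypothetical PHP used only to exercise the pattern.
SAMPLE = """<?php
system("ls -la");
system("ping -c 1 " . $_GET['host']);
curl_exec($ch);
$out = shell_exec($_POST["cmd"]);
"""

for match in find_command_injection(SAMPLE):
    print(match)
```

The constant system("ls -la") call and curl_exec are skipped; the two calls that concatenate or pass request data are reported. Input that is first copied into a local variable is not caught by this single pattern.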

Practical Regex for Code Review

When you have access to source code, you should be using regex to map out the application's attack surface. The goal is to identify where the application handles sensitive data or interacts with the underlying system.

For instance, if you are looking for potential SSRF, you need to find where the application makes outbound requests. You can use grep with a regex pattern to find instances where a URL is passed to a request library.

grep -rE '(curl_init|file_get_contents)\s*\(\s*\$_(GET|POST|REQUEST)' .

This command is simple, but it is effective. It immediately narrows your focus to code paths where the URL is directly controlled by the user. If the application does not implement strict validation, you have a clear path to an SSRF vulnerability. You can find more information on how to test for these flaws in the OWASP SSRF documentation.
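In PHP the URL often reaches the request library through an intermediate variable, for example via curl_setopt with CURLOPT_URL, which a single-line grep cannot see. The following is a rough two-pass sketch under stated assumptions: it collects variables assigned directly from a superglobal, then flags fetch calls that use one of them. It is flow-insensitive, the function list is an assumption, and the sample is hypothetical.

```python
import re

# Pass 1: variables assigned straight from request input, e.g. $url = $_GET['u'];
TAINTED_ASSIGN = re.compile(r"\$(\w+)\s*=\s*\$_(?:GET|POST|REQUEST)\b")
# Pass 2: calls that fetch a URL passed as one of their arguments (assumed list).
FETCH_CALL = re.compile(r"\b(?:curl_init|curl_setopt|file_get_contents)\s*\(([^;]*)\)")
DIRECT_INPUT = re.compile(r"\$_(?:GET|POST|REQUEST)\b")


def find_ssrf_candidates(php_source):
    """Return (line number, line) pairs where a fetch call receives request-derived data."""
    tainted = set(TAINTED_ASSIGN.findall(php_source))
    hits = []
    for number, line in enumerate(php_source.splitlines(), start=1):
        for call in FETCH_CALL.finditer(line):
            args = call.group(1)
            via_variable = any(
                re.search(r"\$" + re.escape(name) + r"\b", args) for name in tainted
            )
            if via_variable or DIRECT_INPUT.search(args):
                hits.append((number, line.strip()))
                break  # one report per line is enough
    return hits


# Hypothetical PHP used only to exercise the two passes.
SAMPLE = """<?php
$target = $_GET['url'];
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $target);
curl_exec($ch);
echo file_get_contents("https://example.com/status");
"""

for number, line in find_ssrf_candidates(SAMPLE):
    print(number, line)
```

Only the curl_setopt line is reported here. Because the taint set is built per file with no notion of scope or sanitization, every hit still needs a manual read.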

Parsing Heap Dumps for Secrets

One of the most overlooked areas in bug hunting is the analysis of memory dumps. When you find an exposed Spring Boot Actuator endpoint on a Java application, you can often download a heap dump from it (typically /actuator/heapdump). This file is a goldmine of information, but it is massive and unreadable to the naked eye.

Instead of trying to load these files into a GUI tool, use strings piped into grep with a targeted regex. You are looking for patterns that match common secrets, such as AWS access keys or JWTs.

strings heapdump.bin | grep -E "AKIA[0-9A-Z]{16}"

This regex targets the standard format for AWS access key IDs: the AKIA prefix followed by 16 uppercase letters or digits. Because the heap dump contains the entire state of the application's memory, you are not just looking at configuration files; you are looking at active sessions, database connections, and cached credentials. If you find a JWT, you can use a regex to extract it and then verify its structure.
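The extract-then-verify step can be sketched in Python as follows. This is an illustration rather than the tooling shown in the talk: the function names are hypothetical, the JWT pattern relies on the observation that a base64url-encoded JSON object whose first key starts with a letter begins with eyJ, and "verify its structure" is taken to mean that header and payload decode to JSON and the header has an alg field.

```python
import base64
import json
import re

AWS_KEY_ID = re.compile(rb"\bAKIA[0-9A-Z]{16}\b")
# Three base64url segments; header and payload both start with "eyJ".
JWT = re.compile(rb"\beyJ[A-Za-z0-9_-]{8,}\.eyJ[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}")


def decode_segment(segment):
    """Decode one base64url JWT segment, restoring the stripped padding first."""
    padded = segment + b"=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def scan_dump(data):
    """Return AWS access key IDs and structurally valid JWTs found in raw bytes."""
    key_ids = sorted({m.group(0).decode("ascii") for m in AWS_KEY_ID.finditer(data)})
    jwts = []
    for m in JWT.finditer(data):
        header_b64, payload_b64, _signature = m.group(0).split(b".")
        try:
            header = decode_segment(header_b64)
            payload = decode_segment(payload_b64)
        except ValueError:  # bad base64, bad UTF-8, or bad JSON
            continue
        if isinstance(header, dict) and "alg" in header:
            jwts.append({"token": m.group(0).decode("ascii"),
                         "header": header, "payload": payload})
    return {"aws_key_ids": key_ids, "jwts": jwts}
```

A typical call would be scan_dump(open("heapdump.bin", "rb").read()), assuming the dump fits in memory. The sketch only sees byte-per-character strings; secrets stored as UTF-16 would need a separate pattern. The signature is not checked, so a decoded token is a lead to investigate, not proof that it is still valid.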

The Danger of Over-Permissive Regex

The most critical lesson for any researcher is that regex is just as dangerous for the developer as it is useful for the attacker. Developers frequently use regex to "secure" their applications, and they almost always get it wrong.

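The most common mistake is failing to use anchors. If a developer writes a regex to validate a URL, they might use https://trusted.com with no anchors or delimiter. Because nothing pins down where the hostname ends, an attacker who controls attacker.com can create the subdomain trusted.com.attacker.com, and https://trusted.com.attacker.com passes the check. Without a ^ anchor at the start, a URL that merely contains the trusted string, such as https://attacker.com/?next=https://trusted.com, can pass as well. This is a classic validation bypass, and depending on what the check protects it leads to SSRF, open redirects, or broken access control.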
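The difference is easy to demonstrate. The sketch below compares a hypothetical unanchored check with one that anchors the start and requires the hostname to end at a path, query, fragment, or the end of the string. The stricter pattern is one possible fix, not the only one; it deliberately rejects ports and userinfo.

```python
import re

# Hypothetical allow-list checks for the host trusted.com.
UNANCHORED = re.compile(r"https://trusted\.com")
ANCHORED = re.compile(r"\Ahttps://trusted\.com(?:[/?#]|\Z)")


def allowed_unanchored(url):
    return UNANCHORED.search(url) is not None


def allowed_anchored(url):
    return ANCHORED.match(url) is not None


CANDIDATES = [
    "https://trusted.com/login",
    "https://trusted.com.attacker.com/",
    "https://attacker.com/?next=https://trusted.com",
    "https://trusted.com@attacker.com/",
]

for url in CANDIDATES:
    print(f"{url:50} unanchored={allowed_unanchored(url)} anchored={allowed_anchored(url)}")
```

The unanchored check accepts all four URLs, while the anchored one accepts only the first. In real code, parsing the URL and comparing the hostname exactly is usually safer than any regex.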

Another frequent failure is the misuse of the period character. In regex, an unescaped . matches any single character (except a newline, by default). If a developer intends to match a literal dot in a domain name but forgets to escape it, the pattern accepts any character in that position: www.trusted.com also matches wwwxtrusted.com, a domain an attacker can register. This is how many open redirect vulnerabilities are born.
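A short illustration of the unescaped dot, using a hypothetical allow-list for www.trusted.com. The third pattern shows re.escape, which builds the literal form so the escaping cannot be forgotten.

```python
import re

LOOSE = re.compile(r"^https://www.trusted.com/")      # dots left unescaped
STRICT = re.compile(r"^https://www\.trusted\.com/")   # dots escaped by hand
BUILT = re.compile("^https://" + re.escape("www.trusted.com") + "/")

LOOKALIKE = "https://wwwxtrusted.com/redirect"
GENUINE = "https://www.trusted.com/redirect"

for name, pattern in (("loose", LOOSE), ("strict", STRICT), ("built", BUILT)):
    print(name, bool(pattern.match(GENUINE)), bool(pattern.match(LOOKALIKE)))
```

All three patterns accept the genuine URL, but only the loose one also accepts the lookalike domain.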

Building Your Own Recon Pipeline

Stop relying on generic tools that everyone else is using. The best bug hunters build their own pipelines. By using regex to scrape GitHub for specific patterns, you can identify unique attack vectors that automated scanners miss.

Create a list of common patterns for the frameworks you target. If you are hunting on a platform that uses Flask, build a regex that identifies all defined routes, such as the paths passed to @app.route decorators. If you are targeting Laravel, look for the Route:: definitions. Once you have these routes, you can use them to build a custom wordlist for your directory brute-forcing tools.
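As a rough sketch of that pipeline step, the following extracts route paths from Flask decorators and Laravel Route:: calls and reduces them to their static prefix for a wordlist. The patterns cover only the most common declaration styles, and the helper name and sample code are hypothetical.

```python
import re

# Flask: @app.route("/path") or @blueprint.get("/path") style decorators.
FLASK_ROUTE = re.compile(
    r"""@\w+\.(?:route|get|post|put|delete|patch)\(\s*["']([^"']+)["']"""
)
# Laravel: Route::get('/path', ...) and similar static calls.
LARAVEL_ROUTE = re.compile(
    r"""Route::(?:get|post|put|patch|delete|options|any)\(\s*["']([^"']+)["']"""
)


def build_wordlist(source):
    """Return sorted, de-duplicated static path prefixes found in the source text."""
    words = set()
    for pattern in (FLASK_ROUTE, LARAVEL_ROUTE):
        for path in pattern.findall(source):
            static = []
            for segment in path.strip("/").split("/"):
                # Stop at the first parameter: <int:id> in Flask, {id} in Laravel.
                if segment.startswith(("<", "{")):
                    break
                static.append(segment)
            if static and static != [""]:
                words.add("/".join(static))
    return sorted(words)


# Hypothetical source snippets used only to exercise the patterns.
FLASK_SAMPLE = '''
@app.route("/api/users/<int:user_id>")
def user(user_id): ...

@admin.get('/admin/export')
def export(): ...
'''
LARAVEL_SAMPLE = '''
Route::get('/internal/health', [HealthController::class, 'show']);
Route::post("/api/orders/{order}/refund", 'OrderController@refund');
'''

for word in build_wordlist(FLASK_SAMPLE + LARAVEL_SAMPLE):
    print(word)
```

Writing the result one entry per line gives a file that most directory brute-forcing tools accept as a wordlist. Routes registered dynamically or through resource helpers will not be found by these two patterns.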

This approach turns the target's own code against them. You are not guessing where the endpoints are; you are extracting them directly from the source. When you combine this with a deep understanding of how regex can be abused, you stop being just another user of automated scanners and start being a researcher who can find the bugs that matter.

Regex is not just for filtering; it is for finding. Start building your own patterns today, and you will find that the most interesting bugs are often hidden in plain sight, waiting for a well-crafted expression to bring them to light.

Talk Type: workshop

Difficulty: intermediate