DEF CON 2025

Context Aware Anomaly Detection in Automotive CAN Without Decoding


This talk demonstrates an anomaly detection technique for automotive Controller Area Network (CAN) bus traffic using Long Short-Term Memory (LSTM) and Variational Autoencoder (VAE) models. By treating raw CAN logs as time-series data, the approach identifies malicious injections without requiring prior protocol decoding or reverse engineering of specific vehicle message structures. The method effectively detects replay attacks, fuzzing, and cross-ECU message spoofing by identifying deviations from learned normal sequential patterns and entropy spikes. The presenter provides a practical data pipeline and model architecture for implementing this detection in automotive security operations centers.

Detecting Automotive CAN Bus Anomalies Without Reverse Engineering

TL;DR: Traditional automotive intrusion detection systems rely on brittle, signature-based rules that fail when faced with novel attacks or encrypted traffic. This research introduces a context-aware approach using Long Short-Term Memory (LSTM) networks and Variational Autoencoders (VAEs) to identify malicious CAN bus injections by analyzing traffic as time-series data. By focusing on sequence patterns and entropy rather than payload decoding, this method detects replay attacks and fuzzing without needing prior knowledge of proprietary vehicle protocols.

Automotive security remains a game of cat and mouse where the cat is often blind. Most current intrusion detection systems for Controller Area Network (CAN) buses are stuck in the past, relying on static, rule-based logic that breaks the moment a manufacturer changes a message ID or implements a new proprietary protocol. If you have ever spent time reverse engineering a vehicle, you know the pain of manually decoding thousands of frames just to find the one that controls the door locks or the steering angle. This research, presented at DEF CON 2025, shifts the focus from decoding individual messages to understanding the temporal context of the entire bus.

The Failure of Signature-Based Detection

The fundamental flaw in most automotive security tools is the assumption that you can define "normal" through static rules. CAN bus traffic is inherently noisy and bursty. When you rely on simple thresholding or signature matching, you inevitably trigger false positives during legitimate events like diagnostic sessions or firmware updates. Furthermore, attackers are increasingly using techniques that mimic valid traffic patterns, making it trivial to bypass systems that only look for specific, known-bad message IDs.

The research demonstrates that we should stop treating CAN frames as isolated packets and start treating them as a continuous time-series stream. By capturing the frequency of CAN IDs, the entropy of the payload bytes, and the timing deltas between messages, you can build a model that understands the "rhythm" of the bus. When an attacker injects a malicious frame, they disrupt this rhythm. Even if the injected frame looks perfectly valid in isolation, the model flags it because it breaks the established sequence.
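To make the "rhythm" idea concrete, here is a minimal sketch of its timing half, assuming frames arrive as (timestamp_in_seconds, can_id) tuples. It is not the LSTM-VAE from the talk: it simply learns the median period of each CAN ID from a clean capture and flags frames that arrive much sooner than that period allows. The helper names and the 0.5 tolerance are hypothetical choices for illustration.

```python
from statistics import median


def learn_periods(frames):
    """Learn each CAN ID's typical period (median gap) from normal traffic.

    frames: list of (timestamp_seconds, can_id) tuples, in time order.
    """
    last_seen = {}
    gaps = {}
    for ts, can_id in frames:
        if can_id in last_seen:
            gaps.setdefault(can_id, []).append(ts - last_seen[can_id])
        last_seen[can_id] = ts
    return {can_id: median(g) for can_id, g in gaps.items()}


def flag_early_frames(frames, periods, tolerance=0.5):
    """Flag frames arriving sooner than tolerance * the learned period for their ID."""
    last_seen = {}
    flagged = []
    for ts, can_id in frames:
        prev = last_seen.get(can_id)
        if prev is not None and can_id in periods:
            if ts - prev < tolerance * periods[can_id]:
                flagged.append((ts, can_id))
        last_seen[can_id] = ts
    return flagged


# Example: ID 0x19B normally repeats every 10 ms; one extra frame is injected.
normal = [(i * 0.010, 0x19B) for i in range(100)]
periods = learn_periods(normal)
attack = sorted(normal + [(0.503, 0x19B)])
print(flag_early_frames(attack, periods))  # → [(0.503, 411)]
```

The injected frame could carry a perfectly valid payload; it is caught only because it halves the expected gap for its ID, which is the intuition a learned sequence model generalizes across many IDs at once.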

Implementing the Pipeline

The practical implementation of this technique relies on a straightforward data pipeline. You start by collecting raw logs using candump, which is the standard utility for capturing CAN traffic on Linux-based systems. Once you have your normal.log and attack.log files, the preprocessing step extracts the essential features:

# Extracting features for the model
# 1. Timestamp (for calculating delta_t)
# 2. CAN ID (hexadecimal)
# 3. Payload bytes (up to 8 bytes)
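As a rough, runnable version of this preprocessing step, the sketch below parses lines in the log format that `candump -l` writes (for example `(1700000000.000000) can0 19B#0102030405060708`) into the three features listed above. It handles only classical CAN data frames and skips remote and CAN FD frames. The function name is hypothetical, and computing delta_t per CAN ID (rather than between consecutive frames on the whole bus) is an assumption, since the talk summary does not say which gap is used.

```python
import re

# Classical CAN data frame: "(timestamp) interface ID#HEXDATA", up to 8 data bytes.
LOG_LINE = re.compile(
    r"^\((\d+\.\d+)\)\s+(\S+)\s+([0-9A-Fa-f]{3,8})#((?:[0-9A-Fa-f]{2}){0,8})(?:\s.*)?$"
)


def parse_candump_log(lines):
    """Yield (timestamp, delta_t, can_id, payload) for each classical CAN data frame.

    delta_t is the gap to the previous frame with the same CAN ID (0.0 for the first).
    """
    last_seen = {}
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m:
            continue  # skip remote frames (#R), CAN FD (##), and malformed lines
        ts = float(m.group(1))
        can_id = int(m.group(3), 16)
        payload = bytes.fromhex(m.group(4))
        delta_t = ts - last_seen.get(can_id, ts)
        last_seen[can_id] = ts
        yield ts, delta_t, can_id, payload
```

Calling `list(parse_candump_log(f))` on an open `normal.log` would give one tuple per frame, ready to be grouped into fixed-length windows for the sequence model.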

The model uses an LSTM-VAE architecture. The LSTM component excels at learning the sequential dependencies of the bus, while the VAE adds probabilistic inference: rather than compressing each sequence to a single point, it learns a latent distribution of what normal traffic looks like. Instead of just outputting a binary "anomaly" flag, the model reconstructs each input sequence. When it encounters a sequence that deviates from the learned distribution, the reconstruction is poor and the reconstruction error is high. That error is the "anomaly score."
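The scoring side of an LSTM-VAE can be sketched without the network itself. The snippet below assumes an LSTM encoder has already produced a latent mean `mu` and log-variance `logvar` for a window, and an LSTM decoder has produced its reconstruction `x_hat`; those models are not shown. It computes the usual VAE training objective (reconstruction error plus the KL divergence of a diagonal Gaussian from a standard normal) and uses the reconstruction error alone as the anomaly score, with a threshold taken from a high quantile of scores on normal traffic. Mean squared error and the 0.99 quantile are illustrative assumptions, not details from the talk.

```python
import math


def reconstruction_error(x, x_hat):
    """Mean squared error between a flattened input window and its reconstruction.

    Used here as the anomaly score.
    """
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def kl_divergence(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, I)) for a diagonal Gaussian latent."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def vae_loss(x, x_hat, mu, logvar, beta=1.0):
    """Training objective: reconstruction term plus beta-weighted KL regularizer."""
    return reconstruction_error(x, x_hat) + beta * kl_divergence(mu, logvar)


def threshold_from_normal(scores, quantile=0.99):
    """Alert threshold: a high quantile of anomaly scores on held-out normal traffic."""
    ranked = sorted(scores)
    idx = min(len(ranked) - 1, int(quantile * len(ranked)))
    return ranked[idx]
```

At inference time the KL term is dropped: a window is scored with `reconstruction_error` and flagged when its score exceeds the threshold learned from clean captures, which is what lets the output be a graded score rather than a yes/no alert.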

Real-World Attack Detection

During the demonstration, the researcher used caringcaribou to fuzz a target CAN ID (0x19B). The model identified the fuzzing activity by flagging the resulting entropy spikes. Because the model is trained on the normal sequence of messages, it does not matter whether the attacker uses a known exploit or a novel injection technique: if the injection causes the timing or the sequence to drift from the baseline, the model catches it.
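The entropy signal can be illustrated with the standard library alone. This sketch computes Shannon entropy (in bits per byte, 0 to 8) over non-overlapping windows of payloads for a single CAN ID, takes the highest window entropy seen in normal traffic as the baseline, and flags windows that exceed it by a margin. `fuzz_payloads` is a hypothetical stand-in for random-payload fuzzing, not caringcaribou's actual behavior, and the window size and 2-bit margin are arbitrary choices.

```python
import math
import random
from collections import Counter


def byte_entropy(payloads):
    """Shannon entropy in bits per byte (0..8) over all bytes in a list of payloads."""
    counts = Counter(b for p in payloads for b in p)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def windowed_entropy(payloads, window=20):
    """Entropy of each non-overlapping window of `window` consecutive payloads."""
    return [byte_entropy(payloads[i:i + window])
            for i in range(0, len(payloads) - window + 1, window)]


def flag_entropy_spikes(normal_payloads, test_payloads, window=20, margin=2.0):
    """Indices of test windows whose entropy exceeds the normal maximum by `margin` bits."""
    baseline = max(windowed_entropy(normal_payloads, window))
    return [i for i, h in enumerate(windowed_entropy(test_payloads, window))
            if h > baseline + margin]


def fuzz_payloads(n, seed=0):
    """Hypothetical fuzzer stand-in: n random 8-byte payloads."""
    rng = random.Random(seed)
    return [bytes(rng.randrange(256) for _ in range(8)) for _ in range(n)]


# Example: 0x19B normally carries a 4-bit rolling counter plus constant bytes.
normal = [bytes([i % 16, 0x10, 0, 0, 0, 0, 0, 0x55]) for i in range(200)]
attack = normal[:100] + fuzz_payloads(40) + normal[100:160]
print(flag_entropy_spikes(normal, attack))  # → [5, 6]
```

Normal windows sit around 2 bits per byte because most bytes are constant, while windows of random payloads approach the 8-bit maximum, so the two fuzzed windows stand out without any knowledge of what the bytes of 0x19B mean.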

This approach is particularly effective against three common attack vectors:

Replay Attacks: The model detects the timing shift when an attacker replays a captured sequence at the wrong interval.
Fuzzing: The model flags the sudden increase in payload entropy as the attacker injects random data.
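Cross-ECU Spoofing: The model identifies that a message ID is being sent from an unexpected source or in an invalid context, which is a classic sign of a compromised ECU. Since CAN frames carry no sender address, the "source" has to be inferred from timing and sequence context rather than read from the frame.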
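Sequence context can be approximated, very crudely, with a first-order transition table. The sketch below is a deliberately simple substitute for the LSTM, not the talk's model: it records which CAN ID has been observed directly after each CAN ID in normal traffic and flags positions where an ID follows a predecessor it never followed during training. The function names and the three-ID example are hypothetical.

```python
from collections import defaultdict


def learn_transitions(id_sequence):
    """Record which CAN IDs were observed directly after each CAN ID in normal traffic."""
    seen = defaultdict(set)
    for prev, nxt in zip(id_sequence, id_sequence[1:]):
        seen[prev].add(nxt)
    return seen


def out_of_context(id_sequence, seen):
    """Positions where an ID follows a predecessor it never followed in training."""
    return [i + 1
            for i, (prev, nxt) in enumerate(zip(id_sequence, id_sequence[1:]))
            if nxt not in seen.get(prev, set())]


# Example: a spoofed 0x19B appears right after a legitimate 0x19B.
normal_ids = [0x0C4, 0x19B, 0x2A0] * 50
seen = learn_transitions(normal_ids)
print(out_of_context([0x0C4, 0x19B, 0x19B, 0x2A0, 0x0C4], seen))  # → [2]
```

On a real bus, asynchronous periodic messages interleave in many orders, so a hard whitelist like this would be noisy; the appeal of an LSTM is that it learns soft, longer-range expectations instead of a yes/no table.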

Moving Beyond Pcap Analysis

For a pentester, this research changes the workflow. Instead of spending days or weeks manually reverse engineering a proprietary protocol to find the "unlock" command, you can focus on identifying the baseline behavior of the vehicle. Once you have a clean capture of normal operation, you can train a model to flag any deviation. This allows you to identify the most interesting parts of the bus traffic—the anomalies—without needing to understand the underlying message structure.

Defenders should look to integrate these models into their Vehicle Security Operations Centers (VSOCs). While the research currently focuses on detection, providing an "anomaly score" rather than a simple alert is a significant improvement for incident response. It lets analysts prioritize the most suspicious traffic rather than chasing down thousands of false positives generated by legacy signature-based systems.

If you are working on automotive security, stop trying to decode every single bit. Start looking at the timing and the sequence. The next time you are on an engagement, try capturing a long-duration log of normal vehicle operation and see if you can build a baseline. You might find that the most effective way to secure the vehicle is not by understanding its language, but by recognizing when it starts speaking out of turn.

Talk Type

research presentation

Difficulty

advanced