Black Hat 2023

E-Meet (or Emit)? My Keystrokes: How Benign Screen-sharing Meetings Could Leak Typing Behaviors

This talk demonstrates a novel side-channel attack that reconstructs a user's keystrokes and typing patterns from screen-sharing video recordings. By tracking the movement of the text cursor and isolating character bounding boxes, an attacker can infer inter-key delay timings to bypass keystroke-based biometric authentication. The research highlights how seemingly benign screen-sharing sessions can lead to credential exposure without the need for installing malicious software on the victim's machine. The speaker introduces the 'Camstroke' technique and uses an LSTM-based model with beam search to predict passwords from the leaked typing patterns.

Your Screen-Sharing Habit Is Leaking Your Passwords

TLDR: Researchers have developed a side-channel attack that reconstructs keystrokes from screen-sharing video recordings by tracking the text cursor and character bounding boxes. This technique, dubbed Camstroke, allows an attacker to infer inter-key delay timings and bypass keystroke-based biometric authentication without installing any malware. For security professionals, this highlights a critical, often overlooked risk in remote work environments where sensitive data is typed while sharing a screen.

Screen-sharing is the backbone of modern collaboration, but it is also a massive, unmonitored attack surface. We spend hours in Zoom, Microsoft Teams, and Discord sessions, often treating our shared screens as private spaces. The research presented at Black Hat 2023 by Chrisando Ryan and Andry Chowanda shatters that illusion. They demonstrated that an attacker does not need to compromise a machine with a traditional keylogger to capture sensitive input. Instead, they can simply record the video stream of a screen-sharing session and use computer vision to reconstruct what the victim is typing.

The Mechanics of the Camstroke Attack

The core of this research relies on the fact that typing behavior is unique. This is the foundation of keystroke dynamics, a biometric method used to verify identity based on the rhythm, hold latency, and inter-key delay of a user. While most people assume this data is only accessible via kernel-level hooks or hardware implants, this attack proves that the visual representation of typing is sufficient.

The attack flow is simple in outline but technically demanding. First, the attacker records the victim's screen. Using OpenCV, the attacker then tracks the movement of the text cursor. The researchers observed that the most recent typing activity almost always occurs immediately to the left of the blinking cursor, so isolating the area around the cursor lets the attacker extract character bounding boxes. Even when the input is masked by bullet points or asterisks, the timing of these visual updates (the moment each new character box appears) reveals the inter-key delay.
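The timing step can be illustrated with a minimal sketch. It assumes an upstream detector (for example, an OpenCV-based one) has already produced the number of character boxes visible in the input field for each video frame; the function name `inter_key_delays` and the `char_counts` input format are hypothetical and not taken from the researchers' tooling.

```python
from typing import List


def inter_key_delays(char_counts: List[int], fps: float) -> List[float]:
    """Estimate inter-key delays (seconds) from per-frame character counts.

    char_counts[i] is the number of character boxes detected in frame i.
    This input format is an assumption made for illustration.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")

    event_times: List[float] = []
    prev = char_counts[0] if char_counts else 0
    for frame_idx, count in enumerate(char_counts[1:], start=1):
        if count > prev:
            # One event per newly visible box. Characters that appear within
            # the same frame cannot be separated, so they share a timestamp.
            event_times.extend([frame_idx / fps] * (count - prev))
        # A lower count (e.g. backspace) produces no event but resets the baseline.
        prev = count

    # Delay between each pair of consecutive character appearances.
    return [later - earlier for earlier, later in zip(event_times, event_times[1:])]


if __name__ == "__main__":
    counts = [0, 0, 1, 1, 1, 2, 2, 3]  # toy data, 10 frames per second
    print([round(d, 3) for d in inter_key_delays(counts, fps=10.0)])
```

Note that the resolution of any delay recovered this way is bounded by the frame interval (1/fps), so a low-frame-rate recording yields coarser timings.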

Once the inter-key delays are extracted, the attacker has a profile of the victim's typing rhythm. The researchers used an LSTM (Long Short-Term Memory) neural network combined with a beam search algorithm to map these timings to specific character sequences. Because the model is trained on common password datasets such as RockYou, it can rank the password candidates most likely to match the observed rhythm.
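The beam search part of that pipeline can be sketched in isolation. The example below is not the researchers' model: the scoring function `toy_model` is a hypothetical stand-in for a trained LSTM, and for simplicity it emits one character per observed timing value. It only shows how beam search keeps the highest-scoring candidate strings at each step.

```python
import math
from typing import Callable, Dict, List, Tuple

# A scorer returns P(next character | prefix so far, observed delay).
Scorer = Callable[[str, float], Dict[str, float]]


def beam_search(delays: List[float], scorer: Scorer,
                beam_width: int = 3) -> List[Tuple[str, float]]:
    """Return up to beam_width (candidate, log-probability) pairs, best first."""
    beams: List[Tuple[str, float]] = [("", 0.0)]
    for delay in delays:
        candidates: List[Tuple[str, float]] = []
        for prefix, logp in beams:
            for ch, p in scorer(prefix, delay).items():
                if p > 0.0:
                    candidates.append((prefix + ch, logp + math.log(p)))
        # Keep only the most probable partial strings.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def toy_model(prefix: str, delay: float) -> Dict[str, float]:
    """Hypothetical scorer: short delays favour 'a', long delays favour 'b'."""
    if delay < 0.2:
        return {"a": 0.7, "b": 0.3}
    return {"a": 0.2, "b": 0.8}


if __name__ == "__main__":
    for candidate, logp in beam_search([0.1, 0.3], toy_model, beam_width=2):
        print(candidate, round(math.exp(logp), 2))
```

A wider beam explores more candidates at higher cost; in a real attack the scorer would be conditioned on both the timing features and a language model of likely passwords.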

Why This Matters for Pentesters

For those of us conducting red team engagements, this lowers the bar for input capture considerably. Traditional input capture typically requires administrative privileges or physical access, both of which are high-friction hurdles. This side-channel attack is entirely passive and permissionless. You do not need to drop a payload or exploit a zero-day. If you are invited to a meeting where the target shares their screen, you have everything you need.

During a test, you would record the session, process the video to isolate the cursor, and run the extraction script. The impact is significant. If the target organization uses keystroke-based biometric authentication for their internal portals, this technique provides a viable path to bypass those controls. It turns a standard, benign meeting into an intelligence-gathering operation.

Defending Against Visual Side-Channels

Defending against this is difficult because it exploits human behavior rather than a software bug. The most effective mitigation is strict operational security. Users must be trained to pause screen sharing whenever they enter credentials, regardless of whether the input is masked.

From a technical perspective, some researchers are exploring ways to inject noise into the typing process. Projects like Kloak attempt to obfuscate keystroke timing by introducing random delays to key events at the input-device level, before they reach applications. While this can disrupt the precision required for a successful Camstroke attack, it is not a silver bullet. It is a reminder that as we move toward more advanced biometric authentication, we are also creating new, subtle side-channels that attackers are already learning to exploit.
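To make the idea of timing noise concrete, here is a minimal simulation. It is not Kloak's actual implementation; the function `obfuscate_timestamps` and its `max_delay` parameter are hypothetical. It delays each key event by a random amount while keeping the events in their original order, which distorts the inter-key delays an observer could measure.

```python
import random
from typing import List, Optional


def obfuscate_timestamps(timestamps: List[float], max_delay: float = 0.1,
                         rng: Optional[random.Random] = None) -> List[float]:
    """Return release times: each event is delayed by up to max_delay seconds.

    timestamps must be in non-decreasing order. Event order is preserved.
    """
    rng = rng or random.Random()
    released: List[float] = []
    prev_release = float("-inf")
    for t in timestamps:
        candidate = t + rng.uniform(0.0, max_delay)
        # Never release an event before the one typed ahead of it.
        release = max(candidate, prev_release)
        released.append(release)
        prev_release = release
    return released


if __name__ == "__main__":
    original = [0.00, 0.12, 0.30, 0.41]  # toy key-press times in seconds
    noisy = obfuscate_timestamps(original, max_delay=0.1, rng=random.Random(0))
    print([round(b - a, 3) for a, b in zip(original, original[1:])])
    print([round(b - a, 3) for a, b in zip(noisy, noisy[1:])])
```

The trade-off is input latency: a larger `max_delay` hides more of the rhythm but makes typing feel less responsive.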

The next time you are on a call and need to log into a sensitive system, take the extra three seconds to stop your screen share. The data you are leaking is more than just a password; it is the unique rhythm of your digital identity.

Talk Type: research presentation

Difficulty: advanced