
Using AI Computer Vision in your OSINT data analysis

DEFCON Conference · 6,998 views · 52:39 · over 1 year ago

This talk demonstrates the application of AI-driven computer vision, specifically Optical Character Recognition (OCR) and Object Detection, to automate the analysis of large OSINT datasets. By leveraging cloud-based services like AWS Rekognition and Azure Computer Vision, the speaker shows how to identify objects, license plates, and text within images and videos to extract actionable intelligence. The presentation highlights how these techniques can be used to track military movements and validate narratives in real-world scenarios. The speaker also discusses the importance of iterative training and human-in-the-loop validation to improve accuracy and reduce false positives.

Automating OSINT Analysis with Computer Vision

TL;DR: Manually sifting through massive OSINT datasets is a losing battle that wastes time and misses critical evidence. By integrating AI-driven computer vision services like AWS Rekognition and Azure Computer Vision, researchers can automate the identification of objects, text, and license plates at scale. This approach transforms raw, unmanageable data into actionable intelligence, allowing for rapid validation of narratives and tracking of physical assets.

Open-source intelligence gathering has hit a wall. The sheer volume of data generated by social media, RSS feeds, and public databases has turned the investigative process into a chore of sifting through digital landfills. Researchers can easily spend the vast majority of their time manually reviewing images and videos, hoping to spot a single relevant artifact. This is not just inefficient; it is a structural failure in how we approach modern investigations. If you are still relying on human eyes to scan thousands of frames for a specific vehicle or document, you are missing the signal in the noise.

Moving Beyond Manual Review

The shift from "OSINT" to what we might call "OS-INFO" is the primary challenge facing the community today. We are drowning in information but starving for intelligence. The solution lies in offloading the heavy lifting to computer vision models. These tools do not get tired, they do not suffer from confirmation bias, and they can process a thousand-image dataset in the time it takes you to grab a coffee.

The core of this workflow involves two primary techniques: Optical Character Recognition (OCR) and Object Detection. OCR allows you to extract text from signs, documents, and license plates, while object detection identifies specific items—like weapons, military vehicles, or even specific types of infrastructure—within a scene.
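As a sketch of how these two techniques look in practice, the snippet below runs both object detection and OCR on a single S3-hosted image using AWS Rekognition's `detect_labels` and `detect_text` calls. The bucket and key arguments, the 75% confidence threshold, and the helper names are placeholders, and the code assumes AWS credentials are already configured.

```python
def analyze_image(bucket: str, key: str, min_confidence: float = 75.0) -> dict:
    """Object-detect and OCR one S3-hosted image with AWS Rekognition."""
    import boto3  # imported here so the parsing helpers below work without AWS
    rekognition = boto3.client("rekognition")
    image = {"S3Object": {"Bucket": bucket, "Name": key}}
    labels = rekognition.detect_labels(Image=image, MinConfidence=min_confidence)
    text = rekognition.detect_text(Image=image)
    return {
        "labels": extract_labels(labels, min_confidence),
        "text": extract_lines(text, min_confidence),
    }

def extract_labels(response: dict, min_confidence: float) -> list:
    """Keep only detected-object names above the confidence threshold."""
    return [l["Name"] for l in response.get("Labels", [])
            if l["Confidence"] >= min_confidence]

def extract_lines(response: dict, min_confidence: float) -> list:
    """Keep full OCR lines (Rekognition also returns per-word detections)."""
    return [t["DetectedText"] for t in response.get("TextDetections", [])
            if t["Type"] == "LINE" and t["Confidence"] >= min_confidence]
```

Filtering at the API-response level like this keeps low-confidence noise out of your dataset before a human ever looks at it.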

Implementing the Pipeline

When you are dealing with a large corpus of images, the goal is to reduce the dataset to a manageable size before human analysis begins. You can use cloud-based APIs to tag images automatically. For instance, if you are tracking military movements, you can train a custom model to recognize specific tank models or vehicle markings.

Consider the following workflow for processing a batch of images:

  1. Ingestion: Collect your raw data from sources like X or Discord.
  2. Tagging: Pass the images through an API like Azure Computer Vision to generate metadata.
  3. Filtering: Use the generated tags to discard irrelevant content.
  4. Visualization: Feed the remaining, high-confidence data into a tool like Power BI to map the findings geographically or chronologically.
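The tagging and filtering steps above amount to a simple threshold filter over whatever metadata the API returns. A minimal sketch, with an invented record layout and tag names used purely for illustration:

```python
# Tags of interest for this hypothetical investigation
SEARCH_TAGS = {"tank", "military vehicle", "license plate"}

def filter_relevant(records: list, wanted: set, min_confidence: float = 0.8) -> list:
    """Keep records with at least one wanted tag above the confidence threshold."""
    relevant = []
    for record in records:
        tags = {t["name"].lower() for t in record["tags"]
                if t["confidence"] >= min_confidence}
        if tags & wanted:
            relevant.append(record)
    return relevant

# Illustrative batch: one hit, one irrelevant image, one low-confidence hit
batch = [
    {"file": "img_001.jpg", "tags": [{"name": "Tank", "confidence": 0.94}]},
    {"file": "img_002.jpg", "tags": [{"name": "Cat", "confidence": 0.99}]},
    {"file": "img_003.jpg", "tags": [{"name": "Tank", "confidence": 0.42}]},
]
kept = filter_relevant(batch, SEARCH_TAGS)
# Only img_001.jpg survives for human review
```

The surviving records, already enriched with tags, are then ready for export into a visualization layer.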

The power of this method is in the iterative training. You do not need to be a machine learning engineer to get results. You upload a small set of images, manually tag the objects of interest, and let the service train the model. If the model misidentifies a car as a tank, you correct the tag and retrain. This "human-in-the-loop" process is what makes the system robust.
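The human-in-the-loop step can be sketched as folding reviewer corrections back into the training set before the next round. All function and field names below are illustrative, not any specific service's API:

```python
def apply_corrections(predictions: list, corrections: dict) -> list:
    """Replace reviewer-corrected labels, marking each record for the next training round."""
    training_examples = []
    for p in predictions:
        label = corrections.get(p["image"], p["label"])
        training_examples.append({
            "image": p["image"],
            "label": label,
            "corrected": p["image"] in corrections,
        })
    return training_examples

preds = [
    {"image": "a.jpg", "label": "tank"},
    {"image": "b.jpg", "label": "tank"},  # actually a car; the reviewer fixes it
]
next_round = apply_corrections(preds, {"b.jpg": "car"})
# b.jpg is relabeled "car" and flagged as corrected
```

Each retraining pass with corrected examples tightens the model around exactly the mistakes it made on your data.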

Real-World Intelligence Extraction

During the conflict in Ukraine, researchers faced the massive task of tracking military vehicle movements across thousands of social media posts. Manual tracking was impossible. By using computer vision to extract license plate numbers and vehicle serials from images, investigators could correlate sightings across different locations and times.
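Correlating sightings like this reduces to grouping OCR-extracted plate strings and sorting each group chronologically to reconstruct a movement track. The plates, timestamps, and coordinates below are invented for illustration:

```python
from collections import defaultdict
from datetime import datetime

def build_tracks(sightings: list) -> dict:
    """Group sightings by plate string, each group ordered oldest-first."""
    tracks = defaultdict(list)
    for s in sightings:
        tracks[s["plate"]].append(s)
    for plate in tracks:
        tracks[plate].sort(key=lambda s: s["time"])
    return dict(tracks)

sightings = [
    {"plate": "E032KX", "time": datetime(2022, 3, 2, 14, 0), "location": (50.45, 30.52)},
    {"plate": "E032KX", "time": datetime(2022, 3, 1, 9, 30), "location": (51.50, 31.30)},
    {"plate": "A911BC", "time": datetime(2022, 3, 1, 12, 0), "location": (49.99, 36.23)},
]
tracks = build_tracks(sightings)
# tracks["E032KX"] now lists both sightings in chronological order
```

The per-plate, time-ordered groups are exactly what you would feed into a mapping tool to draw the route.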

This data, when plotted on a map using Google Maps, revealed patterns that were invisible to the naked eye. The same logic applies to non-military investigations. Whether you are tracking a person of interest or verifying the location of a specific event, the ability to extract text from the background of a photo—like a street sign or a storefront—provides the "last mile" of evidence needed to confirm a location.

The Defensive Reality

Defenders and security teams should recognize that this level of automated analysis is now accessible to anyone with a cloud account. If your organization has a physical footprint that is frequently captured in public imagery, you are already being indexed by these tools. Understanding how your assets appear in open-source data is no longer optional. You should audit your public-facing imagery for sensitive information, such as visible badges, internal documents, or unique equipment identifiers that could be easily scraped and analyzed by an adversary using these exact techniques.
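A defensive audit can reuse the same OCR pipeline against your own public imagery, flagging any extracted text that matches sensitive patterns. The patterns and sample data below are illustrative; tune them to your organization:

```python
import re

# Illustrative patterns for text that should not appear in public imagery
SENSITIVE_PATTERNS = [
    re.compile(r"\bconfidential\b", re.IGNORECASE),
    re.compile(r"\bbadge\s*(no\.?|#|id)\b", re.IGNORECASE),
    re.compile(r"\b[A-Z]{2}\d{4,6}\b"),  # asset/serial-style identifiers
]

def audit_ocr_text(ocr_results: dict) -> dict:
    """Map each image to the OCR lines that match a sensitive pattern."""
    findings = {}
    for image, lines in ocr_results.items():
        hits = [line for line in lines
                if any(p.search(line) for p in SENSITIVE_PATTERNS)]
        if hits:
            findings[image] = hits
    return findings

sample = {
    "lobby.jpg": ["Welcome to HQ", "Badge ID required"],
    "press.jpg": ["Grand opening 2023"],
}
flagged = audit_ocr_text(sample)
# Only lobby.jpg is flagged, for the badge-related line
```

Running a sweep like this over your press photos and site imagery shows you what an adversary's pipeline would surface.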

What to Do Next

Stop treating your OSINT data as a static pile of files. Start treating it as a database. If you are not using automated vision tools, you are essentially working with one hand tied behind your back. Pick a specific, narrow use case—like identifying a specific type of document or vehicle—and build a small, iterative model. You will find that the time saved on the initial sifting allows you to focus on the actual analysis, which is where the real value is created. The tools are ready, the APIs are cheap, and the data is waiting. The only thing left is to start building the pipeline.
