In this experiment, a researcher needed to add up some numeric values scattered across twelve different emails. He made a screen recording of himself scrolling through the emails. He then got Google Gemini to extract the numbers from his screen recording into a CSV file for use in a spreadsheet.
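A workflow like the one described can be sketched with Google's `google-generativeai` Python SDK. The model name, prompt wording, and CSV column names below are illustrative assumptions, not the researcher's exact setup; a minimal sketch, assuming the SDK is installed and an API key is configured:

```python
# Sketch: ask Gemini to extract numbers from a screen recording as CSV,
# then total them locally. Prompt and column names are assumptions.
import csv
import io
import time

PROMPT = (
    "Watch this screen recording of emails being scrolled. "
    "Extract every numeric amount you see into CSV with the "
    "columns: email_subject,amount. Output only the CSV."
)

def extract_csv_from_video(path: str) -> str:
    """Upload a video to Gemini and return the model's CSV response."""
    # Imported here so the rest of the module works without the SDK.
    import google.generativeai as genai

    video = genai.upload_file(path)
    # Uploaded video is processed server-side before it can be used.
    while video.state.name == "PROCESSING":
        time.sleep(5)
        video = genai.get_file(video.name)
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content([video, PROMPT]).text

def sum_amounts(csv_text: str) -> float:
    """Total the 'amount' column of the returned CSV."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(float(row["amount"]) for row in reader)
```

The summing step runs locally rather than asking the model to do arithmetic, since extraction is what the video input is good at and a spreadsheet (or three lines of code) handles the math reliably.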
While this is a simple example, the implications of the ability to video-scrape screencasts are significant. It means anything you can display on your screen (websites, apps, e-learning, etc.), and anything that can be captured as video from a phone or camera (books on a bookshelf, panoramic displays), has the potential to become usable input for AI.
Although several major models, including those from OpenAI and Anthropic, have research previews that demonstrate the ability to accept video as input, only Google Gemini has released this feature publicly. The likely reason is the high computational cost of processing video. Those costs will inevitably fall, however, so expect video as input to become widely available in the near future.