2026-06-17

Real screenshots, not stock slop — headless source-page capture

Stock footage screams content farm. Capturing the actual source — the repo, the docs, the release notes — with a headless browser makes the output look like it came from someone who read the thing.


Nothing says "low-effort content" like generic stock B-roll laid over a tech story. Glowing blue circuit boards, a stock-footage hacker in a hoodie. The moment it appears, the viewer knows nobody read the source.

The fix is nearly free: show the actual thing. The GitHub repo. The docs page. The release notes. The benchmark table from the paper. A headless browser makes that cheap enough to do every single time.

The capture

Playwright drives a real Chromium, navigates to the source, and screenshots it:

from playwright.sync_api import sync_playwright

def capture(url: str, out: str, width=1280, height=1600):
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page(
            viewport={"width": width, "height": height},
            device_scale_factor=2,  # retina-crisp — non-negotiable for video
        )
        page.goto(url, wait_until="networkidle")
        page.screenshot(path=out, full_page=False)
        browser.close()

The touches that separate it from a scrape

A raw screenshot still looks scraped. Four things make it look intentional:

  • device_scale_factor=2. Capture at 2x and your screenshots are crisp when a video scales them up. A blurry screenshot is worse than no screenshot.
  • Wait for the real content. wait_until="networkidle" works on simple pages but can stall on ones that keep polling in the background; waiting for a specific selector is more reliable. Either way, you don't capture a half-loaded skeleton.
  • Kill the cookie banners. A GDPR overlay in your shot screams "I scraped this." Inject CSS to hide [id*="cookie"], [class*="consent"], or click the dismiss button before you shoot.
  • Crop to what matters. Don't screenshot the whole page and zoom in post. Grab the element: page.locator("table.benchmarks").screenshot(path=out). Frame the diff, the table, the one paragraph you're talking about.
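The last three touches fit in one function. What follows is a minimal sketch, not a drop-in tool: capture_element, hide_css and BANNER_SELECTORS are hypothetical names, and the selector list is only a starting guess that will need tuning per site. It assumes Playwright's sync API, waits for the target element instead of network idle, and hides banners with an injected stylesheet so that overlays added later by script are covered too.

```python
BANNER_SELECTORS = ['[id*="cookie"]', '[class*="consent"]']  # assumed starting list; tune per site


def hide_css(selectors):
    """Build a CSS rule that hides every element matching the given selectors."""
    return ", ".join(selectors) + " { display: none !important; }"


def capture_element(url, selector, out, width=1280, height=1600):
    """Screenshot a single element after hiding consent overlays."""
    # Imported here so hide_css() stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        try:
            page = browser.new_page(
                viewport={"width": width, "height": height},
                device_scale_factor=2,
            )
            page.goto(url, wait_until="domcontentloaded")
            # A stylesheet rule also catches banners that are injected after load.
            page.add_style_tag(content=hide_css(BANNER_SELECTORS))
            target = page.locator(selector).first
            target.wait_for(state="visible")  # wait for the real content
            target.screenshot(path=out)       # crop to the element itself
        finally:
            browser.close()
```

Calling capture_element(url, "table.benchmarks", "bench.png") would then produce the cropped, banner-free shot in one step. Hiding with CSS is the blunt option; clicking the dismiss button is cleaner when a site's banner is stable enough to target.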

Then, in the video, a slow sub-pixel zoom across the screenshot (a Ken Burns move) reads as deliberate rather than as a static slide.
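One way to get that zoom is to compute a crop box per frame and let the renderer resample each crop to the output size. This is a rough sketch under assumptions: zoom_boxes is a hypothetical helper, the zoom is centered and linear, and the default end_scale of 1.08 is an arbitrary "slow" value, not a recommendation from any tool.

```python
def zoom_boxes(width, height, frames, end_scale=1.08):
    """Return one (x, y, w, h) crop box per frame for a centered slow zoom-in.

    Boxes stay as floats so the motion is sub-pixel; the renderer is expected
    to resample each crop to the output size.
    """
    boxes = []
    for i in range(frames):
        # t runs from 0.0 on the first frame to 1.0 on the last.
        t = i / (frames - 1) if frames > 1 else 0.0
        scale = 1.0 + (end_scale - 1.0) * t
        w, h = width / scale, height / scale
        # Keep the crop centered on the screenshot.
        boxes.append(((width - w) / 2, (height - h) / 2, w, h))
    return boxes
```

The first box is the whole screenshot and the last is the tightest crop. Keeping the coordinates fractional is the point: rounding them to whole pixels is what makes a slow zoom visibly stutter.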

Be a good citizen

Headless capture is still scraping, so follow scraping etiquette:

  • Set a real user_agent and don't hammer the server: you're grabbing one image, not crawling a site.
  • Respect robots.txt and terms of service. Public docs and repos are fair game; gated or auth-walled content is not.
  • Cache captures. The same release-notes page doesn't need re-shooting every run.
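The robots.txt check and the cache can both be done with the standard library. A small sketch, with loud assumptions: USER_AGENT is a placeholder string, allowed and cache_path are hypothetical helpers, and the robots.txt text is passed in so that fetching it stays the caller's job. Terms of service still need a human read; no parser covers that.

```python
import hashlib
from pathlib import Path
from urllib.robotparser import RobotFileParser

USER_AGENT = "example-capture-bot/0.1"  # hypothetical; use a string that identifies you


def allowed(robots_txt, url, user_agent=USER_AGENT):
    """Check a URL against the text of a site's robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def cache_path(url, cache_dir="captures"):
    """Map a URL to a stable PNG path so repeat runs can reuse the file."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return Path(cache_dir) / f"{digest}.png"
```

Before shooting, check cache_path(url).exists() and skip the browser entirely on a hit. The same USER_AGENT string can be handed to Playwright through browser.new_page(user_agent=USER_AGENT), so the site sees the identity that the robots.txt check was run against.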

The difference between content that looks researched and content that looks generated is often a single question: did you show the actual source? Headless capture makes the honest answer — yes — cheap enough that there's no excuse not to.