Browser Automation for AI Agents: What Actually Works

May 28, 2026

Most agent demos that involve a browser are shot in one take for a reason. The moment you try to make browser automation reliable — running unattended, across sites you don't control, hundreds of times — it stops being a demo and starts being an engineering problem. I've spent a lot of time on that problem building the browser layer inside Froots, and a handful of patterns made the difference between "works in the video" and "works at 3am while I'm asleep."

Prefer structured verbs over raw eval

It's tempting to give the agent one giant escape hatch: run arbitrary JavaScript in the page and parse whatever comes back. It works right up until it doesn't, and when it fails it fails opaquely.

A small vocabulary of structured commands beats one omnipotent one:

    navigate <url>
    click <selector>
    fill <selector> <value>
    type <selector> <value>    # contenteditable-safe; composers ignore plain fill
    text <selector>            # read innerText back
    attr <selector> <name>     # read an attribute

The point isn't that eval is useless — it's the fallback, not the default. Structured verbs give you predictable error messages ("selector not found" beats a stack trace from inside a minified bundle), and they make the agent's intent legible in the timeline.
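To make the "predictable error messages" point concrete, here is a minimal sketch of a table-driven parser for that vocabulary. The names (`VERBS`, `parseCommand`, `ParseResult`) are hypothetical and are not Froots' actual API, and the sketch assumes selectors contain no spaces.

```typescript
// Hypothetical sketch: verb -> ordered argument names, mirroring the list above.
const VERBS = new Map<string, readonly string[]>([
  ["navigate", ["url"]],
  ["click", ["selector"]],
  ["fill", ["selector", "value"]],
  ["type", ["selector", "value"]],
  ["text", ["selector"]],
  ["attr", ["selector", "name"]],
]);

interface Command {
  verb: string;
  args: Record<string, string>;
}

type ParseResult =
  | { ok: true; command: Command }
  | { ok: false; error: string };

// Parse one command line such as `fill #email hello world`.
// Simplifications (assumptions): selectors contain no spaces, and runs of
// whitespace inside the final argument collapse to a single space.
function parseCommand(line: string): ParseResult {
  const tokens = line.trim().split(/\s+/).filter((t) => t.length > 0);
  const verb = tokens[0] ?? "";
  const names = VERBS.get(verb);
  if (!names) return { ok: false, error: `unknown verb: "${verb}"` };

  const args: Record<string, string> = {};
  for (let i = 0; i < names.length; i++) {
    const name = names[i];
    const isLast = i === names.length - 1;
    // The final argument absorbs the rest of the line, so values may contain spaces.
    const value = isLast ? tokens.slice(i + 1).join(" ") : tokens[i + 1];
    if (!value) return { ok: false, error: `${verb}: missing <${name}>` };
    args[name] = value;
  }
  return { ok: true, command: { verb, args } };
}
```

A malformed command comes back as a short, named error such as `click: missing <selector>` rather than an exception from inside the page, which is the property the structured verbs are meant to buy.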

Kill the sleep instinct — wait on conditions

The single biggest source of flakiness is sleep(2000). It's wrong in both directions: too short and you act before the element exists, too long and every run wastes seconds that add up. Replace time with conditions:

    wait_selector <selector> [timeout]    # poll until it exists
    wait_gone <selector> [timeout]        # poll until the spinner is gone
    wait_url <substring> [timeout]        # poll until navigation lands

An agent that waits on the thing it actually needs is both faster and dramatically more reliable than one that guesses at timing.
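All three wait verbs can be built on one polling primitive. The sketch below is an assumption about how that might look, not the Froots implementation: `waitFor`, the `PageProbe` interface, and the default timeout and interval values are all hypothetical.

```typescript
// Poll `check` until it returns a truthy value, or throw once `timeoutMs` has elapsed.
// The check always runs at least once, even with a zero timeout.
async function waitFor<T>(
  check: () => T | Promise<T>,
  timeoutMs = 5000, // assumed default
  intervalMs = 100, // assumed default
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const result = await check();
    if (result) return result;
    if (Date.now() >= deadline) {
      throw new Error(`condition not met within ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical minimal view of the page that the wait verbs need.
interface PageProbe {
  exists(selector: string): Promise<boolean>;
  url(): Promise<string>;
}

const waitSelector = (page: PageProbe, selector: string, timeoutMs?: number) =>
  waitFor(() => page.exists(selector), timeoutMs);

const waitGone = (page: PageProbe, selector: string, timeoutMs?: number) =>
  waitFor(async () => !(await page.exists(selector)), timeoutMs);

const waitUrl = (page: PageProbe, substring: string, timeoutMs?: number) =>
  waitFor(async () => (await page.url()).includes(substring), timeoutMs);
```

Because the loop returns as soon as the condition holds, a fast page costs one poll interval at most, while a slow page gets the full timeout before the run fails with a named error.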

Always read something back

This is the lesson I learned the hard way. Early on, a command would return success and I'd assume the work was done. Then I'd find the agent had been talking to a pane that wasn't there — every call "succeeded" by doing nothing.

The fix is a discipline: a write should be confirmed by a read. After you fill a field, read it back. After you click submit, wait for the URL or a success node. Silent success is not the same as success, and an agent that never reads back its own actions will confidently report that a broken pipe is working.
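As a minimal sketch of that discipline, a write helper can refuse to report success until a read confirms it. The `FormPage` interface and `fillAndVerify` are hypothetical names for illustration; a real bridge would map them onto its own fill and read commands.

```typescript
// Hypothetical minimal page interface: one write and one read.
interface FormPage {
  fill(selector: string, value: string): Promise<void>;
  // Resolves to null when the element cannot be found.
  value(selector: string): Promise<string | null>;
}

// Fill a field, then read it back. Throws unless the read matches the write,
// so a bridge that silently does nothing cannot pass as success.
async function fillAndVerify(
  page: FormPage,
  selector: string,
  value: string,
): Promise<void> {
  await page.fill(selector, value);
  const actual = await page.value(selector);
  if (actual === null) {
    throw new Error(`read-back failed: ${selector} not found`);
  }
  if (actual !== value) {
    throw new Error(
      `read-back mismatch on ${selector}: expected "${value}", got "${actual}"`,
    );
  }
}
```

With this shape, a connection that accepts every call and does nothing fails on the first write instead of producing a run of confident false reports.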

I now treat any "site X is broken" report from an agent with suspicion until I've confirmed the bridge actually completed the round trip. Nine times out of ten the page was fine and the connection was the no-op.

Use the session's own cookies for reads

A lot of useful data sits behind a login. Rather than scraping a login wall, do an in-page fetch with credentials: 'include' from the right origin — you reuse the user's existing session instead of re-authenticating or storing credentials. Navigate to the target origin first to avoid CORS surprises, and probe for a login cookie before you reach for authenticated data, so you can ask the human to sign in rather than silently scraping an error page.
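A rough sketch of that flow, meant to run inside the page once the agent has navigated to the target origin. Everything named here is an assumption for illustration: `readWithSession`, `hasCookie`, the `FetchLike` type, and the choice to treat 401 and 403 as "needs login". One caveat that is not an assumption: cookies marked HttpOnly are not visible through `document.cookie`, so a cookie probe can produce false negatives and the response status is the more reliable signal.

```typescript
// The slice of fetch this sketch relies on, so it can be exercised with a fake.
type FetchLike = (
  url: string,
  init: { credentials: "include" },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

type SessionRead =
  | { kind: "data"; body: unknown }
  | { kind: "needs_login"; status: number }
  | { kind: "error"; status: number };

// Check a `document.cookie`-style string for a cookie by name.
function hasCookie(cookieString: string, name: string): boolean {
  return cookieString
    .split(";")
    .some((pair) => pair.trim().split("=")[0] === name);
}

// Read JSON using the user's existing session cookies.
// Assumption: the endpoint answers 401/403 when the user is signed out.
async function readWithSession(
  path: string,
  fetchImpl: FetchLike,
): Promise<SessionRead> {
  const res = await fetchImpl(path, { credentials: "include" });
  if (res.status === 401 || res.status === 403) {
    return { kind: "needs_login", status: res.status };
  }
  if (!res.ok) return { kind: "error", status: res.status };
  return { kind: "data", body: await res.json() };
}
```

In the page this might be called as `readWithSession("/api/me", (url, init) => fetch(url, init))`, where `/api/me` is a made-up path. A `needs_login` result is the cue to ask the human to sign in rather than to parse whatever error page came back.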

Screenshots are the honest fallback

When the DOM is hostile — shadow roots, canvas-rendered UIs, aggressively obfuscated class names — stop fighting selectors and take a screenshot. A vision model reading a picture of the page is sometimes the most robust path, and it's a good last resort to keep in the toolbox.
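One possible shape for that fallback: try the DOM first, and only when the selector yields nothing take a screenshot and hand it to a vision model. The `VisualPage` interface, the `VisionReader` callback, and `readOrLook` are all hypothetical; the vision call in particular stands in for whatever model API is in use.

```typescript
// Hypothetical page interface: a DOM read and a screenshot.
interface VisualPage {
  // Resolves to null when the selector matches nothing.
  text(selector: string): Promise<string | null>;
  screenshot(): Promise<Uint8Array>;
}

// Placeholder for a vision model call: image plus question in, answer out.
type VisionReader = (image: Uint8Array, question: string) => Promise<string>;

interface ReadResult {
  source: "dom" | "screenshot";
  value: string;
}

// Prefer the DOM; fall back to a screenshot when the selector fails or is empty.
async function readOrLook(
  page: VisualPage,
  selector: string,
  question: string,
  vision: VisionReader,
): Promise<ReadResult> {
  const fromDom = await page.text(selector).catch(() => null);
  if (fromDom !== null && fromDom.trim() !== "") {
    return { source: "dom", value: fromDom };
  }
  const image = await page.screenshot();
  return { source: "screenshot", value: await vision(image, question) };
}
```

Recording which path produced the value keeps the timeline honest: a result tagged `screenshot` came from a model's reading of pixels and may deserve a second confirmation.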

The meta-lesson

Reliable browser automation is less about clever selectors and more about closing the loop: act, observe, confirm, and never trust a result you didn't verify. That's the same principle that makes the rest of Froots trustworthy — pair it with memory that survives restarts and you have an agent that can actually be left alone to work.

If you want a browser your agent drives with all of this built in, that's what Froots is for.

Dylan Worrall is the founder of Froots and Soshi Labs, building always-on autonomous AI agents.
