browseruse-bench is a unified evaluation framework for testing AI browser agents across multiple benchmarks. It provides standardized interfaces to run and evaluate different agents on various web interaction tasks.

Key Features

Multi-Agent Support: a unified interface for Agent-TARS, browser-use, and more (see the sketch after this list)

Multi-Benchmark: LexBench-Browser, Online-Mind2Web, and BrowseComp

Cloud Browser: Lexmount cloud browser integration for scalable testing

Auto Evaluation: GPT-4-powered evaluation with detailed metrics
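
The unified interface amounts to a small adapter contract that each agent backend implements. The sketch below is illustrative only; BenchmarkAgent, TaskResult, and run_task are assumed names for this example, not the actual browseruse-bench API.

    # Illustrative sketch only: these names are assumptions, not the real browseruse-bench API.
    from abc import ABC, abstractmethod
    from dataclasses import dataclass, field


    @dataclass
    class TaskResult:
        """Hypothetical record of one task attempt, handed to the evaluator afterwards."""
        task_id: str
        final_answer: str
        action_trace: list[str] = field(default_factory=list)


    class BenchmarkAgent(ABC):
        """Hypothetical adapter each agent backend (browser-use, Agent-TARS, ...) would implement."""

        @abstractmethod
        def run_task(self, task_prompt: str, start_url: str) -> TaskResult:
            """Drive a browser session to complete one benchmark task and return the trace."""
            ...

With such a contract in place, the scripts layer can treat every agent the same way regardless of which framework drives the browser underneath.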

Architecture Overview

┌──────────────────────────────────────────────────────────┐
│                    browseruse-bench                      │
├──────────────────────────────────────────────────────────┤
│  Scripts Layer     │  run.py  │  eval.py  │  leaderboard │
├──────────────────────────────────────────────────────────┤
│  Agents            │  browser-use  │  Agent-TARS  │  ... │
├──────────────────────────────────────────────────────────┤
│  Benchmarks        │  LexBench  │  Mind2Web  │ BrowseComp│
├──────────────────────────────────────────────────────────┤
│  Browser Layer     │  Local Chrome  │  Lexmount Cloud    │
└──────────────────────────────────────────────────────────┘
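
Read bottom to top, the layers compose into a simple loop: a script selects a benchmark's tasks, hands each task to an agent adapter, the agent drives a local or cloud browser, and the evaluator scores the outcome. The sketch below only illustrates that flow under those assumptions; Task, run_benchmark, and the callable names are invented for this example, not code from the repository.

    # Hypothetical end-to-end flow mirroring the layers above.
    # All names here are illustrative, not the actual browseruse-bench API.
    from dataclasses import dataclass
    from typing import Callable


    @dataclass
    class Task:
        task_id: str
        prompt: str
        start_url: str


    def run_benchmark(
        tasks: list[Task],
        agent: Callable[[Task], str],           # Agents layer: returns the agent's final answer
        evaluate: Callable[[Task, str], bool],  # Auto-evaluation layer: e.g. a GPT-4 judge
    ) -> dict[str, bool]:
        """Scripts layer: iterate over benchmark tasks, run the agent, score each result."""
        return {task.task_id: evaluate(task, agent(task)) for task in tasks}


    # Dummy wiring just to show the shape of the flow.
    demo_tasks = [Task("t1", "Find the repository's license", "https://example.com")]
    scores = run_benchmark(
        demo_tasks,
        agent=lambda task: "MIT",                             # stand-in for a real browser agent
        evaluate=lambda task, answer: bool(answer.strip()),   # stand-in for the GPT-4 evaluator
    )
    print(scores)  # {'t1': True}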

What’s Next?

1. Install: Follow the Quick Start guide to set up your environment
2. Run Benchmark: Execute your first benchmark with any supported agent
3. Evaluate: Use the evaluation scripts to measure agent performance
4. Compare: View results on the leaderboard and compare agents