Benchmarks Overview

browseruse-bench integrates three widely used browser-agent evaluation benchmarks (LexBench-Browser, Online-Mind2Web, and BrowseComp), which together cover a range of web interaction tasks.

Supported Benchmarks

LexBench-Browser

Online-Mind2Web

Online evaluation based on the Mind2Web dataset, testing agents’ navigation and interaction capabilities on real websites.

BrowseComp

Browsing competition tasks that evaluate an agent’s overall browser operation capabilities.

Feature Comparison

Benchmark	Tasks	Language	Evaluation	Login Required
LexBench-Browser	340	Chinese	WebJudge	Partial
Online-Mind2Web	~100	English	WebJudge	No
BrowseComp	~50	English	Grader	No

Quick Comparison Run

# LexBench-Browser (Recommended, no-login subset)
uv run scripts/run.py --agent browser-use --benchmark LexBench-Browser --split no_login --mode first_n --count 5

# Online-Mind2Web
uv run scripts/run.py --agent browser-use --benchmark Online-Mind2Web --mode first_n --count 5

# BrowseComp
uv run scripts/run.py --agent browser-use --benchmark BrowseComp --mode first_n --count 5
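If you run these comparisons often, the three commands can be generated from one small table. The following is a minimal sketch, not part of browseruse-bench: the helper names `build_command` and `comparison_commands` are hypothetical, and the only flags used are the ones shown in the commands above.

```python
import shlex

# One entry per benchmark; only LexBench-Browser uses --split in the commands above.
RUNS = [
    {"benchmark": "LexBench-Browser", "split": "no_login"},
    {"benchmark": "Online-Mind2Web"},
    {"benchmark": "BrowseComp"},
]


def build_command(benchmark, count=5, agent="browser-use", split=None):
    """Build the argument list for one run (hypothetical helper)."""
    cmd = ["uv", "run", "scripts/run.py", "--agent", agent, "--benchmark", benchmark]
    if split is not None:
        cmd += ["--split", split]
    cmd += ["--mode", "first_n", "--count", str(count)]
    return cmd


def comparison_commands(count=5):
    """Return one argument list per benchmark in RUNS."""
    return [build_command(count=count, **run) for run in RUNS]


if __name__ == "__main__":
    # Print the commands instead of executing them, so the sketch is safe to try.
    for cmd in comparison_commands():
        print(shlex.join(cmd))
```

To execute the runs rather than print them, each argument list can be passed to `subprocess.run(cmd, check=True)` from the repository root.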

Data Location

All benchmark data is stored in the benchmarks/ directory:

Benchmark	Data File Path
LexBench-Browser	`benchmarks/LexBench-Browser/data/tasks.json`
Online-Mind2Web	`benchmarks/Online-Mind2Web/data/Online_Mind2Web.json`
BrowseComp	`benchmarks/BrowseComp/data/tasks.json`
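A quick way to check that the data files are present is to load each one and count its entries. This is a rough sketch under one stated assumption: that each file's top level is a JSON array of tasks. The schema is not documented here, so the helper returns `None` when the file is missing or has a different shape. `count_tasks` is a hypothetical name, not a browseruse-bench function.

```python
import json
from pathlib import Path

# Paths copied from the table above, relative to the repository root.
DATA_FILES = {
    "LexBench-Browser": "benchmarks/LexBench-Browser/data/tasks.json",
    "Online-Mind2Web": "benchmarks/Online-Mind2Web/data/Online_Mind2Web.json",
    "BrowseComp": "benchmarks/BrowseComp/data/tasks.json",
}


def count_tasks(path):
    """Return the number of top-level entries, or None if unavailable.

    Assumption: the file holds a JSON array with one element per task.
    """
    path = Path(path)
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    return len(data) if isinstance(data, list) else None


if __name__ == "__main__":
    for name, rel_path in DATA_FILES.items():
        print(f"{name}: {count_tasks(rel_path)}")
```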

Planned Support

More benchmarks

If you’d like to add a new benchmark, please refer to the Custom Benchmark guide.
