browseruse-bench integrates several widely used browser evaluation benchmarks, which together cover different types of web interaction tasks.

Supported Benchmarks

Feature Comparison

Benchmark         Tasks  Language  Evaluation  Login Required
LexBench-Browser  340    Chinese   WebJudge    Partial
Online-Mind2Web   ~100   English   WebJudge    No
BrowseComp        ~50    English   Grader      No

Quick Comparison Run

# LexBench-Browser (Recommended, no-login subset)
uv run scripts/run.py --agent browser-use --benchmark LexBench-Browser --split no_login --mode first_n --count 5

# Online-Mind2Web
uv run scripts/run.py --agent browser-use --benchmark Online-Mind2Web --mode first_n --count 5

# BrowseComp
uv run scripts/run.py --agent browser-use --benchmark BrowseComp --mode first_n --count 5
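
To run the same five-task comparison across all three benchmarks in one go, the commands above could be generated from a small script. This is a minimal sketch, not part of browseruse-bench: the helper names build_command and run_all are hypothetical, and only the flags shown in the commands above are used. By default it prints the commands without running them.

```python
import shlex
import subprocess

# Benchmark name -> extra flags, copied from the commands shown above.
BENCHMARKS = {
    "LexBench-Browser": ["--split", "no_login"],  # recommended no-login subset
    "Online-Mind2Web": [],
    "BrowseComp": [],
}


def build_command(benchmark, agent="browser-use", count=5):
    """Return the argument list for one quick-comparison run (hypothetical helper)."""
    return [
        "uv", "run", "scripts/run.py",
        "--agent", agent,
        "--benchmark", benchmark,
        *BENCHMARKS[benchmark],
        "--mode", "first_n",
        "--count", str(count),
    ]


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for name in BENCHMARKS:
        cmd = build_command(name)
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops at the first benchmark run that fails.
            subprocess.run(cmd, check=True)


run_all(dry_run=True)
```

Calling run_all(dry_run=False) from the repository root would execute the three runs in sequence.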

Data Location

All benchmark data is stored in the benchmarks/ directory:
Benchmark         Data File Path
LexBench-Browser  benchmarks/LexBench-Browser/data/tasks.json
Online-Mind2Web   benchmarks/Online-Mind2Web/data/Online_Mind2Web.json
BrowseComp        benchmarks/BrowseComp/data/tasks.json
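
Before a run, it can help to confirm that the data files listed above are present. The following is a minimal sketch under stated assumptions: find_data_files and count_entries are hypothetical helpers, not part of browseruse-bench, and count_entries assumes each file holds a top-level JSON array (or object) of tasks, which this page does not confirm.

```python
import json
from pathlib import Path

# Paths copied from the table above, relative to the repository root.
DATA_FILES = {
    "LexBench-Browser": "benchmarks/LexBench-Browser/data/tasks.json",
    "Online-Mind2Web": "benchmarks/Online-Mind2Web/data/Online_Mind2Web.json",
    "BrowseComp": "benchmarks/BrowseComp/data/tasks.json",
}


def find_data_files(repo_root="."):
    """Map each benchmark name to True if its data file exists under repo_root."""
    root = Path(repo_root)
    return {name: (root / rel).is_file() for name, rel in DATA_FILES.items()}


def count_entries(path):
    """Return the number of top-level entries in a JSON file.

    Assumption: the top level is an array or object of tasks.
    """
    with open(path, encoding="utf-8") as f:
        return len(json.load(f))


for name, present in find_data_files().items():
    print(f"{name}: {'found' if present else 'missing'}")
```

Run it from the repository root, or pass the root explicitly, for example find_data_files("/path/to/checkout").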

Planned Support

  • More benchmarks
If you’d like to add a new benchmark, please refer to the Custom Benchmark guide.