BrowseComp

BrowseComp is a benchmark of competition-style browser operation tasks.

Overview

Attribute     Value
Task Type     Browser operations
Evaluation    Grader-based scoring

Quick Start

# Run tasks
uv run scripts/run.py --agent browser-use --benchmark BrowseComp --mode first_n --count 3

# Evaluate results
uv run scripts/eval.py --agent browser-use --benchmark BrowseComp
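
The two steps above can be chained so that evaluation only starts if the run step succeeds. The following is a minimal sketch, not part of the project itself: the function names `step` and `run_and_eval` and the `DRY_RUN` variable are hypothetical helpers introduced for illustration, while the `uv run` commands and their flags are copied from the Quick Start.

```shell
#!/usr/bin/env bash
# Sketch only: `step`, `run_and_eval`, and DRY_RUN are hypothetical names,
# not part of the benchmark tooling. The uv commands mirror the Quick Start.

# Print a command, then execute it unless DRY_RUN=1 is set.
step() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then
    "$@"
  fi
}

# Run the first N tasks, then evaluate; skip evaluation if the run fails.
# Arguments (all optional): agent, benchmark, task count.
run_and_eval() {
  local agent="${1:-browser-use}"
  local benchmark="${2:-BrowseComp}"
  local count="${3:-3}"

  step uv run scripts/run.py --agent "$agent" --benchmark "$benchmark" \
    --mode first_n --count "$count" &&
    step uv run scripts/eval.py --agent "$agent" --benchmark "$benchmark"
}

# Example (prints the commands without running them):
#   DRY_RUN=1 run_and_eval browser-use BrowseComp 3
```

Sourcing the file and calling `run_and_eval` with no arguments reproduces the Quick Start defaults. Setting `DRY_RUN=1` first is a cheap way to check the exact commands before spending time on a real run.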