browseruse-bench is a unified evaluation framework for testing AI browser agents across multiple benchmarks. It provides standardized interfaces to run and evaluate different agents on various web interaction tasks.

Key Features

Multi-Agent Support: a unified interface for Agent-TARS, browser-use, and more (see the sketch after this list)

Multi-Benchmark: LexBench-Browser, Online-Mind2Web, and BrowseComp

Cloud Browser: Lexmount cloud browser integration for scalable testing

Auto Evaluation: GPT-4-powered evaluation with detailed metrics
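
The unified interface amounts to a small adapter contract that each agent backend implements. The sketch below is illustrative only; BenchmarkAgent, TaskResult, and run_task are assumed names for this example, not the actual browseruse-bench API.

    # Illustrative sketch only: these names are assumptions, not the real browseruse-bench API.
    from abc import ABC, abstractmethod
    from dataclasses import dataclass, field


    @dataclass
    class TaskResult:
        """Hypothetical record of one task attempt, handed to the evaluator afterwards."""
        task_id: str
        final_answer: str
        action_trace: list[str] = field(default_factory=list)


    class BenchmarkAgent(ABC):
        """Hypothetical adapter each agent backend (browser-use, Agent-TARS, ...) would implement."""

        @abstractmethod
        def run_task(self, task_prompt: str, start_url: str) -> TaskResult:
            """Drive a browser session to complete one benchmark task and return the trace."""
            ...

With such a contract in place, the scripts layer can treat every agent the same way regardless of which framework drives the browser underneath.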

Architecture Overview

┌──────────────────────────────────────────────────────────┐
│                    browseruse-bench                      │
├──────────────────────────────────────────────────────────┤
│  Scripts Layer     │  run.py  │  eval.py  │  leaderboard │
├──────────────────────────────────────────────────────────┤
│  Agents            │  browser-use  │  Agent-TARS  │  ... │
├──────────────────────────────────────────────────────────┤
│  Benchmarks        │  LexBench  │  Mind2Web  │ BrowseComp│
├──────────────────────────────────────────────────────────┤
│  Browser Layer     │  Local Chrome  │  Lexmount Cloud    │
└──────────────────────────────────────────────────────────┘
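
Read bottom to top, the layers compose into a simple loop: a script selects a benchmark's tasks, hands each task to an agent adapter, the agent drives a local or cloud browser, and the evaluator scores the outcome. The sketch below only illustrates that flow under those assumptions; Task, run_benchmark, and the callable names are invented for this example, not code from the repository.

    # Hypothetical end-to-end flow mirroring the layers above.
    # All names here are illustrative, not the actual browseruse-bench API.
    from dataclasses import dataclass
    from typing import Callable


    @dataclass
    class Task:
        task_id: str
        prompt: str
        start_url: str


    def run_benchmark(
        tasks: list[Task],
        agent: Callable[[Task], str],           # Agents layer: returns the agent's final answer
        evaluate: Callable[[Task, str], bool],  # Auto-evaluation layer: e.g. a GPT-4 judge
    ) -> dict[str, bool]:
        """Scripts layer: iterate over benchmark tasks, run the agent, score each result."""
        return {task.task_id: evaluate(task, agent(task)) for task in tasks}


    # Dummy wiring just to show the shape of the flow.
    demo_tasks = [Task("t1", "Find the repository's license", "https://example.com")]
    scores = run_benchmark(
        demo_tasks,
        agent=lambda task: "MIT",                             # stand-in for a real browser agent
        evaluate=lambda task, answer: bool(answer.strip()),   # stand-in for the GPT-4 evaluator
    )
    print(scores)  # {'t1': True}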

What’s Next?

1. Install: Follow the Quick Start guide to set up your environment
2. Run Benchmark: Execute your first benchmark with any supported agent
3. Evaluate: Use the evaluation scripts to measure agent performance
4. Compare: View results on the leaderboard and compare agents