browseruse_bench.utils.eval_utils
Evaluation-related utility functions and classes.Import
EvaluationModel
OpenAI model wrapper class for task evaluation.Model name
API Key, defaults to environment variable
API Base URL, defaults to environment variable
generate
Generate evaluation response with automatic retry.load_evaluation_model
Load evaluation model with environment variable fallback.Environment Variables
| Variable | Description |
|---|---|
EVAL_MODEL_NAME | Model name (fallback: gpt-4o) |
EVAL_MODEL_API_KEY | API Key (fallback: OPENAI_API_KEY) |
EVAL_MODEL_BASE_URL | Base URL (fallback: OPENAI_BASE_URL) |
encode_image
Convert a PIL image to base64 string.PIL Image object
Image scale factor (between 0.0 and 1.0), e.g., 0.5 means 50% size
extract_score_from_response
Extract numerical score from evaluation response.Evaluation response text
Extracted score (0 if not found)
calculate_success
Determine if task is successful based on score threshold.Task score
Success threshold
True if score meets or exceeds threshold
normalized_results_file
Context manager that yields a path guaranteed to be JSONL format.Results file path