Run a Benchmark

A benchmark run evaluates a model against a set of test cases and produces a structured results report.

Prerequisites

A model connected or uploaded in Eval (see Upload a model)
At least one benchmark dataset available in your workspace

Steps

1. Open Eval from the left sidebar.
2. Select a benchmark from the benchmark library, or upload your own.
3. Choose a model to evaluate from your connected APIs or uploaded models.
4. Configure the run settings:
   - Temperature and sampling parameters
   - Max tokens per response
   - Timeout per question
5. Click Run to start the evaluation.
6. Monitor progress in the run dashboard. Results stream in as each test case completes.
7. View the report when the run finishes (see Read results).
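
To make the run settings concrete, here is a minimal Python sketch of how they could be represented and sanity-checked before a run. It is not Eval's actual API or file format: the `RunConfig` class, its field names (`temperature`, `top_p`, `max_tokens`, `timeout_s`), and the default values are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for run settings; not Eval's real schema."""

    temperature: float = 0.0   # sampling temperature (assumed name)
    top_p: float = 1.0         # nucleus sampling cutoff (assumed name)
    max_tokens: int = 512      # max tokens per response
    timeout_s: float = 60.0    # timeout per question, in seconds

    def __post_init__(self):
        # Reject values that could never produce a meaningful run.
        if self.temperature < 0:
            raise ValueError("temperature must be non-negative")
        if not 0 < self.top_p <= 1:
            raise ValueError("top_p must be in (0, 1]")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        if self.timeout_s <= 0:
            raise ValueError("timeout_s must be positive")


# Example settings for a short validation run.
config = RunConfig(temperature=0.2, max_tokens=256, timeout_s=30)
print(asdict(config))
```

Validating settings up front like this catches mistakes such as a zero token limit before any test case is sent to the model.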

Tips

Run the same benchmark against multiple models to compare them side by side.
Start with a small benchmark (10–50 cases) to validate your setup before scaling up.
Save run configs to re-run the same evaluation after each training iteration.
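
The tips on saved configs and side-by-side comparison can be sketched outside the UI as well. The example below assumes a run config is a plain dictionary stored as JSON and that each model's results reduce to a list of per-case pass/fail flags. The helper names (`save_config`, `load_config`, `pass_rates`), the config keys, the model names, and the sample results are hypothetical and do not reflect Eval's export format.

```python
import json
import tempfile
from pathlib import Path


def save_config(path, config):
    """Write a run config to disk as stable, human-readable JSON."""
    Path(path).write_text(json.dumps(config, indent=2, sort_keys=True))


def load_config(path):
    """Read a run config previously written by save_config."""
    return json.loads(Path(path).read_text())


def pass_rates(results):
    """Map each model name to the fraction of test cases it passed."""
    return {
        model: sum(cases) / len(cases)
        for model, cases in results.items()
        if cases  # skip models with no completed cases
    }


# Round-trip a config so the same settings can be reused after retraining.
config = {"temperature": 0.0, "max_tokens": 256, "timeout_s": 30}
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "run_config.json"
    save_config(path, config)
    reloaded = load_config(path)

# Illustrative per-case outcomes for two models on the same benchmark.
results = {
    "model-a": [True, True, False, True],
    "model-b": [True, False, False, True],
}
rates = pass_rates(results)
for model, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {rate:.0%}")
```

Keeping the config identical across runs means differences in pass rate reflect the models rather than the settings.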
