Read Results
After a benchmark run completes, Eval generates a structured results report. This page explains what each section means and how to use it.
Results Report Structure
Summary metrics
| Metric | What it means |
|---|---|
| Accuracy | Percentage of test cases where the model’s response matched the expected answer |
| Avg latency | Mean response time per question in milliseconds |
| Avg cost | Mean token cost per question (API models only) |
| Pass / Fail | Count of passed and failed cases |
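
To make the relationship between these metrics and the underlying test cases concrete, here is a minimal sketch that recomputes them from a list of per-question records. The field names (`passed`, `latency_ms`, `cost_usd`) and the `summarize` helper are assumptions made for illustration; they are not Eval's actual report schema or API.

```python
from statistics import mean


def summarize(cases):
    """Recompute summary metrics from per-question records (hypothetical schema)."""
    if not cases:
        raise ValueError("no test cases to summarize")
    passed = sum(1 for c in cases if c["passed"])
    # Cost applies to API models only, so skip records that carry no cost.
    costs = [c["cost_usd"] for c in cases if c.get("cost_usd") is not None]
    return {
        "accuracy_pct": 100.0 * passed / len(cases),
        "avg_latency_ms": mean(c["latency_ms"] for c in cases),
        "avg_cost_usd": mean(costs) if costs else None,
        "passed": passed,
        "failed": len(cases) - passed,
    }


# Illustrative data only, not real benchmark output.
cases = [
    {"passed": True, "latency_ms": 400, "cost_usd": 0.002},
    {"passed": False, "latency_ms": 600, "cost_usd": 0.004},
    {"passed": True, "latency_ms": 500, "cost_usd": 0.003},
    {"passed": True, "latency_ms": 500, "cost_usd": 0.003},
]
summary = summarize(cases)
print(summary)
```

With three of four cases passing, this sketch reports an accuracy of 75%. Records without a cost value are left out of the cost average rather than counted as zero, so a run against a non-API model yields no average cost instead of a misleading one.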
Per-question breakdown
Each test case shows:
- The input prompt
- The model’s response
- The expected answer
- Pass / Fail status
- Latency and token usage
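
The per-question fields listed above can be modeled as a simple record, which makes it easy to pull out the failures for review. This is a sketch under assumed names: `CaseResult`, its attributes, and `failed_cases` are hypothetical and do not reflect how Eval exports its data.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """One row of the per-question breakdown (hypothetical field names)."""
    prompt: str
    response: str
    expected: str
    passed: bool
    latency_ms: float
    tokens_used: int


def failed_cases(results):
    """Return only the failed cases, slowest first."""
    return sorted(
        (r for r in results if not r.passed),
        key=lambda r: r.latency_ms,
        reverse=True,
    )


# Illustrative data only, not real benchmark output.
results = [
    CaseResult("2 + 2 = ?", "4", "4", True, 310.0, 12),
    CaseResult("Capital of France?", "Lyon", "Paris", False, 420.0, 15),
    CaseResult("Largest planet?", "Saturn", "Jupiter", False, 580.0, 14),
]
for r in failed_cases(results):
    print(f"FAIL ({r.latency_ms:.0f} ms): {r.prompt!r} "
          f"got {r.response!r}, expected {r.expected!r}")
```

Comparing the response against the expected answer for each failed case is usually the quickest way to tell a genuine model error from an overly strict expected answer.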