competition.yaml is the entry point for every custom environment bundle. BenchGen reads this file to understand the environment’s structure, which data and programs to use, and how to display results.
Top-level fields
title: My Evaluation Environment
description: A short description shown in the Environments Hub.
image: logo.png
terms: terms.md
pages:
  - title: Overview
    file: overview.md
  - title: Evaluation
    file: evaluation.md
phases:
  - ...
tasks:
  - ...
leaderboard:
  - ...
| Field | Required | Description |
|---|---|---|
| title | Yes | Display name shown in the hub and run UI |
| description | No | One-line summary shown on the environment card |
| image | No | Path to a logo image inside the bundle |
| terms | No | Path to a Markdown file with terms of use |
| pages | No | Additional Markdown pages rendered as environment documentation tabs |
pages
Optional documentation tabs displayed alongside the environment. Each entry needs a title and a file path relative to the bundle root.
pages:
  - title: Overview
    file: overview.md
  - title: Data format
    file: data.md
phases
Phases define the active evaluation windows. Most environments have a single phase. Each phase references one or more tasks by their index.
phases:
  - index: 0
    name: Evaluation
    description: Main evaluation phase
    start: 2025-01-01
    end: 2027-12-31
    tasks:
      - 0
| Field | Required | Description |
|---|---|---|
| index | Yes | Zero-based integer identifier for this phase |
| name | Yes | Display name |
| description | No | Short description of the phase |
| start | No | ISO 8601 date when the phase opens (YYYY-MM-DD) |
| end | No | ISO 8601 date when the phase closes |
| tasks | Yes | List of task indices active in this phase |
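Because each phase points at tasks by index, a phase that lists an index with no matching task is an easy mistake to make. The sketch below checks those references on a configuration that has already been parsed into a Python dictionary. The function name `unknown_task_refs` is hypothetical and not part of BenchGen, and the second task index in the sample data is deliberately wrong to show what the check reports.

```python
def unknown_task_refs(config: dict) -> list[tuple[int, int]]:
    """Return (phase index, task index) pairs that point at undefined tasks."""
    # Collect every index declared under the top-level `tasks` list.
    defined = {task["index"] for task in config.get("tasks", [])}
    missing = []
    for phase in config.get("phases", []):
        for task_index in phase.get("tasks", []):
            if task_index not in defined:
                missing.append((phase["index"], task_index))
    return missing


# Parsed form of a competition.yaml with one phase and one task.
# The phase also lists task 1, which is never defined.
config = {
    "phases": [{"index": 0, "name": "Evaluation", "tasks": [0, 1]}],
    "tasks": [{"index": 0, "name": "Main task"}],
}

print(unknown_task_refs(config))  # [(0, 1)]
```

An empty list means every phase refers only to tasks that exist.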
tasks
Each task defines a single evaluation problem — the data and programs needed to score one type of submission.
tasks:
  - index: 0
    name: Main task
    description: Evaluate model accuracy on the test set
    scoring_program: scoring_program.zip
    reference_data: reference_data.zip
    ingestion_program: ingestion_program.zip  # optional
    input_data: input_data.zip  # optional
| Field | Required | Description |
|---|---|---|
| index | Yes | Zero-based integer identifier, referenced by phases |
| name | Yes | Display name |
| description | No | What the task is evaluating |
| scoring_program | Yes | Path to the scoring program zip inside the bundle |
| reference_data | Yes | Path to the reference data zip |
| ingestion_program | No | Path to the ingestion program zip |
| input_data | No | Path to the input data zip |
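Since every task points at zip files inside the bundle, it can help to confirm those files exist before uploading. The following is a minimal sketch, assuming task paths resolve relative to the bundle root in the same way page files do. `missing_task_files` is a hypothetical helper, not a BenchGen API, and the temporary directory stands in for a real bundle.

```python
import tempfile
from pathlib import Path

# Task fields that hold paths to zip files, per the table above.
PATH_FIELDS = ("scoring_program", "reference_data", "ingestion_program", "input_data")


def missing_task_files(bundle_root: Path, tasks: list[dict]) -> list[str]:
    """Return the task file paths that do not exist under bundle_root."""
    missing = []
    for task in tasks:
        for field in PATH_FIELDS:
            rel_path = task.get(field)  # optional fields may be absent
            if rel_path is not None and not (bundle_root / rel_path).is_file():
                missing.append(rel_path)
    return missing


tasks = [{
    "index": 0,
    "name": "Main task",
    "scoring_program": "scoring_program.zip",
    "reference_data": "reference_data.zip",
}]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "scoring_program.zip").touch()  # only one of the two files exists
    result = missing_task_files(root, tasks)

print(result)  # ['reference_data.zip']
```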
solutions
Optional reference solutions included with the bundle. BenchGen uses these to verify that the scoring program works correctly before the environment is published.
solutions:
  - index: 0
    path: example_solution.zip
    tasks:
      - 0
| Field | Required | Description |
|---|---|---|
| index | Yes | Zero-based identifier |
| path | Yes | Path to the solution zip inside the bundle |
| tasks | Yes | Task indices this solution applies to |
leaderboard
Defines the columns displayed in the results table. Each column key must match a key that your scoring program writes to scores.json.
leaderboard:
  - title: Results
    key: main
    columns:
      - title: Accuracy
        key: accuracy
        index: 0
        sorting: desc
      - title: F1
        key: f1
        index: 1
        sorting: desc
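To illustrate the link between column keys and scores.json, here is a sketch of the last step of a scoring program for the leaderboard above. It assumes scores.json is a flat JSON object mapping each column key to a number. The helper `write_scores`, the way the output directory is passed in, and the score values are all placeholders for illustration; the source does not specify where BenchGen expects the file to be written.

```python
import json
import tempfile
from pathlib import Path


def write_scores(output_dir: Path, scores: dict[str, float], column_keys: list[str]) -> Path:
    """Write scores.json after checking that every leaderboard column has a score."""
    absent = [key for key in column_keys if key not in scores]
    if absent:
        raise KeyError(f"scores missing leaderboard keys: {absent}")
    path = output_dir / "scores.json"
    path.write_text(json.dumps(scores))
    return path


# Column keys taken from the leaderboard example above.
column_keys = ["accuracy", "f1"]

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder values; a real scoring program would compute these.
    path = write_scores(Path(tmp), {"accuracy": 0.91, "f1": 0.88}, column_keys)
    written = json.loads(path.read_text())

print(sorted(written))  # ['accuracy', 'f1']
```

Failing early on a missing key is a design choice of this sketch: it surfaces a mismatch between the scoring program and the leaderboard definition at scoring time.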
Leaderboard group fields
| Field | Required | Description |
|---|---|---|
| title | Yes | Section heading in the results table |
| key | Yes | Unique identifier for this leaderboard group |
| columns | Yes | List of column definitions (see below) |
Column fields
| Field | Required | Description |
|---|---|---|
| title | Yes | Column heading |
| key | Yes | Must match the key in scores.json output by your scoring program |
| index | Yes | Display order (zero-based) |
| sorting | No | asc or desc: direction used to rank submissions. Defaults to desc |
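The effect of the sorting field can be pictured with a small ranking sketch. It orders result rows by a single column and treats a missing sorting value as desc, matching the default in the table. The function `rank_submissions` and the sample rows are made up for illustration; how BenchGen breaks ties or combines several columns is not covered here.

```python
def rank_submissions(rows: list[dict], key: str, sorting: str = "desc") -> list[dict]:
    """Order result rows by one column; `desc` puts the largest value first."""
    if sorting not in ("asc", "desc"):
        raise ValueError(f"sorting must be 'asc' or 'desc', got {sorting!r}")
    return sorted(rows, key=lambda row: row[key], reverse=(sorting == "desc"))


# Placeholder rows; names and values are made up for illustration.
rows = [
    {"name": "a", "accuracy": 0.82},
    {"name": "b", "accuracy": 0.91},
    {"name": "c", "accuracy": 0.77},
]

print([row["name"] for row in rank_submissions(rows, "accuracy")])         # ['b', 'a', 'c']
print([row["name"] for row in rank_submissions(rows, "accuracy", "asc")])  # ['c', 'a', 'b']
```

Use desc for metrics where higher is better, such as accuracy, and asc for metrics where lower is better, such as an error rate.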
Complete example
title: Text Classification Benchmark
description: Evaluates model accuracy on a multi-class text classification task.
image: logo.png
pages:
  - title: Overview
    file: overview.md
  - title: Data format
    file: data.md
phases:
  - index: 0
    name: Evaluation
    start: 2025-01-01
    end: 2027-12-31
    tasks:
      - 0
tasks:
  - index: 0
    name: Classification
    description: Predict the correct category for each input text
    scoring_program: scoring_program.zip
    reference_data: reference_data.zip
    input_data: input_data.zip
    ingestion_program: ingestion_program.zip
solutions:
  - index: 0
    path: example_solution.zip
    tasks:
      - 0
leaderboard:
  - title: Results
    key: main
    columns:
      - title: Accuracy
        key: accuracy
        index: 0
        sorting: desc
      - title: F1
        key: f1
        index: 1
        sorting: desc