YAML Reference

`competition.yaml` is the entry point for every custom environment bundle. BenchGen reads this file to determine the environment’s structure, which data and programs to use, and how to display results.

Top-level fields

title: My Evaluation Environment
description: A short description shown in the Environments Hub.
image: logo.png
terms: terms.md
pages:
  - title: Overview
    file: overview.md
  - title: Evaluation
    file: evaluation.md
phases:
  - ...
tasks:
  - ...
solutions:
  - ...
leaderboard:
  - ...

Field	Required	Description
`title`	Yes	Display name shown in the hub and run UI
`description`	No	One-line summary shown on the environment card
`image`	No	Path to a logo image inside the bundle
`terms`	No	Path to a Markdown file with terms of use
`pages`	No	Additional Markdown pages rendered as environment documentation tabs

The `phases`, `tasks`, `solutions`, and `leaderboard` fields each have their own section below.

`pages`

Optional documentation tabs displayed alongside the environment. Each entry needs a title and a file path relative to the bundle root.

pages:
  - title: Overview
    file: overview.md
  - title: Data format
    file: data.md

`phases`

Phases define the active evaluation windows. Most environments have a single phase. Each phase references one or more tasks by their index.

phases:
  - index: 0
    name: Evaluation
    description: Main evaluation phase
    start: 2025-01-01
    end: 2027-12-31
    tasks:
      - 0

Field	Required	Description
`index`	Yes	Zero-based integer identifier for this phase
`name`	Yes	Display name
`description`	No	Short description of the phase
`start`	No	ISO 8601 date when the phase opens (`YYYY-MM-DD`)
`end`	No	ISO 8601 date when the phase closes (`YYYY-MM-DD`)
`tasks`	Yes	List of task indices active in this phase
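
The index-based link between phases and tasks can be checked before a bundle is uploaded. The following is a minimal sketch, not part of BenchGen: it assumes the YAML has already been parsed into plain Python dictionaries, and `check_phases` is a hypothetical helper name. It reports task indices that no task defines, and phases whose `end` falls before `start`.

```python
from datetime import date

def check_phases(config):
    """Return a list of problems found in the `phases` section of a parsed config."""
    problems = []
    known_tasks = {task["index"] for task in config.get("tasks", [])}
    for phase in config.get("phases", []):
        label = f"phase {phase['index']}"
        # Every task index listed in a phase should match some task's `index`.
        for task_index in phase.get("tasks", []):
            if task_index not in known_tasks:
                problems.append(f"{label}: unknown task index {task_index}")
        # `start` and `end` are optional; compare them only when both are set.
        start, end = phase.get("start"), phase.get("end")
        if start is not None and end is not None:
            # Accept either date objects or YYYY-MM-DD strings.
            start_d = start if isinstance(start, date) else date.fromisoformat(start)
            end_d = end if isinstance(end, date) else date.fromisoformat(end)
            if end_d < start_d:
                problems.append(f"{label}: end {end_d} is before start {start_d}")
    return problems

# Hand-written stand-in for a parsed competition.yaml (illustrative values only).
config = {
    "phases": [
        {"index": 0, "name": "Evaluation", "start": "2025-01-01",
         "end": "2027-12-31", "tasks": [0]},
        {"index": 1, "name": "Broken", "start": "2026-01-01",
         "end": "2025-01-01", "tasks": [3]},
    ],
    "tasks": [{"index": 0, "name": "Main task"}],
}

for problem in check_phases(config):
    print(problem)
```

The helper accepts both strings and `date` objects because some YAML parsers convert unquoted `YYYY-MM-DD` values to dates on load.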

`tasks`

Each task defines a single evaluation problem: the data and programs needed to score one type of submission.

tasks:
  - index: 0
    name: Main task
    description: Evaluate model accuracy on the test set
    scoring_program: scoring_program.zip
    reference_data: reference_data.zip
    ingestion_program: ingestion_program.zip   # optional
    input_data: input_data.zip                 # optional

Field	Required	Description
`index`	Yes	Zero-based integer identifier, referenced by phases
`name`	Yes	Display name
`description`	No	What the task is evaluating
`scoring_program`	Yes	Path to the scoring program zip inside the bundle
`reference_data`	Yes	Path to the reference data zip
`ingestion_program`	No	Path to the ingestion program zip
`input_data`	No	Path to the input data zip
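
Because every path in a task points at a file inside the bundle, a missing zip is an easy mistake to catch locally. The sketch below is illustrative only: it assumes the bundle is a single zip archive and that task paths are resolved relative to the bundle root, and `missing_task_files` is a hypothetical helper, not a BenchGen function.

```python
import io
import zipfile

# Task fields that hold a path to a zip inside the bundle.
PATH_FIELDS = ("scoring_program", "reference_data", "ingestion_program", "input_data")

def missing_task_files(tasks, bundle_names):
    """Return (task index, field, path) for every referenced file absent from the bundle."""
    present = set(bundle_names)
    missing = []
    for task in tasks:
        for field in PATH_FIELDS:
            path = task.get(field)  # optional fields may be absent
            if path is not None and path not in present:
                missing.append((task["index"], field, path))
    return missing

# Build a small in-memory bundle so the sketch runs without files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as bundle:
    bundle.writestr("competition.yaml", "title: Demo\n")
    bundle.writestr("scoring_program.zip", b"")
    bundle.writestr("reference_data.zip", b"")

# Stand-in for the parsed `tasks` section.
tasks = [{
    "index": 0,
    "name": "Main task",
    "scoring_program": "scoring_program.zip",
    "reference_data": "reference_data.zip",
    "input_data": "input_data.zip",  # referenced but not added to the bundle
}]

with zipfile.ZipFile(buffer) as bundle:
    print(missing_task_files(tasks, bundle.namelist()))
```

For a real bundle, `zipfile.ZipFile("bundle.zip").namelist()` would supply the names, where `bundle.zip` is a placeholder path.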

`solutions`

Optional reference solutions included with the bundle. BenchGen uses these to verify that the scoring program works correctly before the environment is published.

solutions:
  - index: 0
    path: example_solution.zip
    tasks:
      - 0

Field	Required	Description
`index`	Yes	Zero-based integer identifier for this solution
`path`	Yes	Path to the solution zip inside the bundle
`tasks`	Yes	Task indices this solution applies to

`leaderboard`

Defines the columns displayed in the results table. Each column `key` must match a key that your scoring program writes to `scores.json`.

leaderboard:
  - title: Results
    key: main
    columns:
      - title: Accuracy
        key: accuracy
        index: 0
        sorting: desc
      - title: F1
        key: f1
        index: 1
        sorting: desc
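
To illustrate the key matching, here is a rough sketch of the last step of a scoring program together with a local check. Several details are assumptions rather than documented behavior: `scores.json` is treated as a flat JSON object mapping keys to numbers, the output directory is a temporary placeholder, keys are compared exactly (including case), and `write_scores` and `unmatched_columns` are hypothetical helper names.

```python
import json
import tempfile
from pathlib import Path

def write_scores(output_dir, scores):
    """Write the scores dictionary to scores.json inside output_dir."""
    path = Path(output_dir) / "scores.json"
    path.write_text(json.dumps(scores, indent=2))
    return path

def unmatched_columns(leaderboard, scores):
    """Return column keys declared in the leaderboard but missing from the scores."""
    return [
        column["key"]
        for group in leaderboard
        for column in group["columns"]
        if column["key"] not in scores
    ]

# Stand-in for the parsed `leaderboard` section shown above.
leaderboard = [{
    "title": "Results",
    "key": "main",
    "columns": [
        {"title": "Accuracy", "key": "accuracy", "index": 0, "sorting": "desc"},
        {"title": "F1", "key": "f1", "index": 1, "sorting": "desc"},
    ],
}]

with tempfile.TemporaryDirectory() as output_dir:
    # Placeholder values; a real scoring program would compute these.
    # "F1" is deliberately misspelled relative to the column key "f1".
    path = write_scores(output_dir, {"accuracy": 0.91, "F1": 0.88})
    written = json.loads(path.read_text())

print(unmatched_columns(leaderboard, written))
```

The check flags `f1` here because the scoring output used `F1`. Whether BenchGen itself is this strict about case is not stated in this reference, so using identical spelling in both places is the safer choice.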

Leaderboard group fields

Field	Required	Description
`title`	Yes	Section heading in the results table
`key`	Yes	Unique identifier for this leaderboard group
`columns`	Yes	List of column definitions (see below)

Column fields

Field	Required	Description
`title`	Yes	Column heading
`key`	Yes	Must match the key in `scores.json` output by your scoring program
`index`	Yes	Display order (zero-based)
`sorting`	No	`asc` or `desc`: the direction used to rank submissions. Defaults to `desc`
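
The effect of `sorting` can be sketched as follows. This is an illustration under stated assumptions, not BenchGen's ranking code: it ranks by a single column, does not handle ties, and uses a hypothetical `rank_submissions` helper with made-up submission data. How BenchGen combines multiple columns is not covered here.

```python
def rank_submissions(submissions, column):
    """Order submissions by one leaderboard column, best first."""
    # `sorting` is optional and defaults to `desc` (higher values rank first).
    descending = column.get("sorting", "desc") == "desc"
    return sorted(
        submissions,
        key=lambda s: s["scores"][column["key"]],
        reverse=descending,
    )

# Made-up submissions for illustration only.
submissions = [
    {"name": "baseline", "scores": {"accuracy": 0.72, "error_rate": 0.28}},
    {"name": "model_a", "scores": {"accuracy": 0.91, "error_rate": 0.09}},
    {"name": "model_b", "scores": {"accuracy": 0.85, "error_rate": 0.15}},
]

# No `sorting` key, so the default `desc` applies.
accuracy = {"title": "Accuracy", "key": "accuracy", "index": 0}
# Lower is better, so the column is ranked ascending.
error_rate = {"title": "Error rate", "key": "error_rate", "index": 1, "sorting": "asc"}

print([s["name"] for s in rank_submissions(submissions, accuracy)])
print([s["name"] for s in rank_submissions(submissions, error_rate)])
```

Both calls put `model_a` first: `desc` suits metrics where higher is better, while `asc` suits error-style metrics where lower is better.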

Complete example

title: Text Classification Benchmark
description: Evaluates model accuracy on a multi-class text classification task.
image: logo.png

pages:
  - title: Overview
    file: overview.md
  - title: Data format
    file: data.md

phases:
  - index: 0
    name: Evaluation
    start: 2025-01-01
    end: 2027-12-31
    tasks:
      - 0

tasks:
  - index: 0
    name: Classification
    description: Predict the correct category for each input text
    scoring_program: scoring_program.zip
    reference_data: reference_data.zip
    input_data: input_data.zip
    ingestion_program: ingestion_program.zip

solutions:
  - index: 0
    path: example_solution.zip
    tasks:
      - 0

leaderboard:
  - title: Results
    key: main
    columns:
      - title: Accuracy
        key: accuracy
        index: 0
        sorting: desc
      - title: F1
        key: f1
        index: 1
        sorting: desc

Next steps

Bundle structure: the files inside the .zip and what each one does
Create a custom environment: end-to-end upload walkthrough
