competition.yaml is the entry point for every custom environment bundle. BenchGen reads this file to understand the environment’s structure, which data and programs to use, and how to display results.

Top-level fields

title: My Evaluation Environment
description: A short description shown in the Environments Hub.
image: logo.png
terms: terms.md
pages:
  - title: Overview
    file: overview.md
  - title: Evaluation
    file: evaluation.md
phases:
  - ...
tasks:
  - ...
leaderboard:
  - ...
| Field | Required | Description |
| --- | --- | --- |
| title | Yes | Display name shown in the hub and run UI |
| description | No | One-line summary shown on the environment card |
| image | No | Path to a logo image inside the bundle |
| terms | No | Path to a Markdown file with terms of use |
| pages | No | Additional Markdown pages rendered as environment documentation tabs |

pages

Optional documentation tabs displayed alongside the environment. Each entry needs a title and a file path relative to the bundle root.
pages:
  - title: Overview
    file: overview.md
  - title: Data format
    file: data.md

phases

Phases define the active evaluation windows. Most environments have a single phase. Each phase references one or more tasks by their index.
phases:
  - index: 0
    name: Evaluation
    description: Main evaluation phase
    start: 2025-01-01
    end: 2027-12-31
    tasks:
      - 0
| Field | Required | Description |
| --- | --- | --- |
| index | Yes | Zero-based integer identifier for this phase |
| name | Yes | Display name |
| description | No | Short description of the phase |
| start | No | ISO 8601 date when the phase opens (YYYY-MM-DD) |
| end | No | ISO 8601 date when the phase closes |
| tasks | Yes | List of task indices active in this phase |
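
The phase fields above lend themselves to a quick local check before a bundle is uploaded. The following is a minimal sketch, not part of BenchGen: it assumes competition.yaml has already been parsed into a Python dict (for example with a YAML library), and the function name check_phases is hypothetical. It reports phases that list a task index with no matching task, and phases whose start date falls after their end date.

```python
from datetime import date


def _as_date(value):
    """Accept a datetime.date or an ISO 8601 string (YYYY-MM-DD)."""
    if isinstance(value, date):
        return value
    return date.fromisoformat(str(value))


def check_phases(config):
    """Return a list of human-readable problems found in config['phases']."""
    problems = []
    task_indices = {task["index"] for task in config.get("tasks", [])}

    for phase in config.get("phases", []):
        label = f"phase {phase.get('index')}"

        # Every task index listed by the phase must exist under 'tasks'.
        for task_index in phase.get("tasks", []):
            if task_index not in task_indices:
                problems.append(f"{label}: unknown task index {task_index}")

        # start and end are optional; compare them only when both are set.
        start, end = phase.get("start"), phase.get("end")
        if start is not None and end is not None:
            if _as_date(start) > _as_date(end):
                problems.append(f"{label}: start {start} is after end {end}")

    return problems


# Hypothetical parsed config mirroring the example above.
config = {
    "phases": [
        {"index": 0, "name": "Evaluation", "start": "2025-01-01",
         "end": "2027-12-31", "tasks": [0]},
    ],
    "tasks": [{"index": 0, "name": "Main task"}],
}
print(check_phases(config))  # prints []
```

Dates are accepted either as strings or as date objects because some YAML parsers convert unquoted YYYY-MM-DD values into dates automatically.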

tasks

Each task defines a single evaluation problem — the data and programs needed to score one type of submission.
tasks:
  - index: 0
    name: Main task
    description: Evaluate model accuracy on the test set
    scoring_program: scoring_program.zip
    reference_data: reference_data.zip
    ingestion_program: ingestion_program.zip   # optional
    input_data: input_data.zip                 # optional
| Field | Required | Description |
| --- | --- | --- |
| index | Yes | Zero-based integer identifier, referenced by phases |
| name | Yes | Display name |
| description | No | What the task is evaluating |
| scoring_program | Yes | Path to the scoring program zip inside the bundle |
| reference_data | Yes | Path to the reference data zip |
| ingestion_program | No | Path to the ingestion program zip |
| input_data | No | Path to the input data zip |
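
Because each of these fields is a path inside the bundle, a missing zip is an easy mistake to catch locally. The sketch below assumes the bundle has been unpacked into a directory and that one task entry is available as a Python dict. The helper missing_task_files is hypothetical, not a BenchGen API, and this page does not say whether BenchGen runs an equivalent check on upload.

```python
import tempfile
from pathlib import Path

REQUIRED_FILES = ("scoring_program", "reference_data")
OPTIONAL_FILES = ("ingestion_program", "input_data")


def missing_task_files(task, bundle_root):
    """Return problems for one task entry, checked against bundle_root."""
    root = Path(bundle_root)
    problems = []

    for field in REQUIRED_FILES + OPTIONAL_FILES:
        path = task.get(field)
        if path is None:
            # Only the required fields must be present.
            if field in REQUIRED_FILES:
                problems.append(f"{field}: required field is not set")
            continue
        if not (root / path).is_file():
            problems.append(f"{field}: {path} not found in bundle")

    return problems


# Demonstration with a throwaway directory standing in for the bundle.
with tempfile.TemporaryDirectory() as bundle:
    Path(bundle, "scoring_program.zip").touch()
    task = {
        "index": 0,
        "name": "Main task",
        "scoring_program": "scoring_program.zip",
        "reference_data": "reference_data.zip",  # deliberately not created
    }
    problems = missing_task_files(task, bundle)

print(problems)  # prints ['reference_data: reference_data.zip not found in bundle']
```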

solutions

Optional reference solutions included with the bundle. BenchGen uses these to verify that the scoring program works correctly before the environment is published.
solutions:
  - index: 0
    path: example_solution.zip
    tasks:
      - 0
| Field | Required | Description |
| --- | --- | --- |
| index | Yes | Zero-based identifier |
| path | Yes | Path to the solution zip inside the bundle |
| tasks | Yes | Task indices this solution applies to |

leaderboard

Defines the columns displayed in the results table. Each column's key must match a key that your scoring program writes to scores.json.
leaderboard:
  - title: Results
    key: main
    columns:
      - title: Accuracy
        key: accuracy
        index: 0
        sorting: desc
      - title: F1
        key: f1
        index: 1
        sorting: desc

Leaderboard group fields

| Field | Required | Description |
| --- | --- | --- |
| title | Yes | Section heading in the results table |
| key | Yes | Unique identifier for this leaderboard group |
| columns | Yes | List of column definitions (see below) |

Column fields

| Field | Required | Description |
| --- | --- | --- |
| title | Yes | Column heading |
| key | Yes | Must match the key in scores.json output by your scoring program |
| index | Yes | Display order (zero-based) |
| sorting | No | asc or desc; direction used to rank submissions. Defaults to desc |
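
The link between column keys and scores.json can be illustrated with a short sketch. It assumes scores.json is a flat JSON object mapping each key to a number, which this page does not specify, and the helper names (missing_score_keys, rank_submissions) and submission names are hypothetical. The first function lists column keys absent from a scores object; the second orders submissions by one column, falling back to desc when sorting is omitted.

```python
import json


def missing_score_keys(columns, scores):
    """Return column keys that do not appear in the scores object."""
    return [col["key"] for col in columns if col["key"] not in scores]


def rank_submissions(column, submissions):
    """Order submission names by one column; sorting defaults to desc."""
    descending = column.get("sorting", "desc") == "desc"
    return sorted(
        submissions,
        key=lambda name: submissions[name][column["key"]],
        reverse=descending,
    )


columns = [
    {"title": "Accuracy", "key": "accuracy", "index": 0, "sorting": "desc"},
    {"title": "F1", "key": "f1", "index": 1},  # sorting omitted on purpose
]

# Assumed shape of scores.json: one flat object keyed by column key.
scores = json.loads('{"accuracy": 0.91, "f1": 0.88}')
print(missing_score_keys(columns, scores))  # prints []

# Hypothetical per-submission scores used to show the ranking direction.
submissions = {
    "baseline": {"accuracy": 0.80, "f1": 0.75},
    "improved": {"accuracy": 0.91, "f1": 0.88},
}
print(rank_submissions(columns[0], submissions))  # prints ['improved', 'baseline']
```

Running a check like the first one against a sample scores.json catches a mismatched or misspelled column key before the leaderboard renders an empty column.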

Complete example

title: Text Classification Benchmark
description: Evaluates model accuracy on a multi-class text classification task.
image: logo.png

pages:
  - title: Overview
    file: overview.md
  - title: Data format
    file: data.md

phases:
  - index: 0
    name: Evaluation
    start: 2025-01-01
    end: 2027-12-31
    tasks:
      - 0

tasks:
  - index: 0
    name: Classification
    description: Predict the correct category for each input text
    scoring_program: scoring_program.zip
    reference_data: reference_data.zip
    input_data: input_data.zip
    ingestion_program: ingestion_program.zip

solutions:
  - index: 0
    path: example_solution.zip
    tasks:
      - 0

leaderboard:
  - title: Results
    key: main
    columns:
      - title: Accuracy
        key: accuracy
        index: 0
        sorting: desc
      - title: F1
        key: f1
        index: 1
        sorting: desc
