Create a Custom Environment

If the Environments Hub doesn’t cover your task, you can author your own evaluation environment and upload it to BenchGen. Once uploaded, your environment works exactly like any hub environment — pick it, point it at a model, and run.

What you need

A custom environment is packaged as a .zip bundle. At minimum you need:

| File | Required | Purpose |
| --- | --- | --- |
| `competition.yaml` | Yes | Declares the environment’s structure, tasks, and leaderboard |
| Reference data | Yes | The ground-truth answers your scoring program compares predictions against |
| Scoring program | Yes | A script that receives predictions and outputs a `scores.json` |
| Ingestion program | No | A script that runs the model and produces predictions (needed when BenchGen runs your model, not just receives its outputs) |
| Input data | No | Test inputs handed to the ingestion program at run time |

See Bundle structure for a full explanation of each file and YAML reference for every field in competition.yaml.

Steps

Prepare your data. Collect the test cases you want to evaluate against. Package your ground-truth answers into a reference_data.zip.
Write a scoring program. Create a script that reads a model’s predictions and compares them to your reference data. The script must write a scores.json file with one numeric value per leaderboard column you want to display.
```json
{"accuracy": 0.91, "f1": 0.87}
```
Package the script and a metadata.yaml (specifying the run command) into a scoring_program.zip.
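As a rough illustration of the scoring step, the Python sketch below computes a single accuracy value and writes it to scores.json. The three command-line arguments, the file names predictions.json and reference.json, and the id-to-label JSON layout are all assumptions made for this example. The paths BenchGen actually passes depend on the run command in your metadata.yaml, so check Bundle structure for the real conventions.

```python
import json
import sys
from pathlib import Path


def load_labels(path):
    """Read a JSON object mapping example id -> label (assumed layout)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def score(predictions, reference):
    """Return accuracy over the reference ids; a missing prediction counts as wrong."""
    if not reference:
        raise ValueError("reference data is empty")
    correct = sum(
        1 for ex_id, truth in reference.items() if predictions.get(ex_id) == truth
    )
    return {"accuracy": correct / len(reference)}


def main(pred_dir, ref_dir, out_dir):
    # File names below are hypothetical; use whatever your bundle actually ships.
    predictions = load_labels(Path(pred_dir) / "predictions.json")
    reference = load_labels(Path(ref_dir) / "reference.json")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # One numeric value per leaderboard column.
    with open(out / "scores.json", "w", encoding="utf-8") as f:
        json.dump(score(predictions, reference), f)


# Assumed invocation: python score.py <pred_dir> <ref_dir> <out_dir>
if __name__ == "__main__" and len(sys.argv) == 4:
    main(*sys.argv[1:4])
```

Counting a missing prediction as wrong, instead of skipping it, keeps the denominator fixed so runs with incomplete outputs stay comparable.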
(Optional) Write an ingestion program. If BenchGen needs to run your model end-to-end rather than just receive its outputs, write an ingestion program that takes the input data, runs the model, and writes predictions to the output directory. Package it the same way as the scoring program.
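The ingestion step might look something like the following Python sketch: read the inputs, call the model once per example, and write the predictions to the output directory. The file names inputs.json and predictions.json, the `{"id", "text"}` record shape, and the `predict` placeholder are hypothetical and chosen only so the example runs. How BenchGen exposes the model and the directories to your program is not specified here.

```python
import json
from pathlib import Path


def predict(example):
    """Placeholder for the real model call (hypothetical).

    Returns a dummy label so the sketch runs end to end.
    """
    return len(example["text"]) % 2


def run_ingestion(input_dir, output_dir):
    """Read inputs.json, predict each example, and write predictions.json."""
    with open(Path(input_dir) / "inputs.json", encoding="utf-8") as f:
        examples = json.load(f)  # assumed: a list of {"id": ..., "text": ...}
    predictions = {ex["id"]: predict(ex) for ex in examples}
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    with open(out / "predictions.json", "w", encoding="utf-8") as f:
        json.dump(predictions, f)
    return predictions
```

Keying predictions by example id, not by position, lets a scoring program match them to the reference data even if the order changes.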
Write your competition.yaml. This file wires everything together. It declares the title, tasks, which data files to use, and how to display the leaderboard. See the YAML reference for all available fields.

Assemble the bundle. Put all files into a single .zip. The competition.yaml must be at the root level.

my_environment.zip
├── competition.yaml
├── reference_data.zip
├── scoring_program.zip
├── ingestion_program.zip   ← optional
└── input_data.zip          ← optional
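
If you prefer to script the assembly, a minimal Python sketch using the standard zipfile module could look like this. It assumes the inner archives already exist in one folder under the names shown in the tree above; `build_bundle` is a hypothetical helper and not part of any BenchGen tooling.

```python
import zipfile
from pathlib import Path

REQUIRED = ("competition.yaml", "reference_data.zip", "scoring_program.zip")
OPTIONAL = ("ingestion_program.zip", "input_data.zip")


def build_bundle(source_dir, bundle_path):
    """Zip the bundle files flat so competition.yaml sits at the archive root."""
    source = Path(source_dir)
    missing = [name for name in REQUIRED if not (source / name).is_file()]
    if missing:
        raise FileNotFoundError(f"missing required files: {missing}")
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for name in REQUIRED + OPTIONAL:
            if (source / name).is_file():
                # arcname drops the folder prefix, keeping every entry at the root.
                bundle.write(source / name, arcname=name)
    # Re-open the archive and return its entries as a quick sanity check.
    with zipfile.ZipFile(bundle_path) as bundle:
        return bundle.namelist()
```

Writing each file with an explicit `arcname` avoids a common mistake: zipping the parent folder, which nests competition.yaml one level down instead of at the root.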

Upload to BenchGen. Go to Environments → New environment → Upload bundle, drop in your .zip, and click Create. BenchGen validates the structure and registers the environment in your workspace immediately.

After uploading

Your environment appears in your workspace’s environment list. You can:

Run it against any model from Eval → Run
Share it with your team so others can use it in their runs
Update it by uploading a revised bundle — previous runs keep a snapshot of the version they used

Next steps

Bundle structure — detailed breakdown of every file in the .zip
YAML reference — every field in competition.yaml explained
Run a benchmark — run your new environment against a model
