If the Environments Hub doesn’t cover your task, you can author your own evaluation environment and upload it to BenchGen. Once uploaded, your environment works exactly like any hub environment: pick it, point it at a model, and run.

What you need

A custom environment is packaged as a .zip bundle. At minimum you need:
| File | Required | Purpose |
| --- | --- | --- |
| competition.yaml | Yes | Declares the environment’s structure, tasks, and leaderboard |
| Reference data | Yes | The ground-truth answers your scoring program compares predictions against |
| Scoring program | Yes | A script that receives predictions and outputs a scores.json |
| Ingestion program | No | A script that runs the model and produces predictions (needed when BenchGen runs your model, not just receives its outputs) |
| Input data | No | Test inputs handed to the ingestion program at run time |

See Bundle structure for a full explanation of each file and YAML reference for every field in competition.yaml.
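
Before uploading, it can help to confirm locally that the required files listed above are present. The following is a minimal sketch of such a check, not part of BenchGen itself. The helper name `missing_entries` is hypothetical, and the archive names are assumed to match the example layout shown under Steps; adjust them if your bundle names its files differently.

```python
import zipfile

# Entries this guide marks as required. The archive names are an assumption
# taken from the example layout in this guide, not a fixed BenchGen rule.
REQUIRED = ("competition.yaml", "reference_data.zip", "scoring_program.zip")


def missing_entries(bundle_path):
    """Return the required entries that are absent from the bundle root."""
    with zipfile.ZipFile(bundle_path) as bundle:
        names = set(bundle.namelist())
    # Entries nested in a subfolder appear as "folder/name" and so do not
    # match here, which also catches a competition.yaml that is not at the root.
    return [name for name in REQUIRED if name not in names]
```

Calling `missing_entries("my_environment.zip")` returns an empty list when all three entries sit at the root of the archive. BenchGen's own validation on upload remains the authority on what is accepted.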

Steps

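  1. Prepare your data. Collect the test cases you want to evaluate against. Package your ground-truth answers into a reference_data.zip. If you also plan to write an ingestion program, package the test inputs it will receive into an input_data.zip.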
  2. Write a scoring program. Create a script that reads a model’s predictions and compares them to your reference data. The script must write a scores.json file with one numeric value per leaderboard column you want to display, for example:
    {"accuracy": 0.91, "f1": 0.87}
    
    Package the script and a metadata.yaml (specifying the run command) into a scoring_program.zip.
  3. (Optional) Write an ingestion program. If BenchGen needs to run your model end-to-end rather than just receive its outputs, write an ingestion program that takes the input data, runs the model, and writes predictions to the output directory. Package it the same way as the scoring program: the script plus a metadata.yaml specifying the run command, zipped into an ingestion_program.zip.
  4. Write your competition.yaml. This file wires everything together. It declares the title, tasks, which data files to use, and how to display the leaderboard. See the YAML reference for all available fields.
  5. Assemble the bundle. Put all files into a single .zip. The competition.yaml must be at the root level.
    my_environment.zip
    ├── competition.yaml
    ├── reference_data.zip
    ├── scoring_program.zip
    ├── ingestion_program.zip   ← optional
    └── input_data.zip          ← optional
    
  6. Upload to BenchGen. Go to Environments → New environment → Upload bundle, drop in your .zip, and click Create. BenchGen validates the structure and registers the environment in your workspace immediately.
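
The scoring program from step 2 might look like the sketch below. It is an illustration under several assumptions, not BenchGen's required interface: the file names `predictions.json` and `reference.json`, the choice of JSON objects mapping example IDs to labels, and the three command-line arguments are all hypothetical. Check the Bundle structure page for how directories are actually passed to your script. Only the scores.json output, with one numeric value per leaderboard column, comes from this guide.

```python
import json
import sys
from pathlib import Path


def score(predictions, reference):
    """Return accuracy over the reference IDs; missing predictions count as wrong."""
    if not reference:
        return {"accuracy": 0.0}
    correct = sum(
        1 for key, truth in reference.items() if predictions.get(key) == truth
    )
    return {"accuracy": correct / len(reference)}


def main(prediction_dir, reference_dir, output_dir):
    # Hypothetical file names; the real layout depends on your bundle.
    predictions = json.loads((Path(prediction_dir) / "predictions.json").read_text())
    reference = json.loads((Path(reference_dir) / "reference.json").read_text())

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # One numeric value per leaderboard column, as step 2 requires.
    (out / "scores.json").write_text(json.dumps(score(predictions, reference)))


if __name__ == "__main__" and len(sys.argv) == 4:
    # Assumed invocation: python score.py <prediction_dir> <reference_dir> <output_dir>
    main(*sys.argv[1:4])
```

Keeping the metric in a separate `score` function makes it easy to test on a handful of examples before packaging the script into scoring_program.zip. Each additional leaderboard column would be another key in the returned dictionary.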

After uploading

Your environment appears in your workspace’s environment list. You can:
  • Run it against any model from Eval → Run
  • Share it with your team so others can use it in their runs
  • Update it by uploading a revised bundle — previous runs keep a snapshot of the version they used

Next steps