Run a Benchmark

A benchmark run evaluates a model against a set of test cases and produces a structured results report.

Prerequisites

  • A model connected or uploaded in Eval (see Upload a model)
  • At least one benchmark dataset available in your workspace

Steps

  1. Open Eval from the left sidebar.
  2. Select a benchmark from the benchmark library or upload your own.
  3. Choose a model to evaluate from your connected APIs or uploaded models.
  4. Configure the run settings:
    • Temperature and sampling parameters
    • Max tokens per response
    • Timeout per question
  5. Click Run to start the evaluation.
  6. Monitor progress in the run dashboard. Results stream in as each test case completes.
  7. View the report when the run finishes (see Read results).
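
The settings in step 4 can be thought of as a small configuration record that you decide on before clicking Run. The sketch below is illustrative only: the class name `RunSettings` and the field names (`temperature`, `top_p`, `max_tokens`, `timeout_seconds`) are assumptions made for this example, not Eval's actual setting names or API. It shows one way to sanity-check values before entering them in the UI.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RunSettings:
    """Hypothetical container for the run settings listed in step 4."""

    temperature: float = 0.0       # sampling temperature (assumed name)
    top_p: float = 1.0             # one example of a sampling parameter (assumed name)
    max_tokens: int = 512          # max tokens per response (assumed name)
    timeout_seconds: float = 60.0  # timeout per question (assumed name)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the settings look sane."""
        problems = []
        if self.temperature < 0:
            problems.append("temperature must be >= 0")
        if not 0 < self.top_p <= 1:
            problems.append("top_p must be in (0, 1]")
        if self.max_tokens <= 0:
            problems.append("max_tokens must be positive")
        if self.timeout_seconds <= 0:
            problems.append("timeout_seconds must be positive")
        return problems


if __name__ == "__main__":
    settings = RunSettings(temperature=0.2, max_tokens=256)
    print(settings.validate() or "settings look OK")
```

Keeping the settings in one immutable record makes it easier to reuse exactly the same values across runs, which matters when comparing models.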

Tips

  • Run the same benchmark against multiple models to compare them side by side.
  • Start with a small benchmark (10–50 cases) to validate your setup before scaling up.
  • Save run configs to re-run the same evaluation after each training iteration.
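
If you also keep a copy of your run configs outside the product, a plain JSON file is enough to reproduce a run later or to apply one config to several models. This is a minimal sketch under stated assumptions: the helper names (`save_run_config`, `load_run_config`, `configs_for_models`) and the config keys are hypothetical and do not reflect an Eval export format or API.

```python
import json
from pathlib import Path


def save_run_config(path, config):
    """Write a run config (a plain dict) to a JSON file."""
    Path(path).write_text(json.dumps(config, indent=2, sort_keys=True))


def load_run_config(path):
    """Read a run config back from a JSON file."""
    return json.loads(Path(path).read_text())


def configs_for_models(base_config, model_names):
    """Copy one base config per model so every model is run with identical settings."""
    return [{**base_config, "model": name} for name in model_names]


if __name__ == "__main__":
    # Hypothetical keys; use whatever names match your own records.
    base = {"benchmark": "smoke-test", "temperature": 0.0, "max_tokens": 256}
    save_run_config("run_config.json", base)
    for cfg in configs_for_models(load_run_config("run_config.json"), ["model-a", "model-b"]):
        print(cfg)
```

Copying the base config per model, instead of editing it in place, keeps the saved settings unchanged between iterations, so differences in results come from the model and not from drifting parameters.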