Self-Improvement Loop
BenchGen is built around one idea: agents should improve themselves. Each cycle through the platform produces better agents, richer synthetic data, and tighter benchmarks, automatically.
The Four Stages
Simulate
BenchGen ingests your enterprise data (CRM records, ERP logs, support tickets, data warehouse exports) and builds a digital twin of your business. Agents operate inside this simulation, encountering the same complexity as real workflows but in a safe, controlled environment.
Output: interaction logs and trajectories that capture how agents succeed and fail.
Train
Trajectories from simulation become the training signal. RL agents learn by trial, error, and feedback: behaviors that lead to good outcomes are reinforced, and failures are penalized. Fine-tuning uses LoRA (low-rank adaptation), which trains small adapter matrices instead of the full model weights, so iteration is fast and adapters are lightweight.
Output: a fine-tuned model that handles your specific business context better than a generic base model.
Generate
The trained model runs in the simulation at scale, generating synthetic data and trajectories on demand. This is the factory part: BenchGen produces the labeled examples that would otherwise require expensive human annotation or waiting for real-world events.
Output: a rich synthetic dataset ready to fuel the next training round or export to external tools.
Evaluate
Generated data and agent interactions are benchmarked against real tasks. Eval surfaces exactly where agents still fail, and those failures become the input for the next Simulate run. Every benchmark narrows the gap between simulation performance and production performance.
Output: structured failure cases that drive the next iteration.
Why continuous improvement matters
Most agent deployments degrade over time as business context shifts. BenchGen’s loop means agents adapt:
- Grounded in your data: simulation reflects your actual workflows, not generic tasks.
- No human bottleneck: synthetic data generation replaces waiting for labeled real-world examples.
- Short cycles: a benchmark failure can kick off a new training run in minutes.
- Measurable progress: every loop is justified by an eval result, not a guess.
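As a rough illustration of how the four stages chain together, here is a toy Python sketch. It is not the BenchGen API: every name in it (`simulate`, `train`, `generate`, `evaluate`, `improvement_loop`, `Trajectory`) is hypothetical, and the "model" is reduced to a single integer skill score so the control flow stays visible. The one idea it is meant to show is that the failures returned by Evaluate become the task list for the next Simulate run.

```python
from dataclasses import dataclass

# Toy model, NOT the BenchGen API: every name below is hypothetical.
# A task is a (name, difficulty) pair and the "model" is one integer skill score.
Task = tuple[str, int]


@dataclass(frozen=True)
class Trajectory:
    task: str
    difficulty: int
    success: bool


def simulate(skill: int, tasks: list[Task]) -> list[Trajectory]:
    """Simulate: run the agent on each task and log whether it succeeded."""
    return [Trajectory(name, d, skill >= d) for name, d in tasks]


def train(skill: int, trajectories: list[Trajectory], step: int = 20) -> int:
    """Train: stand-in for RL fine-tuning; any failed trajectory raises the skill score."""
    return skill + step if any(not t.success for t in trajectories) else skill


def generate(skill: int, tasks: list[Task], copies: int = 2) -> list[Trajectory]:
    """Generate: rerun the trained model to produce extra synthetic trajectories."""
    return [t for _ in range(copies) for t in simulate(skill, tasks)]


def evaluate(skill: int, benchmark: list[Task]) -> list[Task]:
    """Evaluate: return the benchmark tasks the model still fails."""
    return [(name, d) for name, d in benchmark if skill < d]


def improvement_loop(skill: int, seed_tasks: list[Task], benchmark: list[Task],
                     max_rounds: int = 10):
    tasks = list(seed_tasks)
    history = []   # (round number, skill after training, open benchmark failures)
    dataset = []   # synthetic trajectories accumulated across rounds
    for round_no in range(1, max_rounds + 1):
        trajectories = simulate(skill, tasks)
        skill = train(skill, trajectories)
        dataset.extend(generate(skill, tasks))
        failures = evaluate(skill, benchmark)
        history.append((round_no, skill, len(failures)))
        if not failures:
            break
        tasks = failures  # eval failures seed the next Simulate run
    return skill, history, dataset


# Illustrative task names and difficulties; they are made up for this sketch.
TASKS = [("refund", 20), ("escalation", 50), ("invoice-dispute", 80)]
final_skill, history, dataset = improvement_loop(30, TASKS, TASKS)
for round_no, skill, open_failures in history:
    print(f"round {round_no}: skill={skill}, open failures={open_failures}")
```

With these toy numbers the loop stops after three rounds, once the benchmark reports no remaining failures. In a real deployment each stage would be a long-running job rather than a one-line function, but the hand-off from Evaluate back to Simulate would plausibly look similar.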
Next steps
- Quickstart — run your first simulation and training loop
- Agents overview — set up your simulation environment
- Eval overview — benchmark your agents
- Train overview — fine-tune on generated trajectories