A synthetic multi-system dataset for benchmarking agents that must reason across CRM, engineering, support, and email. Every record is cross-linked — a HubSpot deal ties to a Zendesk ticket, a Jira bug, and a Gmail thread — so each use case requires genuine multi-hop retrieval, not single-system lookup.
1Pick a use case from the left panel. Each one is a realistic natural-language prompt an agent would receive.
2Click Run to stream the ground-truth API call sequence — the exact calls a correct agent should make, with the right parameters.
3The Verify tab (right panel) opens automatically when the run finishes — it scores every step pass / partial / miss and shows what was skipped. Use Copy Summary to capture the result.
4Expand Data Sources in the left panel to see which systems are in the corpus. Expand API Inventory to browse every available endpoint and its filters before wiring up your agent.
5Full Swagger docs at /docs. Cross-system links at /api/corpus/cross_references.