The Agent Layer

src/agents/ — the createAgent factory, the unified AgentClient, the streaming client, and the role agents. Everything that talks to a model lives here.

The unified `AgentClient`

Every lane — Cerebras, GPU, Gemini, Mock, Human — implements the same interface (src/shared/contract.ts):

interface AgentClient {
  name: string;
  run(scenario: TaskScenario, task: TaskType): Promise<AgentResult>;
}

That symmetry is what guarantees a fair race: identical interface, identical pipeline, identical grader. Swapping a lane's client is a one-line change.
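As an illustration of why the swap is cheap, here is a minimal sketch of a lane built against that interface. The `TaskScenario` and `TaskType` shapes are placeholders (the real ones live in src/shared/contract.ts and are not shown here), and `makeFixedClient` and `race` are hypothetical helpers, not part of the codebase. The `AgentResult` fields follow the call-flow summary later in this section.

```typescript
// Placeholder shapes: the real TaskScenario and TaskType are defined in
// src/shared/contract.ts and are not reproduced in this section.
type TaskScenario = { id: string; text: string };
type TaskType = string;

interface AgentResult {
  output: unknown;
  raw: string;
  tokens: number;
  latencyMs: number;
  tokensPerSec: number;
}

interface AgentClient {
  name: string;
  run(scenario: TaskScenario, task: TaskType): Promise<AgentResult>;
}

// Hypothetical lane that always returns the same canned output.
function makeFixedClient(name: string, output: unknown): AgentClient {
  return {
    name,
    async run(_scenario: TaskScenario, _task: TaskType): Promise<AgentResult> {
      const started = Date.now();
      const raw = JSON.stringify(output);
      return { output, raw, tokens: 0, latencyMs: Date.now() - started, tokensPerSec: 0 };
    },
  };
}

// The harness depends only on the interface, so any lane can be swapped in.
async function race(lanes: AgentClient[], scenario: TaskScenario, task: TaskType) {
  const results = await Promise.all(lanes.map((lane) => lane.run(scenario, task)));
  return lanes.map((lane, i) => ({ lane: lane.name, result: results[i] }));
}
```

Because `race` never inspects which client it was handed, replacing one lane means changing only the expression that constructs it.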

`createAgent(config)` — the one factory

All five agent roles are built from the same factory (src/agents/createAgent.ts). Message building, parsing, and the streaming call are largely identical across agents; only the schema and system prompt differ. A new agent is a ~15-line config:

createAgent({
  id, role, label, modality, taskTypeId?,
  outputSchema,         // the single source of truth
  systemPrompt,
  buildMessages(input), // async (vision images are base64-encoded first)
})

run() streams the completion, parses the JSON against the schema, and records real tokens/sec. Parsing is tolerant — parseOutput strips markdown fences, extracts the first balanced {...}, validates against the Zod schema, and coerces obvious stringified numbers/booleans as a last resort. (This is the defensive fallback; the Worker's streamObject now emits schema-valid JSON directly.)
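The tolerant parse described above can be sketched roughly as follows. This is not the actual parseOutput: `parseOutputSketch`, `stripFences`, `extractFirstObject`, and `coerceScalars` are hypothetical names, and a plain validator function stands in for the Zod schema so the example stays self-contained.

```typescript
// Stand-in for a Zod schema: returns the typed value or throws.
type Validator<T> = (value: unknown) => T;

const FENCE = "`".repeat(3); // a markdown code fence

// Drop a leading fence (plus optional language tag) and a trailing fence.
function stripFences(text: string): string {
  let t = text.trim();
  if (t.startsWith(FENCE)) t = t.slice(FENCE.length).replace(/^[a-zA-Z]*/, "");
  if (t.endsWith(FENCE)) t = t.slice(0, -FENCE.length);
  return t.trim();
}

// Return the first balanced {...}, ignoring braces inside JSON strings.
function extractFirstObject(text: string): string | null {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // braces never balanced
}

// Last resort: turn "12.5" into 12.5 and "true"/"false" into booleans.
function coerceScalars(value: unknown): unknown {
  if (typeof value === "string") {
    if (value === "true") return true;
    if (value === "false") return false;
    return /^-?\d+(\.\d+)?$/.test(value) ? Number(value) : value;
  }
  if (Array.isArray(value)) return value.map(coerceScalars);
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) out[key] = coerceScalars(source[key]);
    return out;
  }
  return value;
}

function parseOutputSketch<T>(text: string, validate: Validator<T>): T {
  const candidate = extractFirstObject(stripFences(text));
  if (candidate === null) throw new Error("no JSON object found in model output");
  const parsed: unknown = JSON.parse(candidate);
  try {
    return validate(parsed); // strict pass first
  } catch {
    return validate(coerceScalars(parsed)); // coercion only if strict validation fails
  }
}

// Hypothetical schema used only for this example.
type Verdict = { label: string; confidence: number };
const validateVerdict: Validator<Verdict> = (v) => {
  const o = v as { label?: unknown; confidence?: unknown } | null;
  if (!o || typeof o.label !== "string" || typeof o.confidence !== "number") {
    throw new Error("schema mismatch");
  }
  return { label: o.label, confidence: o.confidence };
};
```

Running the strict validation first keeps coercion from rewriting values that were already valid, for example a label that happens to look like a number.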

The streaming client

src/agents/streaming.ts — streamChat() POSTs to the Worker (/api/chat) and parses the OpenAI-shaped SSE the Worker emits. It measures real tokens/sec off the stream (estimated tokens ÷ send-to-last-token wall-clock), which feeds the speedometer.
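A rough sketch of both halves, reading deltas out of OpenAI-shaped SSE frames and turning them into a tokens/sec figure, might look like this. The function names are hypothetical, and the four-characters-per-token estimate is an assumption (the estimator used by streaming.ts is not shown here). For brevity the sketch also assumes each `data:` line arrives whole, which a real stream reader cannot rely on.

```typescript
// Shape of one OpenAI-style streaming frame, reduced to the fields used here.
type SseFrame = { choices?: { delta?: { content?: string } }[] };

// Pull the text deltas out of a block of SSE text.
function extractDeltas(sseText: string): string[] {
  const deltas: string[] = [];
  for (const line of sseText.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue; // end-of-stream sentinel
    const frame = JSON.parse(payload) as SseFrame;
    const content = frame.choices?.[0]?.delta?.content;
    if (typeof content === "string") deltas.push(content);
  }
  return deltas;
}

// Assumption: roughly 4 characters per token, a coarse heuristic.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Estimated tokens divided by send-to-last-token wall-clock time.
function tokensPerSec(text: string, sendMs: number, lastTokenMs: number): number {
  const elapsedSec = (lastTokenMs - sendMs) / 1000;
  return elapsedSec > 0 ? estimateTokens(text) / elapsedSec : 0;
}
```

Timing from send rather than from the first token means time-to-first-token counts against the lane, which matches the send-to-last-token definition above.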

The request body carries the schema identity (role and taskTypeId) that the Worker needs to run structured output:

{ provider, model?, role, taskTypeId, messages, temperature, max_tokens }

ProviderConfig.provider is 'cerebras' | 'openrouter' | 'nvidia' | 'gemini'. Vision images are converted to base64 data URLs before sending (src/agents/image.ts) — a relative asset path can't be consumed by the model.
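For orientation, the body above could be typed and assembled along these lines. This is a sketch under assumptions: the `ChatMessage` shape, the default `temperature` and `max_tokens` values, and the helper names `buildChatBody` and `toDataUrl` are all hypothetical, and `toDataUrl` expects bytes that have already been base64-encoded.

```typescript
type Provider = "cerebras" | "openrouter" | "nvidia" | "gemini";

// Assumed OpenAI-style message shape; the real type is not shown in this section.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface ChatRequestBody {
  provider: Provider;
  model?: string;
  role: string;
  taskTypeId: string;
  messages: ChatMessage[];
  temperature: number;
  max_tokens: number;
}

// Wrap already-encoded base64 in a data URL, the form a model can consume.
function toDataUrl(mimeType: string, base64: string): string {
  return `data:${mimeType};base64,${base64}`;
}

function buildChatBody(opts: {
  provider: Provider;
  role: string;
  taskTypeId: string;
  messages: ChatMessage[];
  model?: string;
  temperature?: number;
  maxTokens?: number;
}): ChatRequestBody {
  const body: ChatRequestBody = {
    provider: opts.provider,
    role: opts.role,
    taskTypeId: opts.taskTypeId,
    messages: opts.messages,
    temperature: opts.temperature ?? 0, // placeholder default
    max_tokens: opts.maxTokens ?? 512, // placeholder default
  };
  // Leave `model` out entirely when unset so the receiver can apply its own default.
  if (opts.model !== undefined) body.model = opts.model;
  return body;
}
```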

The six clients (`src/agents/clients.ts`)

| Client | What it does |
| --- | --- |
| `makeCerebrasClient` | Cerebras lane via `@ai-sdk/openai-compatible` |
| `makeOpenRouterClient` | GPU challenger via `@ai-sdk/openai-compatible` |
| `makeNvidiaClient` | Alternative GPU challenger (NVIDIA NIM), same wrapper, selectable per run |
| `makeGeminiClient` | Gemini lane via `@ai-sdk/google` |
| `makeMockClient` | Deterministic fake: draws output from ground truth, tunable `errorRate`, fake latency + tokens/sec. The fake-first fallback. |
| `makeHumanClient` | Awaits a resolver (the click-to-classify overlay); measures wall-clock, `tokensPerSec: 0` |

The mock profiles (MOCK_PROFILES) tune the visible gap: Cerebras fast (1800 tok/s), Gemini mid (740), GPU slow (110), human zero.
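To make the mock's behavior concrete, here is a minimal sketch of a deterministic answerer driven by such a profile. The throughput numbers are the ones quoted above; everything else is an assumption for illustration: the shape of the profile table, the seeded generator, the hypothetical `makeMockAnswerer` helper, and the choice to derive fake latency as output tokens divided by tokens/sec.

```typescript
// Throughput figures as quoted in the text; the object shape is assumed.
const MOCK_PROFILES_SKETCH = {
  cerebras: { tokensPerSec: 1800 },
  gemini: { tokensPerSec: 740 },
  gpu: { tokensPerSec: 110 },
  human: { tokensPerSec: 0 },
} as const;

// Small seeded PRNG (mulberry32-style) so every run is reproducible.
function seededRandom(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

interface MockAnswer {
  label: string;
  latencyMs: number;
  tokensPerSec: number;
}

function makeMockAnswerer(opts: {
  labels: string[];
  errorRate: number; // probability of answering with a wrong label
  tokensPerSec: number;
  seed: number;
}): (truth: string, outputTokens: number) => MockAnswer {
  const rand = seededRandom(opts.seed);
  return (truth, outputTokens) => {
    const wrong = opts.labels.filter((l) => l !== truth);
    const miss = wrong.length > 0 && rand() < opts.errorRate;
    // Start from the ground truth; swap in a wrong label only on a miss.
    const label = miss ? wrong[Math.floor(rand() * wrong.length)] ?? truth : truth;
    // Assumed latency model: time to emit the output at the profile's throughput.
    const latencyMs =
      opts.tokensPerSec > 0 ? Math.round((outputTokens / opts.tokensPerSec) * 1000) : 0;
    return { label, latencyMs, tokensPerSec: opts.tokensPerSec };
  };
}
```

With the same seed and the same sequence of calls, the answerer makes the same mistakes on every run.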

The role agents (`src/agents/roles.ts`)

Router, Checker, Escalation — each a createAgent() config with its own schema (RouterSchema, CheckerSchema, EscalationSchema in src/tasks/schemas.ts). They take (scenario) or { scenario, workerOutput } as input and don't construct a provider themselves — they receive it at .run() time from the orchestrator.

How a call flows end to end

Lane.run(scenario, task)
  → workerAgentFromTask(task).run(scenario, provider)
    → streamChat(messages, provider, ctx { role, taskTypeId })
      → POST /api/chat  (Worker runs streamObject + re-wraps as SSE)
      ← SSE delta frames (schema-valid JSON chunks)
    → parseOutput(text, schema)  [tolerant fallback]
  → AgentResult { output, raw, tokens, latencyMs, tokensPerSec }

The orchestrator chains these (router → worker → checker → escalation) per item. See The Multi-Agent Pipeline.
