
The Agent Layer

src/agents/ — the createAgent factory, the unified AgentClient, the streaming client, and the role agents. Everything that talks to a model lives here.

The unified AgentClient

Every lane — Cerebras, GPU, Gemini, Mock, Human — implements the same interface (src/shared/contract.ts):

interface AgentClient {
  name: string;
  run(scenario: TaskScenario, task: TaskType): Promise<AgentResult>;
}

That symmetry is what guarantees a fair race: identical interface, identical pipeline, identical grader. Swapping a lane's client is a one-line change.
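A minimal sketch of why the shared interface makes lanes interchangeable. The field types on TaskScenario, TaskType, and AgentResult are assumptions standing in for the real definitions in src/shared/contract.ts, and makeStubClient and runRace are hypothetical helpers, not part of the codebase.

```typescript
// Assumed stand-ins for the real types in src/shared/contract.ts.
type TaskScenario = { id: string; input: string };
type TaskType = { id: string };
type AgentResult = {
  output: unknown;
  raw: string;
  tokens: number;
  latencyMs: number;
  tokensPerSec: number;
};

interface AgentClient {
  name: string;
  run(scenario: TaskScenario, task: TaskType): Promise<AgentResult>;
}

// Hypothetical stub lane: a real lane would call a model here.
function makeStubClient(name: string, tokensPerSec: number): AgentClient {
  return {
    name,
    async run(scenario) {
      const raw = JSON.stringify({ echoed: scenario.input });
      return { output: JSON.parse(raw), raw, tokens: 0, latencyMs: 0, tokensPerSec };
    },
  };
}

// The race only sees the interface, so every lane takes the same code path.
async function runRace(
  lanes: AgentClient[],
  scenario: TaskScenario,
  task: TaskType,
): Promise<Array<{ lane: string; result: AgentResult }>> {
  return Promise.all(
    lanes.map(async (lane) => ({ lane: lane.name, result: await lane.run(scenario, task) })),
  );
}

// Swapping a lane's client is a one-line change: replace one entry in this array.
const lanes: AgentClient[] = [makeStubClient("cerebras", 1800), makeStubClient("gpu", 110)];
```

Because runRace never inspects which client it holds, the grader downstream can treat every result identically.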

createAgent(config) — the one factory

All five agent roles are built from the same factory (src/agents/createAgent.ts). Most of buildMessages, parse, and the streaming call are identical across every agent — only the schema and system prompt differ. A new agent is a ~15-line config:

createAgent({
  id, role, label, modality, taskTypeId?,
  outputSchema,           // the single source of truth
  systemPrompt,
  buildMessages(input),   // async (vision images are base64-encoded first)
})

run() streams the completion, parses the JSON against the schema, and records real tokens/sec. Parsing is tolerant: parseOutput strips markdown fences, extracts the first balanced {...}, validates against the Zod schema, and coerces obvious stringified numbers/booleans as a last resort. (This is the defensive fallback; the Worker's streamObject now emits schema-valid JSON directly.)
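The tolerant parsing steps can be sketched roughly as follows. This is not the real parseOutput: to stay dependency-free it validates against a hand-rolled FieldSpec (a hypothetical stand-in for the Zod schema), and every function name here is illustrative.

```typescript
// Hypothetical stand-in for a Zod schema: field name -> expected primitive type.
type FieldType = "string" | "number" | "boolean";
type FieldSpec = Record<string, FieldType>;

const FENCE = "`".repeat(3); // a markdown code-fence marker

// Step 1: drop markdown fences, with or without a language tag.
function stripFences(text: string): string {
  return text.replace(new RegExp(FENCE + "[a-zA-Z]*\\n?", "g"), "").trim();
}

// Step 2: return the first balanced {...}, ignoring braces inside JSON strings.
function extractFirstObject(text: string): string | null {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) return text.slice(start, i + 1);
  }
  return null; // never balanced
}

function matches(obj: Record<string, unknown>, spec: FieldSpec): boolean {
  return Object.entries(spec).every(([key, type]) => typeof obj[key] === type);
}

// Step 4 (last resort): turn "0.9" into 0.9 and "true" into true where the spec asks for it.
function coerce(value: unknown, type: FieldType): unknown {
  if (typeof value !== "string") return value;
  if (type === "number" && value.trim() !== "" && Number.isFinite(Number(value))) return Number(value);
  if (type === "boolean" && (value === "true" || value === "false")) return value === "true";
  return value;
}

function parseOutputSketch(text: string, spec: FieldSpec): Record<string, unknown> | null {
  const candidate = extractFirstObject(stripFences(text));
  if (candidate === null) return null;
  let parsed: unknown;
  try {
    parsed = JSON.parse(candidate);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return null;
  const obj = parsed as Record<string, unknown>;
  if (matches(obj, spec)) return obj; // Step 3: validates as-is
  const coerced: Record<string, unknown> = { ...obj };
  for (const [key, type] of Object.entries(spec)) coerced[key] = coerce(obj[key], type);
  return matches(coerced, spec) ? coerced : null;
}
```

Coercion runs only after strict validation fails, so well-formed output is never rewritten.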

The streaming client

src/agents/streaming.ts: streamChat() POSTs to the Worker (/api/chat) and parses the OpenAI-shaped SSE the Worker emits. It measures real tokens/sec off the stream (estimated tokens ÷ send-to-last-token wall-clock), which feeds the speedometer.
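As a rough illustration of the measurement, the sketch below extracts delta text from OpenAI-shaped SSE frames and divides an estimated token count by the send-to-last-token time. The four-characters-per-token estimate and all function names are assumptions; the source does not say how streamChat() estimates tokens. For simplicity it operates on already-buffered text rather than a live stream, where a frame can be split across chunks.

```typescript
// Pull the delta.content strings out of OpenAI-shaped SSE text.
function extractDeltas(sse: string): string[] {
  const deltas: string[] = [];
  for (const line of sse.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue;
    try {
      const frame = JSON.parse(payload) as {
        choices?: Array<{ delta?: { content?: unknown } }>;
      };
      const content = frame.choices?.[0]?.delta?.content;
      if (typeof content === "string") deltas.push(content);
    } catch {
      // Skip frames that are not valid JSON.
    }
  }
  return deltas;
}

// Assumed heuristic: roughly four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Estimated tokens divided by wall-clock seconds from send to last token.
function tokensPerSec(tokens: number, sentAtMs: number, lastTokenAtMs: number): number {
  const elapsedSec = (lastTokenAtMs - sentAtMs) / 1000;
  return elapsedSec > 0 ? tokens / elapsedSec : 0;
}
```

Measuring to the last token rather than to the end of the HTTP response keeps connection teardown out of the rate.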

The request body carries the schema identity the Worker needs to run structured output:

{ provider, model?, role, taskTypeId, messages, temperature, max_tokens }

ProviderConfig.provider is 'cerebras' | 'openrouter' | 'nvidia' | 'gemini'. Vision images are converted to base64 data URLs before sending (src/agents/image.ts) — a relative asset path can't be consumed by the model.
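A hedged sketch of the data-URL conversion and the request body shape. The field types in ChatRequestBody are assumptions inferred from the field list above, toDataUrl is a hypothetical helper rather than the actual contents of src/agents/image.ts, and the code assumes a runtime with a global btoa (browsers, Node 16+).

```typescript
type Provider = "cerebras" | "openrouter" | "nvidia" | "gemini";

// Assumed typing of the body sent to /api/chat; only the field names come from the docs.
interface ChatRequestBody {
  provider: Provider;
  model?: string;
  role: string;
  taskTypeId: string;
  messages: Array<{ role: string; content: unknown }>;
  temperature: number;
  max_tokens: number;
}

// Encode raw image bytes as a base64 data URL the model can consume directly.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return "data:" + mimeType + ";base64," + btoa(binary);
}

// Example body; the role and taskTypeId values are placeholders.
const exampleBody: ChatRequestBody = {
  provider: "cerebras",
  role: "worker",
  taskTypeId: "example-task",
  messages: [{ role: "user", content: toDataUrl(new Uint8Array([104, 105]), "image/png") }],
  temperature: 0,
  max_tokens: 256,
};
```

In a browser the bytes would typically come from fetching the relative asset path first, which is why buildMessages is async.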

The six clients (src/agents/clients.ts)

| Client | What it does |
| --- | --- |
| makeCerebrasClient | Cerebras lane via @ai-sdk/openai-compatible |
| makeOpenRouterClient | GPU challenger via @ai-sdk/openai-compatible |
| makeNvidiaClient | Alternative GPU challenger (NVIDIA NIM), same wrapper; selectable per-run |
| makeGeminiClient | Gemini lane via @ai-sdk/google |
| makeMockClient | Deterministic fake: draws output from ground truth, tunable errorRate, fake latency + tokens/sec. The fake-first fallback. |
| makeHumanClient | Awaits a resolver (the click-to-classify overlay); measures wall-clock, tokensPerSec: 0 |

The mock profiles (MOCK_PROFILES) tune the visible gap: Cerebras fast (1800 tok/s), Gemini mid (740), GPU slow (110), human zero.
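One way a deterministic mock with a tunable error rate could work, as a sketch only. The tokens/sec figures come from the profiles above; the profile shape, the seeded generator (mulberry32), the scenario fields, and the function names are all assumptions for illustration. The human profile is left out because a rate of zero has no meaningful fake latency.

```typescript
// Illustrative mirror of the profile speeds quoted above.
const PROFILE_TOKENS_PER_SEC: Record<string, number> = {
  cerebras: 1800,
  gemini: 740,
  gpu: 110,
};

// Small seeded PRNG (mulberry32) so the same seed always yields the same run.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical scenario shape: the correct answer plus one plausible wrong answer.
type MockScenario = { truth: string; wrong: string };

type MockResult = { output: string; tokens: number; latencyMs: number; tokensPerSec: number };

// Draw the answer from ground truth, flipping to the wrong one with probability errorRate.
function mockRun(
  scenario: MockScenario,
  lane: string,
  errorRate: number,
  rng: () => number,
  tokens: number,
): MockResult {
  const tokensPerSec = PROFILE_TOKENS_PER_SEC[lane];
  if (tokensPerSec === undefined) throw new Error("unknown lane: " + lane);
  const output = rng() < errorRate ? scenario.wrong : scenario.truth;
  // Fake latency is derived from the profile speed, not measured.
  const latencyMs = (tokens * 1000) / tokensPerSec;
  return { output, tokens, latencyMs, tokensPerSec };
}
```

A real client would presumably wait for latencyMs before resolving; the sketch only computes the value so it stays synchronous.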

The role agents (src/agents/roles.ts)

Router, Checker, Escalation — each a createAgent() config with its own schema (RouterSchema, CheckerSchema, EscalationSchema in src/tasks/schemas.ts). They take (scenario) or { scenario, workerOutput } as input and don't construct a provider themselves — they receive it at .run() time from the orchestrator.

How a call flows end to end

Lane.run(scenario, task)
→ workerAgentFromTask(task).run(scenario, provider)
→ streamChat(messages, provider, ctx { role, taskTypeId })
→ POST /api/chat (Worker runs streamObject + re-wraps as SSE)
← SSE delta frames (schema-valid JSON chunks)
→ parseOutput(text, schema) [tolerant fallback]
→ AgentResult { output, raw, tokens, latencyMs, tokensPerSec }

The orchestrator chains these (router → worker → checker → escalation) per item. See The Multi-Agent Pipeline.