Limits & latency
The honest numbers on size caps, cost, and where the time goes.
Size limits
| Limit | Value | What happens past it |
|---|---|---|
| Total `sources` text per call | ~100k characters | 400 `sources_too_large` |
| Cases per `verified.eval` request | 200 | 400 `too_many_cases`; page larger sets client-side |
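If a dataset is larger than 200 cases, you split it yourself before calling `verified.eval`. Here is a minimal sketch of that client-side paging. `pageCases` and `MAX_CASES_PER_EVAL` are hypothetical helper names, not part of the SDK. Each resulting page would go into its own `verified.eval` request.

```typescript
// Hypothetical helper (not an SDK function): split a dataset into pages that
// respect the documented cap of 200 cases per verified.eval request.
const MAX_CASES_PER_EVAL = 200;

function pageCases<T>(cases: readonly T[], pageSize: number = MAX_CASES_PER_EVAL): T[][] {
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError("pageSize must be a positive integer");
  }
  const pages: T[][] = [];
  for (let i = 0; i < cases.length; i += pageSize) {
    // slice() clamps at the end of the array, so the last page may be short.
    pages.push(cases.slice(i, i + pageSize));
  }
  return pages;
}

// Example: 450 cases become pages of 200, 200 and 50.
const dataset = Array.from({ length: 450 }, (_, i) => ({ id: i }));
const pages = pageCases(dataset);
console.log(pages.map((p) => p.length)); // [ 200, 200, 50 ]
```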
`sources` is your own retrieved context: keep it to the chunks that could plausibly answer the question. More sources means more text to match against, not better grounding.
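One way to stay under the sources cap is to keep retrieved chunks in rank order until a character budget runs out. This sketch assumes your retriever returns an array of strings sorted best-first. `fitSources` and the 90,000-character budget are illustrative assumptions (a margin under the documented ~100k). JavaScript's `.length` counts UTF-16 code units, which may not match exactly how the gateway counts characters.

```typescript
// Sketch: keep the highest-ranked chunks until a character budget is reached.
// Assumes `ranked` is already sorted best-first by your retriever.
// The budget is a hypothetical safety margin under the documented ~100k cap;
// `.length` counts UTF-16 code units, which only approximates "characters".
const SOURCES_CHAR_BUDGET = 90_000;

function fitSources(ranked: readonly string[], budget: number = SOURCES_CHAR_BUDGET): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const chunk of ranked) {
    if (used + chunk.length > budget) break; // stop at the first chunk that would overflow
    kept.push(chunk);
    used += chunk.length;
  }
  return kept;
}

const retrieved = ["a".repeat(40_000), "b".repeat(40_000), "c".repeat(40_000)];
const trimmed = fitSources(retrieved);
console.log(trimmed.length); // 2
```

Stopping at the first chunk that does not fit, rather than skipping ahead to smaller ones, keeps the result strictly top-ranked. Swap `break` for `continue` if you would rather pack the budget.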
Cost: a verified call is ≥ 1 model call
Verification is a layer on top of a normal gateway call, so the model usage is billed as usual on your gateway key:
- `verified.create` routes at least one model call (the EXTRACT step); `usage.modelCalls` reports how many.
- `retry: N` re-generates while any claim is still unsupported, up to `N` more times, so a call with `retry` can cost up to `N + 1` model calls. It stops early once fully grounded.
- `verified.eval` is one model call per case. A 200-case dataset is 200 model calls; pick a fast model for large runs (see Models).
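These billing rules reduce to simple bounds on model calls. This sketch uses a hypothetical `Workload` shape and `estimateModelCalls` helper. It computes a lower bound (one call per `verified.create`, for the EXTRACT step) and an upper bound (`N + 1` per create when `retry: N` is set), and adds one call per eval case to both. Treat it as a planning aid only; `usage.modelCalls` on each response is the actual count.

```typescript
// Hypothetical planning helper, not an SDK function. Bounds follow the rules
// above: verified.create makes at least 1 call (EXTRACT) and at most
// retry + 1; verified.eval makes 1 call per case.
interface Workload {
  creates: number;   // number of verified.create calls
  retry: number;     // retry: N on each create (0 if unset)
  evalCases: number; // total cases across all verified.eval requests
}

function estimateModelCalls(w: Workload): { min: number; max: number } {
  return {
    min: w.creates + w.evalCases,                 // every create grounded on the first pass
    max: w.creates * (w.retry + 1) + w.evalCases, // every create uses all N retries
  };
}

console.log(estimateModelCalls({ creates: 100, retry: 2, evalCases: 200 }));
// { min: 300, max: 500 }
```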
The deterministic VERIFY step itself adds no model cost — it’s local string matching.
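To show why a local string check is cheap, here is an illustrative substring test over a handful of sources. It is not the gateway's implementation. The `Claim` shape, the `quote` field, and the exact, case-sensitive match rule are assumptions made for the example. Each check is a scan over text that the ~100k-character cap already bounds, with no network round-trip.

```typescript
// Illustration only, not the gateway's VERIFY implementation. `Claim` and its
// `quote` field are assumed shapes; the rule here is exact, case-sensitive
// substring containment in at least one source.
interface Claim {
  text: string;  // the generated statement
  quote: string; // the span it claims to be grounded on
}

function isSupported(claim: Claim, sources: readonly string[]): boolean {
  // An empty quote would trivially "match" every source, so reject it.
  return claim.quote.length > 0 && sources.some((s) => s.includes(claim.quote));
}

const sampleSources = [
  "Invoice 42 was paid on 3 March.",
  "The warehouse ships orders on weekdays.",
];
const claims: Claim[] = [
  { text: "Invoice 42 is paid.", quote: "paid on 3 March" },
  { text: "Orders ship Monday to Friday.", quote: "ships orders on weekdays" },
  { text: "Orders ship on Sundays.", quote: "ships orders on Sundays" },
];
console.log(claims.map((c) => isSupported(c, sampleSources))); // [ true, true, false ]
```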
Where the latency goes
A verified call’s wall-clock time is almost entirely the underlying model call. The verification step is plain substring matching over your sources (bounded by the ~100k-char budget), so it’s negligible next to generation — microseconds-to-milliseconds of local CPU, not a second model round-trip.
That means the practical latency levers are the ones you already know:
- Model choice dominates. A responses-endpoint model like `gpt-5.5-pro` can take 10–20s; a fast chat model (`claude-haiku-4-5-20251001`, `gpt-5-mini`, `gemini-2.5-flash`) returns in ~1–2s. Same verification either way.
- `retry` multiplies latency the same way it multiplies cost: each retry is another full generation.
- `verified.create` is non-streaming by design: the claim set has to be complete before the check can return a trustworthy answer. If you need token streaming, use `chat.completions.create` (unverified).
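Because each retry is a full generation, a worst-case wall-clock estimate is the per-generation latency times `retry + 1`, with verification time treated as negligible. This sketch also turns the formula around to ask how many retries fit a latency budget. The helper names are hypothetical, and the per-generation figures are the rough ranges quoted above, not guarantees.

```typescript
// Hypothetical helpers for rough latency planning. Each attempt is treated as
// one full generation; VERIFY time is assumed negligible.
function worstCaseSeconds(perGenerationSeconds: number, retry: number = 0): number {
  return perGenerationSeconds * (retry + 1);
}

// Largest retry value whose worst case still fits the budget.
// Returns -1 when even a single generation exceeds the budget.
function maxRetryWithin(budgetSeconds: number, perGenerationSeconds: number): number {
  return Math.floor(budgetSeconds / perGenerationSeconds) - 1;
}

console.log(worstCaseSeconds(2, 2));  // 6
console.log(worstCaseSeconds(20, 2)); // 60
console.log(maxRetryWithin(30, 2));   // 14
console.log(maxRetryWithin(30, 20));  // 0
```

With a 30s budget, a ~20s responses-endpoint model leaves no room for retries, while a ~2s chat model leaves room for several.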
Rate limits & quotas
Verified output uses the same key and the same gateway quota as your other maxmodel.com calls — there’s no separate rate-limit pool. If you hit the gateway’s limit you’ll get the standard rate-limit error (see Errors); back off and retry.
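A common way to back off is exponential delay with full jitter. This sketch wraps any async call. How the gateway's rate-limit error is shaped (for example, a `status: 429` field on the thrown error) is an assumption, so you supply the `isRateLimited` predicate yourself after checking Errors. `withBackoff` and `backoffDelayMs` are hypothetical helpers, and the 500 ms base, 30 s cap, and 5 attempts are arbitrary defaults.

```typescript
// Hypothetical helpers, not SDK functions. Exponential backoff with "full
// jitter": wait a random time in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.random() * ceiling;
}

async function withBackoff<T>(
  fn: () => Promise<T>,
  isRateLimited: (err: unknown) => boolean, // assumption: you know the error shape
  maxAttempts = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // Rethrow anything that is not a rate-limit error, or when out of attempts.
      if (!isRateLimited(err) || attempt + 1 >= maxAttempts) throw err;
      await new Promise<void>((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}

// Demo with a fake call that is rate-limited twice, then succeeds.
// The `status: 429` field is an assumed error shape for illustration.
let attemptsMade = 0;
const flakyCall = async (): Promise<string> => {
  attemptsMade++;
  if (attemptsMade < 3) throw Object.assign(new Error("rate limited"), { status: 429 });
  return "ok";
};
const looksRateLimited = (e: unknown): boolean =>
  typeof e === "object" && e !== null && (e as { status?: unknown }).status === 429;

withBackoff(flakyCall, looksRateLimited).then((v) => console.log(v)); // logs "ok"
```

Jitter spreads retries from many clients across the window instead of having them all hit the shared quota at the same instant.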