Extract (verified JSON)

Verified structured extraction

Pull structured fields out of your sources, where every field value is either a verbatim substring of a source or null. The check is deterministic, with no LLM judge. Unlike schema validators, which check only the shape of the JSON, this checks that each field’s value is actually grounded in a source.

  • SDK: mx.verified.extract(params)
  • HTTP: POST https://api.maxmodel.com/v1/verified/extract

SDK

const r = await mx.verified.extract({
  model: 'gpt-5.5-pro',
  sources: [
    { id: 'pricing.md', text: 'The Pro plan costs $29/month and includes 50 seats.' },
    { id: 'sla.md', text: 'Uptime SLA is 99.9%. Support responds within 24 hours.' },
  ],
  schema: {
    price: 'the monthly price of the Pro plan',
    sla: 'the uptime SLA percentage',
    ceo: 'the name of the CEO',
  },
})
 
r.fields.price // { value: '$29/month', source: 'pricing.md', range: [19, 28], grounded: true }
r.fields.ceo   // { value: null, grounded: false, reason: 'not_found' }  ← not in the sources
r.coverage     // 0.67  (grounded fields / total fields)

# Python
r = mx.verified.extract(model="gpt-5.5-pro", sources=[...], schema={
    "price": "the monthly price", "ceo": "the CEO name",
})
r.fields["price"].value      # "$29/month"
r.fields["price"].source     # "pricing.md"
r.fields["ceo"].grounded     # False  (reason: "not_found")
r.coverage                   # 0.5  (1 of 2 fields grounded)

Request / response (HTTP)

// POST /v1/verified/extract
{ "model": "gpt-5.5-pro", "mode": "strict",
  "sources": [ { "id": "pricing.md", "text": "..." } ],
  "schema": { "price": "the monthly price", "ceo": "the CEO name" } }

// Response
{ "id": "ext_...", "model": "gpt-5.5-pro",
  "fields": {
    "price": { "value": "$29/month", "source": "pricing.md", "range": [19, 28], "grounded": true },
    "ceo":   { "value": null, "grounded": false, "reason": "not_found" }
  },
  "coverage": 0.5,
  "usage": { "prompt_tokens": 120, "completion_tokens": 30, "total_tokens": 150, "model_calls": 1 } }

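As a rough sketch, the same call can be made without the SDK using only Python's standard library. The endpoint URL and the request and response fields come from the example above. The `Authorization: Bearer` header is an assumption, since authentication isn't documented on this page, and `build_request` and `split_fields` are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "https://api.maxmodel.com/v1/verified/extract"


def build_request(api_key, sources, schema, model="gpt-5.5-pro", mode="strict"):
    """Build (but do not send) the POST request for /v1/verified/extract."""
    body = {"model": model, "mode": mode, "sources": sources, "schema": schema}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token auth; check the authentication docs.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def split_fields(response):
    """Separate grounded values from the names of fields that came back null."""
    grounded, missing = {}, []
    for name, field in response["fields"].items():
        if field["grounded"]:
            grounded[name] = field["value"]
        else:
            missing.append(name)
    return grounded, missing


# The response body shown above, used here in place of a live call.
sample_response = {
    "fields": {
        "price": {"value": "$29/month", "source": "pricing.md",
                  "range": [19, 28], "grounded": True},
        "ceo": {"value": None, "grounded": False, "reason": "not_found"},
    },
    "coverage": 0.5,
}
grounded, missing = split_fields(sample_response)
print(grounded)  # {'price': '$29/month'}
print(missing)   # ['ceo']
```

To send the request, pass the object returned by `build_request` to `urllib.request.urlopen` and decode the body with `json.load`.
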
How it works

The model is asked to return each field as an exact substring of a source. Code then checks that each value appears verbatim in some source (the same character-for-character, boundary-exact match as verified.create). A value that isn’t found is set to null with reason: 'not_found', so a hallucinated field can’t slip through with a real-looking value. range gives the start and end character offsets of the match within the cited source, for highlighting; in the SDK example, [19, 28] covers the nine characters of $29/month, so the end offset is exclusive.
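The check described above can be sketched in a few lines of Python. This illustrates the idea and is not the service's implementation: `verify_field` is a hypothetical helper, it uses a plain `str.find` for the verbatim match, and it does not attempt the boundary-exact rule, whose precise definition isn't given here. The range is treated as half-open, which matches the `[19, 28]` example.

```python
def verify_field(value, sources):
    """Return a grounded field if `value` is a verbatim substring of a source."""
    if value:  # None or "" can never be grounded
        for src in sources:
            start = src["text"].find(value)  # exact, case-sensitive match
            if start != -1:
                return {
                    "value": value,
                    "source": src["id"],
                    "range": [start, start + len(value)],
                    "grounded": True,
                }
    # Not found anywhere: discard the model's value rather than pass it through.
    return {"value": None, "grounded": False, "reason": "not_found"}


sources = [
    {"id": "pricing.md", "text": "The Pro plan costs $29/month and includes 50 seats."},
    {"id": "sla.md", "text": "Uptime SLA is 99.9%. Support responds within 24 hours."},
]

price = verify_field("$29/month", sources)
print(price["source"], price["range"])  # pricing.md [19, 28]

# A value the model invented is nulled out.
print(verify_field("Jane Doe", sources))

# Slicing the cited source with the range recovers the value,
# which is what a highlighter needs.
start, end = price["range"]
assert sources[0]["text"][start:end] == "$29/month"
```
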

  • schema is { fieldName: description }. Keep descriptions short and concrete.
  • mode: strict (default) or lenient (tolerates punctuation and case differences in the match).
  • coverage = grounded fields / total fields.
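
To make the last two points concrete, here is a small Python sketch. The coverage function follows the formula above directly. The lenient comparison is only a guess at what tolerating punctuation and case might mean (lowercase, then drop ASCII punctuation); the service's actual normalization may differ. `normalize`, `lenient_contains`, and `coverage` are hypothetical names.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_DROP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Lowercase and strip ASCII punctuation (assumed lenient normalization)."""
    return text.lower().translate(_DROP_PUNCT)


def lenient_contains(value, source_text):
    """True if `value` appears in `source_text` ignoring case and punctuation."""
    needle = normalize(value)
    return bool(needle) and needle in normalize(source_text)


def coverage(fields):
    """Grounded fields divided by total fields."""
    if not fields:
        return 0.0  # assumption: an empty schema counts as zero coverage
    grounded = sum(1 for f in fields.values() if f["grounded"])
    return grounded / len(fields)


text = "Uptime SLA is 99.9%. Support responds within 24 hours."
print("support responds" in text)                  # False: strict match is case-sensitive
print(lenient_contains("support responds", text))  # True

fields = {
    "price": {"value": "$29/month", "grounded": True},
    "sla": {"value": "99.9%", "grounded": True},
    "ceo": {"value": None, "grounded": False},
}
print(round(coverage(fields), 2))  # 0.67
```

Stripping punctuation this way is deliberately loose: `99.9%` and `999` normalize to the same string, which is one reason to prefer strict mode when exact values matter.
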