Document Information Extraction

What it actually means

Document information extraction (DIE, sometimes called intelligent document processing or IDP) covers any technique that takes an unstructured document — PDF, image, scan, email, Word file — and returns a structured representation of the values inside it.

Historically, teams approached this with regular expressions, rule engines, or per-template OCR pipelines. Those approaches work on a narrow set of documents and break the moment a new vendor shows up or a layout changes.

Why modern AI changes the game

Vision-language models (the same family of models behind tools like ChatGPT and Gemini Vision) can read a document the way a person does, taking in the layout, the visual cues, the table structure, and the surrounding text, and then answer questions about it. That removes most of the need for per-template tuning.

DocParse uses Gemini 2.5 Flash as its default extraction engine (with OpenAI available as a swappable provider). Both are frontier multi-modal models that accept the file directly and return JSON in the schema you specify.

How DocParse does it

The DocParse workflow has three steps, and they are the same whether you use the dashboard or the REST API:

1. Define the fields you need (name, type, optional description): either pick a built-in template or define your own custom schema.
2. Upload PDF, JPG, PNG, WEBP, or DOCX files (up to 25 MB per file, 30 files per batch).
3. Receive structured JSON back: export from the dashboard, poll the REST API, or have signed webhooks push deliveries to your endpoint.
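
The first two steps can be checked locally before anything is uploaded. The following is a minimal sketch, not DocParse's own client: the `Field` dataclass and the `validate_batch` helper are hypothetical names introduced for illustration, and only the limits (25 MB per file, 30 files per batch) and the accepted file types come from the steps above.

```python
from dataclasses import dataclass
from typing import Optional

# Limits and file types taken from the workflow steps above.
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" is read as 25 MiB
MAX_FILES_PER_BATCH = 30
# ".jpeg" is an assumption; the text only names JPG.
ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png", ".webp", ".docx"}


@dataclass(frozen=True)
class Field:
    """Hypothetical field definition: name, type, optional description."""
    name: str
    type: str
    description: Optional[str] = None


def validate_batch(files: list[tuple[str, int]]) -> list[str]:
    """Check (filename, size_in_bytes) pairs; an empty result means no problems."""
    problems = []
    if len(files) > MAX_FILES_PER_BATCH:
        problems.append(
            f"batch has {len(files)} files, limit is {MAX_FILES_PER_BATCH}"
        )
    for name, size in files:
        dot = name.rfind(".")
        ext = name[dot:].lower() if dot != -1 else ""
        if ext not in ALLOWED_EXTENSIONS:
            problems.append(f"{name}: unsupported file type")
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds the per-file limit")
    return problems


# Hypothetical custom schema for an invoice.
invoice_schema = [
    Field("invoice_number", "string"),
    Field("total", "number", "Grand total including tax"),
]
```

Running a check like this before upload gives a clear local error message for an oversized or unsupported file, although the service's own validation remains the authority on what it accepts.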

When extraction needs help

Some documents have structure that benefits from an extra hint. DocParse exposes seven document options you can toggle per extraction: tables, charts, checkboxes, handwritten, multi-page, split-PDF, and specific-pages. Each one tells the model to look for that specific structure, so a multi-page bank statement returns the full transaction list instead of just the summary on page one.
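
As a rough illustration of per-extraction toggles, the sketch below builds an on/off map for the seven options. The option names are the labels used above; how DocParse actually spells them in an API request is not stated here, so the keys and the `build_options` helper are assumptions.

```python
# The seven document options named above. The exact key spellings used by the
# API are an assumption; these are the labels as written in this article.
DOCUMENT_OPTIONS = frozenset({
    "tables", "charts", "checkboxes", "handwritten",
    "multi-page", "split-PDF", "specific-pages",
})


def build_options(*enabled: str) -> dict[str, bool]:
    """Return an on/off map for every option, rejecting unknown names."""
    unknown = set(enabled) - DOCUMENT_OPTIONS
    if unknown:
        raise ValueError(f"unknown document options: {sorted(unknown)}")
    return {name: name in enabled for name in sorted(DOCUMENT_OPTIONS)}


# A multi-page bank statement: ask for tables across all pages.
bank_statement_options = build_options("tables", "multi-page")
```

Rejecting unknown names early means a typo in an option fails loudly instead of silently leaving the hint switched off.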

Languages and scripts

DocParse does not maintain a fixed list of supported languages. The underlying multi-modal model is multilingual by default, so a mixed-script document (English headers, Japanese values, an Arabic stamp) usually works without extra configuration. You can also pin a specific language per extraction if you want the model to bias for it.

Handwriting works the same way — the model recognises it as part of normal reading, and you can use the handwritten document option to nudge it for low-quality scans.

Getting the data out

Once the extraction completes, the JSON is available three ways:

Dashboard — view, edit, export as JSON or CSV
REST API — GET /api/v1/getBatchResults
Signed webhook — HMAC-SHA256 over the delivery body, per the Standard Webhooks spec
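
For the webhook route, the receiving endpoint should verify the signature before trusting the payload. The sketch below follows the Standard Webhooks conventions as generally documented: `webhook-id`, `webhook-timestamp`, and `webhook-signature` headers, a base64 secret with a `whsec_` prefix, and a signature computed over `id.timestamp.body`. Whether DocParse deliveries match every one of those details is an assumption, and the function names are illustrative.

```python
import base64
import hashlib
import hmac
import time


def sign(secret: str, msg_id: str, timestamp: str, body: bytes) -> str:
    """Compute a Standard Webhooks style signature (base64 HMAC-SHA256)."""
    # Assumption: the secret is base64, optionally prefixed with "whsec_".
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed_content = f"{msg_id}.{timestamp}.".encode() + body
    digest = hmac.new(key, signed_content, hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


def verify(secret, headers, body, tolerance_s=300, now=None) -> bool:
    """Return True only for a fresh delivery with a matching v1 signature.

    `headers` is assumed to be a dict with lowercase keys.
    """
    msg_id = headers.get("webhook-id")
    timestamp = headers.get("webhook-timestamp")
    signatures = headers.get("webhook-signature")
    if not (msg_id and timestamp and signatures):
        return False
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    # Reject old or future-dated deliveries to limit replay attacks.
    current = time.time() if now is None else now
    if abs(current - sent_at) > tolerance_s:
        return False
    expected = sign(secret, msg_id, timestamp, body).encode()
    # The header may carry several space-separated "version,signature" pairs.
    for candidate in signatures.split():
        version, _, value = candidate.partition(",")
        if version == "v1" and hmac.compare_digest(value.encode(), expected):
            return True
    return False
```

Verification has to run against the raw request bytes. Parsing the JSON and re-serialising it first can change whitespace or key order and make a valid signature fail.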
