Automated Document Recognition

The legacy approach: OCR first, then rules

Traditional pipelines run optical character recognition (OCR) over the document to get a flat block of text, then apply regular expressions or template-based rule engines to find specific fields. This works for tightly controlled inputs — a single vendor's invoice template that never changes — but breaks the moment a new layout, language, or scan quality appears.
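To make the brittleness concrete, here is a minimal sketch of a template rule applied to flat OCR text. The rule, the sample strings, and the `extract_total` helper are all invented for illustration and are not taken from any particular OCR product.

```python
import re
from typing import Optional

# Hypothetical rule written against one vendor's invoice template.
INVOICE_TOTAL_RULE = re.compile(r"Total:\s*\$(\d+\.\d{2})")


def extract_total(ocr_text: str) -> Optional[str]:
    """Return the total captured by the template rule, or None if it does not match."""
    match = INVOICE_TOTAL_RULE.search(ocr_text)
    return match.group(1) if match else None


# Flat OCR text from the layout the rule was written for.
known_layout = "ACME Corp\nInvoice 1001\nTotal: $149.00\n"
# The same kind of information in a different layout and language.
new_layout = "Globex GmbH\nRechnung 2002\nGesamtbetrag 149,00 EUR\n"

print(extract_total(known_layout))  # 149.00
print(extract_total(new_layout))    # None
```

The second document contains a total as well, but the rule has no way to find it without someone writing and maintaining another pattern.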

It also loses structure. Flattened OCR output doesn't record that a number sits inside a table cell, that a checkbox is ticked, or that a signature is in the right place.

The modern approach: multi-modal models

Multi-modal AI models read the document as an image and as text simultaneously. They see the layout — columns, tables, checkboxes, signatures, stamps — and they read the values in context. The same model that recognises that a PDF is an invoice can immediately fill the fields you ask for.

DocParse is built on this approach. There is no separate OCR pre-processor and no separate rule layer — the model is asked, in a structured prompt, to return the fields you defined, and it returns valid JSON.
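The idea of asking for defined fields and getting JSON back can be sketched as follows. This is not DocParse's actual prompt or API: the field names, `build_prompt`, and `parse_response` are hypothetical, and the model call itself is left out. The sketch only shows how a field definition could become a prompt and how a JSON reply could be checked against it.

```python
import json
from typing import Any, Dict

# Hypothetical field definitions: field name -> description given to the model.
FIELDS: Dict[str, str] = {
    "invoice_number": "The invoice identifier printed on the document",
    "total": "The grand total as a decimal number",
}


def build_prompt(fields: Dict[str, str]) -> str:
    """Turn field definitions into a structured instruction for the model."""
    lines = ["Return a single JSON object with exactly these keys:"]
    for name, description in fields.items():
        lines.append(f'- "{name}": {description}')
    lines.append("Use null for any field that is not present in the document.")
    return "\n".join(lines)


def parse_response(raw: str, fields: Dict[str, str]) -> Dict[str, Any]:
    """Parse the model's reply and keep only the requested fields."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = sorted(set(fields) - set(data))
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return {name: data[name] for name in fields}


# A stand-in for what a model might return for the prompt above.
reply = '{"invoice_number": "1001", "total": 149.0}'
print(build_prompt(FIELDS))
print(parse_response(reply, FIELDS))
```

Validating the reply against the field list catches a malformed or incomplete answer before it reaches downstream systems.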

How DocParse handles structure

DocParse exposes seven document option flags you can toggle per extraction. Each one shapes how the model interprets the page:

tables — return rows as nested arrays of objects, preserving column relationships
charts — extract values from bars, lines, and pies, not just the surrounding text
checkboxes — return checked / unchecked state for each labelled box
handwritten — bias the model toward freehand recognition (legibility still matters)
multi-page — extract across the full document instead of just the first page
split-PDF — split a multi-doc PDF into individual extractions
specific-pages — restrict extraction to a page range you specify
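One way to picture the seven flags is as a small per-extraction settings object. The class below is an illustrative assumption, not DocParse's real configuration format: the Python names, the defaults, and the 1-based inclusive page range are all invented for the sketch.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ExtractionOptions:
    """Hypothetical container mirroring the seven option flags listed above."""
    tables: bool = False
    charts: bool = False
    checkboxes: bool = False
    handwritten: bool = False
    multi_page: bool = False
    split_pdf: bool = False
    # Assumed representation: (first, last), 1-based and inclusive.
    specific_pages: Optional[Tuple[int, int]] = None

    def __post_init__(self) -> None:
        if self.specific_pages is not None:
            first, last = self.specific_pages
            if first < 1 or last < first:
                raise ValueError("specific_pages must satisfy 1 <= first <= last")


# Example: a multi-page form with tables and tick boxes, limited to pages 2 to 4.
options = ExtractionOptions(
    tables=True, checkboxes=True, multi_page=True, specific_pages=(2, 4)
)
print(asdict(options))
```

Keeping the options in one immutable object makes it easy to store them alongside each extraction and to reject an invalid page range early.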

What about classification?

Recognition and classification are related but distinct: recognition reads what is on the page, while classification decides what kind of document it is. DocParse ships both. The Document Classification module lets you define categories with names, descriptions, and keywords, and optionally route each category to a target extraction.

Common pattern: route incoming Gmail attachments through a classifier first to label them as invoice / receipt / contract, then run each through the matching extraction template.
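The classify-then-route pattern can be sketched in a few lines. Everything here is hypothetical: the `Category` shape simply mirrors the name, description, keywords, and optional target described above, the template names are invented, and the keyword-count scoring is a deliberately simple stand-in for the model-based classifier, not how DocParse classifies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Category:
    name: str
    description: str
    keywords: List[str]
    target_extraction: Optional[str] = None  # hypothetical template name


CATEGORIES = [
    Category("invoice", "Bills requesting payment", ["invoice", "amount due"], "invoice_template"),
    Category("receipt", "Proof of a completed payment", ["receipt", "paid"], "receipt_template"),
    Category("contract", "Signed agreements", ["agreement", "party"]),  # no routing target
]


def classify(text: str, categories: List[Category]) -> Optional[Category]:
    """Keyword-count stand-in for a real classifier; returns None if nothing matches."""
    lowered = text.lower()
    best, best_score = None, 0
    for category in categories:
        score = sum(lowered.count(keyword) for keyword in category.keywords)
        if score > best_score:
            best, best_score = category, score
    return best


def route(text: str) -> Optional[str]:
    """Return the extraction template for the document, if its category has one."""
    category = classify(text, CATEGORIES)
    return category.target_extraction if category else None


print(route("Invoice #1001, amount due: $149.00"))
print(route("Receipt: paid in full"))
```

A category without a target, such as the contract example, is still labelled but is not sent on to an extraction.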

File formats

DocParse accepts PDF, PNG, JPG / JPEG, WEBP, DOCX, and plain text. DOCX is converted to text before the model sees it (preserving paragraph structure). PDFs and images go directly to the multi-modal model. Hard limits: 25 MB per file, 30 files per batch.
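A client-side pre-check against these limits might look like the sketch below. The helper name, the `.txt` extension for plain text, and the reading of 25 MB as 25 × 1024 × 1024 bytes are assumptions; the service's own validation may differ.

```python
from pathlib import PurePath
from typing import List, Tuple

MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: 1 MB = 1024 * 1024 bytes
MAX_BATCH_FILES = 30
# Extensions for the formats listed above; ".txt" for plain text is an assumption.
ACCEPTED_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".webp", ".docx", ".txt"}


def check_batch(files: List[Tuple[str, int]]) -> List[str]:
    """Check (filename, size_in_bytes) pairs; an empty result means the batch passes."""
    problems = []
    if len(files) > MAX_BATCH_FILES:
        problems.append(f"batch has {len(files)} files, limit is {MAX_BATCH_FILES}")
    for name, size in files:
        extension = PurePath(name).suffix.lower()
        if extension not in ACCEPTED_EXTENSIONS:
            problems.append(f"{name}: unsupported format")
        elif size > MAX_FILE_BYTES:
            problems.append(f"{name}: exceeds 25 MB")
    return problems


batch = [("invoice.PDF", 2_000_000), ("scan.tiff", 500_000), ("report.docx", 30 * 1024 * 1024)]
for problem in check_batch(batch):
    print(problem)
```

Checking before upload avoids sending a batch that would be rejected for one oversized or unsupported file.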
