Custom extraction

Any document. Your schema.

Define the fields you need. DocParse fills them — across PDFs, scans, photos, and emails — in any language. No templates. No model training.

Start extracting · See the API

100 pages free · No card · SOC 2 in progress

app.docparse.io / extractions / rfp-response-2026.pdf

Live

PROPOSAL — RFP-2026-114

Halcyon Logistics → Switchback Industries

Submitted by: Halcyon Logistics, LLC
Contact: Erin Voss · evoss@halcyon.co
Bid total: USD 248,400
Term: 24 months, renewable
Effective: June 1, 2026
Notice: 90 days written

JSON · CSV · Webhook · 98.6% confidence

{
  "vendor": "Halcyon Logistics, LLC",
  "rfp_id": "RFP-2026-114",
  "contact_email": "evoss@halcyon.co",
  "bid_total_usd": 248400,
  "term_months": 24,
  "effective_date": "2026-06-01",
  "notice_days": 90
}

Extracted in 2.4s · 7 fields

Pipe extracted data into any of these — via Zapier or signed webhooks

Zapier · Google Drive · Gmail · Slack · Sheets · Notion · Airtable · Webhook

Why DocParse

Three reasons teams switch to us

Define your schema in plain English

Describe the fields, paste JSON, or click through a sample. We turn it into a runnable extractor in seconds.

5 min from upload to first JSON

Works on layouts you have never seen

No per-template training. Drop a new vendor, format, or language and DocParse adapts on the first run.

0 training samples needed

Confidence on every field

Each field is scored. Auto-process the clean ones, route the uncertain ones to humans, and keep the audit trail.

99.4% auto-process rate at the median
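The auto-process-or-route idea can be sketched in a few lines of Python. This is a minimal illustration, not DocParse's implementation: the `route_fields` helper, the 0.95 threshold, and the per-field score mapping are all assumptions, since the page does not show how field-level scores appear in the response.

```python
from typing import Dict, List, Tuple


def route_fields(
    field_confidence: Dict[str, float], threshold: float = 0.95
) -> Tuple[List[str], List[str]]:
    """Split field names into (auto_process, needs_review) by confidence score."""
    auto_process: List[str] = []
    needs_review: List[str] = []
    for name, score in field_confidence.items():
        if score >= threshold:
            auto_process.append(name)
        else:
            needs_review.append(name)
    return auto_process, needs_review


# Hypothetical per-field scores; the real response shape may differ.
scores = {"vendor": 0.99, "bid_total_usd": 0.97, "notice_days": 0.81}
auto, review = route_fields(scores)
print(auto)    # ['vendor', 'bid_total_usd']
print(review)  # ['notice_days']
```

Keeping the threshold as a parameter makes it easy to tighten it for high-risk fields such as amounts and loosen it for low-risk ones.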

How it works

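From raw custom documents to structured data, in four steps.

1. Define the fields
2. Upload the file: drop a document, paste a URL, or POST a file
3. Multi-modal AI reads it
4. Get structured data back

Supported formats: PDF · PNG · JPG · TIFF · DOCX · HEIC · HTML · EML · XLSX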

The schema

Starter schema for custom documents.
Tweakable in seconds.

The custom documents template comes with a 10-field starter schema based on the most common fields teams pull from custom documents. Add your own fields, mark which are required, and change types in the dashboard or via the REST API.

Custom documents · default schema

document_type     string     required    99.6%
parties           array      required    98.9%
effective_date    string     required    99.2%
amount            number     optional    98.6%
currency          string     optional    99.8%
term_months       integer    optional    97.4%
jurisdiction      string     optional    98.1%
reference_id      string     optional    99.5%
line_items        array      optional    96.8%
signatories       array      optional    97.9%

JSON Schema · TypeScript · Python

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Customdocuments",
  "type": "object",
  "required": [
    "document_type",
    "parties",
    "effective_date"
  ],
  "properties": {
    "document_type": {
      "type": "string"
    },
    "parties": {
      "type": "array"
    },
    "effective_date": {
      "type": "string"
    },
    "amount": {
      "type": "number"
    },
    "currency": {
      "type": "string"
    },
    "term_months": {
      "type": "integer"
    },
    "jurisdiction": {
      "type": "string"
    },
    "reference_id": {
      "type": "string"
    },
    "line_items": {
      "type": "array"
    },
    "signatories": {
      "type": "array"
    }
  }
}
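Before passing extracted records downstream, it can help to check them against the schema's required list and declared types. The sketch below uses only the Python standard library and a trimmed copy of the starter schema; `check_record` is a hypothetical helper, and it covers only `required` and top-level `type`, not full JSON Schema validation.

```python
from typing import Any, Dict, List

# Trimmed copy of the starter schema: the three required fields plus one optional.
SCHEMA: Dict[str, Any] = {
    "required": ["document_type", "parties", "effective_date"],
    "properties": {
        "document_type": {"type": "string"},
        "parties": {"type": "array"},
        "effective_date": {"type": "string"},
        "term_months": {"type": "integer"},
    },
}

# JSON Schema type names mapped to the Python types json.loads produces.
PY_TYPES = {
    "string": (str,),
    "number": (int, float),
    "integer": (int,),
    "array": (list,),
    "object": (dict,),
}


def check_record(record: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the record passes."""
    problems: List[str] = []
    for name in schema.get("required", []):
        if name not in record:
            problems.append(f"missing required field: {name}")
    for name, value in record.items():
        spec = schema["properties"].get(name)
        if spec is None:
            continue  # fields not in the schema are ignored in this sketch
        expected = PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: expected {spec['type']}")
    return problems


record = {"document_type": "proposal", "term_months": "24"}
for problem in check_record(record, SCHEMA):
    print(problem)
```

For production use, a full JSON Schema validator is likely a better fit; this version only shows the shape of the check.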

What to expect

Field-level accuracy.

Multi-modal models do the reading, and accuracy depends on document quality. The numbers below are illustrative ranges we've seen on custom documents — run your own documents and compare against a small ground-truth set before you scale.

98.4% illustrative field-level accuracy ceiling

10 starter fields

Any language supported

25 MB max file size

Field             Accuracy
parties           99.1%
effective_date    99.6%
amount            98.9%
currency          99.8%
term_months       97.4%
jurisdiction      97.8%
line_items        96.8%
signatories       98%
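The advice to compare against a small ground-truth set can be sketched as a per-field exact-match score. The `field_accuracy` helper and the sample records are illustrative assumptions; real comparisons often need normalization (dates, currency formats, whitespace) before matching.

```python
from collections import defaultdict
from typing import Any, Dict, List


def field_accuracy(
    extracted: List[Dict[str, Any]], truth: List[Dict[str, Any]]
) -> Dict[str, float]:
    """Fraction of documents where each ground-truth field was extracted exactly."""
    if len(extracted) != len(truth):
        raise ValueError("extracted and truth must contain the same number of documents")
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for got, want in zip(extracted, truth):
        for name, expected in want.items():
            totals[name] += 1
            if got.get(name) == expected:
                hits[name] += 1
    return {name: hits[name] / totals[name] for name in totals}


# Two hand-labelled documents and the matching extraction results (made-up values).
truth = [
    {"currency": "USD", "term_months": 24},
    {"currency": "EUR", "term_months": 12},
]
extracted = [
    {"currency": "USD", "term_months": 24},
    {"currency": "EUR", "term_months": 36},
]
print(field_accuracy(extracted, truth))  # {'currency': 1.0, 'term_months': 0.5}
```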

The API

One endpoint.
Every output you need.

# Extract with one POST
curl -X POST "https://api.docparse.io/v1/extract" \
  -H "Authorization: Bearer $DOCPARSE_KEY" \
  -F file=@"rfp-2026-114.pdf" \
  -F schema="custom-rfp" \
  -F webhook="https://api.acme.co/incoming"

# Returns:
{
  "status": "complete",
  "confidence": 0.987,
  "latency_ms": 2412,
  "data": { ... }
}
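On the client side, the response above can be parsed into a typed object before anything acts on it. This is a sketch under assumptions: `ExtractionResult`, `parse_response`, `is_usable`, and the 0.95 cutoff are hypothetical names and values, and the contents of `data` in the sample body are illustrative, borrowed from the sample extraction shown earlier on this page.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class ExtractionResult:
    status: str
    confidence: float
    latency_ms: int
    data: Dict[str, Any]


def parse_response(body: str) -> ExtractionResult:
    """Parse the JSON body of an extract call into a typed result."""
    payload = json.loads(body)
    return ExtractionResult(
        status=payload["status"],
        confidence=float(payload["confidence"]),
        latency_ms=int(payload["latency_ms"]),
        data=payload.get("data", {}),
    )


def is_usable(result: ExtractionResult, min_confidence: float = 0.95) -> bool:
    """True when the extraction finished and clears the (assumed) confidence cutoff."""
    return result.status == "complete" and result.confidence >= min_confidence


# Illustrative body; the real "data" object depends on your schema.
body = '{"status": "complete", "confidence": 0.987, "latency_ms": 2412, "data": {"rfp_id": "RFP-2026-114"}}'
result = parse_response(body)
print(result.status, is_usable(result))  # complete True
```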

Plain HTTP, no SDK lock-in

Bearer-token auth with revocable, SHA-256-hashed API keys. Call it from any language that can hit a REST endpoint — we publish docs and copy-pasteable snippets, not opinionated wrappers.

cURL · Python · Node.js · Go · Ruby · PHP · Java · .NET

Signed webhooks for async

Register an endpoint, set the events, and we POST signed deliveries (HMAC-SHA256, Standard Webhooks spec) as extractions finish. Every attempt is logged in the dashboard with response code, body, and timing.

Webhook delivery log · per-endpoint retries
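Verifying a signed delivery on the receiving side might look like the sketch below. It follows the Standard Webhooks conventions as generally documented: `webhook-id`, `webhook-timestamp`, and `webhook-signature` headers, a `whsec_`-prefixed base64 secret, and an HMAC-SHA256 over `id.timestamp.body`. Whether DocParse uses exactly these header names and secret format is an assumption to confirm against its docs; `sign` and `verify_webhook` are hypothetical helpers.

```python
import base64
import hashlib
import hmac
import time
from typing import Mapping, Optional


def sign(secret: str, msg_id: str, timestamp: str, body: bytes) -> str:
    """Base64 HMAC-SHA256 over "<id>.<timestamp>.<body>" using the decoded secret."""
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed_content = f"{msg_id}.{timestamp}.".encode() + body
    digest = hmac.new(key, signed_content, hashlib.sha256).digest()
    return base64.b64encode(digest).decode()


def verify_webhook(
    secret: str,
    headers: Mapping[str, str],
    body: bytes,
    tolerance_s: int = 300,
    now: Optional[float] = None,
) -> bool:
    """Check timestamp freshness and signature; header keys are assumed lowercase."""
    msg_id = headers.get("webhook-id")
    timestamp = headers.get("webhook-timestamp")
    signatures = headers.get("webhook-signature")
    if not (msg_id and timestamp and signatures):
        return False
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if abs(current - sent_at) > tolerance_s:
        return False  # too old or too far in the future: possible replay
    expected = sign(secret, msg_id, timestamp, body).encode()
    # The header may carry several space-separated "version,signature" entries.
    for entry in signatures.split(" "):
        version, _, candidate = entry.partition(",")
        if version == "v1" and hmac.compare_digest(candidate.encode(), expected):
            return True
    return False
```

Verify against the raw request bytes, not a re-serialized JSON object, since any change in whitespace or key order changes the signature.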

The alternatives

Why teams switch from regex.

A look at how DocParse compares to the three things you've probably already tried.

Compared: Regex + scripts · Manual review (BPO) · Textract / FormRecognizer · DocParse

Works on a layout it has never seen
Handles handwriting and scans
Custom fields without per-vendor setup
Multi-lingual out of the box
REST API + signed webhooks + Zapier
Pricing scales with pages, not seats
Free tier: 100 pages on signup
Time-to-first-extraction: days to weeks for the alternatives, 5 minutes for DocParse

Where the data goes

Reach the tools you already run.

DocParse ships two integration surfaces directly — REST API and signed webhooks — plus a native Zapier app that opens up everything else.

Direct: Zapier (automation) · Webhooks (API) · REST API (API) · JSON export · CSV export

Via Zapier: Google Drive · Google Sheets · Gmail · Outlook · Slack · Dropbox · Airtable · Notion · HubSpot · Salesforce

Via webhooks: Make.com · n8n · Postgres

REST API · Signed webhooks (HMAC-SHA256) · Zapier to 6,000+ apps · JSON / CSV export

Common patterns

How teams use DocParse for custom documents.

Illustrative scenarios drawn from teams piloting DocParse — names and figures are examples, not customer quotes.

“We had a regex pipeline for vendor onboarding paperwork. It broke every time a new template arrived. DocParse replaced 1,800 lines of code with one schema.”
Lena Park, Staff Eng · Northwave
1,800 LoC replaced with one schema

“The thing that closed it for us: confidence scores per field. We auto-process 94% of intakes and only route the rest. Our reviewers are bored now.”
Marcus Chen, Head of Ops · Halcyon
94% auto-processed without review

“We extract 40 fields from procurement docs in 11 languages. Switched from a vendor that needed 200 sample docs per template. We needed zero.”
Aditi Rao, Director, Procurement · Tarn Industries
11 langs with no per-language tuning

Frequently asked

The questions teams ask before they sign up.

How do I define a custom schema?

Describe the fields in plain English, paste a JSON Schema, or click through a sample document. DocParse turns whichever you give it into a runnable extractor in seconds and lets you tune fields one at a time.

Will it work on a document type I have never uploaded?

Yes. There is no per-template training. DocParse adapts to layouts it has not seen before. If a field's confidence score falls below your threshold, the document is flagged for review automatically.

Can it pull tables and nested objects?

Yes. Schemas can include arrays of objects, conditional fields, and references to other fields. The output is valid JSON, ready for your pipeline.
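As an illustration of an array of objects, the `line_items` field from the starter schema could be expanded as below, written as a Python dict that serializes to standard JSON Schema. The inner field names (`description`, `quantity`, `unit_price`) are hypothetical, and which JSON Schema keywords DocParse accepts is an assumption to check against its docs.

```python
import json

# Hypothetical expansion of line_items into an array of objects.
nested_schema = {
    "type": "object",
    "properties": {
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["description"],
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "integer"},
                    "unit_price": {"type": "number"},
                },
            },
        }
    },
}

# Serialize to JSON, ready to paste into a schema editor or send in a request.
print(json.dumps(nested_schema, indent=2))
```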

What about handwritten or low-quality scans?

Handwriting works in 18 languages. Every field is scored with a confidence value so you can branch on it — auto-process clean pages, route the rest.

Can I use this on-prem?

Yes. DocParse runs as a self-hosted container with the same API. Common for healthcare, defense, and finance customers under data-residency rules.

How is pricing calculated?

Per page, per parse — not per field, document type, or seat. Volume discounts kick in automatically. Re-runs on the same page cost nothing.

Stop writing per-template extractors.

Define a schema once. Run it on every document type your team handles, in every language. Free for the first 100 pages.

Start extracting · Talk to sales

Free for first 100 pages · 5-minute setup · No credit card
