Custom extraction

Any document. Your schema.

Define the fields you need. DocParse fills them — across PDFs, scans, photos, and emails — in any language. No templates. No model training.

100 pages free · No card · SOC 2 in progress
PROPOSAL — RFP-2026-114
Halcyon Logistics → Switchback Industries
Submitted by: Halcyon Logistics, LLC
Contact: Erin Voss · evoss@halcyon.co
Bid total: USD 248,400
Term: 24 months, renewable
Effective: June 1, 2026
Notice: 90 days, written
JSON · CSV · Webhook · 98.6% confidence
{
  "vendor": "Halcyon Logistics, LLC",
  "rfp_id": "RFP-2026-114",
  "contact_email": "evoss@halcyon.co",
  "bid_total_usd": 248400,
  "term_months": 24,
  "effective_date": "2026-06-01",
  "notice_days": 90
}
Extracted in 2.4s · 7 fields
Pipe extracted data into any of these — via Zapier or signed webhooks
Zapier
Google Drive
Gmail
Slack
Sheets
Notion
Airtable
Webhook
Why DocParse

Three reasons teams switch to us

01

Define your schema in plain English

Describe the fields, paste JSON, or click through a sample. We turn it into a runnable extractor in seconds.

5 min from upload to first JSON
02

Works on layouts you have never seen

No per-template training. Drop a new vendor, format, or language and DocParse adapts on the first run.

0 training samples needed
03

Confidence on every field

Each field is scored. Auto-process the clean ones, route the uncertain ones to humans, and keep the audit trail.
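A minimal sketch of that routing in Python. It assumes each field comes back as a value plus a 0-to-1 confidence score; that per-field shape, the `route_fields` helper, the 0.95 threshold, and the sample scores are all hypothetical, so check the real response format before relying on it.

```python
from typing import Any

# Hypothetical per-field shape: {"value": <extracted value>, "confidence": 0.0 to 1.0}
FieldResult = dict[str, Any]


def route_fields(
    fields: dict[str, FieldResult], threshold: float = 0.95
) -> tuple[dict[str, Any], dict[str, FieldResult]]:
    """Split fields into values safe to auto-process and fields needing review."""
    auto: dict[str, Any] = {}
    review: dict[str, FieldResult] = {}
    for name, result in fields.items():
        if result["confidence"] >= threshold:
            auto[name] = result["value"]  # clean: pass the bare value downstream
        else:
            review[name] = result  # uncertain: keep the score for the reviewer
    return auto, review


if __name__ == "__main__":
    extracted = {
        "vendor": {"value": "Halcyon Logistics, LLC", "confidence": 0.99},
        "bid_total_usd": {"value": 248400, "confidence": 0.97},
        "notice_days": {"value": 90, "confidence": 0.81},
    }
    auto, review = route_fields(extracted)
    print(sorted(auto))  # ['bid_total_usd', 'vendor']
    print(sorted(review))  # ['notice_days']
```

Routing per field rather than per document means a reviewer corrects only the uncertain value instead of re-keying the whole page; where to set the threshold is a trade-off to tune against your own ground truth.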

99.4% auto-process rate at the median
How it works

From raw custom documents
to structured data, in four steps.

Drop a document, paste a URL, or POST a file
PDF · PNG · JPG · TIFF · DOCX · HEIC · HTML · EML · XLSX
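A small client-side pre-flight check you might run before uploading, using the formats listed above and the 25 MB max file size stated on this page. The `preflight` helper is hypothetical, reading "25 MB" as 25 MiB is an assumption, and the server remains the source of truth for what it accepts.

```python
from pathlib import Path

# Formats named on the upload step. Variants such as .jpeg or .tif are not
# listed there, so they are left out here; add them if your files use them.
ACCEPTED_SUFFIXES = {".pdf", ".png", ".jpg", ".tiff", ".docx", ".heic", ".html", ".eml", ".xlsx"}

# "25 MB max file size", read here as 25 MiB (an assumption).
MAX_BYTES = 25 * 1024 * 1024


def preflight(filename: str, size_bytes: int) -> list[str]:
    """Return reasons a file would likely be rejected; an empty list means it looks fine."""
    problems = []
    suffix = Path(filename).suffix.lower()  # ".PDF" and ".pdf" are treated alike
    if suffix not in ACCEPTED_SUFFIXES:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file is {size_bytes / (1024 * 1024):.1f} MiB; limit is 25 MiB")
    return problems


if __name__ == "__main__":
    print(preflight("rfp-2026-114.pdf", 2_400_000))  # []
    print(preflight("scan.bmp", 30 * 1024 * 1024))
```

In practice you would pass `os.path.getsize(path)` as the size; taking the size as an argument keeps the check free of file I/O.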
The schema

Starter schema for custom documents.
Tweakable in seconds.

The custom documents template comes with a 10-field starter schema based on the most common fields teams pull from custom documents. Add your own fields, mark which are required, and change types in the dashboard or via the REST API.
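If you keep a copy of the schema locally in the JSON Schema layout shown below, adding or retyping a field is a small dictionary edit. The sketch below works on a trimmed two-field copy; the `add_field` helper and the `po_number` field are hypothetical, and the dashboard or REST API is still where the schema actually changes.

```python
import copy
import json

# A trimmed copy of the starter schema (two of its ten fields) for illustration.
STARTER = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["document_type"],
    "properties": {
        "document_type": {"type": "string"},
        "amount": {"type": "number"},
    },
}

JSON_TYPES = {"string", "number", "integer", "boolean", "array", "object", "null"}


def add_field(schema: dict, name: str, json_type: str, required: bool = False) -> dict:
    """Return a copy of `schema` with `name` added (or retyped, if it already exists)."""
    if json_type not in JSON_TYPES:
        raise ValueError(f"unsupported JSON Schema type: {json_type}")
    updated = copy.deepcopy(schema)  # never mutate the shared starter schema
    updated["properties"][name] = {"type": json_type}
    if required and name not in updated["required"]:
        updated["required"].append(name)
    return updated


if __name__ == "__main__":
    schema = add_field(STARTER, "po_number", "string", required=True)
    print(json.dumps(schema["required"]))  # ["document_type", "po_number"]
```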

Custom documents · default schema
document_type · string · required · 99.6%
parties · array · required · 98.9%
effective_date · string · required · 99.2%
amount · number · optional · 98.6%
currency · string · optional · 99.8%
term_months · integer · optional · 97.4%
jurisdiction · string · optional · 98.1%
reference_id · string · optional · 99.5%
line_items · array · optional · 96.8%
signatories · array · optional · 97.9%
JSON Schema · TypeScript · Python
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "CustomDocuments",
  "type": "object",
  "required": [
    "document_type",
    "parties",
    "effective_date"
  ],
  "properties": {
    "document_type": {
      "type": "string"
    },
    "parties": {
      "type": "array"
    },
    "effective_date": {
      "type": "string"
    },
    "amount": {
      "type": "number"
    },
    "currency": {
      "type": "string"
    },
    "term_months": {
      "type": "integer"
    },
    "jurisdiction": {
      "type": "string"
    },
    "reference_id": {
      "type": "string"
    },
    "line_items": {
      "type": "array"
    },
    "signatories": {
      "type": "array"
    }
  }
}
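Before wiring extracted output into downstream systems, it can help to check each record against the schema. A full validator such as the `jsonschema` package handles every keyword; the stdlib-only sketch below checks just the two keywords this schema uses, `required` and `type`, on a trimmed copy of it. It is an illustration, not a complete JSON Schema implementation, and `check_record` is a hypothetical name.

```python
# Maps JSON Schema type names to checks on values produced by json.loads.
# bool is a subclass of int in Python, so it is excluded explicitly.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    # Stricter than the spec, which also accepts 24.0 as an integer.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
    "null": lambda v: v is None,
}

# Trimmed version of the starter schema: its required list plus five properties.
SCHEMA = {
    "required": ["document_type", "parties", "effective_date"],
    "properties": {
        "document_type": {"type": "string"},
        "parties": {"type": "array"},
        "effective_date": {"type": "string"},
        "amount": {"type": "number"},
        "term_months": {"type": "integer"},
    },
}


def check_record(record: dict, schema: dict) -> list[str]:
    """Return human-readable problems; an empty list means the record passes."""
    problems = [
        f"missing required field: {name}"
        for name in schema.get("required", [])
        if name not in record
    ]
    for name, spec in schema.get("properties", {}).items():
        if name in record and not TYPE_CHECKS[spec["type"]](record[name]):
            problems.append(
                f"{name}: expected {spec['type']}, got {type(record[name]).__name__}"
            )
    return problems


if __name__ == "__main__":
    record = {"document_type": "proposal", "parties": ["Halcyon", "Switchback"],
              "amount": "248,400", "term_months": 24}
    for problem in check_record(record, SCHEMA):
        print(problem)
    # missing required field: effective_date
    # amount: expected number, got str
```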
What to expect

Accuracy, measured field by field.

Multimodal models do the reading, and accuracy depends on document quality. The numbers below are illustrative figures we've seen on custom documents; before you scale, run your own documents and compare the output against a small ground-truth set.
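One way to run that comparison: label a few dozen documents by hand, then compute per-field accuracy as the share of labeled documents where the extracted value matches. This sketch assumes one dict per document keyed by field name and uses exact matching after light string normalization; the function names are hypothetical, and real evaluations often need field-specific rules for dates or amounts.

```python
from collections import defaultdict


def normalize(value):
    """Trim and case-fold strings so 'usd ' matches 'USD'; other values pass through."""
    return value.strip().casefold() if isinstance(value, str) else value


def field_accuracy(predicted: list[dict], truth: list[dict]) -> dict[str, float]:
    """Per field: share of labeled documents whose extracted value matches ground truth."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must describe the same documents")
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for pred, gold in zip(predicted, truth):
        for field, expected in gold.items():  # score only fields a human labeled
            totals[field] += 1
            if normalize(pred.get(field)) == normalize(expected):
                hits[field] += 1
    return {field: hits[field] / totals[field] for field in totals}


if __name__ == "__main__":
    truth = [{"currency": "USD", "term_months": 24}, {"currency": "EUR", "term_months": 12}]
    predicted = [{"currency": "usd ", "term_months": 24}, {"currency": "EUR", "term_months": 36}]
    print(field_accuracy(predicted, truth))  # {'currency': 1.0, 'term_months': 0.5}
```

A missing field in the prediction counts as a miss, which is usually what you want when deciding whether a field is safe to auto-process.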

98.4% illustrative field-level accuracy ceiling
10 starter fields
Any language supported
25 MB max file size
Field · Accuracy
parties · 99.1%
effective_date · 99.6%
amount · 98.9%
currency · 99.8%
term_months · 97.4%
jurisdiction · 97.8%
line_items · 96.8%
signatories · 98.0%
The API

One endpoint.
Every output you need.

# Extract with one POST
curl -X POST "https://api.docparse.io/v1/extract" \
  -H "Authorization: Bearer $DOCPARSE_KEY" \
  -F file=@"rfp-2026-114.pdf" \
  -F schema="custom-rfp" \
  -F webhook="https://api.acme.co/incoming"

# Returns:
# {
#   "status": "complete",
#   "confidence": 0.987,
#   "latency_ms": 2412,
#   "data": { ... }
# }
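The same request from Python's standard library, for comparison. It builds the multipart body by hand so it needs no third-party packages. The form field names (`file`, `schema`, `webhook`) mirror the cURL example; the `build_extract_request` helper, the boundary handling, and the hard-coded `application/pdf` part type are illustrative choices. It only constructs the request; pass it to `urllib.request.urlopen` to send it.

```python
import os
import urllib.request
import uuid

API_URL = "https://api.docparse.io/v1/extract"


def build_extract_request(pdf_bytes: bytes, filename: str, schema: str,
                          webhook: str, api_key: str) -> urllib.request.Request:
    """Build (not send) a multipart/form-data POST mirroring the cURL call."""
    boundary = uuid.uuid4().hex  # random, so a clash with the file bytes is very unlikely
    parts = []
    for name, value in (("schema", schema), ("webhook", webhook)):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # cURL infers the file part's Content-Type; here it is hard-coded for a PDF.
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\nContent-Type: application/pdf\r\n\r\n'.encode()
        + pdf_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


if __name__ == "__main__":
    req = build_extract_request(
        b"%PDF-1.7 placeholder", "rfp-2026-114.pdf", "custom-rfp",
        "https://api.acme.co/incoming", os.environ.get("DOCPARSE_KEY", "test-key"),
    )
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req), then json.load() the response.
```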

Plain HTTP, no SDK lock-in

Bearer-token auth with revocable, SHA-256-hashed API keys. Call it from any language that can hit a REST endpoint — we publish docs and copy-pasteable snippets, not opinionated wrappers.

cURL · Python · Node.js · Go · Ruby · PHP · Java · .NET
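What "SHA-256-hashed API keys" buys you, in general terms: the service stores only a digest of each key, so a leaked key table does not leak usable keys, and revoking a key means deleting its digest. The sketch below shows that common pattern with Python's `hashlib` and `secrets`; it illustrates the technique, not DocParse's actual implementation, and the function names and `dp_` key prefix are made up.

```python
import hashlib
import secrets


def issue_key() -> tuple[str, str]:
    """Mint a key; return (plaintext shown to the user once, digest to store)."""
    plaintext = "dp_" + secrets.token_urlsafe(32)  # "dp_" prefix is hypothetical
    return plaintext, hashlib.sha256(plaintext.encode()).hexdigest()


def is_valid(presented: str, stored_digests: set[str]) -> bool:
    """Hash the bearer token from the request and look up its digest."""
    return hashlib.sha256(presented.encode()).hexdigest() in stored_digests


if __name__ == "__main__":
    key, digest = issue_key()
    stored = {digest}  # only the digest is persisted
    print(is_valid(key, stored))  # True
    stored.discard(digest)  # revoking a key = deleting its digest
    print(is_valid(key, stored))  # False
```

A single fast hash such as SHA-256 is adequate here because API keys are long random strings; human-chosen passwords, by contrast, call for a deliberately slow hash.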

Signed webhooks for async

Register an endpoint, set the events, and we POST signed deliveries (HMAC-SHA256, Standard Webhooks spec) as extractions finish. Every attempt is logged in the dashboard with response code, body, and timing.
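A sketch of the receiving side in Python. The header names (`webhook-id`, `webhook-timestamp`, `webhook-signature`), the `whsec_` secret prefix, and the signed content `{id}.{timestamp}.{body}` come from the Standard Webhooks specification the page cites; confirm the secret format in your dashboard before relying on it. The `verify_webhook` name, the 5-minute tolerance, and the assumption that header names arrive lowercased are illustrative. Always verify the raw body bytes, before any JSON parsing.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_SECONDS = 5 * 60  # reject stale or future-dated deliveries (replay guard)


def verify_webhook(secret: str, headers: dict[str, str], body: bytes,
                   now: Optional[float] = None) -> bool:
    """Check a Standard Webhooks v1 (HMAC-SHA256) signature on the raw request body.

    `headers` is assumed to use lowercase names; normalize them first if your
    framework does not.
    """
    msg_id = headers["webhook-id"]
    timestamp = headers["webhook-timestamp"]
    now = time.time() if now is None else now
    if abs(now - int(timestamp)) > TOLERANCE_SECONDS:
        return False
    # Secrets look like "whsec_<base64>"; the HMAC key is the decoded bytes.
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed_content = f"{msg_id}.{timestamp}.".encode() + body
    expected = base64.b64encode(
        hmac.new(key, signed_content, hashlib.sha256).digest()
    ).decode()
    # The header may hold several space-separated "v1,<sig>" entries (e.g. key rotation).
    for entry in headers["webhook-signature"].split():
        version, _, signature = entry.partition(",")
        if version == "v1" and hmac.compare_digest(signature.encode(), expected.encode()):
            return True
    return False


if __name__ == "__main__":
    raw_key = b"example-secret-key"  # made-up secret for the demo
    secret = "whsec_" + base64.b64encode(raw_key).decode()
    body = b'{"status":"complete"}'
    ts = str(int(time.time()))
    sig = base64.b64encode(
        hmac.new(raw_key, f"msg_1.{ts}.".encode() + body, hashlib.sha256).digest()
    ).decode()
    headers = {"webhook-id": "msg_1", "webhook-timestamp": ts, "webhook-signature": f"v1,{sig}"}
    print(verify_webhook(secret, headers, body))  # True
```

Comparing with `hmac.compare_digest` rather than `==` keeps the check's running time independent of how many leading characters match.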

Webhook delivery log · per-endpoint retries
The alternatives

Why teams switch from regex.

A look at how DocParse compares to the three things you've probably already tried.

Regex + scripts
Manual review (BPO)
Textract / Form Recognizer
DocParse
Works on a layout it has never seen
partial
Handles handwriting and scans
partial
Custom fields without per-vendor setup
Multi-lingual out of the box
partial
REST API + signed webhooks + Zapier
partial
partial
Pricing scales with pages, not seats
partial
Free tier, every month, forever
partial
Time-to-first-extraction
Days
Days
Weeks
5 minutes
Where the data goes

Reach the tools you already run.

DocParse ships two integration surfaces directly — REST API and signed webhooks — plus a native Zapier app that opens up everything else.

Zapier · Automation
Webhooks · API
REST API · API
JSON export · Export
CSV export · Export
Google Drive · via Zapier
Google Sheets · via Zapier
Gmail · via Zapier
Outlook · via Zapier
Slack · via Zapier
Dropbox · via Zapier
Airtable · via Zapier
Notion · via Zapier
HubSpot · via Zapier
Salesforce · via Zapier
Make.com · via Webhooks
n8n · via Webhooks
Postgres · via Webhooks
REST API · Signed webhooks (HMAC-SHA256) · Zapier to 6,000+ apps · JSON / CSV export
Common patterns

How teams use DocParse for custom documents.

Illustrative scenarios drawn from teams piloting DocParse — names and figures are examples, not customer quotes.

We had a regex pipeline for vendor onboarding paperwork. It broke every time a new template arrived. DocParse replaced 1,800 lines of code with one schema.

Lena Park
Staff Eng · Northwave
1,800 LoC replaced with one schema

The thing that closed it for us: confidence scores per field. We auto-process 94% of intakes and only route the rest. Our reviewers are bored now.

Marcus Chen
Head of Ops · Halcyon
94% auto-processed without review

We extract 40 fields from procurement docs in 11 languages. Switched from a vendor that needed 200 sample docs per template. We needed zero.

Aditi Rao
Director, Procurement · Tarn Industries
11 languages with no per-language tuning
Frequently asked

The questions teams ask before they sign up.

Stop writing per-template extractors.

Define a schema once. Run it on every document type your team handles, in every language. Free for the first 100 pages.

Free for the first 100 pages · 5-minute setup · No credit card