We had a regex pipeline for vendor onboarding paperwork. It broke every time a new template arrived. DocParse replaced 1,800 lines of code with one schema.
Any document. Your schema.
Define the fields you need. DocParse fills them — across PDFs, scans, photos, and emails — in any language. No templates. No model training.
{
"vendor": "Halcyon Logistics, LLC",
"rfp_id": "RFP-2026-114",
"contact_email": "evoss@halcyon.co",
"bid_total_usd": 248400,
"term_months": 24,
"effective_date": "2026-06-01",
"notice_days": 90
}
Three reasons teams switch to us
Define your schema in plain English
Describe the fields, paste JSON, or click through a sample. We turn it into a runnable extractor in seconds.
Works on layouts you have never seen
No per-template training. Drop a new vendor, format, or language and DocParse adapts on the first run.
Confidence on every field
Each field is scored. Auto-process the clean ones, route the uncertain ones to humans, and keep the audit trail.
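The routing idea in that last point can be sketched in a few lines of Python. The per-field shape used here (a `value` plus a `confidence` for each field), the `route_fields` helper, the 0.90 threshold, and the sample scores are all assumptions made up for illustration, not DocParse's documented response format, so check the API docs for the real payload before relying on it.

```python
from typing import Any

# Hypothetical cut-off; tune it against your own reviewed samples.
REVIEW_THRESHOLD = 0.90


def route_fields(
    fields: dict[str, dict[str, Any]],
    threshold: float = REVIEW_THRESHOLD,
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split extracted fields into auto-accepted values and values for review.

    `fields` maps a field name to {"value": ..., "confidence": float}.
    That shape is assumed for this example.
    """
    accepted: dict[str, Any] = {}
    needs_review: dict[str, Any] = {}
    for name, field in fields.items():
        # A missing score counts as 0.0, so the field always goes to a human.
        confidence = field.get("confidence", 0.0)
        target = accepted if confidence >= threshold else needs_review
        target[name] = field.get("value")
    return accepted, needs_review


# Made-up scores attached to fields from the sample output above.
sample = {
    "vendor": {"value": "Halcyon Logistics, LLC", "confidence": 0.99},
    "bid_total_usd": {"value": 248400, "confidence": 0.97},
    "notice_days": {"value": 90, "confidence": 0.62},
}

accepted, needs_review = route_fields(sample)
print("auto-processed:", sorted(accepted))
print("sent to review:", sorted(needs_review))
```

Storing the score next to each routed value, alongside who approved it, is one simple way to keep the audit trail.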
From raw custom documents
to structured data, in four steps.
Starter schema for custom documents.
Tweakable in seconds.
The custom documents template comes with a 10-field starter schema based on the most common fields teams pull from custom documents. Add your own fields, mark which are required, and change types in the dashboard or via the REST API.
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Customdocuments",
"type": "object",
"required": [
"document_type",
"parties",
"effective_date"
],
"properties": {
"document_type": {
"type": "string"
},
"parties": {
"type": "array"
},
"effective_date": {
"type": "string"
},
"amount": {
"type": "number"
},
"currency": {
"type": "string"
},
"term_months": {
"type": "integer"
},
"jurisdiction": {
"type": "string"
},
"reference_id": {
"type": "string"
},
"line_items": {
"type": "array"
},
"signatories": {
"type": "array"
}
}
}
Field-level accuracy.
Multi-modal models do the reading, and accuracy depends on document quality. The numbers below are illustrative ranges we've seen on custom documents — run your own documents and compare against a small ground-truth set before you scale.
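One way to run that comparison is a short script that scores each field against hand-checked answers. This is a minimal sketch: the `field_accuracy` helper, the exact-match rule with light string normalization, and the sample records are assumptions for illustration, and amounts or dates may need a looser comparison in practice.

```python
from collections import defaultdict
from typing import Any


def _normalize(value: Any) -> Any:
    """Ignore case and surrounding whitespace when comparing strings."""
    if isinstance(value, str):
        return value.strip().casefold()
    return value


def field_accuracy(
    predictions: list[dict[str, Any]],
    ground_truth: list[dict[str, Any]],
) -> dict[str, float]:
    """Return, per field, the fraction of documents extracted correctly.

    The two lists are paired by position: predictions[i] is the extraction
    for the document whose verified answers are ground_truth[i].
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must be the same length")

    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for predicted, expected in zip(predictions, ground_truth):
        for name, expected_value in expected.items():
            totals[name] += 1
            # A field missing from the prediction counts as a miss.
            if _normalize(predicted.get(name)) == _normalize(expected_value):
                hits[name] += 1
    return {name: hits[name] / totals[name] for name in totals}


# Made-up sample: two documents, two fields each.
truth = [
    {"vendor": "Halcyon Logistics, LLC", "term_months": 24},
    {"vendor": "Acme Freight", "term_months": 12},
]
extracted = [
    {"vendor": "halcyon logistics, llc ", "term_months": 24},
    {"vendor": "Acme Freight", "term_months": 18},
]

scores = field_accuracy(extracted, truth)
for name, score in sorted(scores.items()):
    print(f"{name}: {score:.0%}")
```

Even a few dozen hand-checked documents will show which fields are reliable enough to auto-process and which ones need a review step.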
One endpoint.
Every output you need.
# Extract with one POST
curl -X POST "https://api.docparse.io/v1/extract" \
-H "Authorization: Bearer $DOCPARSE_KEY" \
-F file=@"rfp-2026-114.pdf" \
-F schema="custom-rfp" \
-F webhook="https://api.acme.co/incoming"
# Returns:
{
"status": "complete",
"confidence": 0.987,
"latency_ms": 2412,
"data": { ... }
}
Plain HTTP, no SDK lock-in
Bearer-token auth with revocable, SHA-256-hashed API keys. Call it from any language that can hit a REST endpoint — we publish docs and copy-pasteable snippets, not opinionated wrappers.
Signed webhooks for async
Register an endpoint, set the events, and we POST signed deliveries (HMAC-SHA256, Standard Webhooks spec) as extractions finish. Every attempt is logged in the dashboard with response code, body, and timing.
Why teams switch from regex.
A look at how DocParse compares to the three things you've probably already tried.
Reach the tools you already run.
DocParse ships two integration surfaces directly — REST API and signed webhooks — plus a native Zapier app that opens up everything else.
How teams use DocParse for custom documents.
Illustrative scenarios drawn from teams piloting DocParse — names and figures are examples, not customer quotes.
The thing that closed it for us: confidence scores per field. We auto-process 94% of intakes and only route the rest. Our reviewers are bored now.
We extract 40 fields from procurement docs in 11 languages. Switched from a vendor that needed 200 sample docs per template. We needed zero.
The questions teams ask before they sign up.
Stop writing per-template extractors.
Define a schema once. Run it on every document type your team handles, in every language. Free for the first 100 pages.