Email parsing

Every email thread. Boiled down to JSON.

Support threads, order confirmations, intake forms, B2B inquiries. DocParse pulls entities, intent, and action items from any email and attachment.


100 pages free · No card · SOC 2 in progress

app.docparse.io / extractions / support-ticket-44821.eml


Re: Order #PR-1148 stuck in Long Beach

logistics@acme.co · 4 messages

Customer: Acme Industries
Order: PR-1148
Issue: Held at Long Beach
Sentiment: Frustrated → Resolved
Intent: Status request
Suggested action: Push to ops queue

JSON · CSV · Webhook · 98.6% confidence

{
  "thread_id": "th_44821",
  "customer": "Acme Industries",
  "order_no": "PR-1148",
  "intent": "status_request",
  "sentiment": "resolved",
  "action_items": ["dispatch_eta","notify_customer"],
  "attachments": [ 2 ]
}

Extracted in 2.4s · 7 fields


Pipe extracted data into any of these — via Zapier or signed webhooks

Zapier

Google Drive

Gmail

Slack

Sheets

Notion

Airtable

Webhook


Why DocParse

Three reasons teams switch to us

Threads, not just messages

DocParse stitches an email thread, including forwarded snippets and quoted replies, into one structured record.

1 row per thread, not per message

Entities, intent, and sentiment

Order numbers, customer names, action items, and sentiment all extracted with confidence and source span.

15+ entity types out of the box

Attachments handled inline

PDFs, images, and screenshots in attachments are extracted and merged with the email content into one structured record.

12 attachment formats parsed

How it works

From raw emails
to structured data, in four steps.

1. Define the fields
2. Upload the file
3. Multi-modal AI reads it
4. Get structured data back

Drop document, paste URL, or POST file

PDF · PNG · JPG · TIFF · DOCX · HEIC · HTML · EML · XLSX

The schema

Starter schema for emails.
Tweakable in seconds.

The emails template comes with a 10-field starter schema based on the most common fields teams pull from emails. Add your own fields, mark which are required, and change types in the dashboard or via the REST API.

Emails · default schema

thread_id · string · required · 99.9%
subject · string · required · 99.8%
participants · array · required · 99.5%
customer · string · required · 98.9%
intent · string · required · 97.4%
sentiment · string · optional · 96.8%
entities · array · required · 98.2%
action_items · array · optional · 95.6%
attachments · array · optional · 99.1%
language · string · required · 99.7%

JSON Schema · TypeScript · Python

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Emails",
  "type": "object",
  "required": [
    "thread_id",
    "subject",
    "participants",
    "customer",
    "intent",
    "entities",
    "language"
  ],
  "properties": {
    "thread_id": {
      "type": "string"
    },
    "subject": {
      "type": "string"
    },
    "participants": {
      "type": "array"
    },
    "customer": {
      "type": "string"
    },
    "intent": {
      "type": "string"
    },
    "sentiment": {
      "type": "string"
    },
    "entities": {
      "type": "array"
    },
    "action_items": {
      "type": "array"
    },
    "attachments": {
      "type": "array"
    },
    "language": {
      "type": "string"
    }
  }
}
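
A minimal sketch of checking an extraction against the starter schema above, using only the Python standard library. The `validate_email_record` helper is hypothetical and not part of any DocParse SDK; it simply mirrors the `required` list and the string/array types shown in the JSON Schema.

```python
# Hypothetical helper: mirrors the starter schema's "required" list and types.
REQUIRED = ["thread_id", "subject", "participants", "customer",
            "intent", "entities", "language"]

FIELD_TYPES = {
    "thread_id": str, "subject": str, "participants": list,
    "customer": str, "intent": str, "sentiment": str,
    "entities": list, "action_items": list, "attachments": list,
    "language": str,
}


def validate_email_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # Every required field must be present.
    for name in REQUIRED:
        if name not in record:
            problems.append(f"missing required field: {name}")
    # Every known field that is present must have the schema's type.
    for name, value in record.items():
        expected = FIELD_TYPES.get(name)
        if expected is not None and not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems
```

Running this on each extraction before it reaches downstream systems gives a cheap guard if you later add or rename fields in the dashboard; extend `REQUIRED` and `FIELD_TYPES` to match your own schema.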

What to expect

Accuracy, field by field.

Multi-modal models do the reading, and accuracy depends on document quality. The numbers below are illustrative ranges we've seen on emails — run your own documents and compare against a small ground-truth set before you scale.

97.6% illustrative field-level accuracy ceiling

10 starter fields

Any language supported

25 MB max file size

Field · Accuracy
subject · 99.8%
participants · 99.5%
customer · 98.9%
intent · 97.4%
sentiment · 96.8%
entities · 98.2%
action_items · 95.6%
language · 99.7%
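
To compare your own documents against a small ground-truth set, something like the sketch below may be enough to start. It assumes you have two parallel lists of dicts, one of extracted records and one of hand-labelled records, and it counts exact matches per field. The `field_accuracy` name and the exact-match rule are illustrative choices, not a DocParse feature.

```python
from collections import defaultdict


def field_accuracy(predictions: list, ground_truth: list) -> dict:
    """Share of documents where each labelled field matches exactly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    # Pair each extracted record with its hand-labelled counterpart.
    for pred, truth in zip(predictions, ground_truth):
        for field, expected in truth.items():
            total[field] += 1
            if pred.get(field) == expected:
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}
```

Exact matching is strict for free-text fields such as `subject`; you may want to normalise case and whitespace before comparing.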

The API

One endpoint.
Every output you need.

# Extract with one POST
curl -X POST "https://api.docparse.io/v1/emails" \
  -H "Authorization: Bearer $DOCPARSE_KEY" \
  -F file=@"support-ticket-44821.eml" \
  -F schema="email-thread" \
  -F webhook="https://api.acme.co/incoming"

# Returns:
{
  "status": "complete",
  "confidence": 0.987,
  "latency_ms": 2412,
  "data": { ... }
}

Plain HTTP, no SDK lock-in

Bearer-token auth with revocable, SHA-256-hashed API keys. Call it from any language that can hit a REST endpoint — we publish docs and copy-pasteable snippets, not opinionated wrappers.

cURL · Python · Node.js · Go · Ruby · PHP · Java · .NET
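
As a rough Python counterpart to the cURL call above, the sketch below builds the same multipart POST with only the standard library. The endpoint and the `file`, `schema`, and `webhook` form fields are taken from the cURL example; the `build_extract_request` helper is hypothetical, and it constructs the request without sending it.

```python
import uuid
import urllib.request
from typing import Optional

API_URL = "https://api.docparse.io/v1/emails"  # endpoint from the cURL example


def build_extract_request(filename: str, file_bytes: bytes, api_key: str,
                          schema: str = "email-thread",
                          webhook: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST mirroring the cURL call."""
    boundary = uuid.uuid4().hex
    parts = []

    def add_field(name: str, value: str) -> None:
        # Plain text form field.
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )

    add_field("schema", schema)
    if webhook:
        add_field("webhook", webhook)

    # File part: headers, raw bytes, then the trailing CRLF.
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send it, pass the returned request to `urllib.request.urlopen` and parse the body with `json.load`; keeping the build step separate makes the request easy to inspect or log first.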

Signed webhooks for async

Register an endpoint, set the events, and we POST signed deliveries (HMAC-SHA256, Standard Webhooks spec) as extractions finish. Every attempt is logged in the dashboard with response code, body, and timing.

Webhook delivery log · per-endpoint retries
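
On the receiving side, a signed delivery can be checked along these lines. This is a sketch that assumes DocParse follows the Standard Webhooks defaults: `webhook-id`, `webhook-timestamp`, and `webhook-signature` headers, a `whsec_`-prefixed base64 secret, and a `v1,<base64>` signature over `id.timestamp.body`. Confirm the exact header names and secret format in the DocParse docs before relying on it; `verify_webhook` is a hypothetical helper.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional


def verify_webhook(secret: str, headers: dict, body: bytes,
                   tolerance_s: int = 300, now: Optional[float] = None) -> bool:
    """Verify an HMAC-SHA256 signature in the Standard Webhooks layout.

    Assumes header keys are already lower-cased.
    """
    try:
        msg_id = headers["webhook-id"]
        timestamp = headers["webhook-timestamp"]
        signatures = headers["webhook-signature"]
        sent_at = int(timestamp)
    except (KeyError, ValueError):
        return False

    # Reject stale or future-dated deliveries to limit replay attacks.
    now = time.time() if now is None else now
    if abs(now - sent_at) > tolerance_s:
        return False

    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed = f"{msg_id}.{timestamp}.".encode() + body
    expected = base64.b64encode(
        hmac.new(key, signed, hashlib.sha256).digest()
    ).decode()

    # The header may carry several space-separated signatures.
    for candidate in signatures.split():
        version, _, signature = candidate.partition(",")
        if version == "v1" and hmac.compare_digest(signature, expected):
            return True
    return False
```

Verify against the raw request bytes, not a re-serialised JSON object, since any change in whitespace or key order alters the signature.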

The alternatives

Why teams switch from regex.

A look at how DocParse compares to the three things you've probably already tried.

Regex + scripts

Manual review (BPO)

Mailparser / Einstein

DocParse

Works on a layout it has never seen

partial

Handles handwriting and scans

partial

Custom fields without per-vendor setup

Multi-lingual out of the box

partial

REST API + signed webhooks + Zapier

partial

Pricing scales with pages, not seats

partial

Free tier — 100 pages on signup

partial

Time-to-first-extraction

Days

Weeks

5 minutes

Where the data goes

Reach the tools you already run.

DocParse ships two integration surfaces directly — REST API and signed webhooks — plus a native Zapier app that opens up everything else.

Zapier · Automation
Webhooks · API
REST API · API
JSON export · Export
CSV export · Export
Google Drive · via Zapier
Google Sheets · via Zapier
Gmail · via Zapier
Outlook · via Zapier
Slack · via Zapier
Dropbox · via Zapier
Airtable · via Zapier
Notion · via Zapier
HubSpot · via Zapier
Salesforce · via Zapier
Make.com · via Webhooks
n8n · via Webhooks
Postgres · via Webhooks

REST API · Signed webhooks (HMAC-SHA256) · Zapier to 6,000+ apps · JSON / CSV export

Common patterns

How teams use DocParse for emails.

Illustrative scenarios drawn from teams piloting DocParse — names and figures are examples, not customer quotes.

“Salesforce Einstein got intent right 78% of the time. DocParse hits 97%, and it auto-extracts attachments. Our CSAT jumped four points.”

Riya Saxena, Head of CX · Halcyon

+4 pts CSAT in a quarter

“Customers forward half a PDF in the body, screenshot a number, and ask for help. DocParse glues the whole thread into one structured record. Magic.”

Felipe Sosa, Support Engineering · Quartile

1 record per messy thread

“We auto-route 92% of inbound emails based on DocParse intent + entity tags. Our triage queue is down to 30 minutes from a full day.”

Lior Cohen, Ops Lead · Tidemark

92% auto-routed cleanly

Frequently asked

The questions teams ask before they sign up.

Does it work on .eml or just plain text?

.eml, .msg, .mbox, plain text, HTML, and forwarded screenshots. Headers preserved, signatures stripped, quoted replies de-duped.

How does intent detection work?

Out of the box: status_request, refund, complaint, billing, inquiry, churn, scheduling, lead. Add your own intents by description; no training needed.

What about attachments?

PDFs, images, screenshots, spreadsheets — extracted using the matching DocParse model and merged into the thread record with the same schema.

Can it route to a queue?

Most customers wire DocParse into their CRM or helpdesk webhook. The output includes a suggested queue and confidence; auto-route above your threshold, escalate below it.
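
The threshold rule in the last answer can be sketched in a few lines. The `confidence` key appears in the API response shown earlier; the `suggested_queue` key, the `manual_review` fallback, and the 0.9 default are assumptions for illustration, so check the actual output field names before wiring this into a helpdesk.

```python
def route(extraction: dict, threshold: float = 0.9) -> str:
    """Return the suggested queue if confidence clears the threshold, else escalate."""
    confidence = extraction.get("confidence", 0.0)
    queue = extraction.get("suggested_queue")  # hypothetical field name
    if queue and confidence >= threshold:
        return queue
    # Low confidence or no suggestion: hand off to a human.
    return "manual_review"
```

Start with a conservative threshold and lower it only after checking how often auto-routed threads land in the right queue.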

Email to JSON. Threads to action.

Inbound email is unstructured by default. Make it the cleanest data source you have.


Free for first 100 pages · 5-minute setup · No credit card
