What Parsli Extracts from an Email — Field Schema Reference

A field-by-field reference for the data Parsli surfaces from incoming emails. Below: the default JSON output, confidence-score baselines per field type, and the custom fields most teams add on top of the defaults.

What You Can Extract

Define your schema with any combination of these fields — or add your own custom fields.

sender_name / sender_email

Display name and RFC 5322 address from the From: header. Deterministic: read directly from the message header, not inferred.

received_at

ISO 8601 UTC timestamp from the Received: header chain. Falls back to Date: header if Received: is missing.
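As a rough illustration of the fallback described for received_at, the sketch below uses only Python's standard `email` package. The function name `received_at`, the choice of the topmost Received: header, and the treatment of zone-less timestamps as UTC are assumptions made for this example; it is not Parsli's implementation.

```python
from datetime import datetime, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime
from typing import Optional


def _to_utc_iso(dt: datetime) -> str:
    # Assumption: a naive result (no usable zone) is treated as UTC.
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def _parse(stamp: str) -> Optional[str]:
    try:
        return _to_utc_iso(parsedate_to_datetime(stamp.strip()))
    except (TypeError, ValueError):
        return None  # unparseable timestamp


def received_at(raw_message: str) -> Optional[str]:
    """Prefer the Received: chain, fall back to Date: (hypothetical helper)."""
    msg = message_from_string(raw_message)
    # Received: headers are prepended by each hop, so the first one listed
    # is the most recent. Its timestamp follows the final semicolon.
    for received in msg.get_all("Received", []):
        _, sep, stamp = received.rpartition(";")
        if sep:
            parsed = _parse(stamp)
            if parsed:
                return parsed
    date_header = msg.get("Date")
    return _parse(date_header) if date_header else None


SAMPLE = (
    "Received: from mail.example.com by mx.example.net; "
    "Thu, 23 Apr 2026 09:32:00 -0500\n"
    "Date: Thu, 23 Apr 2026 09:31:00 -0500\n"
    "Subject: Rate confirmation #RC-7821\n"
    "\n"
    "Body text\n"
)
print(received_at(SAMPLE))  # 2026-04-23T14:32:00Z
```

If every Received: header is absent or unparseable, the same call returns the Date: value converted to UTC, or `None` when neither header is usable.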

subject

Decoded subject line, MIME-encoded values normalized to UTF-8.
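To make the normalization concrete, here is a minimal sketch of decoding MIME encoded-word subjects with Python's standard library. `decode_subject` is a hypothetical helper name used for illustration; the actual decoder Parsli runs is not documented here.

```python
from email.header import decode_header, make_header


def decode_subject(raw_subject: str) -> str:
    """Decode RFC 2047 encoded words into a plain Unicode string."""
    # decode_header splits the value into (bytes_or_str, charset) pairs;
    # make_header joins them back together as Unicode text.
    return str(make_header(decode_header(raw_subject)))


print(decode_subject("=?UTF-8?B?UmF0ZSBjb25maXJtYXRpb24=?="))  # Rate confirmation
print(decode_subject("=?ISO-8859-1?Q?Caf=E9?="))               # Café
```

Subjects that are already plain ASCII pass through unchanged.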

body_intent

LLM-inferred classification: invoice, order, lead, notification, rate_quote, support_reply, or custom enum from your schema.

body_amounts[]

Array of monetary values labeled with their adjacent text — total, tax, fuel surcharge, etc. Currency normalized to ISO 4217.

body_dates[]

Array of dates with role labels — pickup, delivery, due_date, expiration. Output as ISO 8601 even when source is informal.

signature.name / signature.title / signature.phone

Sender signature parsed into structured fields. Includes phone normalization to E.164 when country is detectable.

attachments[]

Per-attachment object with filename, MIME type, page count, and the full extracted-fields JSON from running the attachment through Parsli's parser.

thread_id / reply_chain

RFC 2822 Message-ID + an ordered array of prior messages in the thread for deduplication and context.

raw_html / raw_plain

Both renderings of the message body are returned. The AI parses from `raw_plain` by default; when it has to fall back to the HTML body, tags are stripped before extraction.

Heads up: This page covers email content extraction — pulling structured fields out of message bodies and attachments. If you arrived looking for an email scraper (collecting email addresses from web pages or files), see the patterns section below for cross-references; address harvesting is a different operation with different output.

The default email extraction schema

Every email Parsli ingests goes through the same extraction pipeline and returns a consistent JSON envelope. Below is the full default schema: every field listed is present in the output unless the data it is drawn from is missing from the source email. Custom fields you add via the schema builder are appended under `custom`.

Field	Type	Example	Confidence baseline
sender.name	string	"Sarah Brennan"	0.95+ (header)
sender.email	email	"ops@acme-logistics.com"	0.99 (deterministic)
received_at	datetime (ISO 8601)	"2026-04-23T14:32:00Z"	0.99 (header)
subject	string	"Rate confirmation #RC-7821"	0.98 (header)
body.intent	enum	"rate_confirmation"	0.85–0.95 (LLM)
body.amounts[]	array<money>	[{ value: 1240.00, currency: "USD", label: "Linehaul" }]	0.85–0.95
body.dates[]	array<date>	[{ value: "2026-04-25", label: "Pickup" }]	0.90–0.97
signature.name	string	"Sarah Brennan"	0.75–0.92
signature.phone	phone (E.164)	"+13125550143"	0.80–0.95
attachments[].filename	string	"invoice_7821.pdf"	0.99 (header)
attachments[].extracted_fields	object	{ shipper: "Acme", weight_lbs: 18420 }	Inherits attachment doctype baseline
thread.thread_id	string	"<rc7821@acme-logistics.com>"	0.99 (RFC 2822 Message-ID)
custom.<field>	user-defined	See custom field examples below	Field-dependent

JSON output

Sample output for a rate-confirmation email with one PDF attachment. Every Parsli email extraction returns an envelope of this shape.

{
  "doc_id": "doc_a8f29d4e",
  "received_at": "2026-04-23T14:32:00Z",
  "sender": {
    "name": "Sarah Brennan",
    "email": "ops@acme-logistics.com"
  },
  "subject": "Rate confirmation #RC-7821",
  "body": {
    "intent": "rate_confirmation",
    "intent_confidence": 0.94,
    "amounts": [
      { "label": "Linehaul rate", "value": 1240.00, "currency": "USD", "confidence": 0.97 },
      { "label": "Fuel surcharge", "value":  186.00, "currency": "USD", "confidence": 0.91 }
    ],
    "dates": [
      { "label": "Pickup",   "value": "2026-04-25", "confidence": 0.98 },
      { "label": "Delivery", "value": "2026-04-27", "confidence": 0.96 }
    ]
  },
  "signature": {
    "name":  "Sarah Brennan",
    "title": "Dispatch Coordinator",
    "phone": "+13125550143",
    "email": "ops@acme-logistics.com"
  },
  "attachments": [
    {
      "filename": "BOL_7821.pdf",
      "mime": "application/pdf",
      "page_count": 2,
      "extraction_mode": "digital",
      "extracted_fields": {
        "shipper":   "Acme Logistics",
        "consignee": "Midwest Distribution",
        "weight_lbs": 18420,
        "pro_number": "ACL-99182734"
      }
    }
  ],
  "thread": {
    "thread_id": "<rc7821@acme-logistics.com>",
    "reply_count": 0
  }
}
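One way a downstream consumer might use the per-field confidence values in this envelope is to flag anything below a review threshold. The sketch below is illustrative only: the function name, the 0.95 threshold, and the trimmed sample are assumptions, and only keys that appear in the sample output above are read.

```python
import json
from typing import List

# Trimmed copy of the sample envelope above (body section only).
SAMPLE_ENVELOPE = json.loads("""
{
  "body": {
    "intent": "rate_confirmation",
    "intent_confidence": 0.94,
    "amounts": [
      {"label": "Linehaul rate", "value": 1240.00, "currency": "USD", "confidence": 0.97},
      {"label": "Fuel surcharge", "value": 186.00, "currency": "USD", "confidence": 0.91}
    ],
    "dates": [
      {"label": "Pickup", "value": "2026-04-25", "confidence": 0.98},
      {"label": "Delivery", "value": "2026-04-27", "confidence": 0.96}
    ]
  }
}
""")


def low_confidence_fields(envelope: dict, threshold: float = 0.95) -> List[str]:
    """List body fields whose confidence falls below the threshold."""
    flagged = []
    body = envelope.get("body", {})
    if body.get("intent_confidence", 1.0) < threshold:
        flagged.append("body.intent")
    for key in ("amounts", "dates"):
        for index, item in enumerate(body.get(key, [])):
            if item.get("confidence", 1.0) < threshold:
                flagged.append(f"body.{key}[{index}] ({item.get('label')})")
    return flagged


print(low_confidence_fields(SAMPLE_ENVELOPE))
# ['body.intent', 'body.amounts[1] (Fuel surcharge)']
```

The threshold is a policy choice for the consuming system; the per-field baselines in the schema table give a starting point for picking one.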

Adding custom fields to the email schema

Pull a structured order ID out of the body or subject line. The regex acts as a hint, not a hard filter: the LLM still finds the right value if formatting drifts.

po_reference string

Locate a purchase-order number anywhere in the body or any attachment. Useful for AP workflows that match invoices to POs.

shipping_carrier enum (UPS, FedEx, USPS, DHL, Other)

Constrain the LLM to a known set of carriers; useful when downstream systems require a strict enum and not a free-text guess.
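On the consuming side, a strict enum like this can be mirrored in code so that unexpected values never reach downstream systems. The following is a minimal sketch: the `Carrier` class and `coerce_carrier` helper are hypothetical names, and the case-insensitive matching is an assumption rather than documented Parsli behavior.

```python
from enum import Enum
from typing import Optional


class Carrier(Enum):
    UPS = "UPS"
    FEDEX = "FedEx"
    USPS = "USPS"
    DHL = "DHL"
    OTHER = "Other"


def coerce_carrier(value: Optional[str]) -> Carrier:
    """Map a raw string onto the enum, falling back to Carrier.OTHER."""
    if value:
        wanted = value.strip().casefold()
        for carrier in Carrier:
            if carrier.value.casefold() == wanted:
                return carrier
    return Carrier.OTHER


print(coerce_carrier("fedex"))         # Carrier.FEDEX
print(coerce_carrier("Old Dominion"))  # Carrier.OTHER
```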

ticket_id string with cross-reference

Extract a ticket ID and validate it against an existing row in your CRM or Helpdesk via webhook before ingestion. If the validation fails, the parse is held in review.

urgency enum (low, normal, high), LLM-inferred

Classify body sentiment to triage incoming requests. Combine with intent classification to route urgent leads or VIP-customer support directly to a Slack channel.

Parser modes for email processing

The same field schema runs in four operating modes. Most accounts mix two or three depending on the workload — push for inbox automation, batch for backlogs, sandbox for tuning. Pricing is identical across modes; you pay per page extracted regardless of how the message arrived.

Real-time push processing

Gmail push notifications and SMTP delivery fire the parser within ~30 seconds of a message landing. The default mode for inbox-management workloads where freshness matters.

Scheduled batch processing

Cron-driven sweeps over a Gmail label, an IMAP folder, or a backlog of .eml files. Useful for end-of-week reconciliation or one-time migrations from a legacy email parser.

API-only processing

POST a raw email or .eml file to the REST endpoint, get the JSON envelope back. No inbox connection required — used by automated email-processing pipelines that already have message bytes in hand.

Free-tier sandbox processing

30 pages per month, no credit card, every feature on. Used by teams comparing AI extraction to a pattern-matching email parser before committing — same JSON output as paid tiers, identical confidence scoring.

Other email-extraction patterns

Adjacent intents users sometimes land on this page looking for. Each row points to the right page or guide for the specific job — they're listed here for disambiguation, not because they're handled by the email-content parser above.

Extract email addresses from PDFs

Different operation — text-mining a PDF for any email-shaped string. Use the PDF parser with a custom string field set to an email regex.
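For a sense of what an "email-shaped string" pattern looks like, here is a deliberately simple sketch run over already-extracted text. The regex is a common approximation, not a full RFC 5322 matcher, and neither it nor the `find_emails` helper is taken from Parsli's PDF parser.

```python
import re
from typing import List

# Pragmatic approximation: local part, "@", domain labels, 2+ letter TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(text: str) -> List[str]:
    """Return unique addresses in order of first appearance (case-insensitive)."""
    seen = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), match)
    return list(seen.values())


text = "Contact ops@acme-logistics.com or OPS@acme-logistics.com; billing@example.org."
print(find_emails(text))
# ['ops@acme-logistics.com', 'billing@example.org']
```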

Extract email addresses from Excel / CSV

Pull a column or scan free-text cells for email addresses. The spreadsheet parser handles this; treat the email column as a typed string field.

Email address extraction from websites (scraping)

Out of scope for Parsli — this is web-scraping intent, not email parsing. We recommend a dedicated scraper for that workflow; we ship downstream of it once the addresses are in your CRM.

Extract emails from Gmail

If you mean 'extract data out of emails in my Gmail account', that's exactly the workflow above — connect Gmail OAuth, define a schema, run the parser.

Power Automate / Outlook 365 text extraction

Power Automate's built-in 'extract text from email' action covers basic regex patterns. Parsli runs as the AI step inside a Power Automate flow when patterns aren't enough — connect via webhook.

Supported Formats

Gmail (native OAuth, read-only scope)
Microsoft 365 / Outlook (forwarding + webhook)
Forwarded emails to your unique parsli.co inbox address
MIME multipart/alternative, multipart/mixed, multipart/related
Attachments: PDF, PNG, JPG, TIFF, .docx, .xlsx, .csv

Free Tools for Emails

Try these free browser-based tools. No sign-up required.

PDF to Text

Extract text from PDF email attachments.

AI Summarizer

Summarize key information from email documents.

Frequently Asked Questions

Does Parsli parse the HTML body or the plain-text body?

By default Parsli parses `raw_plain` — the rendered plain-text version of the email. For multipart/alternative messages where the plain-text part is missing or malformed, the HTML is stripped of tags and parsed as text. Both `raw_html` and `raw_plain` are returned in the JSON output so you can audit which version drove the extraction.
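The plain-text-first behavior with an HTML fallback can be approximated with Python's standard library as shown below. This is a sketch under stated assumptions: `body_text` and `strip_tags` are hypothetical helpers, a "malformed" plain part is simplified to "empty after trimming", and the tag stripper is naive (it would also keep text inside script or style elements).

```python
from email import message_from_string, policy
from html.parser import HTMLParser
from typing import Tuple


class _TextOnly(HTMLParser):
    """Collects text nodes and discards tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def strip_tags(html: str) -> str:
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by removed markup.
    return " ".join(" ".join(parser.chunks).split())


def body_text(raw_message: str) -> Tuple[str, str]:
    """Return (source, text), where source names the part that was used."""
    msg = message_from_string(raw_message, policy=policy.default)
    plain = msg.get_body(preferencelist=("plain",))
    if plain is not None and plain.get_content().strip():
        return "raw_plain", plain.get_content().strip()
    html = msg.get_body(preferencelist=("html",))
    if html is not None:
        return "raw_html", strip_tags(html.get_content())
    return "none", ""


HTML_ONLY = (
    "MIME-Version: 1.0\n"
    'Content-Type: text/html; charset="utf-8"\n'
    "\n"
    "<p>Total: <b>$1,240.00</b></p>\n"
)
print(body_text(HTML_ONLY))  # ('raw_html', 'Total: $1,240.00')
```

Returning the source label alongside the text mirrors the audit use case mentioned above: the caller can record which rendering drove the extraction.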

How does Parsli handle email threads?

By default the parser extracts from the latest message in the thread and includes the prior messages as a `reply_chain` array for context. You can switch the schema to extract from the full thread (useful for support emails where the relevant data is buried in the customer's first message) — set `thread_mode: 'full'` on the parser config.

What happens with quoted/forwarded portions of the body?

Quoted lines (lines beginning with `>`) and `--- Forwarded message ---` blocks are detected and excluded from the primary extraction unless the schema explicitly requests them. Forwarded attachments are still processed in full.
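The exclusion rule can be pictured as a simple line filter. The sketch below is an assumption-laden approximation, not Parsli's detector: it drops lines starting with `>` and truncates at the first forwarded-message marker, and the marker regex and `primary_text` name are invented for illustration.

```python
import re

# Matches "--- Forwarded message ---" with any number of dashes on each side.
FORWARD_MARKER = re.compile(r"^-+\s*Forwarded message\s*-+$", re.IGNORECASE)


def primary_text(body: str) -> str:
    """Keep only the sender's own text from a plain-text body."""
    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        if FORWARD_MARKER.match(stripped):
            break  # everything after the marker is the forwarded block
        if stripped.startswith(">"):
            continue  # quoted reply line
        kept.append(line)
    return "\n".join(kept).strip()


body = (
    "Confirming pickup on 2026-04-25.\n"
    "\n"
    "> Can you confirm the pickup date?\n"
    "> Thanks\n"
    "\n"
    "---------- Forwarded message ----------\n"
    "From: someone@example.com\n"
    "Original request text\n"
)
print(primary_text(body))  # Confirming pickup on 2026-04-25.
```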

Are email signatures parsed into structured fields?

Yes. The signature block is detected via positional heuristics + the LLM, then parsed into `signature.name`, `signature.title`, `signature.email`, `signature.phone`, and `signature.company`. Phone numbers are normalized to E.164 when a country code is present or inferable from `sender_email`.

What encodings and character sets are supported?

MIME-decoded headers and bodies are normalized to UTF-8 before extraction. Source encodings supported: UTF-8, ISO-8859-1 through -16, Windows-1250–1258, Shift_JIS, GB18030, EUC-KR. Right-to-left scripts (Arabic, Hebrew) are preserved through extraction and JSON output.
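Normalizing a body to UTF-8 amounts to decoding with the declared charset and re-encoding. The sketch below shows that step with Python's built-in codecs for a few of the listed encodings; `to_utf8` is a hypothetical helper, and strict error handling is an assumption (a production pipeline may prefer a replacement strategy for undecodable bytes).

```python
def to_utf8(payload: bytes, charset: str) -> bytes:
    """Re-encode a payload from its declared charset to UTF-8."""
    # Python's codec lookup accepts the usual MIME charset spellings,
    # e.g. "Shift_JIS", "EUC-KR", "ISO-8859-1", "Windows-1252".
    return payload.decode(charset).encode("utf-8")


samples = {
    "Shift_JIS": "こんにちは",
    "EUC-KR": "안녕",
    "ISO-8859-1": "café",
    "GB18030": "你好",
}
for charset, text in samples.items():
    source_bytes = text.encode(charset)
    assert to_utf8(source_bytes, charset).decode("utf-8") == text
print("all samples round-trip to UTF-8")
```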

How are attachments that are image scans (not digital PDFs) handled?

Image-only PDFs and direct image attachments are routed through the OCR pipeline first, then the OCR output is fed into the same field-extraction step as a digital PDF. The `attachments[].extraction_mode` field in the output records which path was used (`digital`, `ocr`, or `mixed` for hybrid PDFs).
