Turning Any Invoice Into Review-Ready Data

Our intelligent extraction engine is built to handle the full spectrum of legal invoice formats—from structured LEDES to scanned PDFs. See how we convert any file into a clean, reliable dataset primed for analysis.
Our Transparent Extraction Process
A concise pipeline designed for legal invoices.
1. Ingest
  • Accept the file you have: LEDES, CSV/pipe, XML, DOC/DOCX, XLS, PDF (digital or scanned), JPG/PNG/TIFF.
  • Detect basic properties (type, size, text vs. image) to route to the right parsers.
2. Detect & classify
  • Identify whether the content is structured (LEDES, delimited, XML) or unstructured (DOC, PDF, image).
  • Choose deterministic parsing or OCR + layout analysis as needed.
3. Parse & extract
  • For structured inputs, map fields directly to line items.
  • For unstructured or scanned inputs, use OCR and text interpretation to reconstruct entries.
4. Normalize & align
  • Standardize dates, currencies, and numeric fields.
  • Align headers (or infer them if missing) to consistent internal fields.
5. Validate
  • Run sanity checks (totals, required fields, numeric consistency).
  • Highlight anomalies before applying your rule checks.
6. Ready for checks
  • Structured entries flow into your selected invoice rule checks.
  • Integrated AI is used only where interpretation is needed, such as ambiguous descriptions; all other checks are fully deterministic.
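
The Normalize and Validate steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the product's actual implementation: the function names, the accepted date formats, and the entry keys (date, hours, rate, amount) are assumptions made for the example.

```python
from datetime import date, datetime
from decimal import Decimal, InvalidOperation

# Assumed input formats for this sketch; a real pipeline would likely accept more.
DATE_FORMATS = ("%Y%m%d", "%Y-%m-%d", "%m/%d/%Y")


def normalize_date(raw: str) -> date:
    """Try each assumed format in turn and return a calendar date."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def normalize_amount(raw: str) -> Decimal:
    """Strip a dollar sign and thousands separators, then parse exactly."""
    cleaned = raw.strip().replace("$", "").replace(",", "")
    try:
        return Decimal(cleaned)
    except InvalidOperation as exc:
        raise ValueError(f"unrecognized amount: {raw!r}") from exc


def validate_entries(entries: list, invoice_total: Decimal) -> list:
    """Return a list of anomaly messages; an empty list means every check passed."""
    anomalies = []
    running = Decimal("0")
    for i, entry in enumerate(entries, start=1):
        # Required-field check
        missing = [k for k in ("date", "hours", "rate", "amount") if k not in entry]
        if missing:
            anomalies.append(f"line {i}: missing {', '.join(missing)}")
            continue
        # Numeric consistency: hours x rate should equal the billed amount
        expected = (entry["hours"] * entry["rate"]).quantize(Decimal("0.01"))
        if expected != entry["amount"]:
            anomalies.append(
                f"line {i}: hours x rate = {expected}, amount = {entry['amount']}"
            )
        running += entry["amount"]
    # Totals check: line amounts should add up to the stated invoice total
    if running != invoice_total:
        anomalies.append(
            f"line amounts sum to {running}, invoice total is {invoice_total}"
        )
    return anomalies
```

Using Decimal rather than float keeps the totals comparison exact, which matters when the check is a strict equality on currency values.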
Expert Handling for Every Invoice Format
Each input type is parsed with an approach tailored to its structure. Here’s what to expect.
LEDES 1998B (structured)
Highest fidelity

How we handle

Direct, field-to-field parsing with strict validation.

Captures matter IDs, timekeepers, roles, rates, hours, amounts, and expenses out of the box.

Best when

Law-firm e-billing exports and panels.

Teams who want the most precise checks and analytics.

Potential limitations

Malformed delimiters or headers can interfere with parsing; we validate files before upload as far as possible.
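
To illustrate what direct, field-to-field parsing with strict validation can look like, here is a minimal Python sketch. It relies on the LEDES 1998B layout (an identifier line, a pipe-delimited header line, and records that each end with "[]"). The function name is hypothetical, and the sample uses a shortened header for readability; real files carry the full field list.

```python
def parse_ledes_1998b(text: str) -> list:
    """Parse LEDES 1998B text into one dict per line item, failing fast on malformed input."""
    # Every line, including the identifier and header, is terminated by "[]".
    # Note: this simple split assumes "[]" never appears inside a field value.
    records = [r.strip() for r in text.split("[]") if r.strip()]
    if not records or records[0] != "LEDES1998B":
        raise ValueError("missing LEDES1998B identifier line")
    if len(records) < 2:
        raise ValueError("missing header line")

    header = [h.strip() for h in records[1].split("|")]
    rows = []
    for n, record in enumerate(records[2:], start=1):
        values = [v.strip() for v in record.split("|")]
        # Strict validation: a record with the wrong field count is rejected
        if len(values) != len(header):
            raise ValueError(
                f"record {n}: expected {len(header)} fields, got {len(values)}"
            )
        rows.append(dict(zip(header, values)))
    return rows


# Abbreviated sample for illustration only
SAMPLE = (
    "LEDES1998B[]\n"
    "INVOICE_NUMBER|LINE_ITEM_NUMBER|LINE_ITEM_NUMBER_OF_UNITS|"
    "LINE_ITEM_UNIT_COST|LINE_ITEM_TOTAL|TIMEKEEPER_ID|LINE_ITEM_DESCRIPTION[]\n"
    "INV-001|1|1.50|250.00|375.00|TK01|Review draft agreement[]\n"
)
```

Rejecting a record with a mismatched field count, rather than padding or truncating it, is the kind of check that catches the malformed delimiters mentioned above.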

Delimited tables with headers
High fidelity

How we handle

Map header names to canonical fields; common header synonyms are recognized.

One entry per line yields strong extraction quality.

Best when

Exports from billing/accounting tools where LEDES is unavailable.

Potential limitations

Inconsistent column order or merged cells may require cleanup.
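
Header mapping of this kind can be sketched as a synonym lookup. The example below is illustrative Python only; the canonical field names and the synonym lists are hypothetical, not the product's actual vocabulary.

```python
import csv
import io
from typing import Optional

# Hypothetical synonym table: canonical field -> header spellings it accepts
SYNONYMS = {
    "date": {"date", "work date", "service date"},
    "timekeeper": {"timekeeper", "tk", "attorney", "biller"},
    "hours": {"hours", "hrs", "units", "time"},
    "rate": {"rate", "hourly rate", "unit cost"},
    "amount": {"amount", "total", "line total", "fees"},
    "description": {"description", "narrative", "details"},
}
LOOKUP = {alias: name for name, aliases in SYNONYMS.items() for alias in aliases}


def canonical(header: str) -> Optional[str]:
    """Lowercase, treat underscores as spaces, collapse whitespace, then look up."""
    key = " ".join(header.strip().lower().replace("_", " ").split())
    return LOOKUP.get(key)


def read_delimited(text: str, delimiter: str = ",") -> list:
    """Read a headed delimited table and rename recognized columns to canonical fields."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    mapping = {h: canonical(h) for h in (reader.fieldnames or [])}
    rows = []
    for raw in reader:
        # Unrecognized columns are dropped; short rows yield empty strings
        rows.append(
            {mapping[h]: (v or "").strip() for h, v in raw.items() if mapping.get(h)}
        )
    return rows
```

Because columns are matched by name, this approach does not depend on column order within a file, although merged cells would still need cleanup first.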

Delimited tables without headers
Moderate–High fidelity

How we handle

Infer column meaning using position, sampling, and content patterns.

Optionally let reviewers confirm inferred fields for reliability.

Best when

Simple line exports with consistent column order.

Potential limitations

Mixed or shifting column layouts can reduce confidence—add headers if possible.
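
One simple way to infer column content from a sample is to let each value vote for a type and report the share of agreeing votes as a confidence score. The Python sketch below is an assumption-laden illustration: the patterns, the labels, and the function names are hypothetical, and it classifies value types only, which is just one input to deciding what a column means.

```python
import re
from collections import Counter

# Hypothetical content patterns, checked in order
PATTERNS = [
    ("date", re.compile(r"^\d{4}-\d{2}-\d{2}$|^\d{1,2}/\d{1,2}/\d{4}$")),
    ("number", re.compile(r"^-?\d+(\.\d+)?$")),
]


def classify(value: str) -> str:
    """Label a single cell as 'date', 'number', or 'text'."""
    for label, pattern in PATTERNS:
        if pattern.match(value.strip()):
            return label
    return "text"


def infer_columns(rows: list, sample_size: int = 50) -> list:
    """Return (label, confidence) per column position, based on a sample of rows."""
    sample = rows[:sample_size]
    width = max(len(r) for r in sample)
    result = []
    for col in range(width):
        # Each non-empty cell in this position casts one vote
        votes = Counter(
            classify(r[col]) for r in sample if col < len(r) and r[col].strip()
        )
        if not votes:
            result.append(("unknown", 0.0))
            continue
        label, count = votes.most_common(1)[0]
        result.append((label, count / sum(votes.values())))
    return result
```

A column whose confidence falls below some chosen threshold is a natural candidate for the reviewer confirmation step described above.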

XML (structured)
High fidelity

How we handle

Parse against common e-billing structures; map fields to InvoiceChecker’s schema.

Preserve hierarchical relationships for entries and expenses.

Best when

Systems that produce XML invoice feeds.

Potential limitations

Custom XML without stable tags may require mapping guidance.

DOC / DOCX
Variable fidelity

How we handle

Detect and parse tables; extract text and numeric fields from rows.

When tables are irregular, heuristics align cells to known fields.

Best when

Manual invoices built in Word with clear tables.

Potential limitations

Freeform text and multi-column layouts can reduce precision—exporting to CSV improves results.

PDF (digital, text-based)
Good fidelity

How we handle

Extract selectable text, then reconstruct line items.

Handle common multi-page layouts and totals.

Best when

Invoices exported directly from billing systems.

Potential limitations

Complex multi-column or decorative layouts can require review. Prefer structured files if available.

PDF (scanned) & images (JPG/PNG/TIFF)
Variable fidelity

How we handle

OCR converts images to text, then layout analysis reconstructs rows.

Integrated AI helps interpret ambiguous descriptions where appropriate.

Best when

Legacy or paper workflows when digital files aren’t available.

Potential limitations

OCR is sensitive to blur/tilt; 300+ DPI scans work best. Photos are least reliable—use a scanner when possible.

Input quality still matters

We use sophisticated extraction, but “garbage in, garbage out” still applies. Wherever possible, choose LEDES 1998B or a clean CSV/pipe file with one entry per line. Include headers when you can: they let columns be mapped by name instead of inferred.

The Result: A Comprehensive, Structured Data Set
The goal is consistent, line-level data that’s easy to check, review, and explain.
Line item fields
  • Matter identifiers (when present)
  • Timekeeper name/ID and role
  • Date, hours, rate, amount
  • Narrative/description text
  • Expenses and disbursements
  • Task/activity codes when provided
Invoice-level context
  • Vendor/law firm details (when present)
  • Invoice number/date and totals
  • Page and line references to support review
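
As a rough picture of the resulting dataset, the fields listed above could be modeled like this. It is a sketch in Python with hypothetical class and attribute names, not the product's actual schema; optional fields reflect the "when present" and "when provided" qualifiers.

```python
from dataclasses import dataclass, field
from datetime import date
from decimal import Decimal
from typing import Optional


@dataclass
class LineItem:
    line_date: date
    description: str                      # narrative/description text
    amount: Decimal
    hours: Optional[Decimal] = None       # typically absent for expenses
    rate: Optional[Decimal] = None
    timekeeper_id: Optional[str] = None
    timekeeper_name: Optional[str] = None
    role: Optional[str] = None
    matter_id: Optional[str] = None       # when present
    task_code: Optional[str] = None       # when provided
    activity_code: Optional[str] = None   # when provided
    is_expense: bool = False
    source_page: Optional[int] = None     # page reference to support review
    source_line: Optional[int] = None     # line reference to support review


@dataclass
class Invoice:
    invoice_number: str
    invoice_date: date
    total: Decimal
    vendor: Optional[str] = None          # when present
    items: list = field(default_factory=list)

    def fees(self) -> Decimal:
        """Sum of all non-expense line amounts."""
        return sum((i.amount for i in self.items if not i.is_expense), Decimal("0"))

    def expenses(self) -> Decimal:
        """Sum of all expense and disbursement line amounts."""
        return sum((i.amount for i in self.items if i.is_expense), Decimal("0"))
```

Keeping the page and line references on each item is what lets a reviewer trace any flagged entry back to its place in the original file.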
Enterprise-grade security & privacy

Data is encrypted in transit and at rest using industry-standard encryption on a cloud platform with SOC 2 Type II posture. Results remain under your control and can be removed when no longer needed.

Managed Extraction for Complex Cases

For unique, non-standard formats or large backlogs, our team of experts can structure the files for you through an optional, premium managed service, so that even the most complex invoices are ready for review.