Turning Any Invoice Into Review-Ready Data
Our extraction engine handles the full range of legal invoice formats, from structured LEDES to scanned PDFs. See how we convert each supported file type into a clean, reliable dataset ready for analysis.
A concise pipeline designed for legal invoices.
Ingest
- Accept the file you have: LEDES, CSV/pipe, XML, DOC/DOCX, XLS, PDF (digital or scanned), JPG/PNG/TIFF.
- Detect basic properties (type, size, text vs. image) to route to the right parsers.
Detect & classify
- Identify whether the content is structured (LEDES, delimited, XML) or unstructured (DOC, PDF, image).
- Choose deterministic parsing or OCR + layout analysis as needed.
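As a rough illustration of the detect-and-route idea, the sketch below guesses a coarse input kind from a file's leading bytes and picks a parsing route. It is not InvoiceChecker's implementation: the function names, kind labels, and route names are hypothetical, and a real pipeline would also need a text-layer check to tell digital PDFs from scanned ones.

```python
# Illustrative sketch only: kind labels and route names are hypothetical.
STRUCTURED = {"ledes", "delimited", "xml"}


def sniff_kind(head: bytes) -> str:
    """Guess a coarse input kind from the first bytes of a file."""
    if head.startswith(b"%PDF-"):
        return "pdf"
    # PNG, JPEG, and little/big-endian TIFF signatures.
    if head.startswith((b"\x89PNG\r\n\x1a\n", b"\xff\xd8\xff", b"II*\x00", b"MM\x00*")):
        return "image"
    # ZIP container (DOCX) or legacy OLE container (DOC, XLS).
    if head.startswith((b"PK\x03\x04", b"\xd0\xcf\x11\xe0")):
        return "office"
    text = head.decode("utf-8", errors="replace").lstrip("\ufeff \t\r\n")
    if text.upper().startswith("LEDES"):
        return "ledes"
    if text.startswith("<"):
        return "xml"
    first_line = text.splitlines()[0] if text else ""
    if "|" in first_line or "," in first_line:
        return "delimited"
    return "unknown"


def choose_route(kind: str) -> str:
    """Pick deterministic parsing for structured kinds, OCR for images."""
    if kind in STRUCTURED:
        return "deterministic"
    if kind == "image":
        return "ocr+layout"
    if kind in ("pdf", "office"):
        return "text-extraction"  # a scanned PDF would still need OCR
    return "manual-review"
```

For example, `choose_route(sniff_kind(b"LEDES1998B[]\n"))` selects the deterministic route, while a PNG header is sent to OCR and layout analysis.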
Parse & extract
- For structured inputs, map fields directly to line items.
- For unstructured or scanned inputs, use OCR and text interpretation to reconstruct entries.
Normalize & align
- Standardize dates, currencies, and numeric fields.
- Align headers (or infer them if missing) to consistent internal fields.
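To make the normalization step concrete, here is a minimal sketch that standardizes dates and amounts. The accepted date formats and the US-style number handling (comma thousands separators, `$` prefix, parentheses for negatives) are assumptions chosen for the example, not a description of the product's actual rules.

```python
from datetime import date, datetime
from decimal import Decimal

# Assumed input formats; slash dates are read as month/day/year here.
DATE_FORMATS = ("%Y-%m-%d", "%Y%m%d", "%m/%d/%Y")


def normalize_date(raw: str) -> date:
    """Try each assumed format in turn and return a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def normalize_amount(raw: str) -> Decimal:
    """Strip currency symbols and separators; treat (x) as negative."""
    cleaned = raw.strip().replace(",", "").replace("$", "")
    negative = cleaned.startswith("(") and cleaned.endswith(")")
    if negative:
        cleaned = cleaned[1:-1]
    value = Decimal(cleaned)
    return -value if negative else value
```

`Decimal` is used instead of `float` so that monetary values are not subject to binary rounding error when they are later summed or compared.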
Validate
- Run sanity checks (totals, required fields, numeric consistency).
- Highlight anomalies before applying your rule checks.
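The sanity checks above could look something like the following sketch, which assumes each line item is a dict with already-normalized `Decimal` values. The field names, the one-cent tolerance, and the message wording are illustrative assumptions.

```python
from decimal import Decimal

REQUIRED = ("date", "hours", "rate", "amount")  # hypothetical field names
TOLERANCE = Decimal("0.01")  # assumed rounding allowance


def check_line(line: dict) -> list[str]:
    """Report missing required fields, then check hours x rate against amount."""
    issues = [f"missing {name}" for name in REQUIRED if line.get(name) in (None, "")]
    if issues:
        return issues
    expected = line["hours"] * line["rate"]
    if abs(expected - line["amount"]) > TOLERANCE:
        issues.append(f"amount {line['amount']} != hours x rate {expected}")
    return issues


def check_invoice(lines: list[dict], stated_total: Decimal) -> list[str]:
    """Run per-line checks and compare the line sum with the stated total."""
    issues = []
    for number, line in enumerate(lines, start=1):
        issues += [f"line {number}: {msg}" for msg in check_line(line)]
    amounts = (l["amount"] for l in lines if l.get("amount") not in (None, ""))
    computed = sum(amounts, Decimal("0"))
    if abs(computed - stated_total) > TOLERANCE:
        issues.append(f"stated total {stated_total} != sum of lines {computed}")
    return issues
```

Returning a list of messages rather than raising on the first problem lets every anomaly be surfaced together before rule checks run.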
Ready for checks
- Structured entries flow into your selected invoice rule checks.
- Integrated AI is used where it adds value and clarity; other checks are fully deterministic.
Each input type is parsed with an approach tailored to its structure. Here’s what to expect.
LEDES 1998B (structured)
How we handle
Direct, field-to-field parsing with strict validation.
Captures matter IDs, timekeepers, roles, rates, hours, amounts, and expenses out of the box.
Best when
Law-firm e-billing exports and panels.
Teams who want the most precise checks and analytics.
Potential limitations
Malformed delimiters or headers can interfere with parsing; we validate files before upload as far as possible.
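For readers unfamiliar with the format, a LEDES 1998B file is plain text: an identifier line, a header line, then one record per line item, with fields separated by `|` and each record terminated by `[]`. The sketch below shows strict field-to-field parsing in that spirit. It is a simplified illustration (the function name is hypothetical, the sample uses only a handful of the standard columns, and splitting on `[]` assumes that sequence never appears inside a field).

```python
def parse_ledes_1998b(text: str) -> list[dict]:
    """Parse LEDES 1998B text into one dict per line item, keyed by header name."""
    # Records end with "[]"; fields within a record are separated by "|".
    records = [r.strip() for r in text.split("[]") if r.strip()]
    if not records or records[0].upper() != "LEDES1998B":
        raise ValueError("missing LEDES1998B identifier line")
    if len(records) < 2:
        raise ValueError("missing header record")
    header = [h.strip() for h in records[1].split("|")]
    rows = []
    for number, record in enumerate(records[2:], start=1):
        values = [v.strip() for v in record.split("|")]
        # Strict validation: every record must match the header width.
        if len(values) != len(header):
            raise ValueError(
                f"record {number}: expected {len(header)} fields, got {len(values)}"
            )
        rows.append(dict(zip(header, values)))
    return rows


# Shortened sample with a subset of columns, for illustration only.
sample = (
    "LEDES1998B[]\n"
    "INVOICE_NUMBER|LINE_ITEM_NUMBER|LINE_ITEM_NUMBER_OF_UNITS|"
    "LINE_ITEM_UNIT_COST|LINE_ITEM_TOTAL[]\n"
    "INV-001|1|1.50|200.00|300.00[]\n"
)
rows = parse_ledes_1998b(sample)
```

A record with a missing or extra delimiter fails the width check, which is the kind of malformed input noted above.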
Delimited tables with headers
How we handle
Map header names to canonical fields; flexible header synonyms supported.
One entry per line yields strong extraction quality.
Best when
Exports from billing/accounting tools where LEDES is unavailable.
Potential limitations
Inconsistent column order or merged cells may require cleanup.
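Header mapping with synonyms can be pictured as a lookup from normalized header text to a canonical field. The sketch below is one simple way to do it; the canonical field names and the synonym lists are invented for the example and are not the product's actual vocabulary.

```python
import re

# Hypothetical canonical fields and accepted synonyms.
SYNONYMS = {
    "date": {"date", "work date", "service date"},
    "timekeeper": {"timekeeper", "tk", "attorney", "biller"},
    "hours": {"hours", "hrs", "units"},
    "rate": {"rate", "hourly rate", "unit cost"},
    "amount": {"amount", "total", "line total"},
    "description": {"description", "narrative", "details"},
}

# Invert to a flat lookup: synonym -> canonical field.
LOOKUP = {syn: canon for canon, syns in SYNONYMS.items() for syn in syns}


def _key(header: str) -> str:
    """Lowercase and collapse punctuation/underscores to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", header.lower()).strip()


def map_headers(headers: list[str]) -> dict:
    """Map each source header to a canonical field, or None if unrecognized."""
    return {h: LOOKUP.get(_key(h)) for h in headers}
```

Headers that map to `None` are the ones a reviewer would need to confirm or clean up before extraction.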
Delimited tables without headers
How we handle
Infer column meaning using position, sampling, and content patterns.
Optionally let reviewers confirm inferred fields for reliability.
Best when
Simple line exports with consistent column order.
Potential limitations
Mixed or shifting column layouts can reduce confidence—add headers if possible.
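Inferring column meaning from content patterns can be sketched as a vote over sampled cells: a column is labeled with a type when most of its non-empty values match that type's pattern. The patterns, labels, and 80% threshold below are assumptions for illustration; telling hours from rate from amount would additionally need position or magnitude cues, which this sketch omits.

```python
import re

# Assumed patterns, checked in order; anything else falls back to "text".
PATTERNS = {
    "date": re.compile(r"\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{4}"),
    "number": re.compile(r"-?\d+(\.\d+)?"),
}


def infer_columns(rows: list[list[str]], threshold: float = 0.8) -> list[str]:
    """Guess a coarse type per column from the share of cells matching a pattern."""
    guesses = []
    # zip(*rows) transposes rows into columns (truncating to the shortest row).
    for column in zip(*rows):
        cells = [c.strip() for c in column if c.strip()]
        guess = "text"
        for label, pattern in PATTERNS.items():
            matches = sum(bool(pattern.fullmatch(c)) for c in cells)
            if cells and matches / len(cells) >= threshold:
                guess = label
                break
        guesses.append(guess)
    return guesses
```

When rows shift layout partway through a file, the match share for a column drops below the threshold and the guess falls back to `"text"`, which is one way reduced confidence would show up.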
XML (structured)
How we handle
Parse against common e-billing structures; map fields to InvoiceChecker’s schema.
Preserve hierarchical relationships for entries and expenses.
Best when
Systems that produce XML invoice feeds.
Potential limitations
Custom XML without stable tags may require mapping guidance.
DOC / DOCX
How we handle
Detect and parse tables; extract text and numeric fields from rows.
When tables are irregular, heuristics align cells to known fields.
Best when
Manual invoices built in Word with clear tables.
Potential limitations
Freeform text and multi-column layouts can reduce precision—exporting to CSV improves results.
PDF (digital, text-based)
How we handle
Extract selectable text, then reconstruct line items.
Handle common multi-page layouts and totals.
Best when
Invoices exported directly from billing systems.
Potential limitations
Complex multi-column or decorative layouts can require review. Prefer structured files if available.
PDF (scanned) & images (JPG/PNG/TIFF)
How we handle
OCR converts images to text, then layout analysis reconstructs rows.
Integrated AI helps interpret ambiguous descriptions where appropriate.
Best when
Legacy or paper workflows when digital files aren’t available.
Potential limitations
OCR is sensitive to blur/tilt; 300+ DPI scans work best. Photos are least reliable—use a scanner when possible.
Input quality still matters
We use sophisticated extraction, but “garbage in, garbage out” still applies. Wherever possible, choose LEDES 1998B or clean CSV/pipe-delimited files with one entry per line. Include a header row where you can, since headers make field mapping more reliable.
The goal is consistent, line-level data that’s easy to check, review, and explain.
Line item fields
- Matter identifiers (when present)
- Timekeeper name/ID and role
- Date, hours, rate, amount
- Narrative/description text
- Expenses and disbursements
- Task/activity codes when provided
Invoice-level context
- Vendor/law firm details (when present)
- Invoice number/date and totals
- Page and line references to support review
Enterprise-grade security & privacy
Data is encrypted in transit and at rest using industry-standard encryption on a cloud platform with SOC 2 Type II posture. Results remain under your control and can be removed when no longer needed.
Managed Extraction for Complex Cases
For unique, non-standard formats or large-scale backlog processing, our team of experts is available to assist. We offer a managed service to ensure even the most complex files are accurately structured for review. This is an optional, premium service.