capability

Structure from chaos.

Most documents arriving in your enterprise are unstructured — scanned forms, PDF invoices, handwritten field reports, multi-language regulatory filings, photographed permit sheets. TeamSync's Metadata Extraction capability turns them into structured records: typed metadata, extracted text, classified document type, all stored against the document in the Intelligent Repository.

Talk to an IDP solutions engineer · Compare to ABBYY · Compare to Hyperscience

Diagram: an unstructured document (scan, PDF, photo, handwritten) entering an extraction pipeline (OCR / ICR / classification / field extraction) and emerging as structured metadata bound to the document.

What's in the capability.

Component	Purpose
OCR (Optical Character Recognition)	100+ languages; printed text from scans, PDFs, photos
ICR (Intelligent Character Recognition)	Handwriting via ML models; checkbox + signature detection
Document classification	Auto-tag document type (invoice / contract / claim / form / report) per the Intelligent Repository taxonomy
Field extraction	Type-specific field extraction (amount + due date for invoice; counterparty + effective date for contract; patient ID + diagnosis for clinical note)
Confidence scoring	Per-field confidence; below-threshold fields flagged for human review
Human-in-the-loop verification	Review interface for low-confidence extractions; corrections feed back into model training (with consent)
Audit ledger	Every extraction event anchored: source document, model version, output, human corrections

How TeamSync compares.

Capability	TeamSync	ABBYY Vantage	Hyperscience	Rossum	Tungsten Automation
Native to the document platform (no integration)	✅	Standalone	Standalone	Standalone	Standalone
Multilingual OCR (100+ languages)	✅	✅ Strong	Limited	Strong (EU focus)	✅
ICR (handwriting)	✅	✅	✅ Strong	Limited	✅
Per-field confidence with human-in-the-loop	✅	✅	✅	✅	✅
Audit ledger Merkle anchor per extraction	✅	Standard log	Standard log	Standard log	Standard log
Per-cluster pricing (no per-page metering)	✅	Per-page	Per-page	Per-document	Per-page

Read the IDP alternative comparisons →

Frequently asked questions.

What about edge-case OCR accuracy — handwriting, low-quality scans?

For the 90% of enterprise documents (printed text, standard forms), TeamSync meets the production accuracy bar. For edge cases (handwriting on legacy forms, very-low-quality scans, exotic-language handwriting), TeamSync coexists with specialist IDP tools (ABBYY, Hyperscience) that can be invoked as workflow nodes.

Can I train a custom field extractor?

Yes. Customers train custom extractors for their document types via the human-in-the-loop verification interface. Training data is tenant-isolated.

Does it handle multi-page documents?

Yes including page-segment classification (a single PDF containing multiple document types is split and each section classified separately).

Intelligent Repository — extracted metadata lands here
Business Process Automation — extraction as a workflow node
Document Templates — extracted fields populate templates
DocuTalk — AI grounds in extracted metadata

HIPAA — PHI extraction with tenant isolation
FDA 21 CFR Part 11 — clinical-form extraction with audit