AI-Powered Data Extractor

What Can You Extract?
Tolstoy’s Data Extractor extracts and labels text from virtually any document or record.
Unlike other solutions, we customize an Extractor for each client. This lets us process highly unstructured documents, from legal transcripts to email data to handwritten records.
OCR
Printed and handwritten text, structured or unstructured.
Custom Tagging
Find and label custom types of text, such as names, addresses, document numbers, dates, locations, and industry-specific terms.
Extract
Extract text from unstructured documents into structured output such as CSV, XLSX, JSON, or another structured format.
Classify
Classify documents, emails, and other records with custom categories.
Customer Success

Legal Transcripts
Extract document references from trial dialogue.
Entity Extraction
A university digitized a collection of one million documents from the Nuremberg War Crimes Trial. Because the contents of the entire collection were unknown, the university partnered with Tolstoy to help index the documents by identifying every document reference in its 150,000+ pages of court transcripts.
Tolstoy tagged the references with over 99% accuracy, saving the university approximately 7-8 months of work.
News Clippings
Extract text from old news articles.
Custom OCR
To celebrate its 130th anniversary, The Wall Street Journal wanted to digitize and reprint articles from its entire history in a special edition. Because many of the articles were very old, traditional OCR software recognized the text poorly or not at all.
We wrote a custom OCR script that parsed the digitized articles with over 95% accuracy, saving the Journal several weeks of manual transcription.
Museum Records
Extract fields from handwritten cards.
OCR and Entity Extraction
Museums around the world are digitizing their records, which includes extracting text from specimen labels into databases. The labels are often handwritten, come in an unknown number of formats, and originate from many different eras, countries, and institutions.
Tolstoy built an Extractor that processes printed labels with over 98% accuracy and handwritten labels with about 80% accuracy, regardless of format. The project is ongoing and could reduce processing times from decades to weeks.
What types of text can be processed?
- Reports
- Transcripts
- Forms
- Receipts
- Handwritten labels
- News articles
- And more
Automate to Accelerate Data Mastery
Understand what's in your internal documents, client responses, maintenance forms, and other records. Gain a truly complete view of your information, and build a strong foundation for tracking, analyzing, and running predictive analytics on your data.
It's your data. Make the best use of it.