NASA has a corpus of 50,000 scientific papers, with 12 possible categories for each paper; a paper can carry multiple labels. The least frequent label occurs around 1,000 times in the dataset, while the most frequent occurs around 10,000 times.
We were able to assign labels with 83-85% accuracy (depending on the model used) and an F1 score above 70%, in under a minute.
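As a rough illustration of how multi-label accuracy and F1 are typically scored, here is a minimal sketch in plain Python. The label names and predictions are invented for illustration and are not NASA's data:

```python
# Hypothetical example: micro-averaged F1 over multi-label predictions,
# where each paper's labels are a set. All data below is invented.

def micro_f1(true_sets, pred_sets):
    """Micro-averaged F1 over parallel lists of label sets."""
    tp = sum(len(t & p) for t, p in zip(true_sets, pred_sets))
    fp = sum(len(p - t) for t, p in zip(true_sets, pred_sets))
    fn = sum(len(t - p) for t, p in zip(true_sets, pred_sets))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

true = [{"astrophysics"}, {"earth science", "climate"}, {"heliophysics"}]
pred = [{"astrophysics"}, {"earth science"}, {"planetary"}]
print(round(micro_f1(true, pred), 2))  # -> 0.57
```

Micro-averaging pools true/false positives across all labels before computing F1, so frequent labels weigh more; macro-averaging (averaging per-label F1 scores) is the common alternative when rare labels matter equally.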
A prestigious university digitized a historical collection of one million documents from the Nuremberg War Crimes Trials. The university partnered with Tolstoy to help index the collection by tagging all document references in their 150,000+ pages of court transcripts.
This involved two parts: 1) identifying and extracting all document mentions from the dialogue, and 2) tagging each mention as a prosecution, defense, or evidence file.
Previously, the university employed staff to do this as it required reading complex human dialogue and tagging nuanced mentions.
We were able to tag document references with 99%+ accuracy and 92-95% recall (the share of all true references captured). This saved the university several months of staff work.
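The two-part workflow above can be sketched as a toy pipeline. The regex pattern, document prefixes, and tagging rules below are invented for illustration; the production system used a trained model rather than hand-written rules:

```python
import re

# Toy sketch of the two-stage workflow: (1) extract document mentions
# from transcript dialogue, (2) tag each mention by type. The pattern
# and prefix rules are hypothetical, not the real numbering scheme.

MENTION_RE = re.compile(r"\b(?:Exhibit|Document)\s+[A-Z]{1,3}-\d+\b")

def extract_mentions(text):
    """Stage 1: find candidate document references in the dialogue."""
    return MENTION_RE.findall(text)

def tag_mention(mention):
    """Stage 2: classify the reference from its prefix (toy rule)."""
    ref = mention.split()[1]
    if ref.startswith("PS-"):
        return "prosecution"
    if ref.startswith("D-"):
        return "defense"
    return "evidence"

transcript = "Counsel referred to Document PS-123 and then to Exhibit D-45."
for m in extract_mentions(transcript):
    print(m, "->", tag_mention(m))
```

Splitting extraction from classification keeps each stage independently measurable, which is how a precision/recall figure like the one above can be reported for the reference-tagging step.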
Optical Character Recognition
The Wall Street Journal celebrated their 130th anniversary in 2019. As part of the celebration, they wanted to digitize and reprint articles from their entire history in a special edition.
Because many of the articles were very old, with poor, spotty scans, traditional OCR software picked up the text poorly or not at all. Furthermore, many articles contained multiple columns and images, which off-the-shelf OCR also struggled with.
During the 2020 pandemic lockdown, a large UK-based garden retailer received an influx of orders from customers staying at home. We helped them tag customer emails with 98%+ accuracy.
This saved them approximately 700 customer-agent hours.
Gardening grew popular during the lockdown, and the retailer quickly built up a backlog of 50,000+ customer emails between March and April, with thousands more arriving each week.
We built a custom AI model to tag their emails (delivery inquiries, cancel order, non-urgent). We also helped them discover that customers often chose the wrong categories when submitting tickets via Freshdesk: the self-selected categories were only 50-70% accurate.
This helped the retailer accurately categorize their backlog in 2 days, versus a month of 4-5 agents reading emails. The retailer could finally triage and respond to urgent requests first.
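A minimal sketch of triaging emails into the three categories above, assuming simple keyword rules. The keywords are invented and the real system was a trained model, not a rule list:

```python
# Toy email triage into the three categories mentioned above.
# Keyword rules are hypothetical stand-ins for a trained classifier.

CATEGORIES = {
    "delivery inquiry": ("where is my order", "delivery", "tracking"),
    "cancel order": ("cancel", "refund"),
}

def tag_email(body):
    """Return the first matching category, else 'non-urgent'."""
    text = body.lower()
    for category, keywords in CATEGORIES.items():
        if any(k in text for k in keywords):
            return category
    return "non-urgent"

print(tag_email("Please cancel my order #1234"))  # -> cancel order
print(tag_email("When will delivery arrive?"))    # -> delivery inquiry
print(tag_email("Love your seed catalogue!"))     # -> non-urgent
```

Even this crude structure shows why automated tagging enables triage: once every email carries a category, urgent queues (cancellations, delivery problems) can be worked first.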
The World Bank has nearly 30,000 projects from its storied history. Each has multiple associated documents, including a 50-100 page document called the Project Appraisal Document (PAD). The World Bank employs a team of 40 people to assign sector and theme codes to these projects by reading their PADs; it takes them several months to process a few thousand projects.
We took 10,000 PADs as training data and tuned our model to predict 11 theme codes with 80% accuracy and a 78% F1 score. Per-label accuracy ranges from 80% to 99%. This compares with the Bank team's measured accuracy of 55% (against expert-labelled data).
Our model does this in less than a minute, versus four months of manual work by their contractors.
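For illustration, per-label accuracy on a multi-label task can be computed as follows. The theme names and data are hypothetical, not the World Bank's:

```python
# Hypothetical sketch: per-label accuracy for multi-label theme codes.
# A label counts as correct for a project when its predicted
# presence/absence matches the true presence/absence.

def per_label_accuracy(theme, true_sets, pred_sets):
    correct = sum((theme in t) == (theme in p)
                  for t, p in zip(true_sets, pred_sets))
    return correct / len(true_sets)

true = [{"education"}, {"health", "transport"}, {"education", "health"}]
pred = [{"education"}, {"health"}, {"health"}]

for theme in ["education", "health", "transport"]:
    print(f"{theme}: {per_label_accuracy(theme, true, pred):.0%}")
```

Reporting a per-label range (like the 80-99% above) surfaces which codes the model handles well and which need more training data, which a single aggregate accuracy number hides.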