Low-quality machine printed documents -> high-quality automation

Low-quality machine printed documents can make life difficult for organizations in a number of ways. In this post, we review how to overcome these challenges and get through the paperwork cycle. 

Where Do Low-Quality Machine Printed Documents Come From?

Paper isn’t going away anytime soon, which means paperwork processing will continue to be a necessary evil for many organizations.

This is especially true for insurance and healthcare organizations. Processing beneficiary designations, death certificates, patient enrollment forms and prescriptions is a daily burden that consumes precious time and resources on manual data entry.

Why? Because most of these documents come in the form of scans or faxes (or scanned faxes) – this makes them too low-quality for any traditional processing workflow to handle.  

Faxing and scanning introduce digital noise, or artifacts, that degrade the overall quality of a document. Each time a document is faxed back and forth, it gets fuzzier. And scanned images can become skewed or compressed during the scan.

Sometimes, though, the quality is degraded on purpose:

  1. Security – documents like death certificates are designed to produce artifacts to prevent fraud or duplication.
  2. Storage – scans are typically used to archive paper documents (or to digitize them for any other reason). But because storage is costly and the volume of content is large, people often scan at a lower DPI to save space.
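To make the storage trade-off concrete, here is a rough back-of-the-envelope sketch of how the uncompressed size of a scanned page changes with DPI. The page size (US Letter) and bit depth (8-bit grayscale) are assumptions chosen for illustration, not figures from any particular scanner or archive, and real files will be smaller once compressed.

```python
# Illustrative arithmetic only: uncompressed size of one scanned page.
# Page size (US Letter) and bit depth are assumptions for this sketch.


def raw_scan_bytes(dpi: int, width_in: float = 8.5, height_in: float = 11.0,
                   bits_per_pixel: int = 8) -> int:
    """Return the uncompressed size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    for dpi in (300, 200, 150):
        size_mb = raw_scan_bytes(dpi) / 1_000_000
        print(f"{dpi} DPI: {size_mb:.1f} MB per page (uncompressed)")
```

Halving the DPI cuts the pixel count to a quarter, which is why low-DPI scanning is tempting for large archives. It also means each printed character is captured with roughly a quarter as many pixels, which is part of what makes the text harder to extract later.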

While these are valid reasons, organizations run into problems downstream when they need to process forms through an automation workflow or dig through historical data for insights. 

What’s the Problem?

Many organizations already have workflows set up to process machine print. (Until recently, handwriting processing was off the table, so they have separate measures for that.) 

But even in a traditional optical character recognition (OCR) workflow specifically designed for machine print, the automation can only handle about 80% of the documents. When a low-quality document comes through, the process has to be interrupted to resolve the bad data.

Granted, human intervention on 20% of the documents in an operational workflow is better than 100%. But over time, the cost and time lost on this 20% add up. This is because the exception handling path for low-quality documents requires kicking the data back to a human to clean it up.
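As a rough illustration of what an exception handling path can look like, the sketch below routes extracted fields to human review when a recognition confidence score falls below a threshold. This is a common pattern, assumed here for illustration; it is not a description of any specific OCR product. The field names, scores, `ExtractedField` type and 0.90 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # hypothetical engine score between 0.0 and 1.0


def route(fields: List[ExtractedField], threshold: float = 0.90
          ) -> Tuple[List[ExtractedField], List[ExtractedField]]:
    """Split fields into (automated, needs_review) by confidence score."""
    automated = [f for f in fields if f.confidence >= threshold]
    needs_review = [f for f in fields if f.confidence < threshold]
    return automated, needs_review


def exception_rate(fields: List[ExtractedField], threshold: float = 0.90) -> float:
    """Fraction of fields that would be kicked back to a human."""
    if not fields:
        return 0.0
    _, needs_review = route(fields, threshold)
    return len(needs_review) / len(fields)


# Made-up sample data: one smudged field out of five.
SAMPLE_FIELDS = [
    ExtractedField("policy_number", "A1234567", 0.99),
    ExtractedField("insured_name", "Jane Doe", 0.97),
    ExtractedField("date_of_death", "2019-03-1?", 0.42),
    ExtractedField("county", "Alameda", 0.95),
    ExtractedField("certificate_id", "000-18-2231", 0.93),
]

if __name__ == "__main__":
    automated, needs_review = route(SAMPLE_FIELDS)
    print("Needs human review:", [f.name for f in needs_review])
    print(f"Exception rate: {exception_rate(SAMPLE_FIELDS):.0%}")
```

In this made-up sample, one field in five lands in the review queue, which mirrors the 80/20 split described above. Every field in that queue is a point where a person has to retype or correct data.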

And any amount of human intervention unfortunately leads to three major challenges: 

  1. Inaccuracy – Data errors from mistyping.
  2. Resources – It is difficult to source talent willing and able to manually extract text. Plus, it’s time-consuming and costly to have someone extract text when they could be working on something more valuable. 
  3. Security – When data moves from machine to human back to machine, the process is more vulnerable to security issues, especially for highly regulated industries with sensitive information, like healthcare.  

Moreover, traditional OCR struggles to process a full workflow that includes low-quality documents, and manually sifting through historical data is too costly and time-consuming. As a result, organizations don’t have access to valuable data that could be used for enhanced insights and decision making.

The day-to-day challenges of exception handling processes are difficult enough. But this major, overarching challenge stands out: the inability to perform advanced data analytics.

Why Processing Low-Quality Documents Matters for Analytics 

Organizations with large quantities of data but no way to use it are sitting on insights that could be used to improve decision making. So, being able to process low-quality machine print opens up new avenues for collecting data that were previously inaccessible.

To see why this matters, let’s take a look at an example use case: an insurance company processing death certificates. 

Digitizing Death Certificates

Death certificates are a prime example of the problem of low-quality scans. Because they are scanned at low quality and intentionally disrupted with artifacts to prevent fraud, their text can’t be easily extracted by humans or machines.

But death certificates are usually presented with important case data or family policies that provide valuable insight for life insurers, insight that can help them offer competitive rates and stay ahead of the competition. To capture this insight, they would have to gather a statistically significant sample of data – often 20 years of data or more.

This isn’t possible to pull off by hand. But imagine if insurers could process this amount of historical data at a high rate of accuracy. 

The cost of conducting analytics without the right technology is prohibitive. But with technology that can handle low-quality machine print, you can perform those analytics, and recover what it costs to go without them, at a much lower operational cost.

It’s evident that processing these low-quality machine print documents makes a difference. So how can you make it happen? 

How to Process Low-Quality Documents 

As we’ve touched on, low-quality machine printed document processing requires something more advanced than just traditional OCR. Luckily, this technology exists and is available today.

Vidado’s highly trained machine learning models and computer vision engines unlock the ability to extract this low-quality text. They’re specifically trained to ignore scan artifacts and form labels. Plus, Vidado creates a template for every type of form it has ever scanned, so it can correct skewed and compressed images by pulling snippets off the original and into the clean template.

As a result, Vidado solves all of the above challenges of low-quality machine print with better-than-human accuracy (roughly 97% or better). Specifically:

  1. 42% increase in straight-through processing (STP)
  2. 70% decrease in manual exception handling 
  3. 80% decrease in manual data entry
  4. 50% decrease in operational costs
  5. 93% decrease in processing time 

Vidado does what traditional OCR and humans can’t. Better STP and less human intervention mean less data inaccuracy. Less resource strain. Fewer security risks. And an unlocked ability to perform advanced analytics.

Start turning low-quality scans and faxes into digitized data with a free 30-day trial of Vidado. 
