What is OCR and why is it important? 

Optical character recognition (OCR), sometimes referred to as text recognition, is a method of converting images of typed, handwritten, or printed text into editable digital text. OCR can be used to help digitise text from books, images, documents, and other sources. 

In this modern world of Artificial Intelligence, Machine Learning and Automation it seems bizarre that paper-based documentation is still so prevalent. However, if you talk to any Solicitor, Forensic Accountant or Criminal Investigator, they will tell you they have desks, drawers and cupboards full of documents – to process, extract key data from and analyse – and there’s little sign of this stopping. 

Surely it would make sense to have a solution that can automatically identify text within a document, convert it to a digital format, quality check it and output it into a format you can use? The problem is, OCR is not an exact science. Understanding patterns and context comes easily to human beings but asking a computer to do the same is a challenge. Typically, the more complex the use case, the more expensive the solution.  

In a perfect world, where the page has been scanned in high quality, with no smears, smudges, or inconveniently placed specks of dust, modern OCR software does an excellent job of extracting the data. But how confident can you be that the software has done the job accurately? And what about those other cases, where the scan wasn’t so good, or the original wasn’t perfect in the first place?

We know that having confidence in your data is critical, and Oscar has a few tricks to give you confidence in the output. Let’s take Oscar’s bank statement module as an example – we’ll walk you through how it works and how we can guarantee you get the results you need, quickly and accurately.  

Just enough automation 

We know you don’t want to be manually dragging boxes around the text you want to extract or calculating rows and columns to make sure the financial data you’re extracting all adds up. We also know that, if possible, you just want to select multiple bank statements, click ‘run’ and get the output. 

Oscar’s automated process does all that for you. Simply drag the scanned files from your computer folder into Oscar and, seconds later, the output is there for you in Excel – quality checked, in a format you can work with. 

How can you be sure that the data Oscar has extracted can be trusted?  This is where practitioner-led development comes in. Oscar could automate everything, but there are cases when you will still want to view the data and amend it or correct it before it is output to Excel, so we’ve incorporated functionality to allow you to do that.  

Maximising your confidence in the OCR results 

If there were a single standard font or document format that everyone in the world used, OCR wouldn’t be so tricky. But that isn’t the case. Oscar is ‘taught’ to recognise individual letter shapes, or the shapes that make up a single letter. We also teach Oscar how to recognise documents, areas within documents, and key items within each area to ensure we extract all the relevant information – automatically.

Oscar then assigns a confidence value (as a percentage) to the result – how confident is the software that it has correctly identified the letter? Words are another layer on top, which can help increase the accuracy of the result (if we assume that a collection of letters is usually intended to be a word). If Oscar recognises four letters, it can consult a dictionary to help it choose the correct result, then assign a confidence score at word level as well.
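To make the dictionary step concrete, here is a minimal sketch in Python. It is not Oscar's actual implementation: the vocabulary, the `correct_word` helper and the 0.8 similarity cutoff are all assumptions chosen for illustration, and the matching uses the standard library's `difflib`.

```python
import difflib

# Hypothetical vocabulary of words we expect to see on a bank statement.
VOCABULARY = ["balance", "payment", "credit", "debit", "transfer"]

def correct_word(raw, vocabulary=VOCABULARY, cutoff=0.8):
    """Return the closest vocabulary word, or the raw text if nothing is close enough."""
    matches = difflib.get_close_matches(raw.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else raw

corrected = correct_word("Ba1ance")  # the digit 1 was read in place of the letter l
unchanged = correct_word("Tesco")    # no vocabulary word is similar enough, so keep the raw text
```

A real engine would weigh the dictionary match against the per-letter confidence scores rather than relying on string similarity alone.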

It is important to note that these confidence values are not probabilities. For example, a confidence score of 95% does not mean there is a 95% chance the word is correct. Likewise, a correct recognition could have a low confidence. Instead, these values help us set a threshold and guide where we need to focus manual review.
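As an illustration of how a threshold can direct manual review, the sketch below flags any word whose score falls below a cut-off. The `OcrWord` type, the 0 to 100 scale and the threshold of 80 are assumptions made for this example, not values taken from Oscar.

```python
from dataclasses import dataclass

@dataclass
class OcrWord:
    text: str
    confidence: float  # engine-reported score from 0 to 100; not a probability

def words_needing_review(words, threshold=80.0):
    """Return the words whose confidence falls below the review threshold."""
    return [w for w in words if w.confidence < threshold]

words = [
    OcrWord("Balance", 97.0),
    OcrWord("1O4.50", 62.5),  # letter O read in place of a zero
    OcrWord("DEBIT", 88.0),
]
flagged = words_needing_review(words)
```

Because the scores are not probabilities, a threshold like this is best tuned against documents that have already been checked by hand.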

What column is this?

Working with a known type of data gives Oscar an advantage when it comes to accuracy. Once OCR and table recognition are performed, Oscar determines the likely type of each column – date, description, transaction type or numeric value (debit, credit, balance). Knowing the column type narrows down the available character set and format and opens up some other validation techniques.
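One simple way to guess a column's type is a majority vote over its cells. The sketch below is an assumption about how this could be done, not a description of Oscar's method: the `DD/MM/YYYY` date format, the amount pattern and the 80% vote share are all illustrative choices.

```python
import re
from datetime import datetime

# Assumed amount format: optional currency symbol and sign, digits with optional
# thousands separators, and optional pence, for example "£1,250.00" or "-45.10".
AMOUNT_RE = re.compile(r"^[£$]?-?\d[\d,]*(\.\d{2})?$")

def is_date(cell, fmt="%d/%m/%Y"):
    """Return True if the cell parses as a date in the assumed format."""
    try:
        datetime.strptime(cell, fmt)
        return True
    except ValueError:
        return False

def guess_column_type(cells, min_share=0.8):
    """Classify a column as 'date', 'numeric' or 'text' from its non-empty cells."""
    filled = [c.strip() for c in cells if c.strip()]
    if not filled:
        return "text"
    date_share = sum(is_date(c) for c in filled) / len(filled)
    numeric_share = sum(bool(AMOUNT_RE.match(c)) for c in filled) / len(filled)
    if date_share >= min_share:
        return "date"
    if numeric_share >= min_share:
        return "numeric"
    return "text"

column_types = [
    guess_column_type(["01/02/2024", "03/02/2024"]),
    guess_column_type(["CARD PAYMENT", "DIRECT DEBIT"]),
    guess_column_type(["1,250.00", "£45.10", ""]),
]
```

Voting on a share of cells, rather than requiring every cell to match, lets a column still be recognised when one or two cells have been misread.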

Are my dates in order?

Each entry in a bank statement is almost always dated. Oscar can assess these dates for accuracy in two ways. Firstly, is the text a valid date format? Can Oscar parse it to a day, month and year? If not, it may require some manual attention. Secondly, Oscar can assess the order of the rows. Statements are presented in either ascending or descending date order. If there are dates that don’t follow that pattern, something may be amiss (a page scanned out of order, for example).
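The two date checks can be sketched as follows. This is a minimal illustration under stated assumptions: dates are in `DD/MM/YYYY` format, the statement's direction is inferred from its first and last readable dates, and the `parse_date` and `check_dates` helpers are hypothetical names.

```python
from datetime import datetime

def parse_date(text, fmt="%d/%m/%Y"):
    """Return a date if the text matches the assumed format, otherwise None."""
    try:
        return datetime.strptime(text.strip(), fmt).date()
    except ValueError:
        return None

def check_dates(cells):
    """Return (unparseable_rows, out_of_order_rows) as lists of row indices."""
    parsed = [parse_date(c) for c in cells]
    unparseable = [i for i, d in enumerate(parsed) if d is None]
    valid = [(i, d) for i, d in enumerate(parsed) if d is not None]
    if len(valid) < 2:
        return unparseable, []
    # Infer the statement's direction from its first and last readable dates.
    ascending = valid[0][1] <= valid[-1][1]
    out_of_order = []
    for (_, prev), (i, cur) in zip(valid, valid[1:]):
        breaks_order = cur < prev if ascending else cur > prev
        if breaks_order:
            out_of_order.append(i)
    return unparseable, out_of_order

dates = ["01/03/2024", "02/03/2024", "O5/03/2024", "04/03/2024", "03/03/2024", "06/03/2024"]
unparseable, out_of_order = check_dates(dates)
```

Here row 2 cannot be parsed (a letter O was read in place of a zero) and row 4 is earlier than the row before it, so both would be candidates for manual review.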

Does it all add up? 

Numeric fields in Oscar can be corrected to clean up any unexpected text. For example, currency symbols such as $ and £ can often be misinterpreted. Lastly, Oscar ensures that the statement balances. Oscar adds a ‘running balance’ column to the output data. Given the starting balance, debits and credits, the running balance is calculated and compared to the balance read from the statement. If the statement balances, there is a high confidence that the numeric fields of the statement have been recognised correctly. If there is a difference between the two, this is highlighted in the results table for manual review. 
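The balance check described above comes down to simple arithmetic. The sketch below is an illustration rather than Oscar's code: the `clean_amount` and `find_balance_mismatches` helpers are hypothetical, empty cells are assumed to mean zero, and the running balance is re-synchronised to the stated balance after a mismatch so that one misread figure does not flag every later row.

```python
import re
from decimal import Decimal, InvalidOperation

def clean_amount(text):
    """Strip currency symbols, commas and spaces; return a Decimal, or None if unreadable."""
    stripped = re.sub(r"[£$€,\s]", "", text)
    if not stripped:
        return Decimal("0")  # assumption: an empty debit or credit cell means zero
    try:
        return Decimal(stripped)
    except InvalidOperation:
        return None

def find_balance_mismatches(opening_balance, rows):
    """rows holds (debit, credit, stated_balance) Decimals; return indices that do not balance."""
    running = opening_balance
    mismatches = []
    for i, (debit, credit, stated) in enumerate(rows):
        running = running - debit + credit
        if running != stated:
            mismatches.append(i)
            running = stated  # re-synchronise so later rows are judged on their own
    return mismatches

raw_rows = [
    ("20.00", "", "£80.00"),
    ("", "50.00", "£130.00"),
    ("18.00", "", "£120.00"),  # debit misread: the statement says 10.00
    ("5.00", "", "£115.00"),
]
rows = [tuple(clean_amount(cell) for cell in row) for row in raw_rows]
mismatches = find_balance_mismatches(Decimal("100.00"), rows)
```

Using `Decimal` rather than floating point keeps the comparison exact to the penny. In this sketch, any cell that `clean_amount` returns as `None` would need correcting before the balance check is run.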

Together, these features add up to give you confidence that when Oscar gives a statement a green tick, the output is correct.

Try Oscar for free: https://demarq.com/oscar/