One of our clients is a start-up company in the UK. The company mainly helps self-employed people and SMEs extract data from receipts and invoices for accounting and money management purposes. The client wanted to develop an internal receipt processing system that would automatically extract useful information such as the amount spent, the purchase date, and the VAT number, and import the data into the ReceiptForm database so that its users could view the information online or import it into other financial software.
OCR Functions & Problems
1) The client needed to use our OCR technology to extract:
- Store name
- Total amount
- Tax amount
- Payment method (cash / credit card / etc.)
- Payment method reference (e.g. last 4 digits of credit card)
- VAT number (if present)
2) Current problems:
- Receipt formats varied
- Image quality was inconsistent, and some images had background watermarks or noise
- Some company names appeared only as logos rather than as text
- Some images contained handwritten characters
To meet the client's requirements, we developed a receipt processing system based on ExperVision's OpenRTK. The solution worked as follows:
- Acquired receipt images from a scanner or mobile camera
- Developed pre-processing modules, such as image thresholding, to ensure that RTK received the highest-quality image for recognition
- Called RTK to perform full recognition of the image
- Extracted key information according to the following rules:
  - If the receipt's structure was similar to a known layout, called the corresponding template directly to recognize the fields and import them into the database
  - If the structure was quite different, called an intelligent analysis template that searched for keywords such as "amount" and "total", then extracted the associated text and imported it into the database
- Proofread the results and manually entered any poorly recognized information, such as store names shown only as logos or handwritten characters
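
The thresholding and keyword-search steps above can be illustrated with a minimal Python sketch. It is not the production system and does not use the OpenRTK API, which is not shown here. It assumes the OCR engine has already returned plain text, one receipt line per text line. All names (`binarize`, `extract_fields`, `ReceiptFields`), the keyword lists, and the regular expressions are hypothetical simplifications chosen for illustration. A real system would need far larger vocabularies and more careful number parsing.

```python
import re
from dataclasses import dataclass
from decimal import Decimal
from typing import List, Optional

# Hypothetical, simplified patterns; real receipts need a richer rule set.
AMOUNT_RE = re.compile(r"\d+[.,]\d{2}\b")              # e.g. 2.40 or 2,40
TOTAL_RE = re.compile(r"\btotal\b", re.IGNORECASE)      # does not match "Subtotal"
TAX_RE = re.compile(r"\b(?:vat|tax)\b", re.IGNORECASE)
# Assumes the common printed form "GB" followed by nine digits.
VAT_NUMBER_RE = re.compile(r"\bGB\s?\d{3}\s?\d{4}\s?\d{2}\b", re.IGNORECASE)
CARD_REF_RE = re.compile(r"[*xX]{2,}\s*(\d{4})\b")      # masked card number
PAYMENT_METHODS = ("cash", "visa", "mastercard", "debit", "credit")


@dataclass
class ReceiptFields:
    total: Optional[Decimal] = None
    tax: Optional[Decimal] = None
    payment_method: Optional[str] = None
    payment_reference: Optional[str] = None
    vat_number: Optional[str] = None


def binarize(pixels: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Global threshold on a grayscale image given as rows of 0-255 values."""
    return [[255 if p >= threshold else 0 for p in row] for row in pixels]


def last_amount(line: str) -> Optional[Decimal]:
    """Return the right-most money-like number on the line, if any."""
    matches = AMOUNT_RE.findall(line)
    if not matches:
        return None
    return Decimal(matches[-1].replace(",", "."))


def extract_fields(ocr_text: str) -> ReceiptFields:
    """Keyword search over OCR text; later 'total' lines overwrite earlier ones."""
    fields = ReceiptFields()
    for line in ocr_text.splitlines():
        amount = last_amount(line)
        if amount is not None and TOTAL_RE.search(line):
            fields.total = amount
        elif amount is not None and TAX_RE.search(line):
            fields.tax = amount

        vat = VAT_NUMBER_RE.search(line)
        if vat and fields.vat_number is None:
            fields.vat_number = re.sub(r"\s", "", vat.group(0)).upper()

        lowered = line.lower()
        for method in PAYMENT_METHODS:
            if fields.payment_method is None and re.search(rf"\b{method}\b", lowered):
                fields.payment_method = method

        ref = CARD_REF_RE.search(line)
        if ref and fields.payment_reference is None:
            fields.payment_reference = ref.group(1)
    return fields


sample = """CORNER SHOP LTD
VAT No: GB 123 4567 89
Milk 1.20
Bread 0.80
Subtotal 2.00
VAT 20% 0.40
TOTAL 2.40
VISA ************4321"""

fields = extract_fields(sample)
print(fields)
```

Fields that the rules cannot find stay `None`, which gives the manual proofreading step a clear list of what still needs to be entered by hand, such as a store name that appears only as a logo.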