This article originally appeared in the November/December 2017 issue of Mailing Systems Technology.

    Artificial intelligence (AI) and machine learning rank as the top technology trend of 2017, according to the leading analyst firms. Businesses are looking to gain greater access to their data, reduce manual processes, and leverage automation technologies to improve their operations. Their systems must become more nimble to handle an increasingly complex work environment while also becoming more cost-efficient. AI promises just that.

    In a 2017 AIIM study of 100 managers overseeing document processing, 64% reported that their businesses process more than 10 types of incoming documents with their capture system. These documents ranged from invoices, receipts, and checks to application forms and claims. Unfortunately, because of a lack of time and staff, turning physical mail, along with a high volume of soft-copy documents, into accurate digital information on receipt remains a challenge.

    Getting Started: Addressing Mail Center Challenges

    Successful mail center processing starts with organization. Documents need to be classified, whether by hand or through automation, before they can be routed and their data shared with the right people. Staff rarely has the time to adequately scope document types, develop classes, and create the metadata required for efficient, accurate retrieval, so classification becomes an inconsistent, error-prone activity. Automated classification that uses advanced machine learning can take on much of the burden of document organization. Documents can be classified automatically based on their text, a visual analysis of the page, or a combination of both, and improving the classifier becomes a much simpler, faster task.
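
    To make text-based classification concrete, here is a minimal Python sketch. It uses multinomial naive Bayes over word counts, a deliberately simple technique swapped in for illustration; it is not the method any particular capture product uses, it ignores the visual analysis mentioned above, and the training snippets and class names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase alphanumeric word tokens: a deliberately simple text representation."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes over word counts with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()            # training documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()                       # every word seen in training

    def train(self, text, label):
        words = tokenize(text)
        self.class_counts[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def classify(self, text):
        words = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each word.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in words:
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled samples standing in for an inventory of document types.
clf = NaiveBayesClassifier()
for text, label in [
    ("Invoice number 1042 amount due net 30 remit payment", "invoice"),
    ("Invoice date total amount due please remit", "invoice"),
    ("Claim form policy number date of loss claimant signature", "claim"),
    ("Insurance claim policy holder description of loss", "claim"),
    ("Pay to the order of dollars memo routing number", "check"),
]:
    clf.train(text, label)

print(clf.classify("Amount due on invoice 2231"))      # invoice
print(clf.classify("Policy number and date of loss"))  # claim
```

    Improving such a classifier is a matter of calling train with newly labeled samples, rather than rewriting hand-maintained rules.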

    To choose the document classification technology best suited to your business, keep a few essential steps in mind. First, develop an inventory of the key incoming documents that support your critical business processes. This inventory is the basis for your next move: creating a high-level process map that shows the document classes involved and how they are used. It is also helpful to inventory the key data needed from each document.

    Next Steps to Success

    Once your business achieves a structured, consistent document classification process, the next step is locating and extracting the right data and delivering it to the right people in a timely way. Capturing the right data begins with context, especially when handwritten data is in the mix. Context is knowledge about the data in a field that provides valuable clues for accurate recognition. For example, a string of digits such as 802029998 means little on its own. In one context, these digits might be a nine-digit ZIP+4 Code. In a different context, they might represent programming code. The string only becomes meaningful given the right context.
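
    A minimal Python sketch of the idea: the same nine-digit string is checked and normalized against different field contexts. The field types, patterns, and formatters are hypothetical examples chosen for illustration, not a real product's field definitions.

```python
import re

# Hypothetical field contexts: each maps a field type to the pattern a raw
# digit string must fully match and a formatter for the normalized value.
FIELD_CONTEXTS = {
    # US ZIP Code: 5 digits, or 9 digits for ZIP+4.
    "us_zip": (
        re.compile(r"\d{5}(?:\d{4})?"),
        lambda s: s if len(s) == 5 else f"{s[:5]}-{s[5:]}",
    ),
    # Date written as MMDDYYYY: exactly 8 digits.
    "date_mmddyyyy": (
        re.compile(r"\d{8}"),
        lambda s: f"{s[:2]}/{s[2:4]}/{s[4:]}",
    ),
    # Amount in cents, rendered with integer arithmetic to avoid float rounding.
    "amount_cents": (
        re.compile(r"\d{1,12}"),
        lambda s: f"${int(s) // 100:,}.{int(s) % 100:02d}",
    ),
}


def interpret(raw, field_type):
    """Return the normalized value, or None if raw is implausible in this context."""
    pattern, formatter = FIELD_CONTEXTS[field_type]
    if not pattern.fullmatch(raw):
        return None
    return formatter(raw)


for field_type in FIELD_CONTEXTS:
    print(field_type, interpret("802029998", field_type))
# us_zip 80202-9998
# date_mmddyyyy None
# amount_cents $8,020,299.98
```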

    Even the simplest form fields, such as “address,” hold valuable context information. The address may be expressed in alphabetic or numeric characters. The character style may be constrained or unconstrained handprint, machine print, or cursive handwriting. The field may also follow different formats or conventions depending on the country.

    Context is an effective and flexible tool for improving recognition accuracy on individual fields. The more precise the context, the narrower the range of possible answers, and the more accurate the extracted data. To achieve a high recognition rate, the software must have context to determine what values each field of a document or image should contain. Defining the field type, properties, character style, and vocabulary provides that context. Until recently, these definitions and their associated business rules had to be laboriously created by subject matter experts (SMEs) and added to the system by programmers.
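
    The Python sketch below illustrates how context narrows the answer set. It assumes a hypothetical recognizer that returns several candidate readings of a handwritten field, each with a confidence score; a vocabulary or pattern then filters those candidates, and the remaining confidences are renormalized. The candidate values, scores, and state-code subset are made up for illustration.

```python
import re

# Subset of US state postal abbreviations, kept short for brevity.
STATE_CODES = {"CA", "CO", "CT", "NY", "TX"}


def pick_with_context(candidates, vocabulary=None, pattern=None):
    """Return (text, share) for the best candidate the field context allows.

    candidates: list of (text, confidence) pairs from a recognizer.
    share: the winner's confidence divided by the total confidence of all
    allowed candidates, i.e. its share of the restricted answer set.
    """
    allowed = [
        (text, conf)
        for text, conf in candidates
        if (vocabulary is None or text in vocabulary)
        and (pattern is None or re.fullmatch(pattern, text))
    ]
    if not allowed:
        return None  # nothing fits the context; a caller might route this to review
    total = sum(conf for _, conf in allowed)
    text, conf = max(allowed, key=lambda c: c[1])
    return text, conf / total


# Hypothetical N-best output for a handwritten "CO" whose O resembles a zero.
readings = [("C0", 0.55), ("CO", 0.40), ("CD", 0.05)]

print(pick_with_context(readings))                          # no context: "C0" wins
print(pick_with_context(readings, pattern=r"[A-Z]{2}"))     # letters only: "CO" wins
print(pick_with_context(readings, vocabulary=STATE_CODES))  # ('CO', 1.0)
```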

    Moving Beyond Context

    AI and machine learning can make accurate data location and extraction much easier and faster, because machine learning is essentially a family of algorithms and models that learn from data and make inferences about it. Given enough sample documents and their accurately extracted data (ground truth data), advanced AI software can automatically develop definitions and business rules. Of course, this presents its own challenge: the sample documents and their extracted data must be fully representative of the real-world mail center in variety, type, and content. This is crucial if the AI is to develop the right features to extract data from incoming documents correctly and automatically.
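
    As a toy stand-in for rule learning, the Python sketch below derives a field-validation pattern from ground truth values by generalizing each value into a character-class shape. Real AI capture software learns far richer features, and the function names here are hypothetical. The sketch also shows why representative samples matter: a rule learned only from five-digit ZIP Codes rejects valid ZIP+4 values.

```python
import re


def char_class(ch):
    """Generalize one character to a regex character class."""
    if ch.isdigit():
        return r"\d"
    if ch.isalpha():
        return "[A-Za-z]"
    return re.escape(ch)


def shape(value):
    r"""Collapse a value into a regex shape, e.g. '80202' becomes \d{5}."""
    runs = []  # [character class, run length] pairs
    for cls in map(char_class, value):
        if runs and runs[-1][0] == cls:
            runs[-1][1] += 1
        else:
            runs.append([cls, 1])
    return "".join(cls if n == 1 else f"{cls}{{{n}}}" for cls, n in runs)


def learn_field_rule(ground_truth_values):
    """Learn a rule that accepts any shape observed in the ground truth."""
    shapes = sorted({shape(v) for v in ground_truth_values})
    return re.compile("|".join(f"(?:{s})" for s in shapes))


representative = learn_field_rule(["80202", "10001", "80202-9998"])
narrow = learn_field_rule(["80202", "10001"])  # no ZIP+4 samples

print(bool(representative.fullmatch("94105-1234")))  # True
print(bool(narrow.fullmatch("94105-1234")))          # False
```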

    Tomorrow’s mail center documents will differ from today’s. That much is certain. Document automation needs to leverage adaptive technologies that keep your capture system continuously tuned. Staff cannot be expected to handle every exception caused by data location and extraction problems in new or changed documents. Tuning and data validation therefore become key components of full automation.
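
    Validation and tuning both depend on measuring how well extraction is performing. The Python sketch below is a minimal illustration, assuming a hypothetical setup in which staff verify a sample of documents and the system compares those verified values with its own extractions field by field, flagging any field whose accuracy falls below a threshold for retuning. The field names, data, and 0.95 threshold are illustrative assumptions.

```python
from collections import defaultdict


def field_accuracy(extracted, verified):
    """Per-field share of verified documents whose extracted value matches."""
    hits, totals = defaultdict(int), defaultdict(int)
    for doc_id, truth in verified.items():
        predicted = extracted.get(doc_id, {})  # a missing document counts as a miss
        for field, true_value in truth.items():
            totals[field] += 1
            if predicted.get(field) == true_value:
                hits[field] += 1
    return {field: hits[field] / totals[field] for field in totals}


def fields_needing_tuning(extracted, verified, threshold=0.95):
    """Fields whose measured accuracy is below the (hypothetical) threshold."""
    accuracy = field_accuracy(extracted, verified)
    return sorted(field for field, acc in accuracy.items() if acc < threshold)


# Hypothetical verified sample and the system's extractions for the same documents.
verified = {
    "doc1": {"invoice_no": "1042", "total": "310.00"},
    "doc2": {"invoice_no": "1043", "total": "95.50"},
}
extracted = {
    "doc1": {"invoice_no": "1042", "total": "310.00"},
    "doc2": {"invoice_no": "1043", "total": "96.50"},  # misread digit
}

print(field_accuracy(extracted, verified))         # {'invoice_no': 1.0, 'total': 0.5}
print(fields_needing_tuning(extracted, verified))  # ['total']
```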

    For any large-scale mail center operation, machine learning techniques for configuration and tuning, combined with advances in automated measurement, document classification, data location, and extraction, will ultimately create the most nimble mail center.

    Kaz Jaszczak is VP of Postal Automation at Parascript with over 35 years of experience in research, product management and business development in imaging, document management, signal processing and OCR software. Greg Council is VP of Marketing and Product Management at Parascript responsible for market vision and product strategy.