DocDataExtraction (DDE)

What it is

The DocDataExtraction (DDE) technology, developed entirely by SATA, is the most efficient way to capture data from documents in raster PDF, vector PDF and spool formats. Its main strengths are very short set-up times and a very high interpretation success rate. DDE can output any structured format, which enables a very high level of interoperability.

Other Intelligent Data Capture tools are available on the international market, and to some extent on the national one, and we continuously monitor how they evolve. Compared with these alternatives, we believe we offer clear advantages on both the technological and the commercial side.

Please download the presentation SATA-DDE-2017.pdf.


DocDataExtraction has been designed and developed to deliver a range of business benefits:
  • Many customers. From the outset SATA has worked for SMEs, and its mission is to build powerful solutions that are also easy to use and reasonably priced. The business model is based on reaching many customers (hundreds of thousands) with a highly engineered solution, rather than a few customers with very expensive projects. That said, a number of large customers and service providers also use our solutions, which is sound evidence of the quality of our offering.
  • Application variety. SATA solutions suit a wide variety of potential customers, from small and medium-sized companies to banks and document management service providers, including accounting consultants and other business process outsourcers, as well as electronic invoicing networks.
  • Low entry threshold. Thanks to the technology adopted, the effort needed to bring a new user into production already pays off with a few tens of documents per year. This makes the system viable for hundreds of thousands of companies and organisations of every kind.
  • Ready for SaaS. Many web-based applications claim to be “suitable for provision as a service”. Our solution couples an intuitive web user interface (used in the verification phase, the most important one from an operational viewpoint) with a modular, scalable architecture suited to high data volumes, because the most computationally demanding operations are processed in parallel.
  • Flexible service model. The technology supports different service models: from a strict “outsourcing” approach with centralised validation, to the direct involvement of customers who verify their own invoices, up to mixed flows that are partly unstructured (scanned) and partly structured (proprietary formats, vector PDF, spool).
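
To illustrate what processing the most computationally demanding operations in parallel can look like, here is a minimal Python sketch. It is not DDE code: `extract_document` and `extract_batch` are hypothetical names, and the extraction step is a stub standing in for heavy work such as OCR. A thread pool keeps the sketch short; for CPU-bound work in Python, a process pool or separate worker machines would be the natural substitute.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_document(doc_id: str) -> dict:
    # Hypothetical stand-in for a demanding extraction step (e.g. OCR plus
    # template matching); here it only returns a stub record.
    return {"doc_id": doc_id, "status": "extracted"}


def extract_batch(doc_ids: list, workers: int = 4) -> list:
    # Fan the documents out to a pool of workers; map() returns results
    # in the same order as the input list.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_document, doc_ids))


if __name__ == "__main__":
    print(extract_batch(["inv-001", "inv-002", "inv-003"]))
```

Because each document is handled independently, adding workers (or machines) scales throughput without changing the per-document logic.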

Technological benefits

DocDataExtraction has been designed and developed to provide a number of innovative features:
  • Extraction completeness. From the beginning we pursued an ambitious objective: generating structured content that complies with the CBI2-4 “white label” banking standard. This choice pushed us to solve critical problems related to VAT details, payment methods and due dates, references to orders and goods receipts, and line items, achieving a level of completeness that our competitors do not reach.
  • Extraction reliability. For raster PDFs and images we use ABBYY FineReader, which we consider the best OCR engine on the market. In addition, we combine advanced techniques, from syntactic controls to fuzzy logic and semantic controls on single fields and on sets of fields, selectively applying positional logic and image centring. The use of templates distinguishes us from competitors that often apply heuristic rules at every extraction without capitalising on the knowledge already acquired, and it is key to raising the success rate (more than 80% measured on the whole document, against 60-70%).
  • Specialisation by document type. Although many components are shared, thanks to the modular architecture we prefer to specialise our solutions by the type of document handled (invoices, orders, goods receipts, fixed-format documents), so as to capture and best address the specific features of each.
  • Distributed architecture. The DocDataExtraction architecture is distributed: the extraction phase can run one or more extraction modules in parallel for better performance. Remote access is available via VPN or HTTP, so the system is ready to be offered in SaaS mode.
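
The reliability point above mentions syntactic controls on single fields and semantic controls on sets of fields. The Python sketch below illustrates the general idea only; it is not DDE's rule set. The field names, the Italian VAT-number format check (11 digits with an optional "IT" prefix, checksum not verified) and the one-cent tolerance are all assumptions made for illustration.

```python
import re
from decimal import Decimal

# Hypothetical syntactic rule: optional "IT" prefix followed by 11 digits.
# The VAT-number checksum is deliberately not verified in this sketch.
VAT_NUMBER = re.compile(r"(IT)?[0-9]{11}")


def check_vat_number(value: str) -> bool:
    # Syntactic control on a single field (spaces are ignored).
    return VAT_NUMBER.fullmatch(value.replace(" ", "")) is not None


def check_totals(taxable: Decimal, vat: Decimal, total: Decimal,
                 tolerance: Decimal = Decimal("0.01")) -> bool:
    # Semantic control on a set of fields: taxable amount plus VAT must
    # match the document total within an assumed rounding tolerance.
    return abs(taxable + vat - total) <= tolerance


def validate(fields: dict) -> list:
    # Returns a list of error messages; an empty list means all checks passed.
    errors = []
    if not check_vat_number(fields["vat_number"]):
        errors.append("vat_number: bad format")
    amounts = (Decimal(fields["taxable"]), Decimal(fields["vat"]), Decimal(fields["total"]))
    if not check_totals(*amounts):
        errors.append("totals: taxable + vat != total")
    return errors


if __name__ == "__main__":
    print(validate({"vat_number": "IT01234567890", "taxable": "100.00",
                    "vat": "22.00", "total": "122.00"}))
```

In a sketch like this, a non-empty error list would be the natural trigger for routing a document to manual verification.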