How does Canopy Extract work?
All the data we need is in table format
Our data is invariably in table format. Typically, we need to extract the following 3 tables from each PDF document:
- Holdings
- Transactions
- Current Account Credits and Debits
Canopy Extract is designed to extract any table (not just the 3 tables above) from any PDF document. If you need to extract charts or images from a PDF document, Canopy Extract is not the right tool.
Extract needs the PDF document and an Excel Configuration file
To work, Canopy Extract needs two files:
- PDF document to be extracted (e-PDF is preferred, but paper scans will also work)
- Excel Configuration File (which describes the table to be extracted)

The Extract needs an Excel Configuration File (which describes the table to be extracted)
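
As a minimal sketch of the two-file requirement above, the following pre-flight check confirms that both inputs are present before an extraction is attempted. It is not part of Canopy Extract: the function name `check_inputs` and the accepted configuration extensions (`.xlsx`, `.xls`) are assumptions made for illustration.

```python
from pathlib import Path

# Assumed extensions; the page only says "PDF document" and "Excel Configuration File".
PDF_SUFFIXES = {".pdf"}
CONFIG_SUFFIXES = {".xlsx", ".xls"}


def check_inputs(pdf_path, config_path):
    """Return a list of problems with the two input files (empty if both look fine)."""
    problems = []
    inputs = (
        ("PDF document", pdf_path, PDF_SUFFIXES),
        ("Excel Configuration File", config_path, CONFIG_SUFFIXES),
    )
    for label, raw_path, allowed in inputs:
        path = Path(raw_path)
        if path.suffix.lower() not in allowed:
            # Wrong file type: report it and skip the existence check.
            problems.append(f"{label}: unexpected extension '{path.suffix}'")
        elif not path.is_file():
            problems.append(f"{label}: file not found: {path}")
    return problems


if __name__ == "__main__":
    # Hypothetical file names, used only to show the call.
    for problem in check_inputs("statement.pdf", "holdings_config.xlsx"):
        print(problem)
```

An empty list means both files exist and have the expected extensions. The check says nothing about whether the configuration actually matches the table in the PDF.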
What does a typical PDF document look like?
Multilayer headers and nesting are the main difficulties when extracting data from a PDF table.

Typical table in a Bank Statement
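
To illustrate why multilayer headers complicate extraction, here is a small sketch that collapses stacked header rows into one label per column. It is a generic illustration, not how Canopy Extract handles headers: the function `flatten_headers`, the separator, and the sample header rows are all hypothetical.

```python
from typing import List, Optional


def flatten_headers(header_rows: List[List[Optional[str]]], sep: str = " / ") -> List[str]:
    """Collapse stacked header rows into a single label per column.

    A blank cell in an upper row is treated as the continuation of a merged
    cell, so it inherits the nearest non-blank label to its left.
    """
    if not header_rows:
        return []
    width = len(header_rows[0])
    if any(len(row) != width for row in header_rows):
        raise ValueError("all header rows must have the same number of columns")

    filled = []
    last_index = len(header_rows) - 1
    for i, row in enumerate(header_rows):
        if i == last_index:
            # Bottom row: blanks are genuinely empty, so do not forward-fill.
            filled.append([(cell or "").strip() for cell in row])
            continue
        current = ""
        out = []
        for cell in row:
            text = (cell or "").strip()
            if text:
                current = text
            out.append(current)
        filled.append(out)

    # Join the non-empty parts of each column from top to bottom.
    return [sep.join(part for part in column if part) for column in zip(*filled)]


# Illustrative two-row header, not taken from a real statement.
header_rows = [
    ["Security", "Quantity", "Market Value", ""],
    ["", "", "Local", "USD"],
]
print(flatten_headers(header_rows))
# ['Security', 'Quantity', 'Market Value / Local', 'Market Value / USD']
```

The forward-fill rule is a simplification. It assumes a blank upper cell always belongs to the merged cell on its left, which fails when a single-row column sits to the right of a merged group. That ambiguity is one reason a per-table description of the layout is useful.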
What does an Excel Configuration file look like?
The Excel Configuration file for the above table is given below. Further details are on the page Parts of a Config File.

Excel Configuration file to extract the Holdings table in the image above