Camelot: PDF Table Extraction for Humans
Camelot is a Python library that makes it easy for anyone to extract tables from PDF files!

Note: You can also check out Excalibur, a web interface for Camelot!

Here's how you can extract tables from PDF files. Check out the PDF used in this example here.
```pycon
>>> import camelot
>>> tables = camelot.read_pdf('foo.pdf')
>>> tables
>>> tables.export('foo.csv', f='csv', compress=True) # json, excel, html, sqlite
>>> tables[0]
>>> tables[0].parsing_report
{
    'accuracy': 99.02,
    'whitespace': 12.24,
    'order': 1,
    'page': 1
}
>>> tables[0].to_csv('foo.csv') # to_json, to_excel, to_html, to_sqlite
>>> tables[0].df # get a pandas DataFrame!
```
| Cycle Name | KI (1/km) | Distance (mi) | Percent Fuel Savings: Improved Speed | Percent Fuel Savings: Decreased Accel | Percent Fuel Savings: Eliminate Stops | Percent Fuel Savings: Decreased Idle |
|---|---|---|---|---|---|---|
| 2012_2 | 3.30 | 1.3 | 5.9% | 9.5% | 29.2% | 17.4% |
| 2145_1 | 0.68 | 11.2 | 2.4% | 0.1% | 9.5% | 2.7% |
| 4234_1 | 0.59 | 58.7 | 8.5% | 1.3% | 8.5% | 3.3% |
| 2032_2 | 0.17 | 57.8 | 21.7% | 0.3% | 2.7% | 1.2% |
| 4171_1 | 0.07 | 173.9 | 58.1% | 1.6% | 2.1% | 0.5% |
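Each table's `parsing_report` can be used to screen out poor extractions before exporting them. The sketch below is a hypothetical helper, `select_reliable`, that keeps only tables whose `accuracy` is at least one threshold and whose `whitespace` is at most another. Both default thresholds are arbitrary assumptions, not values recommended by Camelot; the helper relies only on each table exposing a `parsing_report` dict with those two keys, as in the session above.

```python
from typing import Any, Iterable, List


def select_reliable(
    tables: Iterable[Any],
    min_accuracy: float = 90.0,
    max_whitespace: float = 50.0,
) -> List[Any]:
    """Return the tables whose parsing report clears both thresholds.

    Hypothetical helper: the default thresholds are illustrative assumptions,
    not Camelot defaults. Each table only needs a `parsing_report` dict with
    'accuracy' and 'whitespace' keys (both percentages).
    """
    selected = []
    for table in tables:
        report = table.parsing_report
        # Higher accuracy is better; a high whitespace share suggests
        # a sparse or badly split table.
        if report["accuracy"] >= min_accuracy and report["whitespace"] <= max_whitespace:
            selected.append(table)
    return selected
```

With Camelot installed, `select_reliable(camelot.read_pdf('foo.pdf'))` would return a plain list of the tables that pass; the example table above (accuracy 99.02, whitespace 12.24) would be kept with these defaults.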
There's a command-line interface too!

Note: Camelot only works with text-based PDFs, not scanned documents. (As Tabula explains, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based.")
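As a rough sketch of the command-line interface, the hypothetical helper below assembles the argument list for a call such as `camelot --format csv --output foo.csv lattice foo.pdf`: global options first, then the parsing flavor (`lattice` or `stream`), then the PDF path. The option names (`--format`, `--output`, `--pages`) are assumed from Camelot's CLI documentation; run `camelot --help` to confirm what your installed version supports.

```python
from typing import List, Optional


def build_camelot_command(
    pdf_path: str,
    output: str,
    fmt: str = "csv",
    flavor: str = "lattice",
    pages: Optional[str] = None,
) -> List[str]:
    """Assemble a camelot CLI invocation as an argument list.

    Hypothetical helper: global options come before the flavor subcommand,
    which is followed by the PDF path. `pages` is a string such as "1,3".
    """
    if flavor not in ("lattice", "stream"):
        raise ValueError(f"unknown flavor: {flavor!r}")
    cmd = ["camelot", "--format", fmt, "--output", output]
    if pages is not None:
        cmd += ["--pages", pages]
    cmd += [flavor, pdf_path]
    return cmd
```

Passing the result to `subprocess.run(cmd, check=True)` would run the extraction, assuming the `camelot` script is on your `PATH`. Building a list rather than a shell string avoids quoting problems with file paths that contain spaces.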
The easiest way to install Camelot is with conda, a package and environment manager for the Anaconda distribution.
$ conda install -c conda-forge camelot-py
Alternatively, after installing the dependencies (Tk and Ghostscript), you can use pip to install Camelot (the quotes stop shells such as zsh from interpreting the square brackets):
$ pip install "camelot-py[cv]"
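Before running pip, a quick way to see whether the two dependencies are reachable from Python is sketched below. It assumes Ghostscript's executable is named `gs` (or `gswin64c`/`gswin32c` on Windows) and is on `PATH`, and that Tk is exposed through the standard `tkinter` module. It only checks availability; it doesn't prove Camelot will be able to use them.

```python
import shutil
import sys
from typing import Dict

# Assumed executable names: 'gs' on Linux/macOS; on Windows the name
# depends on whether the 64- or 32-bit Ghostscript build is installed.
if sys.platform == "win32":
    GHOSTSCRIPT_NAMES = ["gswin64c", "gswin32c"]
else:
    GHOSTSCRIPT_NAMES = ["gs"]


def has_ghostscript() -> bool:
    """True if any of the expected Ghostscript executables is on PATH."""
    return any(shutil.which(name) is not None for name in GHOSTSCRIPT_NAMES)


def has_tk() -> bool:
    """True if the standard-library Tk binding (tkinter) can be imported."""
    try:
        import tkinter  # noqa: F401  (importing it loads the _tkinter extension)
    except ImportError:
        return False
    return True


def check_dependencies() -> Dict[str, bool]:
    """Summarise which of Camelot's listed dependencies look available."""
    return {"ghostscript": has_ghostscript(), "tk": has_tk()}


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```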
To install from the source code instead, first install the dependencies and then clone the repo:
$ git clone https://www.github.com/camelot-dev/camelot
Then install Camelot using pip:
$ cd camelot
$ pip install ".[cv]"
The Contributor's Guide has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.
You can check the latest sources with:
$ git clone https://www.github.com/camelot-dev/camelot
You can install the development dependencies easily, using pip:
$ pip install "camelot-py[dev]"
After installation, you can run tests using:
$ python setup.py test
Camelot uses Semantic Versioning. For the available versions, see the tags on this repository. For the changelog, you can check out HISTORY.md.
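Because Camelot follows Semantic Versioning, a project that depends on it can check the installed version with the standard library. The sketch below is hypothetical: `parse_semver` accepts `MAJOR.MINOR.PATCH` with an optional leading `v` (as in a git tag) and discards pre-release and build suffixes, but does not handle PEP 440 forms such as `1.0.0rc1`. `is_compatible` treats a change in MAJOR as breaking, and also a change in MINOR while MAJOR is 0, since SemVer makes no stability promise for `0.y.z` releases. The distribution name `camelot-py` is the one used in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

Version = Tuple[int, int, int]


def parse_semver(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH', allowing a leading 'v' as in a git tag.

    Pre-release ('-rc.1') and build ('+build5') suffixes are discarded.
    """
    core = text.strip().lstrip("v").split("+")[0].split("-")[0]
    parts = core.split(".")
    if len(parts) < 3:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    major, minor, patch = (int(p) for p in parts[:3])
    return major, minor, patch


def installed_camelot_version() -> Optional[Version]:
    """Installed camelot-py version, or None if the distribution is not installed."""
    try:
        return parse_semver(version("camelot-py"))
    except PackageNotFoundError:
        return None


def is_compatible(installed: Version, required: Version) -> bool:
    """Whether `installed` should satisfy code written against `required`."""
    if installed[0] != required[0]:
        return False  # a MAJOR bump signals breaking changes
    if installed[0] == 0 and installed[1] != required[1]:
        return False  # SemVer makes no stability promise for 0.y.z
    return installed >= required
```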
This project is licensed under the MIT License, see the LICENSE file for details.