Introduction
The Dotprod API is organized around REST principles. Our API has predictable resource-oriented URLs, accepts form-encoded request bodies, returns JSON-encoded responses, and uses standard HTTP response codes, authentication, and verbs.
The API key you use to authenticate a request determines whether that request runs in live mode or test mode.
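As a rough illustration of these conventions, the Python sketch below builds (without sending) a form-encoded request authenticated with an API key. The host `api.example.com`, the bearer-token header, the POST method, and the key value are placeholders assumed for illustration. This page does not specify them, so refer to the API reference for the actual values. The /v1/preview path and the page parameter come from the changelog further down.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: placeholder host; the real base URL is not given on this page.
BASE_URL = "https://api.example.com"


def build_request(api_key: str, path: str, fields: dict[str, str]) -> Request:
    """Build (but do not send) a form-encoded request that expects JSON back."""
    # Form-encode the fields, e.g. {"page": "2"} becomes b"page=2".
    body = urlencode(fields).encode("ascii")
    return Request(
        url=BASE_URL + path,
        data=body,
        method="POST",  # Assumption: the HTTP verb may differ per endpoint.
        headers={
            # Assumption: bearer-token auth; the actual scheme may differ.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


# A test-mode key and a live-mode key would be passed the same way;
# "test_key_123" is a made-up value.
request = build_request("test_key_123", "/v1/preview", {"page": "2"})
```

Passing the resulting object to `urllib.request.urlopen` would send it. The sketch stops before that step because the real host and required fields are not known here.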
Resources
- Read the API reference
- View the current API status (soon)
- Learn about our policies
Support
If you have any questions or need help, contact our support team at support@dotprod.com or visit our support page.
Changelog
Keep up to date with the latest changes and updates to the Dotprod API.
Coming in next release
Implemented enhancements:
- New GPU-based parsing engine (up to +20% accuracy in our test datasets)
v0.7.0 (2024-06-11)
Implemented enhancements:
- Experiment with visually describing files using a Visual Language Model
- New /v1/metadata endpoint to easily extract relevant information (MIME type, creation date, number of pages)
- Add a page parameter to the /v1/preview endpoint, for example to generate the preview of specific slides
v0.6.0 (2024-05-27)
Implemented enhancements:
- Rewrite our whole parsing engine, improving its speed by 5x
- Improve the markdown conversion with better font metric estimations
- Support nested figures in PDF files
Fixed bugs:
- Same-level titles are now correctly detected
- Handle edge cases for ill-formatted dates produced by old PDF generators
v0.5.0 (2024-04-29)
Implemented enhancements:
- Improve the title detection
- Support rotated files
- Support Microsoft Word .doc files
- Fail early with a 503 status code when we are at capacity
- Implement a detection filter for scanned images to increase the parsing speed at 99% recall
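Since the API can now answer with a 503 when it is at capacity, clients may want to retry such requests. The Python sketch below shows one common approach, exponential backoff with jitter. It is not an official client: the helper name `call_with_backoff`, the attempt count, and the delays are all assumptions, and `send` stands in for whatever function performs the actual HTTP call and returns a status code and body.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-503 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = send()
    for attempt in range(max_attempts - 1):
        if status != 503:
            break
        # Delay doubles on each retry; the random jitter spreads out
        # clients that were all rejected at the same moment.
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay / 2))
        status, body = send()
    # Either a non-503 response, or the last 503 once attempts are exhausted.
    return status, body
```

The `sleep` parameter is injectable so that the retry logic can be exercised in tests without waiting. Whether the API also returns a Retry-After header is not stated here. If it does, honoring that value would be preferable to a computed delay.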
v0.4.0 (2024-04-19)
Implemented enhancements:
- Implement a custom ligature resolution algorithm
- Increase the variety of files in our test datasets
v0.3.0 (2024-04-17)
Implemented enhancements:
- Support reading PDFs with text inside figure constructs, usually old PDF files generated by EvoPDF
- Track various processing times to measure our improvements
- Experiment with processing PDF pages in parallel
Fixed bugs:
- Low-resolution images in PDF files used to be upscaled to A4 sizes, slowing down the processing
- Reading order was broken with overlapping text lines
v0.2.0 (2024-04-12)
Implemented enhancements:
- Optimize our reading order algorithm to run in less than 20ms per page
v0.1.0 (2024-04-10)
First commit:
- Turn our Documentalist parsing engine into its own API.