Introduction

The Dotprod API is organized around REST principles. Our API has predictable resource-oriented URLs, accepts form-encoded request bodies, returns JSON-encoded responses, and uses standard HTTP response codes, authentication, and verbs.

The API key you use to authenticate a request determines whether that request runs in live mode or test mode.
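As an illustration of the conventions above, here is a minimal sketch of a form-encoded, authenticated request built with Python's standard library. Only the form encoding, the JSON responses, and the /v1/metadata path come from this documentation. The host name, the Bearer authorization scheme, the POST verb, the `url` form field, and the key value are placeholders assumed for the example, so check them against your account settings before use.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions (not confirmed by the Dotprod docs): the host, the Bearer
# scheme, the POST verb, and the "url" form field are placeholders.
BASE_URL = "https://api.dotprod.example"


def build_request(api_key: str, path: str, fields: dict[str, str]) -> Request:
    """Build (but do not send) a form-encoded POST request."""
    # urlencode percent-encodes the fields, so the result is plain ASCII.
    body = urlencode(fields).encode("ascii")
    return Request(
        url=BASE_URL + path,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
    )


# A test-mode key and a live-mode key would be passed the same way;
# only the key itself selects the mode.
request = build_request(
    "YOUR_TEST_KEY",
    "/v1/metadata",
    {"url": "https://example.com/report.pdf"},
)
```

Passing `request` to `urllib.request.urlopen` would send it. The sketch stops short of that so it can be inspected without network access.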

Dotprod manages its own set of AI models on its Paris cluster, hosted on Scaleway. This allows us to offer a high quality of service and the highest level of privacy for our clients.

Resources


Support

If you have any questions or need help, feel free to reach out to our support team at support@dotprod.com or visit our support page.


Changelog

Keep up to date with the latest changes and updates to the Dotprod API.

Coming in next release

Implemented enhancements:

  • New GPU-based parsing engine (up to 20% higher accuracy on our test datasets)

v0.7.0 (2024-06-11)

Implemented enhancements:

  • Experiment with visually describing files using a Visual Language Model
  • New /v1/metadata endpoint to easily extract relevant information (mime type, creation date, number of pages)
  • Add a page parameter to the /v1/preview endpoint, for example to generate the preview of a specific slide
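The optional page parameter can be sketched as follows. Only the `page` name and the /v1/preview path come from the changelog. The `url` field, the form-encoded placement of `page`, and its numbering (0-based or 1-based) are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def preview_body(file_url: str, page: Optional[int] = None) -> bytes:
    """Encode the form body for a hypothetical /v1/preview call."""
    # "url" is an assumed field name; "page" is the parameter from the changelog.
    fields = {"url": file_url}
    if page is not None:
        # Whether pages are counted from 0 or 1 is not stated in the changelog.
        fields["page"] = str(page)
    return urlencode(fields).encode("ascii")


# Without a page: leave the choice of page to the API.
body_default = preview_body("https://example.com/deck.pptx")
# With a page: ask for the preview of one specific slide.
body_slide = preview_body("https://example.com/deck.pptx", page=3)
```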

v0.6.0 (2024-05-27)

Implemented enhancements:

  • Rewrite our whole parsing engine, making it 5x faster
  • Improve the markdown conversion with better font metric estimations
  • Support nested figures in PDF files

Fixed bugs:

  • Same-level titles are now correctly detected
  • Handle edge cases for malformed dates written by old PDF generators

v0.5.0 (2024-04-29)

Implemented enhancements:

  • Improve the title detection
  • Support rotated files
  • Support Microsoft Word .doc files
  • Fail early with a 503 status code when we are at capacity
  • Implement a detection filter for scanned images to increase the parsing speed at 99% recall
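Because the API fails early with a 503 when it is at capacity, a client may want to retry after a short pause. The sketch below shows one common way to do that, exponential backoff. It is a client-side suggestion, not something the Dotprod API prescribes: the `call_with_backoff` helper, its defaults, and the `send` callable (any function that performs the request and returns the HTTP status code) are hypothetical.

```python
import time
from typing import Callable


def call_with_backoff(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a status other than 503 or attempts run out."""
    status = send()
    for attempt in range(1, max_attempts):
        if status != 503:
            break
        # Double the wait after every failed attempt: 0.5s, 1s, 2s, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status = send()
    return status


# Demo with a fake sender that is at capacity twice, then succeeds.
responses = iter([503, 503, 200])
delays: list[float] = []
final_status = call_with_backoff(lambda: next(responses), sleep=delays.append)
```

Injecting `sleep` keeps the helper easy to test. In production code the default `time.sleep` applies, and adding random jitter to the delay is a common refinement.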

v0.4.0 (2024-04-19)

Implemented enhancements:

  • Implement a custom ligature resolution algorithm
  • Increase the variety of files in our test datasets

v0.3.0 (2024-04-17)

Implemented enhancements:

  • Support reading PDFs with text inside figure constructs, usually old PDF files generated by EvoPDF
  • Track various processing times to measure our improvements
  • Experiment with processing PDF pages in parallel

Fixed bugs:

  • Low-resolution images in PDF files used to be upscaled to A4 sizes, slowing down the processing
  • Reading order was broken with overlapping text lines

v0.2.0 (2024-04-12)

Implemented enhancements:

  • Optimize our reading order algorithm to run in less than 20ms per page

v0.1.0 (2024-04-10)

First commit:

  • Turn our Documentalist parsing engine into its own API.