Methodology and Limitations
Last updated: April 2, 2026
How extraction works
- PDFs are processed to recover readable document structure.
- OCR is applied to scanned or image-based pages when it is available.
- Output is generated in markdown, plain text, and JSON formats.
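As an illustration of the three output formats listed above, here is a minimal sketch that renders one recovered structure as markdown, plain text, and JSON. The `Block` type, its `kind` values, and the function names are hypothetical, chosen only for this example; they do not describe the service's internal data model.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Block:
    """Hypothetical unit of recovered structure (not the service's real schema)."""
    kind: str  # assumed values: "heading" or "paragraph"
    text: str


def to_markdown(blocks):
    # Headings get a leading "# "; paragraphs are emitted unchanged.
    parts = [f"# {b.text}" if b.kind == "heading" else b.text for b in blocks]
    return "\n\n".join(parts)


def to_plain_text(blocks):
    # Plain text keeps the words and drops the structural markers.
    return "\n\n".join(b.text for b in blocks)


def to_json(blocks):
    # JSON keeps both the text and the block type for downstream tools.
    return json.dumps([asdict(b) for b in blocks], indent=2)


blocks = [
    Block("heading", "Invoice"),
    Block("paragraph", "Total due: 40 EUR"),
]

markdown_output = to_markdown(blocks)
text_output = to_plain_text(blocks)
json_output = to_json(blocks)
```

The same list of blocks feeds all three renderers, so the formats differ only in how much structure they preserve.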
Known limitations
- Low-resolution scans and skewed pages can reduce OCR accuracy.
- Tables with complex merged cells may require manual cleanup.
- Highly stylized forms may lose some visual context in text output.
Data handling policy
- Uploaded input files are deleted after processing completes.
- Generated outputs are retained for 24 hours, then purged automatically.
- This page summarizes behavior at the time of the last update above.
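The 24-hour retention rule for generated outputs can be sketched as a simple age check. This is an assumption-laden illustration, not the service's purge job: the names `RETENTION`, `is_expired`, and `purge_expired` are hypothetical, and treating an output as expired at exactly 24 hours is a choice made for this example.

```python
from datetime import datetime, timedelta, timezone

# Retention window taken from the policy text above.
RETENTION = timedelta(hours=24)


def is_expired(created_at, now):
    # Assumption: an output counts as expired once it is 24 hours old or older.
    return now - created_at >= RETENTION


def purge_expired(outputs, now):
    """Return only the outputs still inside the retention window.

    `outputs` maps a hypothetical output name to its creation time (UTC).
    """
    return {name: ts for name, ts in outputs.items() if not is_expired(ts, now)}


now = datetime(2026, 4, 2, 12, 0, tzinfo=timezone.utc)
outputs = {
    "report.json": now - timedelta(hours=25),  # past the window
    "report.md": now - timedelta(hours=1),     # still retained
}
remaining = purge_expired(outputs, now)
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check easy to test with fixed timestamps.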