Any legal OCR software that actually works?
Question / Tech Stack Advice
Post
Does anyone have a solid legal OCR software recommendation? I need something that handles formatting well and is HIPAA-compliant. How are you guys automating this?
Top comments · 7
- 8↑ u/MRGWONK: Gemma4:26b local vision model, opendataloader-pdf 2.2.1, PyMuPDF 1.27.1, ocrmypdf 15.2.0, tesseract 5.3.4, pdfminer.six, pypdf, pdf2image, pytesseract, unstructured 0.18.31
- 4↑ u/TraditionalCold8552: I dunno about most of that, but pdf docs is our go-to at all the big law firms I've worked at.
- 4↑ u/dreamlegal_legaltech: The difficult part is not OCR itself anymore. It is preserving formatting, handwriting, tables, and medical record structure reliably at scale while staying compliant.
- 4↑ u/guide_promt_ai: From what I've researched about OCR software with data protection certification in the legal sector, Adobe Acrobat Pro is better for office use or manual operations, while Amazon Textract is better when you want to automate it using code.
- 2↑ u/l5atn00b: Have you tried [https://www.naps2.com/](https://www.naps2.com/)? It's open source. Has worked for me.
- 4↑ u/Motor_Blueberry_4215: Lido
- 2↑ u/opennash: I would test OCR on the documents that usually break it. Use bad scans, stamps, handwriting, tables, exhibits, and rotated pages. The key is not just extracted text. It is whether the system shows source location and lets someone correct errors without losing the record.
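
To make the last point about source location and corrections concrete, here is a minimal Python sketch. Everything in it (`OcrWord`, `Correction`, `OcrRecord`, the default threshold of 80) is a hypothetical illustration, not any vendor's API. It assumes the OCR engine reports a page number, a pixel bounding box, and a 0-100 confidence for each word, which is roughly what Tesseract-style engines expose.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass(frozen=True)
class OcrWord:
    """One recognized word plus where it came from (hypothetical shape)."""
    text: str
    page: int                         # 1-based page number
    bbox: Tuple[int, int, int, int]   # (left, top, right, bottom) in pixels
    confidence: float                 # assumed 0-100 scale


@dataclass(frozen=True)
class Correction:
    """An audit-log entry; the original OcrWord is never modified."""
    word_index: int
    old_text: str
    new_text: str
    corrected_by: str
    corrected_at: str                 # ISO 8601 timestamp, UTC


@dataclass
class OcrRecord:
    words: List[OcrWord]
    corrections: List[Correction] = field(default_factory=list)

    def current_text(self, word_index: int) -> str:
        # The most recent correction wins; otherwise fall back to raw OCR.
        for c in reversed(self.corrections):
            if c.word_index == word_index:
                return c.new_text
        return self.words[word_index].text

    def correct(self, word_index: int, new_text: str, user: str) -> None:
        # Append to the log instead of overwriting, so the record survives.
        self.corrections.append(Correction(
            word_index=word_index,
            old_text=self.current_text(word_index),
            new_text=new_text,
            corrected_by=user,
            corrected_at=datetime.now(timezone.utc).isoformat(),
        ))

    def low_confidence(self, threshold: float = 80.0) -> List[int]:
        # Indices of words a human reviewer should look at first.
        return [i for i, w in enumerate(self.words) if w.confidence < threshold]

    def text(self) -> str:
        return " ".join(self.current_text(i) for i in range(len(self.words)))


record = OcrRecord(words=[
    OcrWord("Patient", 1, (40, 100, 120, 118), 96.0),
    OcrWord("adrnitted", 1, (126, 100, 210, 118), 41.5),  # made-up bad scan
])
for i in record.low_confidence():
    w = record.words[i]
    print(f"review page {w.page} at {w.bbox}: {w.text!r}")
record.correct(1, "admitted", user="reviewer1")
print(record.text())
```

The design choice is that corrections are appended rather than applied in place: the raw OCR output and its bounding box stay available for audit, and a reviewer can always jump from a corrected word back to its location on the scanned page.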