Docspell is a Document Management System (DMS), a system that assists in organizing your piles of documents, resulting from scanners, e-mails and other sources, with minimal effort.
newsdev/stevedore: search document dumps: ingest and explore in one extensible framework
Includes GUI and CLI options but, as far as I can see, does not cover more involved setups such as paperless-ng (https://github.com/jonaswinkler/paperless-ng).
Differences from the original Paperless:
https://paperless-ng.readthedocs.io/en/latest/index.html#paperless-ng
Another PDF OCR solution, but this one seems to be more 'plug-and-play' oriented.
You simply install the command and invoke it on the CLI, and it does the OCR and the PDF/A conversion for you.
It is also required for the Nextcloud OCR workflow.
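The note does not name the tool. Assuming it is OCRmyPDF (the `ocrmypdf` command), a minimal sketch of the "install, invoke, done" usage from Python could look like the following. The function names and the default language are hypothetical; the flags are the ones `ocrmypdf` documents for language selection and PDF/A output.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src: Path, dst: Path, language: str = "eng") -> list[str]:
    """Build the argument list for an assumed `ocrmypdf` invocation."""
    return [
        "ocrmypdf",
        "-l", language,            # OCR language passed on to tesseract
        "--output-type", "pdfa",   # ask for PDF/A output
        str(src),
        str(dst),
    ]


def ocr_pdf(src: Path, dst: Path, language: str = "eng") -> None:
    """Run the command, failing early if the tool is not installed."""
    if shutil.which("ocrmypdf") is None:
        raise FileNotFoundError("ocrmypdf is not installed or not on PATH")
    subprocess.run(build_ocr_command(src, dst, language), check=True)
```

Calling `ocr_pdf(Path("scan.pdf"), Path("scan-ocr.pdf"))` would then write the OCR'd file next to the original, assuming the tool is installed.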
Assist in organizing your piles of documents, resulting from scanners, e-mails and other sources with minimal effort. - eikek/docspell
Mirrors Paperless, but comes with more features around OCR (NLP, a learning engine, auto-tagging, etc.), saves both the originals and the PDF versions, can send e-mail, and has a much more advanced web interface, but it also consumes more resources.
Preprocesses (unpaper) and OCRs (tesseract) PDF files and 'sandwiches' the text behind the image; the output is a selectable PDF.
Seems perfect for a PDF pipeline.
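As a rough illustration of the per-page steps described above (clean up with unpaper, then OCR with tesseract into a searchable PDF), here is a minimal sketch that only builds the command lines. It is not the bookmarked tool's own implementation; the function names and the file naming scheme are hypothetical, and rasterizing the input PDF into page images and merging the per-page PDFs are left out.

```python
from pathlib import Path


def unpaper_command(raw_page: Path, cleaned_page: Path) -> list[str]:
    """Clean up a scanned page image (unpaper takes an input and an output file)."""
    return ["unpaper", str(raw_page), str(cleaned_page)]


def tesseract_command(cleaned_page: Path, output_base: Path,
                      language: str = "eng") -> list[str]:
    """OCR a page image into a searchable PDF.

    tesseract appends ".pdf" to output_base itself when the "pdf"
    config is given as the last argument.
    """
    return ["tesseract", str(cleaned_page), str(output_base), "-l", language, "pdf"]


def page_pipeline(raw_page: Path, work_dir: Path,
                  language: str = "eng") -> list[list[str]]:
    """Return the commands for one page, in the order they should run."""
    cleaned = work_dir / f"{raw_page.stem}-clean{raw_page.suffix}"
    output_base = work_dir / raw_page.stem
    return [
        unpaper_command(raw_page, cleaned),
        tesseract_command(cleaned, output_base, language),
    ]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` in order; keeping command construction separate from execution makes the pipeline easy to inspect before anything is run.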
related projects:
Paperless Desktop: A desktop UI for your Paperless installation. Runs on Mac, Linux, and Windows.
ansible-role-paperless: An easy way to get Paperless running via Ansible.