Optical character recognition

Tesseract OCR - Open source OCR engine (main repo).

Tesseract.js - JavaScript library that gets words in almost any language out of images.

keras-ocr - Packaged and flexible version of the CRAFT text detector and Keras CRNN recognition model.

Awesome Scanning - Curated list of awesome projects to simplify and improve paper scanning.

Scale Document - Secure platform for document processing.

EasyOCR - Ready-to-use OCR with 40+ languages supported, including Chinese, Japanese, Korean and Thai. (HN)

OCRmyPDF - Adds an OCR text layer to scanned PDF files, allowing them to be searched.

FilingDB - Database of extracted and structured text from European company filings. Optimised for quant investors.

InvoiceNet - Deep neural network to extract intelligent information from invoice documents.

PaddleOCR - Rich, leading, and practical OCR tools that help users train better models and apply them in practice.

TextRecognitionDataGenerator - Synthetic data generator for text recognition.

Paperless - Index and archive all of your scanned paper documents.

Links