Machine learning depends on the comparative analysis of as much data as possible, of as high a quality as possible.
On this basis, the automatic recognition of printed texts (OCR) has improved significantly in recent years. Machine reading of handwritten texts (HCR), however, is still in its infancy. What is lacking is training data: correct transcriptions paired with the corresponding image sections of individual lines and words.
Source editions and indexing projects transcribe larger text corpora, while palaeography courses and individual local and family history studies work with heterogeneous handwritten documents. The Transkribus platform will collect the files and data from such work that are required for the further development of HCR, and will generate text recognition models for individual languages and comparable writing styles.
In addition, the platform offers many useful functions for reading and editing a wide range of documents.