CLARIAH+ VOC Use Case
The repositories listed here are all work in progress.
Preprocessing for NER
The following repositories contain code to preprocess data for the CLARIAH WP6 VOC use case, in particular for named-entity recognition (NER) annotation.
- voc-clariah-scripts: a first set of scripts for text and index extraction from VOC pre-TEI missives
- TeiReader: extraction of TEI texts to raw text for the Chronicles and CLARIAH-PLUS VOC use case projects
- voc-missives: extraction of TEI-formatted missives to NAF and CoNLL, and integration of manually annotated entities
- voc-missives-data: companion data repository to voc-missives, containing the Generale Missiven corpus and manual named-entity annotations
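To give a rough idea of what this kind of preprocessing involves, here is a minimal sketch of pulling paragraph text out of a TEI document and writing it as token-per-line, CoNLL-style output. It is not taken from any of the repositories above: the function names `tei_paragraphs` and `to_conll`, the whitespace tokenisation, the placeholder `O` label, and the sample sentence are all assumptions made for illustration. The real pipelines may tokenise and structure their output quite differently.

```python
import xml.etree.ElementTree as ET

# Standard TEI P5 namespace; assumes the input declares it.
TEI = "{http://www.tei-c.org/ns/1.0}"


def tei_paragraphs(tei_xml: str) -> list[str]:
    """Return the text of each <p> inside <text>, skipping the teiHeader."""
    root = ET.fromstring(tei_xml)
    text_el = root.find(f".//{TEI}text")
    if text_el is None:
        return []
    paragraphs = []
    for p in text_el.iter(f"{TEI}p"):
        # itertext() also collects text nested in inline elements such as <hi>.
        flat = " ".join("".join(p.itertext()).split())
        if flat:
            paragraphs.append(flat)
    return paragraphs


def to_conll(paragraphs: list[str]) -> str:
    """One token per line with a placeholder 'O' label; blank line between paragraphs."""
    lines = []
    for paragraph in paragraphs:
        # Naive whitespace tokenisation (an assumption, punctuation stays attached).
        for token in paragraph.split():
            lines.append(f"{token}\tO")
        lines.append("")
    return "\n".join(lines)


# Hypothetical sample document, invented for this example.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt>
  <publicationStmt><p>Header text</p></publicationStmt></fileDesc></teiHeader>
  <text><body><p>Van Diemen schrijft uit <hi>Batavia</hi>.</p></body></text>
</TEI>"""

if __name__ == "__main__":
    print(to_conll(tei_paragraphs(SAMPLE)))
```

Restricting the search to the `text` element keeps header metadata out of the extracted text. Manual entity annotations would then replace the `O` placeholders in a later step.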
Entity identification
- entity-identification-from-scratch: entity identification by clustering