The Parsed Old and Middle Irish Corpus (POMIC)
The Parsed Old and Middle Irish Corpus (POMIC) is a corpus of Irish texts spanning the years from c. 700 to c. 1100. The current beta version of the corpus consists of 14 texts, which have been POS-tagged and syntactically parsed. The corpus is, however, a work in progress: future additions are envisioned that will include texts written at the end of the Middle Irish period (up to around 1200) as well as very early legal material that may, in some cases, be dated to the 7th century.
Tag-set and Parsing Based on Penn Corpora
The tag-set and parsing annotation adopted for POMIC are intended to be broadly compatible with the Penn group of corpora (see for instance the corpora of historical English data, the corpus of historical Icelandic data, the corpus of historical Portuguese data, the corpus of historical German data, and the corpus of historical Greek data, among others).
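Corpora in the Penn family store each sentence as a labelled bracketing, so files of this kind can be read with very little code. The following is a minimal sketch and not part of the POMIC distribution. It assumes that the .psd files follow the usual Penn layout; the function names are hypothetical, and the sample tree uses placeholder words and generic Penn-style labels (IP-MAT, NP-SBJ, N, VB, ID) chosen for illustration, which should be checked against the POMIC manual.

```python
def parse_trees(text):
    """Parse labelled bracketing into nested lists, one per top-level tree."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    stack = [[]]  # stack[0] collects the finished top-level trees
    for token in tokens:
        if token == "(":
            stack.append([])
        elif token == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def leaves(node):
    """Yield (label, word) pairs for every terminal node, left to right."""
    if len(node) == 2 and all(isinstance(part, str) for part in node):
        yield (node[0], node[1])
        return
    for child in node:
        if isinstance(child, list):
            yield from leaves(child)


# Illustrative tree only: placeholder words, assumed Penn-style labels.
SAMPLE = "( (IP-MAT (NP-SBJ (N word1)) (VB word2)) (ID SAMPLE,1))"

trees = parse_trees(SAMPLE)
pairs = list(leaves(trees[0]))
```

In this sketch the ID node is returned as a terminal pair like any other, so a tag count would normally filter it out first.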
The corpus files may be searched using the corpus query software CorpusSearch, developed by Beth Randall. I include in the download list below a current .jar file, which can be used to run CorpusSearch as follows:
# java -classpath CS_2.003.04.jar csearch/CorpusSearch
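For batch work it can be convenient to build this command from a script. The sketch below is one way to do so in Python. It assumes that CorpusSearch accepts a query file followed by one or more corpus files as arguments; this should be confirmed against the CorpusSearch documentation. The helper names and any file names passed in are hypothetical.

```python
import subprocess

# Jar name as given in the command above; adjust the path to where it was saved.
JAR = "CS_2.003.04.jar"


def corpussearch_command(query_file, corpus_files, jar=JAR):
    """Build the argument list for one CorpusSearch run.

    Assumption: the query file comes first, followed by the corpus files.
    """
    return ["java", "-classpath", jar, "csearch/CorpusSearch",
            query_file, *corpus_files]


def run_corpussearch(query_file, corpus_files, jar=JAR):
    """Run CorpusSearch and raise CalledProcessError if it exits with an error."""
    return subprocess.run(corpussearch_command(query_file, corpus_files, jar),
                          check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.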
The annotation scheme adopted for the corpus is described in the manual (found below in the download list). This manual was developed as an adaptation of the manual (Release 2, 2010) for the Penn Corpora of Historical English written by Beatrice Santorini. The POMIC manual is at present incomplete, but this will be rectified in future updates. I have tried to follow the Penn manual as closely as possible, deviating from it where necessary to show how POMIC differs from the Penn corpora.
To use the corpus, download the following corpus text files (.psd; encoded in Mac OS Roman at present) as well as the current (incomplete) corpus manual (.pdf).
- Corpus files
  - Cambrai Homily
  - Additamenta from the Book of Armagh
  - Lambeth Commentary on the Sermon on the Mount
  - Old Irish Table of Penitential Commutations
  - The Treatise on the Mass
  - The Treatise on the Psalter
  - The West Munster Synod
  - The Monastery of Tallaght
  - The Old Irish Homily
  - The Vision of Laisrén
  - Fingal Rónáin
  - The Story of Finn and Gráinne
  - The Irish prefaces from the Liber Hymnorum
  - The Three Drinking Horns of Cormac úa Cuinn
- Corpus Manual
  - Annotation Manual
- CorpusSearch file
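Because the .psd files are at present encoded in Mac OS Roman, accented characters may display incorrectly in tools that expect UTF-8. The following minimal sketch re-encodes a file using Python's built-in mac_roman codec. The function name and the file names in the usage comment are hypothetical, not the names of the downloadable files.

```python
from pathlib import Path


def convert_to_utf8(src, dst):
    """Read a Mac OS Roman text file and write a UTF-8 copy of it.

    Returns the decoded text so the caller can inspect it.
    """
    text = Path(src).read_text(encoding="mac_roman")
    Path(dst).write_text(text, encoding="utf-8")
    return text


# Hypothetical usage:
# convert_to_utf8("some_text.psd", "some_text.utf8.psd")
```

Writing to a separate output file leaves the original download untouched, which is useful if a tool turns out to expect the original encoding.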
I would like to thank Beatrix Färber (UCC, History Dept.) and Dr Hugh Fogarty (formerly of UCD, editor of TLH) for allowing me to use some of the texts from the CELT and TLH databases respectively in order to create POS-tagged and syntactically parsed versions. I would also like to thank Professor Liam Breatnach of the School of Celtic Studies for guidance in various matters relating to the corpus.
This work was carried out while I held an O’Donovan Scholarship at the School of Celtic Studies in the Dublin Institute for Advanced Studies (2011–2014).
If using this corpus for research purposes, please cite it as:
Lash, Elliott. 2014. The Parsed Old and Middle Irish Corpus (POMIC). Version 0.1. https://www.dias.ie/index.php?option=com_content&view=article&id=6586&Itemid=224&lang=en
In order to improve the corpus, I welcome any email sent to the following address:
Elliott Lash: email@example.com