
The Parsed Old and Middle Irish Corpus (POMIC)

Introduction

The Parsed Old and Middle Irish Corpus (POMIC) is a corpus of Irish texts spanning the period from c. 700 to c. 1100. The current beta version of the corpus consists of 14 texts, which have been POS-tagged and syntactically parsed. The corpus is, however, a work in progress, and future additions are envisioned that will include texts written at the end of the Middle Irish period (up to around 1200) as well as very early legal material that may, in some cases, be dated to the 7th century.

Tag-set and Parsing based on Penn Corpora

The tag-set and parsing annotation adopted for POMIC are intended to be broadly compatible with the Penn group of corpora (see, for instance, the corpora of historical English data, the corpus of historical Icelandic data, the corpus of historical Portuguese data, the corpus of historical German data, and the corpus of historical Greek data, among others).
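Since Penn-style corpora store each parsed token as a labelled bracketing, a short sketch may help to show what reading such a file involves. The Python below is only an illustration under stated assumptions: the sample tree is a toy English example in Penn style, not a sentence from POMIC, and the function names (tokenize, parse, leaves) are hypothetical helpers, not part of any corpus tool.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Split bracketed text into '(', ')' and whitespace-free tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: List[str], pos: int = 0):
    """Parse one '(LABEL child ...)' node starting at tokens[pos].

    Returns the node as (label, children) and the index of the next
    unread token. The outer wrapper of a token has an empty label.
    """
    if tokens[pos] != "(":
        raise ValueError(f"expected '(' at token {pos}")
    pos += 1
    label = ""
    if tokens[pos] not in ("(", ")"):
        label = tokens[pos]
        pos += 1
    children = []
    while tokens[pos] != ")":
        if tokens[pos] == "(":
            child, pos = parse(tokens, pos)
        else:
            child, pos = tokens[pos], pos + 1
        children.append(child)
    return (label, children), pos + 1


def leaves(tree) -> List[Tuple[str, str]]:
    """Collect (tag, word) pairs from preterminal nodes, left to right."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [(label, children[0])]
    pairs = []
    for child in children:
        if not isinstance(child, str):
            pairs.extend(leaves(child))
    return pairs


# Toy English example in Penn style; NOT taken from POMIC.
sample = "( (IP-MAT (NP-SBJ (D the) (N king)) (VBD came)) (ID TOY,1.1))"
tree, _ = parse(tokenize(sample))
tagged = leaves(tree)
```

Note that the ID node is returned as a (tag, word) pair along with the words of the clause; a real script would probably filter it out. The tags actually used in POMIC are those described in the corpus manual.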

CorpusSearch

The corpus files may be searched using the corpus query software CorpusSearch, developed by Beth Randall. I include in the download list below a current .jar file, with which CorpusSearch can be run as follows:
# java -classpath CS_2.003.04.jar csearch/CorpusSearch
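For repeated searches it may be convenient to build this command from a script. The following Python sketch assumes that CorpusSearch takes a query file followed by one or more corpus files as arguments, which should be checked against the CorpusSearch documentation; the file names subjects.q and text.psd are hypothetical placeholders, and the helper names are my own.

```python
import subprocess
from typing import List, Sequence


def corpussearch_command(
    jar: str, query_file: str, corpus_files: Sequence[str]
) -> List[str]:
    """Build the java command line for one CorpusSearch run.

    Assumption: the query file comes first, followed by the corpus files.
    """
    return [
        "java", "-classpath", jar, "csearch/CorpusSearch",
        query_file, *corpus_files,
    ]


def run_corpussearch(
    jar: str, query_file: str, corpus_files: Sequence[str]
) -> None:
    """Run CorpusSearch; raises CalledProcessError if java exits non-zero."""
    subprocess.run(
        corpussearch_command(jar, query_file, corpus_files), check=True
    )


# Hypothetical file names, for illustration only.
cmd = corpussearch_command("CS_2.003.04.jar", "subjects.q", ["text.psd"])
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.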

Corpus Manual

The annotation scheme adopted for the corpus is described in the manual (found below in the download list). This manual was developed as an adaptation of the manual (Release 2, 2010) for the Penn Corpora of Historical English written by Beatrice Santorini. The manual for POMIC is at present incomplete, but this will be rectified in future updates. I have tried to follow the Penn manual as closely as possible, deviating from it where necessary to show how POMIC differs from the Penn corpora.

Downloads

In order to use the corpus, you can download the following corpus text files (.psd), which are at present encoded in Mac OS Roman, as well as the current (incomplete) corpus manual (.pdf).
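Because many tools expect UTF-8 rather than Mac OS Roman, users may wish to make re-encoded copies of the .psd files before working with them. The Python sketch below shows one possible way to do this using the standard mac_roman codec; the function names are hypothetical, and it assumes that the files really are in Mac OS Roman throughout, as stated above.

```python
from pathlib import Path


def mac_roman_to_utf8(raw: bytes) -> bytes:
    """Decode Mac OS Roman bytes and re-encode them as UTF-8."""
    return raw.decode("mac_roman").encode("utf-8")


def convert_file(src: Path, dst: Path) -> None:
    """Write a UTF-8 copy of src to dst, leaving the original untouched."""
    dst.write_bytes(mac_roman_to_utf8(src.read_bytes()))
```

Line endings are copied unchanged. It is probably safest to keep the original files and to check whether a given version of CorpusSearch reads the converted copies correctly before relying on them.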

Corpus files
Corpus Manual
Annotation Manual
CorpusSearch file
CS_2.003.04.jar

Acknowledgements

I would like to thank Beatrix Färber (UCC, History Dept.) and Dr Hugh Fogarty (formerly of UCD, editor of TLH) for allowing me to use some of the texts from the CELT and TLH databases respectively in order to create POS-tagged and syntactically parsed versions. I would also like to thank Professor Liam Breatnach of the School of Celtic Studies for guidance in various matters relating to the corpus.

This work was done while holding an O’Donovan Scholarship at the School of Celtic Studies in the Dublin Institute for Advanced Studies (2011–2014).

Citation

If using this corpus for research purposes, please cite it as:

Lash, Elliott. 2014. The Parsed Old and Middle Irish Corpus (POMIC). Version 0.1. https://www.dias.ie/index.php?option=com_content&view=article&id=6586&Itemid=224&lang=en

Contact info

In order to improve the corpus, I welcome any email sent to the following address:

Elliott Lash: eljlash@gmail.com