digital archives and historical linguistics

It is challenging to write about my work while I’m in the midst of it, since proposals, abstracts, and indeed project tasks and oversight require tremendous amounts of time and attention… Stealing away to write for a moment, however, makes a lot of sense just now as I immerse myself in the complementary tasks of teaching and doing linguistics while developing a digital archive.

I’m trained formally as a historical linguist, and my primary interest is in understanding how and why language changes. But for historical linguists to describe and understand language change, we need text, lots of text. And, to do our work well, we eschew normalized, edited (prettified) texts, the kinds that are designed to promote readability (and/or to sell copies), often at the expense of textual preservation.

We seek out digital representations of texts as the texts were written – orthographic, morphological, lexical anomalies and all. Such texts, particularly in the form of parallel collections or parallel corpora, enable us to study (using computational methods) patterns of variation (orthographic, morphological, lexical, phonological) and to analyze and better understand language change.
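To make "patterns of variation" a little more concrete, here is a minimal sketch of the kind of counting a parallel corpus makes possible. It assumes two witnesses of the same passage that have already been aligned token for token; the sample tokens and the function names `variant_pairs` and `char_substitutions` are invented for illustration and do not come from any particular archive or toolkit.

```python
from collections import Counter
from difflib import SequenceMatcher


def variant_pairs(witness_a, witness_b):
    """Count token pairs that differ between two token-aligned witnesses."""
    if len(witness_a) != len(witness_b):
        raise ValueError("witnesses must be aligned token for token")
    return Counter((a, b) for a, b in zip(witness_a, witness_b) if a != b)


def char_substitutions(pairs):
    """Tally the character-level differences inside each differing pair."""
    subs = Counter()
    for (a, b), count in pairs.items():
        matcher = SequenceMatcher(None, a, b)
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op != "equal":
                # Record what witness A has against what witness B has.
                subs[(a[i1:i2], b[j1:j2])] += count
    return subs


# Hypothetical, hand-made tokens standing in for two unnormalized witnesses.
witness_a = "vnto the kynge and vnto the quene".split()
witness_b = "unto the kinge and unto the queene".split()

pairs = variant_pairs(witness_a, witness_b)
subs = char_substitutions(pairs)

for (old, new), count in subs.most_common():
    print(f"{old!r} ~ {new!r}: {count}")
```

A normalized edition would have erased exactly the differences this tallies, which is why the unedited spellings matter. Real collections would also need a proper alignment step before any counting, which this sketch takes for granted.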

The days of the index card and pencil method have long passed (okay, it’s only been thirty years or so, but that’s thirty in an era of Moore’s Law). Computational analysis has changed how we analyze linguistic data, reshaping both our methods and our findings. And it has exciting implications for socio-historical linguistic study as well.

But we cannot do this work well unless we have digital representations of texts as the texts were written. (Here I use “representation” deliberately: a transcription or facsimile is not a primary source document in the strict sense; it is a representation of a text, and this is a critical distinction.) Our methods for digital representation continue to improve, and we’re ever closer to advancing Matthew Driscoll’s ‘everything but the smell’ model of text editing to include the smell…

Yet even with some cool new developments that enable us to leverage the aid of extra hands, this work is as time- and labor-intensive as it is invaluable.

My own role in creating resources such as a digital archive, one that grants priority to textual fidelity, facilitates artefactual philology and historical linguistics, and disseminates an important cultural heritage collection, is for me an ideal hybridization.