Archive for October, 2011

the ‘real’ Jane Austen, textual study, & historical linguistics

Posted in Jane Austen, Linguistics on October 20th, 2011 by sschlitz – Comments Off

A Jane-ite trawling the internet would be confronted with an unimaginably large number of Austen biographies, fan sites, images, and editions – as well as an array of films and adaptations and even wilder extensions of the Victorian moral canon (in Google-ese: a search for Jane Austen returns about 22,400,000 results in 0.14 seconds).

While potentially tough to navigate, these many dueling Janes offer excellent fodder for serious discussion of digital editions and digital representations (esp. of primary source texts) and of digital literacy in general: the kind of discussion that treats editions and digital representations as subjects for dissection, and the kind that applies to textual studies and historical linguistics across genres and across historical periods. (And it’s Austen, so it’s fun.)

For instance, the British Library publishes a beautiful, interactive, virtual version of Austen’s Volume II. The images are vivid; the pages are turnable; and if you’re unsure about the handwriting displayed on any page within the text, you can click the audio or text button to access an orally delivered or transcribed version. A wonderful resource – one I love to peruse and to share… but not one I would recommend (at least not in isolation) for serious literary or linguistic study.

When the British Library Volume II is assayed against the Jane Austen Fiction Manuscripts (JAFM) edition of Volume II, students see almost immediately that the British Library version is incomplete, displaying only Austen’s The History of England, not the complete set of writings she compiled within the vellum binding. And when they assess an identical page from each version, they quickly note disparities in pagination, which in the British Library edition is inaccurate (albeit deliberately & I think for end-user utility) and therefore misleading. The British Library transcriptions are undoubtedly meant to be helpful, but they’re unmistakably editorial: they fail to correspond line by line with the original, to represent line-end hyphens, and to represent the insertions, deletions, underscores (and so on) which punctuate Austen’s hand (and on this point, they work against end-user utility).

A particularly astute student of mine yesterday pointed out (happily, before I even needed to) the presence of the Head Note in JAFM, noting that it detailed critical textual attributes such as size and provenance, details, she suggested, which would matter in understanding the language, the text, and the author. Excellent. But stuff you won’t find in most Austen versions…

To be fair, the intentions of the British Library and JAFM versions are distinct. And the access the British Library provides to Austen as well as Carroll, and Blake, and Mozart – freely – is invaluable. Still, the scholarly utility of the British Library’s virtual books is finite, and I wonder if it needn’t have been.

The in-class dissection continues in the next few days as I introduce a Project Gutenberg version of Austen’s Persuasion and we compare it to the JAFM edition, and as we press on to distinguish our synchronic study of Austen from the kind of diachronic analysis enabled by historical corpora, where again, consideration of a source’s scholarly intentions and its utility is critical (and where again, given the technology available to editors, I’m not so sure creating editions and corpora as distinct entities with distinct ends is necessary).

An in-depth technical how of JAFM and similar sources is tough to cover in a history of English class, but a little of the how beside a lot of the why is important, since our understanding of virtually every period in English’s pre-twentieth-century linguistic and literary history relies on text, and our understanding is only as good as our texts.

txs virginia (x2) and linguistic visualization

Posted in Uncategorized on October 18th, 2011 by sschlitz – Comments Off

just wrapped up assisting EAPSU (entirely new to me but great to work with) members with their fall conference, which was held at Bloomsburg. This year’s conference theme was English in the Digital Age: Developments in Language, Literacy, and Literature. Among the highlights were the number of undergraduates who not only attended but also weighed in during the Q/A, and Jerome McGann’s keynote and provocative follow-up discussion, which left me thinking at length about borderlands, digital object repositories, and bottom-up institutional change (something José Cruz alluded to in his data-rich & highly illuminating talk at Bloomsburg earlier in the week).

this spring we’ll be hosting another amazing guest from Virginia: Rebecca Wheeler… a scholar whose work – like McGann’s – is foundational, resonates deeply across humanities disciplines, and has profound significance for teachers and researchers and students.

finally – I use Mark Davies’ corpora as often as I can in my linguistics class. I’ve just been prepping to again integrate COHA within my history of the English language course and noted the incredibly awesome visualizations developed, using COHA data, by Martin Hilpert. These interactive graphs track frequency and morphosyntactic change – very, very cool.

digital archives and historical linguistics

Posted in Digital Archive, Linguistics on October 6th, 2011 by sschlitz – Comments Off

It is challenging to write about my work while I’m in the midst of it, since proposals, abstracts, and indeed project tasks and oversight require tremendous amounts of time and attention… Stealing away to write for a moment, however, makes a lot of sense just now as I immerse myself in the complementary tasks of teaching and doing linguistics while developing a digital archive.

I’m trained formally as a historical linguist and maintain a primary interest in understanding how and why language changes. But to describe and understand language change, historical linguists need text, lots of text. And, to do our work well, we eschew normalized, edited (prettified) texts, the kinds that are designed to promote readability (and/or to sell copies) – but often at the expense of textual preservation.

We seek out digital representations of texts as the texts were written – orthographic, morphological, lexical anomalies and all. Such texts, particularly in the form of parallel collections or parallel corpora, enable us to study (using computational methods) patterns of variation (orthographic, morphological, lexical, phonological) and to analyze and better understand language change.
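To make "computational methods" a little more concrete, here is a minimal sketch in Python of how one might count competing spellings in a text as written versus a normalized edition of the same text. Everything in it is an assumption made for illustration: the `VARIANTS` table, the function names, and the sample sentences are invented, and nothing here is drawn from JAFM, COHA, or any actual corpus tool. (The spelling "freindship" is borrowed from the title of Austen's Love and Freindship; the sentences themselves are not quotations.)

```python
import re
from collections import Counter

# Hypothetical variant table: each attested spelling maps to a modern headword.
VARIANTS = {
    "freindship": "friendship",
    "friendship": "friendship",
    "beleive": "believe",
    "believe": "believe",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def variant_counts(text, variants=VARIANTS):
    """Count each attested spelling, grouped under its headword."""
    counts = {}
    for token in tokenize(text):
        headword = variants.get(token)
        if headword is not None:
            counts.setdefault(headword, Counter())[token] += 1
    return counts


def variant_share(counts, headword, spelling):
    """Return the proportion of a headword's tokens that use one spelling."""
    total = sum(counts.get(headword, Counter()).values())
    return counts[headword][spelling] / total if total else 0.0


# Invented sample lines, not quotations from any manuscript or edition.
unnormalized = (
    "Their freindship grew; I beleive such freindship rare. "
    "True friendship lasts."
)
normalized = (
    "Their friendship grew; I believe such friendship rare. "
    "True friendship lasts."
)

as_written = variant_counts(unnormalized)
as_edited = variant_counts(normalized)

# Share of 'friendship' tokens spelled 'freindship' in each representation.
print(variant_share(as_written, "friendship", "freindship"))
print(variant_share(as_edited, "friendship", "freindship"))
```

In the text as written, two of the three "friendship" tokens carry the older spelling; in the normalized version the variation disappears entirely, and with it the evidence. A real study would of course need a far larger variant table (or a way to derive one) and real, faithfully transcribed texts.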

The days of the index card and pencil method have long passed (okay, it’s only been thirty years or so, but that’s thirty in an era of Moore’s Law). Computational analysis has altered our ability to analyze linguistic data, our methods, and our findings. And it has exciting implications for socio-historical linguistic study as well.

But we cannot do this work well unless we have digital representations of texts as the texts were written (and here I use representation deliberately because a transcription or facsimile is not a primary source document in the strict sense; it is a representation of a text, and this is a critical distinction). Our methods for digital representation continue to improve, and we’re ever closer to advancing Matthew Driscoll’s ‘everything but the smell’ model of text editing to include the smell…

Yet even with some cool new developments that enable us to leverage the aid of extra hands, this work is as time- and labor-intensive as it is invaluable.

My own role is in creating such resources: a digital archive that grants priority to textual fidelity, facilitates artefactual philology and historical linguistics, and disseminates an important cultural heritage collection is, for me, an ideal hybridization.