Digital Archive

janus

Posted in Digital Archive, Digital Humanities, MBDA on January 18th, 2012 by sschlitz

Endings and Beginnings. January is tough. It hasn’t snowed enough yet for us to test our skis. The (packed, productive, working) holiday is over. And my spring term to-do list is HUGE. This semester the MBDA project is pushing to advance our work with crowdsourced editing. Below is a brief excerpt from a recent grant application (funding outcome pending) that sketches what we’re up to. Priority this term: serious work on the metadata editing aspect (for MBDA this mainly means Dublin Core) of our participatory editing tool, a project innovation so essential yet cool I can hardly wait to test it!

Humanities Problem

Placing the preservation of works that embody a key component of our cultural heritage exclusively in the hands of a few highly specialized scholars is inefficient and impractical. And it is unnecessary. As Crane has argued, “We need editors—lots of them […] We have vast amounts of work before us—far more than a relative handful of salaried academics can accomplish and plenty accessible to our students and to those who love a given subject but maintain a day job doing something else.”

Over the past decade, there has been immense interest in finding ways to involve the larger community in activities that have traditionally been the domain of scholarship. Documentary editing is among the most challenging of these activities (for technical reasons), yet also among those that offer the greatest potential benefit (because of the opportunity to take advantage of local expertise and interest). As interest continues to grow, there is a tremendous need for tools and methods that advance these emerging practices and extend them in new directions that expand their utility.

At the outset of this project, the manuscript and typescript papers and the photographs in the Martha Berry Collection existed only in their original paper form. Held in c. 160 container boxes maintained by the Berry College Archives, the collection was in fragile condition, largely inaccessible, and, like many important archival collections, urgently in need of digitization (scanning is now in progress). Given our limited resources, the task of editing the collection, while daunting, presented an opportunity to pioneer new methods of participatory editing. Such methods would support not only the Berry project but also the creators and stewards of other collections who face similar resource challenges (e.g. restrictive budgets, limited staff, a large number of unedited documents, and, at the same time, a community of prospective editors but no means of enlisting them in the project).

This project, a collaboration between Bloomsburg University and Berry College, utilizes thousands of documents from the incipient Martha Berry Digital Archive as a pilot implementation in the development of Crowd-Ed, a freely available, open source web application through which members of the community, as well as students and scholars, can use custom forms and game-inspired user interfaces to access a repository of archival documents in order to 1) engage in reviewing, categorizing, and entering metadata to describe these documents and 2) engage in transcription and basic structural (greeting, paragraph, closing) and semantic (name, date, place) markup.
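The grant excerpt doesn’t describe how Crowd-Ed stores what volunteers enter, so the following is only a hypothetical sketch of the metadata step: a volunteer’s form entries are checked against the fifteen elements of the Dublin Core Metadata Element Set (version 1.1) and serialized as XML with Python’s standard library. The function name `dc_record`, the `<record>` wrapper element, and the sample values are illustrative assumptions, not Crowd-Ed’s actual format.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen simple Dublin Core elements.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def dc_record(fields: dict[str, list[str]]) -> str:
    """Serialize form entries as Dublin Core XML (hypothetical Crowd-Ed step).

    Field names outside the fifteen DC elements are rejected; blank
    entries (e.g. an empty form box) are silently dropped.
    """
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")

    ET.register_namespace("dc", DC_NS)  # emit <dc:title> rather than <ns0:title>
    root = ET.Element("record")         # hypothetical wrapper element
    for name in sorted(fields):         # sorted for deterministic output
        for value in fields[name]:
            value = value.strip()
            if value:
                ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical entries a volunteer might type into a metadata form.
    print(dc_record({
        "title": ["Letter to Martha Berry"],
        "type": ["Text"],
        "subject": [""],  # left blank, so it is omitted
    }))
```

Rejecting unknown field names up front keeps crowd-entered records within the controlled element set an archive platform expects, while blank entries are simply dropped rather than stored as empty elements.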

Posted in Digital Archive on December 4th, 2011 by sschlitz

a quiet hive of activity

digital archives and historical linguistics

Posted in Digital Archive, Linguistics on October 6th, 2011 by sschlitz

It is challenging to write about my work while I’m in the midst of it, since proposals, abstracts, and indeed project tasks and oversight require tremendous amounts of time and attention… Stealing away to write for a moment, however, makes a lot of sense just now as I immerse myself in the complementary tasks of teaching and doing linguistics while developing a digital archive.

I’m trained formally as a historical linguist, and my primary interest is understanding how and why language changes. To describe and understand language change, historical linguists need text, lots of text. And, to do our work well, we eschew normalized, edited (prettified) texts, the kinds designed to promote readability (and/or to sell copies), often at the expense of textual preservation.

We seek out digital representations of texts as the texts were written – orthographic, morphological, lexical anomalies and all. Such texts, particularly in the form of parallel collections or parallel corpora, enable us to study (using computational methods) patterns of variation (orthographic, morphological, lexical, phonological) and to analyze and better understand language change.
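No particular method is named here, but a hypothetical illustration (not a method described in this post) shows why unnormalized texts matter: Python’s standard `difflib` can align a diplomatic transcription with a normalized version of the same passage word by word and tally the spelling variants that normalization would erase. The function name `variant_pairs` and the sample sentence are assumptions, and splitting on whitespace is a deliberate simplification; real corpus work would need proper tokenization.

```python
from collections import Counter
from difflib import SequenceMatcher


def variant_pairs(diplomatic: str, normalized: str) -> Counter:
    """Count (as-written, normalized) word pairs where two parallel texts differ.

    Only one-to-one substitutions are counted; spans where the word counts
    differ (e.g. a word split in two) are skipped to keep the sketch simple.
    """
    a, b = diplomatic.split(), normalized.split()
    pairs = Counter()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for as_written, normal in zip(a[i1:i2], b[j1:j2]):
                pairs[(as_written, normal)] += 1
    return pairs


if __name__ == "__main__":
    # Hypothetical sample: a diplomatic line and its normalized reading.
    print(variant_pairs("I recieved yr letter of the 3d",
                        "I received your letter of the 3rd"))
```

Summed over many aligned letters, counts like these are the raw material for studying orthographic and lexical variation; a normalized edition alone would show none of them.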

The days of the index card and pencil method have long passed (okay, it’s only been thirty years or so, but that’s thirty in an era of Moore’s Law). Computational analysis has altered our ability to analyze linguistic data, our methods, and our findings. And it has exciting implications for socio-historical linguistic study as well.

But we cannot do this work well unless we have digital representations of texts as the texts were written (and here I use representation deliberately because a transcription or facsimile is not a primary source document in the strict sense; it is a representation of a text, and this is a critical distinction). Our methods for digital representation continue to improve, and we’re ever closer to advancing Matthew Driscoll’s ‘everything but the smell’ model of text editing to include the smell…

Yet even with some cool new developments that enable us to leverage the aid of extra hands, this work is as time- and labor-intensive as it is invaluable.

My own role, creating resources such as a digital archive that grants priority to textual fidelity, facilitates artefactual philology and historical linguistics, and disseminates an important cultural heritage collection, is, for me, an ideal hybridization.

imaging and linguistics

Posted in Digital Archive on September 5th, 2011 by sschlitz

I’m finishing up a weekend of tasks related to the Martha Berry Digital Archive. One crucial step that demands continual attention (and which might be considered less significant than others, but isn’t) is the imaging process. Imaging of the Martha Berry Collection began at Berry several months ago, is being managed by a wonderful group of staff and students, and (given the size of the collection) will continue indefinitely. The creation of image derivatives, on the other hand, is a task managed remotely by the programmer and me.

What does this mean? First, TIFFs and JPEGs. Berry staff initially save document scans as TIFF files; this format ensures a high-quality representation of the material source document. But TIFFs are large, so we don’t upload these image files to the digital archive; instead, we create compressed image derivatives: JPEGs. Managed manually, this would be a tedious, sigh-inducing (or worse) process. Automated (and here’s where it gets good), the process is called transmogrification. In brief, we use a script to transmogrify TIFFs, generating (or, in Minecraft-ese, a jargon every parent of a gamer who has recently spent a rainy three-day holiday weekend alone with said gamer will know, spawning) JPEG derivatives. Transmogrification is quick and enables batch deriv generation, so we can upload lots of derivs to the archive at once while retaining the TIFFs to support preservation of the primary source documents.
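The MBDA script itself isn’t shown here, so what follows is a minimal sketch under stated assumptions: TIFF masters sit in one directory, JPEG derivatives go to another, and ImageMagick’s `mogrify` does the conversion. The helper names (`plan_derivatives`, `mogrify_command`, `transmogrify`) and the JPEG quality of 85 are hypothetical choices. By default the function only returns the command it would run, which makes it easy to check a batch before converting anything.

```python
from pathlib import Path
import shutil
import subprocess

TIFF_SUFFIXES = {".tif", ".tiff"}


def plan_derivatives(tiff_dir: Path, jpeg_dir: Path) -> list[Path]:
    """List TIFF masters in tiff_dir that have no JPEG derivative in jpeg_dir yet."""
    pending = []
    for tiff in sorted(tiff_dir.iterdir()):
        if not tiff.is_file() or tiff.suffix.lower() not in TIFF_SUFFIXES:
            continue  # skip subfolders and non-image files (notes, checksums, etc.)
        if not (jpeg_dir / f"{tiff.stem}.jpg").exists():
            pending.append(tiff)
    return pending


def mogrify_command(tiffs: list[Path], jpeg_dir: Path, quality: int = 85) -> list[str]:
    """Build one ImageMagick mogrify call that converts the whole batch.

    -format jpg writes new .jpg files instead of overwriting the inputs, and
    -path puts them in jpeg_dir, so the TIFF masters are left untouched.
    """
    return ["mogrify", "-path", str(jpeg_dir), "-format", "jpg",
            "-quality", str(quality), *(str(t) for t in tiffs)]


def transmogrify(tiff_dir: Path, jpeg_dir: Path, quality: int = 85,
                 dry_run: bool = True) -> list[str]:
    """Create JPEG derivs for new TIFFs; return the command (empty if none pending)."""
    jpeg_dir.mkdir(parents=True, exist_ok=True)
    pending = plan_derivatives(tiff_dir, jpeg_dir)
    if not pending:
        return []
    cmd = mogrify_command(pending, jpeg_dir, quality)
    if not dry_run:
        if shutil.which("mogrify") is None:
            raise RuntimeError("ImageMagick's mogrify was not found on PATH")
        subprocess.run(cmd, check=True)
    return cmd
```

A real run would look like `transmogrify(Path("scans/tiff"), Path("scans/jpeg"), dry_run=False)` (paths hypothetical). Because already-converted TIFFs are skipped, the same call can be repeated after each new batch of scans. One caveat: multi-page TIFFs may yield numbered JPEGs, which this simple skip check would not recognize.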

OED-cited attestations indicate that the verb transmogrify and the derivative noun transmogrification carry a pejorative meaning, but in programming and artistic contexts the term and its derivative, as well as the clipped variant mogrify (which is widely used in place of transmogrify with respect to imaging but has yet to make an appearance in the OED), shoulder no such semantic burden. From the perspective of someone who is working with many thousands of images, the terms and the process they describe are to be celebrated.

And, from a linguistic perspective, here is an elegant example of semantic amelioration, square in the midst of work on a digital archive and image derivs.

(Some suggest that mogrify is not a word, but the c. 147,000 Google search results tell a different story, as do the mogrified images I just uploaded to the archive, even if lexicographers haven’t yet caught up. But I’m optimistic, and I’ll be keeping an even closer eye than usual on the OED’s quarterly updates.)

MBDA in the news

Posted in Digital Archive on July 11th, 2011 by sschlitz

The Martha Berry Digital Archive (MBDA), the collaborative DH project I’m directing between Bloomsburg University and the Berry College Memorial Library, the Berry College Archives, Oak Hill and the Martha Berry Museum, Berry History and English faculty, and Berry and Bloomsburg students – plus, with funding from a Bloomsburg Research grant, programmer Garrick Bodine – was featured in a recent article in the Rome News Tribune: Pieces of the Past: A bold new project aims at digitizing thousands of Martha Berry’s letters and making all of them available to the public.

MBDA Project

Posted in Digital Archive, Linguistics on June 27th, 2011 by sschlitz

Just returned from an excellent trip to Berry College where I spent the last two weeks working on the Martha Berry Digital Archive Project. I initiated the project in 2010 and, working collaboratively with Berry Library and Museum staff and students, colleagues in History and English, and a programmer from Penn State, I’m in the process of making the documents in the Martha Berry Collection, including over 160 fileboxes of manuscript and typescript papers (i.e. many, many thousands of papers), freely available for cultural, historical and linguistic research.

The collection is arranged chronologically and includes personal and business letters written to Martha Berry and/or the Berry Schools between 1885 and 1941. Berry was an extraordinary record keeper, and typed copies of virtually every letter composed in response to those received have been retained within the collection, offering a rich and complete picture of the discourse between correspondents.

When letters are reviewed by year, the collection offers a synchronic snapshot of the school, of Berry, of her correspondents, and of the milieu, linguistically as well as historically. When writings are studied across decades, the collection chronicles the longstanding friendships and business relationships Berry maintained (e.g. decades-long correspondence between Berry and Clara Ford, Berry and Emily Vanderbilt Hammond, Berry and Corra Harris). In many cases these narratives reveal more about those writing to Berry than about Berry herself. Berry largely remains on message, focusing her communications primarily on the development of the Berry Schools (an impressive testament to her unwavering devotion to the schools), whereas her correspondents are far more generous with personal revelations, sharing insights on topics ranging from concerns about the war to educational reform to family gossip.

The scope and range of the collection yield a fascinating subject: it runs from letters imploring Berry to take on a ‘poor child’, to a letter calling on her to join a protest against a ladies’ magazine that had published a beer advertisement extolling the beverage as a children’s tonic (and therein as the key to a calm and happy home), to one criticizing her for the cut of her neckline. I look forward to sharing more soon as our work continues!

Primary Sources, Essential Dialogues, and Cool Tools

Posted in Digital Archive on March 8th, 2011 by sschlitz

Classes have resumed (in fact, somehow it’s nearly midterm), and I’ve been learning the strengths of a coterie of new students. To me, this is among the most interesting aspects of every new semester: an excellent opportunity to modify course materials and assignments in recognition of students’ abilities and interests.

On a research note (though not exclusively research, as it’s difficult to separate my current long-term project from pedagogy), I’ve been deeply immersed in the development of a digital archive that will disseminate documents (manuscript and typescript papers and photographs) from the Martha Berry Collection. Berry is my undergraduate alma mater, so, needless to say, this project is on some level a labor of love. But, much more importantly, Martha Berry (MB) is a serious and worthy subject; her work as an educational innovator (she founded the Berry Schools, later Berry College, in 1902) has had a profound impact on education in the U.S. as well as in Georgia. As I examine current educational trends, I’m fascinated to note that Berry’s educational theories and practices, which are experiential and consistently call for engagement of the “head, heart, and hands”, long anticipated twenty-first-century participatory learning models.

A lifelong philanthropist who devoted her life and fortune to the education of children in the mountain regions of northwest Georgia, Berry moved among a highly influential circle of benefactors and correspondents (from Margaret Sanger and Booker T. Washington to Henry Ford and President Theodore Roosevelt), and the documents in the Berry Collection are rich in history: a veritable turn-of-the-century must-read.

Working with manuscripts is among the research tasks I find most rewarding. Indeed, reading and editing primary source documents is, I think, as close as we can come to holding a conversation with their authors, and whether I’m working with texts by putative forgers, Icelandic grandmothers, or Martha Berry, the kind of dialogue that manuscript editing fosters has, at least for me, without fail proven extraordinarily worthwhile.

Work on this archive has inspired the development of a new tool, Crowd-Ed, which will enable community-driven (i.e. crowdsourced) editing of the documents in the MB archive and of collections in general: the tool is being designed to be open source, freely available, and highly extensible, to facilitate adoption by other projects. My colleagues and I envision implementation (pending funding) during spring and summer 2012. Crowd-Ed will be integrated within the archive itself (which uses Omeka for the publishing interface and Fedora Commons for the repository), and we are engrossed in developing an archive design model that will not only disseminate the works in the MB Collection but also, via Crowd-Ed, invite students and community members (as well as academics) to engage in participatory editing.

Omeka is as easy to use and as flexible as Fedora is seriously robust; both are essential to the archive project and highly recommended. One more useful tool that we’ve adopted for non-technical project management is BuddyPress. Clearly I’m late on the scene with this one, but it’s proving indispensable as a collaboration tool embedded within the project’s public site.