SAND interviews as TEI-conformant XML

In the spirit of making our research data available to the linguistic community in open formats, we are pleased to offer the transcriptions of the oral SAND interviews as Text Encoding Initiative (TEI)-conformant XML.

The overall structure of the corpus is modeled on that of the Newcastle Electronic Corpus of Tyneside English (NECTE): there is one XML file per interview, and each interview is included in the overarching <teiCorpus.2> document by an entity reference.
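
As an illustration of this layout, the following Python sketch generates a skeleton corpus document that declares one external entity per interview file and references each one inside <teiCorpus.2>. The function name, the file names and the entity naming scheme are assumptions made for the example; the actual corpus document in the download may be organized differently, and the corpus-level header is left out here.

```python
from pathlib import Path


def build_corpus_driver(interview_files, dtd="sand.dtd"):
    """Return the text of a skeleton corpus document (illustrative only)."""
    # Hypothetical naming scheme: the entity name is the file name without ".xml".
    names = [Path(f).stem for f in interview_files]
    # One external entity declaration per interview file.
    declarations = "\n".join(
        f'  <!ENTITY {name} SYSTEM "{path}">'
        for name, path in zip(names, interview_files)
    )
    # One entity reference per interview inside the corpus element.
    references = "\n".join(f"  &{name};" for name in names)
    return (
        f'<!DOCTYPE teiCorpus.2 SYSTEM "{dtd}" [\n'
        f"{declarations}\n"
        "]>\n"
        "<teiCorpus.2>\n"
        "  <!-- corpus-level teiHeader omitted in this sketch -->\n"
        f"{references}\n"
        "</teiCorpus.2>\n"
    )


# Hypothetical file names, used only to show the resulting structure.
print(build_corpus_driver(["interview_001.xml", "interview_002.xml"]))
```

When a parser expands the entity references, the content of each interview file appears in place inside the corpus document, so the individual files can be maintained separately.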

The custom DTD, sand.dtd, was created with the TEI Pizza Chef tool by choosing the Transcriptions of Speech tag set together with the additional tag sets for Linking, Analysis and Corpora. The XML was checked for validity with xmllint.
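
Users who want to repeat the validity check after downloading the corpus could do so along the following lines. This is a minimal sketch, not the exact procedure used for the corpus: the helper names are hypothetical, and it assumes that xmllint is installed and that the DTD named in the DOCTYPE declaration can be found relative to the XML file.

```python
import shutil
import subprocess


def xmllint_command(xml_path):
    """Build an xmllint command line for DTD validation."""
    # --valid : validate against the DTD named in the DOCTYPE declaration
    # --noent : substitute entity references with their content
    # --noout : do not print the parsed document, only report problems
    return ["xmllint", "--valid", "--noent", "--noout", xml_path]


def is_valid(xml_path):
    """Return True if xmllint exits without reporting an error."""
    if shutil.which("xmllint") is None:
        raise RuntimeError("xmllint was not found on the PATH")
    result = subprocess.run(
        xmllint_command(xml_path), capture_output=True, text=True
    )
    # Any parsing or validation messages are written to stderr.
    if result.stderr:
        print(result.stderr)
    return result.returncode == 0
```

Calling is_valid() with the path of the unpacked corpus document returns True when xmllint reports no errors; the name of that file depends on the contents of the download.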

At the moment the corpus contains only the interviews themselves; part-of-speech tagging and lemmatization are not yet included.

Download: SAND corpus (.zip file, 7.2 MB)

Many thanks to Hermann Moisl (University of Newcastle upon Tyne) for his kind advice on TEI matters.

Questions about the SAND corpus can be addressed to Jan Pieter Kunst (concern-infrastructure@di.huc.knaw.nl) for technical inquiries or to Sjef Barbiers for linguistic inquiries.