Axel Herold, CLARIN-D 2016-09-26 http://hdl.handle.net/11858/00-203C-0000-002B-D2BF-2 clarin.eu:cr1:p_1357720977528 Dortmunder Chatkorpus 2.0 (Kernkorpus)

Resource http://hdl.handle.net/11858/00-203C-0000-002B-D2BE-4 LandingPage http://hdl.handle.net/11858/00-203C-0000-002B-D2C6-0 Dortmunder Chatkorpus 2.0 (Kernkorpus) Dortmund Chat Corpus 2.0 (Core) Dortmunder Chatkorpus 2.0 (Kernkorpus, 2016) Dortmund Chat Corpus 2.0 (Core, 2016) SpeechCorpus 2.0 production 2016-09-27 2000–2006 Germany DE written chat logfiles

The Dortmund Chat Corpus (2009) was compiled at TU Dortmund University, at the Chair of Linguistics of the German Language and Language Didactics (Lehrstuhl für Linguistik der deutschen Sprache und Sprachdidaktik). The aim of the corpus project was to create a resource for researching the linguistic features and linguistic variation of computer-mediated communication across different usage contexts, on the basis of chat logfiles. The corpus comprises 478 logfiles containing approx. 140,000 postings, or approx. 1 million tokens, of German chat conversations. These document the use of chat software in different contexts: leisure chats, counselling chats, chats in teaching and learning, and moderated chats in media contexts.

The corpus is stored in the proprietary XML format ChatXML, which annotates (1) the basic structure and properties of chat logfiles and postings, (2) selected "netspeak" phenomena such as emoticons, action words, forms of address, nicknames and acronyms, and (3) selected metadata about the chat users. Since 2005 the XML version has been available for download at http://www.chatkorpus.tu-dortmund.de together with dedicated search and analysis software; the chat logs can also be viewed and queried online there.

In the course of a CLARIN-D curation project (2015–2016, http://de.clarin.eu/en/curation-project-1-3-german-philology), the corpus was converted into a standard-compliant TEI representation and substantially curated (POS tagging according to STTS-2.0-alpha, structural corrections, anonymisation, metadata curation). The core component of the resulting Dortmund Chat Corpus 2.0 is publicly available as a free resource.

free. This resource can be downloaded from the CLARIN-D Repository at the BBAW at http://clarin.bbaw.de/. Creative Commons Attribution 4.0 International (CC BY 4.0)

Berlin-Brandenburgische Akademie der Wissenschaften, DWDS, Jägerstraße 22/23, D-10117 Berlin

dwds@dwds.de

Angelika Storrer, Michael Beißwenger: project leaders of the original resource, TU Dortmund. Michael Beißwenger, Angelika Storrer, Eric Ehrhardt, Axel Herold, Harald Lüngen: curators, CLARIN-D. Hosting institution: CLARIN-D, Berlin-Brandenburg Academy of Sciences and Humanities. 1 German

deu

speech text Monolingual Lemma Word form Speaker turn Text structure encoding 2922 token 295 posting UTF-8 Latin Latn text/plain application/xml