A curated global dataset of social contact between diverse language communities

Kashima, Eri and Di Garbo, Francesca and Raatikainen, Oona and Forkel, Robert and Avelino, Rosnataly and Beck, Sacha and Berge, Anna and Blanco Pena, Ana and Bowden, Ross and Brid, Nicolas and Brincat, Joseph M. and Carpio, Maria Belen and Cobbinah, Alexander and Cuneo, Paola and Doyiso, Deginet Wotango and Fehn, Anne-Maria and Gholami, Saloumeh and Ghosh, Arun and Gibson, Hannah and Hall, Elizabeth and Hannss, Katja and Haynie, Hannah and Jacka, Jerry J. and Jenny, Mathias and Kowalik, Richard and Kulkarni-Joshi, Sonal and Mous, Maarten and Mendoza, Marcela and Messineo, Cristina and Moro, Francesca Romana and Nater, Hank and Ocasio, Michelle and Olsson, Bruno and Ospina Bozzi, Ana Maria and Paredes, Agustina and Phiri, Admire and Quint, Nicolas and Sandman, Erika and Schokkin, Dineke and Singer, Ruth and Smith-Dennis, Ellen and Souag, Lameen and Sulistyono, Yunus and Treis, Yvonne and Urban, Matthias and Vaughan, Jill and Ziegelmeyer, Georg and Zikmundova, Veronika and Napoleao De Souza, Ricardo and Sinnemaki, Kaius (2025) A curated global dataset of social contact between diverse language communities. Scientific Data, 12 (1): 1958. ISSN 2052-4463

Full text not available from this repository.

Abstract

The GramAdapt Social Contact Dataset is a curated dataset of 34 language pairs with qualitative and quantifiable data on social interaction and aspects of societal multilingualism. The language pairs were sampled globally to represent the world's linguistic diversity. The dataset can be used to investigate the social dimensions of language contact, either on its own or in conjunction with appropriate linguistic data. The data were collected by distributing a questionnaire to experts with experience of one or both language communities in a pair. The data represent subjective expert assessments, given as choices from predetermined answers that can be quantified. Authors 1, 2 and 3 manually checked the responses to identify possible misjudgments or misunderstandings. The resulting dataset contains 13,493 data points. It is the first of its kind in linguistics and builds on a wide range of findings from sociolinguistics, historical linguistics, psycholinguistics, and linguistic anthropology.

Item Type:	Article
Uncontrolled Keywords:	ENDANGERMENT
Subjects:	400 Language > 400 Language, Linguistics
Divisions:	Languages and Literatures > Institut für Information und Medien, Sprache und Kultur (I:IMSK) > Lehrstuhl für Allgemeine und vergleichende Sprachwissenschaft
Depositing User:	Dr. Gernot Deinzer
Date Deposited:	25 Mar 2026 07:42
Last Modified:	25 Mar 2026 07:42
URI:	https://pred.uni-regensburg.de/id/eprint/68034
