REPACT: reference production and the evolution of person agreement: corpus-based typological and computational approaches


Quantitative methods have been gaining ground in comparative linguistics, most notably in research on global linguistic diversity and the evolution of language families, as well as on variation in language use, language change and linguistic diversification.

Here we deploy quantitative corpus‐linguistic methods to investigate the choice between referential null forms and overt person forms, and its interaction with the evolution of person agreement across diverse languages. This investigation targets a vexed puzzle in comparative linguistics, namely how the cross‐linguistically ubiquitous possibility of null reference (“pro‐drop”) can be reconciled with the likewise ubiquitous tendency towards the evolution of person agreement, which presupposes an increased use of overt person forms. Likewise unresolved to date is the discrepancy between the postulated cross‐linguistic uniformity of mechanisms of referential choice and variable cross‐linguistic preferences therein. Our project will be the first of its kind to tackle these quandaries through extensive cross‐linguistic corpus analyses that implement cutting‐edge quantitative methodologies from corpus linguistics, natural language processing and phylogenetic comparative methods. In doing so we will be able to determine the relative importance of discourse‐functional and structural factors for referential choice and the evolution of person agreement, as well as the role of phylogenetic, areal and socio‐cultural features of diverse languages in explaining cross‐linguistic diversity. This project will thus fill a glaring gap in usage‐based approaches to the typology of reference production and grammatical relations, and grammatical systems more generally, which have only rarely been addressed in first‐hand empirical studies of actual language use. It will also contribute to explaining the development of grammatical complexity, and more generally to the areas of language processing and cognitive sciences, as well as human communication and interaction, and their socio‐cultural underpinnings.

The project will draw on corpora of unprecedented detail of annotation and cross‐linguistic breadth that have been designed precisely for research of the type pursued here. They allow relevant information to be extracted with a minimum of theoretical preconceptions, and thus enable the evaluation of all relevant functional and structural factors. We focus on two language subfamilies and linguistic areas. For Oceanic languages, we will deploy phylogenetic comparative methods to assess the influence of areal and socio‐cultural vis‐à‐vis phylogenetic factors in referential choice and the evolution of person agreement throughout the family. For Iranian languages, we will draw on historical corpora to reconstruct referential choice and the evolution of person agreement, supplementing earlier philological work.

Envisaged outcomes are a series of academic articles in which we report unprecedented insights into the conditions of null reference and person forms within and across languages, and in relation to models of emerging person agreement. The annotated corpus will be made available to the scientific community and will constitute a central resource for future cross‐linguistic research on reference production and grammatical relations, and other areas of human language grammar and communication.


SNSF Project, PI: Stefan Schnell