A cognitive advantage for Zipfian distributions? Language learning is facilitated under language-like efficiency
Understanding how children learn to talk is (still) one of the most puzzling questions in cognitive science. In my work, I explore experience-based explanations for how children learn language, and why they seem to be better at it than adults. In the current project, we apply an information-theoretic perspective to issues in language acquisition, focusing on the impact of the kinds of word distributions found in language. During their first year, infants learn to segment speech, extract words, and begin mapping them onto objects. Both the words they hear and the objects they see are structured in a particular way: words in language follow a Zipfian distribution, with a few very frequent words and many low-frequency ones. In contrast, most lab-based investigations of word learning and segmentation present learners with uniform distributions, in which all words (and objects) appear equally often and are therefore less predictable than the input children are naturally exposed to. Here, we use the information-theoretic measure of efficiency (defined as the ratio between the observed unigram entropy and the maximal entropy for the same set size) to characterise the distribution of words in children’s linguistic environment. We then investigate experimentally whether language learning is facilitated under similar conditions in both children and adults. Using corpus data, we find that child-directed speech has very similar efficiency values across languages. In the experiments, both children and adults show improved word segmentation when exposed to distributions whose efficiency values are similar to those of natural language. I discuss the theoretical and methodological implications of these results for lab-based studies of language learning, and their possible impact on the way languages are structured.
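To make the efficiency measure concrete: for a lexicon of n word types with unigram probabilities p_i, the observed entropy is H = -Σ p_i log2 p_i, and the maximal entropy for the same set size is log2(n), reached when all types are equally frequent. Efficiency is therefore H / log2(n): exactly 1 for a uniform distribution and lower for a skewed, Zipf-like one. The Python sketch below is a minimal illustration under stated assumptions, not the study's analysis code: the helper names (`unigram_entropy`, `efficiency`, `stream_from_counts`) are hypothetical, the input is assumed to be an already-segmented word stream, and the six-word frequency counts are invented for illustration rather than taken from the corpus data.

```python
import math
from collections import Counter


def unigram_entropy(tokens):
    """Shannon entropy, in bits, of the unigram (word-frequency) distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def efficiency(tokens):
    """Observed unigram entropy divided by the maximal entropy for the same
    number of word types, i.e. log2(n) for a uniform distribution over n types."""
    n_types = len(set(tokens))
    if n_types < 2:
        # log2(1) == 0, so the ratio is undefined for a one-word lexicon
        raise ValueError("efficiency is undefined for fewer than two word types")
    return unigram_entropy(tokens) / math.log2(n_types)


def stream_from_counts(counts):
    """Build a toy token stream from hypothetical {word: frequency} counts."""
    return [word for word, freq in counts.items() for _ in range(freq)]


# Hypothetical six-word lexicons (illustrative only, not corpus data).
uniform = stream_from_counts({w: 10 for w in "ABCDEF"})
# Zipf-like: frequency roughly proportional to 1 / rank.
zipf_like = stream_from_counts(dict(zip("ABCDEF", [60, 30, 20, 15, 12, 10])))

print(round(efficiency(uniform), 3))    # uniform stream: maximal efficiency
print(round(efficiency(zipf_like), 3))  # skewed stream: lower efficiency
```

Because the measure divides by the maximum entropy for the same number of types, it places lexicons of different sizes on a common 0 to 1 scale, and the choice of logarithm base cancels out of the ratio.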