Conceptualization in reference production: Probabilistic modeling and experimental testing.

In psycholinguistics, there has been relatively little work investigating conceptualization, that is, how speakers decide which concepts to express. This contrasts with work in natural language generation (NLG), a subfield of artificial intelligence, where much research has explored content determination during the generation of referring expressions. Existing NLG algorithms for conceptualization during reference production do not fully explain previous psycholinguistic results, so we developed new models and tested them in three language production experiments. In our experiments, participants described target objects to another participant. In Experiment 1, size, color, or both distinguished the target from all distractor objects; in Experiment 2, color, type, or both distinguished it from all distractors; and in Experiment 3, color, size, or the border around the object distinguished the target. We tested how well the different models fit the distribution of description types (e.g., "small candle," "gray candle," "small gray candle") that participants produced. Across these experiments, the probabilistic referential overspecification model (PRO) provided the best fit. In this model, speakers first choose a property that rules out all distractors. If more than one property does so, they choose among them probabilistically, according to their preference for each property. Next, they sometimes add another property, with the probability again determined by that property's preference and by speakers' eagerness to overspecify. (PsycINFO Database Record (c) 2019 APA, all rights reserved)
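The two-step PRO procedure can be illustrated with a small sketch that computes the predicted distribution over description types. This is not the authors' implementation: the abstract does not give the exact parameterization, so the names `pro_distribution`, `preference`, and `eagerness`, and the specific formulas (first property chosen in proportion to preference among fully distinguishing properties; each other property added with probability `eagerness * preference[r]`, at most one added) are hypothetical assumptions made only for illustration.

```python
"""Hypothetical sketch of the PRO (probabilistic referential overspecification) model.

Assumptions (NOT taken from the paper):
  * preference values are numbers in [0, 1], one per property;
  * the first property is chosen among the fully distinguishing properties
    with probability proportional to its preference;
  * each remaining property r is then added with probability
    eagerness * preference[r], and at most one property is added.
"""


def pro_distribution(distinguishing, available, preference, eagerness):
    """Return {frozenset_of_properties: probability} over description types.

    distinguishing -- properties that each rule out all distractors on their own
    available      -- all properties the target has (a superset of distinguishing)
    preference     -- hypothetical per-property preference weights in [0, 1]
    eagerness      -- hypothetical overspecification eagerness in [0, 1]
    """
    if not distinguishing:
        raise ValueError("sketch assumes at least one fully distinguishing property")
    total_pref = sum(preference[p] for p in distinguishing)
    if total_pref <= 0:
        raise ValueError("distinguishing properties need positive preference")
    dist = {}
    for first in distinguishing:
        # Step 1: pick a fully distinguishing property, weighted by preference.
        p_first = preference[first] / total_pref
        others = [r for r in available if r != first]
        add_probs = {r: eagerness * preference[r] for r in others}
        p_stop = 1.0 - sum(add_probs.values())
        if p_stop < 0:
            raise ValueError("addition probabilities exceed 1 under these settings")
        # Step 2a: stop with the minimal (non-overspecified) description.
        key = frozenset([first])
        dist[key] = dist.get(key, 0.0) + p_first * p_stop
        # Step 2b: overspecify by adding exactly one extra property.
        for r, p_add in add_probs.items():
            key = frozenset([first, r])
            dist[key] = dist.get(key, 0.0) + p_first * p_add
    return dist


if __name__ == "__main__":
    # Made-up settings resembling a condition where only color distinguishes.
    prefs = {"color": 0.8, "size": 0.3}
    for desc, p in pro_distribution(["color"], ["color", "size"], prefs, 0.5).items():
        print(sorted(desc), round(p, 3))
```

With these made-up values, a color-only condition predicts about 0.85 for a "gray candle"-type description and about 0.15 for an overspecified "small gray candle" description. Fitting such a model to data would mean estimating the preference and eagerness parameters from the observed frequencies of each description type.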