AI as Respondent: Extending Factorial Survey Methods with LLMs

Chen Peng, Bocconi University
Arnstein Aassve, Bocconi University
Nicolò Cavalli, Bocconi University

Large language models (LLMs) are increasingly used as “synthetic respondents” in social and population research, yet little is known about whether they reproduce the distributions and heterogeneity characteristic of human judgment. This study evaluates five state-of-the-art models (GPT-4, GPT-4.1, DeepSeek EN/CN, Claude 3.5) using validated factorial survey experiments on family ideals from China (n = 5,186) and the United States (n = 5,906). Models received the original vignette texts and rating instructions used in the human surveys, and demographic personas were constructed from real respondent profiles to ensure realistic heterogeneity. LLMs broadly replicate the direction and rank order of human effects, assigning higher ratings to families with better communication, greater community respect, and higher income, but they deviate in three systematic ways. First, response variance is sharply compressed (variance = 1.1–2.1 for LLMs vs. 2.4–2.6 for humans), producing over-deterministic predictions with little disagreement. Second, models display normative inflation: they overvalue relational harmony and respect, treating these social-relational cues as moral indicators of “ideal families.” Third, subgroup differences are flattened: LLM simulations fail to recover the gendered response pattern in the human data. These deviations suggest that uncalibrated LLM simulations may overstate normative consensus and understate contested trade-offs, biasing cross-national and policy analyses. We propose a calibration workflow combining scale alignment, subgroup variance restoration, and human-anchored response priors. By grounding LLM evaluation in validated experimental data, this study reveals both the promise and the current limits of using LLMs as credible respondents in demographic and attitudinal research.
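As a rough illustration of what subgroup variance restoration could look like, the sketch below uses simple moment matching: within each subgroup, LLM ratings are linearly rescaled so that their mean and variance match human-anchored targets. This is not the authors' calibration procedure, which the abstract does not specify. The function names, the 1–10 rating scale, and the example numbers are all hypothetical assumptions made for the example.

```python
from statistics import mean, pvariance


def restore_variance(ratings, target_mean, target_var, lo=1.0, hi=10.0):
    """Linearly rescale ratings toward a target mean and variance.

    Assumption: ratings live on a bounded scale [lo, hi] (hypothetical 1-10).
    Values are clipped to that scale, so the achieved variance can fall
    slightly below the target when many values hit the bounds.
    """
    m = mean(ratings)
    v = pvariance(ratings)
    if v == 0:
        # No spread to stretch: every rating collapses to the target mean.
        return [min(hi, max(lo, target_mean)) for _ in ratings]
    scale = (target_var / v) ** 0.5
    return [min(hi, max(lo, target_mean + scale * (r - m))) for r in ratings]


def calibrate_by_subgroup(llm_by_group, human_targets):
    """Apply restore_variance separately within each subgroup.

    llm_by_group:  {group: [ratings]}
    human_targets: {group: (target_mean, target_var)} taken from human data
    """
    return {
        group: restore_variance(ratings, *human_targets[group])
        for group, ratings in llm_by_group.items()
    }


# Hypothetical inputs: compressed LLM ratings and human-anchored targets.
llm = {"women": [5, 6, 7, 6, 5, 7], "men": [6, 6, 7, 7, 6, 7]}
targets = {"women": (6.0, 2.5), "men": (6.4, 2.4)}
calibrated = calibrate_by_subgroup(llm, targets)
```

Matching moments per subgroup, rather than pooling all respondents, keeps between-group differences in the human targets from being averaged away, which is the flattening problem the abstract describes.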


Presented in Session 26. Flash Session Emerging Data Sources in Demography: Digital Traces, AI and Mobile Phone Data