Max Thaning, Swedish Institute for Social Research
Siddartha Aradhya, Stockholm University
Surveys are fundamental to demographic and social science research, yet missing data and sample selection bias compromise basic descriptive estimates such as prevalences and population means. Despite the conventional view that descriptive research requires no causal reasoning, we argue that even the most fundamental descriptive analysis can be informed by causal inference when data are incomplete or unrepresentative. We use directed acyclic graphs for missing data (m-DAGs), methods combining external data with survey data, and Probabilistic Bias Analysis (PBA) to show, through Monte Carlo simulations, how to recover target prevalences. We evaluate ideal-type data scenarios ranging from the simple Missing Completely At Random (MCAR) to the more complicated Missing Not At Random (MNAR). Conventional complete-case analysis (list-wise deletion) produced severe bias across all scenarios. Multiple imputation recovered unbiased estimates only when data were Missing At Random (MAR) with fully observed confounders. De-biased estimation with external data (e.g., census information) recovered the true prevalence across all MAR and sample selection scenarios. Under the more demanding MNAR conditions, PBA with simulated validation studies recovered the true value. These findings demonstrate that causal inference is critical for descriptive demographic research whenever missingness or selection occurs, which is a nearly universal condition in survey practice. By leveraging m-DAGs, external data, and sensitivity analysis, researchers can recover valid population estimates for key demographic indicators.
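To make the contrast between complete-case analysis and estimation with external data concrete, the following is a minimal Monte Carlo sketch, not the authors' simulation design. It assumes a single binary covariate `x` whose population share is known from an external source such as a census, an outcome `y` whose prevalence differs by `x`, and a response probability that depends only on `x` (a simple MAR scenario). The function name `simulate` and all numeric values are hypothetical and chosen purely for illustration.

```python
import random


def simulate(n=100_000, seed=0):
    """Compare complete-case and post-stratified prevalence under MAR.

    All parameter values below are illustrative assumptions.
    """
    rng = random.Random(seed)
    p_x = 0.4                      # assumed population share of group x = 1
    p_y = {0: 0.2, 1: 0.5}         # assumed prevalence of y within each group
    p_obs = {0: 0.9, 1: 0.4}       # response probability depends only on x (MAR)

    rows = []
    for _ in range(n):
        x = 1 if rng.random() < p_x else 0
        y = 1 if rng.random() < p_y[x] else 0
        responded = rng.random() < p_obs[x]
        rows.append((x, y, responded))

    # Prevalence in the full simulated population (unknown in practice).
    true_prev = sum(y for _, y, _ in rows) / n

    # Complete-case estimate: simply drop non-respondents.
    complete = [(x, y) for x, y, responded in rows if responded]
    cc_prev = sum(y for _, y in complete) / len(complete)

    # Post-stratified estimate: reweight the within-group means among
    # respondents by the externally known group shares.
    shares = {0: 1 - p_x, 1: p_x}
    ps_prev = 0.0
    for group, share in shares.items():
        ys = [y for x, y in complete if x == group]
        ps_prev += share * sum(ys) / len(ys)

    return true_prev, cc_prev, ps_prev


if __name__ == "__main__":
    true_prev, cc_prev, ps_prev = simulate()
    print(f"true={true_prev:.3f} complete-case={cc_prev:.3f} post-stratified={ps_prev:.3f}")
```

Under these assumed values the expected true prevalence is 0.6 × 0.2 + 0.4 × 0.5 = 0.32, while the complete-case estimate is pulled toward the over-represented low-prevalence group (about 0.27 in expectation). Reweighting by the known group shares removes that gap, but only because response here depends on `x` alone; under MNAR this correction would not suffice.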
Presented in Session P8. Demographic Trends, History, Data and Methods