Several effects discovered in psychology and other social sciences don’t replicate consistently. Even when replications of a phenomenon succeed, the results are often heterogeneous: the effects vary more strongly than typical statistical models assume. Similar or nearly “identical” studies do not necessarily produce the same results. The German Research Foundation (DFG) is funding several projects within the priority programme MetaRep (SPP 2317) to support research on the implications of this fact. Within this priority programme, we established the project DRIPHT (Direct Replications in Psychology and Heterogeneity; RE 2762/3-1) in cooperation with the Ludwig Maximilian University of Munich (LMU). The project is funded with about 200,000 euros.
We aim to examine possible methodological issues surrounding the measurement of heterogeneity and to resolve them where appropriate. In direct replications in psychology, heterogeneity is mainly assessed on so-called standardized effect sizes, which give rise to many of these issues. Such effect sizes may be (systematically) biased by statistical artifacts. For example, the variance within a sample directly affects the size of a standardized effect: in Cohen's d, the mean difference is divided by the pooled standard deviation, so the same raw difference yields a smaller standardized effect in a more variable sample. Often, we have reason to assume that repeating the same protocol at different locations or points in time may lead to different sample characteristics, such as different variances. Standardized effect sizes in psychology may therefore vary for reasons unrelated to the effect in question, producing heterogeneity in a meta-analysis even if the underlying effect is the same. Similarly, variation in measurement accuracy (reliability) may lead to heterogeneity in standardized effect sizes, because measurement error inflates the observed variance and thereby shrinks the standardized effect.
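Both artifacts can be illustrated with a short calculation. The Python sketch below uses purely hypothetical numbers (a raw mean difference of 2 scale points at every site, plus made-up true-score SDs and reliabilities). It assumes classical test theory with measurement error in the outcome only, so that observed variance equals true-score variance divided by reliability. Under these assumptions, one and the same raw effect produces different values of Cohen's d from site to site.

```python
import math

# Hypothetical raw mean difference, identical at every site (same scale units).
RAW_DIFF = 2.0

# Hypothetical sites: (SD of true scores in the sample, reliability of the measure).
SITES = {
    "A": (4.0, 0.90),
    "B": (5.0, 0.80),
    "C": (8.0, 0.70),
}


def observed_d(raw_diff: float, true_sd: float, reliability: float) -> float:
    """Cohen's d computed on observed (error-laden) scores.

    Assumes classical test theory: observed variance = true-score variance / reliability,
    so the observed SD grows as reliability falls and d shrinks by sqrt(reliability).
    """
    observed_sd = true_sd / math.sqrt(reliability)
    return raw_diff / observed_sd


def site_effects() -> dict:
    """Return (d on true scores, d on observed scores) for each hypothetical site."""
    results = {}
    for site, (true_sd, rel) in SITES.items():
        d_true_scores = RAW_DIFF / true_sd              # affected by the sample SD only
        d_obs = observed_d(RAW_DIFF, true_sd, rel)      # affected by SD and reliability
        results[site] = (d_true_scores, d_obs)
        print(f"site {site}: d (true scores) = {d_true_scores:.3f}, "
              f"d (observed) = {d_obs:.3f}")
    return results


if __name__ == "__main__":
    site_effects()
```

Although the raw effect is identical everywhere, the observed d at site C (about 0.21) is less than half of that at site A (about 0.47). A meta-analysis of these d values would register the difference as heterogeneity.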
In psychology, various large-scale replication series have emerged in recent years, providing the data for our research project. Our goal is to re-analyze these existing replication projects, such as ManyLabs (e.g., Klein et al., 2018) and Registered Replication Reports (e.g., Wagenmakers et al., 2016). In doing so, we will explore the extent to which statistical artifacts, such as varying standard deviations or measurement (in)precision, can explain heterogeneity in psychological effects.
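One way to gauge how much heterogeneity such artifacts might produce is to compare heterogeneity estimates computed on raw and on standardized effect sizes. The Python sketch below is not the project's analysis pipeline. It applies the DerSimonian-Laird random-effects estimator (one common choice among several) to hypothetical sites that share a raw mean difference on a common scale but differ in observed SD. The sampling variance of d uses the usual large-sample approximation, and all inputs (`N_PER_GROUP`, `RAW_DIFF`, `OBSERVED_SDS`) are made up for illustration.

```python
from typing import Sequence


def dersimonian_laird(effects: Sequence[float], variances: Sequence[float]) -> dict:
    """Cochran's Q, DerSimonian-Laird tau^2, and I^2 for a set of effect sizes."""
    weights = [1.0 / v for v in variances]          # inverse-variance weights
    sum_w = sum(weights)
    mean_fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - mean_fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w * w for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c)                   # truncated at zero
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"Q": q, "tau2": tau2, "I2": i2}


# Hypothetical replication sites: identical raw mean difference on a common scale,
# different observed SDs, and equal group sizes everywhere.
N_PER_GROUP = 1000
RAW_DIFF = 2.0
OBSERVED_SDS = [4.0, 5.0, 8.0]


def analyze() -> dict:
    n1 = n2 = N_PER_GROUP
    # Raw (unstandardized) mean differences and their sampling variances.
    raw = [RAW_DIFF] * len(OBSERVED_SDS)
    raw_var = [sd ** 2 * (1 / n1 + 1 / n2) for sd in OBSERVED_SDS]
    # Cohen's d and its large-sample sampling variance.
    d = [RAW_DIFF / sd for sd in OBSERVED_SDS]
    d_var = [(n1 + n2) / (n1 * n2) + di ** 2 / (2 * (n1 + n2)) for di in d]
    return {"raw": dersimonian_laird(raw, raw_var), "d": dersimonian_laird(d, d_var)}


if __name__ == "__main__":
    for scale, stats in analyze().items():
        print(f"{scale:>3}: Q = {stats['Q']:.2f}, tau^2 = {stats['tau2']:.4f}, "
              f"I^2 = {stats['I2']:.0%}")
```

With these inputs, the raw differences show no heterogeneity (Q is essentially zero), while the d values yield an I² of roughly 87%, all of it produced by the differing SDs. In real replication data, raw effects are only comparable when sites use the same measurement scale, and raw effects may themselves vary, so such a comparison is a starting point rather than a full decomposition.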