What is P-Hacking?

P-hacking is a form of statistical bias that arises when researchers keep collecting or re-analyzing data until previously non-significant results become significant.

There is growing concern that many scientific results could be false positives (Barch & Yarkoni, 2013; Jager & Leek, 2014; Nyberg, Graham, & Stokes, 1977). It is argued that current scientific practices give scientists strong incentives to publish mainly statistically significant results. Journals with high impact factors in particular publish statistically significant studies with above-average frequency (Begg & Berlin, 1988; Dwan et al., 2008; Rosenthal, 1979; Song, Eastwood, Gilbody, Duley, & Sutton, 2001). Scientists are also often evaluated by how much they publish and in which journals.

Together, these conditions create strong pressure to produce statistically significant results.

P-hacking thus poses a serious threat to the acquisition of scientific knowledge. Because there are hardly any incentives in science to replicate studies, such false results can persist for years and distort future research.

Examples of P-Hacking

P-hacking is therefore often also referred to as selection bias or inflation bias, because the effect that was actually found is not the one that gets published. One way to p-hack is to perform the same experiment more than once. Since most scientific hypotheses are tested at a significance level of 5%, about 5% of tests of a true null hypothesis will come out significant purely by chance. If an experiment yields a non-significant result and is repeated 20 times, one would therefore expect roughly one run to become significant (a false positive). Instead of reporting all 20 runs, however, only the one significant run is published. This approach is feasible above all when sufficient financial resources are available and a non-significant result would have economic consequences.
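The 5% logic above can be checked with a small simulation. The sketch below is illustrative, not from the original text: it assumes a two-group comparison with a two-sample t-test on normally distributed data where the true effect is exactly zero, and shows that about 5% of such null experiments come out "significant" anyway.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ALPHA = 0.05  # conventional 5% significance level
N = 30        # observations per group (an arbitrary illustrative choice)

def null_experiment():
    """One two-group comparison in which the true effect is exactly zero."""
    a = rng.normal(size=N)
    b = rng.normal(size=N)
    return stats.ttest_ind(a, b).pvalue

# Run many independent null experiments: about 5% come out "significant"
# even though there is nothing to find.
pvalues = [null_experiment() for _ in range(2000)]
false_positive_rate = float(np.mean([p < ALPHA for p in pvalues]))
print(f"empirical false-positive rate: {false_positive_rate:.3f}")
```

At 20 repetitions of one experiment, the same 5% rate translates into an expected one false positive among the 20 runs.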

Another way to p-hack is to apply a variety of statistical methods during the analysis but report only those that yield statistically significant results. By scientific convention, the statistical procedure and the variables to be examined should be specified before the data are analyzed.
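This variant can also be simulated. In the hedged sketch below (my own illustration, not from the original text), the same null data set is analyzed with four common two-sample tests, and a result counts as "significant" if any one of them crosses the 5% line; the resulting rate exceeds the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ALPHA = 0.05
N = 30

def some_test_significant():
    """Analyze one null data set with several tests; keep the best p-value."""
    a = rng.normal(size=N)  # true group difference is zero
    b = rng.normal(size=N)
    pvalues = [
        stats.ttest_ind(a, b).pvalue,                   # Student's t-test
        stats.ttest_ind(a, b, equal_var=False).pvalue,  # Welch's t-test
        stats.mannwhitneyu(a, b).pvalue,                # Mann-Whitney U test
        stats.ks_2samp(a, b).pvalue,                    # Kolmogorov-Smirnov test
    ]
    return min(pvalues) < ALPHA  # "significant" if ANY test crosses the line

rate = float(np.mean([some_test_significant() for _ in range(2000)]))
print(f"rate of null data sets with at least one 'significant' test: {rate:.3f}")
```

The inflation is modest here because the four tests are highly correlated, but it grows with every additional analysis that is tried and silently discarded.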

Further examples of P-Hacking include:

  1. Collecting many different variables, running different analyses on them, but reporting only the variables and analyses that became significant
  2. Collecting several dependent variables but reporting only the significant results
  3. Adding variables as covariates only after the analysis
  4. Identifying which participants contributed most to the effect being non-significant and excluding them from the statistical analysis
  5. Stopping data collection as soon as statistical significance is reached
  6. Splitting or merging groups after the fact, or excluding certain groups entirely
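Point 5 above, known as optional stopping, is easy to demonstrate. The sketch below is an illustrative assumption of mine, not a procedure from the original text: it assumes a two-sample t-test on purely null data, peeks at the p-value after every batch of 5 new observations per group, and stops as soon as p < .05.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ALPHA = 0.05

def optional_stopping(start_n=10, step=5, max_n=100):
    """Peek at the p-value after every batch and stop at 'significance'."""
    a = list(rng.normal(size=start_n))  # true group difference is zero
    b = list(rng.normal(size=start_n))
    while True:
        if stats.ttest_ind(a, b).pvalue < ALPHA:
            return True            # stopped early: a false positive
        if len(a) >= max_n:
            return False           # budget exhausted without "significance"
        a.extend(rng.normal(size=step))
        b.extend(rng.normal(size=step))

rate = float(np.mean([optional_stopping() for _ in range(1000)]))
print(f"false-positive rate with optional stopping: {rate:.3f}")
```

Although every individual test is run at the 5% level, repeatedly peeking and stopping at the first significant result drives the overall false-positive rate far above 5%.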


References

  1. Barch, D. M., & Yarkoni, T. (2013). Introduction to the special issue on reliability and replication in cognitive and affective neuroscience research. Cognitive, Affective & Behavioral Neuroscience, 13(4), 687-689. doi: 10.3758/s13415-013-0201-7
  2. Begg, C. B., & Berlin, J. A. (1988). Publication bias: A problem in interpreting medical data. Journal of the Royal Statistical Society. Series A (Statistics in Society), 151(3), 419. doi: 10.2307/2982993
  3. Dwan, K., Altman, D. G., Arnaiz, J. A., Bloom, J., Chan, A. W., Cronin, E., . . . Williamson, P. R. (2008). Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS ONE, 3(8), e3081. doi: 10.1371/journal.pone.0003081
  4. Jager, L. R., & Leek, J. T. (2014). An estimate of the science-wise false discovery rate and application to the top medical literature. Biostatistics, 15(1), 1-12. doi: 10.1093/biostatistics/kxt007
  5. Nyberg, G., Graham, R. M., & Stokes, G. S. (1977). The effect of mental arithmetic in normotensive and hypertensive subjects, and its modification by beta-adrenergic receptor blockade. British Journal of Clinical Pharmacology, 4(4), 469-474.
  6. Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638-641. doi: 10.1037/0033-2909.86.3.638
  7. Song, F., Eastwood, A., Gilbody, S., Duley, L., & Sutton, A. (2001). Publication and related biases. In A. Stevens, K. Abrams, J. Brazier, R. Fitzpatrick, & R. Lilford (Eds.), The Advanced Handbook of Methods in Evidence Based Healthcare (pp. 371-390). London: SAGE Publications Ltd.