# Statistical significance


In statistics, a result is significant if it is unlikely to have occurred by chance.

In traditional frequentist statistical hypothesis testing, the significance level of a test is the maximum probability, assuming the null hypothesis is true, of observing a statistic extreme enough that the null hypothesis is rejected. Hence, if the null hypothesis is true, the significance level is the probability that it will be rejected in error (a decision known as a Type I error). The significance of a result is usually reported as its p-value: the probability, assuming the null hypothesis, of obtaining a result at least as extreme as the one observed. The smaller the p-value, the more significant the result is said to be.

The significance level is denoted by the Greek letter α (alpha). Commonly used levels of significance are 10%, 5%, and 1%. If a test of significance gives a p-value lower than the α-level, the null hypothesis is rejected. Such results are informally referred to as 'statistically significant'.

Different α-levels have different advantages and disadvantages. With a very small α-level (say 1%), a test statistic is less likely to exceed the critical value by chance alone, so a result that is significant at that level is stronger evidence against the null hypothesis than one that is significant only at a higher α-level (say 5%). However, smaller α-levels run greater risks of failing to reject a false null hypothesis (a Type II error), and so have less statistical power. The selection of an α-level inevitably involves a compromise between significance and power, and consequently between the Type I error and the Type II error.
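The trade-off between the α-level and power can be illustrated numerically. The following is a minimal sketch, assuming a one-sided z-test with a known standard deviation; the function name `power_one_sided_z` and the example values (an effect of 0.5 standard deviations, 25 observations) are chosen for illustration only.

```python
from statistics import NormalDist


def power_one_sided_z(alpha, effect, sigma, n):
    """Power of a one-sided z-test (illustrative assumption: known sigma).

    alpha  -- significance level (Type I error rate)
    effect -- true difference from the null value
    sigma  -- known standard deviation of a single observation
    n      -- sample size
    """
    std_normal = NormalDist()
    # Critical value that the standardised statistic must exceed.
    critical = std_normal.inv_cdf(1.0 - alpha)
    # Mean of the standardised statistic when the effect is real.
    shift = effect / sigma * n ** 0.5
    # Probability of exceeding the critical value under the alternative.
    return 1.0 - std_normal.cdf(critical - shift)


# Same effect, noise and sample size; only the alpha-level changes.
power_at_5_percent = power_one_sided_z(0.05, effect=0.5, sigma=1.0, n=25)
power_at_1_percent = power_one_sided_z(0.01, effect=0.5, sigma=1.0, n=25)

print(f"power at alpha = 5%: {power_at_5_percent:.2f}")
print(f"power at alpha = 1%: {power_at_1_percent:.2f}")
```

Under these assumptions, tightening α from 5% to 1% lowers the power, which means a higher Type II error rate for the same data.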

## Pitfalls

A common misconception is that a statistically significant result is always meaningful, or demonstrates a large effect in the population. This belief is untrue, but it is commonly encountered in scientific writing. Given a sufficiently large sample, extremely small and non-notable differences can be found to be statistically significant, and statistical significance says nothing about the practical significance of a difference.
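The effect of sample size on significance can be sketched with a small calculation. The example below assumes a two-sided one-sample z-test with a known standard deviation; the function name `two_sided_p_value` and the numbers used (a difference of 0.01 standard deviations) are hypothetical and serve only as an illustration.

```python
import math


def two_sided_p_value(effect, sigma, n):
    """Two-sided p-value of a one-sample z-test (assumes known sigma)."""
    # Standardised test statistic: effect measured in standard errors.
    z = effect / sigma * math.sqrt(n)
    # P(|Z| >= |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2.0))


tiny_effect = 0.01  # a difference of one hundredth of a standard deviation

# The same tiny effect, observed in a small and in a very large sample.
p_small_sample = two_sided_p_value(tiny_effect, sigma=1.0, n=100)
p_large_sample = two_sided_p_value(tiny_effect, sigma=1.0, n=1_000_000)

print(f"n = 100:       p = {p_small_sample:.3g}")
print(f"n = 1,000,000: p = {p_large_sample:.3g}")
```

With 100 observations the difference is far from significant at the 5% level, while with a million observations it is overwhelmingly significant, even though the effect itself is unchanged and may be of no practical importance.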

One of the more common problems in significance testing is the tendency for multiple comparisons to yield spurious significant differences even where the null hypothesis is true. For instance, in a study of twenty comparisons using an α-level of 5%, on average one comparison will yield a significant result even if every null hypothesis is true, and if the comparisons are independent the probability of at least one such false positive is about 64%. In these cases p-values are adjusted in order to control either the false discovery rate or the familywise error rate.
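The arithmetic behind the twenty-comparison example can be sketched as follows. This assumes that the comparisons are independent and that every null hypothesis is true. The Bonferroni correction is shown as one simple way to control the familywise error rate; it is not the only adjustment available, and the function names are chosen for illustration.

```python
def familywise_error_rate(alpha, m):
    """Probability of at least one false positive among m independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(alpha, m):
    """Per-comparison level that keeps the familywise error rate <= alpha."""
    return alpha / m


alpha, m = 0.05, 20

# Expected number of spurious significant results among the comparisons.
expected_false_positives = alpha * m

# Chance of at least one spurious result without any adjustment.
unadjusted_fwer = familywise_error_rate(alpha, m)

# Chance of at least one spurious result after a Bonferroni adjustment.
adjusted_fwer = familywise_error_rate(bonferroni_alpha(alpha, m), m)

print(f"expected false positives: {expected_false_positives:.2f}")
print(f"unadjusted familywise error rate: {unadjusted_fwer:.3f}")
print(f"Bonferroni-adjusted familywise error rate: {adjusted_fwer:.3f}")
```

Testing each comparison at 0.25% instead of 5% brings the familywise error rate back below 5%, at the cost of reduced power for each individual comparison.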

An additional pitfall is that frequentist analyses of p-values can overstate statistical significance<ref name=Goodman1999a>Template:Cite journal</ref><ref name=Goodman1999b>Template:Cite journal</ref>. See Bayes Factor for details.

## Signal–noise ratio conceptualisation of significance

Statistical significance can be considered to be the confidence one has in a given result. In a comparison study, it depends on the relative difference between the groups compared, the number of measurements, and the noise associated with the measurements. In other words, the confidence one has that a given result is non-random (i.e. not a consequence of chance) depends on the signal-to-noise ratio (SNR) and the sample size.

Expressed mathematically, the confidence that a result is not due to random chance is given by the following formula by Sackett<ref>Sackett DL. Why randomized controlled trials fail but needn't: 2. Failure to employ physiological statistics, or the only formula a clinician-trialist is ever likely to need (or understand!). CMAJ. 2001 Oct 30;165(9):1226-37. PMID 11706914.</ref>:

$\mathrm{confidence} = \frac{\mathrm{signal}}{\mathrm{noise}} \times \sqrt{\mathrm{sample\ size}}.$

For clarity, the above formula is presented in tabular form below.

Dependence of confidence on noise, signal and sample size (tabular form)

| Parameter | Parameter increases | Parameter decreases |
|---|---|---|
| Noise | Confidence decreases | Confidence increases |
| Signal | Confidence increases | Confidence decreases |
| Sample size | Confidence increases | Confidence decreases |

In words, confidence is high if the noise is low and/or the sample size is large and/or the effect size (signal) is large. The confidence of a result (and its associated confidence interval) is not dependent on effect size alone. If the sample size is large and the noise is low, a small effect size can be measured with great confidence. Whether a small effect size is considered important depends on the context of the events compared.
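Sackett's formula can be explored with a short calculation. The sketch below simply evaluates the formula as written; the function name `sackett_confidence` and the input values are arbitrary illustrations, and the result is a unitless score rather than a probability.

```python
import math


def sackett_confidence(signal, noise, sample_size):
    """Evaluate confidence = (signal / noise) * sqrt(sample size)."""
    return signal / noise * math.sqrt(sample_size)


# Arbitrary illustrative values.
baseline = sackett_confidence(signal=2.0, noise=4.0, sample_size=100)

# Quadrupling the sample size doubles the confidence.
quadrupled_n = sackett_confidence(signal=2.0, noise=4.0, sample_size=400)

# Halving the signal while quadrupling the sample size restores the baseline.
halved_signal = sackett_confidence(signal=1.0, noise=4.0, sample_size=400)

print(baseline, quadrupled_n, halved_signal)
```

Because the sample size enters through its square root, measuring an effect half as large with the same confidence requires four times as many observations.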

In medicine, small effect sizes (reflected by small increases of risk) are often considered clinically relevant and are frequently used to guide treatment decisions (if there is great confidence in them). Whether a given treatment is considered a worthy endeavour is dependent on the risks, benefits and costs.

## References

<references/>
