Patrick Mugisha, "On my honor, as a student, I have neither given nor received unauthorized aid on this academic work."

What statistical techniques are available?

Differences?

Goals

Python packages

Install pingouin (a newer Python package for statistical inference testing)

Assumptions of statistical tests

Parametric data looks like this

Non-parametric data looks like this

Loading python packages

Loading data

Exploring data & Data visualization

Check assumptions

Normality test

A high p-value means we cannot reject the null hypothesis of normality, so the data are consistent with a normal distribution.
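A minimal sketch of the normality check, using scipy's Shapiro-Wilk test on each group. The scores and group labels below are assumed values for illustration, since the loading cell is not shown here.

```python
from scipy import stats

# Illustrative pain-threshold scores per hair colour (assumed values).
groups = {
    "Light Blond": [62, 60, 71, 55, 48],
    "Dark Blond": [63, 57, 52, 41, 43],
    "Light Brunette": [42, 50, 41, 37],
    "Dark Brunette": [32, 39, 51, 30, 35],
}

# Shapiro-Wilk: the null hypothesis is that the sample comes from
# a normally distributed population.
normality = {}
for name, scores in groups.items():
    w_stat, p_value = stats.shapiro(scores)
    normality[name] = p_value
    verdict = "normality not rejected" if p_value > 0.05 else "normality rejected"
    print(f"{name}: W = {w_stat:.3f}, p = {p_value:.3f} ({verdict})")
```

With groups this small the test has little power, so a high p-value is weak evidence; a histogram or Q-Q plot is a useful companion.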

Hypothesis Testing using t-test & ANOVA

t-test

http://www.socialresearchmethods.net/kb/stat_t.php

2-sample test

Our null hypothesis is that there is no difference between Light Blond and Dark Blond in terms of painthreshold.
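A sketch of the two-sample test with scipy's ttest_ind. The two score lists are assumed values for illustration, not necessarily the loaded data.

```python
from scipy import stats

# Assumed pain-threshold scores for the two groups being compared.
light_blond = [62, 60, 71, 55, 48]
dark_blond = [63, 57, 52, 41, 43]

# Independent two-sample t-test (equal variances assumed by default).
t_stat, p_value = stats.ttest_ind(light_blond, dark_blond)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

if p_value < 0.05:
    print("Reject the null hypothesis: the group means differ.")
else:
    print("Cannot reject the null hypothesis of equal means.")
```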

ANOVA test

painthreshold appears to differ by haircolor.
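The one-way ANOVA can be sketched with scipy's f_oneway. The four groups below are assumed values for illustration.

```python
from scipy import stats

# Assumed pain-threshold scores for the four hair-colour groups.
light_blond = [62, 60, 71, 55, 48]
dark_blond = [63, 57, 52, 41, 43]
light_brunette = [42, 50, 41, 37]
dark_brunette = [32, 39, 51, 30, 35]

# One-way ANOVA: the null hypothesis is that all group means are equal.
f_stat, p_value = stats.f_oneway(
    light_blond, dark_blond, light_brunette, dark_brunette
)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

A significant result only says that at least one group mean differs; a post-hoc test is needed to find which pairs differ.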

Important: The tukeyhsd function in statsmodels doesn't return a p-value. https://stackoverflow.com/questions/16049552/what-statistics-module-for-python-supports-one-way-anova-with-post-hoc-tests-tu

To view the p-values, use the pingouin Python package.

The following groups look different in terms of pain threshold:

Another Dataset

Normality test

ANOVA

Since the p-value is less than 0.05, we reject the null hypothesis that the three groups have equal mean weights: at least one group's mean weight differs significantly from the others.

You can also run a post-hoc analysis

The mean weight of Treatment Group #1 is statistically different from that of Treatment Group #2.

Appendix: using pingouin

Install pingouin (https://pingouin-stats.org/): !pip install pingouin --user

T-Test & ANOVA

Test the normality of the data

T-test

We can't reject the null hypothesis

ANOVA

The detailed ANOVA summary table includes the following columns:

SS : sums of squares
DF : degrees of freedom
MS : mean squares (= SS / DF)
F : F-value (test statistic)
p-unc : uncorrected p-values
np2 : partial eta-square effect size *

* In one-way ANOVA, partial eta-square is the same as eta-square and generalized eta-square.

In the example above, there is a main effect of group (F(3, 15) = 6.79, p = .004), so we can reject the null hypothesis that the groups have equal means.
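To make the columns concrete, the quantities can be computed by hand with numpy. This is a sketch under the assumption that the groups hold the values below; the variable names are illustrative and are not pingouin's internals.

```python
import numpy as np
from scipy import stats

# Assumed pain-threshold scores for the four groups.
groups = [
    np.array([62, 60, 71, 55, 48]),  # light blond
    np.array([63, 57, 52, 41, 43]),  # dark blond
    np.array([42, 50, 41, 37]),      # light brunette
    np.array([32, 39, 51, 30, 35]),  # dark brunette
]
all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

# SS: sums of squares between groups and within groups.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# DF: degrees of freedom.
df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)

# MS = SS / DF, and F is the ratio of the two mean squares.
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_value = ms_between / ms_within

# p-unc: upper tail of the F distribution; np2: effect size.
p_unc = stats.f.sf(f_value, df_between, df_within)
np2 = ss_between / (ss_between + ss_within)

print(f"F({df_between}, {df_within}) = {f_value:.2f}, p = {p_unc:.3f}, np2 = {np2:.3f}")
```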

Tukey post-hocs

Often, you will want to compute post-hoc tests to look at the pairwise differences between the groups. For one-way ANOVA with equal variances between groups, the optimal test is the Tukey-HSD post-hoc test (pairwise_tukey in pingouin).

As one can see from the post-hoc summary table below, the light blond group has a significantly higher pain threshold than the dark brunette (p=.0037) and light brunette (p=.0367) groups.
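As a cross-check that does not need pingouin, the same pairwise comparison can be sketched with scipy.stats.tukey_hsd. This swaps in scipy's implementation (assumed to be available, i.e. SciPy 1.8 or later) and uses assumed group values for illustration.

```python
from scipy import stats

# Assumed pain-threshold scores; the order of names matches the samples.
names = ["Light Blond", "Dark Blond", "Light Brunette", "Dark Brunette"]
samples = [
    [62, 60, 71, 55, 48],
    [63, 57, 52, 41, 43],
    [42, 50, 41, 37],
    [32, 39, 51, 30, 35],
]

# Tukey's HSD compares every pair of group means and adjusts
# the p-values for the number of comparisons.
result = stats.tukey_hsd(*samples)

for i in range(len(names)):
    for j in range(i + 1, len(names)):
        diff = result.statistic[i, j]  # mean(group i) - mean(group j)
        p_value = result.pvalue[i, j]
        print(f"{names[i]} vs {names[j]}: diff = {diff:.1f}, p = {p_value:.4f}")
```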

What if my groups have unequal variances? (Homogeneity of variances)

Traditional ANOVA can be quite unstable when the groups have unequal variances (see Liu 2015). Therefore, it is recommended to use a Welch ANOVA instead, followed by Games-Howell post-hoc tests, which do not require the groups to have equal variances.

Perform Levene test for equal variances.

The Levene test tests the null hypothesis that all input samples are from populations with equal variances.
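A sketch of the Levene test with scipy.stats.levene, on assumed group values. The default center="median" is written out explicitly.

```python
from scipy import stats

# Assumed pain-threshold scores for the four groups.
light_blond = [62, 60, 71, 55, 48]
dark_blond = [63, 57, 52, 41, 43]
light_brunette = [42, 50, 41, 37]
dark_brunette = [32, 39, 51, 30, 35]

# Levene test: the null hypothesis is that all groups come from
# populations with equal variances.
w_stat, p_value = stats.levene(
    light_blond, dark_blond, light_brunette, dark_brunette, center="median"
)
print(f"W = {w_stat:.3f}, p = {p_value:.3f}")

if p_value > 0.05:
    print("Equal variances not rejected.")
else:
    print("Variances differ; consider Welch-type tests.")
```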

https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.levene.html

The above test is not significant, meaning we cannot reject homogeneity of variances, and you can proceed with the t-test.

If the test were significant, you could use Welch's t-test or the Mann-Whitney U test (for two groups), or a Welch ANOVA (for three or more groups).
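For the two-group case, Welch's t-test is the same scipy call as the standard t-test with equal_var=False. A minimal sketch on assumed values:

```python
from scipy import stats

# Assumed pain-threshold scores for two groups.
light_blond = [62, 60, 71, 55, 48]
dark_blond = [63, 57, 52, 41, 43]

# equal_var=False switches from Student's t-test to Welch's t-test,
# which does not pool the two group variances.
welch_t, welch_p = stats.ttest_ind(light_blond, dark_blond, equal_var=False)
print(f"Welch t = {welch_t:.3f}, p = {welch_p:.3f}")
```

With equal group sizes the t statistic matches the pooled version; only the degrees of freedom, and therefore the p-value, change.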

Non-parametric t-test

High p value --> We can't reject the null hypothesis
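A sketch of the non-parametric two-group comparison using scipy.stats.mannwhitneyu, on assumed values:

```python
from scipy import stats

# Assumed pain-threshold scores for two groups.
light_blond = [62, 60, 71, 55, 48]
dark_blond = [63, 57, 52, 41, 43]

# Mann-Whitney U compares the two samples by rank, so it does not
# assume normally distributed data.
u_stat, p_value = stats.mannwhitneyu(
    light_blond, dark_blond, alternative="two-sided"
)
print(f"U = {u_stat:.1f}, p = {p_value:.3f}")
```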

Non-parametric ANOVA test

kruskal test
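The Kruskal-Wallis test can be sketched with scipy.stats.kruskal, again on assumed group values:

```python
from scipy import stats

# Assumed pain-threshold scores for the four groups.
light_blond = [62, 60, 71, 55, 48]
dark_blond = [63, 57, 52, 41, 43]
light_brunette = [42, 50, 41, 37]
dark_brunette = [32, 39, 51, 30, 35]

# Kruskal-Wallis H-test: a rank-based alternative to one-way ANOVA.
# The null hypothesis is that all groups share the same distribution.
h_stat, p_value = stats.kruskal(
    light_blond, dark_blond, light_brunette, dark_brunette
)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
```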

welch test

Paired t-test: Same subject

normality test for the same subject t-test

https://pythonfordatascience.org/paired-samples-t-test-python/

Conclusion: the normality test gives a high p-value (0.42), so we cannot reject normality and can treat the data as parametric.

Conclusion of paired t-test: Fathers and sons have different heights.
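A sketch of the paired test with scipy.stats.ttest_rel. The heights below are hypothetical numbers made up for illustration; they are not the father and son data used above.

```python
from scipy import stats

# Hypothetical heights in inches; position i is one father-son pair.
fathers = [68.0, 70.5, 66.2, 71.3, 69.0, 67.4, 72.1, 65.8]
sons = [69.5, 71.8, 67.0, 73.0, 70.2, 68.5, 73.5, 66.5]

# Paired t-test: tests whether the mean of the pairwise differences is zero.
t_stat, p_value = stats.ttest_rel(fathers, sons)
print(f"t = {t_stat:.2f}, p = {p_value:.5f}")
```

The pairing matters: the same numbers fed to an independent two-sample test would ignore the family link and lose most of the power.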

If the data were not normally distributed, consider a non-parametric test (e.g., the Wilcoxon signed-rank test).
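A sketch of the Wilcoxon signed-rank test with scipy.stats.wilcoxon, using hypothetical paired heights made up for illustration:

```python
from scipy import stats

# Hypothetical heights in inches; position i is one father-son pair.
fathers = [68.0, 70.5, 66.2, 71.3, 69.0, 67.4, 72.1, 65.8]
sons = [69.5, 71.8, 67.0, 73.0, 70.2, 68.5, 73.5, 66.5]

# Wilcoxon signed-rank test: ranks the absolute pairwise differences
# and compares the rank sums of positive and negative differences.
w_stat, p_value = stats.wilcoxon(fathers, sons)
print(f"W = {w_stat:.1f}, p = {p_value:.4f}")
```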

Same conclusion: Fathers and sons have different heights.

References