Two-way ANOVA (Analysis of Variance) is used to test how two independent variables affect a dependent variable.
For example, suppose you want to compare the marks scored by urban, rural and semi-urban students. You will have three different means to compare with each other.
Here, the appropriate test of significance is one-way ANOVA ("one way" signifying only one independent variable, residence). We can also see that there are three levels (or categories or groups) within this independent variable.
Now, suppose that another independent variable, gender, is added to the study. We now have two independent variables: residence, with three levels, and gender, with two levels.
If our first independent variable, residence, has a significant impact on scores, then the mean scores of students from these three categories of residence will be significantly different. Similarly, if gender has a significant impact on scores, then the mean scores of the two genders will be significantly different.
But the impact of gender can differ across the three categories of residence; e.g., rural and urban students may experience different impacts of gender! This is called an interaction effect.
In this situation, we want to test whether each independent variable has a significant effect on the continuous dependent variable, separately. Additionally, we may also want to test whether there is an interaction effect. Two-way ANOVA is the test applied in this situation.
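In standard notation (supplied here, not part of the original text), the model underlying the two-way ANOVA can be written as:

```latex
x_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad \varepsilon_{ijk} \sim N(0, \sigma^2)
```

where μ is the grand mean, αᵢ is the effect of level i of factor A, βⱼ is the effect of level j of factor B, and (αβ)ᵢⱼ is the interaction effect for the combination ij.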
Assumptions
1. There are two categorical independent variables, with two or more levels / categories in each independent variable.
2. The dependent variable is continuous and normally distributed in all possible combinations of the levels of the two independent variables.
3. Samples are drawn using random sampling technique.
4. The observations are independent; no observation belongs to more than one group.
5. The variance in all the groups is the same (homogeneity of variance / homoscedasticity). This can be tested by Levene's test of homogeneity of variance.
6. There are no outliers.
Steps:
1. Null hypothesis and Alternate Hypothesis
Null Hypothesis
A. The population means for all levels / categories of the first independent variable (A) are equal. (μi = μj for all i, j)
B. The population means for all levels / categories of the second independent variable (B) are equal. (μi = μj for all i, j)
C. There is no interaction between factors A and B.
Alternate Hypothesis
A. The means for at least two levels / categories of the first independent variable (A) are not equal. (μi ≠ μj for some i, j)
B. The means for at least two levels / categories of the second independent variable (B) are not equal. (μi ≠ μj for some i, j)
C. There is a significant interaction between factors A and B.
2. Calculate Sum of Squares for IDV A (SSA Between)
where ni is the sample size of level / category / group i in IDV A.
x̄ = Grand mean (Mean of all the observations)
x̄i = Mean of level (group) i for IDV A.
ni = Sample size in group i for IDV A.
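Assembled from these definitions (the original equation image is not available, so this is the standard form):

```latex
SS_A = \sum_{i=1}^{a} n_i \left( \bar{x}_i - \bar{x} \right)^2
```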
3. Calculate Sum of Squares for IDV B (SSB Between)
where nj is the sample size of level / category / group j in IDV B
x̄ = Grand mean (Mean of all the observations)
x̄j = Mean of level (group) j for IDV B.
nj = Sample size in group j for IDV B.
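Assembled from these definitions (standard form, since the original equation image is not available):

```latex
SS_B = \sum_{j=1}^{b} n_j \left( \bar{x}_j - \bar{x} \right)^2
```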
4. Calculate Sum of Squares Within (SSE) (Error)
xijk = kth observation in group ij. (Observation from level i for IDV A and level j for IDV B)
x̄ij = Mean of group ij (Mean of observations from level i for IDV A and level j for IDV B).
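Assembled from these definitions (standard form, since the original equation image is not available):

```latex
SS_E = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n_{ij}} \left( x_{ijk} - \bar{x}_{ij} \right)^2
```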
5. Calculate Interaction Sum of Squares (SS AB)
x̄ij = Mean of group ij (Mean of observations from level i for IDV A and level j for IDV B).
x̄i = Mean of group i from IDV A.
x̄j = Mean of group j from IDV B.
x̄ = Grand Mean (Mean of all the observations)
nij = Sample size in group ij.
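Assembled from these definitions (standard form, since the original equation image is not available):

```latex
SS_{AB} = \sum_{i=1}^{a} \sum_{j=1}^{b} n_{ij} \left( \bar{x}_{ij} - \bar{x}_i - \bar{x}_j + \bar{x} \right)^2
```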
6. Calculate Total Sum of Squares (SS T)
xijk = kth observation in group ij. (Observation from level i for IDV A and level j for IDV B).
x̄ = Grand mean
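Assembled from these definitions (standard form, since the original equation image is not available):

```latex
SS_T = \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n_{ij}} \left( x_{ijk} - \bar{x} \right)^2
```

For a balanced design, the total sum of squares partitions exactly: SS_T = SS_A + SS_B + SS_AB + SS_E.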
7. Calculate degrees of freedom
df_A = Degrees of freedom for IDV A= a - 1
df_B = Degrees of freedom for IDV B= b - 1
df_AB = Degrees of freedom interaction AB = (a -1) * (b - 1)
df_Error = Degrees of freedom for Error = N - a * b
df_total = Degrees of freedom Total = N -1
N = Grand Sample Size, a = Number of levels / groups / categories in IDV 1, b = Number of levels / groups / categories in IDV 2, IDV = Independent variable
8. Calculate Mean Sum of Squares for IDV A, IDV B, interaction and error
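Each mean sum of squares is the corresponding sum of squares divided by its degrees of freedom:

```latex
MS_A = \frac{SS_A}{df_A}, \qquad
MS_B = \frac{SS_B}{df_B}, \qquad
MS_{AB} = \frac{SS_{AB}}{df_{AB}}, \qquad
MS_E = \frac{SS_E}{df_{Error}}
```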
9. Calculate F values for IDV A, IDV B and interaction
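Each F value is the ratio of the corresponding mean sum of squares to the error mean sum of squares:

```latex
F_A = \frac{MS_A}{MS_E}, \qquad
F_B = \frac{MS_B}{MS_E}, \qquad
F_{AB} = \frac{MS_{AB}}{MS_E}
```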
10. Calculate p values, using the F table, for IDV A, IDV B and interaction AB.
11. Interpret all three p values
If p <= alpha, then reject the respective null hypothesis.
If p > alpha, then fail to reject the respective null hypothesis.
12. Post hoc tests
If p <= alpha, the alternate hypothesis is accepted (at least two means are significantly different).
Which pair (or pairs) of means are significantly different is then identified by post hoc tests.
There are multiple post hoc tests available, of which Bonferroni and Tukey's HSD are most commonly used.
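The steps above can be sketched end to end in Python. The marks below are hypothetical, made up for a balanced 3 (residence) x 2 (gender) design with 3 students per cell; scipy is used only to obtain the tail probability of the F distribution in place of an F table.

```python
# Two-way ANOVA computed from the definitional formulas, on hypothetical data.
import numpy as np
from scipy.stats import f as f_dist

# data[(residence, gender)] = marks of students in that cell (hypothetical)
data = {
    ("urban", "male"):   [78, 74, 80],
    ("urban", "female"): [82, 79, 85],
    ("rural", "male"):   [65, 70, 68],
    ("rural", "female"): [72, 75, 71],
    ("semi",  "male"):   [70, 73, 69],
    ("semi",  "female"): [76, 74, 78],
}
levels_a = ["urban", "rural", "semi"]   # IDV A: residence (a = 3)
levels_b = ["male", "female"]           # IDV B: gender   (b = 2)
a, b = len(levels_a), len(levels_b)

all_obs = np.concatenate([np.asarray(v, float) for v in data.values()])
N = all_obs.size
grand_mean = all_obs.mean()

# Marginal and cell means
mean_a = {i: np.concatenate([data[(i, j)] for j in levels_b]).mean() for i in levels_a}
mean_b = {j: np.concatenate([data[(i, j)] for i in levels_a]).mean() for j in levels_b}
mean_ab = {(i, j): np.mean(data[(i, j)]) for i in levels_a for j in levels_b}

# Marginal sample sizes
n_a = {i: sum(len(data[(i, j)]) for j in levels_b) for i in levels_a}
n_b = {j: sum(len(data[(i, j)]) for i in levels_a) for j in levels_b}

# Sums of squares (steps 2-6)
ss_a = sum(n_a[i] * (mean_a[i] - grand_mean) ** 2 for i in levels_a)
ss_b = sum(n_b[j] * (mean_b[j] - grand_mean) ** 2 for j in levels_b)
ss_ab = sum(len(data[(i, j)]) *
            (mean_ab[(i, j)] - mean_a[i] - mean_b[j] + grand_mean) ** 2
            for i in levels_a for j in levels_b)
ss_e = sum(((np.asarray(data[(i, j)], float) - mean_ab[(i, j)]) ** 2).sum()
           for i in levels_a for j in levels_b)
ss_t = ((all_obs - grand_mean) ** 2).sum()

# Degrees of freedom, mean squares, F values, p values (steps 7-10)
df_a, df_b = a - 1, b - 1
df_ab = df_a * df_b
df_e = N - a * b

ms_a, ms_b, ms_ab, ms_e = ss_a / df_a, ss_b / df_b, ss_ab / df_ab, ss_e / df_e
f_a, f_b, f_ab = ms_a / ms_e, ms_b / ms_e, ms_ab / ms_e

p_a = f_dist.sf(f_a, df_a, df_e)
p_b = f_dist.sf(f_b, df_b, df_e)
p_ab = f_dist.sf(f_ab, df_ab, df_e)

print(f"Residence:   F = {f_a:.2f}, p = {p_a:.4f}")
print(f"Gender:      F = {f_b:.2f}, p = {p_b:.4f}")
print(f"Interaction: F = {f_ab:.2f}, p = {p_ab:.4f}")

# Sanity check: in a balanced design the sums of squares partition the total
assert abs(ss_a + ss_b + ss_ab + ss_e - ss_t) < 1e-8
```

In practice the same table is produced by statistical software (e.g. `statsmodels.stats.anova.anova_lm` on a fitted OLS model), but writing it out makes each quantity in the steps above concrete.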
@ Sachin Mumbare