Aim: Sample size required to detect a significant change in effect size when additional predictors are added (multiple linear regression)


Example:

A researcher conducted a study to predict the marks obtained by students in the final examination. The predictor (independent) variables were physical attendance (in percent), score in the pre-final examination, gender, and score in a separate IQ test. The study yielded R² = 0.24. The researcher then wanted to add two more predictors: mother's education and father's education. A pilot study showed that, after adding these two predictors, R² rose to 0.40 (an increase of 0.16). What sample size is required to achieve 80% power to detect a significant increase in R² after adding the 2 independent variables, at a confidence level of 95%?

Solution:

Here

Confidence level = 95%, power = 80%, change in R² = 0.16, k1 = 4 (previous predictors), k2 = 2 (additional predictors)

Entering these values gives a required sample size of 54.


How is the sample size calculated? (Exclusively for advanced users)

1. Calculating the sample size for multiple linear regression when additional predictors are added requires an iterative approach. Let K1 be the number of previous predictors and K2 the number of additional predictors, so that the new number of predictors is KN = K1 + K2.

2. The iteration starts at the lowest usable sample size, N = KN + 2, which leaves exactly one degree of freedom for the denominator.

3. Starting with the lowest sample size (N) of KN + 2, the degrees of freedom for the numerator (d1 = K2) and the denominator (d2 = N - KN - 1) are calculated. The critical F value is then obtained from the given confidence level and these degrees of freedom. For example, if the number of previous predictors (K1) is 4 and the number of additional predictors (K2) is 2, the iteration starts at a sample size of 4 + 2 + 2 = 8. In this case, d1 = K2 = 2 and d2 = N - KN - 1 = 8 - 6 - 1 = 1. The critical F value (x) for a 95% confidence level (alpha = 0.05) with 2 and 1 degrees of freedom is calculated using the inverse F distribution function (it is the F table value for the given alpha and degrees of freedom). In this case, the critical F value is 199.5.
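As a rough illustration of step 3, the critical value can be reproduced with SciPy's inverse F distribution function (scipy.stats.f.ppf). The helper name critical_f and its parameter names are invented for this sketch and are not taken from the calculator itself.

```python
from scipy.stats import f


def critical_f(n, k1, k2, alpha=0.05):
    """Hypothetical helper: return (d1, d2, critical F) for sample size n."""
    kn = k1 + k2          # new number of predictors, KN
    d1 = k2               # numerator degrees of freedom
    d2 = n - kn - 1       # denominator degrees of freedom
    if d2 < 1:
        raise ValueError("n must be at least k1 + k2 + 2")
    # Inverse F distribution: the value cut off by the upper alpha tail.
    return d1, d2, f.ppf(1.0 - alpha, d1, d2)


d1, d2, x = critical_f(n=8, k1=4, k2=2)
print(d1, d2, round(x, 1))  # 2 1 199.5
```

The printed value agrees with the critical F of 199.5 quoted in step 3 for 2 and 1 degrees of freedom.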

4. The non-centrality parameter λ (lambda) is calculated using the following equation:

λ = f² × N

If the effect size is entered as R², η² or f, then f² is calculated as follows:

f² = R² / (1 - R²)

f² = η² / (1 - η²)

f² = f × f
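The conversions in step 4 can be sketched in a few lines of plain Python. The function names f_squared and noncentrality and the kind labels are hypothetical, and the example assumes that the value entered as R² is the change in R² (0.16) from the worked example.

```python
def f_squared(value, kind="r2"):
    """Convert an effect size to f squared (hypothetical helper)."""
    if kind in ("r2", "eta2"):
        if not 0.0 <= value < 1.0:
            raise ValueError("R2 or eta2 must lie in [0, 1)")
        return value / (1.0 - value)
    if kind == "f":
        return value * value
    raise ValueError("kind must be 'r2', 'eta2' or 'f'")


def noncentrality(f2, n):
    """Non-centrality parameter: lambda = f squared times N."""
    return f2 * n


# Assumption: the entered R2 is the change in R2 from the worked example.
f2 = f_squared(0.16, "r2")
lam = noncentrality(f2, 8)            # first iteration, N = KN + 2 = 8
print(round(f2, 4), round(lam, 4))    # 0.1905 1.5238
```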

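5. Using the non-centrality parameter (λ), d1, d2 and x, the power of the test is calculated from the cumulative distribution function of the non-central F distribution:

F(x | d1, d2, λ) = Σ (j = 0 to ∞) [ e^(-λ/2) × (λ/2)^j / j! ] × I( d1·x / (d2 + d1·x) | d1/2 + j, d2/2 )

Power = 1 - F(x | d1, d2, λ)

where I(q | a, c) is the regularized incomplete beta function.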
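One way to evaluate step 5 is to sum the standard series for the non-central F cumulative distribution function, a Poisson-weighted sum of regularized incomplete beta functions, and take its complement. The sketch below uses scipy.special.betainc for I(q | a, c). The function name noncentral_f_power, the truncation rule and the default limits are assumptions made for illustration, not the calculator's actual code.

```python
import math

from scipy.special import betainc


def noncentral_f_power(x, d1, d2, lam, max_terms=1000, tol=1e-12):
    """Power = 1 - non-central F CDF at x (hypothetical helper).

    The CDF is summed as Poisson(lam / 2) weights times I(q | d1/2 + j, d2/2).
    Very large lam (above roughly 1400) would underflow the first weight,
    so this sketch is only meant for moderate lam.
    """
    q = d1 * x / (d2 + d1 * x)
    half = lam / 2.0
    weight = math.exp(-half)   # Poisson weight for j = 0
    cdf = 0.0
    mass = 0.0                 # Poisson probability accumulated so far
    for j in range(max_terms):
        cdf += weight * betainc(d1 / 2.0 + j, d2 / 2.0, q)
        mass += weight
        if 1.0 - mass < tol:   # remaining terms are negligible
            break
        weight *= half / (j + 1)
    return 1.0 - cdf


# First iteration of the worked example: N = 8, d1 = 2, d2 = 1, x = 199.5.
print(noncentral_f_power(199.5, 2, 1, lam=8 * 0.16 / 0.84))
```

With lam = 0 the series collapses to its first term and the result reduces to alpha (about 0.05 for x = 199.5, d1 = 2, d2 = 1), which is a convenient sanity check.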

6. If the calculated power is less than the desired power, the process is repeated with the sample size increased by 1, until the desired power is achieved.
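Steps 1 to 6 can be combined into a short search loop. The following is a minimal sketch, not the calculator's own implementation: it swaps in SciPy's built-in non-central F distribution (scipy.stats.ncf) for the power calculation, assumes the entered R² is the change in R², and uses invented names (required_sample_size, max_n).

```python
from scipy.stats import f, ncf


def required_sample_size(delta_r2, k1, k2, alpha=0.05,
                         target_power=0.80, max_n=100000):
    """Smallest N reaching the target power (hypothetical helper)."""
    kn = k1 + k2                           # step 1: KN = K1 + K2
    f2 = delta_r2 / (1.0 - delta_r2)       # step 4: f squared from R2
    n = kn + 2                             # step 2: lowest sample size
    while n <= max_n:
        d1, d2 = k2, n - kn - 1            # step 3: degrees of freedom
        x = f.ppf(1.0 - alpha, d1, d2)     # step 3: critical F value
        lam = f2 * n                       # step 4: non-centrality
        power = ncf.sf(x, d1, d2, lam)     # step 5: 1 - non-central F CDF
        if power >= target_power:          # step 6: stop when power reached
            return n, power
        n += 1                             # step 6: otherwise try N + 1
    raise ValueError("target power not reached within max_n")


n, power = required_sample_size(delta_r2=0.16, k1=4, k2=2)
print(n, round(power, 3))
```

For the worked example (change in R² = 0.16, k1 = 4, k2 = 2) this search should land at or very near the sample size of 54 reported above. Small differences are possible because the numerical routines may not be identical to those used by the calculator.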


@ Sachin Mumbare