Method | Exploratory Factor Analysis

Brief introduction
Exploratory factor analysis
A simple example
Reference

Brief introduction

Factor analysis is a data reduction technique:
- many items → few factors
- item = observable variable
- factor = latent (or hidden) variable
- Assump: hidden variables → correlation between observable variables
Factor analysis groups items based on inter-correlations:
- which items can be grouped together and form a single factor
Exploratory factor analysis & Confirmatory factor analysis
- EFA: identify whether the categorization you have done stands or not - EFA for identifying data patterns
- CFA: test hypotheses (i.e. data fits your model or not) - CFA for confirming a established theory

Exploratory factor analysis

The task of EFA is to group the items that are most correlated:
- 组内高度相关，组间最好不相关
Two pre-tests before doing EFA:
- KMO measure of sampling adequacy: whether the sample you have collected is good enough to do EFA or not; need to greater than 0.6.
- Bartlett’s test of sphericity: whether the items are uncorrelated in the population (H0: uncorrelated); need to smaller than 0.05, which indicate to reject the H0, showing that the items are correlated.
Than do EFA and get results…
How to determine the number of factors? Factor’s eigenvalue = the variance explained by this factor (i.e. sum of squared loadings); Factors having eigenvalues over 1 are selected (i.e. Kaiser’s rule).
- Can often get the absolute (Proportion Var) and cumulative (Cumulative Var) proportion of variance explained by each factor.
Commumality (of an item)
- The proportion of this item’s variance explained by the factors
- if < 0.6, better to remove this item
- Commumality = 1 - Uniquenesses
  - Uniquenesses (of an item): The proportion of this item’s variance that cannot be explained by the factors
Factor loading (of a factor)
- Simple correlation between the item and the factor
- We can use different “Rotation” to adjust the loadings - e.g. Varimax - to make the loading patterns easy to understand.
  - In this case, the total amount of variation explained by the factors remains the same. However, the distribution of this explained variation will change (ref).
- We want one factor to have a big loading on some items and another to have big loadings on other items - no cross-loading.
  - set up thresholds for loadings, like 0.5 (e.g. 0.03 → 0.00)

A simple example

library(psych)
library(REdaS)

We aim to identify the optimal number of factors (1 or not) that can be extracted from three items related to welfare attitudes:

First, compute KMO and do Bartletts test of spherecity:

# KMO值为0.68，大于0.6，说明适合进行进一步的因子分析
bart_spher(sub_df)   # produces Bartletts test of spherecity (you want this to be significant)
KMO(sub_df)          # kaiser-Meyer-olkin measure, you want to be above 0.6

	Bartlett's Test of Sphericity
Call: bart_spher(x = sub_df)
	X2 = 11525.863
	df = 3
p-value < 2.22e-16

Kaiser-Meyer-Olkin factor adequacy
Call: KMO(r = sub_df)
Overall MSA =  0.68
MSA for each item =
wel_att_1 wel_att_2 wel_att_3
	0.81      0.63      0.64

Second, do factor analysis:

# do not use specific rotation
# because our objective is not to interpret the loadings
fa(sub_df, nfactors = 3, rotate = "none")

Factor Analysis using method =  minres
Call: fa(r = sub_df, nfactors = 3, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
		   MR1   MR2 MR3   h2   u2 com
wel_att_1 0.61  0.09   0 0.38 0.62   1
wel_att_2 0.86  0.00   0 0.74 0.26   1
wel_att_3 0.83 -0.07   0 0.69 0.31   1

					   MR1  MR2 MR3
SS loadings           1.79 0.01 0.0                    (← here)
Proportion Var        0.60 0.00 0.0
Cumulative Var        0.60 0.60 0.6
Proportion Explained  0.99 0.01 0.0                    (← here)
Cumulative Proportion 0.99 1.00 1.0

Mean item complexity =  1
Test of the hypothesis that 3 factors are sufficient.

df null model =  3  with the objective function =  1.07 with Chi Square =  11525.86
df of  the model are -3  and the objective function was  0

The root mean square of the residuals (RMSR) is  0
The df corrected root mean square of the residuals is  NA

The harmonic n.obs is  10808 with the empirical chi square  0  with prob <  NA
The total n.obs was  10808  with Likelihood Chi Square =  0  with prob <  NA

Tucker Lewis Index of factoring reliability =  1
Fit based upon off diagonal values = 1
Measures of factor score adequacy
												   MR1   MR2 MR3
Correlation of (regression) scores with factors   0.92  0.17   0
Multiple R square of scores with factors          0.85  0.03   0
Minimum correlation of possible factor scores     0.70 -0.95  -1

Third, interpret the results:

SS loadings: Since only one factor has an eigenvalue (i.e., explained variance) greater than 1, which accounts for 99% of the total explained variance, it is optimal to consider just one factor in this case.

At last, we can use derived loadings to do data transformation:

r = fa(sub_df, nfactors = 1, rotate = "none", scores=TRUE)
r$scores

A matrix: 10808 × 1 of type dbl

MR1
0.6934161
0.6934161
-1.2634223
0.6934161
0.6934161
...

Method | Exploratory Factor Analysis

Table of contents

Brief introduction

Exploratory factor analysis

A simple example

Reference