Factor analysis (FA) and principal component analysis (PCA) are often confused. Both reduce a large set of observed variables to a smaller number of dimensions, but they serve different purposes.
Principal Component Analysis (PCA)
Principal Component Analysis is a technique that transforms the original variables into a new set of uncorrelated variables called principal components. These principal components are linear combinations of the original variables, ordered so that the first principal component explains the maximum possible variance in the data, the second explains the largest share of the remaining variance while being uncorrelated with the first, and so on. The main goals of PCA are:
- Variance Explanation: PCA aims to explain as much of the total variance in the dataset as possible. Each component is the direction of greatest variance not already captured by the components before it.
- Dimensionality Reduction: By selecting a subset of the principal components, PCA reduces the dimensionality of the data while retaining most of the variability present in the original variables.
- Orthogonality: Principal components are orthogonal to each other, ensuring that they capture distinct aspects of the data’s variance.
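The three goals above can be illustrated with a minimal sketch, assuming NumPy is available and computing PCA through an eigendecomposition of the sample covariance matrix (one common route; an SVD of the centered data is another). The function name `pca` and the toy data are hypothetical and exist only for illustration.

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the sample covariance matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # center each variable
    cov = np.cov(Xc, rowvar=False)          # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]  # columns are orthonormal directions
    scores = Xc @ components                # project the data onto them
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return scores, components, explained_ratio

# Hypothetical toy data: three variables, two of them strongly correlated.
rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))
X = np.hstack([
    z + 0.1 * rng.normal(size=(500, 1)),
    z + 0.1 * rng.normal(size=(500, 1)),
    rng.normal(size=(500, 1)),
])

scores, components, explained_ratio = pca(X, n_components=2)
print(explained_ratio)                      # share of total variance per component
print(np.cov(scores, rowvar=False))         # off-diagonal entries are ~0
```

Keeping two of three components is the dimensionality reduction step, `explained_ratio` reports how much of the total variance each retained component accounts for, and the near-zero off-diagonal entries of the score covariance reflect the orthogonality of the components.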
Factor Analysis (FA)
Factor Analysis is a statistical method used to identify latent variables, or factors, that explain the observed correlations among the original variables. These latent factors are not directly observed but are inferred from the patterns of covariance among the observed variables. The primary objectives of FA are:
- Covariance Explanation: FA focuses on explaining the covariance among the original variables. It seeks to uncover underlying factors that account for the variance the variables share, and it assigns the variance specific to each variable to a separate error term.
- Latent Variables: The goal is to identify a smaller number of unobserved factors that can describe the relationships among the observed variables. These factors are assumed to be the source of the observed correlations.
- Model-Based Approach: FA is based on a specific model where the observed variables are expressed as linear combinations of the factors plus unique error terms.
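The model-based view can be written out as follows. The notation is an assumption introduced here for illustration: $x$ is the vector of $p$ observed variables, $\mu$ its mean, $f$ the vector of $k < p$ latent factors, $\Lambda$ the $p \times k$ matrix of factor loadings, and $\varepsilon$ the vector of unique error terms. The sketch also assumes the common orthogonal-factor convention, in which the factors are standardized and uncorrelated with each other and with the errors.

```latex
\begin{aligned}
x &= \mu + \Lambda f + \varepsilon, \\
\operatorname{Cov}(f) &= I_k, \qquad
\operatorname{Cov}(\varepsilon) = \Psi = \operatorname{diag}(\psi_1, \dots, \psi_p), \qquad
\operatorname{Cov}(f, \varepsilon) = 0, \\
\Sigma = \operatorname{Cov}(x) &= \Lambda \Lambda^{\top} + \Psi .
\end{aligned}
```

Because $\Psi$ is diagonal, every off-diagonal entry of $\Sigma$ (every covariance between two different observed variables) comes from $\Lambda \Lambda^{\top}$ alone. This is the sense in which the factors explain the covariance among the variables, while the unique terms contribute only to each variable's own variance.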
Key Differences
- Purpose: PCA aims to reduce dimensionality by explaining the total variance in the data, while FA seeks to uncover latent factors that explain the covariance among variables.
- Components vs. Factors: The two methods run in opposite directions. In PCA, the components are computed from the observed variables as linear combinations of them. In FA, the observed variables are modeled as linear combinations of unobserved factors plus unique error terms, and the factors are inferred from the data.
- Variance vs. Covariance: PCA works with the total variance of each variable, including the part that is unique to that variable. FA separates the variance shared among variables from the unique variance and models only the shared part with the factors.
- Orthogonality: Principal components in PCA are always orthogonal, so the component scores are uncorrelated. Factors in FA need not be: they are often assumed uncorrelated in the initial solution, but oblique rotations allow them to correlate when that better reflects the underlying relationships among variables.
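The variance versus covariance distinction can be seen numerically. The sketch below is an illustration under stated assumptions, not a reference implementation: it builds a hypothetical population correlation matrix from a one-factor model with made-up loadings, then compares the first principal component with a single factor estimated by iterated principal axis factoring (one of several FA estimation methods, chosen here because it needs only NumPy). All function names are hypothetical.

```python
import numpy as np

def first_pc_loadings(R):
    """Loadings of the first principal component of a correlation matrix."""
    eigvals, eigvecs = np.linalg.eigh(R)          # ascending eigenvalues
    v = eigvecs[:, -1] * np.sqrt(eigvals[-1])
    return v if v.sum() >= 0 else -v              # fix the arbitrary sign

def principal_axis_one_factor(R, tol=1e-10, max_iter=1000):
    """One-factor loadings by iterated principal axis factoring."""
    # Start from squared multiple correlations as communality estimates.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(max_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)             # keep only shared variance
        eigvals, eigvecs = np.linalg.eigh(reduced)
        loadings = eigvecs[:, -1] * np.sqrt(max(eigvals[-1], 0.0))
        new_h2 = loadings ** 2
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings if loadings.sum() >= 0 else -loadings

def max_offdiag_residual(R, loadings):
    """Largest absolute error in the reproduced correlations."""
    resid = R - np.outer(loadings, loadings)
    np.fill_diagonal(resid, 0.0)
    return np.max(np.abs(resid))

# Hypothetical one-factor population: assumed loadings, unit variances.
true_loadings = np.array([0.9, 0.8, 0.7, 0.6])
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)

pca_loadings = first_pc_loadings(R)
fa_loadings = principal_axis_one_factor(R)

print("PCA loadings:", np.round(pca_loadings, 3))
print("FA loadings: ", np.round(fa_loadings, 3))
print("PCA off-diagonal residual:", max_offdiag_residual(R, pca_loadings))
print("FA off-diagonal residual: ", max_offdiag_residual(R, fa_loadings))
```

In this constructed example the single factor should reproduce the correlations between variables almost exactly, while the first principal component accounts for more variance overall but reproduces those correlations less well, because it also absorbs part of each variable's unique variance.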
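In summary, both PCA and FA reduce the dimensionality of data, but they rest on different conceptual frameworks: PCA is a transformation that summarizes the total variance of the observed variables, whereas FA is a model in which latent factors account for the covariance among them.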