2023-05-19 02:41 · Data science/Machine Learning
Principal Components Analysis (PCA) is a popular technique for analyzing high-dimensional data. PCA is a mathematical procedure that transforms a set of correlated variables into a new set of uncorrelated variables, called "principal components", while retaining most of the variability of the original data.
To perform PCA on a dataset of N p-dimensional data points x_j ∈ ℝᵖ, we follow these steps.
1. Standardize the data: To put all variables on a similar scale, subtract the mean and divide by the standard deviation of each variable.
2. Compute the covariance matrix: The covariance matrix summarizes the relationships between the variables in the data. It can be computed as cov(X) = 1/(n−1) · XᵀX, where X is the standardized data matrix.
3. Compute the eigenvectors and eigenvalues (eigensystem): These represent the directions and magnitudes of maximum variability in the data (eigenvalues 𝛌_i and eigenvectors v_i of the covariance matrix).
4. Sort the eigenvectors in descending order of their corresponding eigenvalues: The eigenvectors with the highest eigenvalues represent the directions of maximum variability in the data. These directions are called "Principal components".
5. Choose the number of principal components: Use a scree plot to determine how many components to keep. (Scree plot: a plot of the eigenvalues against the component number; keep the components before the point where the curve levels off.)
6. Compute the principal components: Multiply the standardized data matrix by the eigenvectors of the covariance matrix.
7. Calculate the percentage of the total sample variance explained by the i-th principal component: ( 𝛌_i / ∑_j 𝛌_j ) × 100 %. The j-th element of the i-th eigenvector v_i is called the "loading" of the j-th variable onto the i-th principal component.
(*Loading: the relationship between the original variables and a principal component; PC_i = v_i1 · x_1 + ... + v_ip · x_p )
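The steps above can be sketched in NumPy as follows; the toy dataset and variable names here are purely illustrative, not part of the original post:

```python
import numpy as np

# Hypothetical toy data: 100 observations of 3 correlated variables.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(100, 3))
X_raw[:, 2] = X_raw[:, 0] + 0.1 * rng.normal(size=100)  # induce correlation

# Step 1: standardize each variable (zero mean, unit variance).
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0, ddof=1)

# Step 2: covariance matrix, cov(X) = X^T X / (n - 1).
n = X.shape[0]
C = X.T @ X / (n - 1)

# Step 3: eigenvalues and eigenvectors (eigh, since C is symmetric).
eigvals, eigvecs = np.linalg.eigh(C)

# Step 4: sort in descending order of eigenvalue.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Step 6: principal component scores = standardized data times eigenvectors.
scores = X @ eigvecs

# Step 7: percentage of total variance explained by each component.
explained_pct = 100 * eigvals / eigvals.sum()
```

The columns of `eigvecs` hold the loadings, and `explained_pct` is the quantity a scree plot would display (step 5).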
The result of PCA is a set of principal components, which are linear combinations of the original variables. These PCs are mutually uncorrelated.
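The uncorrelatedness of the components can be checked numerically; this is a small sketch with made-up data, where the covariance matrix of the PC scores comes out diagonal:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(200, 4))
A[:, 1] = 0.8 * A[:, 0] + 0.2 * rng.normal(size=200)  # correlated inputs

# Standardize, eigendecompose the covariance, project onto eigenvectors.
Z = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)
vals, vecs = np.linalg.eigh(Z.T @ Z / (Z.shape[0] - 1))
pcs = Z @ vecs  # principal component scores

# Covariance of the scores: diagonal entries are the eigenvalues,
# off-diagonal entries are (numerically) zero.
S = np.cov(pcs, rowvar=False)
off_diag = S - np.diag(np.diag(S))
```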
PCA can be used to reduce the dimensionality of data, making it easier to visualize and analyze. It can also be used to identify patterns and relationships (correlations) in the data, and for data compression.
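For dimensionality reduction, only the leading components are kept; one minimal sketch (toy data and the choice k = 2 are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))
X[:, 3] = X[:, 0] + 0.05 * rng.normal(size=50)  # redundant variables
X[:, 4] = X[:, 1] + 0.05 * rng.normal(size=50)

# Standardize and eigendecompose the covariance matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
vals, vecs = np.linalg.eigh(Z.T @ Z / (Z.shape[0] - 1))
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Keep the two leading components: a 50 x 2 representation of 50 x 5 data,
# suitable for a 2-D scatter plot or as compressed input to another model.
k = 2
Z_reduced = Z @ vecs[:, :k]
```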