
Overview of PCA

 

Image from Pixabay
What is PCA?

Working with high-dimensional data is challenging. Today we can capture data along more aspects (variables) than ever before, and a single instance may be described by thousands of variables. Analyzing all of these variables and estimating each one's effect (its coefficient) on the target variable demands a great deal of computing power, and even then there is no guarantee that the resulting model is the best one. Principal Component Analysis (PCA) addresses this issue by replacing the original variables with a smaller set of new ones, the Principal Components. This blog discusses the terminology and implementation of Principal Component Analysis.

Overview of PCA

Principal Component Analysis is an unsupervised machine learning technique that reduces the dimensionality of data while preserving as much of its statistical information (variance) as possible. It finds new variables, the Principal Components, which are linear combinations of the existing variables in the dataset. The components successively maximize variance and are uncorrelated with each other: the first captures the most variance, the second captures the most of what remains, and so on. There are as many Principal Components as there are variables in the data. Because of this ordering, however, we keep only the first few components that together explain enough of the variability, since adding further components contributes little.
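To make the idea concrete, here is a minimal sketch of PCA using NumPy, based on an eigendecomposition of the covariance matrix. The function name `pca`, the toy dataset, and the choice of two components are assumptions made for illustration. This is one common way to compute principal components, not the only one.

```python
import numpy as np

def pca(X, n_components):
    """Project X (instances x variables) onto its top principal components."""
    # Center each variable so the covariance is computed around the mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the variables (variables are columns).
    cov = np.cov(X_centered, rowvar=False)
    # eigh handles symmetric matrices; eigenvalues are returned in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Reorder so the component with the largest variance comes first.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Share of the total variance explained by each component.
    explained_ratio = eigvals / eigvals.sum()
    components = eigvecs[:, :n_components]
    # Each score column is a linear combination of the original variables.
    scores = X_centered @ components
    return scores, components, explained_ratio

# Toy data (an assumption for illustration): 200 instances, 5 variables,
# generated from 2 hidden factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

scores, components, explained_ratio = pca(X, n_components=2)
print("Explained variance ratio:", explained_ratio.round(3))
print("Correlation between components:")
print(np.corrcoef(scores, rowvar=False).round(3))
```

In this toy setup the first two components should account for almost all of the variance, and the correlation between the two score columns should be close to zero. That matches the two properties described above: successive variance maximization and uncorrelated components.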

Curse of Dimensionality

Adding more variables to a machine learning model can improve its performance up to a point, but too much of anything is bad. If we keep adding variables until they eventually outnumber our instances, many algorithms struggle to build efficient models. This is termed the Curse of Dimensionality.
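One way to see the problem is to fit a linear model to a target that is pure noise. The sketch below uses NumPy's least-squares solver. The sample size, the variable counts, and the random data are all assumptions chosen for illustration. Once the variables are at least as numerous as the instances, the model can reproduce the training targets almost exactly, even though there is nothing real to learn.

```python
import numpy as np

rng = np.random.default_rng(0)
n_instances = 20
# The target is pure noise, so no variable has any true effect on it.
y = rng.normal(size=n_instances)

results = {}
for n_variables in (2, 10, 20, 50):
    # Random variables that are unrelated to the target.
    X = rng.normal(size=(n_instances, n_variables))
    # Ordinary least-squares fit (minimum-norm solution when variables outnumber instances).
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Mean squared error on the same data used for fitting.
    results[n_variables] = np.mean((X @ coef - y) ** 2)
    print(f"{n_variables:>2} variables -> training MSE = {results[n_variables]:.2e}")
```

The training error falls as variables are added and becomes essentially zero once there are 20 or more variables for the 20 instances. A fit this perfect on random noise shows that the model is memorizing the data rather than finding structure.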



