Image from Pixabay
Working with high-dimensional data has always been a challenging task. In this modern technological era, we can capture data in more aspects (variables) than ever before; a single instance may carry thousands of variables. Analyzing all these variables and estimating each variable's effect (coefficient) on the target variable requires enormous computational power and energy, and even then we cannot be sure the resulting model is the best one. Principal Component Analysis addresses this very issue and offers a solution: Principal Components. This blog discusses the terminology and implementation of Principal Component Analysis.
Overview of PCA:
Principal Component Analysis is an Unsupervised Machine Learning technique used to reduce the dimensionality of data while preserving as much of its statistical information as possible. It finds new variables, the Principal Components, which are linear combinations of the existing variables in the dataset, constructed with minimal loss of information: they successively maximize the variance and are uncorrelated with each other. There are as many Principal Components as there are variables in the data, but because each successive component captures the maximum remaining variance, we can keep just the first few components that explain most of the variability; adding further components contributes little.
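To make this concrete, here is a minimal sketch using scikit-learn (an assumption; the post does not name a library). It standardizes the 4-variable Iris dataset, keeps the first two Principal Components, and reports how much variance each one explains:

```python
# A minimal PCA sketch with scikit-learn: project the 4-variable Iris
# data onto its first two Principal Components.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Standardize first: PCA maximizes variance, so an unscaled variable
# with a large range would otherwise dominate the components.
X_scaled = StandardScaler().fit_transform(X)

# Keep the first two components (out of four possible).
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape)                # (150, 2)
print(pca.explained_variance_ratio_)  # roughly [0.73, 0.23] -> ~96% of variance retained
print(pca.components_)                # each row is a linear combination of the 4 variables
```

Here the first two components already explain about 96% of the total variability, which is exactly the "further addition doesn't add much" behavior described above.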
Curse of Dimensionality:
Adding more variables to a Machine Learning model can improve its performance, but too much of anything is bad. If we keep adding variables until they eventually outnumber our instances, many algorithms struggle to build efficient models. This is termed the Curse of Dimensionality.
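As a hedged illustration (synthetic data, not from the original post), the sketch below fixes the training set at 50 instances and keeps adding pure-noise variables. Once the variables outnumber the instances, a plain logistic regression fits the training set almost perfectly while its test accuracy degrades:

```python
# Demonstrating the Curse of Dimensionality: fixed instances, growing variables.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_train, n_test = 50, 500

for n_features in (5, 50, 500):
    # Only the first variable carries signal; the rest are noise.
    X = rng.normal(size=(n_train + n_test, n_features))
    y = (X[:, 0] > 0).astype(int)

    model = LogisticRegression(max_iter=1000)
    model.fit(X[:n_train], y[:n_train])

    train_acc = model.score(X[:n_train], y[:n_train])
    test_acc = model.score(X[n_train:], y[n_train:])
    print(f"{n_features:>4} variables: train={train_acc:.2f}, test={test_acc:.2f}")
```

The widening gap between training and test accuracy at 500 variables is the overfitting that PCA helps avoid by compressing the data into a handful of informative components first.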