Posts

Overview of PCA

Image from Pixabay

What is PCA?

Working with high-dimensional data is always a challenging task. In this modern technological era, we can capture data on more aspects (variables) than ever before; a single instance may be described by thousands of variables. Analyzing all of these variables and estimating each variable's effect (coefficient) on the target variable requires a great deal of computing power and electricity. Even then, we cannot be sure that the resulting model is the best one. Principal Component Analysis addresses this very issue and offers a solution: Principal Components. This blog discusses the terminology and implementation of Principal Component Analysis.

Overview of PCA: Principal Component Analysis is an unsupervised machine learning technique used to reduce the dimensionality of the data while preserving as much of its statistical information as possible. Principal Component Analysis tries to find new variables, i.e. Principal Components, which are linear combinations of existin...
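As a rough illustration of the idea that Principal Components are linear combinations of the original variables, here is a minimal NumPy sketch. It is not taken from the post itself: the function name `pca`, the synthetic data, and the choice of an eigendecomposition of the covariance matrix are all assumptions made for this example.

```python
import numpy as np

def pca(X, n_components):
    """Project X (rows = instances, columns = variables) onto its
    top principal components. Hypothetical helper for illustration."""
    # Center each variable so the covariance is computed around the mean.
    X_centered = X - X.mean(axis=0)
    # Covariance matrix of the variables (columns).
    cov = np.cov(X_centered, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back ascending.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the directions with the largest variance.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    explained_ratio = eigvals[order] / eigvals.sum()
    # Each new variable is a linear combination of the original ones.
    scores = X_centered @ components
    return scores, components, explained_ratio

# Synthetic data (an assumption): 200 instances, 5 variables,
# with the second variable strongly correlated with the first.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = 2 * X[:, 0] + 0.1 * rng.normal(size=200)

scores, components, explained_ratio = pca(X, n_components=2)
print(scores.shape)
print(explained_ratio)
```

Each column of `components` holds the weights of one linear combination, and `explained_ratio` gives the share of total variance that each retained component carries.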
Recent posts

Hypothesis Testing - Standard Error, Level of Significance, p-value and Critical Values.

Image by Gerd Altmann from Pixabay

"In Statistics we study the chances or probabilities of the occurrence of an event or phenomenon. In the world of statistical analysis, presenting results with 100% certainty is impossible."

Introduction

In general, gathering information about the characteristics of a whole population is not practical. It demands a great deal of money, time, and labor, among other constraints. So we take a sample from the population, study its characteristics, and try to draw conclusions about, or estimate, the population characteristics from it. For example, a doctor records the blood pressure (BP) readings of 100 patients suffering from hypertension and computes the average (sample average) of those readings. The doctor uses this sample average to draw a conclusion about the average blood pressure (population average) of all patients suffering from hypertension.

Parameter and Statistic

The Statistical characte...

Normal Distribution - Properties, Z Scores, Area Under the Curve and Central Limit Theorem.

Photo by Alex Knight on Unsplash

Introduction

The Normal Distribution was first discovered by Abraham de Moivre in 1733. Due to a historical error it was credited to Carl Friedrich Gauss, who first referred to the Normal Distribution in 1809 as the distribution of errors in astronomy. Since then the distribution has been widely used in various fields and remains under continuous development.

The Normal Distribution is the most important continuous probability distribution and plays a very important role in many statistical analyses. It fits many naturally occurring variables such as population age, height, weight, blood pressure, IQ scores, etc. All of these follow the Normal Distribution when we have sufficiently large samples.

Definition

A continuous variable "X" is said to follow the Normal Distribution with parame...