INTRO TO DIMENSIONALITY REDUCTION AND LDA:

Prakhar Saxena
2 min read · Feb 3, 2019


What is LDA and what is it used for?
LDA (Linear Discriminant Analysis) is a way to reduce ‘dimensionality’ while at the same time preserving as much of the class-discrimination information as possible.
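
If you just want to use it, scikit-learn ships an implementation; here’s a minimal sketch (the toy data below is made up purely for illustration):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy 2-D data: two classes with different means (illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1, (50, 2)),
               rng.normal([4, 4], 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# With 2 classes, LDA can keep at most 2 - 1 = 1 component.
lda = LinearDiscriminantAnalysis(n_components=1)
X_1d = lda.fit_transform(X, y)
print(X_1d.shape)  # (100, 1): 2-D points squeezed onto a single line
```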

How does it work?
Basically, LDA helps you find the ‘boundaries’ around clusters of classes. It projects your data points onto a line so that the clusters are as far apart from each other as possible, while the points within each cluster stay close to their own centroid.

The distance of the points from the decision boundary helps in dimensionality reduction.
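
For the two-class case, the direction of that line even has a closed form: w is proportional to the inverse of the within-class scatter times the difference of the class means. A minimal NumPy sketch, assuming two plain arrays of points (the function name fisher_direction is mine, not from any library):

```python
import numpy as np

def fisher_direction(X0, X1):
    # Class means of the two groups.
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: how spread out each class is around its mean.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Best separating direction: w proportional to Sw^{-1} (m1 - m0).
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)

# Projecting any point x onto the line is then just x @ w (a scalar).
```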

What was that stuff about dimensionality?
Let’s say you have a group of data points in 2 dimensions, and you want to sort them into 2 groups. LDA reduces the dimensionality of your set like so: with K groups, the projection keeps at most K − 1 dimensions. K (groups) = 2, so 2 − 1 = 1.

Why? Because “the K centroids lie in an at most K-1 dimensional affine subspace”. What is an affine subspace? It’s a geometric concept or *structure* that says “I am going to generalize the affine properties of Euclidean space”. What are those affine properties of Euclidean space? Basically, it’s the fact that we can represent a point with 3 coordinates in a 3-dimensional space (with a nod toward the fact that there may be more than 3 dimensions that we are ultimately dealing with).

So we should be able to represent a point with 2 coordinates in a 2-dimensional space, and a point with 1 coordinate in a 1-dimensional space. LDA reduced our 2-dimensional problem down to one dimension. So now we can get down to the serious business of listening to the data. We now have 2 groups, and 2 points in any dimension can be joined by a line. How many dimensions does a line have? 1! Now we are cooking with Crisco!
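
Don’t take the K − 1 claim on faith; you can sanity-check it numerically. Subtract one centroid from the others, and the leftover direction vectors span at most K − 1 dimensions (the centroids below are made up):

```python
import numpy as np

centroids = np.array([[0.0, 0.0, 0.0],
                      [1.0, 2.0, 3.0],
                      [4.0, 5.0, 6.0]])   # K = 3 centroids in 3-D space

# Directions inside the affine subspace spanned by the centroids.
diffs = centroids[1:] - centroids[0]

print(np.linalg.matrix_rank(diffs))      # at most K - 1 = 2
```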

So we get a bunch of these data points, each represented by its 2D coordinates (x, y). We are going to use LDA to assign each point to either group 1 or group 2.

What it’s actually doing (sketched in code after this list):
1. Calculates the mean vector of each class in all dimensions.
2. Calculates the scatter between the class means, across the whole group (to determine separability).
3. Calculates the scatter within each class (to determine ‘sameness’), and uses it to normalize the between-class scatter.
4. ‘Magical’ grouping around the K centroids: project onto the directions that maximize that ratio.
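
Here is a rough NumPy sketch of those four steps, assuming numeric features X and integer labels y (variable names are mine, and this is a bare-bones illustration, not production code):

```python
import numpy as np

def lda_fit_transform(X, y):
    classes = np.unique(y)
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)

    Sw = np.zeros((n_features, n_features))  # within-class scatter ("sameness")
    Sb = np.zeros((n_features, n_features))  # between-class scatter (separability)
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)                  # step 1: class mean vector
        Sw += (Xc - mc).T @ (Xc - mc)         # step 3: scatter within the class
        diff = (mc - overall_mean).reshape(-1, 1)
        Sb += len(Xc) * (diff @ diff.T)       # step 2: scatter between class means

    # Steps 2-3 combined: maximize between-class scatter relative to
    # within-class scatter by eigendecomposing Sw^{-1} Sb.
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]

    # Step 4: keep at most K - 1 directions and project the data onto them.
    W = eigvecs[:, order[:len(classes) - 1]].real
    return X @ W
```

The eigendecomposition of Sw⁻¹Sb is what makes step 4 less magical: its top eigenvectors are exactly the directions that separate the K centroids best.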
