MATHEMATICS BEHIND LINEAR REGRESSION-

Prakhar Saxena
7 min read · Jan 14, 2019

Supervised machine learning can be broadly classified as -

  1. Regression
  2. Classification

Regression - the model predicts continuous-valued outputs.

Classification - the model predicts a discrete value (class) for every input.

As can be seen above-

In the first figure we are predicting housing prices, that is, we predict the price of a house based on certain features.

In the second figure we are determining whether a tumor is malignant (cancerous) or benign (a normal tumor).

Well here are a few examples-

Tip - To determine whether a problem is a classification problem or not, ask yourself what form the answer to the question can take.

If the answer can be in the form of 0 or 1, yes or no, or true or false, it is a classification problem.

If the answer is a real value, like the price of a house, it is a regression problem.

LINEAR REGRESSION

Let us consider the housing price prediction model. Suppose we want to predict the price of a house whose size is 1250 square feet; as can be seen from the graph, it will be predicted as roughly 220k.
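To make this concrete, here is a minimal sketch of what the fitted model does at prediction time. The intercept and slope below are hypothetical values, chosen only so the numbers match the example above; in practice they come from training.

```python
# Hypothetical straight-line model: price (in $1000s) as a function of size.
# intercept and slope are made-up values, picked so that 1250 sq ft -> ~220k.
def predict_price(size_sqft, intercept=45.0, slope=0.14):
    return intercept + slope * size_sqft

print(predict_price(1250))  # -> 220.0, i.e. about 220k, as read off the graph
```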

The above pic shows the notation used in the equations below: m is the number of training examples, x is the input variable (feature), y is the output variable (target), and (x^(i), y^(i)) denotes the i-th training example.

HOW DOES THE ALGORITHM ACTUALLY WORK-

Flow of working of the algorithm.

The figure is self-explanatory: the training set is fed to the learning algorithm, which produces a hypothesis function h; given a new input (the size of a house), h outputs the predicted price.

COST FUNCTION-

Htheta(x), as defined earlier, is the hypothesis function. Theta0 and Theta1 are the parameters that define the equation of the hypothesis; as can be seen above, a completely different graph is obtained for each choice of these values.

The hypothesis function is the function whose predictions we want to be as close as possible to the actual values, that is, y.
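For a single feature, the hypothesis discussed above is the standard straight-line model, written out here:

h_\theta(x) = \theta_0 + \theta_1 x

\theta_0 is the intercept and \theta_1 is the slope of the line.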

The basic idea is written in the pic above.

The cost function is the function J(Theta0, Theta1). We have to minimize this function, that is, find the values of Theta0 and Theta1 for which the sum of squared errors is minimum.
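Written out in full, the squared-error cost function being referred to is:

J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2

and the goal is to find \min_{\theta_0, \theta_1} J(\theta_0, \theta_1). The factor of 1/2 is only a convenience that makes the derivative cleaner.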

CASE 1 - THETA0 = 0.

The above pic shows a particular example.

Further explanation of hypothesis function and cost function-

Consider a particular case of the cost function where Theta0 = 0.

Consider a second case -

When Theta1 = 1:

Working -

Similarly, computing J for different values of Theta1 (with Theta0 = 0), we can plot the graph of the cost function as obtained above.

As can be seen in the graph, Theta1 = 1 and Theta0 = 0 give the minimum value of J, which is represented by the light blue straight line in the left graph and the cross in the right one.
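A tiny numerical sketch makes this concrete. The dataset below is hypothetical (three points lying exactly on the line y = x), chosen so that Theta1 = 1 is the obvious best fit:

```python
# Hypothetical training set lying exactly on y = x, so theta1 = 1 fits perfectly.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]
m = len(xs)

def cost(theta1):
    """J(theta1) = (1/2m) * sum of squared errors of h(x) = theta1 * x (theta0 = 0)."""
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

for theta1 in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(theta1, round(cost(theta1), 3))
# J is 0 at theta1 = 1 and grows on either side, giving the bowl-shaped
# curve of J against theta1 described above.
```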

SUMMARY-

CASE 2 - THETA0 NOT EQUAL TO 0

When Theta0 is not equal to zero, the cost function for the above problem is similar, but J is now a surface over both Theta0 and Theta1, as shown in the 3D figure.

This 3D figure can also be represented as contour plots-

The left figure is self-explanatory.

The ellipses in the right figure represent level curves of the cost function: every point on the same ellipse has the same value of J. The three crosses marked above therefore have the same value of J.

EXPLANATION OF THE PLOT GIVEN ABOVE-

Consider an example marked in red above.

The values of Theta0 and Theta1 are marked. The hypothesis is not a very good one, as can be seen from how far the marked point is from the centre of the contours (where J is minimum).
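For the curious, here is a rough sketch of how such a contour plot can be produced: evaluate J on a grid of (Theta0, Theta1) values and draw its level curves. The toy dataset below is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical toy data for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.5, 2.5, 3.5, 4.5])
m = len(x)

theta0_vals = np.linspace(-2.0, 3.0, 100)
theta1_vals = np.linspace(-1.0, 3.0, 100)
T0, T1 = np.meshgrid(theta0_vals, theta1_vals)

# Cost J(theta0, theta1) at every grid point.
J = np.zeros_like(T0)
for xi, yi in zip(x, y):
    J += (T0 + T1 * xi - yi) ** 2
J /= 2 * m

plt.contour(T0, T1, J, levels=30)  # each ellipse is a set of points with equal J
plt.xlabel("theta0")
plt.ylabel("theta1")
plt.show()
```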

GRADIENT DESCENT-

What do we aim to achieve?

How do we achieve the above goal?

Suppose this is the cost function for the values of Theta0 and Theta1.

As we run the gradient descent algorithm, we move downhill from the starting point to reach the minimum value -

Each step is marked by a cross, which we reach from the point just before it, starting from the initial point.

There can be several cases for this depending on the starting point we choose-

Are there many local minima in the graph?

Yes, and so gradient descent can end up at different values of J depending on the starting point.

MATHEMATICS OF GRADIENT DESCENT ALGORITHM-

The gradient descent algorithm is written above. In each step we perform an assignment (update) of the parameters, and a new term, alpha, is introduced here, which is the learning rate.

Here j = 0 and j = 1 means that we only have to compute the values of Theta0 and Theta1, and the two updates are performed simultaneously.
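For reference, the update rule referred to above is the standard one:

repeat until convergence {
\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad \text{(simultaneously for } j = 0 \text{ and } j = 1\text{)}
}

where := denotes assignment and \alpha is the learning rate.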

CASE WHEN THETA0=0.

Suppose we start from a random point marked as Theta1 in the figure.

The corresponding equation is written above.

The behaviour of gradient descent for different values of alpha is clearly shown in the figures: if alpha is too small, gradient descent takes tiny steps and is slow to converge; if alpha is too large, it can overshoot the minimum, fail to converge, or even diverge.
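The following sketch shows the same effect on a one-parameter cost J(theta) = (theta - 1)^2; the cost and the alpha values are hypothetical, chosen purely for illustration.

```python
# Gradient descent on J(theta) = (theta - 1)^2, whose minimum is at theta = 1.
def gradient_descent_1d(alpha, theta=5.0, steps=20):
    for _ in range(steps):
        grad = 2.0 * (theta - 1.0)   # dJ/dtheta
        theta = theta - alpha * grad
    return theta

print(gradient_descent_1d(alpha=0.01))  # too small: after 20 steps still far from 1
print(gradient_descent_1d(alpha=0.10))  # reasonable: ends up close to 1
print(gradient_descent_1d(alpha=1.10))  # too large: overshoots and diverges
```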

Now answer the following question-

GRADIENT DESCENT ALGORITHM FOR LINEAR REGRESSION-

By using differential calculus in the gradient descent algorithm (taking the partial derivatives of the cost function J), we can get the following results -

You can verify these results by working out the derivatives yourself.

So, finally, the results are -
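These are the standard update rules obtained from the differentiation:

repeat until convergence {
\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)
\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}
}

with both parameters updated simultaneously.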

Now, as we saw above, the gradient descent algorithm can lead to different local optima. So what should be done to obtain the global minimum?

CASE 1-

CASE 2-

There can be several such cases.

But in the case of linear regression the cost function is a bowl-shaped (convex) function -

So there will be only one minimum value.

When we run our algorithm, we move, as shown by the red cross marks, towards the centre of the contours, which is the minimum value.

The type of gradient descent we used is batch gradient descent - each step of the algorithm uses all m training examples.
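Putting the pieces together, here is a minimal, self-contained sketch of batch gradient descent for single-feature linear regression. The dataset, learning rate and iteration count are hypothetical choices for illustration.

```python
def batch_gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    """Fit h(x) = theta0 + theta1 * x by batch gradient descent."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # "Batch": the errors h(x) - y are computed over the whole training set.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update of both parameters.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.5, 2.5, 3.5, 4.5]              # the points lie exactly on y = 0.5 + 1.0 * x
print(batch_gradient_descent(xs, ys))  # -> approximately (0.5, 1.0)
```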

Test your understanding-

NOTATIONS FOR MULTIVARIATE DATA(DATA WITH MULTIPLE FEATURES)-

Notations used in the equations are shown above: n is the number of features, x^(i) is the vector of features of the i-th training example, and x_j^(i) is the value of feature j in the i-th training example.

Hypothesis function in case of multiple features is given below-

The equations are-
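Written out in standard notation, those equations are:

h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n

and, with the convention x_0 = 1, the compact vector form h_\theta(x) = \theta^T x.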

Instead of handling the parameters one by one, we collect them into a single vector (matrix), and we work out gradient descent in the same way.

GRADIENT DESCENT FORMULA FOR MULTIPLE FEATURES-

By using differential calculus we can easily get the equations on the right side.
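Those equations are the natural generalisation of the single-feature updates:

repeat until convergence {
\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \qquad \text{(simultaneously for } j = 0, 1, \dots, n\text{)}
}

where x_0^{(i)} = 1, so the j = 0 case reduces to the Theta0 update seen earlier.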

Thus, this was the entire mathematics behind linear regression.
