MLA Code
Let’s get to the code. We have two choices: we can either import the linear regression model from the scikit-learn library and use it directly, or we can write our own regression model based on the equations above. Instead of choosing between the two, let’s do both :)
There are many datasets available online for linear regression. I used the one from this link. Let’s visualise the training and testing data.
We use the pandas library to read the train and test files. We retrieve the independent (x) and dependent (y) variables, and since we have only one feature (x), we reshape them into column vectors so that we can feed them into our linear regression model.
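A minimal sketch of this loading step, assuming the files have columns named `x` and `y` (an assumption; the dataset itself is not reproduced here). To keep it self-contained, two tiny hypothetical CSVs are read from in-memory strings with `io.StringIO` instead of the real train and test files, and `dropna()` is a defensive extra step that may not be part of the original code.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the train and test CSV files; column names "x" and "y" are assumed.
train_csv = io.StringIO("x,y\n1,3.1\n2,4.9\n3,7.2\n4,8.8\n")
test_csv = io.StringIO("x,y\n5,11.1\n6,12.9\n")

train = pd.read_csv(train_csv).dropna()  # drop incomplete rows, if any
test = pd.read_csv(test_csv).dropna()

# .to_numpy() gives 1-D arrays; reshape(-1, 1) turns each into an (n_samples, 1) column vector,
# which is the 2-D shape scikit-learn expects for a single feature.
x_train = train["x"].to_numpy().reshape(-1, 1)
y_train = train["y"].to_numpy().reshape(-1, 1)
x_test = test["x"].to_numpy().reshape(-1, 1)
y_test = test["y"].to_numpy().reshape(-1, 1)

print(x_train.shape, y_train.shape)  # (4, 1) (4, 1)
```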
We import the linear regression model from scikit-learn, fit it on the training data and predict the values for the testing data. We use the R2 score (coefficient of determination) to measure how well the model fits the data.
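A sketch of the scikit-learn route. Synthetic data stands in for the real files: a hypothetical 700/300 split where y is roughly x plus Gaussian noise. The sizes and noise level are assumptions made for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Hypothetical synthetic data in place of the real train/test files.
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 100, size=(700, 1))
y_train = x_train + rng.normal(0, 3, size=(700, 1))
x_test = rng.uniform(0, 100, size=(300, 1))
y_test = x_test + rng.normal(0, 3, size=(300, 1))

model = LinearRegression()
model.fit(x_train, y_train)     # least-squares fit of intercept and slope
y_pred = model.predict(x_test)  # predictions for the test inputs

score = r2_score(y_test, y_pred)
print(f"R2 score: {score:.4f}")
```

`LinearRegression` solves the least-squares problem directly, so there is no learning rate or number of epochs to tune, unlike the hand-written version.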
Now, let’s build our own linear regression model from the equations above. We will use only the numpy library for the computations and the R2 score as the metric.
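The R2 score itself is easy to compute with numpy alone. A minimal sketch of the standard definition, R2 = 1 - SS_res / SS_tot; the function name `r2` is hypothetical.

```python
import numpy as np


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot
```

A perfect prediction gives 1.0, and always predicting the mean of y gives 0.0: for example `r2([1, 2, 3], [2, 2, 2])` evaluates to 0.0.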
We initialise a_0 and a_1 to 0.0. For 1000 epochs, we calculate the cost and its gradients with respect to a_0 and a_1, and we use the gradients to update the values of a_0 and a_1. After 1000 epochs, a_0 and a_1 should be close to the values that minimise the cost, and hence we can formulate the best fit straight line.
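A minimal sketch of this loop, assuming the cost is the mean squared error, a learning rate of 0.0001, and synthetic data in place of the real training set (all assumptions). Unlike the 700x1 arrays mentioned in the next step, `a_0` and `a_1` here are plain scalars, which NumPy broadcasts over every sample.

```python
import numpy as np

# Hypothetical synthetic data standing in for the 700-sample training set.
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 100, size=(700, 1))
y_train = x_train + rng.normal(0, 3, size=(700, 1))

alpha = 0.0001  # learning rate (assumed value)
epochs = 1000

a_0, a_1 = 0.0, 0.0  # intercept and slope, both initialised to 0.0
for _ in range(epochs):
    y_pred = a_0 + a_1 * x_train  # current straight-line predictions
    error = y_pred - y_train
    cost = np.mean(error ** 2)    # mean squared error for this epoch
    # Partial derivatives of the mean squared error with respect to a_0 and a_1.
    grad_a0 = 2 * np.mean(error)
    grad_a1 = 2 * np.mean(error * x_train)
    # Move both parameters a small step against their gradients.
    a_0 -= alpha * grad_a0
    a_1 -= alpha * grad_a1

print(f"a_0 = {a_0:.3f}, a_1 = {a_1:.3f}, cost of last epoch = {cost:.3f}")
```

The learning rate has to be small because x spans roughly 0 to 100: with much larger steps, the updates to a_1 overshoot and the cost diverges instead of shrinking.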
Our a_0 and a_1 are stored as 700x1 arrays, one entry per training sample, but the test set contains 300 samples, so we have to reshape a_0 and a_1 from 700x1 to 300x1. Now we can use the equation of the line to predict the values in the test set and obtain the R2 score.
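A small illustration of why the reshape is needed, using hypothetical fitted values and a hypothetical 300-sample test set. Slicing the first 300 rows works because every entry of the 700x1 arrays holds the same value; keeping the parameters as scalars avoids the reshape altogether.

```python
import numpy as np

# Hypothetical fitted parameters stored as 700x1 arrays (one copy per training sample).
a_0 = np.full((700, 1), 0.5)
a_1 = np.full((700, 1), 2.0)

x_test = np.linspace(0, 10, 300).reshape(-1, 1)  # hypothetical 300-sample test set

# a_0 + a_1 * x_test would raise an error: shapes (700, 1) and (300, 1) do not broadcast.
a_0_test = a_0[:300]  # all entries are identical, so the first 300 rows suffice
a_1_test = a_1[:300]
y_pred = a_0_test + a_1_test * x_test

# Equivalent without reshaping: take the scalar values and let NumPy broadcast them.
y_pred_scalar = a_0[0, 0] + a_1[0, 0] * x_test
print(y_pred.shape, np.allclose(y_pred, y_pred_scalar))  # (300, 1) True
```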
We obtain the same R2 score as with the previous method. We also plot the regression line along with the test data points to get a better visual understanding of how well our algorithm works.
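A minimal plotting sketch with matplotlib, assuming hypothetical test data and fitted values a_0 = 0.0 and a_1 = 1.0 (illustrative numbers, not results from this article). The non-interactive Agg backend and the output file name are choices made here so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to a file, no window needed
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical test data and fitted parameters.
rng = np.random.default_rng(1)
x_test = rng.uniform(0, 100, size=(300, 1))
y_test = x_test + rng.normal(0, 3, size=(300, 1))
a_0, a_1 = 0.0, 1.0

fig, ax = plt.subplots()
ax.scatter(x_test.ravel(), y_test.ravel(), s=10, label="test data")
# Sort x so the fitted line is drawn cleanly from left to right.
x_line = np.sort(x_test.ravel())
ax.plot(x_line, a_0 + a_1 * x_line, color="red", label="regression line")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("regression_line.png")  # in a notebook, plt.show() with an interactive backend also works
```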