# MLA Code

Let’s get to the code. We have two choices: we can either import the linear regression model from the scikit-learn library and use it directly, or we can write our own regression model based on the equations above. Instead of choosing one of the two, let’s do both :)

There are many datasets available online for linear regression. I used the one from this [link](https://www.kaggle.com/andonians/random-linear-regression/data). Let’s load the training and testing data.

```python
import pandas as pd
import numpy as np

df_train = pd.read_csv('/Users/{redacted}/Documents/Datasets/Linear_Regression/train.csv')
df_test = pd.read_csv('/Users/{redacted}/Documents/Datasets/Linear_Regression/test.csv')

x_train = df_train['x']
y_train = df_train['y']
x_test = df_test['x']
y_test = df_test['y']

x_train = np.array(x_train)
y_train = np.array(y_train)
x_test = np.array(x_test)
y_test = np.array(y_test)

x_train = x_train.reshape(-1, 1)
x_test = x_test.reshape(-1, 1)
```

We use the pandas library to read the train and test files. We retrieve the independent (`x`) and dependent (`y`) variables, and since we have only one feature (`x`), we reshape it into a single column so that we can feed it into our linear regression model.
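As a small illustration of the reshape step, the sketch below uses a made-up four-element array (not the Kaggle data) to show what `reshape(-1, 1)` does. It also shows why shapes are worth tracking afterwards: combining a column of shape `(n, 1)` with a flat array of shape `(n,)` broadcasts to an `(n, n)` matrix instead of working element by element.

```python
import numpy as np

# Toy values standing in for one feature column (illustrative, not the real dataset)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 1.9, 3.2, 3.9])

x_col = x.reshape(-1, 1)   # -1 lets NumPy infer the number of rows
print(x.shape)             # (4,)
print(x_col.shape)         # (4, 1)

# Pitfall: (4, 1) minus (4,) broadcasts to a (4, 4) matrix
diff_wrong = x_col - y
print(diff_wrong.shape)    # (4, 4)

# Flattening the column first gives the intended element-wise difference
diff_right = x_col.ravel() - y
print(diff_right.shape)    # (4,)
```

scikit-learn estimators expect the two-dimensional `(n_samples, n_features)` layout for the features, which is why the column form is used there.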

```python
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# The `normalize` argument was removed in scikit-learn 1.2, so the default constructor is used
clf = LinearRegression()
clf.fit(x_train, y_train)
y_pred = clf.predict(x_test)
print(r2_score(y_test, y_pred))
```

We use scikit-learn to import the linear regression model. We fit the model on the training data and predict the values for the testing data. We use the [R2 score](http://blog.minitab.com/blog/adventures-in-statistics-2/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit) to measure how well our model fits the data.
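For intuition, the R2 score can also be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below is a minimal NumPy version; the helper name `r2_manual` and the toy values are assumptions made for illustration, not part of scikit-learn or the dataset.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

y_true = np.array([1.0, 2.0, 3.0, 4.0])
print(r2_manual(y_true, y_true))                     # 1.0 (perfect predictions)
print(r2_manual(y_true, np.full(4, y_true.mean())))  # 0.0 (always predicting the mean)
```

A score of 1 means the predictions match the targets exactly, while 0 means the model does no better than always predicting the mean of the targets.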

<figure><img src="https://217580413-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F0ExMA5m46XwKbyd8RHNh%2Fuploads%2FfO7C7wFPSHXZte1ra5Ev%2Fimage.png?alt=media&#x26;token=9bce4e29-5c27-4722-8f83-2e254c7ebcc8" alt=""><figcaption><p>R2 Score</p></figcaption></figure>

Now, let’s build our own linear regression model from the equations above. We will use only the NumPy library for the computations and the R2 score as the metric.

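```python
## Linear Regression
import numpy as np

# Flatten to 1-D so x, y and the error all have shape (n,)
x = x_train.ravel()
y_true = y_train.ravel()

n = len(x)        # number of training samples (700 here)
alpha = 0.0001    # learning rate

# A single intercept and a single slope, both starting at 0
a_0 = 0.0
a_1 = 0.0

epochs = 0
while epochs < 1000:
    y = a_0 + a_1 * x
    error = y - y_true
    mean_sq_er = np.sum(error**2) / n
    # Gradient step on the mean squared error for a_0 and a_1
    a_0 = a_0 - alpha * 2 * np.sum(error) / n
    a_1 = a_1 - alpha * 2 * np.sum(error * x) / n
    epochs += 1
    if epochs % 10 == 0:
        print(mean_sq_er)
```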

We initialize a\_0 and a\_1 to 0.0. For 1000 epochs we compute the predictions and the cost, use the error to compute the gradients, and use the gradients to update a\_0 and a\_1. After 1000 epochs, a\_0 and a\_1 should be close to their best values, which gives us the best-fit straight line.
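One way to sanity-check an update loop like this is to run it on synthetic data and compare the result with a closed-form least-squares fit. The sketch below assumes made-up data around the line y = 1 + 2x, a hypothetical helper `fit_gd`, and a larger learning rate and epoch count than in the walkthrough (chosen to suit the smaller x range); `np.polyfit` supplies the reference fit.

```python
import numpy as np

# Synthetic data: y = 1 + 2x plus a little noise (illustrative, not the Kaggle set)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=200)

def fit_gd(x, y, alpha=0.01, epochs=5000):
    """Gradient descent on the mean squared error for y = a_0 + a_1 * x."""
    n = len(x)
    a_0, a_1 = 0.0, 0.0
    for _ in range(epochs):
        error = (a_0 + a_1 * x) - y
        grad_0 = 2 * np.sum(error) / n        # d(MSE)/d(a_0)
        grad_1 = 2 * np.sum(error * x) / n    # d(MSE)/d(a_1)
        a_0 -= alpha * grad_0
        a_1 -= alpha * grad_1
    return a_0, a_1

a_0, a_1 = fit_gd(x, y)
slope, intercept = np.polyfit(x, y, 1)   # closed-form least-squares reference

print('gradient descent:', a_0, a_1)
print('least squares:   ', intercept, slope)
```

If the two pairs of numbers agree closely, the gradient formulas and the update rule are doing what the equations say. If they diverge or blow up, the learning rate is usually the first thing to revisit.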

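```python
import matplotlib.pyplot as plt

# Use one intercept and one slope (works whether a_0 and a_1 are scalars or constant arrays)
b_0 = float(np.ravel(a_0)[0])
b_1 = float(np.ravel(a_1)[0])

y_prediction = b_0 + b_1 * x_test.ravel()
print('R2 Score:', r2_score(y_test, y_prediction))

# Points on the regression line for x = 0..99
x_plot = np.arange(100)
y_plot = b_0 + b_1 * x_plot

plt.figure(figsize=(10, 10))
plt.scatter(x_test, y_test, color='red', label='GT')
plt.plot(x_plot, y_plot, color='black', label='pred')
plt.legend()
plt.show()
```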

The test set contains 300 samples rather than 700, so a\_0 and a\_1 have to be applied as a single intercept and a single slope rather than as arrays sized for the training set. With that, we can use the equation to predict values for the test set and obtain the R2 score.
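The shape issue can be seen in a few lines of NumPy. In this sketch the 300 inputs and the parameter values are stand-ins chosen for illustration, not the fitted model: scalar parameters broadcast against any number of samples, while arrays sized for 700 rows cannot be combined with 300 rows.

```python
import numpy as np

x_new = np.linspace(0, 100, 300).reshape(-1, 1)   # stand-in for 300 test samples

# Scalars broadcast against any number of samples
a_0, a_1 = 0.5, 1.0            # illustrative values, not fitted parameters
pred = a_0 + a_1 * x_new
print(pred.shape)              # (300, 1)

# Per-sample arrays sized for 700 training rows do not
a_1_arr = np.full((700, 1), 1.0)
try:
    a_1_arr * x_new            # (700, 1) * (300, 1) is not broadcastable
    shapes_ok = True
except ValueError:
    shapes_ok = False
print(shapes_ok)               # False
```

Keeping the intercept and slope as plain numbers avoids the mismatch entirely, since the same two values apply to every sample.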

<figure><img src="https://217580413-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F0ExMA5m46XwKbyd8RHNh%2Fuploads%2FyPtSkxXwjrTtTjmzZvSK%2Fimage.png?alt=media&#x26;token=4a4a7ca4-2a99-435d-8410-8bab61ab7c8e" alt=""><figcaption><p>R2 Score</p></figcaption></figure>

We observe the same R2 score as with the previous method. We also plot the regression line along with the test data points to get a better visual sense of how well our algorithm works.

<figure><img src="https://217580413-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F0ExMA5m46XwKbyd8RHNh%2Fuploads%2FTkyFhGfg2CGrUBByoxl3%2Fimage.png?alt=media&#x26;token=dd56c374-d93b-46be-a8d7-878bfce4c5ac" alt=""><figcaption><p>Regression Line - Test Data</p></figcaption></figure>
