Sunday, September 17, 2017

Multiple linear regression in Python

Sometimes we need to run a linear regression, and the most widely used spreadsheet software does not make it easy or do it well.

On the other hand, a multiple regression in Python using the scikit-learn library (sklearn) is rather simple.


import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

# Importing the dataset
dataset = pd.read_csv('data.csv')
# use every column except the last as independent variables (X)
# and the last column as the dependent variable (y)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# build the regressor and print summary results
regressor = LinearRegression()
regressor.fit(X,y)
print('Coefficients:\t', '\t'.join([str(c) for c in regressor.coef_]))
print('Intercept:\t', regressor.intercept_)
print('R2 =\t', regressor.score(X, y))

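# plot observed vs. predicted values if you like; the line y = x marks a perfect fit
y_pred = regressor.predict(X)
plt.scatter(y_pred, y, label='data')
plt.plot([min(y_pred), max(y_pred)], [min(y_pred), max(y_pred)], label='perfect fit (y = x)')
plt.xlabel('predicted y')
plt.ylabel('observed y')
# labels are required: legend() has nothing to show otherwise
plt.legend()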
plt.show()
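
If you do not have a data.csv at hand, the same steps can be tried on made-up data. The sketch below is only an illustration: the column names x1, x2 and y, the true coefficients (3, -2, intercept 5) and the noise level are all assumptions chosen for this example, not part of the original dataset. It builds the table in memory instead of reading a file, fits the regressor the same way, and then predicts y for one new observation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for data.csv (hypothetical columns x1, x2, y):
# y = 3*x1 - 2*x2 + 5 plus a little Gaussian noise.
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, size=200)
x2 = rng.uniform(0, 10, size=200)
noise = rng.normal(0, 0.1, size=200)
dataset = pd.DataFrame({'x1': x1, 'x2': x2, 'y': 3 * x1 - 2 * x2 + 5 + noise})

# Same split as before: last column is y, the rest is X
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

regressor = LinearRegression()
regressor.fit(X, y)

print('Coefficients:\t', regressor.coef_)
print('Intercept:\t', regressor.intercept_)
print('R2 =\t', regressor.score(X, y))

# Predict y for a new observation (x1=1, x2=2); the noise-free value is 4
new_pred = regressor.predict(np.array([[1.0, 2.0]]))
print('Prediction:\t', new_pred[0])
```

The fitted coefficients and intercept should land close to the values used to generate the data, which is a quick way to check that the columns are being split the way you expect before running the script on a real file.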