Python in Hydrology and Hydraulics: regression


Sunday, April 21, 2019

Logarithmic and Exponential Curve Fit in Python - Numpy

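Using the NumPy function "polyfit":

X, y: arrays with the data to be fitted (y must be positive for the exponential fit, and X must be positive for the logarithmic fit)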

import numpy as np

1. Exponential fit

cf = np.polyfit(X, np.log(y), 1)

will return two coefficients, which define the fitted equation:

exp(cf[1])*exp(cf[0]*X)
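
As a quick check of this recipe, the sketch below fits synthetic data generated from a known exponential. The constants 2.5 and 0.3 and the sample points are made up for illustration; with noise-free data the fit should recover them almost exactly.

```python
import numpy as np

# Hypothetical data generated from y = 2.5 * exp(0.3 * X), without noise
a_true, b_true = 2.5, 0.3
X = np.linspace(1.0, 10.0, 20)
y = a_true * np.exp(b_true * X)

# Straight-line fit of log(y) against X: log(y) = cf[0]*X + cf[1]
cf = np.polyfit(X, np.log(y), 1)

a_fit = np.exp(cf[1])   # multiplying constant
b_fit = cf[0]           # rate inside the exponential
y_fit = a_fit * np.exp(b_fit * X)

print(a_fit, b_fit)
```

With real, noisy data the recovered constants will only approximate the underlying ones, since the least-squares fit is done on log(y) rather than on y.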

2. Logarithmic fit

cf = np.polyfit(np.log(X), y, 1)

will return two coefficients, which define the fitted equation:

cf[0]*log(X)+cf[1]
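
The same kind of check works for the logarithmic fit. This is a minimal sketch on made-up, noise-free data drawn from y = 4.0*log(X) + 1.5; the constants are assumptions chosen only for the example.

```python
import numpy as np

# Hypothetical data generated from y = 4.0 * log(X) + 1.5, without noise
X = np.linspace(1.0, 50.0, 25)
y = 4.0 * np.log(X) + 1.5

# Straight-line fit of y against log(X): y = cf[0]*log(X) + cf[1]
cf = np.polyfit(np.log(X), y, 1)
y_fit = cf[0] * np.log(X) + cf[1]

print(cf)
```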

Tuesday, January 23, 2018

Polynomial Curve Fitting

The code below shows how easily you can fit a polynomial curve with Python and NumPy.

import numpy as np

# sample x and y data - example
x = [7.76,10.11,11.89,14.81,15.49]
y = [1.851,1.971,1.953,1.842,1.805]

# the polyfit function finds the least-squares polynomial of degree n for the data,
# returning the polynomial coefficients (highest power first)

n = 4   # 4th-degree polynomial; change it to whatever degree you want
coefs = np.polyfit(x,y,n)

# poly1d builds a callable polynomial from the calculated coefficients
polyf = np.poly1d(coefs)

# evaluate the polynomial over a range of x values (linspace returns 50 points by default)
xf = np.linspace(0,20)
yf = polyf(xf)
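
One thing worth keeping in mind when choosing the degree: with 5 data points, a 4th-degree polynomial has 5 coefficients and passes through every point, so it interpolates rather than smooths. The sketch below, using the same sample data, compares the largest residual for degrees 2 and 4; the dictionary name max_resid is introduced here only for illustration.

```python
import numpy as np

x = np.array([7.76, 10.11, 11.89, 14.81, 15.49])
y = np.array([1.851, 1.971, 1.953, 1.842, 1.805])

max_resid = {}
for n in (2, 4):
    polyf = np.poly1d(np.polyfit(x, y, n))
    # largest absolute difference between the fitted curve and the data
    max_resid[n] = np.max(np.abs(polyf(x) - y))
    print(f"degree {n}: max |residual| = {max_resid[n]:.2e}")
```

A residual close to zero for degree 4 does not mean the curve is a better model; between and beyond the data points a high-degree polynomial can oscillate strongly.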

Sunday, September 17, 2017

Multiple linear regression in Python

Sometimes we need to do a multiple linear regression, and the most widely used spreadsheet software does not do it well or easily.

On the other hand, a multiple regression in Python using the scikit-learn library (sklearn) is rather simple.

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

# Importing the dataset
dataset = pd.read_csv('data.csv')
# all columns but the last are the independent variables (X); the last column is the dependent variable (y)
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# build the regressor and print summary results
regressor = LinearRegression()
regressor.fit(X, y)
print('Coefficients:\t', '\t'.join([str(c) for c in regressor.coef_]))
print('Intercept:\t', regressor.intercept_)
print('R2 =\t', regressor.score(X, y))

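# plot observed against predicted values, with the 1:1 line for reference
y_pred = regressor.predict(X)
plt.scatter(y_pred, y, label='data')
plt.plot([min(y_pred), max(y_pred)], [min(y_pred), max(y_pred)], label='1:1 line')
plt.xlabel('predicted y')
plt.ylabel('observed y')
plt.legend()
plt.show()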
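
To try the regressor without a CSV file, a minimal sketch on synthetic data can help. The relation y = 3*x1 - 2*x2 + 5 and the random inputs below are made up for illustration; because the data has no noise, the regressor should recover the coefficients and the intercept almost exactly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: two independent variables and y = 3*x1 - 2*x2 + 5
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

regressor = LinearRegression().fit(X, y)

# coef_ holds one coefficient per column of X; intercept_ is the constant term
print('Coefficients:', regressor.coef_)
print('Intercept:', regressor.intercept_)
print('R2 =', regressor.score(X, y))
```

Note that coef_ alone does not give the full equation: the constant term is stored separately in intercept_.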

Wednesday, June 21, 2017

Exponential curve fit in numpy

With the NumPy function "polyfit" we can easily fit different kinds of curves, not only polynomials.

According to the NumPy documentation, numpy.polyfit does the following:

"
Least squares polynomial fit.

Fit a polynomial p(x) = p[0] * x**deg + ... + p[deg] of degree deg to points (x, y). Returns a vector of coefficients p that minimises the squared error.
"

If we use X and y as arrays with our data, the code:

coef = np.polyfit(X, np.log(y), 1)

will return two coefficients, which define the fitted equation:

exp(coef[1])*exp(coef[0]*X)
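
The index order follows from the documentation quoted above: the coefficient of the highest power comes first, so for a degree-1 fit the first coefficient is the slope and the second is the constant term. A small sketch with made-up points on a known straight line illustrates the ordering:

```python
import numpy as np

# Points lying exactly on the line y = 2*x + 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

p = np.polyfit(x, y, 1)
slope, intercept = p   # p[0] multiplies x**1, p[1] is the constant term

print(slope, intercept)
```

Applied to the fit of log(y) above, coef[0] is therefore the rate inside the exponential and exp(coef[1]) is the multiplying constant of exp(coef[1])*exp(coef[0]*X).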

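The equation exp(coef[1])*exp(coef[0]*X) then gives the exponential curve that best fits our data, X and y. Note that the least-squares fit is done on log(y), not on y itself, so errors on small values of y count relatively more.
The polyfit function also accepts weights (the w argument), which we can use to give less importance to very small values, for example. We can use a weight function as follows: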

coef = np.polyfit(X, np.log(y), 1, w=np.sqrt(y))

This gives more weight to higher values of y.
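
To see the effect of the weights, the sketch below fits the same noisy data with and without them. The data is synthetic: the curve y = 5*exp(0.4*X), the noise level and the random seed are all assumptions made for the example. polyfit multiplies each residual by w before squaring it, so w=np.sqrt(y) weights the squared residuals of log(y) by y.

```python
import numpy as np

# Hypothetical noisy data around y = 5 * exp(0.4 * X); values are made up
rng = np.random.default_rng(42)
X = np.linspace(0.0, 10.0, 30)
y = 5.0 * np.exp(0.4 * X) * rng.lognormal(mean=0.0, sigma=0.05, size=X.size)

# Unweighted fit: every point counts the same in log space
coef_plain = np.polyfit(X, np.log(y), 1)

# Weighted fit: points with larger y get more influence
coef_weighted = np.polyfit(X, np.log(y), 1, w=np.sqrt(y))

for name, c in (("plain", coef_plain), ("weighted", coef_weighted)):
    print(f"{name}: y = {np.exp(c[1]):.3f} * exp({c[0]:.4f} * X)")
```

Both fits should land near the underlying curve here; the two sets of coefficients differ more when the small values are noisy or do not follow the exponential well.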

To retrieve the R-squared index of our exponential curve, we can use the scikit-learn function r2_score, as follows:

y_pred = np.exp(coef[1])*np.exp(coef[0]*X)

from sklearn.metrics import r2_score

r2s = r2_score(y, y_pred, sample_weight=None, multioutput=None)
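
For a single output, R-squared is 1 minus the ratio between the residual sum of squares and the total sum of squares, so it can also be computed with NumPy alone. The sketch below does that on made-up data; the sample values are assumptions for illustration, and the score is evaluated on y itself, not on log(y).

```python
import numpy as np

# Hypothetical observations, roughly following an exponential
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.7, 7.1, 20.5, 54.0, 150.2])

# Exponential fit through a straight-line fit of log(y)
coef = np.polyfit(X, np.log(y), 1)
y_pred = np.exp(coef[1]) * np.exp(coef[0] * X)

ss_res = np.sum((y - y_pred) ** 2)       # residual sum of squares
ss_tot = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot

print(r2)
```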

Wednesday, December 2, 2015

SciPy minimize example - Fitting IDF Curves

SciPy (pronounced “Sigh Pie”) is an open-source Python library used by scientists, analysts, and engineers doing scientific and technical computing.

SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.

In this post I will show how to use a powerful function of SciPy - minimize.

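minimize offers several methods for minimizing a function (Nelder-Mead, BFGS and others), selected through its method argument; see the official documentation of scipy.optimize.minimize for the full list.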

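The example finds the coefficients of a standard rainfall IDF (intensity-duration-frequency) equation. The equation has the following form:

i = K * T**a / (t + b)**c

where i is the rainfall intensity (mm/h), T is the return period (years), t is the duration (minutes), and K, a, b and c are the coefficients to be found (par[0] to par[3] in the code below).

We have to provide the intensity data to be fitted, and the equation as a function.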

import numpy as np
from scipy.optimize import minimize

# Sum of squared errors between the IDF equation and the data,
# for return periods of 10 and 25 years.
# par: the four equation coefficients; res: rows of durations, r10 and r25
def func2(par, res):
    f10 = (par[0] * 10**par[1]) / ((res[0, :] + par[2])**par[3])
    f25 = (par[0] * 25**par[1]) / ((res[0, :] + par[2])**par[3])
    erroTotQ = np.sum((f10 - res[1, :])**2 + (f25 - res[2, :])**2)
    return erroTotQ

#durations array - in minutes
d=[5,10,15,20,25,30,35,40,60,120,180,360,540,720,900,1260,1440]

# rainfall intensities in mm/h for the 10- and 25-year return periods, same length as d
r10=[236.1,174.0,145.6,128.3,116.3,107.3,100.3,94.5,79.1,48.4,34.2,18.9,13.4,10.5,8.7,6.5,5.8]
r25=[294.5,217.1,181.6,160.0,145.1,133.9,125.1,118.0,98.7,60.4,42.7,23.6,16.7,13.1,10.8,8.1,7.2]

# stack the durations and the intensities into a single numpy array,
# so that all of the IDF data is passed as one argument, named "valid"
valid = np.vstack((d, r10, r25))

#initial guess
param1 = [5000, 0.1, 10, 0.9]

res2 = minimize(func2, param1, args=(valid,), method='Nelder-Mead')
np.set_printoptions(formatter={'float': '{: 0.3f}'.format})
print(res2)
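
The object returned by minimize holds the fitted coefficients in its x attribute, the final value of the objective function in fun, and a convergence flag in success. The sketch below repeats the fit in a self-contained form and compares fitted and observed intensities. The helper names idf and sq_error are introduced here for illustration and are not part of the code above.

```python
import numpy as np
from scipy.optimize import minimize

def idf(par, T, d):
    """Intensity (mm/h) for return period T (years) and duration d (minutes)."""
    return (par[0] * T ** par[1]) / ((d + par[2]) ** par[3])

def sq_error(par, d, r10, r25):
    """Sum of squared errors for the 10- and 25-year return periods."""
    return np.sum((idf(par, 10, d) - r10) ** 2 + (idf(par, 25, d) - r25) ** 2)

# durations in minutes and intensities in mm/h (same data as above)
d = np.array([5, 10, 15, 20, 25, 30, 35, 40, 60, 120, 180, 360, 540,
              720, 900, 1260, 1440], dtype=float)
r10 = np.array([236.1, 174.0, 145.6, 128.3, 116.3, 107.3, 100.3, 94.5, 79.1,
                48.4, 34.2, 18.9, 13.4, 10.5, 8.7, 6.5, 5.8])
r25 = np.array([294.5, 217.1, 181.6, 160.0, 145.1, 133.9, 125.1, 118.0, 98.7,
                60.4, 42.7, 23.6, 16.7, 13.1, 10.8, 8.1, 7.2])

x0 = [5000, 0.1, 10, 0.9]   # initial guess
initial_error = sq_error(x0, d, r10, r25)
res = minimize(sq_error, x0, args=(d, r10, r25), method='Nelder-Mead')

print('converged:', res.success)
print('coefficients:', res.x)
print('sum of squared errors:', res.fun, '(initial guess:', initial_error, ')')

# durations, observed and fitted 10-year intensities, side by side
print(np.column_stack((d, r10, idf(res.x, 10, d))))
```

If success comes back False, the method most likely hit its iteration limit; passing a larger limit through options={'maxiter': ...} or restarting from res.x are two things to try.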