Python programming, with examples in hydraulic engineering and in hydrology.
Showing posts with label regression. Show all posts
Sunday, April 21, 2019
Logarithmic and Exponential Curve Fit in Python - Numpy
With the numpy function "polyfit", where X and y are arrays holding the data to be fitted:
import numpy as np
1. Exponential fit:
cf = np.polyfit(X, np.log(y), 1)
returns two coefficients, which define the equation:
np.exp(cf[1])*np.exp(cf[0]*X)
All y values must be positive, since their logarithm is taken.
2. Logarithmic fit:
cf = np.polyfit(np.log(X), y, 1)
returns two coefficients, which define the equation:
cf[0]*np.log(X)+cf[1]
Here all X values must be positive.
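As a quick check of both recipes, the sketch below builds synthetic data from known coefficients and recovers them with polyfit. The values 2.0, 0.3, 1.5 and 4.0 are made up for illustration only.

```python
import numpy as np

# Synthetic data; the coefficients are made-up values for illustration
X = np.linspace(1.0, 10.0, 20)       # X must be > 0 for the logarithmic fit
y_exp = 2.0 * np.exp(0.3 * X)        # y = a * exp(b * X), with a = 2.0, b = 0.3
y_log = 1.5 * np.log(X) + 4.0        # y = m * log(X) + c, with m = 1.5, c = 4.0

# 1. Exponential fit: a straight line through (X, log(y))
cf_exp = np.polyfit(X, np.log(y_exp), 1)
a, b = np.exp(cf_exp[1]), cf_exp[0]

# 2. Logarithmic fit: a straight line through (log(X), y)
cf_log = np.polyfit(np.log(X), y_log, 1)
m, c = cf_log[0], cf_log[1]

print(a, b)   # close to 2.0 and 0.3
print(m, c)   # close to 1.5 and 4.0
```

Because the data is noise-free, the recovered values match the chosen ones up to floating-point error. With real data they would only approximate the underlying trend.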
Labels:
curve fit,
exponential,
fit,
logarithmic,
numpy,
polyfit,
regression
Tuesday, January 23, 2018
Polynomial Curve Fitting
The code below shows how easily you can do a Polynomial Curve Fitting with Python and Numpy.
import numpy as np

# sample x and y data - example
x = [7.76, 10.11, 11.89, 14.81, 15.49]
y = [1.851, 1.971, 1.953, 1.842, 1.805]

# the polyfit function does the nth degree polynomial best fit on the data,
# returning the polynomial coefficients (highest power first)
n = 4  # 4th degree polynomial, you can change it to whatever degree you want
coefs = np.polyfit(x, y, n)

# the poly1d function builds a callable polynomial from the calculated coefficients
polyf = np.poly1d(coefs)

# if we want to apply our polynomial function to a range of x values
xf = np.linspace(0, 20)
yf = polyf(xf)
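One caveat with the sample above: five points and a 4th degree polynomial leave no degrees of freedom, so the curve passes through every point and the residuals are essentially zero. The minimal sketch below compares it with a 2nd degree fit on the same data. Degree 2 is chosen here only for illustration, and the name sse is introduced for this example.

```python
import numpy as np

x = np.array([7.76, 10.11, 11.89, 14.81, 15.49])
y = np.array([1.851, 1.971, 1.953, 1.842, 1.805])

sse = {}  # sum of squared residuals for each polynomial degree
for n in (2, 4):
    coefs = np.polyfit(x, y, n)
    resid = y - np.polyval(coefs, x)
    sse[n] = np.sum(resid**2)
    print(n, sse[n])
```

A near-zero residual for degree 4 does not mean a better model. It only means the polynomial has as many coefficients as there are data points.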
Labels:
curve fit,
numpy,
polyfit,
polynomial,
regression
Sunday, September 17, 2017
Multiple linear regression in Python
Sometimes we need to do a multiple linear regression, and the most widely used spreadsheet software does not do it well or easily.
On the other hand, a multiple regression in Python using the scikit-learn library (sklearn) is rather simple.
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression

# Import the dataset
dataset = pd.read_csv('data.csv')

# Use the last column of the dataset as the dependent variable y
# and all the other columns as the independent variables X
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values

# Build the regressor and print summary results
regressor = LinearRegression()
regressor.fit(X, y)
print('Coefficients:\t', '\t'.join([str(c) for c in regressor.coef_]))
print('Intercept:\t', regressor.intercept_)
print('R2 =\t', regressor.score(X, y))

# Plot observed against predicted values, with the 1:1 line, if you like
y_pred = regressor.predict(X)
plt.scatter(y_pred, y, label='data')
plt.plot([min(y_pred), max(y_pred)], [min(y_pred), max(y_pred)], label='1:1 line')
plt.legend()
plt.show()
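To see what the regressor computes without needing a data.csv file, here is a minimal sketch on synthetic data. It solves the same ordinary least-squares problem with numpy.linalg.lstsq instead of scikit-learn, so it is a cross-check and not the method used in the post. The data and the coefficients (3.0, -2.0, intercept 5.0) are made up for illustration.

```python
import numpy as np

# Synthetic, noise-free data with made-up coefficients
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(50, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

# Append a column of ones so the intercept is estimated along with the slopes
A = np.column_stack([X, np.ones(len(X))])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
coef, intercept = beta[:-1], beta[-1]

# R-squared from its definition: 1 - SS_res / SS_tot
y_pred = A @ beta
ss_res = np.sum((y - y_pred)**2)
ss_tot = np.sum((y - y.mean())**2)
r2 = 1.0 - ss_res / ss_tot

print(coef, intercept, r2)
```

With noise-free data the chosen coefficients are recovered and R2 is 1 up to rounding. With real measurements, R2 drops below 1 and the coefficients become estimates.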
Wednesday, June 21, 2017
Exponential curve fit in numpy
With the numpy function "polyfit" we can easily fit different kinds of curves, not only polynomial curves.
According to the user manual, numpy.polyfit does the following:
"
Least squares polynomial fit.
Fit a polynomial p(x) = p[0] * x**deg + ... + p[deg] of degree deg to points (x, y). Returns a vector of coefficients p that minimises the squared error.
"
If X and y are arrays holding our data, the code:
coef = np.polyfit(X, np.log(y), 1)
returns two coefficients, which define the equation:
np.exp(coef[1])*np.exp(coef[0]*X)
This gives the exponential curve that best fits our data, X and y, in the least-squares sense on log(y).
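The order of the coefficients matters when reading the result: polyfit returns the highest power first, so in a degree 1 fit coef[0] is the slope and coef[1] is the intercept. A small sketch with made-up polynomials shows this:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3.0 * x**2 + 2.0 * x + 1.0   # made-up polynomial for illustration

p = np.polyfit(x, y, 2)
print(p)   # p[0] multiplies x**2, p[1] multiplies x, p[2] is the constant term

# For a degree 1 fit, the first value is the slope and the second the intercept
slope, intercept = np.polyfit(x, 2.0 * x + 1.0, 1)
print(slope, intercept)
```

This is why the exponential equation uses exp(coef[1]) as the multiplier and coef[0] as the rate inside the exponent.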
The polyfit function accepts weights, which we can use to give less importance to very small values, for example. We can pass a weight array as follows:
coef = np.polyfit(X, np.log(y), 1, w=np.sqrt(y))
This gives more weight to higher values.
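A minimal sketch of the effect of the weights, on made-up data. Note that polyfit applies w to the unsquared residuals, so w=np.sqrt(y) weights each squared residual by y. The true coefficients (2.0 and 0.5) and the fixed ±0.5 perturbation are assumptions chosen for illustration: the perturbation is large relative to the small y values and negligible relative to the large ones.

```python
import numpy as np

X = np.arange(10, dtype=float)
y_true = 2.0 * np.exp(0.5 * X)                 # made-up exponential trend

# Fixed-size perturbation, alternating in sign; y stays positive
noise = 0.5 * np.where(np.arange(10) % 2 == 0, 1.0, -1.0)
y = y_true + noise

# Unweighted fit on log(y): small y values influence the fit as much as large ones
coef_plain = np.polyfit(X, np.log(y), 1)

# Weighted fit: larger y values receive more weight
coef_weighted = np.polyfit(X, np.log(y), 1, w=np.sqrt(y))

print('unweighted:', np.exp(coef_plain[1]), coef_plain[0])
print('weighted:  ', np.exp(coef_weighted[1]), coef_weighted[0])
```

Printing both results lets you compare how the weighting shifts the fitted multiplier and rate. Which weighting is appropriate depends on how the errors in your own data behave.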
To retrieve the R-squared of our exponential curve, we can use the scikit-learn r2_score function, as follows:
from sklearn.metrics import r2_score
y_pred = np.exp(coef[1])*np.exp(coef[0]*X)
r2s = r2_score(y, y_pred)
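If scikit-learn is not available, the same index can be computed directly from its definition, R2 = 1 - SS_res / SS_tot. The sketch below uses only numpy, and the data values are made up (roughly exp(X)) for illustration.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.7, 7.4, 20.1, 54.6, 148.4])   # made-up values, roughly exp(X)

# Exponential fit through the log transform
coef = np.polyfit(X, np.log(y), 1)
y_pred = np.exp(coef[1]) * np.exp(coef[0] * X)

# R-squared on the original (not log) scale
ss_res = np.sum((y - y_pred)**2)       # residual sum of squares
ss_tot = np.sum((y - np.mean(y))**2)   # total sum of squares
r2 = 1.0 - ss_res / ss_tot
print(r2)
```

The R-squared here is evaluated on y itself, while the fit minimised the error on log(y), so the two criteria are not the same.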
Wednesday, December 2, 2015
SciPy minimize example - Fitting IDF Curves
SciPy (pronounced “Sigh Pie”) is an open source Python library used by scientists, analysts, and engineers doing scientific computing and technical computing.
SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.
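In this post I will show how to use a powerful SciPy function: minimize.
The minimize function offers several methods for minimizing a function, which are described in its official documentation.
The example finds the coefficients of a standard rainfall IDF (intensity-duration-frequency) equation. As implemented in the code below, the equation has the form:
i = par[0] * T**par[1] / (t + par[2])**par[3]
where i is the rainfall intensity (mm/h), T is the return period (years), t is the duration (minutes), and par holds the four coefficients to be fitted.
We have to provide the intensity data to be fitted, and the equation as a function.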
import numpy as np
from scipy.optimize import minimize

# This is the IDF function, for return periods of 10 and 25 years.
# It returns the total squared error between computed and observed intensities.
def func2(par, res):
    f10 = (par[0] * 10**par[1]) / ((res[0, :] + par[2])**par[3])
    f25 = (par[0] * 25**par[1]) / ((res[0, :] + par[2])**par[3])
    erroTotQ = np.sum((f10 - res[1, :])**2 + (f25 - res[2, :])**2)
    return erroTotQ

# Durations array, in minutes
d = [5,10,15,20,25,30,35,40,60,120,180,360,540,720,900,1260,1440]
# Rainfall intensities, mm/h, same length as the durations array
r10 = [236.1,174.0,145.6,128.3,116.3,107.3,100.3,94.5,79.1,48.4,34.2,18.9,13.4,10.5,8.7,6.5,5.8]
r25 = [294.5,217.1,181.6,160.0,145.1,133.9,125.1,118.0,98.7,60.4,42.7,23.6,16.7,13.1,10.8,8.1,7.2]

# The following line only gathers the durations and the intensities into a numpy array,
# in order to pass all of the IDF data as one single argument, named "valid"
valid = np.vstack((d, r10, r25))

# Initial guess
param1 = [5000, 0.1, 10, 0.9]

res2 = minimize(func2, param1, args=(valid,), method='Nelder-Mead')
np.set_printoptions(formatter={'float': '{: 0.3f}'.format})
print(res2)
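After the fit, the coefficients are available in res2.x and can be used to evaluate the IDF equation for any return period and duration. A minimal sketch follows. The helper name idf_intensity and the values in par_example are hypothetical placeholders, not results of the fit above. In practice you would pass res2.x instead.

```python
import numpy as np

def idf_intensity(par, T, t):
    """Rainfall intensity (mm/h) for return period T (years) and duration t (minutes)."""
    a, b, c, d = par
    return a * T**b / (np.asarray(t, dtype=float) + c)**d

# Placeholder coefficients for illustration only, NOT fitted results
par_example = [1000.0, 0.2, 10.0, 0.8]

durations = [5, 60, 1440]  # minutes
i_10 = idf_intensity(par_example, 10, durations)
i_25 = idf_intensity(par_example, 25, durations)
print(i_10)
print(i_25)
```

With positive coefficients, as in this placeholder set, the intensity decreases with duration and increases with the return period, which is the expected behaviour of an IDF curve.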