Here is a simple script that generates a synthetic daily time series by sampling from a Gumbel distribution:
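import numpy as np
import pandas as pd
# Gumbel location and scale parameters
med = 15.5
dp = 8.2
# Daily dates from 2001-01-01 up to, but not including, 2016-12-01
sDays = np.arange('2001-01', '2016-12', dtype='datetime64[D]')
nDays = len(sDays)
# Draw one Gumbel-distributed value per day
s1 = np.random.gumbel(loc=med,scale=dp,size=nDays)
# Replace negative values with zero
s1[s1 < 0] = 0
dfSint = pd.DataFrame({'Q':s1},index=sDays)
dfSint.plot()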
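A note on the parameters: in numpy.random.gumbel, loc and scale are the location and scale of the distribution, not its mean and standard deviation. The mean is loc + 0.5772*scale (Euler's constant) and the standard deviation is scale*pi/sqrt(6). If med and dp are meant as a target mean and standard deviation (an assumption, the post does not say), a minimal sketch of the conversion could look like this. The names mean_target, std_target and the seed are hypothetical, and the series is not clipped at zero here, since clipping shifts both moments.

```python
import numpy as np

# Hypothetical targets: desired mean and standard deviation of the series
mean_target, std_target = 15.5, 8.2

# Method of moments for the Gumbel distribution:
#   std  = scale * pi / sqrt(6)
#   mean = loc + euler_gamma * scale
scale = std_target * np.sqrt(6.0) / np.pi
loc = mean_target - np.euler_gamma * scale

# Seeded generator so the sample is reproducible
rng = np.random.default_rng(42)
sample = rng.gumbel(loc=loc, scale=scale, size=200_000)

print(loc, scale)
print(sample.mean(), sample.std())
```

With a large sample, the printed mean and standard deviation should land close to the targets.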
Python programming, with examples in hydraulic engineering and in hydrology.
Friday, June 30, 2017
Saturday, June 24, 2017
Pandas - How to read text files delimited with fixed widths
The pandas library makes it easy to read text files whose columns have fixed widths.
In this example, the first 4 lines of the text file contain no data and the 5th line holds the header. The header and the data are delimited by fixed character widths, as follows:
- 12, 10, 6, 9, 7, 7, 7 and 4 characters
The following code will read the file as a pandas DataFrame, and also parse the dates in the datetime format:
import pandas as pd
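# parse_dates=True only parses the index, so the date column (assumed here to be the first one) is set as the index
ds2 = pd.read_fwf('yourtextfile.txt', widths=[12,10,6,9,7,7,7,4], skiprows=4, index_col=0, parse_dates=True)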
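To try this without a file on disk, here is a small self-contained sketch. It uses a reduced, hypothetical layout (three columns of 12, 8 and 6 characters, with invented values) held in memory, and names the date column explicitly in parse_dates rather than relying on the index. The helper fixed_row and the column names are assumptions made for illustration.

```python
import io
import pandas as pd

# Hypothetical column widths (a reduced version of the layout above)
widths = [12, 8, 6]

def fixed_row(values):
    # Pad every value with spaces up to its column width
    return "".join(str(v).ljust(w) for v, w in zip(values, widths))

# Four lines without data, then the header, then the data rows
text = "\n".join([
    "Hypothetical gauge report",
    "line 2 without data",
    "line 3 without data",
    "line 4 without data",
    fixed_row(["Date", "Q", "Flag"]),
    fixed_row(["2017-06-01", 12.5, "A"]),
    fixed_row(["2017-06-02", 13.1, "B"]),
])

# Skip the four leading lines; the next line is read as the header
df = pd.read_fwf(io.StringIO(text), widths=widths, skiprows=4,
                 parse_dates=["Date"])
print(df)
print(df.dtypes)
```

Printing df.dtypes is a quick way to confirm that the date column was actually converted to datetime.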
Wednesday, June 21, 2017
Exponential curve fit in numpy
With the numpy function "polyfit" we can easily fit different kinds of curves, not only polynomial curves.
According to the user manual, numpy.polyfit performs:
"
Least squares polynomial fit.
Fit a polynomial p(x) = p[0] * x**deg + ... + p[deg] of degree deg to points (x, y). Returns a vector of coefficients p that minimises the squared error.
"
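An exponential curve y = a*exp(b*X) becomes a straight line after taking logarithms: log(y) = log(a) + b*X. So, if X and y are arrays with our data (with all y values positive), the code:
coef = np.polyfit(X, np.log(y), 1)
will return two coefficients, which compose the equation:
exp(coef[1])*exp(coef[0]*X)
This gives the exponential curve that best fits our data, X and y, in the least-squares sense on log(y).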
The polyfit function also accepts weights, which we can use to give less importance to very small values, for example. We can use a weight function as follows:
coef = np.polyfit(X, np.log(y), 1, w=np.sqrt(y))
This gives more weight to higher values.
To retrieve the R-squared value of our exponential curve, we can use r2_score from scikit-learn, as follows:
from sklearn.metrics import r2_score
y_pred = np.exp(coef[1])*np.exp(coef[0]*X)
r2s = r2_score(y, y_pred)
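The whole procedure can be checked on data where the answer is known. The sketch below is only an illustration: it builds hypothetical data from y = 2.0*exp(0.5*X) with a little multiplicative noise (the values 2.0, 0.5, the noise level and the seed are all assumptions), fits it with and without weights, and computes R-squared by hand as 1 - SS_res/SS_tot, which is the same definition r2_score uses for a single unweighted output.

```python
import numpy as np

# Hypothetical data: y = 2.0 * exp(0.5 * X) with 5% multiplicative noise
rng = np.random.default_rng(0)
X = np.linspace(0.0, 4.0, 20)
y = 2.0 * np.exp(0.5 * X) * np.exp(rng.normal(0.0, 0.05, X.size))

# Straight-line fit on log(y): slope is b, intercept is log(a)
coef = np.polyfit(X, np.log(y), 1)
a, b = np.exp(coef[1]), coef[0]

# Same fit, giving more weight to the higher values
coef_w = np.polyfit(X, np.log(y), 1, w=np.sqrt(y))
a_w, b_w = np.exp(coef_w[1]), coef_w[0]

# R-squared of the unweighted curve, computed by hand
y_pred = a * np.exp(b * X)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(a, b)
print(a_w, b_w)
print(r2)
```

Both fits should recover values close to a = 2.0 and b = 0.5. The weighted version mainly matters when small y values are noisy, because the logarithm magnifies their errors.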
Wednesday, June 7, 2017
Python and Pandas - How to plot Multiple Curves with 5 Lines of Code
In this post I will show how to use pandas to make a minimalist but pretty line chart, with as many curves as we want.
In this case I will use an I-D-F precipitation table, with rows corresponding to return periods (years) and columns corresponding to durations in minutes.
For the code to work properly, the table must have headers in both the columns and the rows, and the first cell must be blank. Select the table in your spreadsheet editor and copy it to the clipboard.
Then, run the following code:
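import pandas as pd
# Read the table copied from the spreadsheet
table = pd.read_clipboard()
# Transpose so durations become the index, and convert every column to numbers
# (convert_objects was removed from pandas; pd.to_numeric replaces it)
tabTr = table.transpose().apply(pd.to_numeric, errors='coerce')
# Use the durations as a numeric x axis and plot one curve per return period
eixox = tabTr.index.values.astype(float)
tabTr.set_index(eixox).plot(grid=True)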
And voilà!
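If no spreadsheet is at hand, the same steps can be tried on a table built directly in code. This is a sketch with invented I-D-F values (they are not real rainfall data): rows are return periods in years, and the column labels are durations in minutes stored as text, as they would typically arrive from a copied header row.

```python
import pandas as pd

# Hypothetical I-D-F intensities, invented for illustration only.
# Rows: return periods (years). Columns: durations (minutes), kept as text.
table = pd.DataFrame(
    {
        "5": [120.0, 150.0, 180.0],
        "10": [95.0, 118.0, 140.0],
        "30": [55.0, 70.0, 84.0],
    },
    index=[2, 10, 50],
)

# Transpose: durations become the index, return periods become the columns
tabTr = table.transpose().apply(pd.to_numeric, errors="coerce")

# Convert the text durations to numbers so the x axis is numeric
tabTr.index = tabTr.index.astype(float)

print(tabTr)
```

Calling tabTr.plot(grid=True) on the result draws one curve per return period, with duration on the x axis.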
Friday, June 2, 2017
What is PANDAS? - Pandas in Hydrology
As stated in Wikipedia:
"...
pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. Pandas is free software released under the three-clause BSD license. The name is derived from the term "panel data", an econometrics term for multidimensional structured data sets...
"
Pandas is a library that can easily deal with datasets and, together with numpy and scipy, can solve a great number of hydrology and hydraulics problems.
Pandas can easily read text/csv files, and can categorize and operate on their data with a few lines of code.
First, we always have to import the pandas library with:
import pandas as pd
To read a CSV time series of daily precipitation data, we can write:
dataSeries = pd.read_csv('csvfile.csv', index_col=0, parse_dates=True)
This assumes the index column is the first one and that it holds dates in a standard format.
To get the average and standard deviation, just write:
m1,d1 = dataSeries.mean(), dataSeries.std()
And to make a quick, good-looking histogram of this data, just write:
dataSeries.hist()
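The steps above can be tried end to end without a file on disk. The sketch below reads a tiny, made-up CSV from memory (the column name P and the four values are assumptions for illustration) and computes the same statistics for that one column.

```python
import io
import pandas as pd

# Hypothetical daily precipitation data (mm), invented for illustration
csv_text = """date,P
2017-01-01,0.0
2017-01-02,12.4
2017-01-03,3.6
2017-01-04,0.0
"""

# The first column becomes the index and is parsed as dates
dataSeries = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# Mean and sample standard deviation of the precipitation column
m1, d1 = dataSeries["P"].mean(), dataSeries["P"].std()

print(dataSeries.index)
print(m1, d1)
```

Note that std() in pandas returns the sample standard deviation (it divides by n - 1), which differs from the default of numpy.std.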
Pandas documentation is available at: http://pandas.pydata.org/pandas-docs/stable/install.html
Happy analyzing!