The pandas commands below retrieve the maximum, minimum and average monthly precipitation from daily precipitation data.
The daily precipitation is assumed to be in a pandas DataFrame (df1) indexed by a DatetimeIndex.
1 - Daily to monthly precipitation
df_m=df1.resample('M').sum()
2 - Maximum monthly precipitation
p_max=df_m.groupby(df_m.index.month).max()
3 - Minimum monthly precipitation
p_min=df_m.groupby(df_m.index.month).min()
4 - Average monthly precipitation
p_avg=df_m.groupby(df_m.index.month).mean()
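The four steps above can be tried end to end on synthetic data. The following is a minimal sketch, not the author's dataset: the column name precip_mm and the constant daily values are assumptions chosen so the results are easy to check by hand. It uses the 'MS' (month start) alias instead of 'M', since recent pandas versions rename the month-end alias 'M' to 'ME'; grouping by calendar month is unaffected by this choice.

```python
import numpy as np
import pandas as pd

# Synthetic daily series (assumption): 1.0 mm/day in 2000, 2.0 mm/day in 2001
idx = pd.date_range("2000-01-01", "2001-12-31", freq="D")
values = np.where(idx.year == 2000, 1.0, 2.0)
df1 = pd.DataFrame({"precip_mm": values}, index=idx)

# 1 - Daily to monthly totals (one row per month of each year)
df_m = df1.resample("MS").sum()

# 2, 3, 4 - Statistics per calendar month (1..12) across all years
p_max = df_m.groupby(df_m.index.month).max()
p_min = df_m.groupby(df_m.index.month).min()
p_avg = df_m.groupby(df_m.index.month).mean()

# January: 31 mm in 2000 and 62 mm in 2001
print(p_max.loc[1, "precip_mm"], p_min.loc[1, "precip_mm"], p_avg.loc[1, "precip_mm"])
```

Each result has twelve rows, indexed by month number, so p_max.loc[1] is the wettest January in the record.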
Python programming, with examples in hydraulic engineering and in hydrology.
Showing posts with label timeseries. Show all posts
Tuesday, August 21, 2018
Sunday, November 26, 2017
Pandas: sum one column's values according to another column's values
A one-liner that sums the values in a DataFrame's second column for each unique value in the first column.
df2 = df1.groupby(df1.columns[0])[df1.columns[1]].sum().reset_index()
For example, applied to a table listing pipe diameters and lengths, the command returns the total length for each unique diameter.
This is similar to summing with a pivot table in Excel.
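A small illustration of the pipe example follows. The column names diameter_mm and length_m and the values are made up for this sketch; the one-liner itself is unchanged.

```python
import pandas as pd

# Hypothetical pipe table: first column is the diameter, second the length
df1 = pd.DataFrame({
    "diameter_mm": [100, 150, 100, 200, 150],
    "length_m": [12.0, 30.0, 8.0, 25.0, 10.0],
})

# Total of the second column for each unique value of the first column
df2 = df1.groupby(df1.columns[0])[df1.columns[1]].sum().reset_index()
print(df2)

# The same totals via pandas' own pivot table, for comparison
pivot = df1.pivot_table(index="diameter_mm", values="length_m", aggfunc="sum")
```

Here df2 has one row per diameter (100, 150, 200) with total lengths of 20.0, 40.0 and 25.0.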
Labels: column, one liner, pandas, pivot table, python, series, timeseries
Tuesday, July 25, 2017
Make numpy array of 'datetime' between two dates
A simple way to create an array of dates (time series), between two dates:
We can use the numpy.arange function (https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html), which is most often used to create arrays from start / stop / step arguments.
Syntax:
numpy.arange([start, ]stop, [step, ]dtype=None)
For datetime values, we need to specify the step value and, in the dtype argument, the datetime type together with the unit of the timestep:
. dtype='datetime64[m]' sets the timestep unit to minutes;
. dtype='datetime64[h]' sets the timestep unit to hours;
. dtype='datetime64[D]' sets the timestep unit to days;
. dtype='datetime64[M]' sets the timestep unit to months;
. dtype='datetime64[Y]' sets the timestep unit to years.
For example:
import numpy as np
# 15 is the timestep value; dtype='datetime64[m]' means the step is in minutes
dates = np.arange('2017-06-01', '2017-06-02', 15, dtype='datetime64[m]')
This example creates an array of 96 values, from 2017-06-01 00:00 to 2017-06-01 23:45 (the stop value, 2017-06-02, is excluded), with a time step of 15 minutes.
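As a variation, the same array can be built from explicit numpy.datetime64 and numpy.timedelta64 objects, which makes the unit of the step visible in the code. This is a sketch of an alternative spelling, not a replacement for the call above; the variable names are arbitrary.

```python
import numpy as np

# Explicit start, stop and step objects; the step carries its own unit
start = np.datetime64("2017-06-01T00:00")
stop = np.datetime64("2017-06-02T00:00")
step = np.timedelta64(15, "m")  # 15 minutes

dates = np.arange(start, stop, step)
print(len(dates), dates[0], dates[-1])

# Daily values for June 2017: with the default step, one value per day
days = np.arange("2017-06-01", "2017-07-01", dtype="datetime64[D]")
print(len(days))
```

Because the stop value is excluded, the last 15-minute value is 23:45 on 1 June, and the daily array ends on 30 June.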
Wednesday, March 15, 2017
Numpy - Accumulated and Incremental series
In hydrology, we constantly deal with time series of variables, such as flow or precipitation series, where the variable may be incremental or accumulated.
NumPy offers a simple way to convert between accumulated and incremental series.
To accumulate an incremental series, use:
numpy.cumsum(incrementalSeries)
And to transform an accumulated array into an incremental one, use:
numpy.diff(accumulatedSeries)
Note that numpy.diff returns an array one element shorter than its input, since it computes the differences between consecutive values.
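A short round trip shows how the two functions relate. The precipitation values below are invented for illustration. Since numpy.diff only returns differences between consecutive elements, the first increment is lost unless a starting value is supplied; the prepend argument is one way to do that.

```python
import numpy as np

# Hypothetical incremental precipitation (mm per time step)
incremental = np.array([0.0, 1.5, 0.5, 2.0, 0.0])

# Incremental -> accumulated
accumulated = np.cumsum(incremental)

# Accumulated -> incremental: one element shorter than the input
partial = np.diff(accumulated)

# Prepending 0.0 recovers the full incremental series
recovered = np.diff(accumulated, prepend=0.0)

print(accumulated)
print(partial)
print(recovered)
```

With prepend=0.0 the accumulation is assumed to start from zero, which holds for any series produced by numpy.cumsum.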
Thursday, April 14, 2016
Time-Series in Python
Dealing with timeseries is a very common task in Hydrology.
One way to process time series in Python is to use a simple list.
For example, we can have a list of lists like this:
series1 = [ ['01/01/1900',0.0],['01/02/1900',0.1],['01/03/1900',0.3],['01/04/1900',0.4],['01/05/1900',2.2]...]
In this case, each inner list, such as ['01/01/1900',0.0], holds a string representing the date and a float representing the value.
To compute properly with dates, including sorting and grouping, the string must first be parsed into a datetime object.
import datetime
for i in series1:
    i[0] = datetime.datetime.strptime(i[0], '%m/%d/%Y')
datetime objects can be compared, which makes it possible to sort the list by date, for example:
series1.sort(key=lambda x: x[0])
And we can compute sums or averages for specific months or years:
# e.g. list of the items from year 1900
lst1900 = [item for item in series1 if item[0].year == 1900]
# sum of 1900's values
sum1900 = sum(item[1] for item in lst1900)
# average of 1900's values
avg1900 = sum1900 / float(len(lst1900))
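The steps above can be combined into one runnable sketch. The four data points are invented and deliberately out of order; the grouping by (year, month) with a plain dictionary is one possible way to get monthly totals, not the only one.

```python
from datetime import datetime

# Hypothetical unsorted series: [date string as month/day/year, value]
series1 = [
    ["02/15/1900", 1.0],
    ["01/01/1900", 0.5],
    ["01/20/1900", 1.5],
    ["03/10/1901", 4.0],
]

# Parse the date strings into datetime objects
for item in series1:
    item[0] = datetime.strptime(item[0], "%m/%d/%Y")

# Sort chronologically
series1.sort(key=lambda x: x[0])

# Sum and average of the values from year 1900
values_1900 = [value for date, value in series1 if date.year == 1900]
sum1900 = sum(values_1900)
avg1900 = sum1900 / len(values_1900)

# Totals grouped by (year, month)
monthly_totals = {}
for date, value in series1:
    key = (date.year, date.month)
    monthly_totals[key] = monthly_totals.get(key, 0.0) + value

print(sum1900, avg1900, monthly_totals)
```

For longer records, the pandas resample and groupby commands shown in the other posts do the same grouping with less code.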