Showing posts with label timeseries. Show all posts

Tuesday, August 21, 2018

Maximum, minimum and average monthly precipitation

Below are some pandas commands for computing the maximum, minimum and average monthly precipitation for each calendar month (January to December) from daily precipitation data.

The daily precipitation is assumed to be in a pandas DataFrame (df1) indexed by a DatetimeIndex.

1 - Daily to monthly precipitation
df_m=df1.resample('M').sum()

2 - Maximum monthly precipitation
p_max=df_m.groupby(df_m.index.month).max()

3 - Minimum monthly precipitation
p_min=df_m.groupby(df_m.index.month).min()

4 - Average monthly precipitation
p_avg=df_m.groupby(df_m.index.month).mean()
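The steps above can be tried end to end with the sketch below. It assumes a hypothetical DataFrame with a single 'precip' column of synthetic daily values in mm (random numbers, not real data). It uses the 'MS' (month start) alias instead of 'M', because recent pandas versions deprecate 'M' in favour of 'ME'; grouping by index.month gives the same result either way. The final agg call, which collects all three statistics in one table, is an optional extra rather than part of the original commands.

```python
import numpy as np
import pandas as pd

# Hypothetical daily precipitation (mm) for three full years, fixed seed
idx = pd.date_range('2000-01-01', '2002-12-31', freq='D')
rng = np.random.default_rng(42)
df1 = pd.DataFrame({'precip': rng.gamma(shape=0.5, scale=8.0, size=len(idx))},
                   index=idx)

# 1 - daily to monthly totals ('MS' labels each month by its first day)
df_m = df1.resample('MS').sum()

# 2-4 - statistics of the monthly totals per calendar month (1 = January)
p_max = df_m.groupby(df_m.index.month).max()
p_min = df_m.groupby(df_m.index.month).min()
p_avg = df_m.groupby(df_m.index.month).mean()

# Optional: the three statistics side by side, one row per calendar month
summary = df_m['precip'].groupby(df_m.index.month).agg(['max', 'min', 'mean'])
print(summary.loc[1])  # wettest, driest and average January total
```

Each row of p_max, p_min and p_avg describes one calendar month across all years of record, so with 36 monthly totals every row summarises three values.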

Sunday, November 26, 2017

Pandas sum column values according to another column's value


A one-liner to sum the values of a DataFrame's second column for each unique value in its first column:

df2 = df1.groupby(df1.columns[0])[df1.columns[1]].sum().reset_index()

For example, applied to a table listing pipe diameters and lengths, the command returns the total length for each unique diameter.

This is similar to a sum in an Excel pivot table.
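A small sketch of the pipe example, assuming a hypothetical table with a 'diameter' column (mm) first and a 'length' column (m) second. The pivot_table call is shown only as the pandas counterpart of the Excel pivot-table sum; it is an alternative, not part of the original one-liner.

```python
import pandas as pd

# Hypothetical pipe inventory: diameter (mm) and length (m) of each segment
df1 = pd.DataFrame({'diameter': [100, 150, 100, 200, 150],
                    'length':   [12.5, 30.0, 7.5, 40.0, 10.0]})

# The one-liner: total length per unique diameter (keys come out sorted)
df2 = df1.groupby(df1.columns[0])[df1.columns[1]].sum().reset_index()

# Pivot-table style equivalent, closer to what Excel does
pv = pd.pivot_table(df1, index='diameter', values='length',
                    aggfunc='sum').reset_index()

print(df2)
```

Both give 20.0 m for 100 mm, 40.0 m for 150 mm and 40.0 m for 200 mm pipes.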

Tuesday, July 25, 2017

Make a numpy array of datetimes between two dates

A simple way to create an array of dates (a time series) between two dates:

We can use the numpy arange function (https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html), which is mostly used to create arrays from start / stop / step arguments.

Syntax:
numpy.arange([start, ]stop, [step, ]dtype=None)

For datetime values, we give the start and stop dates as strings, the step as a whole number of time units, and the datetime type and unit of the step in the dtype argument:

. dtype='datetime64[m]' will set the timestep unit to minutes;
. dtype='datetime64[h]' will set the timestep unit to hours;
. dtype='datetime64[D]' will set the timestep unit to days;
. dtype='datetime64[M]' will set the timestep unit to months;
. dtype='datetime64[Y]' will set the timestep unit to years;

For example:

import numpy as np
dates = np.arange('2017-06-01', '2017-06-02', 15, dtype='datetime64[m]')  # 15 is the step value; dtype='datetime64[m]' means the step is in minutes


This example creates an array of 96 values with a time step of 15 minutes, starting at 01jun2017 00:00 and ending at 01jun2017 23:45, because arange excludes the stop value (02jun2017).
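The sketch below tries a few of the units from the list above. The variable names are illustrative only; every call is plain np.arange with a string start/stop and a datetime64 dtype, and in each case the stop value is excluded.

```python
import numpy as np

# 15-minute steps over one day (same call as in the post)
quarter_hours = np.arange('2017-06-01', '2017-06-02', 15, dtype='datetime64[m]')
print(len(quarter_hours), quarter_hours[0], quarter_hours[-1])

# 6-hour steps over one day: 00:00, 06:00, 12:00, 18:00
six_hourly = np.arange('2017-06-01', '2017-06-02', 6, dtype='datetime64[h]')

# Daily steps over June 2017 (step defaults to 1 day)
days = np.arange('2017-06-01', '2017-07-01', dtype='datetime64[D]')

# Monthly steps over 2017 (January to December)
months = np.arange('2017-01', '2018-01', dtype='datetime64[M]')

# Consecutive values differ by exactly one step
step = quarter_hours[1] - quarter_hours[0]
```

To include the end date itself, move the stop value one step past it (for example '2017-06-02T00:15' for the 15-minute series).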

Wednesday, March 15, 2017

Numpy - Accumulated and Incremental series

In hydrology we constantly deal with time series of variables such as flow or precipitation, stored either as incremental values or as accumulated (cumulative) values.

NumPy makes it easy to convert between accumulated and incremental series.

To accumulate an incremental series, use the function

   numpy.cumsum(incrementalSeries)

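And to transform an accumulated array back into an incremental one, use the function below. Note that it returns one element fewer than its input, because the first value has no predecessor to subtract: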

    numpy.diff(accumulatedSeries)
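A short round trip between the two forms, using a hypothetical incremental precipitation series (mm per time step). The prepend argument of numpy.diff (available in NumPy 1.16 and later) adds a 0 before differencing, so the first increment is kept and the original series is recovered in full.

```python
import numpy as np

# Hypothetical incremental precipitation, mm per time step
incremental = np.array([0.0, 1.5, 3.25, 0.0, 2.0])

# Incremental -> accumulated (running total)
accumulated = np.cumsum(incremental)

# Accumulated -> incremental: plain diff drops the first increment
back = np.diff(accumulated)

# Prepending 0 keeps the series length and recovers every increment
back_full = np.diff(accumulated, prepend=0.0)
```

If the accumulated series does not start from zero (for example a gauge reading carried over from an earlier period), prepend that starting value instead of 0.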

Thursday, April 14, 2016

Time-Series in Python

Dealing with timeseries is a very common task in Hydrology.

One way to process time series in Python is with plain lists.

For example, we can have a list of lists like this:

series1 = [ ['01/01/1900',0.0],['01/02/1900',0.1],['01/03/1900',0.3],['01/04/1900',0.4],['01/05/1900',2.2]...]

Each inner list, such as ['01/01/1900',0.0], holds a string representing the date and a float representing the value.

To compute with dates properly, including sorting and grouping, the strings must be parsed into datetime objects:

import datetime

# parse each date string (month/day/year) into a datetime object
for i in series1:
    i[0] = datetime.datetime.strptime(i[0], '%m/%d/%Y')


datetime objects can be compared with each other, so the list can be sorted by date, for example:

series1.sort(key=lambda x: x[0])

We can also compute sums or averages for specific months or years:

# e.g. list of the entries from year 1900
lst1900 = [item for item in series1 if item[0].year == 1900]

# sum of the 1900 values
sum1900 = sum(item[1] for item in lst1900)

# average of the 1900 values
avg1900 = sum1900 / float(len(lst1900))
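The same list-based approach can group by month or by year for all periods at once. This sketch takes the five entries shown in the post and adds two hypothetical ones so that more than one month and year appear; the grouping with collections.defaultdict is an illustrative choice, not the only way to do it.

```python
import datetime
from collections import defaultdict

# First five entries from the post; the last two are hypothetical
series1 = [['01/01/1900', 0.0], ['01/02/1900', 0.1], ['01/03/1900', 0.3],
           ['01/04/1900', 0.4], ['01/05/1900', 2.2],
           ['02/01/1900', 1.0], ['01/01/1901', 0.5]]

# Parse the date strings (month/day/year) into datetime objects
for item in series1:
    item[0] = datetime.datetime.strptime(item[0], '%m/%d/%Y')

series1.sort(key=lambda x: x[0])

# Total for every (year, month) pair
monthly_totals = defaultdict(float)
for date, value in series1:
    monthly_totals[(date.year, date.month)] += value

# Average for every year
yearly_values = defaultdict(list)
for date, value in series1:
    yearly_values[date.year].append(value)
yearly_avg = {year: sum(v) / len(v) for year, v in yearly_values.items()}

print(dict(monthly_totals))
print(yearly_avg)
```

Because the values are floats, totals such as January 1900 (0.0 + 0.1 + 0.3 + 0.4 + 2.2) may differ from 3.0 in the last decimal places.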