Python in Hydrology and Hydraulics: datetime

Showing posts with label datetime. Show all posts

Thursday, August 11, 2022

Pandas- Creating a blank dataframe between two dates

import pandas as pd

dtmin,dtmax=pd.Timestamp("2022-08-10 00:00"),pd.Timestamp("2022-08-11 00:00")

df00=pd.DataFrame(index=pd.date_range(dtmin,dtmax,freq='10min'),columns=['value'])

Frequency can be: '1d', '10min', '30min', among others in the right format.

Tuesday, May 28, 2019

Find common timespan/ years in multiple time series/ dataframes

#concatenate vertically all dataframes
dfAlldf=pd.concat([df1, df2,df3,df4], axis=1)

#sort them by the date (datetimeindex)

dfAlldf=dfAlldf.sort_index()

#group dataframe by years

grps=dfAlldf.groupby(dfAlldf.index.year)

#empty dataframe for populating with complete years

dfCompl=pd.DataFrame()

# for each group of years

for g in grps:

#if don't have any null values in year, in any column

if not any(g[1].isnull().any(axis=1)):

#concatenate in dfCompl

dfCompl=pd.concat([dfCompl, g[1]], axis=0)

# re-sort by index

dfCompl=dfCompl.sort_index()

Thursday, April 25, 2019

Pandas - Reading headers and dates correctly from Clipboard/ CSV

When using pandas funcions read_clipboard() or read_csv() you have to define if your data has headers (column headers) and indexes (row headers).

If you're passing indexes with datetime format, make sure if it will be parsed correctly, indicating it's a datetime and if it has dayfirst format (dd/mm/YYYY).

For example:

pd.read_clipboard(index_col=0, headers=None,parse_dates=True, dayfirst=True)

Is telling pandas that the table in clipboard has no column headers, but have index (row headers) in the first column and it is in datetime format with day first (dd/mm/YYYY).

Tuesday, August 21, 2018

Maximum, minimum and average monthly precipitation

Below is shown some panda commands for retrieving maximum, minimum and average monthly precipitation from daily precipitation data.

The daily precipitation is assumed to be in a pandas DataFrame, with its index in Datetime index format.

1 - Daily to monthly precipitation
df_m=df1.resample('M').sum()

2 - Maximum monthly precipitation
p_max=df_m.groupby(df_m.index.month).max()

3 - Minimum monthly precipitation
p_min=df_m.groupby(df_m.index.month).min()

4 - Average monthly precipitation
p_avg=df_m.groupby(df_m.index.month).mean()

Thursday, February 8, 2018

Daily precipitation data to monthly and yearly

With pandas, we can group datetime values according to datetime units, as months and years.

If we have a pandas dataframe with 2 columns - DATE and VALUE

Certify that the column DATE is recognized as datetime64 format - you can use, for example:

df1['DATE'] = pd.to_datetime(df1['DATE'])

and then you can group and make operations on this group.

Getting the average of total monthly precipitation:

monthly = df1.groupby(df1['DATE'].dt.month).sum() / len(pd.unique(df1['DATE'].dt.year))

(groups by month sum and then divide by the number of years)

Getting the average yearly precipitation:

PAnual = df1.groupby(df1[0].dt.year).sum().mean()

Tuesday, July 25, 2017

Make numpy array of 'datetime' between two dates

A simple way to create an array of dates (time series), between two dates:

We can use the numpy arange - https://docs.scipy.org/doc/numpy/reference/generated/numpy.arange.html , function which is most used to create arrays using start / stop / step arguments.

Syntax:
numpy.arange([start, ]stop, [step, ]dtype=None)

In case of datetime values, we need to specify the step value, and the correct type and unit of the timestep in the dtype argument

. dtype='datetime64[m]' will set the timestep unit to minutes;
. dtype='datetime64[h]' will set the timestep unit to hours;
. dtype='datetime64[D]' will set the timestep unit to days;
. dtype='datetime64[M]' will set the timestep unit to months;
. dtype='datetime64[Y]' will set the timestep unit to months;

For example:

import numpy as np

dates = np.arange('2017-06-01', '2017-06-02', 15, dtype='datetime64[m]')
# 15 is the timestep value, dtype='datetime64[m] means that the step is datetime minutes

This example will create an array of 96 values, between 01jun2017 and 02jun2017, with a time step of 15 minutes.

Thursday, April 14, 2016

Time-Series in Python

Dealing with timeseries is a very common task in Hydrology.

One of the possibilities to process timeseries in python is to use a simple list.

For example, we can have a list of lists like this:

series1 = [ ['01/01/1900',0.0],['01/02/1900',0.1],['01/03/1900',0.3],['01/04/1900',0.4],['01/05/1900',2.2]...]

In this case, the ['01/01/1900',0.0] is composed of lists with a string representing the date, and a float number representing a value.

To properly make computations with dates, including sorting and grouping, it is necessary to interpret the string as a datetime format.

import datetime



for i in series1:

    i[0]=datetime.datetime.strptime(i[0], '%m/%d/%Y')

datetime objects accepts being sorted, making possible to sort the list based on the date, for example:

series1.sort(key=lambda x: x[0])

And we can make sums or averages based on specific months or years:

#eg. List of year 1900

lst1900 = [item for item in series1 if item[0].year==1900]



#Sum of 1900's values:

sum1900 = sum[item[1] for item in series1 if item[0].year==1900]



# avg of 1900's values

avg1900 = sum1900 / float(len(lst1900))