Python in Hydrology and Hydraulics: DataFrame

Showing posts with label DataFrame. Show all posts

Thursday, August 11, 2022

Pandas- Creating a blank dataframe between two dates

import pandas as pd

dtmin,dtmax=pd.Timestamp("2022-08-10 00:00"),pd.Timestamp("2022-08-11 00:00")

df00=pd.DataFrame(index=pd.date_range(dtmin,dtmax,freq='10min'),columns=['value'])

Frequency can be: '1d', '10min', '30min', among others in the right format.

Monday, March 15, 2021

Return values that meet some criteria based in other columns - Pandas

Some simple tasks are much faster and simpler in pandas than in Excel.

For example: return values that meet some criteria based in other columns.

If we want to list all the basins with area greater than 100ha, a simple code will do.

Assuming we had copied this table from excel:

import pandas as pd

df0=pd.read_clipboard()

df0[df0.iloc[:,-1]>100].to_clipboard()

This code will put the result in the clipboard, to paste back into Excel (for ex.).

Monday, October 21, 2019

Delete row/ column in DataFrame with Null/ NaN values

if df0 is a Pandas DataFrame with null Values:

df0=df0.dropna(axis=0,how='all') - will remove rows that have only 'NaN' values

df0=df0.dropna(axis=1,how='all') - will remove columns that have only 'NaN' values

Thursday, June 13, 2019

Find Maximum Values by year in timeseries dataframe, keeping the date

df2=df1.ix[df1.groupby(df1.index.year).idxmax().iloc[:,0]]

Tuesday, May 28, 2019

Find common timespan/ years in multiple time series/ dataframes

#concatenate vertically all dataframes
dfAlldf=pd.concat([df1, df2,df3,df4], axis=1)

#sort them by the date (datetimeindex)

dfAlldf=dfAlldf.sort_index()

#group dataframe by years

grps=dfAlldf.groupby(dfAlldf.index.year)

#empty dataframe for populating with complete years

dfCompl=pd.DataFrame()

# for each group of years

for g in grps:

#if don't have any null values in year, in any column

if not any(g[1].isnull().any(axis=1)):

#concatenate in dfCompl

dfCompl=pd.concat([dfCompl, g[1]], axis=0)

# re-sort by index

dfCompl=dfCompl.sort_index()

Wednesday, January 23, 2019

Interpolate missing values in pandas DataFrame

If we have a dataframe with dates and flows - with missing values, as example below:

0
2019-01-31 50.208308
2019-02-28 50.623457
2019-03-31 56.203933
2019-04-30 NaN
2019-05-31 NaN
2019-06-30 117.727655
2019-07-31 62.273259
2019-08-31 49.054898
2019-09-30 55.612575
2019-10-31 54.187409

We can use the function pandas interpolate, and interpolate the data with different methods

dfIn.interpolate() - will fill noData with linear interpolation;
dfIn.interpolate(method='polynomial', order=3) - will fill noData with 3rd degree polinomial interpolation;

Result:
linear polinomial original
2019-01-31 50.208308 50.208308 50.208308
2019-02-28 50.623457 50.623457 50.623457
2019-03-31 56.203933 56.203933 56.203933
2019-04-30 76.711840 89.513986 NaN
2019-05-31 97.219748 124.233259 NaN
2019-06-30 117.727655 117.727655 117.727655
2019-07-31 62.273259 62.273259 62.273259
2019-08-31 49.054898 49.054898 49.054898
2019-09-30 55.612575 55.612575 55.612575
2019-10-31 54.187409 54.187409 54.187409

Wednesday, November 14, 2018

Pandas - Select only numeric columns

df=df.select_dtypes(include=['float64'])

will return only the float values of the 'df' dataframe.

Tuesday, August 21, 2018

Maximum, minimum and average monthly precipitation

Below is shown some panda commands for retrieving maximum, minimum and average monthly precipitation from daily precipitation data.

The daily precipitation is assumed to be in a pandas DataFrame, with its index in Datetime index format.

1 - Daily to monthly precipitation
df_m=df1.resample('M').sum()

2 - Maximum monthly precipitation
p_max=df_m.groupby(df_m.index.month).max()

3 - Minimum monthly precipitation
p_min=df_m.groupby(df_m.index.month).min()

4 - Average monthly precipitation
p_avg=df_m.groupby(df_m.index.month).mean()

Saturday, June 24, 2017

Pandas - How to read text files delimited with fixed widths

With Python Pandas library it is possible to easily read fixed width text files, for example:

In this case, the text file has its first 4 lines without data and the 5th line with the header. The header and the data are delimeted with fixed char widths, being the widths sizes as following:

12 spaces , 10 spaces ,6 spaces ,9 spaces ,7 spaces,7 spaces ,7 spaces ,4 spaces

The following code will read the file as a pandas DataFrame, and also parse the dates in the datetime format:

import pandas as pd

ds2 = pd.read_fwf('yourtextfile.txt', widths=[12,10,6,9,7,7,7,4], skiprows=4, parse_dates=True)

Friday, June 2, 2017

What is PANDAS? - Pandas in Hydrology

As stated in the Wikipedia:
"...

pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. Pandas is free software released under the three-clause BSD license.[2] The name is derived from the term "panel data", an econometrics term for multidimensional structured data sets...."Pandas is a library that can easily deal with datasets, and together with numpy and scipy, can solve a great number of hydrology and hydraulics problems.
"

Pandas can easily read text/csv files, and can categorize and make operations on its data with few lines of code.

First, we have always to import pandas library with:

import pandas as pd

To read a csv timeseries of precipitation daily data, we can write:

dataSeries = pd.read_csv('csvfile.csv', index_col=0, parse_dates=True)

if the index column is the first one, and it have dates in standard format.

To get average and standard deviation, just write:

m1,d1 = serY.mean(), serY.std()

And to make an easy and beautiful histogram of this data, just write:

dataSeries.hist()

Pandas documentation is available on the site:http://pandas.pydata.org/pandas-docs/stable/install.html

Happy analyzing!