Wednesday, November 14, 2018

Pandas - Select only numeric columns

df=df.select_dtypes(include=['float64'])

will return only the float values of the 'df' dataframe.

Tuesday, September 4, 2018

Excel to Python Pandas DataFrame

The most easy way is to use the clipboard. Just copy your table in Excel to the clipboard, and then call the read_clipboard method from pandas:

df1 = pd.read_clipboard()

Pandas can read Excel files directly too, with read_excel

df1 = pd.read_excel('tmp.xlsx',index_col=0, header=0)

Taking care with header and indexes, and other options as described here.

Tuesday, August 21, 2018

Maximum, minimum and average monthly precipitation

Below is shown some panda commands for retrieving maximum, minimum and average monthly precipitation from daily precipitation data.

The daily precipitation is assumed to be in a pandas DataFrame, with its index in Datetime index format.

1 - Daily to monthly precipitation
df_m=df1.resample('M').sum()

2 - Maximum monthly precipitation
p_max=df_m.groupby(df_m.index.month).max()

3 - Minimum monthly precipitation
p_min=df_m.groupby(df_m.index.month).min()

4 - Average monthly precipitation
p_avg=df_m.groupby(df_m.index.month).mean()

Tuesday, July 24, 2018

How to extract all numbers / floats from a string with python regex

import re

stringWithNumbers='anystring2with 6.89 numbers3.55 3.141312'
digits = re.findall("[-+]?\d+\.?\d*", stringWithNumbers)

Tuesday, July 10, 2018

Fitting IDF curves with Scipy and Pandas

# -*- coding: utf-8 -*-
"""
Created on Tue Jul 10 11:19:37 2018

@author: rodrigo.goncalves
"""
import pandas as pd
from scipy.optimize import minimize
 
# This is the IDF function, returning the sum of squared errors (SSE)
def func2(par, res):
    p1 =  (par[0] * res.index.values  **par[1])
    p2 = ((res.columns.values.astype(float)+par[2])**par[3])
    f = pd.DataFrame([p/p2 for p in p1],index=res.index,columns=res.columns)
    erroTotQ=((f-res)**2).sum(axis=1).sum()    
    return erroTotQ
 
# copy your rainfall intensities table from excel
# with column headers with rainfall durations
# and row names with Return period value in years
dfInt=pd.read_clipboard()

#initial guess
param1 = [5000, 0.1, 10, 0.9]

res2 = minimize(func2, param1, args=(dfInt,), method='Nelder-Mead')
print(res2)
cs=['K=','a=','b=','c=']
dfResult=pd.DataFrame(res2.x,index=cs).transpose()
print(dfResult)
dfResult.to_clipboard(index=None)

Friday, June 8, 2018

CSV file to Python list of lists

import csv

with open('file.csv', 'r') as f1:
    reader = csv.reader(f1)
    your_list = list(reader)

Wednesday, May 30, 2018

Pandas - Operations between rows - distance between 2 points

If we have a table with a column with xy coordinates, for example:



We can get the difference between consecutive rows by using  Pandas SHIFT function on columns.
".shift(-1)" will roll the rows 1 position backwards, and ".shift(1)" or simply ".shift()" will roll down your column by 1 position of the rows.

In our example, df1['x'].shift() will return:

0              NaN
1    455395.996360
2    527627.076641
3    536278.269190
4    553932.441097
5    568699.553239
6    569709.130272
7    573016.302437
8    575141.096777
9    580107.934566

if we want to calculate the euclidean distance between consecutive points, we can use the shift associated with numpy functions numpy.sqrt and numpy.power as following:

df1['diff']= np.sqrt(np.power(df1['x'].shift()-df1['x'],2)+
   np.power(df1['y'].shift()-df1['y'],2))

Resulting in:

0              NaN
1     89911.101224
2     21323.016099
3    204394.524574
4     37767.197793
5     46692.771398
6     13246.254235
7      2641.201366
8     15153.187527
9     15853.974422


Wednesday, May 2, 2018

Solve Manning's Equation with Python Scipy library

from scipy.optimize import root

def manningC(d, args):
    Q, w,h,sSlopeL,sSlopeR,nMann,lSlope = args
    #left side slope can be different from right side slope
    area = ((((d*sSlopeL)+(d*sSlopeR)+w)+w)/2)*d
    # wet perimeter
    wPer = w+(d*(sSlopeL*sSlopeL+1)**0.5)+(d*(sSlopeR*sSlopeR+1)**0.5)
    #Hydraulic Radius
    hR = area/ wPer
    # following formula must be zero
    # manipulation of Manning's formula
    mannR = (Q*nMann/lSlope**0.5)-(area*hR**(2.0/3.0))
    return mannR
    
###### MAIN CODE
# the following are input data to our open channel manning calculation
# flow, width, height, left side slope, right side slope, 
# Manning coefficient, longitudinal slope
args0 = [2.5,2,.5,1.0,1.0,.015,.005]
initD = .00001  # initial water depth value

# then we call the root scipy function to the manningC
sol =root(manningC,initD, args=(args0,))    
# print the root found
print(sol.x)

Friday, April 13, 2018

IDF table from equation - Pandas


How to fill a table (pandas DataFrame) with rainfall intensity values, extracted from idf equation:


import pandas as pd

durs = [5,10,15,20,30,60,120,180,360,540,720,1440] # rainfall durations
rps=[5,10,25,50,100] # return periods


t1 = pd.DataFrame(index=rps,columns=durs) # initialize DataFrame with rp's and durations

cf = [942.76,.1242,17.0,.650] #example idf equation coefficients

table = t1.apply(lambda x: (cf[0]*x.index**cf[1])/((x.name+cf[2])**cf[3]))

Wednesday, February 28, 2018

Installing Python in Windows


WinPython is a free open-source portable distribution of the Python programming language for Windows 7/8/10 and scientific and educational usage.


Designed for scientists, data-scientists, and education.


WinPython is a portable application, so the user should not expect any integration into Windows explorer during installation. However, the WinPython Control Panel allows to "register" your distribution to Windows.


Free Download at:

Thursday, February 8, 2018

Daily precipitation data to monthly and yearly

With pandas, we can group datetime values according to datetime units, as months and years.

If we have a pandas dataframe with 2 columns  - DATE and VALUE

Certify that the column DATE is recognized as datetime64 format - you can use, for example:

df1['DATE'] = pd.to_datetime(df1['DATE'])

and then you can group and make operations on this group.

Getting the average of total monthly precipitation:

monthly = df1.groupby(df1['DATE'].dt.month).sum() / len(pd.unique(df1['DATE'].dt.year))

(groups by month sum and then divide by the number of years)

Getting the average yearly precipitation:

PAnual = df1.groupby(df1[0].dt.year).sum().mean()

Friday, January 26, 2018

Accumulating Precipitation Data

If we have a pandas dataframe named df1 with a column '15min' containing 15 minute precipitation data, we can easily accumulate for other durations, using the rolling method as shown in the example below:

df1['01 h'] =df1['15min'].rolling(window=4,center=False).sum()

Note that the window=4 parameter means that it will accumulate 4 lines of 15 minute data, resulting in 1 hour precipitation. This parameter can be changed to whatever duration you want.

Tuesday, January 23, 2018

Polynomial Curve Fitting

The code below shows how easily you can do a Polynomial Curve Fitting with Python and Numpy.

import numpy as np

# sample x and y data - example
x = [7.76,10.11,11.89,14.81,15.49]
y = [1.851,1.971,1.953,1.842,1.805]

# the polyfit functions does the nth degree polynomial best fit on the data, 
# returning the polynomial coefficients

n = 4   # 4th degree polynomial, you can change for whatever degree you want
coefs = np.polyfit(x,y,n)

# The poly1d function applies the polynomial function to our calculated coefficients
polyf = np.poly1d(coefs)

#if we want to apply our polynomial function to a range of x values
xf = np.linspace(0,20)
yf = polyf(xf)