Thursday, August 11, 2022

Pandas - Creating a blank dataframe between two dates

import pandas as pd

dtmin,dtmax=pd.Timestamp("2022-08-10 00:00"),pd.Timestamp("2022-08-11 00:00")

df00=pd.DataFrame(index=pd.date_range(dtmin,dtmax,freq='10min'),columns=['value'])

Frequency can be any valid pandas offset alias, such as '1D', '10min' or '30min' (newer pandas versions prefer the uppercase 'D' for days).
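As a quick check of how the frequency string changes the size of the blank dataframe, the sketch below builds the same index with three frequencies and counts the rows. The helper name blank_frame is only an illustration and is not part of pandas.

```python
import pandas as pd

dtmin, dtmax = pd.Timestamp("2022-08-10 00:00"), pd.Timestamp("2022-08-11 00:00")

def blank_frame(freq):
    # hypothetical helper: empty 'value' column on a regular datetime index
    return pd.DataFrame(index=pd.date_range(dtmin, dtmax, freq=freq), columns=["value"])

# both end points (dtmin and dtmax) are included in the index
rows = {freq: len(blank_frame(freq)) for freq in ["1D", "30min", "10min"]}
print(rows)
```

Every cell in the 'value' column starts as NaN, so the dataframe can be filled later.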

Tuesday, April 20, 2021

Pandas - Reading CSV file with variable number of columns

If the CSV/text file has a different number of columns from row to row, a plain "pd.read_csv(file)" will fail.

Instead, try naming the columns. Pandas will read up to the number of columns supplied.

For example:

df0=pd.read_csv(filename, sep=';',names=['a', 'b', 'c', 'd', 'e'])

It will read only up to 5 columns wide; rows with fewer fields are filled with NaN.
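A minimal sketch of this, using a made-up in-memory text instead of a file (the sample rows and the column names are assumptions for illustration):

```python
from io import StringIO
import pandas as pd

# made-up sample: rows with 2, 3 and 5 fields
text = "1;2\n3;4;5\n6;7;8;9;10\n"

# naming 5 columns lets pandas read the ragged rows
df0 = pd.read_csv(StringIO(text), sep=";", names=["a", "b", "c", "d", "e"])
print(df0)
```

The shorter rows come back with NaN in the columns they do not have.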

Monday, March 15, 2021

Return values that meet some criteria based on other columns - Pandas

Some simple tasks are much faster and simpler in pandas than in Excel.

For example: return values that meet some criteria based on other columns.


If we want to list all the basins with an area greater than 100 ha, a few lines of code will do.

Assuming we have copied the table from Excel, with the area in the last column:


import pandas as pd

df0=pd.read_clipboard()

df0[df0.iloc[:,-1]>100].to_clipboard()


This code will put the result in the clipboard, ready to paste back into Excel, for example.
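The same filter can be tried without the clipboard. The sketch below uses a made-up basin table (the names and areas are invented for illustration) with the area in the last column, as the code above assumes:

```python
import pandas as pd

# made-up basin table; the area (ha) is the last column
df0 = pd.DataFrame({
    "basin": ["B1", "B2", "B3", "B4"],
    "area_ha": [50.0, 120.0, 100.0, 310.5],
})

# boolean mask on the last column, then keep only the matching rows
big = df0[df0.iloc[:, -1] > 100]
print(big)
```

Note that the comparison is strict, so a basin with exactly 100 ha is not returned.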





Wednesday, July 8, 2020

Downloading table from website to Pandas Dataframe - html to pandas

In some cases it is possible to download the table with a single pandas command:

pd.read_html(url)

If the command fails with the HTTP response "forbidden", we can use the requests library and send some header information to prevent this error.

Example: downloading world population data from the site www.worldometers.info:

import requests
import pandas as pd

url = r'https://www.worldometers.info/world-population/population-by-country/'

header = {
  "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
  "X-Requested-With": "XMLHttpRequest"
}

r = requests.get(url, headers=header)

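# world population - first table of the website
# (newer pandas versions expect a file-like object rather than a raw HTML string)
from io import StringIO
dfpop = pd.read_html(StringIO(r.text))[0]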


Monday, October 21, 2019

Delete row/ column in DataFrame with Null/ NaN values

If df0 is a pandas DataFrame with null values:

df0=df0.dropna(axis=0,how='all')  - will remove rows that have only 'NaN' values

df0=df0.dropna(axis=1,how='all')  - will remove columns that have only 'NaN' values
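A small sketch with a made-up dataframe shows the difference between the two calls:

```python
import numpy as np
import pandas as pd

# made-up data: row 1 and column 'b' contain only NaN
df0 = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, np.nan, np.nan],
    "c": [4.0, np.nan, np.nan],
})

rows_kept = df0.dropna(axis=0, how="all")  # drops the all-NaN row (index 1)
cols_kept = df0.dropna(axis=1, how="all")  # drops the all-NaN column ('b')
print(rows_kept)
print(cols_kept)
```

With how='all', rows or columns that are only partly NaN are kept.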


Tuesday, May 28, 2019

Find common timespan/ years in multiple time series/ dataframes


import pandas as pd

# concatenate all dataframes side by side (axis=1), aligned on the datetime index
dfAlldf = pd.concat([df1, df2, df3, df4], axis=1)

# sort them by the date (DatetimeIndex)
dfAlldf = dfAlldf.sort_index()

# group the dataframe by year
grps = dfAlldf.groupby(dfAlldf.index.year)

# keep only the years that have no null values in any column
complete = [grp for year, grp in grps if not grp.isnull().values.any()]

# concatenate the complete years (empty dataframe if there are none)
dfCompl = pd.concat(complete, axis=0) if complete else pd.DataFrame()

# re-sort by index
dfCompl = dfCompl.sort_index()
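To see the idea working end to end, here is a minimal sketch with two made-up monthly series (the column names q1 and q2 and the values are assumptions). One series starts a year later and the other has a gap in 2019, so only 2018 is complete in both:

```python
import numpy as np
import pandas as pd

# monthly index from 2017-01 to 2019-12
idx = pd.date_range("2017-01-01", periods=36, freq="MS")

df1 = pd.DataFrame({"q1": np.arange(36.0)}, index=idx)
df1.loc[pd.Timestamp("2019-03-01"), "q1"] = np.nan       # gap in 2019
df2 = pd.DataFrame({"q2": np.arange(24.0)}, index=idx[12:])  # starts in 2018

# side-by-side concatenation aligned on the index, then grouped by year
dfAll = pd.concat([df1, df2], axis=1).sort_index()
complete = [grp for year, grp in dfAll.groupby(dfAll.index.year)
            if not grp.isnull().values.any()]
dfCompl = pd.concat(complete).sort_index()

common_years = dfCompl.index.year.unique().tolist()
print(common_years)
```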


Thursday, April 25, 2019

Pandas - Reading headers and dates correctly from Clipboard/ CSV

When using the pandas functions read_clipboard() or read_csv(), you have to define whether your data has headers (column headers) and indexes (row headers).

If you're passing indexes in datetime format, make sure they will be parsed correctly by indicating that they are datetimes and whether they use the day-first format (dd/mm/YYYY).

For example:

pd.read_clipboard(index_col=0, header=None, parse_dates=True, dayfirst=True)

This tells pandas that the table in the clipboard has no column headers, but has an index (row headers) in the first column, and that the index is in datetime format with the day first (dd/mm/YYYY).
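The same arguments work with read_csv(). The sketch below uses a made-up two-row text in place of the clipboard to show the effect of dayfirst:

```python
from io import StringIO
import pandas as pd

# made-up data: dates written as dd/mm/YYYY, no header row
text = "01/02/2019,10\n15/02/2019,20\n"

df = pd.read_csv(StringIO(text), index_col=0, header=None,
                 parse_dates=True, dayfirst=True)
print(df.index)
```

With dayfirst=True the first row is read as 1 February 2019, not 2 January 2019.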

Sunday, April 21, 2019

Logarithmic and Exponential Curve Fit in Python - Numpy


With numpy function "polyfit":

X,y : data to be fitted

import numpy as np

1. Exponential fit

cf = np.polyfit(X, np.log(y), 1)

will return two coefficients, which make up the equation:

y = exp(cf[1])*exp(cf[0]*X)


2. Logarithm fit:

cf = np.polyfit(np.log(X), y, 1)

will return two coefficients, which make up the equation:

y = cf[0]*log(X)+cf[1]
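A minimal sketch of both fits on made-up, noise-free data (the true coefficients 2, 0.5, 3 and 1 are chosen only for illustration), so the recovered coefficients can be compared with the known ones. Both fits need positive values inside the log:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# 1. exponential data: y = 2 * exp(0.5 * X)
y_exp = 2.0 * np.exp(0.5 * X)
cf = np.polyfit(X, np.log(y_exp), 1)
a, b = np.exp(cf[1]), cf[0]            # y = a * exp(b * X)
y_exp_fit = a * np.exp(b * X)

# 2. logarithmic data: y = 3 * log(X) + 1
y_log = 3.0 * np.log(X) + 1.0
cl = np.polyfit(np.log(X), y_log, 1)
y_log_fit = cl[0] * np.log(X) + cl[1]

print(a, b, cl)
```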

Wednesday, January 23, 2019

Interpolate missing values in pandas DataFrame

If we have a dataframe with dates and flows, with missing values, as in the example below:

        0
2019-01-31 50.208308
2019-02-28 50.623457
2019-03-31 56.203933
2019-04-30 NaN
2019-05-31 NaN
2019-06-30 117.727655
2019-07-31 62.273259
2019-08-31 49.054898
2019-09-30 55.612575
2019-10-31 54.187409


We can use the pandas interpolate function to fill in the data with different methods:

dfIn.interpolate() - will fill missing values with linear interpolation;
dfIn.interpolate(method='polynomial', order=3) - will fill missing values with 3rd degree polynomial interpolation (this method requires SciPy).

Result:
                linear  polynomial    original
2019-01-31   50.208308   50.208308   50.208308
2019-02-28   50.623457   50.623457   50.623457
2019-03-31   56.203933   56.203933   56.203933
2019-04-30   76.711840   89.513986         NaN
2019-05-31   97.219748  124.233259         NaN
2019-06-30  117.727655  117.727655  117.727655
2019-07-31   62.273259   62.273259   62.273259
2019-08-31   49.054898   49.054898   49.054898
2019-09-30   55.612575   55.612575   55.612575
2019-10-31   54.187409   54.187409   54.187409
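The linear column can be reproduced with the sketch below, which rebuilds the example dataframe by hand (the name dfIn follows the text above). The polynomial method is left out here so that the sketch does not depend on SciPy:

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime([
    "2019-01-31", "2019-02-28", "2019-03-31", "2019-04-30", "2019-05-31",
    "2019-06-30", "2019-07-31", "2019-08-31", "2019-09-30", "2019-10-31",
])
dfIn = pd.DataFrame({0: [50.208308, 50.623457, 56.203933, np.nan, np.nan,
                         117.727655, 62.273259, 49.054898, 55.612575,
                         54.187409]}, index=idx)

# default method is linear: gaps are filled in equal steps between neighbours
linear = dfIn.interpolate()
print(linear)
```

The default linear method treats the rows as equally spaced and ignores the actual dates in the index.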