Showing posts with label Time series. Show all posts

Wednesday, January 23, 2019

Interpolate missing values in pandas DataFrame

Suppose we have a DataFrame of flows indexed by date, with some missing values, as in the example below:

        0
2019-01-31 50.208308
2019-02-28 50.623457
2019-03-31 56.203933
2019-04-30 NaN
2019-05-31 NaN
2019-06-30 117.727655
2019-07-31 62.273259
2019-08-31 49.054898
2019-09-30 55.612575
2019-10-31 54.187409


We can use the pandas interpolate function to fill the gaps with different methods:

dfIn.interpolate() - fills the NaN values by linear interpolation (the default method, which treats the rows as equally spaced);
dfIn.interpolate(method='polynomial', order=3) - fills the NaN values by 3rd-degree polynomial interpolation (this method requires SciPy).
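
As a minimal sketch, the two calls can be put side by side as follows. The sample values are copied from the example in this post; the name dfOut and the way the index is built are assumptions for illustration, not necessarily how the original table was produced. The polynomial column is skipped if SciPy is not installed.

```python
import numpy as np
import pandas as pd

# Rebuild the sample: month-end dates with two missing flows.
dates = pd.to_datetime([
    '2019-01-31', '2019-02-28', '2019-03-31', '2019-04-30', '2019-05-31',
    '2019-06-30', '2019-07-31', '2019-08-31', '2019-09-30', '2019-10-31',
])
flows = [50.208308, 50.623457, 56.203933, np.nan, np.nan,
         117.727655, 62.273259, 49.054898, 55.612575, 54.187409]
dfIn = pd.DataFrame({0: flows}, index=dates)

# dfOut is a hypothetical name for the comparison table.
dfOut = pd.DataFrame({'linear': dfIn[0].interpolate()})

try:
    # method='polynomial' relies on SciPy.
    dfOut['polynomial'] = dfIn[0].interpolate(method='polynomial', order=3)
except ImportError:
    print('SciPy is not installed; skipping the polynomial column')

# Keep the untouched series for reference.
dfOut['original'] = dfIn[0]

print(dfOut)
```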

Result:
                linear  polynomial    original
2019-01-31   50.208308   50.208308   50.208308
2019-02-28   50.623457   50.623457   50.623457
2019-03-31   56.203933   56.203933   56.203933
2019-04-30   76.711840   89.513986         NaN
2019-05-31   97.219748  124.233259         NaN
2019-06-30  117.727655  117.727655  117.727655
2019-07-31   62.273259   62.273259   62.273259
2019-08-31   49.054898   49.054898   49.054898
2019-09-30   55.612575   55.612575   55.612575
2019-10-31   54.187409   54.187409   54.187409
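
The linear column above splits the gap into equal steps because the default method ignores the index. When the dates are unevenly spaced, as month-end dates are, method='time' weights each step by the elapsed time instead. The sketch below compares the two on the gap from the example; the names s, linear and by_time are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# The gap from the example: known values on 2019-03-31 and 2019-06-30.
dates = pd.to_datetime(['2019-03-31', '2019-04-30', '2019-05-31', '2019-06-30'])
s = pd.Series([56.203933, np.nan, np.nan, 117.727655], index=dates)

# Default: rows are treated as equally spaced, whatever the dates are.
linear = s.interpolate()

# 'time': steps are weighted by the number of days between rows (30, 31, 30).
by_time = s.interpolate(method='time')

print(pd.DataFrame({'linear': linear, 'time': by_time}))
```

For monthly data the difference is small, but it grows as the spacing between observations becomes more irregular.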








Friday, June 30, 2017

Simple code to generate synthetic time series data in Python / Pandas

Here is a simple script that generates a synthetic daily time series by sampling from a Gumbel distribution.

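import numpy as np
import pandas as pd

# Location and scale parameters of the Gumbel distribution
med = 15.5
dp = 8.2

# Daily dates from 2001-01-01 to 2016-11-30 (the stop value is exclusive)
sDays = np.arange('2001-01', '2016-12', dtype='datetime64[D]')
nDays = len(sDays)

# Draw one Gumbel-distributed value per day and set negative values to zero
s1 = np.random.gumbel(loc=med, scale=dp, size=nDays)
s1[s1 < 0] = 0

# Build a DataFrame indexed by date and plot it (plotting requires matplotlib)
dfSint = pd.DataFrame({'Q': s1}, index=sDays)
dfSint.plot()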
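
The script above produces a different series on every run. If a reproducible series is needed, one option is a seeded NumPy generator, as in the sketch below. The seed value and the names rng and dfSeeded are hypothetical, and np.clip is swapped in for the boolean-mask assignment; it has the same effect here.

```python
import numpy as np
import pandas as pd

# Hypothetical seed; any fixed integer makes the series repeatable.
seed = 42
rng = np.random.default_rng(seed)

# Same date span as the script above: 2001-01-01 through 2016-11-30.
sDays = pd.date_range('2001-01-01', '2016-11-30', freq='D')

# Same Gumbel parameters, with negative values truncated to zero.
s1 = rng.gumbel(loc=15.5, scale=8.2, size=len(sDays))
s1 = np.clip(s1, 0, None)

dfSeeded = pd.DataFrame({'Q': s1}, index=sDays)
print(dfSeeded['Q'].describe())
```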