python - Fill NaN in candlestick OHLCV data
I have a DataFrame like this:
                               open      high       low     close          vol
    2012-01-01 19:00:00      449000    449000    449000    449000   1336303000
    2012-01-01 20:00:00         NaN       NaN       NaN       NaN          NaN
    2012-01-01 21:00:00         NaN       NaN       NaN       NaN          NaN
    2012-01-01 22:00:00         NaN       NaN       NaN       NaN          NaN
    2012-01-01 23:00:00         NaN       NaN       NaN       NaN          NaN
    ...
                               open      high       low     close          vol
    2013-04-24 14:00:00    11700000  12000000  11600000  12000000  20647095439
    2013-04-24 15:00:00    12000000  12399000  11979000  12399000  23997107870
    2013-04-24 16:00:00    12399000  12400000  11865000  12100000   9379191474
    2013-04-24 17:00:00    12300000  12397995  11850000  11850000   4281521826
    2013-04-24 18:00:00    11850000  11850000  10903000  11800000  15546034128
I need to fill the NaN values according to this rule:
When open, high, low and close are all NaN,
- set vol to 0
- set open, high, low and close to the previous candle's close value
Otherwise, keep the NaN.
Here's how to do it via masking.
Simulate a frame with some holes (column a plays the role of your 'close' field):
    # assumes: import numpy as np; import pandas as pd
    In [20]: df = pd.DataFrame(np.random.randn(10, 3), index=pd.date_range('20130101', periods=10, freq='min'), columns=list('abc'))

    In [21]: df.iloc[1:3, :] = np.nan

    In [22]: df.iloc[5:8, 1:3] = np.nan

    In [23]: df
    Out[23]:
                                a         b         c
    2013-01-01 00:00:00 -0.486149  0.156894 -0.272362
    2013-01-01 00:01:00       NaN       NaN       NaN
    2013-01-01 00:02:00       NaN       NaN       NaN
    2013-01-01 00:03:00  1.788240 -0.593195  0.059606
    2013-01-01 00:04:00  1.097781  0.835491 -0.855468
    2013-01-01 00:05:00  0.753991       NaN       NaN
    2013-01-01 00:06:00 -0.456790       NaN       NaN
    2013-01-01 00:07:00 -0.479704       NaN       NaN
    2013-01-01 00:08:00  1.332830  1.276571 -0.480007
    2013-01-01 00:09:00 -0.759806 -0.815984  2.699401
The rows that are all NaN:
    In [24]: mask_0 = pd.isnull(df).all(axis=1)

    In [25]: mask_0
    Out[25]:
    2013-01-01 00:00:00    False
    2013-01-01 00:01:00     True
    2013-01-01 00:02:00     True
    2013-01-01 00:03:00    False
    2013-01-01 00:04:00    False
    2013-01-01 00:05:00    False
    2013-01-01 00:06:00    False
    2013-01-01 00:07:00    False
    2013-01-01 00:08:00    False
    2013-01-01 00:09:00    False
    Freq: T, dtype: bool
The rows where we want to propagate a:
    In [26]: mask_fill = pd.isnull(df['b']) & pd.isnull(df['c'])

    In [27]: mask_fill
    Out[27]:
    2013-01-01 00:00:00    False
    2013-01-01 00:01:00     True
    2013-01-01 00:02:00     True
    2013-01-01 00:03:00    False
    2013-01-01 00:04:00    False
    2013-01-01 00:05:00     True
    2013-01-01 00:06:00     True
    2013-01-01 00:07:00     True
    2013-01-01 00:08:00    False
    2013-01-01 00:09:00    False
    Freq: T, dtype: bool
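Chaining one isnull() per column with & gets long once more columns are involved. As a rough sketch, the same mask can probably be built with isna().all(axis=1) over a column subset. The small frame below is made up for illustration and is not the random frame from the session above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a similar hole pattern (values are made up).
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0, 4.0],
        "b": [1.5, np.nan, np.nan, 4.5],
        "c": [0.5, np.nan, np.nan, 3.5],
    },
    index=pd.date_range("2013-01-01", periods=4, freq="min"),
)

# Explicit form: one isnull() per column, combined with &.
mask_explicit = pd.isnull(df["b"]) & pd.isnull(df["c"])

# Subset form: True where every listed column is NaN in that row.
mask_subset = df[["b", "c"]].isna().all(axis=1)

# Whole-row form: True only where the entire row is NaN.
mask_all = df.isna().all(axis=1)
```

The subset form scales to any list of columns, while the whole-row form is the same idea with no subset at all.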
Propagate first:
    In [28]: df.loc[mask_fill, 'c'] = df['a']

    In [29]: df.loc[mask_fill, 'b'] = df['a']
Then fill in the 0's:
    In [30]: df.loc[mask_0] = 0
Done:
    In [31]: df
    Out[31]:
                                a         b         c
    2013-01-01 00:00:00 -0.486149  0.156894 -0.272362
    2013-01-01 00:01:00  0.000000  0.000000  0.000000
    2013-01-01 00:02:00  0.000000  0.000000  0.000000
    2013-01-01 00:03:00  1.788240 -0.593195  0.059606
    2013-01-01 00:04:00  1.097781  0.835491 -0.855468
    2013-01-01 00:05:00  0.753991  0.753991  0.753991
    2013-01-01 00:06:00 -0.456790 -0.456790 -0.456790
    2013-01-01 00:07:00 -0.479704 -0.479704 -0.479704
    2013-01-01 00:08:00  1.332830  1.276571 -0.480007
    2013-01-01 00:09:00 -0.759806 -0.815984  2.699401
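Applied to the frame in the question, the same masking idea might look like the sketch below. It assumes the columns are named open, high, low, close and vol as in the question, and that "previous close" means the last non-NaN close carried forward with ffill(). fill_empty_candles is a hypothetical helper name, and the fourth sample row is made up to show a partially filled candle being left alone.

```python
import numpy as np
import pandas as pd


def fill_empty_candles(df):
    """Fill candles whose open/high/low/close are all NaN.

    Assumes columns named open, high, low, close, vol (as in the question).
    Returns a new frame; the input is not modified.
    """
    out = df.copy()
    price_cols = ["open", "high", "low", "close"]

    # Candles with no trades: every price column is NaN in that row.
    empty = out[price_cols].isna().all(axis=1)

    # Last known close carried forward over the gaps.
    prev_close = out["close"].ffill()

    # Flat candle at the previous close, with zero volume.
    for col in price_cols:
        out.loc[empty, col] = prev_close[empty]
    out.loc[empty, "vol"] = 0
    return out


# First three rows follow the question; the last row is made up.
idx = pd.date_range("2012-01-01 19:00", periods=4, freq="60min")
candles = pd.DataFrame(
    {
        "open": [449000, np.nan, np.nan, 450000],
        "high": [449000, np.nan, np.nan, np.nan],
        "low": [449000, np.nan, np.nan, 449500],
        "close": [449000, np.nan, np.nan, 450500],
        "vol": [1336303000, np.nan, np.nan, 100],
    },
    index=idx,
)

filled = fill_empty_candles(candles)
print(filled)
```

Empty candles that come before the first known close have nothing to carry forward, so in this sketch their prices stay NaN while vol is still set to 0.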