问题描述:

I have a dataframe that looks like this

 Idnumber Parent Date Other variables

1 a 2005 x

1 a 2007 x

2 b 2005 x

2 b 2006 x

2 b 2007 x

I need it to look like this:

 Idnumber Parent Date Other variables

1 a 2005 x

1 NaN 2006 NaN

1 a 2007 x

2 b 2005 x

2 b 2006 x

2 b 2007 x

Considering that I need to be able to perform checks on the added values later I cannot simply add them. I need to verify they did not exist and copy various remaining variables as well which will be interpolated. These need to come out empty.

My idea was to create an empty row between all existing rows and simply backwards and forwards fill. Thereby ensuring no other information was copied.

I would not know how to do this though.

Preferably I would skip the introduction of the empty row and do the whole thing in one go.

But I have even less of an idea how to get started on that

网友答案:

For overall approach, you could first define which rows should exist and then merge with the original dataset.

>>> orig

   Idnumber Parent  Date Other
0         1      a  2005     x
1         1      a  2007     x
2         2      b  2005     x
3         2      b  2006     x
4         2      b  2007     x

Now use itertools.product to define all the rows that should exist. (You could alternatively use pd.MultiIndex.from_product.)

>>> import itertools
>>> df = pd.DataFrame(list(itertools.product(orig['Idnumber'].unique(),
                                             orig['Date'].unique())))
>>> df.columns = ['Idnumber','Date']

   Idnumber  Date
0         1  2005
1         1  2006
2         1  2007
3         2  2005
4         2  2006
5         2  2007

Then merge with the original data:

>>> df.merge(orig,how='outer',on=['Idnumber','Date'])

   Idnumber  Date Parent Other
0         1  2005      a     x
1         1  2006    NaN   NaN
2         1  2007      a     x
3         2  2005      b     x
4         2  2006      b     x
5         2  2007      b     x

After this you can then use fillna, interpolate, etc.

相关阅读:
Top