Problem description:

I'm working with pandas (version 0.18.1) data frames on Python 2.7.

I have to modify a column in the data frame based on several columns in the same data frame.

My dataframe is called `data`, and a sample of it looks like this:

| a | b | c | d | e | f | g | h | i | j | discount |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | | | | | 65497.6 | | | | | 0 |
| 0 | | | | | 73882.8 | | | | | 0 |
| 0 | | | | | 88588 | | 22 | | | 0 |
| 0 | | | | | 106480 | | 20 | 10 | | 0 |
| 0 | | | | | 52500 | | | | | 0 |
| 0 | | 20 | 10 | | 22997.5 | | | | | 0 |
| | | | | | | | | | | 0 |
| 0 | | | 20 | | 0 | | | | | 0 |
| 0 | | | | | 10520 | | | | | 0 |

My code is as follows:

    columns1 = ['a', 'b', 'c', 'd', 'e']
    columns2 = ['f', 'g', 'h', 'i', 'j']

    data['discount'] = 0

    for i, row in data.iterrows():
        a = 0
        b = 0
        for col1 in columns1:
            value = row[col1]
            if value > 0:
                a = value
                break
        for col2 in columns2:
            value = row[col2]
            if value > 0:
                b = value
                break
        if a != 0 and b != 0:
            data.loc[i, 'discount'] = abs(a - b)

Done this way, it takes a lot of time and a lot of memory on a large dataset. I have 700 MB of data; processing it takes more than 120 GB of RAM, and after approximately 10 hours the process fails with a `MemoryError`.

According to this answer, http://stackoverflow.com/a/24871316, I should not do it this way. Please let me know how I can write this code more efficiently.

**Please let me know the reason for down voting my question, so that I can learn**

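Answer:

Assuming your empty cells are `NaN` values (and with `df` standing for your `data` frame), this gives you the first non-NA value of each row for the group of columns you are interested in. Because `df[df>0]` first replaces everything that is not greater than 0 with `NaN`, that is the first positive value: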

```
>>> df[df > 0][columns1].bfill(axis=1).iloc[:, 0]
0     NaN
1     NaN
2     NaN
3     NaN
4     NaN
5    20.0
6     NaN
7    20.0
8     NaN
```
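To make the mask-then-back-fill idea easier to follow, here is a minimal sketch that runs the three steps one at a time. The toy frame and the names `positive`, `filled` and `first_positive` are made up for illustration and are not part of the question's data; `df.where(df > 0)` is used as an equivalent spelling of `df[df > 0]`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: NaN marks an empty cell.
df = pd.DataFrame({
    "a": [0.0, 0.0, np.nan],
    "c": [np.nan, 20.0, np.nan],
    "d": [np.nan, 10.0, 5.0],
})

# Step 1: keep only positive values; zeros, negatives and NaN become NaN.
positive = df.where(df > 0)

# Step 2: back-fill along each row, so every NaN cell takes the next
# non-NaN value found to its right (if there is one).
filled = positive.bfill(axis=1)

# Step 3: the first column now holds the first positive value of each row,
# or NaN when the row has no positive value at all.
first_positive = filled.iloc[:, 0]
print(first_positive.tolist())
```

Row 0 has no positive value and stays `NaN`, row 1 yields 20.0 (the 0 in `a` is skipped), and row 2 yields 5.0.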

Thus, this will give you the `abs(a-b)` you're searching for:

```
>>> res = (df[df > 0][columns1].bfill(axis=1).iloc[:, 0]
...        - df[df > 0][columns2].bfill(axis=1).iloc[:, 0]).abs()
>>> res
0        NaN
1        NaN
2        NaN
3        NaN
4        NaN
5    22977.5
6        NaN
7        NaN
8        NaN
```

You can either combine it with your initialized `discount` column:

```
res.combine_first(df.discount)
```

or fill the blanks:

```
res.fillna(0)
```
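Putting the pieces together, the whole computation could be wrapped up roughly like this. This is only a sketch under a few assumptions: the helper names `first_positive` and `add_discount` are made up here, the columns are assumed to be numeric already (if the blanks were read in as strings, something like `pd.to_numeric(..., errors="coerce")` would be needed first), and the sample rows are typed in from the question's table. Selecting the column group before masking avoids building a masked copy of the entire frame twice, which may matter for memory on a 700 MB dataset.

```python
import numpy as np
import pandas as pd

def first_positive(frame, columns):
    """Per row, the first value > 0 among `columns` (NaN if there is none)."""
    sub = frame[columns]                      # only the columns we need
    return sub.where(sub > 0).bfill(axis=1).iloc[:, 0]

def add_discount(frame, columns1, columns2):
    """Set frame['discount'] to |a - b|, or 0 when a or b is missing."""
    a = first_positive(frame, columns1)
    b = first_positive(frame, columns2)
    # NaN on either side propagates through the subtraction; fill it with 0.
    frame["discount"] = (a - b).abs().fillna(0)
    return frame

columns1 = ["a", "b", "c", "d", "e"]
columns2 = ["f", "g", "h", "i", "j"]
nan = np.nan

# Sample rows from the question, columns a..j in order.
rows = [
    [0, nan, nan, nan, nan, 65497.6, nan, nan, nan, nan],
    [0, nan, nan, nan, nan, 73882.8, nan, nan, nan, nan],
    [0, nan, nan, nan, nan, 88588, nan, 22, nan, nan],
    [0, nan, nan, nan, nan, 106480, nan, 20, 10, nan],
    [0, nan, nan, nan, nan, 52500, nan, nan, nan, nan],
    [0, nan, 20, 10, nan, 22997.5, nan, nan, nan, nan],
    [nan] * 10,
    [0, nan, nan, 20, nan, 0, nan, nan, nan, nan],
    [0, nan, nan, nan, nan, 10520, nan, nan, nan, nan],
]
data = pd.DataFrame(rows, columns=columns1 + columns2)

add_discount(data, columns1, columns2)
print(data["discount"].tolist())
```

On the sample data only row 5 gets a non-zero discount (22977.5), which matches what the original row-by-row loop would produce.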