Problem description:

I want to use a regex with pandas to replace values in a column, in order to mark the correct answer to a question.

The values in this column are '1943' (the correct answer) and other years (incorrect).

The code I have now is:

incorrect_dict = {'Q1': {'^(?!1943$).*': 0}}

df = df.replace(incorrect_dict, regex=True)

but it doesn't replace any values in the DataFrame.

The regex itself seems OK, since it works when I use:

import re

string = "1933"

regex = re.compile("^(?!1943$).*")

regex.findall(string)

I get:

[u'1933']

For string = '1943' I get 'No match was found:', so I assume the regex is OK. But when I use it with df.replace, the values are not replaced.

Thanks for any suggestions.

Answer:

I suspect the years were parsed as integers. See how it fails:

In [17]: df = DataFrame({'Q1': [1933, 1943]})

In [18]: df.replace(incorrect_dict, regex=True)
Out[18]: 
     Q1
0  1933
1  1943
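
To confirm that suspicion on your own data, a quick dtype check along these lines may help. This is only a sketch: the two-row frame mirrors the example above, the variable name years_are_ints is made up for illustration, and is_integer_dtype comes from pandas.api.types.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Toy frame mirroring the example above; real data would come from your own source.
df = pd.DataFrame({"Q1": [1933, 1943]})

# An integer dtype here means the column holds no strings for the regex to match.
print(df["Q1"].dtype)
years_are_ints = is_integer_dtype(df["Q1"])
print(years_are_ints)
```

If the check reports an integer dtype, the regex-based replace has nothing to work on until the column is converted to strings.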

But if I convert the years to strings, it works as you expect.

In [19]: df['Q1'] = df['Q1'].map(str)

In [20]: df.replace(incorrect_dict, regex=True)
Out[20]: 
     Q1
0     0
1  1943
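
Put together as a standalone script, the string conversion might look like the sketch below. It is a variation rather than a copy of the session above: it uses astype(str) instead of map(str), and it replaces with the string "0" instead of the integer 0 so that the column stays uniformly string-typed. Both choices are my assumptions, not something the question requires.

```python
import pandas as pd

# Same toy data as above, with the years parsed as integers.
df = pd.DataFrame({"Q1": [1933, 1943]})

# Convert the column to strings so the regex has something to match.
df["Q1"] = df["Q1"].astype(str)

# Any value that is not exactly "1943" is replaced with "0".
incorrect_dict = {"Q1": {r"^(?!1943$).*": "0"}}
result = df.replace(incorrect_dict, regex=True)
print(result)
```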

Incidentally, I'm not convinced that treating the responses as strings and using a regex is the way to go. Why not keep the years as integers and evaluate df['Q1'] == 1943? The result will be True/False, meaning correct/incorrect. That seems more useful to me.
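
A minimal sketch of that alternative, assuming the years are stored as integers. The column names Q1_correct and Q1_score are hypothetical, chosen only for illustration, and the third row is extra made-up data.

```python
import pandas as pd

# Toy data; the years are kept as integers.
df = pd.DataFrame({"Q1": [1933, 1943, 1950]})

# Element-wise comparison gives a boolean column: True means correct.
df["Q1_correct"] = df["Q1"] == 1943

# If a 0/1 score is preferred over True/False, cast the booleans to int.
df["Q1_score"] = df["Q1_correct"].astype(int)
print(df)
```

The boolean column can also be summed or averaged directly, for example df["Q1_correct"].mean() for the fraction of correct answers.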
