Problem description:

I have a data frame, read from a .csv file using Data = pandas.read_csv.

One of the columns of the data frame contains dates, such as '14/09/2015', stored as str.

I need to create a subset, for which I use: NewDataFrame = DataFrame['DatesColumn'][DataFrame['DatesColumn'] == desired_date]

But I have two main problems:

  1. Since the dates are strings, I tried to pick out the last character with the index [-1], but I get the error: KeyError: -1L

I tried to use this code to select 2014:

NewDataFrame = DataFrame['DatesColumn'][DataFrame['DatesColumn'][-1]==4]

  2. I have empty fields that have been imported as NaN values. If I loop over the column to transform the data, I get the error:

TypeError: 'float' object has no attribute '__getitem__'

Q: How can I subset the data (or clean it) by year?

Many thanks.

Answer:

For the NaN values you can use fillna().

# to fill NaNs with zeros
noNans = withNans.fillna(0)
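
Zero is rarely a meaningful placeholder for a date, so dropping the rows with a missing date may suit this case better. Here is a minimal sketch of both options; the sample data and the column names DatesColumn and Value are assumptions made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
with_nans = pd.DataFrame({
    "DatesColumn": ["14/09/2015", np.nan, "14/10/2014"],
    "Value": [1.0, 2.0, np.nan],
})

# Fill NaNs with zero, but only in the numeric column.
filled = with_nans.fillna({"Value": 0})

# Alternatively, drop only the rows whose date is missing.
dated_only = with_nans.dropna(subset=["DatesColumn"])
```

Which option fits depends on whether rows without a date are still useful for the analysis.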

As for the dates, rather than handling the date strings yourself, let an existing library parse them for you. Here read_csv() can do it through its parse_dates parameter; see the read_csv() documentation.

Here's a little example:

Csv file:

1,14/09/2016,dataa
1,14/09/2015,dataa
2,14/10/2014,dataa2

Code:

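import pandas as pd

# Column 1 holds day-first dates such as 14/09/2016
df = pd.read_csv("test.csv", header=None, parse_dates=[1], dayfirst=True)
# Keep only the rows whose date is later than today
print(df[df[1] > pd.Timestamp.today()])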

Run on a date after 14/09/2015 and before 14/09/2016, this prints only

   0          1      2
0  1 2016-09-14  dataa
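
Once the column holds real datetimes, subsetting by year (the original question) can go through the .dt accessor. The following is only a sketch under some assumptions: it reuses the sample rows above, adds one row with an empty date field to mimic the NaN problem, and reads from an in-memory string instead of test.csv so that it is self-contained:

```python
import io
import pandas as pd

# Same sample rows as above, plus one row with an empty date field.
csv_text = """1,14/09/2016,dataa
1,14/09/2015,dataa
2,14/10/2014,dataa2
3,,dataa3
"""

df = pd.read_csv(io.StringIO(csv_text), header=None)

# Parse the day-first strings; missing or unparseable values become NaT.
df[1] = pd.to_datetime(df[1], format="%d/%m/%Y", errors="coerce")

# Keep only the rows from 2014; rows with NaT compare as False and drop out.
rows_2014 = df[df[1].dt.year == 2014]
print(rows_2014)
```

Because missing dates become NaT rather than float NaN, there is no need to loop over the strings or to fill the gaps before filtering.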