Problem description:

I have a file with 3'502'379 rows and 3 columns. The following script is supposed to run, but it raises an error on the date-handling line:

import matplotlib.pyplot as plt
import numpy as np
import csv
import pandas

path = 'data_prices.csv'
data = pandas.read_csv(path, sep=';')
data['DATE'] = pandas.to_datetime(data['DATE'], format='%Y%m%d')

This is the error that occurs:

Traceback (most recent call last):
  File "C:\Program Files\Python35\lib\site-packages\pandas\indexes\base.py", line 1945, in get_loc
    return self._engine.get_loc(key)
  File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
  File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
  File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
  File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'DATE'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\data\script.py", line 15, in <module>
    data['DATE'] = pandas.to_datetime(data['DATE'], format='%Y%m%d')
  File "C:\Program Files\Python35\lib\site-packages\pandas\core\frame.py", line 1997, in __getitem__
    return self._getitem_column(key)
  File "C:\Program Files\Python35\lib\site-packages\pandas\core\frame.py", line 2004, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Program Files\Python35\lib\site-packages\pandas\core\generic.py", line 1350, in _get_item_cache
    values = self._data.get(item)
  File "C:\Program Files\Python35\lib\site-packages\pandas\core\internals.py", line 3290, in get
    loc = self.items.get_loc(item)
  File "C:\Program Files\Python35\lib\site-packages\pandas\indexes\base.py", line 1947, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas\index.c:4066)
  File "pandas\index.pyx", line 159, in pandas.index.IndexEngine.get_loc (pandas\index.c:3930)
  File "pandas\hashtable.pyx", line 675, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12408)
  File "pandas\hashtable.pyx", line 683, in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:12359)
KeyError: 'DATE'

Answer:

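The '\ufeffDATE' in the first column name shows that your CSV file starts with a Byte Order Mark (BOM, U+FEFF). The BOM was not stripped when the file was decoded, so the first column is named '\ufeffDATE' rather than 'DATE', which is why the lookup raises KeyError: 'DATE'. Since the rest of the header decoded cleanly with the default UTF-8 codec, the file is most likely UTF-8 with a BOM signature, and it must be read with an encoding that consumes the BOM.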

So try this when reading your CSV (assuming pandas is imported as pd):

df = pd.read_csv(path, sep=';', encoding='utf-8-sig')

Or, as @EdChum suggested, if the file turns out to be UTF-16:

df = pd.read_csv(path, sep=';', encoding='utf-16')

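Use whichever variant matches the file's actual encoding: 'utf-8-sig' for a UTF-8 file with a BOM, 'utf-16' for a UTF-16 file. Either way the BOM is stripped and the first column is named 'DATE' again.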
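
To see why the lookup fails and why 'utf-8-sig' fixes it, here is a minimal sketch that uses only the standard library rather than pandas. The sample bytes, the PRICE and VOLUME column names, and the read_header helper are made up for illustration; it assumes the file is UTF-8 with a BOM.

```python
import codecs
import csv
import io

# Hypothetical sample: a UTF-8 file that starts with a BOM.
# PRICE and VOLUME are invented column names, not taken from the question.
raw = codecs.BOM_UTF8 + "DATE;PRICE;VOLUME\n20160101;10.5;100\n".encode("utf-8")


def read_header(raw_bytes, encoding):
    """Decode the bytes and return the first CSV row (the column names)."""
    text = raw_bytes.decode(encoding)
    reader = csv.reader(io.StringIO(text), delimiter=";")
    return next(reader)


plain = read_header(raw, "utf-8")      # the BOM stays glued to the first name
clean = read_header(raw, "utf-8-sig")  # the BOM is stripped while decoding

print(plain)  # ['\ufeffDATE', 'PRICE', 'VOLUME']
print(clean)  # ['DATE', 'PRICE', 'VOLUME']
print("DATE" in plain, "DATE" in clean)  # False True
```

The encoding argument of pandas.read_csv selects the codec in the same way, so the invisible U+FEFF character is the only difference between the failing and the working column name.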

P.S. This answer shows how to deal with BOMs.
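
If you are not sure which encoding the file uses, one possible approach is to look at its first few bytes before handing it to pandas. The sketch below is only an illustration: guess_bom_encoding is a hypothetical helper, not part of pandas, and it can only recognise files that actually start with a BOM.

```python
import codecs


def guess_bom_encoding(first_bytes):
    """Return a codec name based on a leading BOM, or None if there is no BOM."""
    # Check UTF-32 first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    if first_bytes.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if first_bytes.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if first_bytes.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None


# Made-up sample headers; with a real file you would instead read the
# first four bytes in binary mode, e.g. open(path, "rb").read(4).
utf8_sample = codecs.BOM_UTF8 + b"DATE;PRICE\n"
utf16_sample = "DATE;PRICE\n".encode("utf-16")  # this codec prepends a BOM
no_bom_sample = b"DATE;PRICE\n"

print(guess_bom_encoding(utf8_sample))    # utf-8-sig
print(guess_bom_encoding(utf16_sample))   # utf-16
print(guess_bom_encoding(no_bom_sample))  # None
```

The returned name can be passed as the encoding argument of read_csv. A result of None only means that no BOM was found, not that the file is plain UTF-8.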
