Problem description:

I need to read an XML file as a string (not parse it). The problem is that it is all in Cyrillic, and I have failed to read (or at least print) the string correctly.

My attempts:

with open(path, "r") as myfile:
    return myfile.read().replace('\n', '')

with open(path, "r") as myfile:
    return unicode(myfile.read().replace('\n', ''), encoding='utf8')

Both work, and I have been able to operate on the string in the first case, but I still cannot print it.
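A minimal sketch of reading the file directly as decoded text, assuming the file is UTF-8 encoded. It uses `io.open`, which exists in both Python 2.6+ and Python 3 and takes an `encoding` argument, so the result is already a unicode/text string and no separate `unicode(...)` call is needed. The function name `read_xml_as_text` and the sample file are hypothetical, for illustration only.

```python
import io
import os
import tempfile


def read_xml_as_text(path, encoding="utf-8"):
    """Read a whole file as a text string (no XML parsing), dropping newlines.

    io.open decodes the bytes with the given encoding, so the return value
    is a unicode string in Python 2 and a str in Python 3.
    """
    with io.open(path, "r", encoding=encoding) as myfile:
        return myfile.read().replace(u"\n", u"")


# Demo: write a small UTF-8 encoded XML file containing Cyrillic text.
sample = u'<doc>\n<p id="p755">После Смоленска Наполеон</p>\n</doc>\n'
fd, tmp_path = tempfile.mkstemp(suffix=".xml")
os.close(fd)  # reopen by path with an explicit encoding below
with io.open(tmp_path, "w", encoding="utf-8") as f:
    f.write(sample)

text = read_xml_as_text(tmp_path)
os.remove(tmp_path)
```

If the file uses a different encoding (for example `cp1251`, common for Cyrillic on Windows), passing that name as `encoding` should be the only change needed.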

UPDATE

It looks like I was looking in the wrong direction with this problem: I use Jupyter notebooks, and the same issue occurs even in ordinary cases:

import re

text = '<p id="p755">После Смоленска Наполеон'
m = re.search('(?:<p.*>)(.*)', text)
if m:
    found = m.group(1)

found

'\xd0\x9f\xd0\xbe\xd1\x81\xd0\xbb\xd0\xb5 \xd0\xa1\xd0\xbc\xd0\xbe\xd0\xbb\xd0\xb5\xd0\xbd\xd1\x81\xd0\xba\xd0\xb0 \xd0\x9d\xd0\xb0\xd0\xbf\xd0\xbe\xd0\xbb\xd0\xb5\xd0\xbe\xd0\xbd'
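What the notebook shows here is most likely the `repr` of a Python 2 byte string: evaluating `found` as the last expression in a cell displays its repr, which escapes every non-ASCII byte. The bytes themselves appear intact, since they are exactly the UTF-8 encoding of the Cyrillic text. The sketch below (written to run on Python 3; the variable names are illustrative) checks that the escaped bytes decode back to the original phrase, and that the same regex applied to a unicode string returns readable text.

```python
import re

# The escaped bytes from the notebook output above, as a bytes literal.
raw = (b'\xd0\x9f\xd0\xbe\xd1\x81\xd0\xbb\xd0\xb5 '
       b'\xd0\xa1\xd0\xbc\xd0\xbe\xd0\xbb\xd0\xb5\xd0\xbd\xd1\x81\xd0\xba\xd0\xb0 '
       b'\xd0\x9d\xd0\xb0\xd0\xbf\xd0\xbe\xd0\xbb\xd0\xb5\xd0\xbe\xd0\xbd')

# Decoding as UTF-8 recovers the Cyrillic text: nothing was corrupted.
decoded = raw.decode("utf-8")

# Working on a unicode string from the start avoids byte escapes entirely.
text = u'<p id="p755">После Смоленска Наполеон'
m = re.search(u'(?:<p.*>)(.*)', text)
found = m.group(1) if m else None
```

In a Python 2 notebook, `print found` (or `print found.decode('utf-8')`) should display the Cyrillic characters, because `print` writes the text itself rather than its repr.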
