Question:

I am learning Python, and I am having trouble saving the output of a small function to a file. My Python function is the following:

#!/usr/local/bin/python

import subprocess
import codecs

airport = '/System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport'

def getAirportInfo():
    arguments = [airport, "--scan", "--xml"]
    execute = subprocess.Popen(arguments, stdout=subprocess.PIPE)
    out, err = execute.communicate()
    print out
    return out

airportInfo = getAirportInfo()

outFile = codecs.open('wifi-data.txt', 'w')
outFile.write(airportInfo)
outFile.close()

I guess that this would only work on a Mac, as it references some PrivateFrameworks.

Anyways, the code almost works as it should. The print statement prints a huge XML file that I'd like to store in a file for future processing. And here the problems start.

In the version above, the script executes without any errors. However, when I try to open the file, I get an error message about UTF-8 encoding. If I ignore it, the file opens and most things look fine, except for a couple of things:

  • Some SSIDs have non-ASCII characters, like ä, ö and ü. When printed on screen, they are shown as \xc3\xa4 and so on, which is the correct UTF-8 byte sequence. When I open the file, they are displayed incorrectly, as the usual random garbage.

  • Some of the XML values look like this when printed on screen: Data("\x00\x11WLAN-0024FE056185\x01\x08\x82\x84\x8b\x96\x0c\ … x10D\x00\x01\x02") but like this when read from the file: //8AAAAAAAAAAAAAAAAAAA==

I thought it could be an encoding error (the umlauts have problems, the error message says something about the UTF-8 encoding being messed up, and the text contains \x-style characters), and I tried looking here for possible solutions. However, no matter what I do, there are still errors:

  • adding an additional argument 'utf-8' to codecs.open yields

    UnicodeDecodeError: 'ascii' codec can't decode byte 0x9a in position 24227: ordinal not in range(128)

    and the generated file is empty.

  • explicitly encoding to utf-8 with outFile.write(airportInfo.encode('utf-8')) before saving results in the same error

  • not being an expert, I tried decoding it, in case I was doing the exact opposite of what was needed, but I got a UnicodeDecodeError: 'utf8' codec can't decode byte 0x8a in position 8980: invalid start byte

The only thing that worked (unsurprisingly) was writing the repr() of the string to the file, but that is not what I need, and I can't make a nice .plist out of a file full of escape sequences.

So please, can somebody help me? What am I missing?

If it helps, the type that gets saved in airportInfo is str (as in type(airportInfo) == str) and not u

Answer:

You don't need to re-encode when your text is already an encoded byte string (which is what a Python 2 str is). Just write it to a file with the built-in open(). It should just work.

In [1]: t = 'äïöú'

In [2]: with open('test.txt', 'w') as f:
   ...:     f.write(t)
   ...:

Additionally, you can make getAirportInfo simpler by using subprocess.check_output(). Also, mixed-case names should only be used for classes, not functions. See PEP 8.

import subprocess

def get_airport_info():
    args = ['/System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport', 
            '--scan', '--xml']
    return subprocess.check_output(args)

airport_info = get_airport_info()
with open('wifi-data.txt', 'w') as outf:
    outf.write(airport_info)
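
The same idea can be sketched so that it runs on any machine, not only a Mac. This is a minimal sketch written for Python 3 (it should behave the same on Python 2.7, though that is an assumption). The helper name run_and_save and the child process standing in for the airport call are hypothetical and only there for illustration. The point is that check_output returns raw bytes, and a file opened in 'wb' mode stores them without any encode or decode step.

```python
import os
import subprocess
import sys
import tempfile


def run_and_save(args, path):
    """Run a command and save its stdout to `path` as raw bytes (hypothetical helper)."""
    data = subprocess.check_output(args)  # bytes, exactly as the command emitted them
    with open(path, 'wb') as outf:        # binary mode: nothing is encoded or decoded
        outf.write(data)
    return data


# Stand-in for the airport call: a child Python process that writes an
# SSID containing the UTF-8 bytes for "ä" straight to its stdout.
child_code = r"import os; os.write(1, b'<string>W\xc3\xa4LAN</string>')"
demo_args = [sys.executable, '-c', child_code]

demo_path = os.path.join(tempfile.mkdtemp(), 'wifi-data.xml')
saved = run_and_save(demo_args, demo_path)

# Read the file back in binary mode to check that the bytes are unchanged.
with open(demo_path, 'rb') as f:
    round_trip = f.read()
```

To use it with the real tool, pass the args list from get_airport_info instead of demo_args.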

Answer:

I should have read this before my original answer: what's the difference between encode/decode? (python 2.x)

I always get confused between string and unicode conversion. On my Mac, import sys; sys.getfilesystemencoding() suggests that subprocess returns a 'utf-8' string, so I don't know why airportInfo.encode('utf-8') fails. Is it possible to do airportInfo.encode('utf-8', 'ignore') and throw out the invalid characters?
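
A likely explanation for the failing encode: in Python 2, calling .encode() on a str (which is already bytes) first decodes it implicitly with the ascii codec, which is why the error is a UnicodeDecodeError naming 'ascii'. The error handler therefore belongs on decode, not encode. Below is a minimal sketch written for Python 3, where bytes have no .encode() at all. The sample bytes are made up for illustration: valid UTF-8 followed by the 0x8a byte from the question's error message.

```python
# Made-up sample: UTF-8 for "WäLAN " followed by one byte that is not valid UTF-8.
raw = b'W\xc3\xa4LAN \x8a'

# Strict decoding (the default) refuses the invalid byte.
try:
    raw.decode('utf-8')
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc

# 'ignore' silently drops bytes that cannot be decoded.
dropped = raw.decode('utf-8', 'ignore')

# 'replace' keeps a visible marker (U+FFFD) where the bad byte was.
marked = raw.decode('utf-8', 'replace')
```

Either handler loses information, so this only makes sense for display. To store the scan output for later processing, keeping the original bytes is the safer choice.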

Also, have you tried writing your file as wb: outFile = codecs.open('wifi-data.txt', 'wb')? Doesn't 'w' open an ASCII file?

Regarding your text editor: it may handle unicode characters differently. If it reads a UTF-8 text file as ASCII, the non-ASCII characters may appear as a garbled mess. You might try naming the file .xml, in which case, depending on your text editor, it may be read correctly as unicode.
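
On the 'w' question: as far as I know, the mode does not pick an encoding in Python 2 on a Mac, so 'w' and 'wb' both write the bytes of a str unchanged. An encoding only comes into play when one is passed to codecs.open or io.open, and those then expect unicode text rather than bytes, which would explain the 'ascii' error after adding 'utf-8'. Here is a minimal sketch of the two consistent combinations, written for Python 3 with made-up sample data and file names:

```python
import io
import os
import tempfile

tmp_dir = tempfile.mkdtemp()

raw = b'W\xc3\xa4LAN'       # made-up sample: bytes as a subprocess would return them
text = raw.decode('utf-8')  # the same content as unicode text

# Combination 1: bytes go into a file opened in binary mode.
bytes_path = os.path.join(tmp_dir, 'bytes.txt')
with io.open(bytes_path, 'wb') as f:
    f.write(raw)

# Combination 2: unicode text goes into a file opened with an explicit encoding.
text_path = os.path.join(tmp_dir, 'text.txt')
with io.open(text_path, 'w', encoding='utf-8') as f:
    f.write(text)

# Both files end up with identical bytes on disk.
with io.open(bytes_path, 'rb') as f:
    bytes_on_disk = f.read()
with io.open(text_path, 'rb') as f:
    text_on_disk = f.read()
```

Mixing the two (bytes into an encoding-aware file, or text into a binary file) is what triggers the implicit conversions and the errors.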
