I am learning Python, and I am having trouble saving the output of a small function to a file. My Python function is the following:
import codecs
import subprocess

airport = '/System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport'

def getAirportInfo():
    arguments = [airport, "--scan", "--xml"]
    execute = subprocess.Popen(arguments, stdout=subprocess.PIPE)
    out, err = execute.communicate()
    print out
    return out

airportInfo = getAirportInfo()
outFile = codecs.open('wifi-data.txt', 'w')
outFile.write(airportInfo)
I guess that this would only work on a Mac, as it references some PrivateFrameworks.
Anyway, the code almost works as it should. The print statement prints a huge XML document that I'd like to store in a file for future processing. And here the problems start.
In the version above, the script executes without any errors. However, when I try to open the file, I get an error message along the lines of "error with utf-8 encoding". If I ignore this, the file opens and most things look fine, except for a couple of things:
Some SSIDs have non-ASCII characters, like ä, ö and ü. When printing those on the screen, they are correctly displayed as \xc3\xa4 and so on. When I open the file, they are displayed incorrectly, as the usual random garbage.
Some of the XML values look like this when printed on screen: Data("\x00\x11WLAN-0024FE056185\x01\x08\x82\x84\x8b\x96\x0c\ … x10D\x00\x01\x02") but like this when read from the file: //8AAAAAAAAAAAAAAAAAAA==
I thought it could be an encoding error (since the umlauts have problems, the error message says something about the utf-8 encoding being messed up, and the text contains \x-style escapes), and I tried looking here for possible solutions. However, no matter what I do, there are still errors:
Adding an additional argument 'utf-8' to codecs.open yields a UnicodeDecodeError: 'ascii' codec can't decode byte 0x9a in position 24227: ordinal not in range(128), and the generated file is empty.
Explicitly encoding to utf-8 with outFile.write(airportInfo.encode('utf-8')) before saving results in the same error.
Not being an expert, I tried decoding it instead, in case I was doing the exact opposite of what was needed, but I got a UnicodeDecodeError: 'utf8' codec can't decode byte 0x8a in position 8980: invalid start byte
The only thing that worked (unsurprisingly) was to write the repr() of the string to the file, but that is just not what I need, and I can't make a nice .plist out of a file full of escape sequences.
So please, please, can somebody help me? What am I missing?
If it helps, the type that gets saved in airportInfo is str (as in type(airportInfo) == str) and not unicode.
You don't need to re-encode the text when it is already an encoded byte string, which is what a Python 2 str is. Just write it to a file. It should just work.
t = 'äïöú'
with open('test.txt', 'w') as f:
    f.write(t)
Additionally, you can make getAirportInfo simpler by using subprocess.check_output(). Also, mixed case names should only be used for classes, not functions. See PEP8.
import subprocess

def get_airport_info():
    args = ['/System/Library/PrivateFrameworks/Apple80211.framework/Versions/Current/Resources/airport', '--scan', '--xml']
    return subprocess.check_output(args)

airport_info = get_airport_info()
with open('wifi-data.txt', 'w') as outf:
    outf.write(airport_info)
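For readers on Python 3, where subprocess.check_output() returns bytes rather than str, a minimal sketch of the same idea is to open the file in binary mode so that no encoding or decoding happens at all. The helper name save_command_output and the stand-in command are assumptions made for illustration; on a Mac, the airport argument list would take the place of demo_args.

```python
import os
import subprocess
import sys
import tempfile


def save_command_output(args, path):
    """Run a command and write its raw stdout bytes to path, unchanged."""
    data = subprocess.check_output(args)  # bytes on Python 3
    with open(path, 'wb') as outf:        # binary mode: no codec involved
        outf.write(data)
    return data


# Hypothetical stand-in command that emits bytes which are not valid UTF-8,
# similar in spirit to the binary fields in the scan output.
demo_args = [sys.executable, '-c',
             "import sys; sys.stdout.buffer.write(b'\\xc3\\xa4 \\x8a\\x00')"]
demo_path = os.path.join(tempfile.mkdtemp(), 'wifi-data.txt')
written = save_command_output(demo_args, demo_path)

# Reading the file back in binary mode returns exactly what was written.
with open(demo_path, 'rb') as inf:
    round_trip = inf.read()
```

Because the bytes are never interpreted as text, invalid UTF-8 sequences cannot raise an encoding error at write time. Whether a text editor can display the result afterwards is a separate question.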
I should have read this before my original answer: what's the difference between encode/decode? (python 2.x)
I always get confused between string and unicode conversion. On my Mac, import sys; sys.getfilesystemencoding() suggests that subprocess returns a 'utf-8' string - so I don't know why airportInfo.encode('utf-8') fails. Is it possible to do airportInfo.encode('utf-8', 'ignore') and throw out the invalid characters?
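To make the encode/decode distinction concrete: decode turns bytes into text, and encode turns text into bytes. In Python 2, calling .encode() on a str first decodes it implicitly as ASCII, which is most likely where the 'ascii' codec error in the question comes from. The sketch below uses Python 3, where the two types are kept strictly apart, and a made-up byte string (not real airport output) to show what the error handlers do on the decode step.

```python
# Made-up sample: valid UTF-8 text followed by stray binary bytes.
raw = b'SSID: caf\xc3\xa9 \x8a\x00'

# Strict decoding fails on the first byte that is not valid UTF-8.
try:
    raw.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# 'ignore' silently drops undecodable bytes; 'replace' marks each with U+FFFD.
ignored = raw.decode('utf-8', errors='ignore')
replaced = raw.decode('utf-8', errors='replace')
```

Note that both handlers lose information, so they only make sense if the invalid bytes are genuinely not needed later.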
Also - have you tried writing your file as wb: outFile = codecs.open('wifi-data.txt', 'wb') - doesn't 'w' open an ascii file?
Regarding your text editor - it may handle unicode characters differently. If it reads a unicode text file as ascii, the unicode characters may appear as a garbled mess. You might try giving the file an .xml extension, in which case, depending on your text editor, it may be read correctly as unicode.
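The garbling described here can be reproduced without an editor. The sketch below (Python 3) decodes the same stored bytes with two different codecs. Latin-1 is only an assumed example of a wrong guess, since the codec the editor actually falls back to is not known.

```python
text = 'ä ö ü'
stored = text.encode('utf-8')          # the bytes that end up in the file

as_utf8 = stored.decode('utf-8')       # reader guesses the encoding correctly
as_latin1 = stored.decode('latin-1')   # reader guesses wrong

print(as_utf8)    # ä ö ü
print(as_latin1)  # Ã¤ Ã¶ Ã¼
```

The bytes in the file are identical in both cases, and only the interpretation differs. This is why the file can be fine even when it looks broken in one particular viewer.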