Problem description:

I'm trying to decode a file that is encoded in IBM437 into readable UTF-8. I think I've almost got it, but I'm getting an ArgumentError because the string contains nul bytes. I know how to strip nul bytes using:

.gsub("\u0000", '')

but I can't figure out where to apply the gsub.

Here's the source:

def gather_info
  file = './lib/SETI_message.txt'
  File.read(file).each_line do |gather|
    packed = [gather].pack('b*')
    ec = Encoding::Converter.new(packed, 'utf-8')
    encoding_forced = packed.encode(ec)
    File.open('packed.txt', 'a+'){ |s| s.puts(encoding_forced.gsub("\u0000", '')) }
  end
end

gather_info

And here's the file

Can anyone tell me what I'm doing wrong here?

Answer:

The following works for me:

file = File.read('SETI.txt')
packed = file.scan(/......../).map{|s| s.to_i(2)}.pack('U*')
File.write('packed.txt', packed)

Let's break down file.scan(/......../).map{|s| s.to_i(2)}.pack('U*'):

  1. file.scan(/......../)

Here we break the huge string of 0s and 1s (the file) into an array of strings of 8 characters each. It looks like this: ['00001111', '11110000', ...].

  2. arr.map{|s| s.to_i(2)}

From step 1 we got an array of strings, each representing one character in binary notation. We can convert one of those strings (called s) with s.to_i(2): the argument 2 tells to_i to parse the string in base 2. So '00000011'.to_i(2) returns 3.

We apply this to all the characters by using map. So we now have an array that looks like [98, 82, 49, 39, ...].

  3. arr.pack('U*')

From step 2 we have an array of integers, each representing a character. We can now use the pack method to turn the array of integers into a string. The U directive tells pack to treat each integer as a Unicode code point and encode it as UTF-8.
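The three steps can be traced on a short input. This is only a sketch: the sample bit strings are made up for illustration and are not taken from the SETI file, and the regex /[01]{8}/ is a stricter stand-in for /......../ that assumes the input contains only the characters 0 and 1. The last two lines show a possible variant, assuming the bytes really are IBM437 as the question says: pack('C*') builds raw bytes, which are then labelled as IBM437 and transcoded to UTF-8. For values below 128 both approaches give the same result.

```ruby
# Hypothetical sample: the bits for "Hi!" as one continuous string of 0s and 1s.
bits = '010010000110100100100001'

chunks = bits.scan(/[01]{8}/)          # step 1: split into 8-character strings
codes  = chunks.map { |s| s.to_i(2) }  # step 2: parse each string as base 2
text   = codes.pack('U*')              # step 3: code points to a UTF-8 string

p chunks  # ["01001000", "01101001", "00100001"]
p codes   # [72, 105, 33]
puts text # Hi!

# Variant (an assumption, not part of the answer above): treat each integer
# as a raw IBM437 byte instead of a Unicode code point.
# Byte 128 (0x80) is C with cedilla (U+00C7) in code page 437.
ibm_bits = '10000000'
ibm_text = [ibm_bits.to_i(2)].pack('C*').force_encoding('IBM437').encode('UTF-8')
```

The variant only matters for byte values of 128 and above, where IBM437 and Unicode code points disagree.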
