Question:

I want to process one particular column and generate a word cloud from it. For example, suppose the column is named "names" and holds 5 records: "abc xyz", "abc qpr xyz", "qpr xyz", "xyz", and "abc qpr". What I am expecting is something like a tokenizer that gives me the following information: "abc" => 3, "qpr" => 3, "xyz" => 4, "abc xyz" => 1, "abc qpr xyz" => 1, "qpr xyz" => 2, "abc qpr" => 2. So I want to maintain frequencies not only for individual words but also for combinations of words.

Answer:

Suppose your CSV looks like this:

x,y,names,...
1,2,abc xyz,...
2,3,abc qpr xyz,...
3,4,qpr xyz,...
4,5,xyz,...
5,6,abc qpr,...

Here's one way to do it:

require 'csv'

CSV.foreach('data.csv', headers: true).with_object(Hash.new(0)) do |row, f|
  names = row['names']        # obtain names from csv row
  f[names] += 1               # increase counter for combined names
  names.split.each do |name|  # split names at whitespace
    f[name] += 1              # increase counter for single name
  end
end
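#=> {"abc xyz"=>1, "abc"=>3, "xyz"=>5, "abc qpr xyz"=>1, "qpr"=>3, "qpr xyz"=>1, "abc qpr"=>1}
# Note: "xyz" is 5 rather than 4 because the single-word row "xyz" is counted
# twice, once as the combined value and once as a single name.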
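
The counts expected in the question ("qpr xyz" => 2, "abc qpr" => 2, "xyz" => 4) suggest that every contiguous run of words should be counted, not only the whole cell and its single words. Here is a minimal sketch under that assumption. The helper name count_ngrams is hypothetical, and the values are given as a plain array here; in practice they could be collected from the CSV column, for example with CSV.foreach('data.csv', headers: true).map { |row| row['names'] }.

```ruby
# Hypothetical helper: counts every contiguous run of words (n-gram)
# found in each value, from single words up to the whole value.
def count_ngrams(values)
  values.each_with_object(Hash.new(0)) do |value, freq|
    words = value.to_s.split            # split at whitespace; nil becomes []
    (1..words.size).each do |n|         # n = length of the word run
      words.each_cons(n) do |gram|      # every contiguous run of n words
        freq[gram.join(' ')] += 1
      end
    end
  end
end

# Sample values taken from the "names" column in the question.
names = ['abc xyz', 'abc qpr xyz', 'qpr xyz', 'xyz', 'abc qpr']
freq = count_ngrams(names)

freq.each { |phrase, n| puts "#{phrase} => #{n}" }
```

Because a one-word value yields exactly one run, "xyz" is counted once per row here, and "abc xyz" is not counted for the row "abc qpr xyz" since the two words are not adjacent there.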

For customization, take a look at the documentation for the CSV library. It offers many options for the CSV format, header conversions, and so on.

http://ruby-doc.org/stdlib/libdoc/csv/rdoc/CSV.html

Answer:

Assuming

  • str is a String containing the whole file.
  • num is the zero-based index of the column you want.

To build a Hash that counts each distinct combination of names:

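count = Hash.new(0)
str.split("\n").each do |line|  # double quotes: '\n' is a literal backslash plus n
  cols = line.split(',')        # naive split; does not handle quoted commas
  count[cols[num]] += 1         # count the whole value of the chosen column
end
count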

Hash.new(0) creates a Hash (count) that returns 0 for any missing key, so each value's counter can simply be incremented by 1 every time it is found.
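
To illustrate the default value on its own, here is a small sketch with made-up sample words. It also shows Enumerable#tally, available since Ruby 2.7, as a possible shortcut for the same count; whether it fits depends on the Ruby version in use.

```ruby
count = Hash.new(0)             # missing keys read as 0

probe = count['missing']        # returns the default, 0
stored = count.key?('missing')  # false: reading does not store the key

words = %w[abc xyz abc]         # sample data, for illustration only
words.each { |word| count[word] += 1 }

tallied = words.tally           # Ruby 2.7+: same counts in a plain Hash

puts count.inspect
puts tallied.inspect
```

Note that the Hash returned by tally has no default value, so looking up a word that never occurred gives nil rather than 0.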
