Ruby hashes and arrays - questions

ruby
bestpractice

#1

Consider the following example:

=begin
Word frequencies study by Roy
=end

phrase = "some kind of magic of some kind"

txt = phrase.split(" ")

frequencies = Hash.new(0)

txt.each do |word|
    frequencies[word] += 1
end

txt = frequencies.sort
txt = txt.sort{|a, b| b[1] <=> a[1]} 

puts phrase

for i in 0 ... txt.size
  puts "#{txt[i][0]}, #{txt[i][1]}"
end
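About the two sorts: Ruby's `sort` and `sort_by` are not guaranteed to be stable. Sorting by word first and then by count therefore does not guarantee that words with equal counts stay in alphabetical order. Below is a minimal sketch that uses a single `sort_by` with a compound key on the same phrase (`ranked` is a name chosen here for illustration):

```ruby
phrase = "some kind of magic of some kind"

# Count each word; Hash.new(0) makes missing keys start at 0
frequencies = Hash.new(0)
phrase.split(" ").each { |word| frequencies[word] += 1 }

# One sort with a compound key: highest count first, then alphabetical.
# Negating the count gives descending order while the word stays ascending.
ranked = frequencies.sort_by { |word, count| [-count, word] }

puts phrase
ranked.each { |word, count| puts "#{word}, #{count}" }
```

Arrays compare element by element, so ties on the count fall through to the word. This also replaces the index-based `for` loop with `each` and block destructuring, which is the more common Ruby style.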

Anyone who has completed the Ruby track will recognize this as coming from the Histogram lesson. The question I have centers around this line:

txt = frequencies.sort

I discovered that this line returns an array of [key, value] pairs, which I can capture in a variable.

 > txt
=> [["kind", 2], ["magic", 1], ["of", 2], ["some", 2]]

My question is simply, how legitimate is this? Should I be using some other method to create an array from a hash?
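On the question itself: `Hash#sort` comes from `Enumerable`. It does not expose an internal array. It builds a new Array of [key, value] pairs and sorts it, so capturing its result is ordinary Ruby. If you want the pairs without sorting, `Hash#to_a` does that. Here is a small sketch, assuming the same word counts as above:

```ruby
frequencies = { "some" => 2, "kind" => 2, "of" => 2, "magic" => 1 }

# to_a: an Array of [key, value] pairs, in the hash's insertion order
pairs = frequencies.to_a
p pairs         # [["some", 2], ["kind", 2], ["of", 2], ["magic", 1]]

# sort: the same pairs, sorted. Pairs compare element by element,
# so this orders by key (the word) first.
sorted_pairs = frequencies.sort
p sorted_pairs  # [["kind", 2], ["magic", 1], ["of", 2], ["some", 2]]

# Both results are new Arrays; changing them leaves the hash untouched
sorted_pairs.clear
p frequencies.size  # 4
```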


#2

I'm not sure I fully understand what you mean by "legitimate". I assume you understand how the hash and the array work together here, and that splitting the text into words is necessary to get the result. What I don't quite follow is the part about creating an array from a hash.

I have attached a link below. The example under "Creating Arrays" may be what you're looking for in terms of arrays of hashes. Again, I'm not certain this is what you're looking for, but I hope it helps.


Example:

    Array.new(4) { Hash.new } #=> [{}, {}, {}, {}]



    Class: Array
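This example builds an array of hashes (four separate empty hashes), which is different from turning one hash into an array. The block form matters here. With `Array.new(4, Hash.new)`, every slot would point to the same Hash object. The sketch below contrasts the two (`separate` and `shared` are illustrative names):

```ruby
# Block form: the block runs once per element, so each slot gets its own Hash
separate = Array.new(4) { Hash.new }

# Argument form: all four slots reference the SAME Hash object
shared = Array.new(4, Hash.new)

separate[0]["word"] = 1
shared[0]["word"] = 1

p separate.map(&:size)  # [1, 0, 0, 0]
p shared.map(&:size)    # [1, 1, 1, 1], one write shows up in every slot
```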

#3

A hash can be sorted, but it can't be stored that way, as far as I know. That's why I went with an array built from the hash (a shorter list with no repeated words). The array can be sorted and stored in that order.
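On current Ruby versions, a hash can in fact keep a sorted order. Since Ruby 1.9, a Hash preserves insertion order, and `Array#to_h` (Ruby 2.1+) turns sorted pairs back into a Hash. A minimal sketch, assuming the same counts:

```ruby
frequencies = { "some" => 2, "kind" => 2, "of" => 2, "magic" => 1 }

# Sort the pairs (count descending, then word), then rebuild a Hash.
# Because a Hash remembers insertion order, the result stays sorted.
sorted = frequencies.sort_by { |word, count| [-count, word] }.to_h

p sorted.keys   # ["kind", "of", "some", "magic"]
p sorted["of"]  # 2, lookups by key still work
```

Keeping the array is still reasonable if you mainly need positional access (first, last, top N). The hash form is convenient when you also want lookups by word.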


#4

In other words: is this best practice?


#5

Ah, I can't say. I don't have enough practice to know for sure. From what I can tell, that's best practice, but my word doesn't mean much in this situation. Sorry for wasting your time.


#6

Not at all. This is an open discussion, so anything can be tossed in and reviewed.


#7

A more functional approach that should have been in the first post.
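The functional version that post refers to is not included in the thread. The sketch below is a hypothetical illustration only, not the poster's code. It counts words without a manually mutated hash, assuming Ruby 2.7+ for `Array#tally`, with a `group_by` fallback for 2.4 to 2.6:

```ruby
phrase = "some kind of magic of some kind"

# Array#tally (Ruby 2.7+) builds the word => count Hash in one call
counts = phrase.split.tally

# Before 2.7: group identical words, then map each group to its size
counts_legacy = phrase.split.group_by(&:itself).transform_values(&:size)

ranked = counts.sort_by { |word, count| [-count, word] }
ranked.each { |word, count| puts "#{word}, #{count}" }
```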


#8

I went back and reviewed that lesson, which helped clarify the method somewhat. I've added some filtering (lowercasing and stripping punctuation) and refined the metrics.

Who's On First?

The code:

def word_freq(phrase)
    txt = phrase.downcase
    txt = txt.gsub(/([!?.,;])/,"").split(" ")
    
    wco = txt.length
    
    frequencies = Hash.new(0)

    txt.each do |word|
        frequencies[word] += 1
    end

    txt = frequencies.sort{ |a,b| b[1] <=> a[1] }

    wcf = txt.length  # number of distinct words

    puts phrase

    txt.each { |w, f| puts "#{w} #{f}" }

    wcu = txt.count {|w,f| f == 1 }

    return [ wco, wcf, wcu ]
end
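The function above both prints and computes. Below is a sketch of just the counting part, returning the three numbers in a hash. It assumes Ruby 2.7+ for `tally`. `word_stats` is a hypothetical name, and `String#delete` stands in for the `gsub`, stripping the same five punctuation characters:

```ruby
# Hypothetical helper: same three counts as word_freq, without printing
def word_stats(phrase)
  # Lowercase and strip the punctuation characters ! ? . , ;
  words = phrase.downcase.delete("!?.,;").split
  counts = words.tally  # word => number of occurrences

  {
    original: words.length,                    # every word, repeats included
    final:    counts.size,                     # distinct words
    uniques:  counts.count { |_w, c| c == 1 }  # words seen exactly once
  }
end

stats = word_stats("Who's on first? Who's on second.")
p stats[:uniques]  # 2 ("first" and "second")
```

Keeping output out of the function makes it easier to reuse or test the numbers on their own.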

phrase = "some kind of magic of some kind"

wco, wcf, wcu = word_freq(phrase)

puts "=" * 28
puts "Word Count Original: #{wco}"
puts "Word Count Final   : #{wcf}"
puts "Word Count Uniques : #{wcu}"

wco = wco * 1.0
wcf = wcf * 1.0

uto = wcu / wco
utf = wcu / wcf

puts "Uniques / Original : %.4f" % uto
puts "Uniques / Final    : %.4f" % utf
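For the ratios, `Integer#fdiv` performs floating-point division directly, which avoids the `* 1.0` conversion. Plain `1 / 7` is integer division and gives 0. The sketch below uses the counts the example phrase produces (7 words, 4 distinct, 1 unique):

```ruby
wco = 7  # word count of the original phrase
wcf = 4  # distinct words
wcu = 1  # words that appear exactly once

# Integer#fdiv returns a Float, so no "* 1.0" is needed
uto = wcu.fdiv(wco)
utf = wcu.fdiv(wcf)

puts "Uniques / Original : %.4f" % uto  # 0.1429
puts "Uniques / Final    : %.4f" % utf  # 0.2500
```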