A collection of miscellaneous descriptive-statistics methods that may be useful. These are generally not hooked up to the primary processing stream and must be called on an ad hoc basis.
Public class methods
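Counts the number of times a given token s occurs in text. Note that s is interpolated into a regular expression, so any regex metacharacters it contains (such as "." or "*") are treated as pattern syntax rather than literal characters.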
    # File cass/lib/cass/stats.rb, line 34
    def self.string_tokens(text, s)
      text.scan(/#{s}/).size
    end
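Because the token is interpolated into a regular expression, counts can differ from a literal substring count. The following is a minimal standalone sketch of that difference; `count_tokens` and `count_literal` are hypothetical helper names used only for illustration and are not part of Cass.

```ruby
# Hypothetical helpers for illustration only; not part of Cass.

# Mirrors string_tokens: s is interpolated into a regular expression.
def count_tokens(text, s)
  text.scan(/#{s}/).size
end

# Variant that escapes s first, so metacharacters match literally.
def count_literal(text, s)
  text.scan(/#{Regexp.escape(s)}/).size
end

text = "a.b axb a.b"
puts count_tokens(text, "a.b")   # "." matches any character, so "axb" counts too
puts count_literal(text, "a.b")  # only the literal string "a.b" is counted
```

If tokens may contain punctuation, escaping them with `Regexp.escape` before passing them in is one way to get literal counts.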
Takes a string as input and prints a list of all words encountered, sorted by frequency in descending order. An array of strings is also accepted and is joined with spaces before counting. Words are separated by whitespace; no additional processing is performed, so if you don't want special characters to be counted as part of a word, preprocess the string before calling this method. Arguments:
- text: the string to count token occurrences in.
- stopwords: optional path to a stopword file, with one word per line. Words in the file are excluded from the count.
- save: optional filename to save the results to. If nil, the results are printed to the screen.
    # File cass/lib/cass/stats.rb, line 18
    def self.word_count(text, stopwords=nil, save=nil)
      sw = {}
      text = text.join(" ") if text.class == Array
      # Load stopwords, one per line, if a file was given
      File.readlines(stopwords).each { |l| sw[l.strip] = 1 } unless stopwords.nil?
      words = text.split(/\s+/)
      counts = Hash.new(0)
      words.each { |w| counts[w] += 1 unless sw.key?(w) }
      # Sort by descending count and format each entry as "word: count"
      counts = counts.sort { |a,b| b[1] <=> a[1] }.map { |l| "#{l[0]}: #{l[1]}" }
      if save.nil?
        puts counts
      else
        # The block form closes the file once the results are written
        File.open(save, 'w') { |f| f.puts counts }
      end
    end
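The counting logic described above can be sketched as a standalone function that returns the sorted pairs instead of printing them. `word_frequencies` is a hypothetical name, not part of Cass, and two details are assumptions made for the sake of a self-contained example: stopwords are passed as an array rather than a file path, and ties are broken alphabetically so the output order is predictable.

```ruby
# Hypothetical helper for illustration only; not part of Cass.
# Returns [word, count] pairs sorted by descending count.
def word_frequencies(text, stopwords = [])
  # Accept an array of strings as well as a single string
  text = text.join(" ") if text.is_a?(Array)
  counts = Hash.new(0)
  # Split on whitespace only; punctuation stays attached to words
  text.split(/\s+/).each { |w| counts[w] += 1 unless stopwords.include?(w) }
  # Assumption: ties are broken alphabetically for a stable order
  counts.sort_by { |word, n| [-n, word] }
end

pairs = word_frequencies("the cat saw the dog and the bird", ["and"])
pairs.each { |word, n| puts "#{word}: #{n}" }
```

Because splitting is on whitespace alone, a string such as "dog, dog" would yield the two distinct words "dog," and "dog", which is why preprocessing is recommended when punctuation should not affect the counts.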