Word and number frequency, finding uniques

Finding uniques means returning a list of the items that occur exactly once in the original data. For practice, I took a crack at it.

Code
# study to find_only_uniques by Roy (mtf)
# https://discuss.codecademy.com/t/remove-duplicates-error-list-index-out-of-range-code-posted/33007/6

f = """
"You are old, Father William," the young man said, \
"And your hair has become very white; \
And yet you incessantly stand on your head— \
Do you think, at your age, it is right?" \
 \
"In my youth," Father William replied to his son, \
"I feared it might injure the brain; \
But now that I'm perfectly sure I have none, \
Why, I do it again and again." \
 \
"You are old," said the youth, "As I mentioned before, \
And have grown most uncommonly fat; \
Yet you turned a back-somersault in at the door— \
Pray, what is the reason of that?" \
 \
"In my youth," said the sage, as he shook his grey locks, \
"I kept all my limbs very supple \
By the use of this ointment—one shilling a box— \
Allow me to sell you a couple?" \
 \
"You are old," said the youth, "And your jaws are too weak \
For anything tougher than suet; \
Yet you finished the goose, with the bones and the beak— \
Pray, how did you manage to do it?" \
 \
"In my youth," said his father, "I took to the law, \
And argued each case with my wife; \
And the muscular strength which it gave to my jaw, \
Has lasted the rest of my life." \
 \
"You are old," said the youth, "one would hardly suppose \
That your eye was as steady as ever; \
Yet you balanced an eel on the end of your nose— \
What made you so awfully clever?" \
 \
"I have answered three questions, and that is enough," \
Said his father; "don't give yourself airs! \
Do you think I can listen all day to such stuff? \
Be off, or I'll kick you down stairs!" \
"""

def word_hash(x):
    y = x.split(' ') if isinstance(x, str) or isinstance(x, unicode) else x[:]
    hist = {}
    for k in set(y):
        hist[k] = 0
    return hist
    
def word_freq(x):
    y = x.split(' ') if isinstance(x, str) or isinstance(x, unicode) else x[:]
    freq_hash = word_hash(y)
    for word in y:
        freq_hash[word] += 1
    return freq_hash
    
def return_uniques(x):
    uniques = []
    for k in x:
        if x[k] == 1:
            uniques.append(k)
    return uniques

print return_uniques(word_freq(f))

Needs more refining and criticism…
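One direction that refinement could take, offered as a sketch rather than a definitive version: the zero-filled dictionary is not strictly needed, because `dict.get` can supply the zero the first time an item is seen. The names `count_items` and `only_uniques` are hypothetical, and the sketch assumes Python 3 (where the `unicode` type no longer exists), whereas the code above targets Python 2. It also calls `split()` with no argument, which collapses runs of whitespace instead of producing empty-string keys.

```python
def count_items(data):
    """Return a dict mapping each item to the number of times it occurs.

    Accepts a string (split on whitespace) or any iterable of hashable items.
    """
    items = data.split() if isinstance(data, str) else list(data)
    counts = {}
    for item in items:
        # dict.get returns 0 the first time an item is seen
        counts[item] = counts.get(item, 0) + 1
    return counts


def only_uniques(counts):
    """Return the keys whose count is exactly 1."""
    return [key for key, n in counts.items() if n == 1]


print(only_uniques(count_items("the cat saw the other cat")))
# ['saw', 'other']  (dicts keep insertion order on Python 3.7+)
```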

f = """
"You are old, Father William," the young man said, \
"And your hair has become very white; \
And yet you incessantly stand on your head— \
Do you think, at your age, it is right?" \
 \
"In my youth," Father William replied to his son, \
"I feared it might injure the brain; \
But now that I'm perfectly sure I have none, \
Why, I do it again and again." \
 \
"You are old," said the youth, "As I mentioned before, \
And have grown most uncommonly fat; \
Yet you turned a back-somersault in at the door— \
Pray, what is the reason of that?" \
 \
"In my youth," said the sage, as he shook his grey locks, \
"I kept all my limbs very supple \
By the use of this ointment—one shilling a box— \
Allow me to sell you a couple?" \
 \
"You are old," said the youth, "And your jaws are too weak \
For anything tougher than suet; \
Yet you finished the goose, with the bones and the beak— \
Pray, how did you manage to do it?" \
 \
"In my youth," said his father, "I took to the law, \
And argued each case with my wife; \
And the muscular strength which it gave to my jaw, \
Has lasted the rest of my life." \
 \
"You are old," said the youth, "one would hardly suppose \
That your eye was as steady as ever; \
Yet you balanced an eel on the end of your nose— \
What made you so awfully clever?" \
 \
"I have answered three questions, and that is enough," \
Said his father; "don't give yourself airs! \
Do you think I can listen all day to such stuff? \
Be off, or I'll kick you down stairs!" \
"""

def word_dict(x):
    y = x.split(' ') if isinstance(x, str) or isinstance(x, unicode) else x[:]
    hist = {}
    for k in set(y):
        hist[k] = 0
    return hist
    
def word_freq(x):
    y = x.split(' ') if isinstance(x, str) or isinstance(x, unicode) else x[:]
    freq_dict = word_dict(y)
    for word in y:
        freq_dict[word] += 1
    return freq_dict
    
def return_uniques(x):
    uniques = []
    for k in x:
        if x[k] == 1:
            uniques.append(k)
    return uniques

print return_uniques(word_freq(f))

This may have to move to the Corner Bar, since it is drifting off topic. The code above has been refined and refactored slightly:


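def word_freq(x):
    # Python 2: split a string on single spaces; otherwise copy the sequence
    y = x.split(' ') if isinstance(x, (str, unicode)) else x[:]
    # start every distinct item at zero, then tally each occurrence
    freq_dict = { k:0 for k in y }
    for k in y:
        freq_dict[k] += 1
    return freq_dict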
    
def return_uniques(x):
    uniques = []
    for k in x:
        if x[k] == 1:
            uniques.append(k)
    return uniques

#print return_uniques(word_freq(f))   # requires f = heredoc
#print word_freq(f)                   # requires f = heredoc
print return_uniques(word_freq([3,6,9,5,7,8,9,2,3,4,7,6,0,2,3,5,8,9]))
# prints [0, 4]
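For comparison, the standard library's `collections.Counter` does the same tallying in one call. The sketch below (Python 3 syntax; the function name `uniques_in_order` is hypothetical) also shows how "uniques" in this sense differs from plain de-duplication with `set`, which keeps one copy of every value instead of dropping the repeated ones.

```python
from collections import Counter


def uniques_in_order(items):
    """Return items that occur exactly once, in order of first appearance."""
    counts = Counter(items)
    return [item for item in items if counts[item] == 1]


numbers = [3, 6, 9, 5, 7, 8, 9, 2, 3, 4, 7, 6, 0, 2, 3, 5, 8, 9]

print(uniques_in_order(numbers))   # [4, 0]
print(sorted(set(numbers)))        # [0, 2, 3, 4, 5, 6, 7, 8, 9]
```

The first result comes out as `[4, 0]` because this version walks the input list in its original order instead of iterating over the dictionary keys.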

After some struggle, I finally got the punctuation removed from most of the keys.

Code
# study to find_only_uniques by Roy (mtf)
# https://discuss.codecademy.com/t/remove-duplicates-error-list-index-out-of-range-code-posted/33007/6

import re, string

f = """
"You are old, Father William," the young man said, \
"And your hair has become very white; \
And yet you incessantly stand on your head— \
Do you think, at your age, it is right?" \
 \
"In my youth," Father William replied to his son, \
"I feared it might injure the brain; \
But now that I'm perfectly sure I have none, \
Why, I do it again and again." \
 \
"You are old," said the youth, "As I mentioned before, \
And have grown most uncommonly fat; \
Yet you turned a back-somersault in at the door— \
Pray, what is the reason of that?" \
 \
"In my youth," said the sage, as he shook his grey locks, \
"I kept all my limbs very supple \
By the use of this ointment—one shilling a box— \
Allow me to sell you a couple?" \
 \
"You are old," said the youth, "And your jaws are too weak \
For anything tougher than suet; \
Yet you finished the goose, with the bones and the beak— \
Pray, how did you manage to do it?" \
 \
"In my youth," said his father, "I took to the law, \
And argued each case with my wife; \
And the muscular strength which it gave to my jaw, \
Has lasted the rest of my life." \
 \
"You are old," said the youth, "one would hardly suppose \
That your eye was as steady as ever; \
Yet you balanced an eel on the end of your nose— \
What made you so awfully clever?" \
 \
"I have answered three questions, and that is enough," \
Said his father; "don't give yourself airs! \
Do you think I can listen all day to such stuff? \
Be off, or I'll kick you down stairs!" \
"""

regex = re.compile('[%s]' % re.escape(string.punctuation))

def strip(s):
    return regex.sub('', s)
    
def word_freq(x):
    y = x.split(' ') if isinstance(x, str) or isinstance(x, unicode) else x[:]
    for k in range(len(y)):
        y[k] = strip(y[k])
    freq_dict = { k:0 for k in y }
    for k in y:
        freq_dict[k] += 1
    return freq_dict

def num_freq(x):
    y = x[:]
    freq_dict = { k:0 for k in y }
    for k in y:
        freq_dict[k] += 1
    return freq_dict

def return_uniques(x):
    uniques = []
    for k in x:
        if x[k] == 1:
            uniques.append(k)
    return uniques
    
def show_list(x):
    for v in x:
        print v
        
def show_dict(x):
    for k, v in sorted(x.items()):
        print '%s : %d' % (k,v)

def show_freq(x):
    u = [ (v,k) for k,v in x.iteritems() ]
    u.sort(reverse=True)
    for v, k in u:
        print '%s : %d' % (k,v)
        
#print return_uniques(word_freq(f))
#print word_freq(f)

#print return_uniques(num_freq([3,6,9,5,7,8,9,2,3,4,7,6,0,2,3,5,8,9]))
show_dict(num_freq([3,6,9,5,7,8,9,2,3,4,7,6,0,2,3,5,8,9]))
show_freq(num_freq([3,6,9,5,7,8,9,2,3,4,7,6,0,2,3,5,8,9]))
show_list(return_uniques(num_freq([3,6,9,5,7,8,9,2,3,4,7,6,0,2,3,5,8,9])))

show_freq(word_freq(f))
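To see why only "most" of the keys come out clean, it may help to run the same pattern on a few sample tokens. `string.punctuation` contains ASCII punctuation only, so the em dash (U+2014) in the poem is left alone, while apostrophes and hyphens inside words are removed along with the quotes and commas. This is a small demonstration in Python 3 syntax; the `samples` list is made up for illustration.

```python
import re
import string

# Same pattern as in the post: a character class of every ASCII punctuation mark.
regex = re.compile('[%s]' % re.escape(string.punctuation))


def strip(s):
    return regex.sub('', s)


samples = ['"You', 'William,"', "I'm", 'back-somersault', 'head\u2014']
for word in samples:
    print(repr(word), '->', repr(strip(word)))

# The em dash is not an ASCII punctuation mark, so the pattern cannot match it.
print('\u2014' in string.punctuation)   # False
```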

There are now two functions, one for number frequency and one for word frequency. I had to break them up so I could concentrate on string data in the filtering process (which is still not perfect).

import re, string

f = """
 above heredoc or any block of text
"""

regex = re.compile('[%s]' % re.escape(string.punctuation))

def strip(s):
    return regex.sub('', s)
    
def word_freq(x):
    y = x.split(' ') if isinstance(x, str) or isinstance(x, unicode) else x[:]
    for k in range(len(y)):
        y[k] = strip(y[k])
    freq_dict = { k:0 for k in y }
    for k in y:
        freq_dict[k] += 1
    return freq_dict

def num_freq(x):
    y = x[:]
    freq_dict = { k:0 for k in y }
    for k in y:
        freq_dict[k] += 1
    return freq_dict

def return_uniques(x):
    uniques = []
    for k in x:
        if x[k] == 1:
            uniques.append(k)
    return uniques
    
def show_list(x):
    for v in x:
        print v
        
def show_dict(x):
    for k, v in sorted(x.items()):
        print '%s : %d' % (k,v)

def show_freq(x):
    u = [ (v,k) for k,v in x.iteritems() ]
    u.sort(reverse=True)
    for v, k in u:
        print '%s : %d' % (k,v)
        
#print return_uniques(word_freq(f))
#print word_freq(f)

#print return_uniques(num_freq([3,6,9,5,7,8,9,2,3,4,7,6,0,2,3,5,8,9]))
show_dict(num_freq([3,6,9,5,7,8,9,2,3,4,7,6,0,2,3,5,8,9]))
show_freq(num_freq([3,6,9,5,7,8,9,2,3,4,7,6,0,2,3,5,8,9]))
show_list(return_uniques(num_freq([3,6,9,5,7,8,9,2,3,4,7,6,0,2,3,5,8,9])))

show_freq(word_freq(f))
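One way the filtering might be tightened, as a sketch under stated assumptions and not a finished solution: instead of splitting on single spaces and then deleting punctuation, pull out the words directly with a pattern. That sidesteps the empty-string keys that `split(' ')` produces wherever two spaces meet, and the em dashes that `string.punctuation` does not cover. The names `tokenize`, `word_counts` and `show_counts` are hypothetical, the code is Python 3, and lower-casing the text (so that "You" and "you" count together) is a design choice the original code does not make.

```python
import re
from collections import Counter

# Runs of ASCII letters, optionally joined by an apostrophe or hyphen,
# so "i'm" and "back-somersault" each stay one token.
WORD = re.compile(r"[a-z]+(?:['-][a-z]+)*")


def tokenize(text):
    """Return the lower-cased words in text, ignoring other characters."""
    return WORD.findall(text.lower())


def word_counts(text):
    """Count words case-insensitively."""
    return Counter(tokenize(text))


def show_counts(counts):
    """Print 'word : count', highest count first, ties in alphabetical order."""
    for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        print('%s : %d' % (word, n))


sample = '"You are old," said the youth, "and yet you stand on your head\u2014"'
show_counts(word_counts(sample))
```

Sorting on `(-count, word)` gives a stable, readable listing without building the intermediate list of `(v, k)` tuples.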