Counting Word Frequency in lists with grouping


#1

I have a list in which each item contains some text and the title of the section it belongs to.

Example:

lists = [['earth total surface area land Jellicle', 'first_section'],
         ['university earth surface pleasant Jellicle', 'first_section'],
         ['first university east france north', 'second_section'],
         ['first north university', 'second_section']]

I would like to get the 3 most common words in the text, grouped by section (first_section and second_section).

result:

[['first_section'], ['earth', 'surface', 'jellicle'],
 ['second_section'], ['first', 'university', 'north']]

I’m new to this and I couldn’t find anything on Google that could help me.


#2

How sophisticated does the program need to be? For example, here:

lists = [['earth total surface area land Jellicle', 'first_section']

Can we assume that the second element in each inner list is the section and that it’s always valid?

Can we use imports? And do you want to?

Is this an assignment?

Please answer those questions, then I can help you further.


#3
  1. The program can be of any complexity.
  2. The second element of each inner list is always a section, and it is always valid.
  3. You can use imports.
  4. This is part of an assignment at my organization.

#4

Python has the collections module:

https://docs.python.org/2/library/collections.html

It includes Counter, which you can decide to use. It really makes things a lot easier.
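To give a feel for what Counter does, here is a minimal sketch that counts the words in a single text string from the question. Splitting on whitespace and lowercasing are assumptions (the expected result shows 'jellicle' in lower case); the variable names are only illustrative.

```python
from collections import Counter

text = 'earth total surface area land Jellicle'

# Assumption: words are separated by whitespace and case is ignored.
word_counts = Counter(text.lower().split())

# A Counter behaves like a dict of word -> count; missing words count as 0.
print(word_counts['earth'])
print(word_counts['moon'])
```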

I would create a dictionary as an intermediate step, then loop over lists. Let’s call the dictionary d (I am lazy and that is not a good variable name, so please come up with something better).

The dictionary will hold section/Counter pairs: the section is the key and a Counter is the value.

If the section key doesn’t exist yet, add it to the dictionary and give it a Counter of all the words in that item’s text as its value.

Otherwise, add the new Counter to the one already stored under that key. Yes, you can just do Counter + Counter to merge the two counters.
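The steps above could be sketched roughly as follows. This is one possible reading of the description, not the only way to do it: the name counts_by_section is made up for illustration, and lowercasing the text is an assumption based on the expected result in the question.

```python
from collections import Counter

lists = [
    ['earth total surface area land Jellicle', 'first_section'],
    ['university earth surface pleasant Jellicle', 'first_section'],
    ['first university east france north', 'second_section'],
    ['first north university', 'second_section'],
]

# Hypothetical name: maps each section title to a Counter of its words.
counts_by_section = {}

for text, section in lists:
    # Assumption: split on whitespace and ignore case.
    words = Counter(text.lower().split())
    if section not in counts_by_section:
        # First time this section is seen: store its Counter.
        counts_by_section[section] = words
    else:
        # Section already present: merge with Counter + Counter.
        counts_by_section[section] = counts_by_section[section] + words

print(counts_by_section['first_section']['earth'])
```

A collections.defaultdict(Counter) combined with Counter.update would avoid the if/else, but the explicit version follows the description more closely.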

I am already doing too much of the logic for you, so please try to figure this out. If you need more help, post an updated version of your code. It’s your assignment; I am only helping.

Counter even supports most_common, which makes this a piece of cake.
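As a small illustration of most_common, the sketch below starts from per-section counters that are written out by hand (the counts are abbreviated from the example data, and the names section_counts and top_words are hypothetical) and keeps the three most frequent words of each section.

```python
from collections import Counter

# Hand-written counters for illustration only, abbreviated from the example data.
section_counts = {
    'first_section': Counter({'earth': 2, 'surface': 2, 'jellicle': 2, 'total': 1}),
    'second_section': Counter({'first': 2, 'university': 2, 'north': 2, 'east': 1}),
}

top_words = {}
for section, counter in section_counts.items():
    # most_common(3) returns up to three (word, count) pairs,
    # highest count first; keep only the words.
    top_words[section] = [word for word, _count in counter.most_common(3)]

print(top_words)
```

Note that when several words share the same count, which of them make the top 3 (and in what order) depends on how Counter breaks ties, so it is safer not to rely on a particular order among equal counts.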