Boggled...a little


Take a look at this, please. Can someone explain why, when I loop through pantry, I have to check whether the item is in pantry_counts, given that pantry_counts starts out as an empty dictionary?

pantry = ["apple", "orange", "grape", "apple", "orange", "apple", "tomato", "potato", "grape"]
pantry_counts = {}
for item in pantry:
    if item in pantry_counts:
        pantry_counts[item] += 1
    else:
        pantry_counts[item] = 1

Sorry if I sound stupid, but it's confusing me a little.




Your code is constructing a histogram or frequency table. If the item is not in the dictionary, then create the key and set its value to 1, otherwise add 1 to the value.
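To make that concrete, here is a minimal sketch of the same frequency table built three ways. The first loop is the one from the question with the else branch in place. The dict.get and collections.Counter versions are alternatives I am adding for comparison, and the names get_counts and counter_counts are made up for this example.

```python
from collections import Counter

pantry = ["apple", "orange", "grape", "apple", "orange", "apple", "tomato", "potato", "grape"]

# Explicit version: the dictionary starts empty, so the first time an item
# is seen its key has to be created before it can be incremented.
pantry_counts = {}
for item in pantry:
    if item in pantry_counts:
        pantry_counts[item] += 1
    else:
        pantry_counts[item] = 1

# Same result with dict.get, which returns 0 when the key is missing.
get_counts = {}
for item in pantry:
    get_counts[item] = get_counts.get(item, 0) + 1

# Counter from the standard library builds the same table in one call.
counter_counts = Counter(pantry)

print(pantry_counts)
```

Without the membership check (or one of the alternatives), pantry_counts[item] += 1 would raise a KeyError the first time each item appears, because there is no existing value to add to.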


Thanks a lot MTF. I understand it now :slight_smile:


Hey MTF,

Can you tell me whether the = False in this function applies only to the third argument (header_row), please?

def feature_counter(input_lst,index,header_row = False):

Or is it applied to all arguments? Also, does it have to go at the end, or can it go anywhere?


Yes, only the one it is directly assigned to. That is a default value, used when the third argument is left out of the call.

def feature_counter(input_lst, index, header_row = False):
    print (header_row)

feature_counter([],1)    # False


Awesome, thanks. Could I also do this:

def feature_counter(input_lst,header_row = False,index,input_str):


No, that raises a SyntaxError. We can have as many default parameters as we like, but they must all appear after the parameters without defaults in the parameter list.
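This is easy to check. The sketch below suggests that Python accepts several default parameters as long as they all come after the ones without defaults, and rejects the ordering from the question. The "USA" default and the placeholder body are assumptions made for this example, not part of the original function.

```python
# Several parameters can have defaults, as long as they all come after
# the parameters without defaults. The body is only a placeholder.
def feature_counter(input_lst, index, input_str="USA", header_row=False):
    return (index, input_str, header_row)

print(feature_counter([], 6))                   # (6, 'USA', False)
print(feature_counter([], 6, "Japan"))          # (6, 'Japan', False)
print(feature_counter([], 6, header_row=True))  # (6, 'USA', True)

# A parameter without a default placed after one with a default is rejected
# when the code is compiled, before anything runs.
bad_source = "def feature_counter(input_lst, header_row=False, index, input_str): pass"
try:
    compile(bad_source, "<example>", "exec")
    order_is_allowed = True
except SyntaxError:
    order_is_allowed = False

print(order_is_allowed)  # False
```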


Ok man, it does look a bit odd I have to say, lol…


Got there in the end. :slight_smile:

def feature_counter(input_lst,index, input_str, header_row = False):
    num_elt = 0
    if header_row == True:
        input_lst = input_lst[1:len(input_lst)]
    for indx in input_lst:
        if indx[index] == input_str:
            num_elt += 1
    return num_elt

num_of_us_movies = feature_counter(movie_data, 6, "USA", True)

Phew…took a little while to figure that one out :confused:


This can be simplified to,

    input_lst = input_lst[1:]

When we leave out the second index the slice takes everything to the right of and including the first index.
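A quick sketch to confirm the slice behaviour described above. The rows list is a made-up stand-in for the real data.

```python
# A made-up list with a header row followed by three data rows.
rows = [["header"], ["a"], ["b"], ["c"]]

with_end = rows[1:len(rows)]  # explicit end index
without_end = rows[1:]        # end index left out

# Both slices skip index 0 and keep everything after it.
print(with_end == without_end)  # True

# Slicing builds a new list, so the original still has its header.
print(len(rows), len(without_end))  # 4 3
```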

Not sure what your program is doing, or if it will work without raising some errors. Can you give it some more context?


It basically skips the header row (index 0) of a data set, if there is one, and returns a count of the rows whose value at a given index matches a given string. You just need to know whether or not the data set contains a header.


What does the data set look like?

[['movie_title', 'director_name', 'color', 'duration', 'actor_1_name', 'language', 'country', 'title_year'],
 ['Avatar', 'James Cameron', 'Color', '178', 'CCH Pounder', 'English', 'USA', '2009'],
 ["Pirates of the Caribbean: At World's End", 'Gore Verbinski', 'Color', '169', 'Johnny Depp', 'English', 'USA', '2007'],
 ['Spectre', 'Sam Mendes', 'Color', '148', 'Christoph Waltz', 'English', 'UK', '2015'],
 ['The Dark Knight Rises', 'Christopher Nolan', 'Color', '164', 'Tom Hardy', 'English', 'USA', '2012'],
 ['Star Wars: Episode VII - The Force Awakens', 'JJ Abrams', 'Color', '136', 'Harrison Ford', 'English', 'USA', '2015'].........

4933 altogether…


I was creating a function to grab statistical data from the data set. I ended up with:

def feature_counter(input_lst,index, input_str, header_row = False):
    num_elt = 0
    if header_row == True:
        input_lst = input_lst[1:len(input_lst)]
    for each in input_lst:
        if each[index] == input_str:
            num_elt = num_elt + 1
    return num_elt

def summary_statistics(input_lst):
    num_japan_films = feature_counter(input_lst,6, "Japan", True)
    num_color_films = feature_counter(input_lst,2, "Color", True)
    num_films_in_english = feature_counter(input_lst,5, "English", True)
    summary_dict = {"japan_films" : num_japan_films, "color_films" : num_color_films, "films_in_english" : num_films_in_english}
    return summary_dict

summary = summary_statistics(movie_data)

It works well :slight_smile: I could probably compress it, but I don't know how yet.


Feel free to leave it how it is (obviously), but:

I’d get rid of the header row from your list of films.
It’s a list of films. That’s not a film.

Additionally, those numbers 6, 2 and 5 are really mysterious and require reading somewhere else in the code to understand. It would be better if each film were a dictionary, so that you can ask for its country instead of its seventh value.

Dropping the header row lets you remove a parameter and two lines in feature_counter. All it does now is filter and count (super common operations), which might be more clearly expressed directly in those operations, for example:

sum(film['country'] == 'Japan' for film in films)

(True behaves like 1, so each film that meets the condition adds 1 to the sum, and False adds 0.)
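Here is a rough sketch of how that could look, using three of the rows posted earlier. Pairing each row with the header via zip is just one way to build the dictionaries, and the names header, rows and films are made up for this example.

```python
movie_data = [
    ['movie_title', 'director_name', 'color', 'duration', 'actor_1_name', 'language', 'country', 'title_year'],
    ['Avatar', 'James Cameron', 'Color', '178', 'CCH Pounder', 'English', 'USA', '2009'],
    ['Spectre', 'Sam Mendes', 'Color', '148', 'Christoph Waltz', 'English', 'UK', '2015'],
    ['The Dark Knight Rises', 'Christopher Nolan', 'Color', '164', 'Tom Hardy', 'English', 'USA', '2012'],
]

# Split the header off once, then pair every row with the column names.
header, *rows = movie_data
films = [dict(zip(header, row)) for row in rows]

# Each comparison is True or False; sum() treats them as 1 and 0.
japan_films = sum(film['country'] == 'Japan' for film in films)
usa_films = sum(film['country'] == 'USA' for film in films)
color_films = sum(film['color'] == 'Color' for film in films)

print(japan_films, usa_films, color_films)  # 0 2 3
```

With the header gone and the columns named, the counts read as what they mean, with no index numbers to look up.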


Static data doesn’t lend itself to statistics unless we can link something dynamic to it.


That’s awesome, thank you.


Perhaps I meant to say, grab data from the data set :joy::joy::joy:

