Dictionary Project: Converting Dictionary to sort by years

Ok, I cannot for the life of me understand how to reorganize an existing dictionary by one of the values in each sub-dictionary. The instructions say:

In addition to organizing the hurricanes in a dictionary with names as the key, you want to be able to organize the hurricanes by year.

Write a function that converts the current dictionary of hurricanes to a new dictionary, where the keys are years and the values are lists containing a dictionary for each hurricane that occurred in that year.

For example, the key 1932 would yield the value: [{'Name': 'Bahamas', 'Month': 'September', 'Year': 1932, 'Max Sustained Wind': 160, 'Areas Affected': ['The Bahamas', 'Northeastern United States'], 'Damage': 'Damages not recorded', 'Deaths': 16}, {'Name': 'Cuba II', 'Month': 'November', 'Year': 1932, 'Max Sustained Wind': 175, 'Areas Affected': ['Lesser Antilles', 'Jamaica', 'Cayman Islands', 'Cuba', 'The Bahamas', 'Bermuda'], 'Damage': 40000000.0, 'Deaths': 3103}] .

Test your function on your hurricane dictionary.

I’ve tried several things without any success. Here is my code up to that challenge:

# names of hurricanes
names = ['Cuba I', 'San Felipe II Okeechobee', 'Bahamas', 'Cuba II', 'CubaBrownsville', 'Tampico', 'Labor Day', 'New England', 'Carol', 'Janet', 'Carla', 'Hattie', 'Beulah', 'Camille', 'Edith', 'Anita', 'David', 'Allen', 'Gilbert', 'Hugo', 'Andrew', 'Mitch', 'Isabel', 'Ivan', 'Emily', 'Katrina', 'Rita', 'Wilma', 'Dean', 'Felix', 'Matthew', 'Irma', 'Maria', 'Michael']

# months of hurricanes
months = ['October', 'September', 'September', 'November', 'August', 'September', 'September', 'September', 'September', 'September', 'September', 'October', 'September', 'August', 'September', 'September', 'August', 'August', 'September', 'September', 'August', 'October', 'September', 'September', 'July', 'August', 'September', 'October', 'August', 'September', 'October', 'September', 'September', 'October']

# years of hurricanes
years = [1924, 1928, 1932, 1932, 1933, 1933, 1935, 1938, 1953, 1955, 1961, 1961, 1967, 1969, 1971, 1977, 1979, 1980, 1988, 1989, 1992, 1998, 2003, 2004, 2005, 2005, 2005, 2005, 2007, 2007, 2016, 2017, 2017, 2018]

# maximum sustained winds (mph) of hurricanes
max_sustained_winds = [165, 160, 160, 175, 160, 160, 185, 160, 160, 175, 175, 160, 160, 175, 160, 175, 175, 190, 185, 160, 175, 180, 165, 165, 160, 175, 180, 185, 175, 175, 165, 180, 175, 160]

# areas affected by each hurricane
areas_affected = [['Central America', 'Mexico', 'Cuba', 'Florida', 'The Bahamas'], ['Lesser Antilles', 'The Bahamas', 'United States East Coast', 'Atlantic Canada'], ['The Bahamas', 'Northeastern United States'], ['Lesser Antilles', 'Jamaica', 'Cayman Islands', 'Cuba', 'The Bahamas', 'Bermuda'], ['The Bahamas', 'Cuba', 'Florida', 'Texas', 'Tamaulipas'], ['Jamaica', 'Yucatn Peninsula'], ['The Bahamas', 'Florida', 'Georgia', 'The Carolinas', 'Virginia'], ['Southeastern United States', 'Northeastern United States', 'Southwestern Quebec'], ['Bermuda', 'New England', 'Atlantic Canada'], ['Lesser Antilles', 'Central America'], ['Texas', 'Louisiana', 'Midwestern United States'], ['Central America'], ['The Caribbean', 'Mexico', 'Texas'], ['Cuba', 'United States Gulf Coast'], ['The Caribbean', 'Central America', 'Mexico', 'United States Gulf Coast'], ['Mexico'], ['The Caribbean', 'United States East coast'], ['The Caribbean', 'Yucatn Peninsula', 'Mexico', 'South Texas'], ['Jamaica', 'Venezuela', 'Central America', 'Hispaniola', 'Mexico'], ['The Caribbean', 'United States East Coast'], ['The Bahamas', 'Florida', 'United States Gulf Coast'], ['Central America', 'Yucatn Peninsula', 'South Florida'], ['Greater Antilles', 'Bahamas', 'Eastern United States', 'Ontario'], ['The Caribbean', 'Venezuela', 'United States Gulf Coast'], ['Windward Islands', 'Jamaica', 'Mexico', 'Texas'], ['Bahamas', 'United States Gulf Coast'], ['Cuba', 'United States Gulf Coast'], ['Greater Antilles', 'Central America', 'Florida'], ['The Caribbean', 'Central America'], ['Nicaragua', 'Honduras'], ['Antilles', 'Venezuela', 'Colombia', 'United States East Coast', 'Atlantic Canada'], ['Cape Verde', 'The Caribbean', 'British Virgin Islands', 'U.S. Virgin Islands', 'Cuba', 'Florida'], ['Lesser Antilles', 'Virgin Islands', 'Puerto Rico', 'Dominican Republic', 'Turks and Caicos Islands'], ['Central America', 'United States Gulf Coast (especially Florida Panhandle)']]

# damages (USD($)) of hurricanes
damages = ['Damages not recorded', '100M', 'Damages not recorded', '40M', '27.9M', '5M', 'Damages not recorded', '306M', '2M', '65.8M', '326M', '60.3M', '208M', '1.42B', '25.4M', 'Damages not recorded', '1.54B', '1.24B', '7.1B', '10B', '26.5B', '6.2B', '5.37B', '23.3B', '1.01B', '125B', '12B', '29.4B', '1.76B', '720M', '15.1B', '64.8B', '91.6B', '25.1B']

# deaths for each hurricane
deaths = [90,4000,16,3103,179,184,408,682,5,1023,43,319,688,259,37,11,2068,269,318,107,65,19325,51,124,17,1836,125,87,45,133,603,138,3057,74]

# function for converting damages strings to floats:
def damage_conversion(list_of_damages):
    updated_damage_values = []      # local list, so repeated calls don't keep appending to old results
    for damage_value in list_of_damages:
        if damage_value == 'Damages not recorded':
            updated_damage_values.append('Damages not recorded')
        elif damage_value[-1] == 'B':
            billion_conversion = float(damage_value[:-1]) * 1000000000
            updated_damage_values.append(billion_conversion)
        elif damage_value[-1] == 'M':
            million_conversion = float(damage_value[:-1]) * 1000000
            updated_damage_values.append(million_conversion)
    return updated_damage_values


updated_damage_values = damage_conversion(damages)
print(updated_damage_values)


# hurricane dictionary function:
def create_hurricane_dictionary(names_list, months_list, years_list, max_sustained_winds_list, areas_affected_list, updated_damage_values_list, deaths_list):
    zipped_hurricane_info = zip(names_list, months_list, years_list, max_sustained_winds_list, areas_affected_list, updated_damage_values_list, deaths_list)
    # list comprehension, \ for readability 
    hurricane_dictionary = [{name: {'Name': name, 'Month': month, 'Year': year, 'Max Sustained Winds': max_wind,\
    'Areas Affected': areas, 'Damages': damage, 'Deaths': death_count}}\
    for (name, month, year, max_wind, areas, damage, death_count) in zipped_hurricane_info]
    return hurricane_dictionary


hurricane_dictionary = create_hurricane_dictionary(names, months, years, max_sustained_winds, areas_affected, updated_damage_values, deaths)

print(hurricane_dictionary)

# write your construct hurricane by year dictionary function here:

# write your count affected areas function here:

# write your find most affected area function here:

# write your greatest number of deaths function here:

# write your categorize by mortality function here:

# write your greatest damage function here:

# write your categorize by damage function here:


The fact that we have a ‘dictionary with names as the key’ shows that we can build a dictionary from the same data source, using the same structure.

The earlier response was vague, I admit. It did not take into account the only unique piece of data we have is the name given to each storm. We’ve got nested and overlapping data all over the place. Given that, we need somewhere to start from, and I chose a list of dictionaries made from the seven lists we are given. Witness, as we transform that into a dictionary of storms,

db = zip(names, months, years, max_sustained_winds, areas_affected, damages, deaths)

dlu = []    # data lookup: list of storm dictionaries
nlu = {}    # name lookup

kys = ['name', 'month', 'year', 'max_sustained_winds', 'areas_affected', 'damages', 'deaths']

for row in db:                          # consume the db zip object
  dlu.append(dict(zip(kys, row)))
print (len(dlu))

for x in dlu:
  nlu[x['name']] = x

print (nlu['Michael'])
34
{'name': 'Michael', 'month': 'October', 'year': 2018, 'max_sustained_winds': 160, 'areas_affected': ['Central America', 'United States Gulf Coast (especially Florida Panhandle)'], 'damages': '25.1B', 'deaths': 74}

Of particular and important note: each storm dictionary has only been given a key in nlu. The dictionaries built from the data lists are stored by reference, not copied, and remain unchanged.

This is crucial once we begin to assemble lookups from month or year data. Those tables will not have many rows, but each row may hold several storm dictionaries, meaning the values will be lists (arrays). Keep this closely in mind when creating other representations, and don’t mutate your initial dictionaries. Can’t stress that enough. We want to reuse them.
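A minimal sketch of such a year lookup, assuming the list-of-dictionaries repository (dlu) with the lower-case keys used above. Only three storms and a few fields are copied in so the example runs on its own, and `build_year_lookup` is a hypothetical name. Every value is a list, so a year with more than one storm needs no special case, and the storm dictionaries are shared with dlu rather than copied or changed.

```python
# Excerpt of the repository: three storms, a few fields (values from the lists above).
dlu = [
    {'name': 'Bahamas', 'month': 'September', 'year': 1932, 'deaths': 16},
    {'name': 'Cuba II', 'month': 'November', 'year': 1932, 'deaths': 3103},
    {'name': 'Michael', 'month': 'October', 'year': 2018, 'deaths': 74},
]


def build_year_lookup(storms):
    """Return a dict mapping each year to a list of that year's storm dicts."""
    ylu = {}
    for storm in storms:
        # setdefault inserts an empty list the first time a year is seen,
        # then returns the list stored under that year so we can append to it
        ylu.setdefault(storm['year'], []).append(storm)
    return ylu


ylu = build_year_lookup(dlu)
print([storm['name'] for storm in ylu[1932]])  # ['Bahamas', 'Cuba II']
print(ylu[2018][0] is dlu[2])                  # True: the same object, not a copy
```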


From now on we should pretend that the original seven lists no longer exist. All our data is in the dlu and, with named keys, in the nlu. It’s too easy to fall back on those beginning lists. The challenge here is to work with the data objects we have assembled, and forget the rest.

Now that we have structured the data, we need to be able to query it. What storms happened in 2001? Which storm had the most damage? Which storm had the highest mortality? Which storm had the highest winds? You get the picture.
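As a sketch of what such queries can look like, assuming a name lookup shaped like nlu above: only four storms and a few fields are copied in, and `storms_in_year` and `top_storm_by` are hypothetical helper names. `max()` with a `key=` function picks the storm with the largest value in whichever numeric field we ask about.

```python
# Excerpt of a name lookup (values copied from the source lists).
nlu = {
    'Allen': {'name': 'Allen', 'year': 1980, 'max_sustained_winds': 190, 'deaths': 269},
    'Mitch': {'name': 'Mitch', 'year': 1998, 'max_sustained_winds': 180, 'deaths': 19325},
    'Rita': {'name': 'Rita', 'year': 2005, 'max_sustained_winds': 180, 'deaths': 125},
    'Wilma': {'name': 'Wilma', 'year': 2005, 'max_sustained_winds': 185, 'deaths': 87},
}


def storms_in_year(lookup, year):
    """Names of all storms recorded in the given year (possibly none)."""
    return [name for name, storm in lookup.items() if storm['year'] == year]


def top_storm_by(lookup, field):
    """Name of the storm with the largest value in a numeric field."""
    return max(lookup.values(), key=lambda storm: storm[field])['name']


print(storms_in_year(nlu, 2005))                 # ['Rita', 'Wilma']
print(storms_in_year(nlu, 2001))                 # []
print(top_storm_by(nlu, 'deaths'))               # Mitch
print(top_storm_by(nlu, 'max_sustained_winds'))  # Allen
```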


Ok, so I guess where I’m confused is that I thought I already had a dictionary with this code:

def create_hurricane_dictionary(names_list, months_list, years_list, max_sustained_winds_list, areas_affected_list, updated_damage_values_list, deaths_list):
    zipped_hurricane_info = zip(names_list, months_list, years_list, max_sustained_winds_list, areas_affected_list, updated_damage_values_list, deaths_list)
    # list comprehension, \ for readability 
    hurricane_dictionary = [{name: {'Name': name, 'Month': month, 'Year': year, 'Max Sustained Winds': max_wind,\
    'Areas Affected': areas, 'Damages': damage, 'Deaths': death_count}}\
    for (name, month, year, max_wind, areas, damage, death_count) in zipped_hurricane_info]
    return hurricane_dictionary


hurricane_dictionary = create_hurricane_dictionary(names, months, years, max_sustained_winds, areas_affected, updated_damage_values, deaths)

print(hurricane_dictionary)

But I guess what I really have is a list because of the square brackets in this line:

hurricane_dictionary = [{name: {'Name': name, 'Month': month, 'Year': year, 'Max Sustained Winds': max_wind,\
    'Areas Affected': areas, 'Damages': damage, 'Deaths': death_count}}\
    for (name, month, year, max_wind, areas, damage, death_count) in zipped_hurricane_info]

But if I change the square brackets to parentheses, that gives me a dictionary, correct? I tried changing that line of code in my function to this:

hurricane_dictionary = ({name: {'Name': name, 'Month': month, 'Year': year, 'Max Sustained Winds': max_wind,\
    'Areas Affected': areas, 'Damages': damage, 'Deaths': death_count}}\
    for (name, month, year, max_wind, areas, damage, death_count) in zipped_hurricane_info)

and then printing it to see if I indeed have a dictionary. But I still receive a list. Maybe I need to go back through the dictionary section again, but I thought there was a way to make a dictionary using zip and a list comprehension.
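The bracket type decides what a comprehension builds. A small sketch, using two hypothetical three-item lists in place of the full data: square brackets build a list, parentheses build a lazy generator (not a tuple or a dict), and only curly braces with a `key: value` pair build a dictionary.

```python
names = ['Bahamas', 'Cuba II', 'Michael']
years = [1932, 1932, 2018]

# [ ... ] -> a list of one-entry dictionaries
as_list = [{name: {'Name': name, 'Year': year}} for name, year in zip(names, years)]

# ( ... ) -> a generator that produces those same one-entry dictionaries on demand
as_generator = ({name: {'Name': name, 'Year': year}} for name, year in zip(names, years))

# { key: value ... } -> a single dictionary keyed by name
as_dict = {name: {'Name': name, 'Year': year} for name, year in zip(names, years)}

print(type(as_list).__name__)       # list
print(type(as_generator).__name__)  # generator
print(type(as_dict).__name__)       # dict
print(as_dict['Cuba II'])           # {'Name': 'Cuba II', 'Year': 1932}
```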

But if I go with your way, can I print the entire dictionary sorted by names? Maybe there’s not really a point to do that.
Also lets say I do that and then I want to sort by the year wouldn’t that method cause an error when sorting by year because some years have two hurricanes?
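On the sorting question: ordering storms by year and keying them by year are different operations. A sketch, assuming storm dictionaries with lower-case keys (four storms copied in, other fields left out): `sorted()` with a `key=` function handles ties without complaint, and because Python's sort is stable, storms from the same year keep their original relative order. Duplicate years only become a problem when a year is used as a dictionary key that holds a single storm.

```python
storms = [
    {'name': 'Michael', 'year': 2018},
    {'name': 'Bahamas', 'year': 1932},
    {'name': 'Irma', 'year': 2017},
    {'name': 'Cuba II', 'year': 1932},
]

# Sort by name: plain string comparison on the 'name' field.
by_name = sorted(storms, key=lambda storm: storm['name'])
print([s['name'] for s in by_name])  # ['Bahamas', 'Cuba II', 'Irma', 'Michael']

# Sort by year: ties (both 1932 storms) are fine and keep their input order.
by_year = sorted(storms, key=lambda storm: storm['year'])
print([(s['year'], s['name']) for s in by_year])
# [(1932, 'Bahamas'), (1932, 'Cuba II'), (2017, 'Irma'), (2018, 'Michael')]
```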

Careful. A lot might depend on the current configuration. We cannot just go changing things. Always go back to the foundations of how the structure comes into being; how is it constructed?

In our example we constructed a data array (the list of dictionaries) for only one purpose… To store the static data. It is the repository of our data ordered by event. As mentioned earlier, we can construct keyed lookup tables (dict) centered on any of the keys we have.

{
  'Cuba': {
    ...
  },
  ...,
  'Michael': {
    'name': 'Michael',
    'month': 'October', 
    'year': 2018, 
    'max_sustained_winds': 160,
    'areas_affected': [
      'Central America',
      'United States Gulf Coast (especially Florida Panhandle)'
    ],
    'damages': '25.1B',
    'deaths': 74
  }
}

That’s a table with name keys. There are six others we can use, paying close attention to the data structure within our main repository. Recall that it is not our intention to mutate the repository, only create data models from it.
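One way to build the other tables is a single helper that groups the repository by a chosen field. This is a sketch under assumptions: `build_lookup` is a hypothetical name, the two-storm excerpt keeps only a few fields, and a list-valued field such as areas_affected is treated as "one entry per area", so each storm is filed under every region it hit. The storm dictionaries themselves are only referenced, never modified.

```python
# Excerpt of the repository: two storms, a few fields (values from the lists above).
dlu = [
    {'name': 'Bahamas', 'month': 'September', 'year': 1932,
     'areas_affected': ['The Bahamas', 'Northeastern United States']},
    {'name': 'Cuba II', 'month': 'November', 'year': 1932,
     'areas_affected': ['Lesser Antilles', 'Jamaica', 'Cayman Islands',
                        'Cuba', 'The Bahamas', 'Bermuda']},
]


def build_lookup(storms, field):
    """Map each value of `field` to the list of storms that have that value."""
    lookup = {}
    for storm in storms:
        value = storm[field]
        # A list-valued field (e.g. areas_affected) files the storm under each entry.
        keys = value if isinstance(value, list) else [value]
        for key in keys:
            lookup.setdefault(key, []).append(storm)
    return lookup


mlu = build_lookup(dlu, 'month')           # month look up
alu = build_lookup(dlu, 'areas_affected')  # affected regions look up

print(list(mlu))                                # ['September', 'November']
print([s['name'] for s in alu['The Bahamas']])  # ['Bahamas', 'Cuba II']
print(len(alu))                                 # 7
```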


Slight contradiction…

While it is not our intention to mutate the repository, doing so is preferable to mutating the source data (the seven lists). In the case of a meltdown we can always go back to that data and trust it.

So in this case, better we mutate the repository. Case in point, '25.1B' becoming 25100000000. We’ve only interpreted the initial data and given it numeric form. That would be a justifiable mutation.

Let’s consider,

mag = {'M': 1000000, 'B': 1000000000}

for name, value in nlu.items():
  d = value['damages']
  if d == 'Damages not recorded': continue
  d = int(float(d[:-1]) * mag[d[-1]])
  nlu[name]['damages'] = d
  
print (nlu['Michael'])
{'name': 'Michael', ..., 'damages': 25100000000, 'deaths': 74}

Accessing the data is a piece of cake…

print ([value['damages'] for name, value in nlu.items()])

Furthermore, we can analyze it:

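# key= makes max() compare the damage figures; plain tuples would compare by name first
print (max(((name, value['damages']) for name, value in nlu.items() if not isinstance(value['damages'], str)), key=lambda pair: pair[1]))
('Katrina', 125000000000)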

Thank you for all your help, mtf, I really appreciate it! It’s helped me see the other ways of doing it, and that I also have a lot to work on. This is what I came up with.
There are a lot of comments; they helped me to visualize it. I think this way of creating a dictionary was easier for me to understand, so I went with it.

# function for converting damages strings to floats:
def damage_conversion(list_of_damages):
    updated_damages = []            # local list, rebuilt on every call
    for damage_value in list_of_damages:
        if damage_value == 'Damages not recorded':
            updated_damages.append('Damages not recorded')
        elif damage_value[-1] == 'B':
            billion_conversion = float(damage_value[:-1]) * 1000000000
            updated_damages.append(billion_conversion)
        elif damage_value[-1] == 'M':
            million_conversion = float(damage_value[:-1]) * 1000000
            updated_damages.append(million_conversion)
    return updated_damages


updated_damages = damage_conversion(damages)    # create_dictionary() below reads this list
print(updated_damages, '\n')


# hurricane dictionary function:
def create_dictionary():
    hurricanes = {}                                 # creates an empty dictionary, in the scope of the create_dictionary function
    for i in range(len(names)):                     # for each index in the range of the list names
        data = {}                                   # create an empty dictionary for the data
        data['Name'] = names[i]                     # in the dictionary data set 'Name' = to the name at the current index i (in this case) in the range
        data['Month'] = months[i]                   # in the dictionary data set 'Month' = to the month at the current index in the range
        data['Year'] = years[i]                     # in the dictionary data set 'Year' = to the year at the current index in the range
        data['Max Wind'] = max_sustained_winds[i]   # in the dictionary data set 'Max Wind' = to the max sustained wind at the current index in the range
        data['Areas Affected'] = areas_affected[i]  # in the dictionary data set 'Areas Affected' = to the area affected at the current index in the range
        data['Damages'] = updated_damages[i]        # in the dictionary data set 'Damages' = to the damages at the current index in the range
        data['Deaths'] = deaths[i]                  # in the dictionary data set 'Deaths' = to the number of deaths at the current index in the range
        hurricanes[names[i]] = data                 # in the empty dictionary hurricanes set each name at the current index in the range of the list of names = to the new dictionary called data
    return hurricanes                               # return the dictionary hurricanes

hurricanes = create_dictionary()                    # creates a dictionary called hurricanes by calling the function create_dictionary

print(hurricanes,'\n')                                   # prints the dictionary hurricanes

# organize hurricanes by year: each value is a list of that year's hurricane dictionaries
hurricanes_year = {}
for cane in hurricanes:
    current_cane = hurricanes[cane]
    year = current_cane['Year']
    if year not in hurricanes_year:
        hurricanes_year[year] = [current_cane]      # first storm of that year starts a new list
    else:
        hurricanes_year[year].append(current_cane)  # later storms join the same list
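Because 2005 alone has four storms (Emily, Katrina, Rita and Wilma), each year's value needs to be a list that grows by appending, not a pair. A sketch of the same grouping using `collections.defaultdict(list)` from the standard library; `hurricanes_by_year` is a hypothetical name, and only five storms with two fields are copied in, using the capitalized keys from create_dictionary() above.

```python
from collections import defaultdict

# Excerpt of the `hurricanes` dictionary: the four 2005 storms plus one other.
hurricanes = {
    'Emily': {'Name': 'Emily', 'Year': 2005},
    'Katrina': {'Name': 'Katrina', 'Year': 2005},
    'Rita': {'Name': 'Rita', 'Year': 2005},
    'Wilma': {'Name': 'Wilma', 'Year': 2005},
    'Dean': {'Name': 'Dean', 'Year': 2007},
}


def hurricanes_by_year(canes):
    """Group hurricane dicts into lists keyed by year."""
    by_year = defaultdict(list)   # a missing year starts out as an empty list
    for cane in canes.values():
        by_year[cane['Year']].append(cane)
    return dict(by_year)          # hand back a plain dict


grouped = hurricanes_by_year(hurricanes)
print([c['Name'] for c in grouped[2005]])  # ['Emily', 'Katrina', 'Rita', 'Wilma']
print(len(grouped[2007]))                  # 1
```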

I do have one other question in this project. Later on in the project I wrote this code:

def highest_death_count(dic):
    num_highest_deaths = 0
    highest_death_cane_name = ''
    for cane in dic:
        if dic[cane]['Deaths'] > num_highest_deaths:
            num_highest_deaths = dic[cane]['Deaths']
            highest_death_cane_name = dic[cane]['Name']
    return highest_death_cane_name, num_highest_deaths

greatest_death_count = highest_death_count(hurricanes)
print(greatest_death_count)
print('The hurricane with the highest death count is ' +  + ' with ')

For the return in the function, I return two things: highest_death_cane_name and num_highest_deaths. I was trying to print a string like the one that starts 'The hurricane with the highest death count is'. Is there a way to make highest_death_cane_name and num_highest_deaths global variables so that I can use them at specific places in a string like that?


Be aware that correctness bias can lead us down some strange paths.

maxk, maxv = None, 0
for n, v in nlu.items():
  if v['deaths'] > maxv: 
    maxk = n
    maxv = v['deaths']
print (nlu[maxk])
{'name': 'Mitch', 'month': 'October', 'year': 1998, 'max_sustained_winds': 180, 'areas_affected': ['Central America', 'Yucatn Peninsula', 'South Florida'], 'damages': 6200000000, 'deaths': 19325}
print (f"The hurricane with the greatest human loss was, '{maxk}', in {nlu[maxk]['month']}, {nlu[maxk]['year']}, recording a death toll of, {nlu[maxk]['deaths']}.")
The hurricane with the greatest human loss was, 'Mitch', in October, 1998, recording a death toll of, 19325.

Once we reduce everything down to the brass tacks, we get,

maxk, maxv = None, 0
for n, v in nlu.items():
  if v['deaths'] > maxv: 
    maxk = n
    maxv = v['deaths']
x = nlu[maxk]
print (f"The hurricane with the greatest human loss was, '{x['name']}', in {x['month']}, {x['year']}, recording a death toll of, {x['deaths']}.")

What do you mean by “correctness bias”?


It’s the tunnel vision we get when we are so certain we have the correct approach we are blind to possible alternatives or improvements. It’s a common trap we all get caught up in.

To answer your question, take a look at the return line of highest_death_count. Notice that it returns a tuple? Both values are passed back up to the calling scope together, packaged as one object, so there is no need to make them global.

greatest_death_count = highest_death_count(hurricanes)

This can be unpacked in the assignment. We know there are two returned values so we set up two variables to receive them…

cane_name, death_count = highest_death_count(hurricanes)

Now both values are exposed and can be interpolated into the output string.
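A minimal sketch of that unpacking, reusing the highest_death_count function from earlier in the thread with a two-storm stand-in for `hurricanes` (capitalized keys, only the fields the function reads). The returned tuple is unpacked into two ordinary names, which an f-string then drops into the sentence; no global variables are involved.

```python
def highest_death_count(dic):
    num_highest_deaths = 0
    highest_death_cane_name = ''
    for cane in dic:
        if dic[cane]['Deaths'] > num_highest_deaths:
            num_highest_deaths = dic[cane]['Deaths']
            highest_death_cane_name = dic[cane]['Name']
    return highest_death_cane_name, num_highest_deaths   # returned as one tuple


# Two-storm stand-in for the full hurricanes dictionary.
hurricanes = {
    'Mitch': {'Name': 'Mitch', 'Deaths': 19325},
    'Maria': {'Name': 'Maria', 'Deaths': 3057},
}

cane_name, death_count = highest_death_count(hurricanes)   # unpack the tuple
message = f'The hurricane with the highest death count is {cane_name} with {death_count} deaths.'
print(message)
# The hurricane with the highest death count is Mitch with 19325 deaths.
```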


Ooh I think I see. Do you mean there is a simpler way? Maybe my code is a bit too redundant? Like maybe I should be figuring out the highest death first in the function then be comparing it to the hurricanes?

Also, on the part with `cane_name, death_count = highest_death_count(hurricanes)`:

Doesn’t this just set both cane_name and death_count to the same thing?


The way your code runs is very much like mine, only a bit more verbose. I’m rather fond of symbols if one can easily determine what they mean or represent. It makes for more compact code that is easier to read, follow and understand. That’s me. Lots of people prefer verbose variable names for one reason or another. To me that just makes it harder to get a clear picture of the moving parts of the code.

Can it be simpler? I don’t see how. The only thing you missed was the packaged return, which we see now is easy to accommodate, once we understand what unpacking is.

c = (6, 7)       #  package
a, b = c         #  unpacking into respective variables
print (a * b)    #  42

As we see above, no. There are two values in the return package. Unpacking exposes them separately. It is akin to x[0] and x[1] if we are accessing the tuple by index.
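To make the comparison with indexing concrete, a short sketch reusing the `(6, 7)` package above: unpacking yields the same values as `c[0]` and `c[1]`, and the number of names on the left must match the number of items, otherwise Python raises `ValueError`.

```python
c = (6, 7)
a, b = c
print(a == c[0], b == c[1])   # True True

try:
    x, y, z = c               # three names, but only two values in the package
    unpack_error = None
except ValueError as err:
    unpack_error = err
print('ValueError:', unpack_error)
```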


Oh wow, that is very cool! I just tried it, and it simplified my code a bit, too. It’s a bit easier to read, I think. I’ll go through the rest of my project and see if I can come up with ways to make it better. Not using long variable names helps a lot, I think.

def highest_death_count(dic):
    name, deaths = '', 0
    for cane in dic:
        if dic[cane]['Deaths'] > deaths:
            deaths = dic[cane]['Deaths']
            name = dic[cane]['Name']
    return name, deaths

max_death_name, max_deaths = highest_death_count(hurricanes)
print('The hurricane with the highest death count is hurricane ' + max_death_name + ' with ' + str(max_deaths) + ' deaths.')

Unpacking can be applied here, too, if we have tuples in the mix.

for key, obj in dic.items():
    if obj['Deaths'] > deaths:
...

See my example above. This ties everything together with simpler constructs and logic. Spend some time playing with this to make it your own.


Aside

Just now noticed the capital letter on ‘Deaths’. Recommend against that. Leave your keys in lower case to reduce potential errors.
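If the dictionaries were already built with capitalized, space-separated keys, a small helper can normalize them in one pass. A sketch; `normalize_keys` is a hypothetical helper, and lower case with underscores simply mirrors the nlu keys used earlier in the thread. It returns a new dictionary and leaves the original untouched.

```python
def normalize_keys(record):
    """Return a new dict whose keys are lower case with underscores."""
    return {key.lower().replace(' ', '_'): value for key, value in record.items()}


storm = {'Name': 'Hugo', 'Max Wind': 160,
         'Areas Affected': ['The Caribbean', 'United States East Coast'], 'Deaths': 107}
print(normalize_keys(storm))
# {'name': 'Hugo', 'max_wind': 160, 'areas_affected': ['The Caribbean', 'United States East Coast'], 'deaths': 107}
```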


That was actually the next thing I was working on. I decided on n for names and v for values, like in the example above. I also fixed the capitalization on the keys!

def highest_death(dic):
    name, deaths = '', 0
    for n, v in dic.items():
        if v['deaths'] > deaths:
            name = n
            deaths = v['deaths']
    return name, deaths

max_death_name, max_deaths = highest_death(hurricanes)
print('The hurricane with the highest death count is hurricane ' + max_death_name + ' with ' + str(max_deaths) + ' deaths.')

Now the only thing to address is the names given to the datasets. We know I have a penchant for symbols, hence, dlu, nlu, ylu, mlu and alu. The lu means look up which means at a glance it is a data structure. One array (list of dict objects) and four dictionaries. Which one deserves the name, 'dic'? Answer: None of them. We know they are dictionaries, but more importantly they are data stores. Name them after what they store and your code will be easier still to read and decipher.

Symbols address this need a lot better than verbose variable names do, any day. My opinion, of course.

Code that is thoughtfully composed is largely self documenting if we know the inputs and expected outputs. The purpose is easy enough to put in a name. And, there is nothing stopping us from literally documenting…

dlu  =>  Data Look Up => list of storm dictionaries
nlu  =>  Name Look Up => dictionary of named storms
ylu  =>  Year Look Up
mlu  =>  Month Look Up
alu  =>  Affected regions Look Up

The latter three, we know, are only indexes.


