Hello everyone,
I am working my way through the Data Scientist Career Path and I am currently on the first non step-by-step project, which I am working on locally using Jupyter Notebooks:
Project: Hurricane Analysis
I spent almost half the day completing task #2 and task #3, but I finally succeeded in getting the required results, and I have to say, it felt fulfilling!
Despite that, I believe that my approach to the problem might not be the most efficient, although after a lot of googling it seems like the only way to achieve the desired result! Thus, I would like to share my code so far and ask if there is an alternative.
I have uploaded my code on my GitHub page, as I am working locally for this project.
I would appreciate any advice and feedback!
Hello!
For step 2 you can use a list comprehension (see "5. Data Structures" in the Python 3.10.2 documentation) and a conditional expression (see "6. Expressions" in the Python 3.10.2 documentation):
updated_damages = [
    x if x == 'Damages not recorded' else float(x[:-1]) * conversion[x[-1]]
    for x in damages
]
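To see the comprehension run on its own, here is a minimal sketch. The `conversion` dictionary and the sample `damages` values are assumptions made up for illustration (the real ones come from the project), with "M" taken to mean millions and "B" billions.

```python
# Assumed multipliers for the trailing letter: "M" = millions, "B" = billions.
conversion = {"M": 1_000_000, "B": 1_000_000_000}

# Made-up sample values, not the project's real data.
damages = ["Damages not recorded", "100M", "1.5B"]

updated_damages = [
    # Keep the placeholder string as-is; otherwise split "100M" into
    # the number part ("100") and the unit letter ("M") and multiply.
    x if x == "Damages not recorded" else float(x[:-1]) * conversion[x[-1]]
    for x in damages
]

print(updated_damages)
```

The slice `x[:-1]` is everything except the last character and `x[-1]` is the last character, so this only works while every recorded value ends in exactly one unit letter that exists in `conversion`.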
Step 3 is more complicated. To start, I create a list of dictionaries using zip (see "Built-in Functions" in the Python 3.10.2 documentation) and a list comprehension:
# There is probably a cleaner way to do this
non_indexed_list = [
    { "Name": x[0], "Month": x[1], "Year": x[2], "Max Sustained Wind": x[3], "Areas Affected": x[4], "Damage": x[5], "Deaths": x[6] }
    for x in zip(names, months, years, max_sustained_winds, areas_affected, updated_damages, deaths)
]
Next, I wrote two helper functions:
# For unique fields. Returns { field: dict }
def key_by(data: list, field: str) -> dict:
    return { x[field]: x for x in data }

# For grouping by non-unique field. Returns { field: [dict, dict, ...] }
def group_by(data: list, field: str) -> dict:
    grouped = {}
    for r in data:
        key = r[field]
        if key not in grouped:
            grouped[key] = []
        grouped[key].append(r)
    return grouped
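As an aside, the "create the list if the key is missing" step in group_by can also be handled by `collections.defaultdict` from the standard library. This is only a sketch of an alternative, not a replacement for the version above, and the sample records are made up for illustration.

```python
from collections import defaultdict

def group_by(data: list, field: str) -> dict:
    # defaultdict(list) creates an empty list the first time a key is seen,
    # so no explicit "if key not in grouped" check is needed.
    grouped = defaultdict(list)
    for r in data:
        grouped[r[field]].append(r)
    # Convert back to a plain dict so a missing key raises KeyError
    # instead of silently creating an empty list.
    return dict(grouped)

# Made-up sample records.
sample = [
    {"Name": "A", "Year": 1999},
    {"Name": "B", "Year": 1999},
    {"Name": "C", "Year": 2005},
]
by_year = group_by(sample, "Year")
print(by_year)
```

Both versions should behave the same for this use; which one reads better is mostly a matter of taste.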
And then I simply called the helpers with non_indexed_list:
hurricanes = key_by(non_indexed_list, "Name")
hurricanes_by_year = group_by(non_indexed_list, "Year")
Note that the values of hurricanes_by_year have to be lists even if there is only one hurricane in a given year.
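A small sketch of why the lists matter, using made-up records: if a non-unique field such as "Year" is passed to key_by, later records overwrite earlier ones, while group_by keeps all of them and always returns a list, so code that loops over the values does not need a special case for single-hurricane years.

```python
def key_by(data: list, field: str) -> dict:
    return {x[field]: x for x in data}

def group_by(data: list, field: str) -> dict:
    grouped = {}
    for r in data:
        key = r[field]
        if key not in grouped:
            grouped[key] = []
        grouped[key].append(r)
    return grouped

# Made-up sample records: two hurricanes share the year 1999.
sample = [
    {"Name": "A", "Year": 1999},
    {"Name": "B", "Year": 1999},
    {"Name": "C", "Year": 2005},
]

overwritten = key_by(sample, "Year")  # "A" is lost: "B" replaces it under 1999
grouped = group_by(sample, "Year")    # both "A" and "B" are kept under 1999

# Every value is a list, so the same loop works for every year.
names_per_year = {
    year: [r["Name"] for r in records]
    for year, records in grouped.items()
}
print(names_per_year)
```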
Hello @9509706156,
Thank you for taking the time to help!
You are living proof that it is one thing to just achieve the required result, and another to write efficient code altogether!
I would never have thought of such a succinct solution for step 2! I won't even comment on the 3rd!
Hello!
Two more ways to generate non_indexed_list:
With columns and a double zip:
columns = [ "Name", "Month", "Year", "Max Sustained Wind", "Areas Affected", "Damage", "Deaths" ]
non_indexed_list = [
    dict(zip(columns, x))
    for x in zip(names, months, years, max_sustained_winds, areas_affected, updated_damages, deaths)
    # The order of the lists in zip must match the order of the names in columns
]
And another one with a dict table:
table = {
    "Name": names,
    "Month": months,
    "Year": years,
    "Max Sustained Wind": max_sustained_winds,
    "Areas Affected": areas_affected,
    "Damage": updated_damages,
    "Deaths": deaths,
}
non_indexed_list = [
    dict(zip(table.keys(), x))
    for x in zip(*table.values())
]
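To make the `zip(*table.values())` step less magical, here is a minimal sketch with a made-up three-column table. The `*` unpacks the column lists into separate arguments, so zip turns columns into rows; since dicts keep insertion order (Python 3.7+), `table.keys()` and `table.values()` line up.

```python
# Made-up sample table: each key maps to one column of values.
table = {
    "Name": ["A", "B"],
    "Year": [1999, 2005],
    "Deaths": [10, 20],
}

# zip(*table.values()) is the same as zip(["A", "B"], [1999, 2005], [10, 20]),
# which yields one tuple per row.
rows = list(zip(*table.values()))

# Pair each row with the column names to get one dict per record.
records = [dict(zip(table.keys(), row)) for row in rows]

print(rows)
print(records)
```

Note that zip stops at the shortest column, so a column with a missing value would silently drop trailing rows.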