Complex JSON to CSV

Since the NHL season is about to start I am planning on downloading some stats and team info through the NHL API. However, I got stuck trying to get the first JSON response to a CSV for further analysis. I have Googled this quite a bit and the code for flattening the json/dict is from this post https://stackoverflow.com/questions/41180960/convert-nested-json-to-csv-file-in-python. However, the CSV that I get only has two cells. “copyright” (Cell A2) and “teams” (A3) so it doesnt flatten the dict as I want it to.

The desired result would be to like this (this is a shortened version):
id, name, link, venue_id, venue_name, venue_link…
1, New Jersey Devils, /api/v1/teams/1, Prudential Center, /api/v1/venues/null
2, New York Islanders, /api/v1/teams/2, 5026, Barclays Center, /api/v1/venues/5026
Notice how the first team did not have the “Venue.ID” which makes things even more difficult.

Here is the code that should work apart from not being able to have a CSV friendly dict.

import json
import csv
import requests

response = requests.get('https://statsapi.web.nhl.com/api/v1/teams')
json_data = response.json()

def get_leaves(item, key=None):
    if isinstance(item, dict):
        leaves = {}
        for i in item.keys():
            leaves.update(get_leaves(item[i], i))
        return leaves
    elif isinstance(item, list):
        leaves = {}
        for i in item:
            leaves.update(get_leaves(i, key))
        return leaves
    else:
        return {key : item}

# First parse all entries to get the complete fieldname list
fieldnames = set()

for entry in json_data:
    fieldnames.update(get_leaves(entry).keys())

with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=sorted(fieldnames))
    csv_output.writeheader()
    csv_output.writerows(get_leaves(entry) for entry in json_data)

If you want 1 row per team, a start would be to iterate through the teams

2 Likes

(type(teams_dict[‘teams’])) is a list and for example
type(teams_dict[‘teams’][1]) is a dict, so I have trouble looping through these two types of data structures and picking the fields I want.

I can get the “1st level” data such as team id, name etc out, but what about the inner data, for example timezone.id?
https://statsapi.web.nhl.com/api/v1/teams

I have been banging my head against this virtual wall for a better part of the week but havent gained much progress :confused:

dicts are collections of keyvalue pairs.
if you give a dict a key, it gives you the corresponding value back

mydict = {'a': 3}
print(mydict['a'])  # 3

If they’re nested then keep doing the same thing.

Thanks, but I am fully aware of that as I have done the Python 3 “basic course” here. In practice, if inside a list there are dicts as a value for the outer dicts the looping just isnt so easy. At least not for me.

Could you give an example how you would get any of the values from the inner dicts, for example venue.id?

Inside the dict you have dicts and lists. It’s the same thing, same operations. Nothing new.

If you’re able to obtain dicts from the outermost dict, then you have dicts, meaning you can obtain their values…and so on.

So I think we are both on the same boat here. Its easy to think its easy, but its not, and I assume you actually also have no idea how to get it done in practise :slight_smile: thanks anyway.

@leville,

I assure you with 100% certainty that @ionatan knows how to:

His point is that if you know how to access a value in a dictionary by it’s key, then you already posses the knowledge to access nested dictionary values. Here’s a hint if you don’t want to google it.

Hint
mydict = {
    'a': 3,
    'b': {
       'c': 4,
       'd': 5,
       'e': {
          'f': 6  
        }
    }


}
print(mydict['a'])  # 3
print(mydict['b']['e']['f']) # 6
1 Like

I’ve read through the topic and indeed @ionatan is correct about the fact that this JSON response is basically a dictionary with some keys being assigned value of other dictionaries and this goes a couple of “dimentios” down. (Although, I admit it was a bit of a confusing reply to read…:thinking:)
As for the moment, I have only completed Python3 course in 66%, so there could be a more optimized way to do this, but just using what I already learned I was able to accomplish all the described results.

First things first - I think there is no other way to build a code that works with JSON file from API without analyzing the structure of the reply. The structure of the data here and it’s formatting is very simple from python3 programmer’s point of view
we receive:

  1. a dictionary that consists of 2 keys (with names copyright and teams),
  2. the values of those keys both happen to also be dictionaries
  3. these have “inside” more specific keys and values such as team name, team id, api link but also keys of which value are other dictionaries e.g as the dictionary under key name venue
  4. And those, in turn, have their corresponding keys and values.
  • EDIT: If you’re not sure what do I mean, see the reply to your ‘‘timeZone’’ question in EDIT: TimeZone.

We’re interested in teams, so we look at the response[‘teams’].
It’s easy to see that some of the team information is available here, right away by just calling the dictionary with the appropriate key name, but the venue information, which we also want to retrieve selectively is included all together in the “venue” dictionary.
All we have to do is extract the values by using key names some if statements and just one loop.
I’ve written the code for you quickly (it was a nice exercise) and with extensive commentary so you should be able to clearly understand everything and be able to make changes (which you will probably want to make…).

For best readability copy and paste this code to your text editor to read it, it’s going to be way more comfortable, becuse of the lenght of the comment lines cause scroll bar on the bottom in the post… :confused:

import requests

###########################JSON TO PYTHON LST FOR NHL.COM API##############################
#======Loading API responce============================================================
response = requests.get('https://statsapi.web.nhl.com/api/v1/teams')
if response.status_code != 200: #checks if json was properly downloaded
	print("Promblem retreiving data from API, status code: "+response.status_code)
#=======intializing variables==========================================================
full_list = []
jsondata = response.json()	#selecting the actual data from JSON file
teams = jsondata['teams']	#choosing the 'teams' dictionary from the available data
for t in teams: #loading details of each team in teams data
#====== Initiating lists to hold read data ============================================
#i've used extra lists for easier debuging,
#in reality it can be done with just one list
	team_details = [] #will hold all wanted information about specific team
	venue_details = [] #will hold all wanted information about the specific team's venue
	complete_entry = [] #will hold compiled entry of info about the team and it's venue

#====== Creating Team Details List ====================================================
	if 'id' in t: #checks if there is an 'id' key in the current team data
		team_id = t['id'] #if it's found team_id is equal to the value of that team id
	else: #otherwise the team ID is set 'manually' to 0
		#print("No team id found, setting to 0")
		team_id = 0
	if 'name' in t: #checks for presence of 'team name', if not found set manually
		team_name = t['name']
	else:
		#print("No team name found, setting to Unknown")
		team_name = 'Unknown'
	if 'link' in t: #checks for presence of api link
		team_api  = t['link']
	else:
		#print("No team API found, setting to null")
		team_api = 'null'
	team_details.extend([team_id, team_name, team_api]) #combines the team details into a list
	#print(team_details)
#==========Team Details List Complete=========================================================

#====== Creating Venue Details List ==========================================================
	if 'venue' in t: #checks if there is venue entry for the team
		venue = t['venue'] #if there is we take those details into a variable
		if 'id' in venue: #checks venue details for ID
			venue_id = venue['id']
		else:
			#print("no venue id available, setting to 0")
			venue_id = 0
		if 'name' in venue: #checks venue details for name
			venue_name = venue['name']
		else:
			#print("Name not present, setting to Unknown")
			venue_name = 'Unknown'
		if 'link' in venue: #checks for api
			venue_link = venue['link']
		else:
			#print("no venue api, setting to null")
			venue_link = 'null'
		venue_details.extend([venue_id, venue_name, venue_link])
	else: #if there was no venue details at all it sets the complete details infromation to
		venue_details = [0,"No venue details found","No venue details found"]
	#print(venue_details)
#==========Venue Details List Complete====================================================	

#===========Building entry==============================================================
	complete_entry = team_details+venue_details
#===========Entry complete==============================================================

#==========Adding entry to the final list==================================================
	full_list.append(complete_entry)
#============FULL LIST COMPLETE===========================================================
#print(full_list)

#========QUERY EXAMPLE AND FORMAT==============================
#each team and it's venue information is at separate index
# each index has 6 sub index holding the information such as:
# 0 - team id, 1 - team name, 2 - team_api,
# 3 - venue_id, 4 - venue name, 5 - venue_api

#to print a complete information for a single team at index 0 (and leave an empty line under):
print(str(full_list[0])+".\n")
#to print team name and venue name of a team at index 10:
index10 = full_list[10]
print("Team "+index10[1]+", playing at home at "+index10[4]+".\n")


#to find team's venue name by typing some part of the team's name:
for x in full_list:
	team_src = 'Oil'#change str value to search for name of different team's venue, you can also drop an input command here instead
	current_team_name = x[1] #setting team's name to the currently loaded team's name
	if team_src in current_team_name: #checks if the search querry is in the current team name
		print("Found a match: "+current_team_name+ ". Searched phrase: "+team_src+".\n")
		print("Team's venu name: "+x[4]) # fith position holds the team name

#to call venue API by looking up venues name:
for x in full_list:
	venue_name = 'One'#change str value to search for api of different venue, you can drop an input here instead
	current_venue_name = x[4] #setting current venue name to the name currently loaded in the loop variable
	if venue_name in current_venue_name: #checks if the give venue name is included in the currently loaded venue name
		print("Found a match: "+current_venue_name)#if it finds a match it gives the information which name matched the querry
		print("Pulling out API: "+x[5]+"\n") #and prints the api which is at sixth position



#Feel free to make these into functions or whatever is more comfortable,
#I didn't bother, just wanted to show you the mechanics of worknig solutions

EDIT: timeZone

If you want to access the time zone ID or any other nested value you will need to apropriately extract it. To to this - again -we need to follow the reply structure.
Let’s see a whole respense for a single team using this code:

response = requests.get('https://statsapi.web.nhl.com/api/v1/teams')
jsondata = response.json()
only_teams_data = jsondata['teams']
our_example_team = only_teams_data[1]  #an example team, at index 1
print(our_example_team) 

The result:

{'id': 2, 'name': 'New York Islanders', 'link': '/api/v1/teams/2', 'venue': {'id': 5026, 'name': 'Barclays Center', 'link': '/api/v1/venues/5026', 'city': 'Brooklyn', 'timeZone': {'id': 'America/New_York', 'offset': -4, 'tz': 'EDT'}}, 'abbreviation': 'NYI', 'teamName': 'Islanders', 'locationName': 'New York', 'firstYearOfPlay': '1972', 'division': {'id': 18, 'name': 'Metropolitan', 'nameShort': 'Metro', 'link': '/api/v1/divisions/18', 'abbreviation': 'M'}, 'conference': {'id': 6, 'name': 'Eastern', 'link': '/api/v1/conferences/6'}, 'franchise': {'franchiseId': 22, 'teamName': 'Islanders', 'link': '/api/v1/franchises/22'}, 'shortName': 'NY Islanders', 'officialSiteUrl': 'http://www.newyorkislanders.com/', 'franchiseId': 22, 'active': True}

We can now see that we will need to acces the venue key which is directly in the team dictionary (you can recognize a dictionary by butterfly brackets { } ):
{[BEGINNING OF THE DATA] 'venue': {'id': 5026, 'name': 'Barclays Center', 'link': '/api/v1/venues/5026', 'city': 'Brooklyn', 'timeZone': {'id': 'America/New_York', 'offset': -4, 'tz': 'EDT'}}, [...REST OF THE DATA]}

We can assing that part of the responce to a variable using command:

venue_data = our_example_team['venue']

So now our venue_data holds the following data:
{'id': 5026, 'name': 'Barclays Center', 'link': '/api/v1/venues/5026', 'city': 'Brooklyn', 'timeZone': {'id': 'America/New_York', 'offset': -4, 'tz': 'EDT'}}

Our timeZone details are in a dictionary assigned as value of the timeZone key.
In order to acces the key we use the syntax:

dictionary_name['key-name']

Just like we did before to get the venue data.
In case of time zone code reads:

timezone_data = venue_data['timeZone']

Now our timezone_data variable holds:
{'id': 'America/New_York', 'offset': -4, 'tz': 'EDT'}

We’re home. Now we can use timezone[‘key-name’] to extract any information from this dictionary. For example:

timezone_name = timezone_data['id']
timezone_offset= timezone_data['offset']

Let’s print our extracted information:

print(timezone_name, timezone_offset)

The printed message will say:

America/New York -4

If you have any questions let me know!

EDIT Extracting more values:
If you’d like to extract any more values just retrieve its key parent value to a temporary dictionary and point it to the key name. Add another if statement to check if it’s key is present. I wrote If statements for each value to avoid errors in case some parts of the dataset were inconsistent or incomplete.

EDIT2 advanced: If you want to have a really flexible tool for looking into this kind of data you could probably build a function that iterates through all elements in the response JSON and checks for the number of received keys and records their names into a list and for each of them checks if the value is of the type dictionary
If it is a dictionary it’s name is recorded and then it’s key names are read and checked for the same condition. It also should check if the same key name was already recorded under that parent key name and only recorded new keys. All this would go on until the last key name in the last dictionary was recorded. All of these names (along with any other data recorded, e.g. corresponding keys datatype) are saved to a multi-dimensional list (with elements made of other lists). You can iterate a print of this list formatted in a very readable way and this would give you a complete understanding of the structure and have a full iterable keys reference that you can retrieve from any json.

1 Like

:heart_eyes:
Thank you so much @ljh1830181547. I will return your kindness on somebody else on this forum later.

I added a small piece of code to get the results on a CSV file and now I have a way to get the data I want. I am sure there is a better way to get the headers than typing them manually everytime I change which data is included, but that can wait as I really want to get this project moving forward ie. to start analyzing some actual results.

headers = ['team_id', 'team_name', 'team_api' ,'venue_id', 'venue_name', 'venue_link']

with open('191001-1-output.csv','w', newline='') as result_f:
    writer = csv.writer(result_f, dialect='excel')
    writer.writerow(headers)
    writer.writerows(full_list)
1 Like

No worries, my pleasure bro. :slight_smile:

I have also looked into other solutions, and you are 100% right that there is a better way.
It is called a recursive function.
I have been learning about it for a few hours already and honestly, it still feels so abstract…:confounded:
I understand the mechanics, but to apply it to the current example is difficult, not to say very.

So far I have managed to write a function that

  1. Reads dictionary keys
  2. Checks the values of those keys
  3. if the value is a list it converts it to a dictionary
  4. if the value is a dictionary it calls back on itself (that’s called recursion) providing newly discovered dictionary as an input, instead of the main dictionary
  5. if the value is a singular value it prints the key and the value

I’ve run it on an example dictionary I wrote by hand as it’s easier to see this way.

import requests
#======Loading API responce============================================================
response = requests.get('https://statsapi.web.nhl.com/api/v1/teams')
if response.status_code != 200: #checks if json was properly downloaded
	print("Promblem retreiving data from API, status code: "+response.status_code)
response = response.json()


#Example dictionary, small enough to understand what's up
mydic = {'disclaimer':'legal content', 'person':{'name':'baltazar', 'surname':'babuszko', 'nickname':'bladzioch',
 'favorite':{'foods':{'sovery':'mcdonalds', 'sweet':'candyflosk', 'spicy':'habaneros'}, 
 'car':'cheap skoda', 'boat':'the one on the lake', 'sweets':['sugar', 'sweetener', 'salt?']}}}
#print(mydic)


def l2d(dic): #LIST TO DICTIONARY CONVERTER -> INDICES BECOME KEYS
	if isinstance(dic, list):
		print("detected a list value, converting indexes to keys...")
		cdictionary = {}
		lindex = 0
		for each in dic: #takes every value in the list
			cdictionary.update({lindex:each}) #creates new dictionary entery, using index integers as dictionary keys
			lindex += 1 #updating list index to the next element
		print(cdictionary) #once complete debug print of the outcome dictionary
		return cdictionary	


def rf(dictionary, parent='none'):
	if isinstance(dictionary, dict): #if the received variable is a dictionary
		print("All keys available in the current dictionary at current depth: "+str(dictionary.keys())) #prints out its keys
		print("iterating keys values")
		for key, value in dictionary.items(): #cycles through all keys and corresponding values in the current dictionary
			if isinstance(value, list): #if value is a list, runs l2d converter and updates the value
				value = l2d(value)
			
			if isinstance(value, dict):# in case of every value that turns to be a dictionary as well
				print("***["+str(key)+"] key has a dictionary value, key's parent key: ["+str(parent)+"]" )
				print("opening ["+str(key)+"] dictionary to view keys") #prints notification
				rf(value, key) #calls itself (recursion) passing in arguments newly discovered dictionary and current parent key
			else:#if the keys value is not a dictionary, just print the value
				print("***["+str(key)+"] key, has a singular value: >>"+str(value)+"<< nested under parent key: ["+str(parent)+"]")

rf(mydic)
#rf(response)

If you plug your JSON result instead it will print out all keys and their values for each possible entry keeping their chronology and the parent key info. I recommend you first read mydic dictionary variable (or uncomment print(mydic)) then see function output on this dictionary, to understand how the function works, only then plug your JSON.

You could tweak this code to save the key values from the last else statement to a mono-dimensional list, notice how all the possible data runs on just three variables that you would need to retreive -key, value, parent, however, I am not sure how to make the code know, when information of separate entries is done and when it should move on to the next index…
Could have something to do with checking for change of the parent key… MAYBE.
It would be nice to get a real pro on the mic here. :smiley:
I’ll leave this tinkering for you. Let me know if you choose to follow this path at all and of course if you find results because I admit it is a difficult concept to implement and I am curious for the right solution too.

If you are interested in recursions, the best lesson I found so far on the subject is
MIT OpenCourseware - Recursion and Dictionaries
It has the “factorial of n” example, which every other recursion explanation uses, but 3 or 4 more, completely unique examples and overview of working with dictionaries and recursion.

Just found this, probably a better source to learn than anything else I found on the internet (couldn’t edit the last post anymore):
CodeCademy PRO - Learn Recursion with Python

Thanks for the link! I am sure that dictionaries would be the way to go because then you could just iterate through the keys. I think I will try to do the mini course tomorrow because that could potentially save a lot of time and blood pressure later.

I am still stuck with this. Thanks to @lael.j.h 's recursive function I am able to work with a nested json IF each key is unique (or I basically only want data for one team --> one data for in the CSV).

Because there are 31 teams I (think) I want to create a list of dictionaries that the code now produces and then use csv.DictWriter to create a CSV from there. But the problem is that for some reason the code does not produce the result I want. My guess is that I dont fully understand how the recursion part of the code works…

I checked the code with pythontutor.com visualizer and the problem seems to be that when dict2list() calls rf() it doesnt actually get the dictionary from rf(). I have no idea why.

I have attached here a shorter “two level” dictionary for two teams which makes debugging a lot easier.

nhl_dict = {'copyright': 'NHL and the NHL Shield are registered trademarks of the National Hockey League. NHL and NHL team marks are the property of the NHL and its teams. © NHL 2019. All Rights Reserved.', 
'teams': [
{'id': 1, 'name': 'New Jersey Devils', 'link': '/api/v1/teams/1',
 'venue': {'name': 'Prudential Center', 'link': '/api/v1/venues/null'}}, 
{'id': 2, 'name': 'New York Islanders', 'link': '/api/v1/teams/2', 
'venue': {'id': 5026, 'name': 'Barclays Center', 'link': '/api/v1/venues/5026'}}]}

def l2d(dic): #LIST TO DICTIONARY CONVERTER -> INDICES BECOME KEYS
    if isinstance(dic, list):
        print("detected a list value, converting indexes to keys...")
        cdictionary = {}
        lindex = 0
        for each in dic: #takes every value in the list
            cdictionary.update({lindex:each}) #creates new dictionary entery, using index integers as dictionary keys
            lindex += 1 #updating list index to the next element
            print(cdictionary) #once complete debug print of the outcome dictionary
    return cdictionary    


def rf(dictionary, parent='none', team_dict = {}): #NEW CODE - team_dict + teams_list instantiation
    
    if isinstance(dictionary, dict): #if the received variable is a dictionary
        #print("All keys available in the current dictionary at current depth: "+str(dictionary.keys())) #prints out its keys
        #print("iterating keys values")
        for key, value in dictionary.items(): #cycles through all keys and corresponding values in the current dictionary
            if isinstance(value, list): #if value is a list, runs l2d converter and updates the value
                value = l2d(value)
            
            if isinstance(value, dict):# in case of every value that turns to be a dictionary as well
                #print("***["+str(key)+"] key has a dictionary value, key's parent key: ["+str(parent)+"]" )
                #print("opening ["+str(key)+"] dictionary to view keys") #prints notification
                rf(value, key) #calls itself (recursion) passing in arguments newly discovered dictionary and current parent key
            else:#if the keys value is not a dictionary, just print the value
                #print("***["+str(key)+"] key, has a singular value: >>"+str(value)+"<< nested under parent key: ["+str(parent)+"]")
                
                if key in team_dict.keys():
                    key += str('-x') 
                    team_dict[key] = value  #NEW CODE - assign new key-value pairs to dict-to-be-returned
                else:
                    team_dict[key] = value
    return team_dict

#WORKS ONLY WITH THE NHL TEAM DATA. MAKES A LIST OF DICTIONARIES WHICH LATER IS USED TO WRITE THE CSV
def dict2list(dict):
    teams_list = []
    for team in nhl_dict['teams']:
        team_data = rf(team)
        teams_list.append(team_data)
    print(teams_list)
    
dict2list(nhl_dict)

This code is getting too complex for me to yet really understand it, but yet I would really want to get some data to analyze and to visualize because I am working my way through the “Analyze data with Python” path. Thanks for the help guys :slight_smile:

def rf( ... , team_dict = {})

You might not want to use a mutable value as a default parameter because that’s never going to reset to empty once you’ve populated it.

Not sure why you’d want that parameter at all, actually. But if you really did want it, and you probably don’t, then you’d set it to None, then check whether it’s None and if so set it to an empty dict. Don’t use it as a second return value if that’s what you’re doing.

if isinstance(dictionary, dict):

The rest of the function doesn’t need to be nested under this. You can instead:

if isinstance(fromjson, list):
    return l2d(fromjson)

# ... do the rest (not indented)

If you test for whether it’s a dict then it seems reasonable to also test for list/plain value and deal with those there.

This is really suspicious:

if
if
else
    if
    else

At the very least the first else if should be elif

if
if
elif
else

After changing only some indentation:

def rf(dictionary, parent='none', team_dict = {}):
    if not isinstance(dictionary, dict):
        return team_dict
    for key, value in dictionary.items():
        if isinstance(value, list):
            value = l2d(value)
        if isinstance(value, dict):
            rf(value, key)
        elif key in team_dict.keys():
            key += str('-x')
            team_dict[key] = value
        else:
            team_dict[key] = value
    return team_dict

Which I think is easier to look at and see what is going on. Next up would be to get rid of the team_dict parameter (use the return value instead), and parent isn’t used either. There are two checks for whether something is a dict, that probably only needs to happen once.

This is all just cleanup without paying any attention to what the code actually does, so it’s still nowhere closer to making anything “work”, obviously. But I would do these things before I even try to read anything.

As for the algorithm itself, you would flatten each child (by calling itself on each child), then join all children into one value to be returned.
Another way of doing it is to return a key-value pair each time a leaf (plain value) is encountered, and to propagate them upwards and finally collect them in a dict when the function has finished. Likely you would keep a list of all the current parent keys, so that when reaching a leaf you can concat the parent keys and use that as the key for the value.

But, considering that a team is a very small structure, there’s no real need for a generic solution, you could just as well extract each value through hard-coded lookups.

1 Like

Thanks for the comments ionatan!

I had a little brain fart because I couldn’t find a way to instantiate the dictionary inside the function because of recursion which would have set it empty on every recursion, so I just decided to use a default paramenter in case. :roll_eyes:
team_dict is now instantiated before the functions (ie. global scope).

Good points about the if, else horror. I havent tested your code yet but to me personally this

is not a natural way to think so I wont do that yet, but I will for sure use the elif and make the structure better.

I am a bit confused. Doesnt this part do exactly that?

if key in team_dict.keys():
                    key += str('-x') 
                    team_dict[key] = value  #NEW CODE - assign new key-value pairs to dict-to-be-returned
                else:
                    team_dict[key] = value

This is something I have thought of as well but as this little project has already taken far too long I now want to skip as much “fancyness” and just get this working.

If this was the case I would for sure do that.
Teams data however looks like this https://statsapi.web.nhl.com/api/v1/teams
And the match data I later intend to use is here https://statsapi.web.nhl.com/api/v1/game/ID/boxscore

Thanks again.

It works :heart_eyes::upside_down_face:

import requests
import csv

#======Loading API responce============================================================
response = requests.get('https://statsapi.web.nhl.com/api/v1/teams')
if response.status_code != 200: #checks if json was properly downloaded
    print("Promblem retreiving data from API, status code: "+response.status_code)
response = response.json()

team_dict = {}

def l2d(dic): #LIST TO DICTIONARY CONVERTER -> INDICES BECOME KEYS
    if isinstance(dic, list):
        print("detected a list value, converting indexes to keys...")
        cdictionary = {}
        lindex = 0
        for each in dic: #takes every value in the list
            cdictionary.update({lindex:each}) #creates new dictionary entery, using index integers as dictionary keys
            lindex += 1 #updating list index to the next element
            print(cdictionary) #once complete debug print of the outcome dictionary
    return cdictionary	


def rf(dictionary, parent='none'):
    global team_dict
    if not isinstance(dictionary, dict): #if the received variable is a dictionary
        return dictionary
    for key, value in dictionary.items(): #cycles through all keys and corresponding values in the current dictionary
        if isinstance(value, list): #if value is a list, runs l2d converter and updates the value
            value = l2d(value)
			
        elif isinstance(value, dict):# in case of every value that turns to be a dictionary as well
            #print("***["+str(key)+"] key has a dictionary value, key's parent key: ["+str(parent)+"]" )
            #print("opening ["+str(key)+"] dictionary to view keys") #prints notification
            rf(value, key) #calls itself (recursion) passing in arguments newly discovered dictionary and current parent key
        elif key in team_dict.keys():
            key += str('-x') 
            team_dict[key] = value  #NEW CODE - assign new key-value pairs to dict-to-be-returned
        else:
            team_dict[key] = value
            #teams_list.append(team_dict)
    return team_dict



#WORKS ONLY WITH THE NHL TEAM DATA. MAKES A LIST OF DICTIONARIES WHICH THEN IS USED TO WRITE THE CSV
def dict2list2csv(dict):
    teams_list = []
    for team in dict['teams']:
        team_data = rf(team)
        teams_list.append(team_data)
        global team_dict           #team_dict is a global variable so we need to make it global here as well
        team_dict = {}              #before emptying it for the next team
    print("teams_list: {}".format(teams_list))
       
    
    headers = list(teams_list[1].keys())

#====Writes the teams_list LIST to a CSV=====================================
    with open('team-data-2210-6.csv', 'w', newline='') as output:
        csvFile = csv.DictWriter(output, fieldnames = headers)
        csvFile.writeheader()
        for value in teams_list:
            csvFile.writerow(value)
 
dict2list2csv(response)

There is of course still room for improvement. At least:

  1. like @ionatan mentioned it would be better if instead of adding “-x” the name of an existing key (or perhaps any nested key) would include the name of its parent.

  2. I am not sure if my handling of the variables is the best… But that can also wait.

  3. If the headers list doesnt have all the keys every team/match (ie. row) has, the code doesnt work. Thats why I used
    headers = list(teams_list[1].keys()) instead of [0] in the example as New Jersey Devils don’t have a “Venue ID”. This one is something I might have to fix pretty soon when I start working with game data.

Now its time for part two which is working with the box score! :muscle:

That’s exactly what you shouldn’t be doing here. It was in effect global before. You don’t need it to be global, you don’t even need it to exist in any form.

Your function takes a json object and flattens it.

A team is a json object to be flattened. So you’d send it to your function, and get a flat version back. – there’s no need for a backchannel here.


And then the loop, it has this in it:

rf(..)  # changes global var
if this then that # also changes the same global var

Those two operations are both doing something with the same global variable. They are interleaved with each other because they’re in a loop, and further yet, rf also has this interweaving within it. This is insane. Multiple instances of your function is going to be in progress of manipulating that same value at the same time.

What you would do instead is:

Flatten each child:

children = []
for child in json:
    children.append(flatten(child))

Then merge them: (each child is flat at this point, so merging them is no longer difficult)

return union(children)

(What’s missing in this is that the children of lists, dicts, and values are extracted differently, and that’s something you would do as a separate step before looping through them)
(to include the keys, you would pass the key as a second argument, when a leaf (value) is reached it can couple the key with the value)


Regarding the size of the structure, there are only about 30 things in a team. Very hard-codeable!

Thanks for the comments again.
I don’t YET understand why what I did was so bad. Could you point me to an article or a link or something so in the future I will learn to make less insane solutions? (I have only done Python 3 and Recursion in Python so far)

Meanwhile, I will just hard code some game data (box score) to get on with this and get to know Pandas & Friends better.