Python Graph Search on Jupiter Notebook – How is this possible?

I run into something weird.

I have the habit of copying the exercises from my CS path into a Jupiter Notebook, so I have them for future reference and I can experiment with them more freely.

In the depth first graph search exercise, at the end, somehow the output I get on the notebook is different than the one I get on Codecademy, The one on CC is clearly the right one, because in this graph there is no connection between “lasers” and “bees” and yet the notebook prints that path out.

For reference, the output on CC is: ['crocodiles', 'piranhas', 'lava', 'sharks', 'bees'] as it should be

Could you link the lesson? It could be something simple like a lack of order in the sets but there’s no easy way to run a screenshot (code as text either via code hosting e.g. repl/gists etc. or with How do I format code in my posts?) is very helpful.

Yes, sorry, my bad.

I hope this works:

def dfs(graph, current_vertex, target_value, visited=None):
  if visited is None:
    visited = []
	
  visited.append(current_vertex)
  
  if current_vertex == target_value:
    return visited
	
  # Add your recursive case here:
  for neighbor in graph[current_vertex]:
    if neighbor not in visited:
      path = dfs(graph, neighbor, target_value, visited)
      if path:
        return path
      

the_most_dangerous_graph = {
    'lava': set(['sharks', 'piranhas']),
    'sharks': set(['lava', 'bees', 'lasers']),
    'piranhas': set(['lava', 'crocodiles']),
    'bees': set(['sharks']),
    'lasers': set(['sharks', 'crocodiles']),
    'crocodiles': set(['piranhas', 'lasers'])
  }

# Call dfs() below and print the result:
search = dfs(the_most_dangerous_graph, "crocodiles", "bees")
print(search)

And Link:
https://www.codecademy.com/paths/computer-science/tracks/cspath-cs-102/modules/graphs-and-graph-search/lessons/graph-search-python/exercises/graph-search-python-dfs-iii

1 Like

Cheers, that works perfectly. Yeah it’s just a consequence of the search you’re using, there’s a chance you’ll take paths that don’t actually go towards your goal. The bit that might be unexpected is the way the sets work, they’re hashed much like dictionaries (hence the constant time membership tests) but they’re not ordered, not even insertion ordered like the newer Python dicts are. This is what’s giving you different results, the order in which the neighbouring vertices are being traversed.

Technically there’s an underlying order from the hash itself but Python’s interpreter actually seeds the hash (at least for strings, not sure about everything else) with a random number each time you run the interpreter so the “order” of the set appears to change.

If you ran the code on the cc interpreter several times I think you’d get a different order each time. The Jupyter session on the other hand is a single python session so the seed would be exactly the same and the order would never change (until you restarted the kernel).


Edit: You can even test your code in the snippet below. Each time you run it you’ll probably get a different result (might need to run a few times to actually see the effect since there are only so many possibilities)-

def dfs(graph, current_vertex, target_value, visited=None): if visited is None: visited = [] visited.append(current_vertex) if current_vertex == target_value: return visited # Add your recursive case here: for neighbor in graph[current_vertex]: if neighbor not in visited: path = dfs(graph, neighbor, target_value, visited) if path: return path the_most_dangerous_graph = { 'lava': set(['sharks', 'piranhas']), 'sharks': set(['lava', 'bees', 'lasers']), 'piranhas': set(['lava', 'crocodiles']), 'bees': set(['sharks']), 'lasers': set(['sharks', 'crocodiles']), 'crocodiles': set(['piranhas', 'lasers']) } # Call dfs() below and print the result: search = dfs(the_most_dangerous_graph, "crocodiles", "bees") print(search)

If you were to change the dangerous_graph to something ordered like lists or tuples, e.g.

the_most_dangerous_graph = {
    'lava': ['sharks', 'piranhas'],
    'sharks': ['lava', 'bees', 'lasers'],
    'piranhas': ['lava', 'crocodiles'],
    'bees': ['sharks'],
    'lasers': ['sharks', 'crocodiles'],
    'crocodiles': ['piranhas', 'lasers'],
}

then you should get the same result each time.

1 Like

Thank you so much for the answer. I’m not sure I understand everything about it but at least it helps making sense of what’s going on. Maybe the main problem is that I’m not familiar with sets as a data structure.

My surprise what that the search is supposed to give you the path, and since “lasers” and “bees” are not connected in this graph, I could not understand why the path that the search outputs connects them, if this makes sense. Like, in my mind the order should not matter, because the algorithm should not be able to go from “lasers” to “bees,” no?

1 Like

It doesn’t directly jump from those two, the algorithm won’t find the most efficient path, it will often go along dead ends. At the moment there’s nothing to stop it adding nodes to the path which don’t lead to the final result.

I think path is included just to help you visualise the path it traces (in terms of nodes visited), consider how it works in the following animation- https://en.wikipedia.org/wiki/Depth-first_search#/media/File:Depth-First-Search.gif and the “path” might make more sense. Perhaps path_visited or order_visited would be a preferable name since it simply tells you the order in which nodes were visited, the full path of considered nodes would be a bit different and the most efficient path is different again.

If you understand how dictionary keys work in Python then you understand the basics of sets. I think the CS path introduces hashing at some point which would explain most of their implementation. One of the intermediate or advanced Python courses goes into the actual uses of sets (intersection, symmetric difference and so on) if you’re curious. It’s not an essential topic and you might prefer not to use them; in the right place though they can be very efficient and dramatically clearer than any alternatives.

1 Like

Thank you.

Yes, I guess a path_visited name would have been more clarifying. I thought that the search would give you the actual path, but it makes sense that it gives you all the visited nodes until it finds the target node.

I was thinking about doing the intermediate Python course after CS102 and I’ll definitely do that now. There are so many things that the basic course just touches without really explaining them in much depth.

1 Like

Question about Jupyter Notebook: in Task 4 of the Frida Project, it’s not remembering “paintings” from previous tasks, so returns a NameError message saying “paintings” is not defined. How do I avoid this?
Thanks for any help with this!