Python_dictionary_list_looping

Hello,
I need your help finding a way to calculate the distance between two points.
The data comes from a list called “unknown” and a dictionary called “movie_dataset”. I could not find a way to extract each list from the dictionary without two for loops, and because of the two loops the final code has too many iterations.
Please help me find another solution.
See my code below.

movie_dataset = {'Avatar': [0.01940156245995175, 0.4812286689419795, 0.9213483146067416], 
                 "Pirates of the Caribbean: At World's End": [0.02455894456664483, 0.45051194539249145, 0.898876404494382], 
                 'Spectre': [0.02005646812429373, 0.378839590443686, 0.9887640449438202], 
                 'The Dark Knight Rises': [0.020465784164507467, 0.4334470989761092, 0.9550561797752809], 
                 'John Carter': [0.021587310114693104, 0.3242320819112628, 0.9550561797752809], 
                 'Spider-Man 3': [0.021120689828849445, 0.4061433447098976, 0.898876404494382], 
                 'Tangled': [0.021284416244934937, 0.2150170648464164, 0.9325842696629213]}
unknown = [0.4, 0.2, 0.9]
for key, value in movie_dataset.items():
    distance = 0  # reset the running total for each movie
    for i in range(len(value)):
        distance += (value[i] - unknown[i])**2
    squared_root = distance ** 0.5  # take the root once, after all terms are summed
    print(key, squared_root)

I ran the same code and it printed one answer for each movie, seven in total. How many iterations do you want?

And out of curiosity, what are these points?

I wonder if what the OP meant is that, with 7 movies and 3 seemingly random values per movie, the inner for loop body is being executed 21 times…

I’ll second that sentiment, but tbh I can’t quite grasp what we’re trying to achieve here…

Any clarity on what those random data points are, and what the objective is, would certainly be helpful (to me at least)…

Firstly, my apologies for the terminology and wording. I have not yet developed my vocabulary in this domain.

I am trying to get the sum of the squared differences between each element of the list named unknown and the corresponding element of movie_dataset[key]. movie_dataset[key] is also a list, and to reach its elements I had to use an inner loop.
Due to the inner loop, the code is executed 21 times, and this is not the result I would like to have.
Can you suggest how to reach movie_dataset[key], for example “[0.01940156245995175, 0.4812286689419795, 0.9213483146067416]”, from the dictionary without using a second for loop?
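
For what it's worth, one way to avoid writing the second for loop by hand is to pair the two lists with zip, or to call math.dist (available from Python 3.8). The sketch below is only an illustration: the helper name euclidean is made up here, and the dictionary is cut down to two of the posted entries to keep it short. The three subtractions per movie still happen either way, they are just no longer an explicit nested loop.

```python
import math

# Two entries copied from the posted movie_dataset, to keep the sketch short.
movie_dataset = {
    'Avatar': [0.01940156245995175, 0.4812286689419795, 0.9213483146067416],
    'Tangled': [0.021284416244934937, 0.2150170648464164, 0.9325842696629213],
}
unknown = [0.4, 0.2, 0.9]


def euclidean(point_a, point_b):
    """Hypothetical helper: Euclidean distance between two equal-length lists."""
    # zip pairs up matching coordinates, so no index variable is needed.
    return sum((a - b) ** 2 for a, b in zip(point_a, point_b)) ** 0.5


# One distance per movie, stored by title.
distances = {title: euclidean(features, unknown)
             for title, features in movie_dataset.items()}

# The standard library computes the same quantity directly.
builtin_distances = {title: math.dist(features, unknown)
                     for title, features in movie_dataset.items()}

for title in movie_dataset:
    print(title, distances[title], builtin_distances[title])
```

With 7 movies and 3 features, 21 subtractions are needed however the code is written, so the iteration count itself is not a problem.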

It wasn’t so much the wording which was unclear.

The thing which has no context is the data; for example, what are the values in this list:
[0.01940156245995175, 0.4812286689419795, 0.9213483146067416]

To me, it’s a sequence of random numbers, with no context at all. I can follow what the program is doing - fetching those numbers from the dictionary, iterating over them to perform some calculation, and returning an output.

What I don’t have is the understanding of why we are doing this, which means that whilst I might be able to refactor your program in one (or more) ways I have no idea if it’s helpful or getting any closer to the objective because I have no concept of what that objective is.

(This could just be me being dense, and if so I apologise profusely.)

Edit: I compartmentalised your loops into a callable function, so I could compare the execution time (not in too precise a fashion) against an alternative method I thought of:

OP average time over 200 iterations: 0.0338 seconds.
My average time over 200 iterations: 0.0326 seconds.

Little appreciable gain.
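
The timing code itself was not posted, so the following is only a guess at what such a comparison could look like, using the standard timeit module. The function names nested_loops and with_zip are hypothetical, the dataset is shortened to three of the posted entries, and the timings will differ from the figures quoted above.

```python
import timeit

# Three entries copied from the posted movie_dataset.
movie_dataset = {
    'Avatar': [0.01940156245995175, 0.4812286689419795, 0.9213483146067416],
    'Spectre': [0.02005646812429373, 0.378839590443686, 0.9887640449438202],
    'Tangled': [0.021284416244934937, 0.2150170648464164, 0.9325842696629213],
}
unknown = [0.4, 0.2, 0.9]


def nested_loops(dataset, point):
    """Two explicit for loops, in the style of the original post."""
    results = {}
    for key, value in dataset.items():
        total = 0
        for i in range(len(value)):
            total += (value[i] - point[i]) ** 2
        results[key] = total ** 0.5
    return results


def with_zip(dataset, point):
    """Same calculation with the inner loop written as a generator over zip."""
    return {key: sum((v - p) ** 2 for v, p in zip(value, point)) ** 0.5
            for key, value in dataset.items()}


runs = 200
loop_time = timeit.timeit(lambda: nested_loops(movie_dataset, unknown), number=runs) / runs
zip_time = timeit.timeit(lambda: with_zip(movie_dataset, unknown), number=runs) / runs

print(f"nested loops: {loop_time:.8f} s per call")
print(f"zip version:  {zip_time:.8f} s per call")
```

Both versions do the same arithmetic, which would be consistent with seeing little difference between them.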

Hello,
I am practicing ML and have reached the K-nearest neighbors lesson.
In this lesson the input data is made available through an import (“from movies import movie_dataset, movie_labels”) and is used to calculate KNN. I wanted to download these data files to work on my local drive, but I could not find them, so I printed the data and copied the dataset from the terminal.
Because of their format, I saved them in a txt file and created a dictionary object called movie_dataset from that file.
I hope it is clear where this situation comes from.
In short, I am reproducing the steps from this lesson https://www.codecademy.com/courses/machine-learning/lessons/knn/exercises/find-neighbors in my local Jupyter notebook. Unfortunately, Codecademy imports the input from their internal library, so I had to transfer the data into a dictionary.
Finally, I got stuck on my second block of code, the classify method. I expected a distinct value for each movie label, but that did not happen: all movies have the same output.
Can you help me find out why the output is the same value for each title?
The argument for the classify function:
dataset.txt (6.9 KB)

# Function to calculate the distance between points
def distance(movie1, movie2):
    squared_distance = 0
    for value in movie1.values():
        for dist in range(len(movie2)):
            squared_distance += (value[dist] - movie2[dist])**2
    squared_root = squared_distance ** 0.5 
    return squared_root
print(distance(normalized, un_known))

# Function to find the nearest points; unknown is the data I want to classify, k is the number of neighbors
unknown =  [0.5, 0.2, 0.9]
def classify(unknown, dataset, k):
    distances = []
    for key in dataset:
        #print(dataset[key])
        distance_to_point = distance(dataset, unknown)
        #print(distance_to_point)
        distances.append([key, distance_to_point])
        distances.sort()
        neighbors = distances[0 : k]
    return neighbors
print(classify(un_known, movie_dataset, 5))
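
Reading the code above, a likely reason every title gets the same value is that classify passes the whole dataset to distance on each pass instead of dataset[key]. As a result, distance loops over every movie and returns the same total each time. Two smaller points: distances.sort() on [key, distance] pairs orders by title rather than by distance, and the sort and slice only need to run once, after the loop. A minimal sketch of one possible fix follows. It assumes distance should compare two plain lists of features, and it uses three of the posted entries as stand-in data because dataset.txt is not reproduced here.

```python
def distance(movie1, movie2):
    """Euclidean distance between two feature lists of equal length."""
    squared_distance = 0
    for i in range(len(movie1)):
        squared_distance += (movie1[i] - movie2[i]) ** 2
    return squared_distance ** 0.5


def classify(unknown, dataset, k):
    """Return the k entries of dataset closest to unknown as [distance, title] pairs."""
    distances = []
    for title in dataset:
        # Pass only this movie's features, not the whole dictionary.
        distance_to_point = distance(dataset[title], unknown)
        # Distance goes first so that sort() orders by distance, not by title.
        distances.append([distance_to_point, title])
    distances.sort()
    return distances[:k]


# Stand-in data: three entries copied from the dictionary posted earlier.
movie_dataset = {
    'Avatar': [0.01940156245995175, 0.4812286689419795, 0.9213483146067416],
    'Spectre': [0.02005646812429373, 0.378839590443686, 0.9887640449438202],
    'Tangled': [0.021284416244934937, 0.2150170648464164, 0.9325842696629213],
}
unknown = [0.5, 0.2, 0.9]

neighbors = classify(unknown, movie_dataset, 2)
print(neighbors)
```

Note that the posted code also calls the functions with un_known and normalized, which are not defined in the snippet, so it is worth checking that those names hold what you expect in your notebook.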