FAQ: The Data Science Process - Reproducibility and Automation

This community-built FAQ covers the “Reproducibility and Automation” exercise from the lesson “The Data Science Process”.

FAQs on the exercise Reproducibility and Automation

My question relates to the exercise “Reproducibility and Automation” from the “Introduction of Data Science Course”, to be found here

I am trying to code along with Anaconda in Jupyter Notebook but am getting an error in the last line in the attachment. Can someone please help me understand what I am doing wrong? Is the example from Codecademy wrong?

The line population_mean = np.mean(uscities["age"]) gives me: NameError: name 'np' is not defined.

Please find the attached screenshot. [Screenshot: Name Error]

Thank you,

You’re trying to use a name that hasn’t been assigned to anything. In this case it’s np.
Have you imported numpy and set np as the name to bind to? You’ll need import numpy as np at the start of your code to make use of it in this way.
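
As a minimal sketch of what that import does, the snippet below binds the name np and then uses it. The list of ages is made-up sample data standing in for the "age" column, not the exercise's dataset.

```python
import numpy as np  # binds the name np to the numpy module

# Hypothetical sample values standing in for the "age" column
ages = [23, 35, 41, 29]

# np is now defined, so this call no longer raises a NameError
population_mean = np.mean(ages)
print(population_mean)
```

In a notebook, the cell containing the import has to be run before any cell that uses np.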

Hello, thank you for your reply.

Sorry for my late reply, but I was diving into a Python course to get a better understanding.

Can you please check the screenshot I attached? I believe I have done exactly that, starting with the line “import numpy as np”.

It’s hard to say, but make sure you actually run each of those cells in order and ensure numpy has actually been imported. Since you’re using pandas you could probably skip the import for the most part anyway. It already wraps a lot of numpy methods, and you could call the .mean method directly on a column of your dataframe (it’s worth looking into).
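
A rough sketch of the pandas-only route is below. The DataFrame contents are invented for illustration; only the "age" column name is taken from the thread.

```python
import pandas as pd

# Hypothetical data; only the "age" column name comes from the thread
uscities = pd.DataFrame({"age": [23, 35, 41, 29]})

# Series.mean() gives the column average without importing numpy yourself
population_mean = uscities["age"].mean()
print(population_mean)
```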

Okay, what you wrote made sense. When I run all the lines again, it gives me a different error.

Can you please help me understand why, in the attached screenshot, uscities is not defined? I am importing the data in the line above it.

Also, the exercise requires me to do it this way. I have also attached the exercise as a screenshot.

That one has quite a clear error message. It cannot find the name uscities because it has not been assigned at any point in the code. You read your data in via a pandas function and assigned the return to the name user_data. Is that perhaps what you intended to work with?

The name of the file is no longer important once the data has been read and the file closed (which is all automated with pandas.read_csv). I’d suggest having a little look at the docs if you are unsure how it works.
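
To illustrate the point, here is a small sketch under some assumptions: the CSV text is made up, and an in-memory buffer stands in for the real file so the example runs on its own. What matters is that the name on the left of the assignment is the one to use afterwards.

```python
import io
import pandas as pd

# Hypothetical CSV contents; an in-memory buffer stands in for a file on disk
csv_text = "name,age\nA,23\nB,35\nC,41\nD,29\n"

# The DataFrame is bound to user_data; the source's name plays no further role
user_data = pd.read_csv(io.StringIO(csv_text))

# Later code has to refer to the assigned name
population_mean = user_data["age"].mean()
print(population_mean)
```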

Your question: "That one has quite a clear error message. It cannot find the name uscities because it has not been assigned at any point in the code. You read your data in via a pandas function and assigned the return to the name user_data. Is that perhaps what you intended to work with?"

My answer:
Please understand that the confusing part for me is why it works for Codecademy but not for me when I use exactly the same code. I did not write this code myself. It is the answer to an exercise in Codecademy.
In my previous post I attached a screenshot of this code (named screenshot_exercise), which is the answer to the exercise named “rural or urban”. I am trying to re-create it in Jupyter Notebook myself to see how it looks there, but somehow I am getting errors.

I can also see the clear NameError, but I just copied the code from the Codecademy answer, so the question is why it is not working for me. I am not creating this code myself. I am a beginner and do not understand everything yet. It is all the more confusing when code works in the Codecademy environment but gives errors when run in Jupyter Notebook. Now I want to understand why. Does this answer code even work outside the exercise UI? :wink:

You are attempting to use the name uscities before it has been assigned, hence the NameError from cell[6]. This is why I asked what name you actually intended to work with.

What’s the difference between what you assigned to (from pd.read_csv()) and what you’re trying to work with?
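
One way to see the mismatch for yourself is sketched below. A plain dictionary with made-up values stands in for the DataFrame returned by pd.read_csv(), so this is an illustration of the error rather than the exercise's actual code.

```python
# Hypothetical stand-in for the DataFrame returned by pd.read_csv()
user_data = {"age": [23, 35, 41, 29]}

try:
    uscities["age"]  # uscities was never assigned in this session
except NameError as err:
    message = str(err)
    print(message)

# Using the name that was actually assigned works
ages = user_data["age"]
print(ages)
```

Either renaming the assignment target or changing the later lines to use the assigned name should remove the error.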