Can't Read a JSON File in a Jupyter Notebook

Hi,
I am trying to read a JSON file in a Jupyter notebook as part of the Yelp regression project. Please find the link here:

https://www.codecademy.com/paths/data-science/tracks/dspath-supervised/modules/yelp-regression-project/informationals/predict-a-yelp-rating-regression

However, I keep getting the following error every time.
I have even tried copying the full path to the file into the function, but it still produces the same issue.

Help would be appreciated
Thanks

@arc3273773205,

Why are you trying to use a raw string absolute path to load the JSONs? As long as the JSON files are still in the same folder as your notebook, you can just open them with the file name.

Your first cell should look something like this:

import pandas as pd
# each file is in JSON Lines format (one object per line), hence lines=True
businesses = pd.read_json('yelp_business.json', lines=True)
reviews = pd.read_json('yelp_review.json', lines=True)
users = pd.read_json('yelp_user.json', lines=True)
checkins = pd.read_json('yelp_checkin.json', lines=True)
tips = pd.read_json('yelp_tip.json', lines=True)
photos = pd.read_json('yelp_photo.json', lines=True)
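
If the files really are next to your notebook, a quick follow-up cell like this (just a sanity check, using the DataFrames defined above) should print a row/column count for each one:

# confirm each DataFrame loaded by printing its (rows, columns) shape
for name, df in [('businesses', businesses), ('reviews', reviews),
                 ('users', users), ('checkins', checkins),
                 ('tips', tips), ('photos', photos)]:
    print(name, df.shape)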

Hey,

Yes, I tried that code before I used a raw file path, but I kept getting the same error.

Hi there.

I’d be surprised if the error you’re seeing was related to how you’ve provided a path to the JSON file…

The very first line of your error message says, quite clearly, what the problem is: MemoryError

Here are the Python docs for that exception type.

For some reason, when your computer attempts to load that JSON file, Python is encountering (and alerting you to) a low-memory condition. This shouldn’t happen… but I don’t use Jupyter, so I’m not sure why…
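
If you want to confirm that's really the exception being raised (rather than something path-related), a quick diagnostic sketch would be to wrap the call in a try/except:

import pandas as pd

try:
    reviews = pd.read_json('yelp_review.json', lines=True)
except MemoryError:
    # Python could not allocate enough RAM while parsing the file
    print('Ran out of memory while loading yelp_review.json')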

First guesses would be either you’re using a really low-spec computer (unlikely), or you’re somehow repeatedly loading something and that’s why you’re running out of memory… Can you post all the code you’ve got in that notebook, so we can see what it’s trying to run?


Now that I take a closer look at your screenshot, it looks like you’re getting a MemoryError (which thepitycoder noticed before me). This tends to be an issue with very large JSONs, but not typically ones this size. I don’t run into this error when I run the code above in a Jupyter notebook.

I did find this discussion on the subject, but considering these JSONs are under 200 MB each, I don’t think you should need to use any of these other than lines=True.
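
If you want to double-check the sizes on disk yourself, a quick loop over the files (assuming they sit in the same folder as your notebook) will tell you:

import os

for name in ['yelp_business.json', 'yelp_review.json', 'yelp_user.json',
             'yelp_checkin.json', 'yelp_tip.json', 'yelp_photo.json']:
    size_mb = os.path.getsize(name) / (1024 * 1024)
    print(f'{name}: {size_mb:.1f} MB')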


Hi,
Thanks for that explanation. I also thought it had nothing to do with the path, but I just wanted to be sure.
Regarding the code I ran in the notebook, please see below:

import pandas as pd

review = pd.read_json('yelp_review.json', lines=True)

I don't understand why I am getting the error, as I am using the notebook provided for this exercise.

Hey,
Any further help would be appreciated.

How much memory do you have free on your computer when you try to use read_json? From what I’ve found on Google, the Python object built from a JSON file you read into memory can occupy 10 to 25 times more space than the size of the file on disk (I’m not sure how accurate those figures are, but it definitely takes more space than the listed file size).
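
If you want to check this yourself, here’s a rough sketch; it assumes you have the psutil package installed, and the 25x factor is just the upper end of the estimate above, not an exact figure:

import os
import psutil

file_size = os.path.getsize('yelp_review.json')    # bytes on disk
available = psutil.virtual_memory().available      # free RAM in bytes

print(f'file size:     {file_size / 1e6:.0f} MB')
print(f'available RAM: {available / 1e6:.0f} MB')
print(f'fits at 25x?   {file_size * 25 < available}')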

I’m not sure why you would be having this problem though. I just tested this on a computer with 8GB RAM and RAM usage at around 83-84% and it worked just fine for me. Maybe try rebooting and giving it another shot? Not very technical, but it can’t hurt.

If it is still causing you a problem, I would suggest trying one of two things.
Either:

  • try using the chunksize parameter (docs here); you will have to iterate over the resulting chunks and concatenate them into your final DataFrame, or
  • manually load the JSON file line by line and then convert the result to a DataFrame.

Code for option 1:

import pandas as pd

# with chunksize set, read_json returns an iterator of DataFrame chunks
chunks = pd.read_json('yelp_review.json', lines=True, chunksize=100)

# stitch the chunks back together into a single DataFrame
reviews = pd.concat([c for c in chunks])

Code for option 2:

import pandas as pd
import json

# read the file one line at a time and parse each line as a JSON object
with open('yelp_review.json', 'r') as f:
    reviews_lst = [json.loads(line) for line in f]

reviews = pd.DataFrame(reviews_lst)

Screenshots showing all three methods (including the original one) were successful in my tests:



Great, thanks so much for the help here