"Unicode can't decode error" when trying to import data

I tried to use BeautifulSoup to import my Steam Purchase History, which I had loaded and then saved as a local html file, into Python. This resulted in the error:

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

I then selected the entire table in the browser, copied it into Microsoft Excel Online, copied that into LibreOffice, and saved it as a CSV file. I tried to import the CSV file using Pandas, but I got exactly the same error as before:

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

So it seems there was probably nothing wrong with my Pandas code or my BeautifulSoup code; perhaps there is some setting or property, or some library or codec, that I need to add.

Also, I must add that I have already imported a lot of csv files while doing Codecademy projects offline.

I don’t know what to do anymore.

If that’s the error I think it is, then it’s one that often bites Windows users, because the backslash is both the Windows path separator and the escape character in Python string literals. Perhaps the following link could help (it is one of the first responses if you search for that specific error online):

2 Likes

Confirmed it had something to do with the file path.

It turns out this was the first time I had tried to import a CSV file that was not in the same folder as the Python script. What caused the error was copying the entire file path into the code like this:

"C:\Users\myname\Beatifulsoup practice\Steam_History.csv"

I was able to import the CSV file by putting it in the same directory as the .py file.
The next thing I will try is importing via BeautifulSoup, which seems to be the proper way.
I’m wondering what the valid and invalid ways to reference a file in Python on Windows are.
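
For anyone who wants to see why the copied path fails, here is a minimal sketch. It assumes a hypothetical path (`C:\Users\myname\data.csv`, not the one from the post) and uses `compile()` only so the broken literal can be shown without breaking the example itself. In a normal string literal, `\U` starts an 8-digit Unicode escape, so `"C:\Users..."` is rejected before any Pandas or BeautifulSoup code runs.

```python
# The text of a Python statement containing the problematic literal.
# It is wrapped in a raw string so this example file itself stays valid.
# The path is hypothetical and only serves as an illustration.
bad_source = r'path = "C:\Users\myname\data.csv"'

try:
    compile(bad_source, "<example>", "exec")
    error_message = None
except SyntaxError as exc:
    # "\U" must be followed by 8 hex digits; "sers\myn" is not, hence the error.
    error_message = str(exc)

# The same statement with an r prefix on the inner literal compiles and runs.
good_source = r'path = r"C:\Users\myname\data.csv"'
namespace = {}
exec(compile(good_source, "<example>", "exec"), namespace)
loaded_path = namespace["path"]

print(error_message)
print(loaded_path)
```

This would also explain why the error appeared with both the HTML file and the CSV file: the failure is in parsing the script, not in reading the data.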

3 Likes

This answer covers a couple of possibilities: https://stackoverflow.com/a/46011113. If you’re looking for something more portable, though, the pathlib library is worth a look:
pathlib: Object-oriented filesystem paths (Python 3.9.1 documentation)
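
As a rough illustration of the pathlib approach, here is a small sketch. The folder and file names are hypothetical, and `PureWindowsPath` is used only so the example behaves the same on any operating system; in a real script on Windows, plain `Path` would be the usual choice.

```python
from pathlib import PureWindowsPath

# Build the path from parts with the / operator, so no backslashes
# (and therefore no accidental escape sequences) appear in the source.
# "myname" and "data.csv" are placeholder names for illustration.
csv_path = PureWindowsPath("C:/Users") / "myname" / "data.csv"

windows_style = str(csv_path)       # backslash-separated form
posix_style = csv_path.as_posix()   # forward-slash form
file_name = csv_path.name           # last component of the path
extension = csv_path.suffix         # file extension, including the dot

print(windows_style)
print(posix_style)
print(file_name, extension)
```

Path objects can generally be passed straight to `open()` and to `pandas.read_csv()`, so there is usually no need to convert them back to strings.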

1 Like

On Windows, if you need to import from another directory and want a quick workaround, you can simply escape the backslashes:

C:\\Users\\myname\\BeautifulSoup\\tomato\\data.csv

Looks odd, but it works. :)

(Not so sure how this would go down in a production environment, though…)

The second link mentions a raw string.
Also, I didn’t know that you can use either forward slashes or backslashes for file paths.
This is new knowledge for me, thanks.
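
To compare the three spellings mentioned in this thread side by side, here is a short sketch using a hypothetical path. `ntpath` is the standard-library module behind `os.path` on Windows; it is imported directly here only so the comparison gives the same result on other systems too.

```python
import ntpath

# Three ways to write the same (hypothetical) Windows path in source code.
raw_path = r"C:\Users\myname\data.csv"        # raw string: backslashes kept literally
escaped_path = "C:\\Users\\myname\\data.csv"  # each backslash escaped by doubling it
forward_path = "C:/Users/myname/data.csv"     # forward slashes, which Windows also accepts

# The raw and escaped literals produce exactly the same string.
same_literal = raw_path == escaped_path

# The forward-slash form is a different string, but it normalizes to the same path.
normalized = ntpath.normpath(forward_path)

print(same_literal)
print(normalized == raw_path)
```

One caveat with raw strings: a raw literal cannot end in a single backslash, so a folder path written as `r"C:\Users\"` is itself a syntax error.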

1 Like

Confirmed and tested: importing the saved HTML file with Beautiful Soup works.
I had to pass the file path and the file encoding as arguments to open() before passing the result to BeautifulSoup().
To check the file’s encoding, I opened it in Notepad, chose Save As, and looked at the encoding selected by default.
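
For reference, a sketch of what that might look like. This is not the poster's actual code: the file name, its contents, and the `utf-8` encoding are assumptions, and a temporary file stands in for the saved Steam page. The BeautifulSoup step is skipped if the `bs4` package is not installed.

```python
import tempfile
from pathlib import Path

def read_html(path, encoding="utf-8"):
    """Read a saved HTML file as text using an explicitly chosen encoding."""
    with open(path, encoding=encoding) as handle:
        return handle.read()

# Stand-in for the saved page: a tiny HTML table with a non-ASCII character.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_file = Path(tmp_dir) / "steam_history.html"  # hypothetical file name
    sample_file.write_text(
        "<table><tr><td>Café Game</td></tr></table>", encoding="utf-8"
    )
    html_text = read_html(sample_file, encoding="utf-8")

try:
    from bs4 import BeautifulSoup  # third-party package, may not be installed
except ImportError:
    BeautifulSoup = None

if BeautifulSoup is not None:
    soup = BeautifulSoup(html_text, "html.parser")
    cell_text = soup.td.get_text()  # text of the first table cell
else:
    cell_text = None

print(cell_text)
```

BeautifulSoup also accepts an open file object instead of a string, which matches the description above more literally. If the encoding passed to `open()` does not match the file, the usual symptoms are a `UnicodeDecodeError` or garbled characters.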

1 Like