This course introduces Big Data both conceptually and with hands-on practice. If you’ve heard about big data but weren’t quite sure what it means, or what is different when working with big data, then this course is for you. Likewise, if you have tried to load a dataset and crashed your computer, this course will teach you the practical techniques to fix the problem and the conceptual foundation to understand what happened.

To that end, allow me to introduce @michellemcsweeney and @andreahassler, who worked to create this course. They’re here to answer any questions you have.

I like this course, but I’m stuck on section 3 of 8.

At this point it’s necessary to load a Wikipedia data source (a CSV file that records clicks), but that file is missing, so it is impossible to proceed with the rest of the course.

Here’s the error message:
AnalysisException: Path does not exist: file:/home/ccuser/workspace/pyspark-sql-lesson-spark-dataframes-from-external-sources/data/wiki_uniq_march_2022.csv

It looks like the instructions included the wrong file path. You should be able to read in the data with just ‘wiki_uniq_march_2022.csv’ rather than ‘./data/wiki_uniq_march_2022.csv’. I’ve just updated the instructions accordingly.
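For anyone hitting the same error, here is a minimal sketch of how you could check both locations mentioned in this thread before handing a path to Spark. The helper name `first_existing_path` is made up for illustration, and the commented-out read assumes the notebook already has a SparkSession called `spark` and that the CSV has a header row; neither is confirmed by the lesson.

```python
import os


def first_existing_path(candidates):
    """Return the first candidate path that exists on disk, or None."""
    for path in candidates:
        if os.path.exists(path):
            return path
    return None


# The two locations discussed in this thread, tried in order.
candidates = [
    "wiki_uniq_march_2022.csv",
    "./data/wiki_uniq_march_2022.csv",
]

csv_path = first_existing_path(candidates)

if csv_path is None:
    # Neither path exists relative to the current working directory.
    print("Data file not found; working directory is:", os.getcwd())
else:
    print("Found data file at:", csv_path)
    # Assumption: a SparkSession named `spark` exists and the file has a header.
    # df = spark.read.option("header", True).csv(csv_path)
```

If neither path is found, the printed working directory shows where Spark is resolving relative paths from, which is usually the quickest clue.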

Nope, that file isn’t there either.

If you enter this into the notebook:

! ls

you’ll see that there are no data files in the directory
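The same check can be done in plain Python instead of the `! ls` shell escape. This is only a rough sketch: the helper name `list_data_files` and the `.csv` filter are assumptions for illustration, not part of the lesson.

```python
import os


def list_data_files(directory=".", extension=".csv"):
    """Return the sorted names of regular files in `directory` ending with `extension`."""
    return sorted(
        name
        for name in os.listdir(directory)
        if name.endswith(extension)
        and os.path.isfile(os.path.join(directory, name))
    )


# Show where the notebook is running and what is actually in that folder.
print("Working directory:", os.getcwd())
print("All entries:", sorted(os.listdir(".")))
print("CSV files:", list_data_files("."))
```

An empty list for the CSV files means there is nothing for Spark to read in that folder, whatever path the instructions give.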

I think this may be a caching issue, then. Have you tried resetting the exercise? You can do this from the ‘Get Unstuck’ menu at the top right or by hitting the arrow button next to ‘Test Work’ at the bottom. This should reset the exercise to the most recently updated workspace.

OK, NOW the file is there. I know it wasn’t there before because I’ve been running (from the notebook) the bash command
! ls

And it never showed anything in this folder except
Notebook.ipynb and __pycache__
until just now.

Now it works. Someone must have placed the file there since the last time I checked.

