FAQ: Data Types and Quality - Working with Missing Data

This community-built FAQ covers the “Working with Missing Data” exercise from the lesson “Data Types and Quality”.

I don’t quite understand the difference between Missing completely at random and Missing at random.

So the former just means data wasn’t entered properly. What about the latter? How is it different from the former?

Thanks!

I found more information in the cheatsheet that answered my question: https://www.codecademy.com/learn/paths/data-science-nlp/tracks/dsf-data-literacy/modules/introduction-to-data-38e13b33-2ba6-4515-bfbf-4a785c9194a9/cheatsheet

The difference is that missing completely at random is a random error not linked to any variable (e.g. a bad entry due to fatigue or sloppiness), whereas missing at random means the missing value is related to some variable (e.g. a tree bigger than the tape measure).

Make sure you do the “Handling Missing Data” course that is linked to get deeper into it. The cheat sheet doesn’t have enough detail but the course is good.
Handling Missing Data | Codecademy
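
In case it helps to see the distinction in numbers, here is a small simulation sketch. It is not from the course; the tree records, the `drop_mcar` / `drop_mar` helpers, and all the rates are made up for illustration. It uses the usual statistical reading of the two terms: under MCAR every record has the same chance of losing its value, while under MAR that chance depends on another variable that was recorded (here, tree height).

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical tree survey: height in metres, trunk diameter in centimetres.
trees = []
for _ in range(1000):
    height = random.uniform(2, 30)
    trees.append({"height": height, "diameter": height * 2.5})


def drop_mcar(records, rate=0.2):
    """MCAR: every record has the same chance of losing its diameter."""
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the original records stay intact
        if random.random() < rate:
            rec["diameter"] = None
        out.append(rec)
    return out


def drop_mar(records, tall=20, rate_tall=0.5, rate_short=0.05):
    """MAR: the chance of a missing diameter depends on the recorded height."""
    out = []
    for rec in records:
        rec = dict(rec)
        rate = rate_tall if rec["height"] > tall else rate_short
        if random.random() < rate:
            rec["diameter"] = None
        out.append(rec)
    return out


def missing_rate(records):
    """Share of records whose diameter is missing."""
    return sum(r["diameter"] is None for r in records) / len(records)


def rates_by_height(records, tall=20):
    """Missing rate for tall trees and for the rest, as a (tall, short) pair."""
    tall_recs = [r for r in records if r["height"] > tall]
    short_recs = [r for r in records if r["height"] <= tall]
    return missing_rate(tall_recs), missing_rate(short_recs)


mcar_tall, mcar_short = rates_by_height(drop_mcar(trees))
mar_tall, mar_short = rates_by_height(drop_mar(trees))

print(f"MCAR: tall={mcar_tall:.2f} short={mcar_short:.2f}")
print(f"MAR:  tall={mar_tall:.2f} short={mar_short:.2f}")
```

With MCAR the two groups should show roughly the same missing rate; with MAR the tall trees lose far more values, which is the pattern you would look for in real data.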

Hello there! I didn’t really get what exactly we can do about missing data. What are the steps? Imagine I run into this situation at work: what exactly am I supposed to do? Thank you so much for the answer!

That’s why I recommended the Handling Missing Data course. It teaches you how to fix the different types of problems arising in categorical or numeric data types. I had no problems with it until the last page, where it gets heavy into pandas. I also had to learn Jupyter Notebooks and am working on Git and GitHub to get ready for the first project. There’s a lot to learn.
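
For a rough idea of what those steps can look like in pandas, here is a minimal sketch. It is my own illustration, not the course's material: the DataFrame and its column names are hypothetical, and filling with the median or the most common category is just one common choice, not the only reasonable one.

```python
import pandas as pd

# Hypothetical survey data; column names are made up for illustration.
df = pd.DataFrame({
    "species": ["oak", "pine", None, "oak", "pine", "oak"],
    "height_m": [12.0, None, 8.5, 15.0, None, 11.0],
})

# Step 1: measure how much is missing in each column.
missing_counts = df.isna().sum()
print(missing_counts)

# Step 2, option A: drop every row that has a missing value.
# Simple, but it throws away the values that were recorded in those rows.
dropped = df.dropna()

# Step 2, option B: fill the gaps instead of dropping rows.
filled = df.copy()
# Numeric column: fill with the median of the observed values.
filled["height_m"] = filled["height_m"].fillna(filled["height_m"].median())
# Categorical column: fill with the most frequent category.
filled["species"] = filled["species"].fillna(filled["species"].mode()[0])

print(dropped)
print(filled)
```

Which option makes sense depends on why the values are missing. If they are missing completely at random, dropping rows mostly just costs you data; if the gaps are related to another variable, dropping or naive filling can bias the result, so it is worth checking the pattern first.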