Hey everyone!
I hope you’re all doing well. I have a question regarding the “Data Science: NLP” path, specifically about the concepts of “missing at random” and “missing completely at random.” I encountered a question during a quiz that confused me, and I wanted to get your thoughts on it.
Here’s a quick rundown: I was presented with a table containing missing values, and I had to determine whether the missing data was “missing at random” or “missing completely at random.” I’ve attached a screenshot of the question for reference. Although I answered it correctly this time (after getting it wrong before), I’m still struggling to grasp why this particular example would fall under “missing at random” instead of “missing completely at random.”
From what I understand, “missing completely at random” means the chance that a value is missing is the same for every participant and has no systematic relationship to any other variable. In this table, the missing values do appear to be scattered across participants, with no obvious pattern tied to a specific variable. To me, that matches the definition of “missing completely at random.” However, the learning module says it’s an example of “missing at random,” which, as I understand it, means the missingness depends on another observed variable (e.g., all height data missing for Redwood trees due to equipment limitations).
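To check my own understanding of the two definitions above, I put together a small simulation. It is only a sketch under my own assumptions: the tree data, the helper names (`mask_mcar`, `mask_mar`, `missing_rate_by_species`), and the probabilities are all made up for illustration and are not taken from the quiz.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: species and height values are invented for illustration.
SPECIES = ["Redwood", "Oak"]
trees = [
    {"species": random.choice(SPECIES), "height": random.uniform(5, 100)}
    for _ in range(10_000)
]

def mask_mcar(rows, p=0.2):
    """MCAR: every height has the same chance p of going missing."""
    return [
        dict(r, height=None) if random.random() < p else dict(r)
        for r in rows
    ]

def mask_mar(rows, p_by_species):
    """MAR: the chance of a missing height depends on an observed variable (species)."""
    return [
        dict(r, height=None) if random.random() < p_by_species[r["species"]] else dict(r)
        for r in rows
    ]

def missing_rate_by_species(rows):
    """Fraction of missing heights within each species group."""
    rates = {}
    for s in SPECIES:
        group = [r for r in rows if r["species"] == s]
        rates[s] = sum(r["height"] is None for r in group) / len(group)
    return rates

mcar_rates = missing_rate_by_species(mask_mcar(trees))
mar_rates = missing_rate_by_species(mask_mar(trees, {"Redwood": 0.4, "Oak": 0.05}))

print("MCAR missing rate by species:", mcar_rates)
print("MAR missing rate by species: ", mar_rates)
```

If I have this right, the MCAR rates should come out roughly equal across species, while the MAR rates should differ clearly between Redwood and Oak. Within each species the missing cells would still look scattered row by row, so a small table might look patternless even when the mechanism is “missing at random.” I’m not sure whether that is what the quiz example is getting at.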
If I’ve misunderstood something, I would greatly appreciate it if you could shed some light on the difference between “missing at random” and “missing completely at random.” I’m eager to deepen my understanding, and any explanation would be incredibly helpful.
Thank you all in advance for your assistance. I truly value your input!