FAQ: Data Types and Quality - Validity

This community-built FAQ covers the “Validity” exercise from the lesson “Data Types and Quality”.


For the Validity Quiz question asking which variables would be relevant for “You want to know if trees near highways are taller than trees in cities,” would the species not also be relevant? The species of a tree certainly affects its height and could provide more insight. For example, one could not conclusively state that trees grow taller in one area or the other if the same species was not growing in both. If both species and location type were listed, and a specific species was growing in both areas, the average height of that species could be compared between the two locations, which would make the conclusion much stronger.
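To make the idea above concrete, here is a minimal sketch of a per-species comparison. The rows are hypothetical values made up for illustration (they are not the exercise's data), and the function name `mean_height_by_species` is my own.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (species, location type, height) rows, NOT the exercise's data.
trees = [
    ("Honeylocust", "City", 30.0),
    ("Honeylocust", "Highway", 40.0),
    ("Honeylocust", "City", 34.0),
    ("Red Oak", "City", 50.0),
    ("Tulip", "Highway", 60.0),
]


def mean_height_by_species(rows):
    """Return {species: {location: mean height}} for species seen in 2+ location types."""
    heights = defaultdict(lambda: defaultdict(list))
    for species, location, height in rows:
        heights[species][location].append(height)
    # Keep only species that can actually be compared across locations.
    return {
        species: {loc: mean(values) for loc, values in by_loc.items()}
        for species, by_loc in heights.items()
        if len(by_loc) >= 2
    }


comparison = mean_height_by_species(trees)
print(comparison)
```

With these made-up rows, only one species survives the filter, which is the same situation several replies below describe for the real table.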


I agree with olditalics’s statement and came here for further clarification. If the forest we are in grows redwoods and in the city we can only find oaks, then looking at the data you would conclude that trees grow taller in the forest, but that would not answer the question we were asking.


Completely agree with this. I get that this may be a bit too complicated for people who are new to statistics, but to evaluate the question properly you absolutely need to consider species if it is available in the data. I suppose you can answer with a simple statement such as ‘trees near X are taller than trees near Y’, but that is misleading: the research question implicitly assumes that location affects height, and if you leave species out you are not fairly testing whether the same type of tree grows taller in one place. The height difference may just reflect a different mix of species.

The same could be said about location and closeness. If you leave out species, you are not testing the impact of location on density.


I couldn’t agree more. I don’t think this data set is large or varied enough. Only the Honeylocust tree exists in both the city and near a highway and one pair is not enough to draw a viable conclusion.


In general, I find many of the “correct” answers to be flat-out wrong.
Question one should be Not Valid because the data set is too small. I want to see a data set with several instances of each species both in the city and near a highway. The Honeylocust is the only species that appears in both places, and that is only two trees, which is not enough to draw a valid conclusion.

Two is correctly not valid.

Three’s answers are fine.

Four again has too little data and too much incomplete data. The distance was only entered for 3 trees, and the highway location is not represented. We couldn’t draw any meaningful conclusion from this.

Five, again, does not have complete enough data. Only 4 trees have data for both Species and Single. Of the three species represented (Tulip, Red Oak, and Honeylocust), only Honeylocust has more than a single entry. It isn’t possible to draw a pattern out of two entries, especially when one was in a group and the other wasn’t. Patterns require consistency, and there isn’t any consistency here.
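The completeness concern above can be checked mechanically. The following is only a sketch: the rows are hypothetical (not the exercise's table), `None` stands for a missing entry, and both `complete_counts` and the `MIN_ROWS` threshold are names and values I chose for illustration.

```python
from collections import Counter

# Hypothetical rows; None marks a missing value. NOT the exercise's data.
rows = [
    {"species": "Honeylocust", "single": 1},
    {"species": "Honeylocust", "single": 0},
    {"species": "Tulip", "single": 1},
    {"species": "Red Oak", "single": None},
    {"species": None, "single": 0},
]


def complete_counts(rows, key, other):
    """Count rows per `key` value where both `key` and `other` are present."""
    return Counter(
        row[key]
        for row in rows
        if row[key] is not None and row[other] is not None
    )


counts = complete_counts(rows, "species", "single")

# Arbitrary illustrative threshold: fewer complete rows than this is too thin.
MIN_ROWS = 2
too_sparse = sorted(s for s, n in counts.items() if n < MIN_ROWS)
print(counts, too_sparse)
```

A species whose only row is missing the second column never enters the count at all, so it cannot support any pattern, however the threshold is set.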

I think this exercise needs some work.


What does the Single variable represent? I’m not sure how the relationship between Single and Species would tell us which trees grow in groups.

If somebody could please post a link to the exercise in question, we would greatly appreciate it. Any takers? It is hard to weigh in on this question properly without seeing the exercise.

I came here from https://www.codecademy.com/paths/data-science-foundations/tracks/dsf-data-literacy/modules/dsf-introduction-to-data/lessons/data-types-and-quality/exercises/validity-data-types-quality


Clicking on “Check Answers” I am told the column “Single” should not be considered for Question #4 “You want to know in what kind of location trees grow the closest together?”

Answers that can be selected for which columns to consider were these: Height, Location Type, Species, Prettiness, Distance, Single, and “Not Valid”

A “Single” value of 1 means a tree stands alone. A “Single” value of 0 means a tree stands with at least one other tree, i.e. “Not Alone” or “Is in a group”.

If you ignore the “Single” column, the distance of 3.20 recorded for a tree marked as single (the City Honeylocust with ID 13281) will still contribute to your decision, even though that tree is probably alone. When a dichotomous value (Single) and a continuous value (Distance) disagree, the continuous one is the more likely to be wrong.

Not only that: for tree species that tend to grow “single,” distances recorded when that species does occur in a group should weigh less toward the result than distances for species that tend to grow in groups, aka “not single”.
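A rough sketch of the cross-check I have in mind is below. The first row mirrors the record I described (ID 13281, Single = 1, Distance = 3.20); the other two rows and the function name `inconsistent_ids` are hypothetical, added only so the check has something to compare against.

```python
# First row mirrors the record described in this post; the rest are hypothetical.
rows = [
    {"id": 13281, "single": 1, "distance": 3.20},
    {"id": 1, "single": 0, "distance": 2.5},    # hypothetical grouped tree
    {"id": 2, "single": 1, "distance": None},   # hypothetical lone tree
]


def inconsistent_ids(rows):
    """Return IDs of trees marked single that still have a recorded distance."""
    return [
        row["id"]
        for row in rows
        if row["single"] == 1 and row["distance"] is not None
    ]


flagged = inconsistent_ids(rows)
print(flagged)
```

Rows flagged this way could be excluded or down-weighted before comparing distances across location types, which is why I think “Single” belongs among the columns to consider for Question #4.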