FAQ: Modifying Data Frames in R - Adding a Column

This community-built FAQ covers the “Adding a Column” exercise from the lesson “Modifying Data Frames in R”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Learn R

FAQs on the exercise Adding a Column

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.


Just a quick question to check my understanding of mutate().

The following code snippet is provided in the example:

df %>%
  mutate(sales_tax = price * 0.075)

The example then continues: “Now the inventory table has a column called sales_tax, where the value is 0.075 * price:”

Is this statement totally true? I.e., is this modifying the inventory table in place, so to speak, or is mutate() actually returning a new data frame?

This would explain why we need

dogs <- dogs %>%
	mutate()

rather than

dogs %>%
	mutate()

which seems to be implied above.
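A small runnable check of that question, as a sketch: it assumes the dplyr package is installed and uses a made-up inventory tibble (the item names and prices are invented). Piping into mutate() without an assignment only prints the result, and the original table keeps its columns.

```r
library(dplyr)

# Hypothetical inventory table; items and prices are made up for illustration
inventory <- tibble(
  item  = c("pen", "notebook", "stapler"),
  price = c(2.00, 5.00, 12.00)
)

# Piping into mutate() without assigning: the new data frame is printed, then discarded
inventory %>%
  mutate(sales_tax = price * 0.075)

names(inventory)  # "item" "price" (no sales_tax column)

# Assigning the returned data frame is what keeps the new column around
inventory_taxed <- inventory %>%
  mutate(sales_tax = price * 0.075)

names(inventory_taxed)  # "item" "price" "sales_tax"
```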


The first ‘dogs’ is your variable name for the data frame:

dogs <- dogs %>%

You’re actually overwriting your existing data frame ‘dogs’ with one that is identical to the existing one plus the avg_height column.


Is there a reason you need to state the variable/data frame twice, though? I thought that in other languages you could usually do it without naming it first; you just redefine it with whatever you’re overwriting it with. (I don’t know if I’m saying this right, sorry!)
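One way to see why the name appears twice: %>% just calls a function, and a function's return value has to be stored somewhere, exactly like x <- x + 1. The magrittr package also provides a compound assignment pipe, %<>%, that pipes and assigns back in one step. This sketch assumes dplyr and magrittr are installed; the dogs values below are made up, not the exercise data.

```r
library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe (dplyr only re-exports %>%)

# Small stand-in for the exercise data; the values are made up for illustration
dogs <- tibble(
  breed              = c("Beagle", "Boxer"),
  height_low_inches  = c(13, 21),
  height_high_inches = c(15, 25)
)

# The same "name it twice" pattern you already use with ordinary values
x <- 1
x <- x + 1  # x is now 2

# Explicit reassignment: compute a new data frame, then store it under the old name
dogs <- dogs %>%
  mutate(avg_height = (height_low_inches + height_high_inches) / 2)

# magrittr's compound assignment pipe does both steps at once:
# it pipes dogs into mutate() and assigns the result back to dogs
dogs %<>% mutate(height_range = height_high_inches - height_low_inches)

dogs
```

Explicit reassignment is the more common style in tidyverse code, because it makes the overwrite obvious to anyone reading it.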


Anyone else getting a broken exercise here?
I get an error message saying:

Error: package or namespace load failed for ‘readr’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):

From what I understand (if R is anything like other languages), most functions work by creating a quick little copy of the things that get passed into them. Most of the time, this doesn’t alter the original variables stored in the computer’s memory. There are exceptions, such as some compound types or functions that are specifically designed to change the original (I know the Python data frame package, pandas, has a lot of inplace=True options that can be used).
However, most of the time, the function just does its job with the copy of the object passed in. The program doesn’t know the working copy is important and discards it, so if you want the copy returned from the function to stick around, you need to assign it (which gives that copy its own longer-term spot in memory under the variable name you choose).
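A base R sketch of the behavior described above (no packages needed; add_column() is a hypothetical helper written for this example). R uses copy-on-modify semantics, so changing an argument inside a function leaves the caller's object untouched, and only the returned value carries the change.

```r
# Hypothetical helper: modifies its argument, which only affects its own local copy
add_column <- function(df) {
  df$new_col <- 1   # changes the function's local version of df
  df                # return the modified copy
}

original <- data.frame(a = 1:3)
returned <- add_column(original)

names(original)  # "a"            (unchanged)
names(returned)  # "a" "new_col"  (the returned copy)
```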

So here, we load the data frame into dogs and mutate it.
When we pass the dogs data frame into mutate(), the computer finds where dogs is stored. Instead of directly changing that data frame, it makes a working copy, stores it somewhere temporary, and uses that copy to create the new column.

library(readr)
library(dplyr)

dogs <- read_csv('dogs.csv')
mutated_dogs <- dogs %>%
  mutate(avg_height = (height_low_inches + height_high_inches) / 2)
head(mutated_dogs)  # shows the mutated data frame, with the avg_height column
head(dogs)          # shows the original data frame, unchanged because mutate() worked on a copy

The thing that still confuses me has to do with R and the R Markdown (.Rmd) format. I’m still unsure how much I actually have to print things explicitly, and how much the cells automatically output the variables and function results floating around.

EDIT: don’t run that codebyte… there is no support for the R language yet apparently.
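On the printing question, a sketch of R's auto-printing rules, which knitr generally follows in R Markdown chunks: a visible value at top level is printed, an assignment returns its value invisibly, and wrapping an assignment in parentheses makes it visible. It assumes dplyr is installed and uses made-up heights.

```r
library(dplyr)

# Made-up heights for illustration
dogs <- tibble(height_low_inches = c(13, 21), height_high_inches = c(15, 25))

# An assignment returns its value invisibly, so nothing is printed here
dogs2 <- dogs %>% mutate(avg_height = (height_low_inches + height_high_inches) / 2)

# A bare expression at top level is auto-printed
dogs2

# Parentheses make the assignment visible: it assigns AND prints
(dogs3 <- dogs %>% mutate(avg_height = (height_low_inches + height_high_inches) / 2))

# Inside a loop (or a function), nothing is auto-printed; call print() explicitly
for (i in 1:2) {
  print(head(dogs2, i))
}

# withVisible() reports whether a value would be auto-printed
withVisible(y <- 5)$visible    # FALSE
withVisible((y <- 5))$visible  # TRUE
```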


What does “df %>%” in the example mean?
Why do we need to write “dogs <- dogs %>%” before the mutate function (I took this excerpt from a previous comment, which is the only reason I got this answer right)?
Why is this different from the example provided (why don’t we write “dogs %>%” instead)?
Why is this not mentioned or explained anywhere in the lesson?
Why is nearly every lesson in this course missing at least one piece of important information?

If we look at the documentation for the mutate function, we can see that if the data frame is called .data, the function returns "An object of the same type as .data." This tells us that the existing data frame won’t be modified in place; instead, a new data frame with the modifications is created. If we want to save this result, we must assign it to some variable. This is why we have dogs <- in the code you mentioned. (Some functions do modify an existing data frame in place, so check a function’s documentation to understand its expected output.)
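A quick sketch of the "same type as .data" rule, assuming dplyr is installed (the price values are made up): a plain data.frame goes in and a data.frame comes out, a tibble goes in and a tibble comes out, and the input itself is never changed.

```r
library(dplyr)

# Made-up prices for illustration
plain_df <- data.frame(price = c(10, 20))
tbl      <- tibble(price = c(10, 20))

out_df  <- mutate(plain_df, sales_tax = price * 0.075)
out_tbl <- mutate(tbl, sales_tax = price * 0.075)

class(out_df)   # "data.frame"
class(out_tbl)  # "tbl_df" "tbl" "data.frame"

ncol(plain_df)  # 1: the input was not modified in place
ncol(out_df)    # 2: the new column lives only in the returned object
```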

As for the piping %>%, have a look at this post. It is about the gather function, but the same explanation holds for the mutate function. The post also mentions the earlier point about saving the result.
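A minimal sketch of what %>% does, assuming dplyr is installed (the prices are made up): it passes the left-hand side as the first argument of the function on the right, so df %>% mutate(...) is the same call as mutate(df, ...).

```r
library(dplyr)

df <- tibble(price = c(10, 20))  # made-up prices

# These two calls are equivalent: %>% inserts df as mutate()'s first argument
piped  <- df %>% mutate(sales_tax = price * 0.075)
nested <- mutate(df, sales_tax = price * 0.075)

identical(piped, nested)  # TRUE

# Pipes are most useful when chaining steps, which then read top to bottom
df %>%
  mutate(sales_tax = price * 0.075) %>%
  mutate(total = price + sales_tax)
```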

If you are still confused about something, share your thoughts.

Hello! I downloaded the dataset from the AKC website and tried this exercise in RStudio.

I tried this:

dogs <- dogs %>%
  mutate(avg_height=(height_low_inches + height_high_inches)/2)

head(dogs)

This worked in Codecademy, but in RStudio it yielded the following error:

Error in `mutate()`:
ℹ In argument: `avg_height = (height_low_inches + height_high_inches)/2`.
Caused by error in `height_low_inches + height_high_inches`:
! non-numeric argument to binary operator
Run `rlang::last_error()` to see where the error occurred.

Does anyone know why this is happening? I’ve found no useful solution on Stack Overflow.

Thanks in advance!

How are you reading in the dataset?

Based on the error message, height_low_inches and height_high_inches may be stored as characters (strings) instead of numbers.
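To confirm that diagnosis, a sketch that reproduces the error with made-up data in which the height columns are character vectors (assuming dplyr is installed; this is not the AKC file itself).

```r
library(dplyr)

# Hypothetical data where the height columns were read in as text, not numbers
dogs_chr <- tibble(
  height_low_inches  = c("13", "21"),
  height_high_inches = c("15", "25")
)

# The root cause: R's + operator is not defined for character vectors
base_msg <- tryCatch("13" + "15", error = function(e) conditionMessage(e))
base_msg  # "non-numeric argument to binary operator"

# Inside mutate() the same problem surfaces, wrapped in dplyr's error context
mutate_failed <- tryCatch({
  dogs_chr %>% mutate(avg_height = (height_low_inches + height_high_inches) / 2)
  FALSE
}, error = function(e) TRUE)
mutate_failed  # TRUE
```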

I think after reading in the dataset, you should inspect the data types of the columns. Are they as expected?
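A sketch of inspecting and fixing the column types, assuming dplyr is installed and that the heights arrived as numeric text like "13". If the real file contains entries such as ranges or units, as.numeric() will turn them into NA with a warning, so check those rows first; readr::parse_number() may be a better fit in that case.

```r
library(dplyr)

# Hypothetical data with the heights stored as text
dogs_chr <- tibble(
  breed              = c("Beagle", "Boxer"),
  height_low_inches  = c("13", "21"),
  height_high_inches = c("15", "25")
)

# Inspect column types: both height columns show up as character
glimpse(dogs_chr)
sapply(dogs_chr, class)

# Convert the height columns to numbers, then compute the average;
# later expressions in mutate() see the already-converted columns
dogs_num <- dogs_chr %>%
  mutate(
    height_low_inches  = as.numeric(height_low_inches),
    height_high_inches = as.numeric(height_high_inches),
    avg_height = (height_low_inches + height_high_inches) / 2
  )

dogs_num
```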

Perhaps, a related thread: