Is there a reason you need to state the variable/data frame twice, though? I thought that in other languages you could usually do it without naming it first; you just redefine it with whatever you're overwriting it with. (I don't know if I'm saying this right, sorry!)
From what I understand (and R works like many other languages here), most functions work on a copy of whatever gets passed into them, so they don't alter the original variables stored in memory. In R this is called copy-on-modify: the object is only actually copied if the function tries to change it. There are exceptions, such as reference types (environments in R, for instance) or functions that are specifically designed to change the original. For example, pandas, the Python data frame package, has a lot of inplace=True options.
However, most of the time the function just does its job with its own copy of the object that was passed in. Once the function finishes, the program doesn't know that working copy is important and discards it. So if you want the result the function returns to stick around, you need to assign it: assignment binds the returned object to the variable name you give it, and it stays in memory for as long as that name refers to it.
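A minimal base-R sketch of that idea, using a hypothetical function `add_one()` (not part of the lesson): the function changes its argument, but only its own copy is affected, so the caller's variable keeps its old value unless the returned result is assigned back.

```r
# Hypothetical function: changes the first element of the vector it receives
add_one <- function(x) {
  x[1] <- x[1] + 1  # copy-on-modify: R copies x here, so only the local copy changes
  x                 # return the modified copy
}

heights <- c(10, 20, 30)

add_one(heights)  # the returned copy is printed at the console, then discarded
print(heights)    # still 10 20 30: the original was not modified

heights <- add_one(heights)  # keep the result by assigning it
print(heights)               # now 11 20 30
```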
So here, we load the data into dogs and then mutate it.
When we pass the dogs data frame into mutate(), R finds where dogs is stored. Instead of changing that data frame directly, mutate() builds a new data frame with the new column added and returns it, leaving the original untouched.
```r
library(readr)  # read_csv()
library(dplyr)  # mutate() and the %>% pipe

dogs <- read_csv('dogs.csv')

mutated_dogs <- dogs %>%
  mutate(avg_height = (height_low_inches + height_high_inches) / 2)

head(mutated_dogs) # shows the mutated data frame, with the avg_height column
head(dogs)         # shows the original data frame, unchanged, because mutate() returned a new data frame instead of altering dogs
```
The thing that still confuses me has to do with R and the .Rmd format. I'm still unsure how often I actually need to print things explicitly, and how much the code chunks automatically output the variables and function results floating around.
EDIT: Don't run that codebyte; apparently it doesn't support the R language yet.
What does “df %>%” in the example provided mean?
Why do we need to write “dogs <- dogs %>%” before we write the mutate function? (I took this excerpt from a previous comment, which is the only reason I got this answer right.)
Why is this different than the example provided (why don’t we instead write “dogs %>%”)?
Why is this not at all mentioned or explained in the lesson?
Why is nearly every lesson in this course missing at least one piece of important information?
The documentation for mutate() says that, when the input data frame is called .data, the function returns "An object of the same type as .data." In other words, mutate() gives you back a new data frame with the modifications instead of changing the existing data frame in place. If we want to keep this result, we must assign it to a variable, which is why the code you mentioned starts with dogs <-. (Some functions do modify an existing object in place, so it's worth checking a function's documentation to see what it returns.)
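To connect this to the “dogs <- dogs %>%” question, here is a minimal sketch with a small made-up data frame standing in for dogs.csv (the breeds and heights are invented for illustration). It assumes dplyr is installed. Running mutate() without assignment leaves dogs unchanged; assigning the result back to the same name replaces the old data frame.

```r
library(dplyr)

# Made-up data standing in for dogs.csv
dogs <- data.frame(
  breed = c("Beagle", "Pug"),
  height_low_inches = c(13, 10),
  height_high_inches = c(15, 13)
)

# Without assignment: mutate() returns a new data frame, which is printed and then discarded
dogs %>%
  mutate(avg_height = (height_low_inches + height_high_inches) / 2)

"avg_height" %in% names(dogs)  # FALSE: dogs itself was not changed

# With assignment: the returned data frame replaces the old one under the same name
dogs <- dogs %>%
  mutate(avg_height = (height_low_inches + height_high_inches) / 2)

"avg_height" %in% names(dogs)  # TRUE
```

Writing `dogs %>% mutate(...)` on its own, as in the lesson's example, is fine for looking at the result; the assignment only matters when later code needs the new column.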
As for the pipe, %>%, have a look at this post. It is about the gather function, but the same explanation holds for the mutate function. The post also mentions the earlier point about saving the result.
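In short, `%>%` takes the value on its left and passes it as the first argument of the function on its right, so `x %>% f(y)` is the same call as `f(x, y)`. A minimal sketch with made-up data (assuming dplyr is installed; it re-exports `%>%` from magrittr) showing that the piped and nested forms give the same result:

```r
library(dplyr)  # provides mutate() and re-exports the %>% pipe from magrittr

# Made-up data for illustration
dogs <- data.frame(height_low_inches = c(13, 10), height_high_inches = c(15, 13))

# %>% inserts dogs as the first argument of mutate() (its .data argument)
piped  <- dogs %>% mutate(avg_height = (height_low_inches + height_high_inches) / 2)
nested <- mutate(dogs, avg_height = (height_low_inches + height_high_inches) / 2)

identical(piped, nested)  # TRUE
```

R 4.1 and later also have a built-in pipe, `|>`, which behaves the same way for simple calls like this one.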
If you are still confused about something, share your thoughts.
This worked on Codecademy, but in RStudio it yielded the following error:
```
Error in `mutate()`:
ℹ In argument: `avg_height = (height_low_inches + height_high_inches)/2`.
Caused by error in `height_low_inches + height_high_inches`:
! non-numeric argument to binary operator
Run `rlang::last_error()` to see where the error occurred.
```
Does anyone know why this is happening? I haven't found a useful solution on Stack Overflow.
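One plausible cause (an assumption, since the CSV used in RStudio isn't shown) is that the height columns were read as text rather than numbers, for example because the file contains a non-numeric entry. R can't add two character vectors, which produces exactly this message. A minimal base-R sketch with made-up data that reproduces the error, checks the column types, and converts them:

```r
# Hypothetical data: the height columns are stored as text (character)
dogs <- data.frame(
  breed = c("Beagle", "Pug"),
  height_low_inches = c("13", "10"),
  height_high_inches = c("15", "13"),
  stringsAsFactors = FALSE  # keep text as character, not factor, in older R versions
)

# Check the column types first: "character" for the heights explains the error
sapply(dogs, class)

# Adding two character vectors fails with the same message as in the post
result <- tryCatch(
  dogs$height_low_inches + dogs$height_high_inches,
  error = function(e) conditionMessage(e)
)
print(result)  # "non-numeric argument to binary operator"

# Convert to numbers (entries that can't be parsed become NA, with a warning)
dogs$height_low_inches <- as.numeric(dogs$height_low_inches)
dogs$height_high_inches <- as.numeric(dogs$height_high_inches)
dogs$avg_height <- (dogs$height_low_inches + dogs$height_high_inches) / 2
print(dogs$avg_height)  # 14.0 11.5
```

With the real file, readr's read_csv() usually reports the column types it guessed when it reads the data, which is a quick way to confirm whether the height columns came in as character.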