Cleaning Data - Confused about syntax - gsub() why two backslashes?

I’m doing Intro to R, and I’m currently in the cleaning data section.

Throughout the whole lesson Codecademy has been really good about explaining the syntax, but it seems like the ball was dropped here.

I don’t understand what the purpose of the two backslashes (\\) is in the code below. What do they do, and will I always need to add them? Are there times when gsub() calls for one, three, or more backslashes?

Here’s the code from the example, which asks us to get rid of the ‘$’ in the price, e.g. ‘$2.50’:

fruit %>%
  mutate(price=gsub('\\$','',price))

And here’s the code from the problem, which asks us to get rid of the ‘%’ in the scores, e.g. ‘60%’:

students %>%
  mutate(score=gsub('\\%','',score))

Got to escape your escapes. I don’t miss this, haha. The first issue is that gsub() takes a regex pattern (at least by default; passing fixed = TRUE turns that off). In regex patterns, $ has a special meaning (end of string/line), so to search for a literal $ character you must escape it, and the pattern is written \$

The second issue is that strings in R also use \ as an escape character, so to create the pattern \$ in a string you must also escape the \ itself. The original input therefore becomes '\\$'.
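
To make the two layers of escaping concrete, here is a minimal sketch. The price and path vectors are made-up sample data, not the lesson’s actual data. It also touches on the "one or three or more backslashes" part of the question: matching a literal backslash takes four in the source code, as far as I can tell.

```r
# The R string literal '\\$' holds just two characters: a backslash and a $.
pattern <- '\\$'
nchar(pattern)        # 2
writeLines(pattern)   # prints \$  (this is what the regex engine receives)

price <- c('$2.50', '$10.00')   # hypothetical sample data

# Unescaped, $ only means "end of string", so nothing visible is removed.
unchanged <- gsub('$', '', price)

# Escaped, the pattern matches a literal dollar sign.
cleaned_regex <- gsub(pattern, '', price)

# Alternatives that avoid the double backslash:
cleaned_fixed <- gsub('$', '', price, fixed = TRUE)  # regex switched off
cleaned_class <- gsub('[$]', '', price)              # $ is literal inside [ ]

# A literal backslash needs four in the source: R turns '\\\\' into \\,
# and the regex engine turns \\ into one literal backslash.
path <- 'C:\\temp'                  # holds C:\temp
slashed <- gsub('\\\\', '/', path)  # "C:/temp"

print(cleaned_regex)
print(slashed)
```

If regex features aren’t needed at all, fixed = TRUE is probably the easiest option to read.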

You can test this quickly with the following, which throws an error because \$ is not a valid escape sequence in an R string (the way \n would be).

pattern <- '\$'
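
Because that line fails at parse time, it stops a whole script from running. If you want to see the error without that, one way (a sketch; bad_code and result are just names I picked) is to hand the code to parse() as text and catch the error:

```r
# The text given to the parser is exactly:  pattern <- '\$'
# (the backslash is doubled here only to get it into this outer string)
bad_code <- "pattern <- '\\$'"

result <- tryCatch(
  {
    parse(text = bad_code)
    "parsed without error"
  },
  # Return the parser's complaint as a string instead of stopping.
  error = function(e) conditionMessage(e)
)
print(result)

# For comparison, '\n' is a valid escape and is a single newline character.
newline_length <- nchar('\n')
print(newline_length)   # 1
```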

R 4.0+ also supports raw strings, written like r"(\$)", which don’t interpret escape sequences, for when you have no need for them in a specific string.
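
A quick sketch of the raw string form, assuming R 4.0.0 or newer (it is a syntax error on older versions). The variable names are only for illustration:

```r
# Raw string: everything between r"( and )" is taken literally,
# so a single backslash is enough.
raw_pattern <- r"(\$)"

# It is the same two-character string as the double-backslash version.
same_string <- identical(raw_pattern, '\\$')
print(same_string)   # TRUE

cleaned <- gsub(raw_pattern, '', '$2.50')
print(cleaned)       # "2.50"
```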

Thank you! This is really helpful. I’m a little too new to R to fully understand what you’ve detailed, but at least now I know what I have to learn!

Does searching for ‘%’ also require me to ‘escape your escapes’?

That combination of two things, new syntax in R and regular expressions, probably makes this trickier than it would otherwise be.

% doesn’t have a special meaning in any regex flavor I’m familiar with, so you don’t need the escapes. gsub('%', '', text) should work perfectly well.
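
For example (a sketch with made-up scores), the plain pattern and the lesson’s escaped version should give the same result, since escaping a character that isn’t special just matches that character:

```r
score <- c('60%', '98%')   # hypothetical sample data

plain   <- gsub('%', '', score)     # no escaping needed
escaped <- gsub('\\%', '', score)   # the lesson's version, harmless extra escape

print(plain)                        # "60" "98"
print(identical(plain, escaped))    # TRUE

# With the % gone, the text can be converted to numbers.
numeric_score <- as.numeric(plain)
print(numeric_score)
```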

If you’re doing a lot of work with strings, it might be worth learning regex syntax properly (not sure if it’s included in the course or not). It’s a fairly simple syntax that shouldn’t take too long to get acquainted with. There’s a regex course on Codecademy, https://www.codecademy.com/learn/introduction-to-regular-expressions (stated as an hour to complete), that won’t teach you everything but would give you a decent basis to start from, so you can at least recognise and use parts of it.

Once you get the hang of it, you’ll probably start using it all the time, even when just editing text documents, because things like find/replace suddenly become much more powerful.