Why is the regex in this example '[$,]'? That is, why is the comma necessary? And in the solution, why is there a backslash before the %, and why is the comma necessary there as well?
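The backslash lets you “escape” a special character so that it is matched literally. The $ is a special character in regular expressions (it anchors the end of the string); the % is not, so escaping it is harmless but unnecessary. Inside square brackets the $ is already treated literally, which is why, in the exercise, the code runs with or without the backslash and the comma, at least in my browser. The comma is a convention we sometimes see employed, though it is often optional.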
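To make the escaping behaviour concrete, here is a minimal sketch using Python's built-in re module. The sample strings are made up for illustration and are not taken from the exercise data.

```python
import re

# Outside a character class, a bare $ anchors the end of the string,
# so it matches a position, not the dollar sign, and nothing is removed.
unescaped = re.sub(r"$", "", "$10")

# Escaping it makes the pattern match a literal dollar sign.
escaped = re.sub(r"\$", "", "$10")

# Inside square brackets, $ is already literal, with or without the backslash.
in_class = re.sub(r"[$,]", "", "$10")
in_class_escaped = re.sub(r"[\$,]", "", "$10")

# % has no special meaning in a regex, so the backslash changes nothing.
percent = re.sub(r"%", "", "69%")
percent_escaped = re.sub(r"\%", "", "69%")

print(unescaped, escaped, in_class, in_class_escaped, percent, percent_escaped)
```

Only the first call leaves the dollar sign in place; the other patterns all strip the symbol they name.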
I tried to replace 'fract' with 'act' in the exam column and got actactactactactions as a result. I then used the regex ^ to not match 'fract' and got fractactactactact. Is there a way to replace 'fract' in fractions with 'act' so that the result is 'actions'?
I am using the following line of code to try to convert the column, but it says that I must return a string instead of a float. When did I change the type to float in this code? It seems to me that it should have been left as a string, unless I am missing something.
I’m not sure about your second example, where you use ^, unless I see that specific line of code. For the example where you include the code, the result comes from the brackets: they tell Python that any one of these individual characters is acceptable to replace with 'act'. It looks at the first character in 'fractions', finds that 'f' is in 'fract', and updates the string to actractions. It then looks at the next character from the original string, finds that 'r' is in 'fract', and updates the string to actactactions. It repeats this for the entire string, so you end up with five 'act' strings. Try something like this instead, without the brackets. There are multiple right answers, but this one best fits your description.
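The snippet from the original reply is not preserved here. As a minimal sketch of the same idea, the following uses Python's re module on the single word 'fractions' (pandas' replace with regex=True accepts the same pattern syntax). The variable names are illustrative only.

```python
import re

word = "fractions"

# [fract] is a character class: each of f, r, a, c, t is replaced separately.
bracketed = re.sub(r"[fract]", "act", word)

# [^fract] negates the class: every character NOT in it (i, o, n, s) is replaced.
negated = re.sub(r"[^fract]", "act", word)

# Without brackets the pattern is the literal substring 'fract'.
literal = re.sub(r"fract", "act", word)

print(bracketed)  # actactactactactions
print(negated)    # fractactactactact
print(literal)    # actions
```

The first two results reproduce the outputs described in the question, which suggests the ^ was placed inside the brackets.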
Is there a glitch of some kind? My code was exactly the same as the solution. I even copied and pasted the solution code to run step 1, and it still said “must be str, not float”.
Thu, is the code taught in this section correct?
but I also noticed that the data in the column was cleaned (the %'s are removed)
(Note that I did not escape the %. I don’t think it needs escaping, though escaping it is harmless.)
I already understood the need for the \ to escape the special characters, and your explanation for that makes sense. However, you also reference the use of the comma as a convention (even if optional), but I can’t find any information or reference for it in either the Python docs for string.replace(…), Python regular expressions, or pandas DataFrame.replace(…).
Isn’t the regex ‘[\$,]’ just providing a character set that will match either $ or , in a string? Why is the comma necessary at all, for use in a data series that doesn’t include any commas?
Can you provide a more detailed explanation for the use of the comma in the regex and/or please cite a source for that convention? I appreciate any help you can give.
You’re right. The comma , in this case is not necessary, but in my opinion, they include the comma for more general cases. For example, if there is an item whose price has more than 3 digits (e.g. $4,025), in this case, you should not only remove the $ but also the comma , as there shouldn’t be a comma in a number in Python.
Only after removing the comma can you convert a string to a numerical datatype.
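As a rough illustration of that point, the sketch below strips both symbols and then converts the column. The price values are hypothetical and not taken from the exercise data.

```python
import pandas as pd

# Hypothetical prices; the first one contains a thousands separator.
prices = pd.Series(["$4,025", "$12", "$7"])

# The character class [$,] removes every dollar sign and every comma.
cleaned = prices.replace(r"[$,]", "", regex=True)

# With the symbols gone, the strings can be parsed as numbers.
numeric = pd.to_numeric(cleaned)

print(cleaned.tolist())  # ['4025', '12', '7']
print(numeric.tolist())  # [4025, 12, 7]
```

If only the $ were removed, '4,025' would still contain a comma and could not be parsed as a number.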
Hope this helps.
I understood everything, but my problem (maybe a bit basic for most of you) is that instead of “69” I am getting “69.0”, with decimals. Does anybody know how to get just two-digit integers, or how to remove the decimals? For example: 69, 53, and so on.
The return type of pd.to_numeric is float64 here probably because the score column contains NaN. So the code above casts it to Int64, which is one of pandas’ nullable-integer extension dtypes. If the Int64 part of the code is changed to a NumPy integer type such as int64, which cannot hold NaN, it results in an error. The following two articles in the User Guide may be helpful:
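The code referred to in the reply above is not preserved in this thread. A minimal sketch of the cast it describes, assuming a score column that contains a missing value (the data here is made up), could look like this:

```python
import pandas as pd

# Hypothetical scores with one missing entry, as might come from a CSV file.
scores = pd.Series(["69%", float("nan"), "53%"])

# Strip the percent sign, then parse; the missing value forces a float result.
as_float = pd.to_numeric(scores.replace("%", "", regex=True))

# 'Int64' (capital I) is pandas' nullable integer dtype, which can hold <NA>.
as_int = as_float.astype("Int64")

print(as_float.tolist())
print(as_int.tolist())
```

Note the capital I: the lowercase NumPy dtype 'int64' has no way to represent the missing entry.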
Question: both of the following lines work to describe the ‘score’ column. Is there a reason to use one or the other? Is there a difference?
students['score'] = students['score'].replace('%', '', regex=True)
students.score = students.score.replace('%', '', regex=True)
The students[‘score’] option is useful if the column name has whitespace, for example if the column was called ‘student score’ instead you would use students[‘student score’] as the other version would throw an error. It’s also useful if you are altering multiple columns at once. Otherwise, there is no real difference as far as I am aware
Is there any particular reason we’re using pd.to_numeric() rather than adding .astype() to the end of the line where we replace the % or is it just a matter of preference?