Thread Shed: Behavior of split(',') and replace()

Link to Project: https://www.codecademy.com/paths/computer-science/tracks/cspath-python-objects/modules/cspath-python-strings/projects/thread-shed

First time posting, so let me know if I can do better in formatting/asking my questions.

I don’t really have a problem with this exercise yet, but I’m confused about the behavior I see in it. Specifically, after splitting the string at ',', why is the resulting list littered with '\n'?

In a previous exercise, we were given the following use case for split(','):

authors = "Audre Lorde, William Carlos Williams, Gabriela Mistral, Jean Toomer, An Qi, Walt Whitman, Shel Silverstein, Carmen Boullosa, Kamala Suraiyya, Langston Hughes, Adrienne Rich, Nikki Giovanni"

author_names = authors.split(',')
print(author_names)

The results are:

['Audre Lorde', ' William Carlos Williams', ' Gabriela Mistral', ' Jean Toomer', ' An Qi', ' Walt Whitman', ' Shel Silverstein', ' Carmen Boullosa', ' Kamala Suraiyya', ' Langston Hughes', ' Adrienne Rich', ' Nikki Giovanni']

This is what I would expect: no extraneous '\n' line breaks are present.
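A side note on that output: every name after the first keeps the space that followed its comma. A minimal sketch of two common ways to drop it, using a shortened copy of the authors string; option 1 assumes every separator is exactly a comma followed by one space.

```python
# Shortened copy of the authors string from the earlier exercise
authors = "Audre Lorde, William Carlos Williams, Gabriela Mistral"

# Option 1: split on ", " (assumes exactly one space after every comma)
names_a = authors.split(", ")

# Option 2: split on "," and trim surrounding whitespace from each piece
names_b = [name.strip() for name in authors.split(",")]

print(names_a)  # ['Audre Lorde', 'William Carlos Williams', 'Gabriela Mistral']
print(names_b)  # ['Audre Lorde', 'William Carlos Williams', 'Gabriela Mistral']
```

Option 2 is the more forgiving of the two, since it also copes with missing or doubled spaces.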

However, in this exercise, using the same split() method in a similar way, like so:

daily_sales_replaced = daily_sales.replace(';,;', '>')

#print(f'\n Swapping out ;,; for > \n {daily_sales_replaced}')

daily_transactions = daily_sales_replaced.split(',')

print(f'\n Splitting at , {daily_transactions}')

Results in output that is littered with '\n':

 [  .....
\ngreen&white>09/15/17', '   Gail Phelps   >$30.52   \n> green&white&blue   > 09/15/17 ', ' Myrtle Morris \n>   $22.66   > green&white&blue>09/15/17']

Why?

My second question is: why can’t I use .strip('\n') to remove these characters? From reading forum posts, I learned that .replace('\n', '').strip() works, but why does the latter work and not the former? Here is the replace() version:

daily_transactions_linebreaksdel = []

for eachtransaction in daily_transactions:
    daily_transactions_linebreaksdel.append(eachtransaction.replace('\n','').strip())
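The same cleanup can also be written as a list comprehension. This is only a sketch: the two sample strings are hypothetical, copied from the truncated output shown above rather than from the exercise's full data.

```python
# Hypothetical sample, modeled on the (truncated) split output above
daily_transactions = [
    '   Gail Phelps   >$30.52   \n> green&white&blue   > 09/15/17 ',
    ' Myrtle Morris \n>   $22.66   > green&white&blue>09/15/17',
]

# For each transaction: delete every '\n', then trim whitespace at both ends
daily_transactions_linebreaksdel = [
    transaction.replace('\n', '').strip() for transaction in daily_transactions
]

print(daily_transactions_linebreaksdel)
```

Note that replace() only deletes the newline itself; the spaces that sat next to it inside the string are left alone.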

To summarize my 2 questions:

  1. What causes the placement of ‘\n’ in the results? I’ve been trying to find a pattern in the string that would explain the occurrence, but it isn’t readily apparent to me.

  2. Why is it not possible to remove line breaks with .strip('\n'), whereas .replace('\n', '').strip() gets the job done?

The first part of the question is easy to verify in the terminal; this is normal behaviour:

test = "this, is, a, split, test"
test2 = "this,\nis,\na,\nsplit,\ntest"
test.split(',')
#['this', ' is', ' a', ' split', ' test']
test2.split(',')
#['this', '\nis', '\na', '\nsplit', '\ntest']
test2.split(',\n')
#['this', 'is', 'a', 'split', 'test']
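A minimal sketch of where the '\n' characters come from in the exercise. It assumes, as the output in the question suggests, that the sales data is written as a triple-quoted string spanning several lines; `sales` below is a made-up, much shorter stand-in for the real data. Every line break inside such a literal becomes a real newline character, and split(',') simply leaves it wherever it falls.

```python
# Made-up stand-in for the exercise data: a triple-quoted literal over 3 lines.
# Each line break inside the literal is stored as a '\n' character.
sales = """Ann Lee ;,;$1.00;,; white;,;
09/15/17 ,Bo Chen ;,;$2.00
;,; blue;,; 09/15/17"""

# Replace the ';,;' field separator first, then split transactions at ','
pieces = sales.replace(';,;', '>').split(',')
print(pieces)
# ['Ann Lee >$1.00> white>\n09/15/17 ', 'Bo Chen >$2.00\n> blue> 09/15/17']
```

The replace() call has to come before split(','), since each ';,;' separator itself contains a comma.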

If you look at the Python docs, you’ll see they say nothing specifically about strip() removing newline ('\n') characters. Instead, strip() deals only with leading and trailing characters, so it could remove line breaks only if they were at the very start or end of the string being stripped. replace() is a great alternative because it targets \n wherever it appears.
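To make the difference concrete, a small sketch with two hypothetical items modeled on the question's output: one with a newline in the middle, and one where the newline sits near the start but behind a space. strip('\n') leaves both untouched, while replace() removes the newline wherever it is.

```python
# Hypothetical items modeled on the split output in the question
inner = ' Myrtle Morris \n>   $22.66'  # newline in the middle
edge = ' \n09/15/17 '                  # newline near the start, but after a space

# strip('\n') only removes '\n' characters sitting at the very ends
print(repr(inner.strip('\n')))  # unchanged: the ends are ' ' and '6'
print(repr(edge.strip('\n')))   # unchanged: the first character is ' ', not '\n'

# replace() removes '\n' wherever it occurs
print(repr(inner.replace('\n', '')))  # ' Myrtle Morris >   $22.66'
```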

@beta4085117449

test3 = "\n leading and trailing \n"
print(test3)
#
# leading and trailing 
#
test3.strip("\n")
#' leading and trailing '
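Continuing that example: strip("\n") removes only newline characters, so the surrounding spaces survive. A short sketch of passing both characters, or no argument at all (which strips any leading or trailing whitespace):

```python
test3 = "\n leading and trailing \n"

print(repr(test3.strip("\n")))   # ' leading and trailing '  (spaces stay)
print(repr(test3.strip("\n ")))  # 'leading and trailing'    (newlines and spaces)
print(repr(test3.strip()))       # 'leading and trailing'    (any whitespace)
```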

str.strip([chars])

Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped:

>>> '   spacious   '.strip()
'spacious'
>>> 'www.example.com'.strip('cmowz.')
'example'

The outermost leading and trailing chars argument values are stripped from the string. Characters are removed from the leading end until reaching a string character that is not contained in the set of characters in chars. A similar action takes place on the trailing end. For example:

>>> comment_string = '#....... Section 3.2.1 Issue #32 .......'
>>> comment_string.strip('.#! ')
'Section 3.2.1 Issue #32'

https://docs.python.org/3/library/stdtypes.html
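Two consequences of that wording, sketched below: chars is treated as a set (order and repetition don’t matter), and it is not a prefix. For removing one exact prefix, str.removeprefix() exists, but only in Python 3.9 and later.

```python
s = 'www.example.com'

# chars is a set of characters: order and repetition make no difference
print(s.strip('cmowz.'))        # example
print(s.strip('.zwomc'))        # example
print(s.strip('ccmmoowwzz..'))  # example

# chars is not a prefix: strip('w.') keeps removing any 'w' or '.' from the left
print('www.wow.com'.strip('w.'))           # ow.com
# str.removeprefix (Python 3.9+) removes one exact leading substring instead
print('www.wow.com'.removeprefix('www.'))  # wow.com
```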

Welcome to the forums! 🙂

Thanks for the response!

If you don’t mind, I’d like to ask a few more questions about .strip().


I see now that the '\n' coincides with the line breaks in the given string. This should have been obvious, but somehow I missed it.

However, I’m unclear on this claim:

Specifically strip deals with leading and trailing characters (so you could remove line breaks if they were the leading and trailing characters of your entire string). replace() becomes a great alternative because you can target \n specifically.

The syntax I found for strip() suggests that it can be used to find and remove characters anywhere, not just at the leading and trailing edges. This example removes multiple characters from a string. Is there a reason it works with these characters and not with '\n'?

The example you provided also illustrates my issue, i.e., what rules govern the behavior seen here:

>>> '   spacious   '.strip()
'spacious'
>>> 'www.example.com'.strip('cmowz.')
'example'

In the example above, what governs whether Python removes the ‘m’ at the exterior but not the one in the interior? Clearly, it moves progressively inward through the string, removing multiple w’s as well as the c and the o.
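One way to see the rule is to strip each end separately with lstrip() and rstrip(). Each end is processed on its own and stops at the first 'e' it reaches, so the 'm' inside "example" is never examined.

```python
s = 'www.example.com'
chars = 'cmowz.'

print(s.lstrip(chars))  # example.com  (left end stops at the first 'e')
print(s.rstrip(chars))  # www.example  (right end stops at the last 'e')
print(s.strip(chars))   # example      (both ends; the inner 'm' is never reached)
```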

That works with \n as well:

txt = ",\n,,,,rrttgg.....banana....rrr"
x = txt.strip(",\n.grt")
print(x)
#banana

However, if I try to strip something inside the banana:

txt = ",\n,,,,rrttgg.....ban#ana....rrr"
x = txt.strip(",\n.#grt")
print(x)
#ban#ana

… it doesn’t work (the banana is impregnable). So this is what is meant by leading and trailing.
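If the goal is to get rid of an interior character as well, a sketch combining the two methods: strip() for the edges and replace() for what is inside.

```python
txt = ",\n,,,,rrttgg.....ban#ana....rrr"

edges_removed = txt.strip(",\n.grt")      # 'ban#ana': '#' is at neither end
cleaned = edges_removed.replace("#", "")  # 'banana': replace() reaches the interior
print(cleaned)  # banana
```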

The key point is that it works from the outside going in.
Again, look at the documentation quote (or follow the link) for the official description of how it works.

That, and experimentation, will give a better sense of the expected behaviors of this method…

Thanks! I think I understand it now. Correct me if I’m wrong: Python checks for the provided characters at the leading and trailing edges, and stops once it meets a character that is not in the set, even if one of the listed characters occurs further ‘inside’, so to speak.
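That description can be written out as a small function. This is a hypothetical, simplified re-implementation for illustration only: manual_strip is not a built-in, it is not CPython’s actual C implementation, and unlike str.strip it does not handle the default of stripping whitespace when chars is omitted.

```python
def manual_strip(text, chars):
    """Hypothetical re-implementation of str.strip(chars), for illustration."""
    start, end = 0, len(text)
    # Move inward from the left while the current character is in the set
    while start < end and text[start] in chars:
        start += 1
    # Move inward from the right while the current character is in the set
    while end > start and text[end - 1] in chars:
        end -= 1
    return text[start:end]


print(manual_strip('www.example.com', 'cmowz.'))                    # example
print(manual_strip(",\n,,,,rrttgg.....ban#ana....rrr", ",\n.#grt"))  # ban#ana
```

Both loops stop at the first character outside the set, which is exactly why an interior character is never reached.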

If we think of it metaphorically, like taking off layers of clothes, we can only go from the outside in. Unless you have some fancy strapped clothes, but for the analogy’s sake I wouldn’t think about that, haha.

@beta4085117449 correct.

Thank you for taking the time to talk me through this!

Anytime. Good questions. It’s a good refresher for me too…
