How does split() work for arguments longer than a single character?

Question

In the context of this exercise, how does split() work for arguments longer than a single character?

Answer

When we provide an argument that is a string of more than one character, then it will split on wherever that argument string occurs in the original string, very similar to when you provide just a single character argument.

The argument string can be any length. If the argument is longer than the original string (or simply never occurs in it), nothing is split, and split() returns a list containing the original string as its only element.

string = "This123is123a123secret123message"
result = string.split("123")

print(result)
# ["This", "is", "a", "secret", "message"]
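
To check the edge cases, here is a small sketch of what split() returns when the separator occurs several times, never occurs, or sits at the edges of the string. The short sample strings are made up for illustration.

```python
# Separator occurs several times: split at every occurrence.
text = "This123is123a123secret123message"
words = text.split("123")
print(words)    # ['This', 'is', 'a', 'secret', 'message']

# Separator longer than the string: it cannot occur, so nothing is split.
short = "abc"
unsplit = short.split("abcdef")
print(unsplit)  # ['abc']

# Separator at the very start or end leaves an empty string in the list.
edges = "123This123".split("123")
print(edges)    # ['', 'This', '']
```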

So when we split the string, it’s not like the prior lesson where each letter has its own index? In this case each word has its own index in the list? Is that why the negative index works to create the list of last names only?

In a string, each character in the string is indexed, beginning with zero.

In a list, each element of the list is indexed, beginning with zero. The elements are separated by commas. They can be of any type: int, float, string (as in the case shown), list, or any other.
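
A short sketch of the difference, using one name from the exercise: the same index means a character in a string and a whole word in the list that split() returns.

```python
name = "Audre Lorde"

# Indexing a string gives single characters.
print(name[0])    # A
print(name[-1])   # e

# Splitting first gives a list, so indexing now gives whole words.
parts = name.split()
print(parts)      # ['Audre', 'Lorde']
print(parts[0])   # Audre
print(parts[-1])  # Lorde
```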

If we split a string on nothing, will it produce a list of characters in the string?

>>> 'list of characters'.split('')

Traceback (most recent call last):
  File "<pyshell#4>", line 1, in <module>
    'list of characters'.split('')
ValueError: empty separator

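That’s our answer: no, an empty separator is not allowed. As a workaround, we can join the characters with a separator that is not in the string, then split on that separator.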

>>> m = '^'.join('list of characters')
>>> m.split('^')
['l', 'i', 's', 't', ' ', 'o', 'f', ' ', 'c', 'h', 'a', 'r', 'a', 'c', 't', 'e', 'r', 's']
>>> 

and now it gives us a list.

Of course we could have done this with the built-in,

>>> list('list of characters')
['l', 'i', 's', 't', ' ', 'o', 'f', ' ', 'c', 'h', 'a', 'r', 'a', 'c', 't', 'e', 'r', 's']
>>> 

All that this teaches us is that str.split() needs a string that is not empty. This now allows us to infer that any string may be specified, regardless of length.

May I ask if someone can explain this part of the code to me:

author_last_names.append(name.split()[-1])

Let’s say we have an author name,

name = "Edward Arthur Milne"

The line above will append the last name, ‘Milne’, to the author_last_names list.

First we split the name,

['Edward', 'Arthur', 'Milne']

Then we access the last element (-1 means read from the right) and append it.
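
Written out one step at a time, it might look like the sketch below. The intermediate names parts and last are only there for illustration; the original line does the same work without them.

```python
author_last_names = []
name = "Edward Arthur Milne"

parts = name.split()            # ['Edward', 'Arthur', 'Milne']
last = parts[-1]                # 'Milne', the first element counting from the right
author_last_names.append(last)

print(author_last_names)        # ['Milne']
```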

Thank you very much. I am getting the hang of it, I guess.

@mtf
I’m sorry, but I still can’t understand the following code:

author_last_names.append(name.split()[-1])

No problem.

name.split()

returns a list.

name.split()[-1]

is the last element in that list.

The rest of the code should be understandable: we append that element to the author_last_names list.


Edited 21/09/2019

I did it like this. It’s not pretty, but it works. I don’t understand how to do it so that the code knows which word is the last name.

author_names = authors.split()
author_last_names = []
for a in author_names:
  if "," in a:
    a = a[:-1]
    author_last_names.append(a)
author_last_names.append(author_names[-1])
print(author_last_names)

This prints:

['Lorde', 'Williams', 'Mistral', 'Toomer', 'Qi', 'Whitman', 'Silverstein', 'Boullosa', 'Suraiyya', 'Hughes', 'Rich', 'Giovanni']

I had to manually add Giovanni because it didn’t have a “,” after the name. It still gives me an error.

You might consider the comma to be the delimiter. (What are you currently splitting by?)

But without making that change, if you look at your code there is a condition for adding a name, and the last one will indeed not meet that condition.

You might also want to make sure that it handles empty input, and single-name input may be worth checking too.
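
As a rough sketch of those two checks: an empty name splits into an empty list, so indexing it with [-1] would raise an IndexError. The helper name last_names and the choice to skip blank entries are assumptions for illustration, not part of the exercise.

```python
def last_names(authors):
    """Return the last word of each comma-separated name (hypothetical helper)."""
    result = []
    for full_name in authors.split(","):
        words = full_name.split()
        if words:  # skip blank entries instead of raising IndexError
            result.append(words[-1])
    return result

print(last_names(""))                             # []
print(last_names("Lorde"))                        # ['Lorde']
print(last_names("Audre Lorde, Nikki Giovanni"))  # ['Lorde', 'Giovanni']
```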

If there are spaces between the names in your input then I think this is what’s supposed to be there:

authors = "Audre Lorde,Gabriela Mistral,Jean Toomer,An Qi,Walt Whitman,Shel Silverstein,Carmen Boullosa,Kamala Suraiyya,Langston Hughes,Adrienne Rich,Nikki Giovanni"

OK, I did it again, this time with this code:

author_names = authors.split(",")
author_last_names = []
for a in author_names:
  author_last_names.append((a.split())[-1])
print(author_last_names)

The output is OK now:

['Lorde', 'Williams', 'Mistral', 'Toomer', 'Qi', 'Whitman', 'Silverstein', 'Boullosa', 'Suraiyya', 'Hughes', 'Rich', 'Giovanni']

still ugly code
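
If the loop feels clunky, one optional alternative is a list comprehension, which does the same split-and-take-last step in a single expression. This is only a sketch; the authors string is shortened here to keep the output easy to check.

```python
# Shortened sample of the exercise string (assumption for brevity).
authors = "Audre Lorde,Gabriela Mistral,Jean Toomer"

# For each full name, split on whitespace and keep the last word.
author_last_names = [full_name.split()[-1] for full_name in authors.split(",")]

print(author_last_names)  # ['Lorde', 'Mistral', 'Toomer']
```

The loop version is just as correct; the comprehension is purely a matter of taste.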

Hi, thanks for the above, but I still don’t understand how the syntax works with name.split()[-1].
With the split method in the middle, how do we know the [-1] still refers to the name list?
I tried doing it separately for this reason, first making a separate list, then selecting the last element from the new list.

name.split() results in a list. At this point the list has no name, but its elements can be referenced by index. [-1] is the index of the last element in that list.

We can give the split result a name,

names = name.split()

and as you have done, refer to that list to poll the last element…

names[-1]
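
A small sketch showing that the one-line and two-step forms give the same result. The [-1] is applied to whatever the expression to its left returns, so it is not an argument to split(). The last two lines use other built-ins purely to show the same pattern.

```python
name = "Edward Arthur Milne"

# One line: index the list that split() returns.
chained = name.split()[-1]

# Same thing in two steps, with the list given a name.
names = name.split()
stepwise = names[-1]

print(chained, stepwise)      # Milne Milne

# The same pattern works on any expression that produces a sequence.
print(name.upper()[0])        # E
print(sorted([3, 1, 2])[-1])  # 3
```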

This is a perfect explanation and easy for me to follow when we are dealing with the full name of just one author. But here we have the names of several authors, and I find it extremely confusing that each author’s full name is treated as its own list of names with its own indices. That implies that for each author the indices run from 0 up to the number of names given, then start again from 0 for the next author, and so on. Could you please expand your explanation? Thank you.

I don’t know what to expand upon. It answers the question above it, namely explain that line of code.

author_names is a list of full names as gleaned out of the original string. That list will have its own indices, and as we iterate it, the index of the next item increases moving right.

We treat one name at a time, first splitting it to extract the first, (middle) and last names as a list, then appending only the last to our new list.

On the whole, apart from the [-1], we never need to look at the indices.

for a in author_names:

is read-only: it reads each item in turn and never changes author_names itself.
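
As a sketch of what read-only means here (the two-item sample list is made up for illustration): assigning to the loop variable changes only that variable, not the list being iterated.

```python
author_names = ["Audre Lorde,", "Nikki Giovanni"]

for a in author_names:
    a = a.rstrip(",")   # rebinds the loop variable only
print(author_names)     # ['Audre Lorde,', 'Nikki Giovanni'] (unchanged)

# To keep the changed values, collect them in a new list.
cleaned = []
for a in author_names:
    cleaned.append(a.rstrip(","))
print(cleaned)          # ['Audre Lorde', 'Nikki Giovanni']
```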

@mtf And does the below answer my question? Kindly add anything I have missed.

authors = "Audre Lorde,Gabriela Mistral,Jean Toomer,An Qi,Walt Whitman,Shel Silverstein,Carmen Boullosa,Kamala Suraiyya,Langston Hughes,Adrienne Rich,Nikki Giovanni"

author_names = authors.split(',')

print(author_names)

author_last_names = []
for name in author_names:
  author_last_names.append(name.split()[-1])
  
print(author_last_names)

  1. We already have author_names as a list, created by saving authors.split(',') into it. This also implies that the full name of each author has a single index.
  2. The line for name in author_names: makes it possible for us to have indices within each name in author_names.
  3. The line author_last_names.append(name.split()[-1]) looks at the indices within each author’s name and appends the element at index [-1] to author_last_names.

For no. 1, add “in that list” to the end. No. 2 is correct. No. 3 is correct.

That pretty much sums it up.

You say index, but you’re not processing that information anywhere. For example, you can’t write out any indices you’re using… because you’re not using any.

Nor are they interesting information, so, don’t mention them I guess?

What you are interested in is having access to each value in turn… That’s iteration.

You could for example write a function that accepts a name and carries out some action.

Then you would call that function once with each name as the argument. Note, no index is being given to the function. It’s the value that you would be giving to the function.
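
A minimal sketch of that idea, assuming a hypothetical helper named last_name and a shortened list of names: the loop hands each value to the function, and no index appears anywhere.

```python
def last_name(full_name):
    """Return the final word of a full name (hypothetical helper)."""
    return full_name.split()[-1]

author_names = ["Audre Lorde", "Gabriela Mistral", "Jean Toomer"]

author_last_names = []
for name in author_names:
    # The value itself is passed in, not its position in the list.
    author_last_names.append(last_name(name))

print(author_last_names)  # ['Lorde', 'Mistral', 'Toomer']
```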

I think the confusion we’re having with this is that it’s the first time in the course we’ve seen syntax in this order. In my novice mind, it should look something like:

name.split([-1])

It feels like the [-1] needs to be a parameter of .split(). Which of course it’s not.

Is name.split()[-1] effectively shorthand for something?
