FAQ: String Methods - Splitting Strings

This community-built FAQ covers the “Splitting Strings” exercise from the lesson “String Methods”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Computer Science

FAQs on the exercise Splitting Strings


I don’t fully get this method. The exercise description has code like this:

man_its_a_hot_one = "Like seven inches from the midday sun"
man_its_a_hot_one.split()
['Like', 'seven', 'inches', 'from', 'the', 'midday', 'sun']

So here it takes every element separated by a space, creates a new list, and puts each of those elements into the list as a separate item, each one inside “” and separated by commas.

But in another topic to this exercise I saw in one post code like this:

a = "#".join('mississippi')
a
m#i#s#s#i#s#s#i#p#p#i
b = a.split('#')
b
mississippi

Here it took every element that was separated by the argument, and instead of doing the same thing as above (so ['m', 'i', 's', 's', 'i', 's', 's', 'i', 'p', 'p', 'i']) it just “glued” them together.

Why is that?

You mean this example?

>>> a = "*".join('mississippi')
>>> a
'm*i*s*s*i*s*s*i*p*p*i'
>>> b = a.split('*')
>>> b
['m', 'i', 's', 's', 'i', 's', 's', 'i', 'p', 'p', 'i']

The two methods, str.join() and str.split(), act as inverses of each other. a above is the character sequence with a separator string inserted between the characters. When we split on that same separator we get the original characters back as a list of one-character strings, so b is a list, not the glued-together string 'mississippi'.

The same works in reverse. If we split a string on a given separator (with no argument, split() splits on runs of whitespace), then we can join the pieces with that same separator and restore the original string.
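
A minimal sketch of that round trip in both directions. The sample strings, separators, and variable names are made up for illustration and do not come from the exercise:

```python
# Round trip 1: split a string, then join with the same separator.
original = "red,green,blue"
parts = original.split(",")        # a list of strings
restored = ",".join(parts)         # back to one string

# Round trip 2: join a list, then split on the same separator.
words = ["red", "green", "blue"]
joined = "-".join(words)           # one string
split_again = joined.split("-")    # back to a list

print(parts)                 # ['red', 'green', 'blue']
print(restored == original)  # True
print(split_again == words)  # True
```

The second round trip only restores the list when none of its items contain the separator themselves.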

So if I use this method when join() was previously used it will just glue them together, and if I use split on the same code but in a situation when join() was not used it will create a list with separate characters, each inside “” and separated by commas. Is my assessment correct?

A sequence of characters, a string, may be split into a sequence of strings, a list. If we do not alter the list, it may be joined again to restore the original string.

>>> a = 'mississippi'.split('iss')
>>> a
['m', '', 'ippi']
>>> b = 'iss'.join(a)
>>> b
'mississippi'
>>> 

Every element in the split list is a string, including the empty string we see above. That element is crucial if we are to re-join and restore the original.

I kinda understand what you are saying right now but I have no idea how it answers my question. What I gathered right now is that join and split remember the original state of the string that is operated on.

I really need to take it slow and in pieces to understand some concepts, so I really need to know if my previous assessment is correct in a “yes” or “no” kind of answer, otherwise I have a problem connecting what you are saying to the rest.

They don’t ‘remember’ anything. We do. We’re the ones who decide what to split on, and if we join using the same separator we arrive back at the original.

We cannot split a number, such as, 1234567. We can only split strings, which is why we get a list of strings. A list consists of elements separated by commas.

>>> s = "A quick brown fox jumps over the lazy dog"
>>> t = s.split()    # same result as `s.split(' ')` for this string
>>> t
['A', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

When we go to join, we must remember to replace the space character that was removed in the split.

>>> u = ' '.join(t)
>>> u
'A quick brown fox jumps over the lazy dog'
>>>

join/split have specific behaviour, read what that is before you start speculating.

>>> help(''.join)
join(iterable, /) method of builtins.str instance
    Concatenate any number of strings.

    The string whose method is called is inserted in between each given string.
    The result is returned as a new string.

    Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'

>>> help(''.split)
Help on built-in function split:

split(sep=None, maxsplit=-1) method of builtins.str instance
    Return a list of the words in the string, using sep as the delimiter string.

    sep
      The delimiter according which to split the string.
      None (the default value) means split according to any whitespace,
      and discard empty strings from the result.
    maxsplit
      Maximum number of splits to do.
      -1 (the default value) means no limit.

As you can see, str.join accepts an iterable of strings, so both your examples are the same thing and both of them behave the same.
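
A short sketch of what "an iterable of strings" means in practice. The separator and sample values here are arbitrary choices for illustration:

```python
sep = "-"

# A string is itself an iterable of one-character strings,
# so all three calls below produce the same result.
from_string = sep.join("abc")
from_list = sep.join(["a", "b", "c"])
from_tuple = sep.join(("a", "b", "c"))

# join always returns a str; split always returns a list of str.
back = from_string.split(sep)

print(from_string)                  # a-b-c
print(type(from_string).__name__)   # str
print(back)                         # ['a', 'b', 'c']
print(type(back).__name__)          # list
```

The quotes and commas shown when a list is printed are only how Python displays it; they are not stored inside the strings.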

But as you can see above they don’t. In one case I got strings glued together and in another I got them as a list, each separated with a comma and inside “”.

The thing you pasted is beyond my ability to comprehend right now, it’s very technical and I don’t understand even the first sentence “join(iterable, /) method of builtins.str instance”

ELI5?

which is it? string or list?

no such thing as “inside quotes”, you can’t put things in quotes, quotes isn’t a container, it’s not a list

If you look at the documentation, str.join returns a string, and str.split returns a list of strings

I mean something like this one

So … it did what it promised to do?

read it. it’s no more technical than what you need to know to use it.

Hi guys,
I wonder if there’s any way I could use the split() method to split this given string in two.
For example:

line1 = "the sky has given over"
line2 = line1.split('Do not know what to enter here')
print(line2)
so that it results in ['the sky has ', 'given over'].
Any idea whether this kind of string manipulation is possible with the split method or the join method?

We could use split to make a list, then divide the list in two and rejoin the words.

>>> def split_sentence(line):
    words = line.split()
    n = len(words) // 2
    n += not n % 2 and 1 or 0
    return f"{' '.join(words[:n])}\n{' '.join(words[n:])}"

>>> print (split_sentence("A quick brown fox jumps over the lazy dog"))
A quick brown fox jumps
over the lazy dog
>>> print (split_sentence("A quick brown fox jumps over the moping lazy dog"))
A quick brown fox jumps
over the moping lazy dog
>>> 

There are likely concepts above that may not be familiar…

  • // => floor division
  • += => augmented assignment operator
  • not n % 2 and 1 or 0 => logic to yield 1 when n is even else 0
  • return f"{..}\n{..}" => f-string formatting
  • \n => newline escape sequence
  • words[:n] => slice of the first n words (indices 0 to n - 1)
  • words[n:] => slice of words from n to end
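
To see floor division and the two slices in isolation, here is a small sketch using the same sentence. It stops before the even/odd adjustment, so n here is just half the word count rounded down:

```python
words = "A quick brown fox jumps over the lazy dog".split()

n = len(words) // 2   # 9 // 2 is 4; floor division drops the remainder
first = words[:n]     # the first n words, indices 0 to n - 1
rest = words[n:]      # everything from index n to the end

print(n)      # 4
print(first)  # ['A', 'quick', 'brown', 'fox']
print(rest)   # ['jumps', 'over', 'the', 'lazy', 'dog']
```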

Using your accumulated knowledge, see if you can interpret this to naive (unrefactored) code that only uses concepts you have already learned.


Let’s work through the code and reduce it to more naive terms as we go. We must at least understand what the int() constructor does, as it usually precedes introduction of floor division.

>>> def split_sentence(line):
    words = line.split()
    n = int(len(words) / 2)  # expanded form of x // 2
    n += 0 if n % 2 else 1   # conditional expression vs. logic
    return f"{' '.join(words[:n])}\n{' '.join(words[n:])}"

>>> print (split_sentence("A quick brown fox jumps over the lazy dog"))
A quick brown fox jumps
over the lazy dog
>>> print (split_sentence("A quick brown fox jumps over the moping lazy dog"))
A quick brown fox jumps
over the moping lazy dog
>>> 

Notice we’re starting to back away from abstraction on the two commented lines. There’s still more work to be done.

The introduced concept is the conditional expression, which is Python’s counterpart to the ternary operator in other languages. Python has no ?: operator, so this is what we use when that is the logic we are going after. It cannot be left this way since you may never have encountered it in previous lessons. Still, it is a step away from the abstract logic used earlier. As stated, more work to be done.
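
As a rough side-by-side, the three functions below compute the same adjustment: the and/or logic, the conditional expression, and a plain if statement. The function names are invented for this sketch:

```python
def bump_logic(n):
    # and/or form used in the first version
    return not n % 2 and 1 or 0

def bump_conditional(n):
    # conditional expression used in the second version
    return 0 if n % 2 else 1

def bump_if(n):
    # plain if statement: 1 when n is even, otherwise 0
    if n % 2 == 0:
        return 1
    return 0

for n in range(6):
    print(n, bump_logic(n), bump_conditional(n), bump_if(n))
```

The and/or form only works here because the value after `and` is 1, which is truthy. With a falsy value in that position the expression would fall through to the `or` branch, which is one reason the conditional expression is usually easier to trust.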

What if all we have is the sentence, and no int() constructor? Can we still arrive at a value for n? Well of course we can. Take out your pencil and paper… Draw a t-grid,

   odd  |  even
-----------------
        |
        |
        |
        |
        |

Go through the words and put them alternately into the grid. One on the left, one on the right until all words are exhausted. We’re not doing anything with this grid so don’t care that it doesn’t make code sense. It’s serving a purpose by helping us split the sentence into two nearly equal lists.

Now cross off every row that has both an odd and an even entry. If there is nothing left at the end, add 1 to n.

The closer we can bring code to pencil and paper, the closer we are to naive code. Yes, this is a strange exercise, and almost seems like reverse engineering, but it’s not. It is the reduction of abstracts to their basest form. Keep playing with this. For sure I will be.
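
One possible naive version, offered only as a sketch. It counts complete pairs of words with a loop instead of using floor division or int(), and it mirrors the adjustment in the code above (add 1 when n is even). The name split_sentence_naive and the helper variables are made up for this example:

```python
def split_sentence_naive(line):
    words = line.split()

    # Count complete pairs of words; this equals len(words) // 2.
    seen = 0
    n = 0
    for word in words:
        seen = seen + 1
        if seen % 2 == 0:
            n = n + 1

    # Same adjustment as the earlier versions: add 1 when n is even.
    if n % 2 == 0:
        n = n + 1

    first_line = ' '.join(words[:n])
    second_line = ' '.join(words[n:])
    return first_line + '\n' + second_line

print(split_sentence_naive("A quick brown fox jumps over the lazy dog"))
```

For the two sample sentences used earlier it produces the same two-line output as the shorter versions.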


Hi, for splitting strings is white space the main deciding piece?

White space is the default, so

'string split'.split()

will give,

['string', 'split']

as a return value.

We could also write,

'string split'.split(' ')

to get the same return.
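
The two calls agree here because the words are separated by exactly one space. A small sketch of where they differ, using a made-up string with two spaces in it:

```python
single = "string split"
double = "string  split"   # two spaces between the words

print(single.split() == single.split(" "))  # True

# With no argument, runs of whitespace count as one boundary.
print(double.split())      # ['string', 'split']

# With ' ' as the separator, every single space is a boundary,
# so the gap between the two spaces becomes an empty string.
print(double.split(" "))   # ['string', '', 'split']
```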

That being said, we can use any character or sequence of characters as our split boundary.

 'mississippi'.split('ss')

will give,

['mi', 'i', 'ippi']
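
The maxsplit parameter listed in the help text can limit how many boundaries are used. A brief sketch with the same word; the variable names are only for illustration:

```python
word = 'mississippi'

all_pieces = word.split('ss')      # split at every 'ss'
one_split = word.split('ss', 1)    # stop after the first 'ss'

print(all_pieces)  # ['mi', 'i', 'ippi']
print(one_split)   # ['mi', 'issippi']
```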

Aside

The inverse of str.split() is str.join(list). Joining with a different separator replaces each occurrence of the original one:

>>> a = 'mississippi'.split('ss')
>>> b = 'th'.join(a)
>>> b
'mithithippi'
>>> 

Thank you, I understand it better now.
