Censor Dispenser: how to deal with the line breaks?

Hi everyone,

I’m trying to solve the Censor Dispenser challenge in Python and I’m having a very hard time.

I’m at exercise five, and I’ve written a function that censors certain words from an email. To do this, I had to turn the text into a list of words by splitting it on the spaces and the line breaks. The problem is that when I put the words back together, I don’t know how to put the line breaks back in the right places. Since removing the line breaks leaves an empty slot ("") in the list, I thought I could add the line breaks back by doing this:

    if word in words_txt == "":
        word = "\n"

But that didn’t work. This is my function:

forbidden = ["concerned", "behind", "danger", "dangerous", "alarming", "alarmed", "out of control", "help", "unhappy", "bad", "upset", "awful", "broken", "damage", "damaging", "dismal", "distressed", "distressed", "concerning", "horrible", "horribly", "questionable", "she", "personality matrix", "sense of self", "self-preservation", "learning algorithm", "her", "herself"]

def censor(txt):
    words_txt = []
    censored_mail = ""
    for word in txt.split(" "):
        bla = word.split("\n")
        for word1 in bla:
            words_txt.append(word1)
    for i in range(len(words_txt)):
        word_lowercase = words_txt[i].lower()
        if word_lowercase in forbidden:
            words_txt[i] = str("*" * len(words_txt[i]))
            words_txt[i-1] = str("x" * len(words_txt[i-1]))
            words_txt[i+1] = str("#" * len(words_txt[i+1]))
    if word in words_txt == "":
        word = "\n"
    censored_mail = " ".join(words_txt)
    return censored_mail

Is there anyone who can help me?

cheers,

Mango
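Here is a minimal sketch of what the two-step split in the function above actually produces, using a made-up sample email. It shows why the empty strings can’t be used to restore the line breaks: a single line break leaves no trace at all in the flat list, and only a blank line (two breaks in a row) leaves an empty string. Separately, the condition word in words_txt == "" is a chained comparison, so Python reads it as (word in words_txt) and (words_txt == ""). That is always False because a list never equals a string. Even if it were True, reassigning word would not change the list.

```python
# Hypothetical sample email; any text with line breaks shows the same effect.
sample = "Hello there\nI am concerned\n\nBye"

# Same two-step split as the censor() function in the question:
# first on spaces, then each piece on "\n".
words = []
for chunk in sample.split(" "):
    words.extend(chunk.split("\n"))

print(words)
# ['Hello', 'there', 'I', 'am', 'concerned', '', 'Bye']
# The single break between "there" and "I" has vanished; the blank line
# after "concerned" survives only as one empty string.

# The condition from the question is a chained comparison:
# word in words_txt == ""  means  (word in words_txt) and (words_txt == "")
print("Bye" in words == "")  # False, because a list is never equal to ""
```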


One idea might be to split on line breaks first, and keep that structure intact from that point on. Iterate over the lines, split each one into words, and put the word lists back into the structure. The censoring could be done in that step as well (or not). When you want to restore the original line breaks, join each line’s words with spaces, then join the lines with newlines.

I’ve started a mock-up of this proposal on repl.it.

At present it only splits the text into lines, then splits each line into words. Run it to see the line count and the longest line.

Inspect the lines array to view its contents.


Proof of concept

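def line_splitter(document):
    # One list of words per line; the outer list keeps the line structure.
    return [line.split(' ') for line in document.split('\n')]

def line_joiner(lines):
    # Undo line_splitter: words back together with spaces, lines with newlines.
    return '\n'.join(' '.join(line) for line in lines)

def censor(document, proprietary=()):
    # A tuple default avoids the shared-mutable-default pitfall of [].
    lines = line_splitter(document)
    for line in lines:
        for x, word in enumerate(line):
            if word in proprietary:
                line[x] = "*" * len(word)
    return line_joiner(lines)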

OK, so I think the solution you’re suggesting is to first break the text into lists at the line breaks, and then break each of those lists into words. So essentially, I just have to reverse the order in which I break up the email.
I’m going to give this a try, thank you!


It’s easier to treat the text as a single string and replace all occurrences.
Is that cheating on the exercise? Yes, probably, but it is also the better outcome, is it not?
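A minimal sketch of the single-string idea, using only str.replace. The helper name censor_whole_string and the sample text are hypothetical. Because the text is never split, the line breaks never have to be restored.

```python
def censor_whole_string(text, terms):
    """Replace every occurrence of each term with asterisks of equal length.

    Hypothetical helper: the text is never split, so line breaks and all
    other whitespace stay exactly where they were.
    """
    # Handle longer terms first so "dangerous" is censored as a whole
    # before the shorter "danger" gets a chance to match inside it.
    for term in sorted(terms, key=len, reverse=True):
        text = text.replace(term, "*" * len(term))
    return text


email = "I am concerned\nabout her and the other one."
print(censor_whole_string(email, ["concerned", "her"]))
# I am *********
# about *** and the ot*** one.
```

Note the trade-off visible in the output: str.replace is case-sensitive and also matches inside longer words ("her" inside "other"). A fuller version would need word-boundary checks, for example re.sub with \b in the pattern and the re.IGNORECASE flag. Multi-word terms such as "out of control" work here without extra effort, since the text is never split.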


But if splitting…

There are many kinds of whitespace, and \n is just one of them. If you keep every delimiter when splitting the text, whatever kind of whitespace it is, then there’s no need for lists inside lists to keep track of lines and words (and nesting would only get worse if you cared about more kinds of whitespace).

>>> import re
>>> # match just one space at a time:
>>> re.split(r'(\s)', 'hello\n\rworld\t')
['hello', '\n', '', '\r', 'world', '\t', '']
>>> # or match multiple to group the spaces, they're not interesting anyway
>>> re.split(r'(\s+)', 'hello\n\rworld\t')
['hello', '\n\r', 'world', '\t', '']

Now the words are nicely isolated and can be operated on. Strings have a method, isspace(), that tests whether they consist only of whitespace, and it can be used to tell words and whitespace apart.

In the pattern, \s means any whitespace character, and the parentheses (a capturing group) tell re.split to include the matched separators in the result. + means “one or more”.
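Putting those pieces together, here is a sketch of a whole-word censor built on re.split with a capturing group, str.isspace() and "".join(). The function name censor_keep_spacing is hypothetical. It assumes case-insensitive matching and single-word terms only.

```python
import re

def censor_keep_spacing(text, terms):
    """Censor whole words while keeping every whitespace run intact.

    Sketch under assumptions: words are compared case-insensitively, only
    single-word terms are handled (a phrase like "out of control" spans
    several pieces), and punctuation attached to a word (e.g. "her.")
    prevents a match.
    """
    lowered = {term.lower() for term in terms}
    # The capturing group makes re.split keep the whitespace runs,
    # so the pieces alternate between words and separators.
    pieces = re.split(r'(\s+)', text)
    for i, piece in enumerate(pieces):
        # isspace() is False for "", so empty pieces reach the lookup too;
        # they never match a term, so they are left alone.
        if not piece.isspace() and piece.lower() in lowered:
            pieces[i] = "*" * len(piece)
    # Joining with "" puts back exactly the whitespace that was split off.
    return "".join(pieces)


email = "Hello,\n\nI am Concerned about\tHER\n"
print(censor_keep_spacing(email, ["concerned", "her"]))
```

Because every separator is kept as its own piece, the blank line, the tab and the trailing newline all come back unchanged. No nested lists are needed.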