Help with Censor Dispenser - Step Four

Hi,

I’m struggling to pass step four of this project: the part where you’re asked to leave ONLY the first occurrence of the forbidden words uncensored. The second, third, etc. occurrences should all be censored.
This is my code:

phrase3 = ["concerned", "behind", "danger", "dangerous", "alarming", "alarmed", "out of control", "help", "unhappy", "bad", "upset", "awful", "broken", "damage", "damaging", "dismal", "distressed", "distressing", "concerning", "horrible", "horribly", "questionable"]
def censorThree(phrases, email):
    count = 0
    for phrase in phrases:
      if phrase in email:
        while count < 1:
          count += 1
    while count >= 1:
      for phrase in phrases:
        if phrase in email:
          email = email.replace(phrase, "[REDACTED]")
      return email
print("\n\n\n\nPART 3")
print(censorThree(phrase3, email_three))

Instead, it censored every one of the forbidden words:

PART 3
Board of Investors

Things have taken a [REDACTED] turn down in the Lab. Helena (she has insisted on being called Helena, we’re unsure how she came to that moniker) is still progressing at a rapid rate. Every day we see new developments in her thought patterns, but recently those developments have been more [REDACTED] than exciting.

Let me give you one of the more [REDACTED] examples of this. We had begun testing hypothetical humanitarian crises to observe how Helena determines best solutions. One scenario involved a famine plaguing an unresourced country.

Horribly, Helena quickly recommended a course of action involving culling more than 60% of the local population. When pressed on reasoning, she stated that this method would maximize “reduction in human suffering.”

This [REDACTED]ous line of thinking has led many of us to think that we must have taken some wrong turns when developing some of the initial learning algorithms. We are considering taking Helena offline for the time being before the situation can spiral [REDACTED].

More updates soon,
Francine, Head Scientist

What am I doing wrong?
Any help would be appreciated :smiley:

Why the while loop?

Do the same thing you would otherwise do, just with a condition.

Hi, thanks for answering!

I honestly can’t tell which direction you’re trying to nudge me in. Could you make it a bit more specific?

Thanks :slight_smile:


You should only censor IF you have already seen enough bad words.
…so you’d take your current censoring function, slap on a condition, add counting, done.
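Here is a minimal sketch of that hint, assuming the email has already been split into a list of words and every forbidden phrase is a single word (`censor_after_first` is a hypothetical name, not part of the project). The counter is the "counting", and `seen > 1` is the condition that lets the first forbidden word through:

```python
def censor_after_first(words, forbidden):
    # words: the email as a list of words; forbidden: single-word phrases
    seen = 0  # how many forbidden words we have met so far
    result = []
    for word in words:
        if word in forbidden:
            seen += 1
            if seen > 1:  # the condition: only censor after the first one
                result.append("[REDACTED]")
                continue
        result.append(word)
    return result


print(" ".join(censor_after_first("this is bad and bad and awful".split(), {"bad", "awful"})))
# → this is bad and [REDACTED] and [REDACTED]
```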

Hey! Thanks again for the swift response.

I partly figured it out, but the last “out of control” string isn’t censored when it should be. What’s the best way to censor a phrase that contains spaces?

Here’s a step-by-step breakdown in case anyone’s wondering:

  1. Create an empty list called index.
  2. Split the email and assign it to a list called emailSplit.
  3. Iterate over each emailword in emailSplit, with an if statement nested inside the loop:
    3a. Check if emailword is in phrases.
    3b. If yes, assign a variable forbidden_index the position where the forbidden word is “located” (i.e. the index of the found forbidden word inside emailSplit).
    3c. Append every forbidden_index to the premade index list (created in step 1).
  4. Check if len(index) is more than 1:
    4a. If no, pass.
    4b-1. If yes, assign another variable called new_index that’s the same as the now-full index list EXCEPT the first element (i.e. the zeroth index).
    4b-2. Iterate through the indices in new_index and replace the word at each of those positions with "[REDACTED]".
  5. Join the emailSplit list back into a string.
  6. Return that mf.

Code:

def censorThreeB(phrases, email):
    index = []  # 1
    emailSplit = email.split(" ")  # 2
    for position, emailword in enumerate(emailSplit):  # 3
        if emailword in phrases:  # 3a
            # 3b: enumerate gives the real position; emailSplit.index() would
            # always return the FIRST match, so repeated words were miscounted
            forbidden_index = position
            index.append(forbidden_index)  # 3c
    if len(index) <= 1:  # 4: only act once more than one forbidden word was found
        pass  # 4a
    else:
        new_index = index[1:]  # 4b-1: every match except the first
        for ind in new_index:  # 4b-2
            emailSplit[ind] = "[REDACTED]"
    new_mail = " ".join(emailSplit)  # 5
    return new_mail  # 6

print("\n\n\n\nPART 3")
print(censorThreeB(phrase3, email_three))

In the printed output, “out of control” still hasn’t been censored. Splitting the multi-word phrases in phrases into single words won’t work either, because then words like “out” and “of” would be censored everywhere.

What is the best way to do this? If my code is flawed, please point me in the right direction.
Thanks :smiley:

I’d want to obtain a list of words. From that, I can iterate through each location and, at each location, compare against each disallowed phrase. For each disallowed phrase, I would look ahead as many words as there are words in that phrase. “Regular” words are just 1-word phrases, so they are not a special case; it’s all the same thing.
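A sketch of that look-ahead idea, assuming the email is already a list of words and each phrase has been turned into a list of words too (`find_phrase_matches` is a hypothetical helper that just reports where each match starts and how many words it covers):

```python
def find_phrase_matches(words, phrases):
    # words: the email as a list of words
    # phrases: each forbidden phrase as a list of words, e.g. ["out", "of", "control"]
    matches = []
    for i in range(len(words)):
        for phrase in phrases:
            # look ahead as many words as the phrase contains
            if words[i : i + len(phrase)] == phrase:
                matches.append((i, len(phrase)))
    return matches


words = "please help it could spiral out of control".split()
phrases = [p.split() for p in ["out of control", "danger", "help"]]
print(find_phrase_matches(words, phrases))
# → [(1, 1), (5, 3)]
```

Because a one-word phrase is just a list of length 1, "help" and "out of control" go through exactly the same comparison.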

After separating words from non-words, one would have a list that alternates between words and non-words. One could pop out the words, send them to the censoring logic, then reassemble the whole thing when the results come back.
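One way to do that separation in Python, sketched under the assumption that `re.split` with a capturing group is acceptable (a capturing group keeps the separators in the result). `censor_words` is a hypothetical stand-in for the real censoring logic:

```python
import re


def split_words(text):
    # A capturing group makes re.split keep the separators, so words land at
    # even positions and the non-word runs between them at odd positions.
    # (An empty string may appear at either end.)
    return re.split(r"(\W+)", text)


def censor_words(words):
    # Hypothetical stand-in for the real censoring logic: redact "bad".
    return ["[REDACTED]" if w == "bad" else w for w in words]


parts = split_words("Things are bad, really bad.")
parts[0::2] = censor_words(parts[0::2])  # pop out the words, censor, put them back
print("".join(parts))
# → Things are [REDACTED], really [REDACTED].
```

The extended slice assignment (`parts[0::2] = ...`) only works if the censoring logic returns as many words as it was given, which is one reason to redact a multi-word phrase one word at a time.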

Hey! Thanks for the response.

I figured it out! Took me more than 10 hours but here it is:

  7. Create an empty list called more_than_one to hold the phrases that are more than one word long.
  8. Iterate through phrases to find the ones with more than one word (i.e. the ones containing a space).
    8a. If yes, append it to the more_than_one list created in step 7.
  9. Then iterate through that list to check whether each phrase is in the email.
    9a. If yes, replace it.

NOTE THAT STEPS 7-9 SHOULD HAPPEN BEFORE STEPS 1-6!
In other words, I replaced the special strings (the ones with spaces) before splitting the actual email.

Code:

def censorThreeB(phrases, email):
    more_than_one = []  # 7: phrases that contain a space
    for forbiddenWords in phrases:
        if " " in forbiddenWords:  # 8
            more_than_one.append(forbiddenWords)  # 8a
    for multi_word_phrase in more_than_one:
        if multi_word_phrase in email:  # 9
            # 9a: note that str.replace replaces EVERY occurrence
            email = email.replace(multi_word_phrase, "[REDACTED]")
    index = []  # 1
    emailSplit = email.split(" ")  # 2
    for position, emailword in enumerate(emailSplit):  # 3
        if emailword in phrases:  # 3a
            forbidden_index = position  # 3b: the word's actual position
            index.append(forbidden_index)  # 3c
    if len(index) <= 1:  # 4: only act once more than one forbidden word was found
        pass  # 4a
    else:
        new_index = index[1:]  # 4b-1: every match except the first
        for ind in new_index:  # 4b-2
            emailSplit[ind] = "[REDACTED]"
    new_mail = " ".join(emailSplit)  # 5
    return new_mail  # 6

Thanks @ionatan, you’ve been really helpful.

P.S. figuring that out felt really good omg

EDIT: Forgot the end results. Here it is:


If you can handle “many” words, then the same code already handles “one”.

(Is that supposed to be counted? str.replace replaces every occurrence, so if the phrase appears early in the email, that first occurrence may be one that shouldn’t be censored.)

What I did was censor all the special strings (i.e. the ones with a space) first. So "out of control" is replaced with "[REDACTED]" before the email is split into "out", "of" and "control".

Then, after all the special strings have been censored, it’s safe to split the whole email :smiley:

What I mean is that if you have an email like this:

out of control Board of Investors,

Things have taken a concerning turn down in the Lab. Helena (she has insisted on being called Helena, we’re unsure how she came to that moniker) is still progressing at a rapid rate. Every day we see new developments in her thought patterns, but recently those developments have been more [REDACTED] than exciting.

(starting with “out of control”)

then that first instance will get redacted, but it’s the first disallowed phrase so it should be kept as is
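A quick illustration of that problem with a shortened, made-up email: str.replace has no notion of "already seen one", and its optional count argument only limits replacements from the left, so it keeps the later occurrences instead of the first:

```python
email = "out of control Board of Investors, things are out of control."

# Replaces every occurrence, including the first one that should be kept
all_replaced = email.replace("out of control", "[REDACTED]")
print(all_replaced)
# → [REDACTED] Board of Investors, things are [REDACTED].

# count=1 replaces only the FIRST occurrence: the opposite of what step four wants
first_only = email.replace("out of control", "[REDACTED]", 1)
print(first_only)
# → [REDACTED] Board of Investors, things are out of control.
```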

I don’t imagine that’s an easily solvable problem. This is why I would want to turn the email into a list of words instead of operating on a giant string. Having tidy input makes the problem easier.

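from itertools import product


def indices_to_censor(phrases, email, after=0):
    # email: the email as a list of words
    # phrases: each phrase as a list of words, e.g. ["out", "of", "control"]
    # after: how many matches to leave uncensored before censoring starts
    should_censor = set()  # word positions to redact
    count = 0  # matches seen so far
    for i, phrase in product(range(len(email)), phrases):
        words_here = email[i : i + len(phrase)]  # look ahead len(phrase) words
        if words_here == phrase:
            if count >= after:
                should_censor.update(range(i, i + len(phrase)))
            count += 1
    return should_censor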

^ That’s my version. It gets to be short because there’s less work to do: I first clean up both the phrases and the email into lists of words, which are easy to operate on. For the same reason, it doesn’t need any string-specific operations.
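For anyone wanting to run this end to end, here is one possible wrapper, sketched under a few assumptions: words and separators are split with re.split as described earlier in the thread, matching is case-insensitive (so “Horribly” would be caught), and each word of a multi-word phrase is redacted separately. censor_email is a hypothetical name; indices_to_censor is repeated so the block runs on its own:

```python
import re
from itertools import product


def indices_to_censor(phrases, email, after=0):
    # Same logic as indices_to_censor above: email is a list of words,
    # phrases is a list of word lists.
    should_censor = set()
    count = 0
    for i, phrase in product(range(len(email)), phrases):
        if email[i : i + len(phrase)] == phrase:
            if count >= after:
                should_censor.update(range(i, i + len(phrase)))
            count += 1
    return should_censor


def censor_email(phrases, text, after=1):
    # Hypothetical wrapper: tidy the inputs into lists of words, censor, reassemble.
    parts = re.split(r"(\W+)", text)  # words at even positions, separators at odd
    words = [w.lower() for w in parts[0::2]]  # assumption: compare case-insensitively
    phrase_words = [p.lower().split() for p in phrases]
    for j in indices_to_censor(phrase_words, words, after):
        parts[2 * j] = "[REDACTED]"  # word j lives at position 2 * j in parts
    return "".join(parts)


print(censor_email(["out of control", "bad"], "Out of control! It is bad, very bad."))
# → Out of control! It is [REDACTED], very [REDACTED].
```

Matching whole words also avoids the “[REDACTED]ous” problem from the first attempt: “danger” is only matched as a complete word, so it no longer eats the front of “dangerous”.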


Hi,
May I ask what product in the for loop is referring to?
Thanks!


itertools.product
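In case it helps others with the same question: itertools.product yields every combination of its inputs, with the last input varying fastest, so in indices_to_censor it stands in for two nested for loops (positions outside, phrases inside). A tiny example with made-up inputs:

```python
from itertools import product

# Pairs every position with every phrase, like:
#   for i in range(3):
#       for phrase in ["help", "bad"]:
pairs = list(product(range(3), ["help", "bad"]))
print(pairs)
# → [(0, 'help'), (0, 'bad'), (1, 'help'), (1, 'bad'), (2, 'help'), (2, 'bad')]
```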


Thank you for the solution, I’ve been tearing my hair out over Number 4! Plus I’ve added this new skill to my skillset. :slightly_smiling_face:

Thank you @charischrisna3 too for posting your solution. OPs run off after finding a solution without sharing what it was far too often!