Censor Dispenser issue

Hi guys, I have tried to strip special characters in my code, but it is not working. Words that come right before a special character are never censored, e.g. "help!". I added "help" to negative_words, but the code does not recognise that "help!" matches an entry in negative_words. Since I applied strip, I expected it to match, so it looks like my strip is not doing anything.

I also noticed that "learning algorithms" was not censored. I found out that this is because the email content is split on whitespace, which turns "learning algorithms" into two separate words instead of one phrase. Is there any way to iterate through a string without splitting it and converting it to a list?


def censor_fours (email_content, negative_word_list, censor_list):
  email_list_content = []
  email_content_lower = email_content.lower()
  content = email_content_lower.split()
  negative_word_new = []
  censor_new = []
  special_char = [",", "!", ".", "/", "("]
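  for lower_word in range(len(negative_word_list)):
    for i in special_char:
      negative_word_new.append(negative_word_list[lower_word].lower().strip(i))
  for censor in range(len(censor_list)):
    for i in special_char:
      censor_new.append(censor_list[censor].lower().strip(i))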
  for word_len in range(len(content)):
    similar = []
    before = ""
    after = ""
    for i in special_char:
      clean_word = content[word_len].strip(i)
    if clean_word in negative_word_new or clean_word in censor_new:
      word_before = content[word_len - 1]
      word_after = content[word_len + 1]
      for word in similar:
        censored_word = "" 
        for letter in range(len(word)):
          censored_word += "x"
        content[word_len] = content[word_len].replace(word, censored_word)
        for letter in range(len(word_before)):
          before += "y"
        for letter in range(len(word_after)):
          after += "z"
        content[word_len-1] = content[word_len - 1].replace(word_before, before)
        content[word_len+1] = content[word_len+1].replace(word_after, after)
  return " ".join(content)
call_censor_fours = censor_fours(email_four, negative_words, proprietary_terms)
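
On the strip problem: inside `for i in special_char`, `clean_word` is reassigned from the untouched `content[word_len]` on every pass, so only the result for the last character in the list, "(", survives. `str.strip` accepts a string containing several characters to remove, so the loop is not needed. A minimal sketch of the difference (the `clean` helper is a made-up name for illustration, and using `string.punctuation` as the character set is an assumption about what should count as "special"):

```python
import string

def clean(word, chars=string.punctuation):
    """Strip leading and trailing punctuation from a single word."""
    return word.strip(chars)

# The loop version: every pass starts again from the raw word,
# so only the final strip("(") is kept.
raw = "help!"
for ch in [",", "!", ".", "/", "("]:
    looped = raw.strip(ch)

print(looped)            # help!  (the "!" was stripped on one pass, then overwritten)
print(clean("help!"))    # help
print(clean("(help!)"))  # help
```

Note that `strip` only touches the ends of the word, so punctuation in the middle (as in "it's") is left alone.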

Doesn’t iterating through it MEAN splitting it? Unless you mean characters. And besides, what’s wrong with having a list?

>>> from itertools import groupby
>>> [''.join(g) for _, g in groupby("blah, blah, bleh.", key=str.isalpha)]
['blah', ', ', 'blah', ', ', 'bleh', '.']
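
The `groupby` split above keeps punctuation and spaces as their own pieces, so the text can be glued back together exactly with `''.join`. A rough sketch of single-word censoring built on it (the function name `censor_words` and the "x" replacement are assumptions for illustration, not part of the original exercise code):

```python
from itertools import groupby

def censor_words(text, banned):
    """Replace each banned word with x's, leaving punctuation and spacing intact."""
    banned = {w.lower() for w in banned}
    # Alternating runs of letters and non-letters, e.g. "help", "! ", "I", " "
    pieces = ["".join(g) for _, g in groupby(text, key=str.isalpha)]
    out = []
    for piece in pieces:
        if piece.lower() in banned:
            out.append("x" * len(piece))
        else:
            out.append(piece)
    return "".join(out)

print(censor_words("Help! I need help, please.", ["help"]))
# xxxx! I need xxxx, please.
```

Because the pieces are never stripped, there is nothing to strip: "help" and "!" arrive as separate items. This only handles one-word entries; phrases need some look-ahead.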


Thanks for the quick response.
The problem with splitting is that it breaks "learning algorithms" into two words. I want a way to keep "learning algorithms" together as a single unit. Any ideas?

It’s not a word, it’s two.
Nothing’s stopping you from looking further ahead. You might, for example, write a loop that visits each position and grabs two words from there (or rather, as many words as the disallowed phrase you’re currently checking contains).
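
One way the look-ahead idea could be written, as a sketch only: at each position, take as many words as the phrase has and compare them after stripping punctuation. The name `censor_phrases` and the choice of `string.punctuation` are assumptions for illustration.

```python
import string

PUNCT = string.punctuation

def censor_phrases(text, phrases):
    """Censor multi-word phrases by comparing a sliding window of words."""
    words = text.split()
    # Each phrase becomes a list of lowercase words; empty phrases are dropped.
    targets = [t for t in (p.lower().split() for p in phrases) if t]
    i = 0
    while i < len(words):
        for target in targets:
            n = len(target)
            window = [w.lower().strip(PUNCT) for w in words[i:i + n]]
            if window == target:
                for j in range(i, i + n):
                    core = words[j].strip(PUNCT)
                    # Keep surrounding punctuation, replace only the word itself.
                    words[j] = words[j].replace(core, "x" * len(core))
                i += n - 1  # skip past the words just censored
                break
        i += 1
    return " ".join(words)

print(censor_phrases("She studies learning algorithms, daily.",
                     ["learning algorithms"]))
# She studies xxxxxxxx xxxxxxxxxx, daily.
```

Like the original function, this rejoins with single spaces, so line breaks in the email are lost. It also treats "learning, algorithms" as a match, since punctuation is stripped from each word before comparing.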

Alright, thanks for the help.

You could also start at each character position and look far enough ahead to match whichever disallowed words you’re looking for, a kind of find-all-substrings search. But then you have a different problem: word boundaries.
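
A sketch of that character-level search, using `str.find` in a loop; it also shows the word-boundary problem directly. The helper names `find_all` and `censor_substrings` are made up for this example, and it assumes lowercasing does not change the length of the text (true for plain ASCII).

```python
def find_all(text, phrase):
    """Start indexes of every non-overlapping occurrence of phrase, ignoring case."""
    if not phrase:
        return []
    text_l, phrase_l = text.lower(), phrase.lower()
    hits = []
    start = text_l.find(phrase_l)
    while start != -1:
        hits.append(start)
        start = text_l.find(phrase_l, start + len(phrase_l))
    return hits

def censor_substrings(text, phrases):
    """Overwrite every occurrence of each phrase with x's, keeping whitespace."""
    chars = list(text)
    for phrase in phrases:
        for start in find_all(text, phrase):
            for k in range(start, start + len(phrase)):
                if not chars[k].isspace():
                    chars[k] = "x"
    return "".join(chars)

print(censor_substrings("She is helpless without help.", ["help"]))
# She is xxxxless without xxxx.
```

No splitting is needed and phrases with spaces work, but "helpless" gets partly censored because nothing checks where a word starts or ends.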

Regex can match on word boundaries, using \b.
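
For example, a minimal sketch with the standard `re` module, where `\b` stops "help" from matching inside "helpless". The function name `censor_regex` is an assumption for illustration, and it assumes every entry starts and ends with a letter or digit (otherwise `\b` will not behave as expected at that end).

```python
import re

def censor_regex(text, phrases):
    """Censor whole words and phrases, case-insensitively, using \\b boundaries."""
    phrases = [p for p in phrases if p]
    if not phrases:
        return text
    # Longest first, so "learning algorithms" is tried before "learning".
    ordered = sorted(phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in ordered) + r")\b",
        re.IGNORECASE,
    )
    # Replace every non-space character of the match with "x".
    return pattern.sub(lambda m: re.sub(r"\S", "x", m.group()), text)

print(censor_regex("Helpless? No: help! Learning algorithms help.",
                   ["help", "learning algorithms"]))
# Helpless? No: xxxx! xxxxxxxx xxxxxxxxxx xxxx.
```

This works on the original string, so spacing, line breaks and punctuation are untouched, and neither `split` nor `strip` is involved.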