Improved Censor Dispenser

Hi all,

I’ve been messing around with the Censor Dispenser exercise for a while, in particular part 3, where the goal is to write a function that censors every term in a list. So far, I’ve come up with this:

proprietary_terms = ["she", "personality matrix", "sense of self", "self-preservation", "learning algorithm", "herself", "her"]

def censored(email, to_censor):
  for word in to_censor:
    email = email.replace(word, 'XXXX')
  return email

This replaces each term in the list with ‘XXXX’. It’s not as refined as the solution code:

def censor_two(input_text, censored_list):
  for word in censored_list:
    censored_word = ""
    for x in range(0, len(word)):
      if word[x] == " ":
        censored_word = censored_word + " "
      else:
        censored_word = censored_word + "X"
    input_text = input_text.replace(word, censored_word)
  return input_text

# email_two is the sample email provided by the exercise
print(censor_two(email_two, proprietary_terms))

When I print both, the output is slightly different: my code replaces each matched term with a fixed ‘XXXX’, whereas the solution replaces every letter of the matched term with an ‘X’ (keeping the spaces in multi-word terms).
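The solution’s inner loop builds the masked word one character at a time. A more compact sketch of the same idea, using str.join and a generator expression (mask and censor_letters are hypothetical helper names, not part of the exercise):

```python
def mask(term):
    # One "X" per character, keeping spaces so multi-word terms keep their shape
    return "".join(" " if ch == " " else "X" for ch in term)


def censor_letters(text, terms):
    # Same idea as censor_two: replace every occurrence of each term with its mask
    for term in terms:
        text = text.replace(term, mask(term))
    return text


print(censor_letters("the learning algorithm is ready", ["learning algorithm"]))
# → the XXXXXXXX XXXXXXXXX is ready
```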

There are three problems here that the solution also doesn’t address:

  1. ‘herself’ will be replaced by ‘XXXself’ (because “her” is before “herself” in the input list)
  2. ‘She’ will not be replaced because the string is capitalised.
  3. Words containing a string from the input list will be censored as well, e.g. researcXXXs.
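To see all three problems at once, here is a small reproduction using the solution function (written with an else branch so that every non-space character becomes an X). The sample sentence is made up for illustration:

```python
def censor_two(input_text, censored_list):
    for word in censored_list:
        censored_word = ""
        for ch in word:
            # keep spaces, mask everything else
            censored_word += " " if ch == " " else "X"
        input_text = input_text.replace(word, censored_word)
    return input_text


sample = "She said the researchers would let her do it herself."
print(censor_two(sample, ["she", "her", "herself"]))
# → She said the researcXXXs would let XXX do it XXXself.
```

‘She’ survives because str.replace() is case-sensitive, and the ‘her’ entry matches inside ‘researchers’ and eats the start of ‘herself’ before the ‘herself’ entry is ever tried.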

I’ve tried resolving these but am all out of ideas.

For 1, I took the lazy approach and moved ‘herself’ before ‘her’ in the list. I guess this comes down to the order in which the list is iterated: once ‘her’ has been replaced, ‘herself’ no longer appears in the text.
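Rather than reordering the list by hand, one option is to sort the terms longest first before replacing, so a longer term such as ‘herself’ is always handled before a shorter term it contains. A sketch (censor is a hypothetical name; sorted() is stable, so terms of equal length keep their original order):

```python
def censor(email, terms):
    # Longest terms first, so "herself" is replaced before "her" can split it
    for term in sorted(terms, key=len, reverse=True):
        email = email.replace(term, "XXXX")
    return email


terms = ["she", "her", "herself"]  # "her" deliberately listed before "herself"
print(censor("she did it herself", terms))
# → XXXX did it XXXX
```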
For 2, I want to also check the email for capitalised versions of the words in the list. I tried to solve this by adding .title() here and there:

def censored(email, censor):
  for word in censor:
    if word.title() in email:
      email = email.replace(word.title(),'XXXX')
  return email

The problem is that although “She” and “Her” are now censored, lowercase ‘she’ and ‘her’ are not.
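One way to cover both cases without regular expressions is to replace several case variants of each term in turn. This is only a sketch: it handles lowercase, Capitalised, Title Case and UPPER CASE forms, but not mixed case such as ‘sHe’, and it still censors ‘researchers’ (problem 3). censor is a hypothetical name:

```python
def censor(email, terms):
    # Longest terms first, so "herself" is matched before "her"
    for term in sorted(terms, key=len, reverse=True):
        # Try the term as written, Capitalised, Title Case and UPPER CASE
        for variant in (term, term.capitalize(), term.title(), term.upper()):
            email = email.replace(variant, "XXXX")
    return email


print(censor("She said she would ask HER.", ["she", "her"]))
# → XXXX said XXXX would ask XXXX.
```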

For 3, I thought adding a space before each item in the list would prevent words such as ‘researcher’ from being censored:

def censored(email, censor):
  for word in censor:
    if " "+word in email:
      email = email.replace(word, 'XXXX')
  return email

This doesn’t make any difference, and I don’t really understand why.
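The reason the leading-space version changes nothing: the if only decides whether replace() runs, and replace() itself still rewrites every occurrence of the bare word. The snippet below (sample text made up for illustration) shows this, and shows that putting the space into the replacement as well leaves ‘researchers’ alone:

```python
email = "ask her researchers"
word = "her"

# The test only decides *whether* replace() runs...
print(" " + word in email)                 # True, because of " her"
# ...but replace() then rewrites every "her", wherever it appears
print(email.replace(word, "XXXX"))         # ask XXXX researcXXXXs

# Searching for the space as well leaves "researchers" untouched
print(email.replace(" " + word, " XXXX"))  # ask XXXX researchers
```

This is only a partial fix: ‘ her’ still matches the start of ‘ herself’, and a term at the very start of the text or right after a line break has no leading space.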

Then I added an elif branch to the code I wrote for problem 2, which somehow resolved problem 3:

def censored(email, censor):
  for word in censor:
    if word.title() in email:
      email = email.replace(word.title(),'XXXX')
    elif word in email:
      email = email.replace(word,'XXXX')
  return email

I guess this got resolved because I capitalised ‘her’ to ‘Her’, which doesn’t match inside ‘researchers’. I just don’t understand which part of the code causes this. Swapping the if and elif branches gives the same result as the code I initially wrote.
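The if/elif runs at most one branch per word. When the Title Case form appears anywhere in the email, the lowercase branch is skipped entirely, so every lowercase ‘her’ (standalone or inside ‘researchers’) survives. When it does not appear, the code behaves like the original version. A quick check with two made-up sentences:

```python
def censored(email, censor):
    for word in censor:
        if word.title() in email:
            # Only the Title Case form is replaced; the elif below is skipped
            email = email.replace(word.title(), 'XXXX')
        elif word in email:
            # Every lowercase occurrence is replaced, even inside other words
            email = email.replace(word, 'XXXX')
    return email


print(censored("Her researchers asked her.", ["her"]))
# → XXXX researchers asked her.
print(censored("The researchers asked her.", ["her"]))
# → The researcXXXXs asked XXXX.
```

So problem 3 only looks fixed: a standalone lowercase ‘her’ goes uncensored whenever ‘Her’ is also present.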

Any thoughts on how to write clean code that takes care of all three issues?


EDIT: would it work to convert the entire string to a list, run the censoring code on that list, and then convert the list back to a string?
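Splitting into words can work for single-word terms. A rough sketch, assuming words are separated by single spaces and line breaks and that punctuation only sticks to the start or end of a word. Multi-word terms such as ‘personality matrix’ are skipped here and would still need a separate str.replace pass. censor_words is a hypothetical name:

```python
import string


def censor_words(email, terms):
    # Only single-word terms can match a single list item; compare in lowercase
    banned = {t.lower() for t in terms if " " not in t}
    lines = []
    for line in email.split("\n"):
        words = line.split(" ")
        for i, word in enumerate(words):
            # Ignore punctuation around the word, e.g. "her," or "(she"
            core = word.strip(string.punctuation)
            if core and core.lower() in banned:
                words[i] = word.replace(core, "X" * len(core))
        lines.append(" ".join(words))
    return "\n".join(lines)


print(censor_words("Her researchers said she did it herself, not her.",
                   ["she", "her", "herself", "personality matrix"]))
# → XXX researchers said XXX did it XXXXXXX, not XXX.
```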

@java5668470000, interesting take on the assignment. The provided solution does not cover the instances you mention.

In “real life” one would approach this assignment with regular expressions, but that’s not the subject being explored here, so it takes some doing to remove a string only where it stands “alone”.
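For comparison, a sketch of the regular-expression route using Python’s standard re module: \b marks a word boundary, so ‘her’ does not match inside ‘researchers’ or ‘herself’, and re.IGNORECASE takes care of capitalisation. censor_regex is a hypothetical name; the mask keeps spaces, like the solution code:

```python
import re


def censor_regex(email, terms):
    # Try longer terms first (e.g. "herself" before "her")
    ordered = sorted(terms, key=len, reverse=True)
    # \b on both sides: only match whole words, never pieces of longer words
    pattern = r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    return re.sub(
        pattern,
        # One "X" per non-space character of whatever was matched
        lambda m: "".join(" " if c == " " else "X" for c in m.group()),
        email,
        flags=re.IGNORECASE,
    )


terms = ["she", "her", "herself", "learning algorithm"]
print(censor_regex("She said her researchers did it herself; HER call.", terms))
# → XXX said XXX researchers did it XXXXXXX; XXX call.
```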

Here is one try at it:

s_1 = '''Her researcher said to do her own thing,
if not just by herself,
then it should be her friend Sherman and her.'''
s_2 = "her"
punctuation = [",", "!", "?", ".", "%", "/", "(", ")"]
s_1, s_2 = s_1.lower(), s_2.lower()
if s_2 in s_1:
    idx_lst = []
    idx = s_1.find(s_2)
    # make a list of all occurrences of s_2
    while idx != -1:
        idx_lst.append(idx)
        idx = s_1.find(s_2, idx + 1)
    # explore that list for criteria such as leading or trailing whitespace or punctuation
    for idx in idx_lst:
        index_1 = idx
        index_2 = idx + len(s_2)
        begin, end = True, True
        if not index_1 == 0:
            # isspace() also accepts a line break before the word
            begin = s_1[index_1 - 1].isspace() or s_1[index_1 - 1] in punctuation
        if index_2 < len(s_1):
            end = s_1[index_2].isspace() or s_1[index_2] in punctuation
        # if criteria met, make substitution; the mask has the same length
        # as s_2, so the stored indices stay valid
        if begin and end:
            s_1 = s_1[:index_1] + "X" * len(s_2) + s_1[index_2:]
print(s_1)

Output:

XXX researcher said to do XXX own thing,
if not just by herself,
then it should be XXX friend sherman and XXX.

(It will take a bit more to restore the missing uppercase, obviously!)
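One way to keep the original capitalisation with the same approach: find positions in a lowercased copy, but splice the Xs into the original text. Because each mask is exactly as long as the match, the positions stay valid. A sketch assuming ASCII text (for some non-ASCII characters lower() changes the string length) and the same whitespace/punctuation boundary rules; censor_standalone is a hypothetical name:

```python
def censor_standalone(text, term, punctuation=",!?.%/()"):
    """Mask standalone occurrences of term, keeping the rest of text as written.

    Assumes ASCII text, so text.lower() has the same length as text.
    """
    lower_text, term = text.lower(), term.lower()
    n = len(term)
    result = text
    idx = lower_text.find(term)
    while idx != -1:
        end = idx + n
        # A boundary is the start/end of the text, whitespace, or punctuation
        before_ok = idx == 0 or lower_text[idx - 1].isspace() or lower_text[idx - 1] in punctuation
        after_ok = end == len(lower_text) or lower_text[end].isspace() or lower_text[end] in punctuation
        if before_ok and after_ok:
            # Same-length mask, so positions in lower_text still match result
            result = result[:idx] + "X" * n + result[end:]
        idx = lower_text.find(term, idx + 1)
    return result


s_1 = '''Her researcher said to do her own thing,
if not just by herself,
then it should be her friend Sherman and her.'''
print(censor_standalone(s_1, "her"))
```

This prints the same result as above, except that ‘Sherman’ keeps its capital S.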