Censor Dispenser email_two - unexpected instance of substring that should have been removed

Hello, I’m working on the Censor Dispenser project:

I wrote the following in an attempt to solve the problem of email_two:

#defines censor function for a list of words or phrases

def censor_dispenser_lst(text):
  proprietary_terms = ["she", "personality matrix", "sense of self", "self-preservation", "learning algorithm", "her", "herself"]
  proprietary_terms_title = []
  for i in proprietary_terms:
    proprietary_terms_title.append(i.title())
# created a bigger list of the terms plus the title case versions
# to catch instances where the word started the sentence
  super_lst = proprietary_terms + proprietary_terms_title
  new_text = text
  after = [" ", ",", ":", ";", "!", ".", ")", "-", "?", "\"", "\'", "s"]
  before = [" ", "\n"]
  for censor in sorted(super_lst, key=len, reverse=True):
    while censor in new_text and new_text[new_text.find(censor)+len(censor)] in after and new_text[new_text.find(censor)-1] in before:
      new_text = new_text[0:(new_text.find(censor))] + "x"*len(censor)+ new_text[(new_text.find(censor)+len(censor)):]
# basically going from longest word to shortest, 
# does the big string have this word with one of these characters
# before or after it? if so replace each character in the string
# with an x
  return new_text.replace("xxs","xxx")
#return the string, cleaning up the plurals

# censor_lst function used on email_two
email_two_censored = censor_dispenser_lst(email_two)
print(email_two_censored)

The problem is this:

There’s that one solitary ‘her’ hanging out down there and I can’t get it to go away without butchering the word ‘researcHERs’ above it in the email. The thing is that it suits the criteria in the boolean I made above for selecting such instances. All the other instances of ‘her’ and all the other offending terms were removed. I can’t figure out why it’s there. It’s driving me up the wall!!

Is there anyone who can take a look and tell me what could possibly be causing this?

Edit: sorry about all the edits, there were a couple of different versions I came up with and I posted the wrong code and pic the first time!!

presumably you have other instances of “her” earlier in the string, and if you’re using find and similar things then which position is that going to report? If you mean to look at a specific location, then do that

more generally, look at what your conditions result in when you think they should have made the replacement happen

personally I’d write a function that only looks at one word and says whether or not it should be censored, and then make use of that function to make decisions for the overall text
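
A minimal sketch of that single-word check. The helper name `should_censor` and the `banned_words` set are hypothetical, not part of the project, and it assumes punctuation stuck to the edges of a word should be ignored:

```python
import string

def should_censor(word, banned_words):
    """Return True if this one word, ignoring case and edge punctuation, is banned."""
    core = word.strip(string.punctuation).lower()
    return core in banned_words

banned_words = {"she", "her", "herself"}  # hypothetical sample list
print(should_censor("her,", banned_words))
print(should_censor("researchers", banned_words))
```

Because the function only ever sees one word, it cannot accidentally match "her" inside a longer word.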

Thanks. And I can get it to work, like I said by butchering ‘researchers’. I’m just trying to understand specifically why it’s not working here because everything I think I know says it should work. Why would it not work with just that instance of ‘her’ when it removes everything else in the list with no problem? I’ve been over the code with a fine-tooth comb. It should be working. I’m trying to find someone who can help me figure out why exactly it’s not because I’m either overlooking something important or something I think I know is wrong. Either way, I think there’s a lot to be learned in knowing why that ‘her’ is there.

Thanks again.

Well I made a guess as to why. And if that’s the case then it should not be working. I also said how you’d work around it. And how to verify whether that’s happening.

oh, I see, then I didn’t understand.

You said:

presumably you have other instances of “her” earlier in the string, and if you’re using find and similar things then which position is that going to report?

The first instance. But the code accounts for this: it keeps checking the string until there are no more instances of the substring there. That’s exactly how it works with all the other substrings in the list of offending phrases. Seriously, try it. It took out several instances of ‘her’ and other substrings on the list. In fact, if I turn the second ‘and’ into an ‘or’, it’ll remove not only that last instance of ‘her’ but also the one in the word ‘researchers’. My goal is to get rid of that last ‘her’ but leave the one in ‘researchers’. That’s why I posted this question.

I get that that find only returns the first instance of the substring, but I coded for this. The code works with all the other substrings in the list, several of which have more than one instance. That’s the mystery I’m trying to solve.

I hope my tone doesn’t sound off. I’m genuinely appreciative of your help. I’m just frustrated.

No that’s not what the condition is, there are several other things in that condition

while (
    # this is how you describe it
    censor in new_text

    # there's also this... which is .. a lot to not mention
    and new_text[new_text.find(censor)+len(censor)] in after
    and new_text[new_text.find(censor)-1] in before
    # and since they're using `find`, what are they looking at?
):
    ...

yeah, you’re right. what comes before it and what comes after it. But if I look at the code and this specific instance of her

 while censor in new_text and new_text[new_text.find(censor)+len(censor)] in after:
    new_text = new_text[0:(new_text.find(censor))] + "x"*len(censor)+ new_text[(new_text.find(censor)+len(censor)):]

while ‘her’ in new_text —> true, it’s still there
the index after the word contains one of the substrings in the list ‘after’ ----> true (it is " ")
the index before the word contains one of the substrings in the list ‘before’ ----> true (it is " ")

so why isn’t the code substituting the substring with x’s??

And don’t get me wrong, I know I’m missing something I’m just not getting your hints, I’m too dense. You’re just gonna have to tell me what the problem is.

You are treating find as though it is providing the location that you mean.
If you mean a specific location, then you don’t need find. If you care which, you can’t use find.

but wouldn’t ‘find’ be returning the location of that specific iteration of the substring? If I’m iterating through the list and want to look for things that come before or after that specific iteration, don’t I need to use find? How else could I automate it? I don’t know the specific location of the index before or after some substring when I write the code. Why isn’t it providing the location I mean? And why does it work for every other instance of every other substring in the list of proprietary terms?

it provides the first location

right, but each time the string it’s checking changes, the first location changes. So at the point it’s checking for this instance, the first location is the substring I’m looking to get rid of

the first one is in researchers
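
Here is a small way to see it. The sentence below is a made-up stand-in, not the real email_two:

```python
# Hypothetical sample text standing in for the relevant part of email_two.
text = "The researchers met her today."

first = text.find("her")                 # find always reports the FIRST match
before_char = text[first - 1]            # character just before that match
after_char = text[first + len("her")]    # character just after that match

print(first, repr(before_char), repr(after_char))
print(before_char in [" ", "\n"])        # the `before` test from the loop condition
```

The first match sits inside "researchers", so the character before it is a letter and the `before` test fails. The while condition is then false, the loop exits, and the later standalone "her" is never examined.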

omfg, you’re right…

thanks so much, I don’t know why I was so blind to that

And the solution is to take your index finger, point at the first word. Should this be censored, yes/no? Move finger to next word. Should this be censored?

By the way, no repeated searching via find, simpler, faster.

You may also be able to use information like all locations of something, and then considering each one. That’s not really different from considering one word at a time though.
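
A rough sketch of the "all locations" idea. The helper names `find_all` and `is_whole_word` are made up for illustration, and "whole word" is assumed here to mean "no letter directly before or after", which is stricter than the `after` list in the original code (it will not treat a trailing "s" as acceptable):

```python
def find_all(text, term):
    """Collect the start index of every occurrence of term, not just the first."""
    positions = []
    start = text.find(term)
    while start != -1:
        positions.append(start)
        start = text.find(term, start + 1)  # resume searching past this match
    return positions

def is_whole_word(text, start, length):
    """True if the match at `start` has no letter directly before or after it."""
    before_ok = start == 0 or not text[start - 1].isalpha()
    end = start + length
    after_ok = end == len(text) or not text[end].isalpha()
    return before_ok and after_ok

sample = "The researchers met her today."  # hypothetical sample text
hits = [p for p in find_all(sample, "her") if is_whole_word(sample, p, len("her"))]
print(find_all(sample, "her"), hits)
```

Each location gets judged on its own, so a match that fails the test (the one inside "researchers") no longer blocks the ones after it.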

Thank you for doing this with me. I really appreciate it and it’ll make me better for sure.

In your defense, Codecademy does lead you into using replace and find and that kind of thing. You can make something that sorta works with those, but they completely lack the concept of a word, and again, Codecademy is talking about words.

Try not to solve a whole problem in one lump of code. Solving subproblems with functions can steer you in a much better direction: testing a single word forces you to extract one word at a time, and that alone rules out several mistakes.

And be super suspicious when code gets too complicated to understand fully in your head. Writing more difficult code isn’t about keeping lots of things in your head; it’s about splitting the problem into small problems and creating abstractions, tools, that you can then use to solve the bigger problem. You can look at a problem and ask, hey, what fancy functions would make this easy? Then go write those. And those might break into smaller parts too.

If you were to split this text, where would you do so? It might be tempting to say space, but what you really care about is letters and non-letters, so it’s where this changes that you would want to split the text. You would then have words and non-words, and you could step through them, do things to the words, and finally put it back together again. Does that sound simple? Well there’s a part missing there, the function for splitting at those boundaries. You’d need to write this if you were to go this way. If you have this function, then the overall problem becomes easy, and the problem of splitting is easier than the overall problem, so this would bring down the difficulty of implementing a solution.
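
One possible sketch of that splitting function, using `itertools.groupby` to cut the text wherever it switches between letters and non-letters. The names `split_runs` and `censor_words` are hypothetical, and this only handles single-word terms; multi-word or hyphenated terms such as "personality matrix" or "self-preservation" would need extra handling:

```python
from itertools import groupby

def split_runs(text):
    """Split text into alternating runs of letters and non-letters."""
    return ["".join(group) for _, group in groupby(text, key=str.isalpha)]

def censor_words(text, banned):
    """Replace each banned word with x's, leaving everything else untouched."""
    banned_lower = {word.lower() for word in banned}
    pieces = []
    for piece in split_runs(text):
        if piece.lower() in banned_lower:
            piece = "x" * len(piece)
        pieces.append(piece)
    return "".join(pieces)  # the runs rejoin into text of the original length

sample = "Her researchers met her."  # hypothetical sample text
print(split_runs(sample))
print(censor_words(sample, ["her", "she"]))
```

Since every piece is either a whole word or a whole run of non-letters, "researchers" is compared as a single word and is left alone.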

That’s extremely helpful advice, both conceptually and practically. Thanks again.
