Make Censor Better (Spoiler)

python

#1

Hey All,

I'd advise anyone to not read this until they figure out the solution on their own. For those who have, I was wondering if anyone had any nifty tricks that could perhaps make my function more efficient.


def censor(text, word):
    asterisks = []
    for i in range(0, len(word)):
        asterisks.append('*')
    asterisks = ''.join(asterisks)
    return text.replace(word, asterisks)

#2

One step would be to replace the entire asterisks structure and logic with a single expression.

    return text.replace(word, '*' * len(word))
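For anyone who has not met the `*` operator on strings before, here is a minimal sketch of sequence repetition. The names `filler` and `row` are made up for illustration; the same operator works on lists as well as strings.

```python
# Repetition works on any sequence type, not only strings.
filler = '*' * len('hack')   # one asterisk per character of the word
row = ['O'] * 5              # a list repeated the same way

print(filler)
print(' '.join(row))
```

The right-hand operand is just an integer, so `len(word)` slots in directly.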

The spoiler here is that we get no real practice with algorithms when we use a built-in string method. That is okay for production, but not ideal for beginners who need to learn how to think these things through.


#3

That's perfect. I was aiming for something like that before, but I did not know you could multiply a string in the way that you did in the second param of the replace method. Thanks a lot.

Too bad CA didn't have an algorithm type place for more experienced developers to practice on. =(


#4

I'm not sure where this concept is introduced or discussed (if at all) but it does surface in practically everyone's code for Battleship!.

"O" * len(board)

The practice side of things for intermediate and above is the Weekly Challenge section. We can also discuss code concepts and enhance our understanding by starting topics in the Corner Bar that are off the Q&A track. This is where to really explore since the floor is open to all kinds of discussion that would only disrupt a Q&A topic. The current topic will be moved there, presently.

Now, let's consider the lead-up plan for an algorithm approach to the problem...

  • iterate text word by word
  • match word to word, and
  • remove the word and replace with filler of same length

This implies that we need a list to collect accepted or modified words since without the string method, we cannot mutate the string, and would have no practical way to identify individual words in a string.

text = text.split(' ')

It is not necessary to preserve the input string since it is a local variable we won't need again. This saves us creating another variable. We've replaced the string with a list of its words. Note that the items in the list contain no space characters.
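A small sketch of what `split(' ')` hands back, using a made-up sample string with a double space in it. The variable names are only for illustration.

```python
text = 'one two  three'          # note the two spaces before 'three'

words = text.split(' ')          # split on every single space
print(words)                     # a double space leaves an empty string behind

# Joining on the same separator restores the original spacing.
print(' '.join(words) == text)

# With no argument, split() collapses runs of whitespace instead.
collapsed = text.split()
print(collapsed)
```

Splitting on an explicit `' '` is what lets a later `' '.join(...)` put the text back exactly as it was.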

Now the iteration is straightforward. We are not changing the read-in data, only accepting it, or rejecting and replacing it. The outcomes are appended to the result.


    r = []
    for term in text:
        if term == word:
            r.append('*' * len(word))
        else:
            r.append(term)
    return r

#5

Thanks for the help. I was unaware of the other forums for discussion. Pardon my ignorance. Good stuff.


#6

What's interesting to me is which of your solutions would be faster (or less resource intensive).

Your first solution was a nice one liner that used the replace method. I am not sure what it does behind the scenes, but it clearly loops through the text and can successfully find all occurrences of the word.

In your more in-depth explanation, it seemed as if you were opting for:

  1. Splitting the text parameter variable into a list made up of each word from text
  2. Creating a new list that will eventually contain our solution
  3. Looping through the list mentioned in step 1, and appending each word into our new list.

However, if we go this route, we will also need to join the words in the new list back into a string before we return it. Clearly, this seems much more resource intensive than the one liner. I have learned though, that less code is not always faster, as some recursive functions blow up if the data being passed into them is large enough.
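One way to put numbers on that hunch is the standard library's `timeit` module. The sketch below is only that: the function names `censor_replace` and `censor_split`, the sample text, and the repeat count are all assumptions for illustration, and the timings will differ from machine to machine.

```python
import timeit

def censor_replace(text, word):
    # Built-in approach: replaces every occurrence, even inside longer words.
    return text.replace(word, '*' * len(word))

def censor_split(text, word):
    # Word-by-word approach: split, compare, collect, then join.
    result = []
    for term in text.split(' '):
        if term == word:
            result.append('*' * len(word))
        else:
            result.append(term)
    # The join step turns the list back into a string.
    return ' '.join(result)

sample = 'this hack is wack hack ' * 200

for fn in (censor_replace, censor_split):
    seconds = timeit.timeit(lambda: fn(sample, 'hack'), number=200)
    print('%s: %.4f s' % (fn.__name__, seconds))
```

Note that the two are not strictly equivalent: `replace` also censors the word when it appears inside a longer word (such as "hackers"), while the split version only matches whole space-separated words.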


#7

It actually finds all occurrences of the string expression. It does have an optional count argument that limits the number of replacements; otherwise it is greedy and replaces every match. Either way, it works through the matches in the order it finds them.

>>> "one two one two one two one".replace('one', 'two', 3)
'two two two two two two one'
>>>

As for joining the list back into a string, I was under the belief that was self-evident. We are expected to return a string so our program would need to implement that last step. The key is we can find word units and manipulate them. The end result is the desired string with the word censored out of it.

Recall that in Python, and many other languages, strings are immutable. The only way replace is able to work in the background is with a transient object that is replaced each time the method iterates over the string. The string itself is not being manipulated, but replaced each time.
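A quick sketch of what immutability means in practice, using a made-up sample string. Assigning to an index fails, so the only option is to build a new string and rebind the name.

```python
s = 'one two one'

try:
    s[0] = '*'                     # item assignment is not supported on str
except TypeError as err:
    print('TypeError: %s' % err)

original = s                       # keep a second reference to the old object
s = '***' + s[3:]                  # builds a new string and rebinds the name s

print(original)                    # the old string is untouched
print(s)
```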

For us to do this would mean a lot of work. First we would need a mask, word. Then we would need to iterate over the text string, letter by letter, and overlay the string with the mask so that the current letter corresponds with the first letter of the mask.

When a match occurs, we take a slice on the left of the match, and a slice on the right of the match, then reassign the concatenation of left slice, replacement text, right slice, back onto the variable. It is a new string assignment each time.

See if you can write this type of replacement algorithm, for the sport. I'll give it a go, as well, though you might be quicker.


#8

x = "one two one two one two one"
m = "one"
n = len(m)
i = -1
while i < len(x) - n:
    i += 1
    k = x[i:i + n]
    if k == m:
        print("found at i = %d" % i)

 > 
 found at i = 0
 found at i = 8
 found at i = 16
 found at i = 24

That gets us part of the way there.

Now we do the string replacement...

        left = x[:i]
        right = x[i + n:]
        x = left + '*' * n + right
        i += n - 1

 >   
 found at i = 0
 found at i = 8
 found at i = 16
 found at i = 24
 > x
=> '*** two *** two *** two ***'
 >

In fairness to the truth, x is never transient in this process; it behaves more like an in-place update, because the name x is rebound to a newly built string on every match. The implications are the same as in my earlier description. Needless to say, the object is replaced four times in this process. Rebinding the name to a new string is the closest we can get in Python to mutating a string.


docs

Up to but not including the match, from index 0...

        left = x[:i]

First character after the match string and beyond...

        right = x[i + n:]

Replace object with reconstructed string...

        x = left + '*' * n + right

Move pointer to the edge of replacement where next input is new...

        i += n - 1

The last step is not possible in a Python for loop, which is why I used a while loop: it allows us to manipulate the control variable.
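A minimal sketch of that difference, with made-up list names. In the `for` loop the jump is thrown away because `range` hands out the next value regardless; in the `while` loop the jump sticks.

```python
# for loop: changing i inside the body does not affect the next iteration
visited_for = []
for i in range(6):
    visited_for.append(i)
    if i == 1:
        i += 3                 # overwritten by the next value from range()

# while loop: we own the control variable, so the jump takes effect
visited_while = []
i = -1
while i < 5:
    i += 1
    visited_while.append(i)
    if i == 1:
        i += 3                 # the next pass resumes from here

print(visited_for)
print(visited_while)
```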


#9

It boils down to this…

Code
# string replacement method of censoring
# by Roy
# https://discuss.codecademy.com/t/make-censor-better-spoiler/146070

def censor(x, m):
    n = len(m)
    i = -1
    while i < len(x) - n:
        i += 1
        k = x[i:i+n]
        if k == m:
            x = x[:i] + '*' * n + x[i + n:]
            i += n - 1
    return x
    
print(censor('one two one two one two one', 'one'))

#10

This is really clever. Is this what the replace method is doing behind the scenes for us, or is this your own algorithm? It took me a bit to walk through and understand it, but it seems like it would be pretty efficient. Let's pretend that we don't have any matches: it still creates a temporary variable (k) that is equal to a certain slice of x on each pass through the while loop, but I would assume something like that doesn't require much computing power.

I come from JavaScript and a little bit of PHP, with no comp sci background, so I'm at the point of my learning where I am starting to question all of the code I write in the real world, and am really trying to understand how to make it as efficient as possible. Python is new to me, but I understand the other two languages well enough to jump in pretty quickly.

As you can see, I have some gaps of knowledge, and should probably pick up some material on exactly how my code affects the computer.


#11

It's my algorithm based on the pseudo design model. I've never seen the actual code for str.replace(), only theorized that it must work along the same lines. I suspect the actual code is compiled C.

There will be up to len(x) - n + 1 slices taken for k, yes, and the full count applies when there is no match at all; each match moves the pointer past the replaced text, so a few positions are skipped. I can't speak to the efficiency but one would be hard pressed to make it more efficient. There is no fat to trim.
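To check the slice count rather than take it on faith, here is a sketch of the same loop with a counter added. The name `censor_counting` and the `slices` counter are additions for illustration only, and the function assumes `m` is not empty (an empty `m` would never let the loop advance).

```python
def censor_counting(x, m):
    """Same loop as censor above, but also counts how many slices are taken.

    Assumes m is a non-empty string.
    """
    n = len(m)
    i = -1
    slices = 0
    while i < len(x) - n:
        i += 1
        k = x[i:i + n]
        slices += 1                # one slice per pass through the loop
        if k == m:
            x = x[:i] + '*' * n + x[i + n:]
            i += n - 1             # skip past the replaced text
    return x, slices

text = 'one two one two one two one'   # 27 characters
print(censor_counting(text, 'one'))    # four matches, so some positions are skipped
print(censor_counting(text, 'xyz'))    # no match: every position is sliced
```

With no match the count for this 27-character text and a 3-character word comes to 25; with the four matches it drops to 19, since each match skips two positions.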

I too come from JS and PHP, but only as a hobbyist. I have no schooling in this discipline, and being in my sixties cannot see that ever happening. I'm content to spend my time in the trenches with learners and pick up an advanced topic now and then when my head is clear enough to soak it all in.


#12

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.