Censor - why split?


#1

In this lesson, why are we splitting the string and rejoining? Can we not just replace the entered word wherever it appears in the phrase?


#2

We can, but it is far too simple since the str.replace() method does all the heavy lifting.


>>> def censor(text, word):
    return text.replace(word, '*' * len(word))

>>> censor("This hack is a whack", 'hack')
'This **** is a w****'
>>>

As we can see, though, it is indiscriminate: it replaces every occurrence of the substring, whether that is a whole word or part of a longer word.

To get a closer look at the words, it makes sense to isolate each one and compare it individually to word so that only exact word matches are censored.


>>> def redact(text, word):
	x = text.split()
	return ' '.join(['*' * len(word) if y == word else y for y in x])

>>> redact("This hack is a whack", 'hack')
'This **** is a whack'
>>>
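
One side effect worth knowing: text.split() with no argument splits on any run of whitespace, so joining with a single space collapses double spaces, tabs and newlines. Below is a minimal sketch, assuming you want to keep the original spacing between words. The name redact_keep_spacing is made up for illustration and is not part of the lesson.

```python
def redact_keep_spacing(text, word):
    # split(' ') breaks on every single space, so consecutive spaces
    # yield empty strings that ' '.join() turns back into spaces
    pieces = text.split(' ')
    mask = '*' * len(word)
    return ' '.join([mask if p == word else p for p in pieces])

# note the double space after "This"
print(redact_keep_spacing("This  hack is a whack", 'hack'))  # This  **** is a whack
```

Tabs and newlines still count as part of a word here, so this only helps with plain spaces.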

#3

Ahh.. much thanks...


#4

Now take the time to reason out the fully mechanical form that doesn't use a comprehension or a built-in. That's what this exercise is all about: getting down and dirty with bare-bones algorithms. Nothing fancy, just spelled-out steps.

You can, if you wish, deconstruct the comprehension, but what about other approaches as well? Play with it and know it forwards, backwards, and sideways. Then you'll be having fun, and teaching your brain to reason out steps. You win, by the effort.
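
As one possible "other approach", here is a sketch that walks the text one character at a time and never calls split, join or replace. The name censor_by_steps and its variable names are invented for this example, and it assumes words are separated by plain spaces only.

```python
def censor_by_steps(text, word):
    result = ''    # censored text built so far
    current = ''   # characters of the word currently being read
    for ch in text:
        if ch == ' ':
            # a space ends the current word: compare it, then flush it
            if current == word:
                result += '*' * len(word)
            else:
                result += current
            result += ch
            current = ''
        else:
            current += ch
    # the last word has no trailing space, so flush it after the loop
    if current == word:
        result += '*' * len(word)
    else:
        result += current
    return result

print(censor_by_steps("This hack is a whack", 'hack'))  # This **** is a whack
```

Because each space is copied straight through, the spacing of the original text is preserved.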


Study

First we find each exact match in the string...


>>> def _censor(text, word):
	n = len(word)
	m = len(text)
	for i in range(m - n + 1):
		if word == text[i:i + n]:
			print (text[i:i + n])

			
>>> _censor("This hack is a whack", 'hack')
hack
hack
>>> _censor("hack this hack is a whack", 'hack')
hack
hack
hack
>>>

We're able to find every instance of the substring, front to back. The next thing will be to determine whether each one is bounded by white space or by the start or end of the string. That will be a little trickier and involve a good deal of logic, I suspect.

Showing pattern match redaction:


>>> def _censor(text, word):
	n = len(word)
	m = len(text)
	for i in range(m - n + 1):
		if word == text[i:i + n]:
			text = text[0:i] + '*' * n + text[i + n:]
	return text

>>> _censor("hack this hack is a whack", 'hack')
'**** this **** is a w****'
>>> _censor("shack this hack is a whack", 'hack')
's**** this **** is a w****'
>>>

But of course this does not address word boundaries yet; it just shows the completed base function.


Pursuing this further...


>>> def _censor(text, word):
	n = len(word)
	m = len(text)
	for i in range(m - n + 1):
		if word == text[i:i + n]:
			text = text[0:i] + '*' * n + text[i + n:] \
			if text[0:i + n] == text[i:i + n] or \
			       (i > 0 and text[i - 1] == ' ') and \
			       (i + n + 1 < m and text[i + n] == ' ') \
			else text
	return text

>>> _censor("shack this hack is a whack", 'hack')
'shack this **** is a whack'
>>>

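That version censors any match at the very start of the string, whatever follows it, and, in a longer text, misses a match at the very end. And now the redaction of only word units, start and end included, while working upon one string...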


>>> def _censor(text, word):
	n = len(word)
	m = len(text)
	for i in range(m - n + 1):
		if word == text[i:i + n]:
			# first: the match starts the string; last: the match ends it
			first = i == 0
			last = i + n == m
			text = text[0:i] + '*' * n + text[i + n:] \
			if (first and (last or text[i + n] == ' ')) or \
			       last and text[i - 1] == ' ' or \
			       text[i - 1] == ' ' and text[i + n] == ' ' \
			else text
	return text

>>> _censor("shack this hack is a whack and hack", 'hack')
'shack this **** is a whack and ****'
>>> _censor("shack this hack is a whack and hack whack", 'hack')
'shack this **** is a whack and **** whack'
>>> _censor("hack shack this hack is a whack and hack whack", 'hack')
'**** shack this **** is a whack and **** whack'
>>> _censor("hack shack this hack is a whack and hack whack hack", 'hack')
'**** shack this **** is a whack and **** whack ****'
>>> _censor("hack shack this hack is a whack and hack shackle hack", 'hack')
'**** shack this **** is a whack and **** shackle ****'
>>>

And a final refinement, with the space held in a variable s...


>>> def _censor(text, word):
	n = len(word)
	m = len(text)
	s = ' '
	for i in range(m - n + 1):
		if word == text[i:i + n]:
			# first: the match starts the string; last: the match ends it
			first = i == 0
			last = i + n == m
			text = text[0:i] + '*' * n + text[i + n:] \
			if (first and (last or text[i + n] == s)) or \
			       last and text[i - 1] == s or \
			       text[i - 1] == s and text[i + n] == s \
			else text
	return text

>>> _censor("hack shack this hack is a whack and hack shackle hack", 'hack')
'**** shack this **** is a whack and **** shackle ****'
>>>
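
The three-way condition can also be read as two independent questions: is there a boundary on the left, and is there one on the right? Here is a sketch under that reading. The name censor_words and the flags before and after are hypothetical, and only the space character counts as a boundary, as in the sessions above. It also covers inputs the sessions do not try, such as the text being exactly the word.

```python
def censor_words(text, word):
    n = len(word)
    m = len(text)
    for i in range(m - n + 1):
        if text[i:i + n] == word:
            # boundary on the left: start of the string or a space
            before = i == 0 or text[i - 1] == ' '
            # boundary on the right: end of the string or a space
            after = i + n == m or text[i + n] == ' '
            if before and after:
                # same-length replacement, so m stays valid
                text = text[:i] + '*' * n + text[i + n:]
    return text

print(censor_words("hack shack this hack is a whack and hack shackle hack", 'hack'))
# **** shack this **** is a whack and **** shackle ****
print(censor_words("hack", 'hack'))           # ****
print(censor_words("a hackle hack", 'hack'))  # a hackle ****
```

The or in each flag short-circuits, so text[i - 1] and text[i + n] are only read when those positions exist.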

I hope you were working on this at the same time I was, else some of the energy will surely have been lost. For some reason I don't think I could have done this on my own, so you must have been figuring.

https://repl.it/JbrX/3

