[Extreme] Tolerant String Matching Python Challenge (not for beginners)

Code Challenge #25: April 18th, 2021

It has been a very long time since @oduffy and I have run a challenge (four years), but I stumbled across this rather fiendishly hard problem the other day and thought I would share.

As normal, it is split into basic, intermediate and hard. I wouldn’t recommend trying anything beyond easy unless you have a strong grasp of algorithms, because it is one of the hardest challenges I’ve put up. By all means though, feel free to give it a go! Only Python is a valid submission, closing date is two weeks from now (2nd May), with a winner hopefully announced shortly after. I’ll endeavour to arrange for a prize too (though no promises!). (update prize confirmed!)


Basic Difficulty

You have two strings message and snippet with len(message) >= len(snippet) and the length of the message can be orders of magnitude bigger than the snippet. Write a function, basicMatch, that will return true if the snippet is present in the message (case insensitive).

  • Function Name: basicMatch

  • Input: two strings message and snippet

  • Output: boolean showing if snippet is in message

  • Examples:

    • basicMatch("Codecademy taught me to code", "code") => True
    • basicMatch("A really long string", "on") => True
    • basicMatch("Welcome to the internet", "hello") => False
    • basicMatch("CaseInsEnSiTiVE", "caseinsensitive") => True
  • Always remember to explain your code and the thought processes behind it!

  • What if your interviewer had follow-up questions or extensions to this challenge? Don’t anticipate what exactly those follow-ups or changes may be, but try to write your code so that it is easily read, easily maintained, and can be adapted to potential modifications in the interviewer’s questioning.

Find out more about basic challenges.

Intermediate difficulty

You are told the input message went through a decoding step and could be malformed with the wrong characters due to text encoding errors (whereas snippet will never be malformed). I.e. the message was incorrectly decoded and still contains the string “%65” which should have been decoded as an “A”. Write a function, intermediateMatch, that will say if a tolerant matching is present, given two strings and an array of possible decodings.

  • Function Name: intermediateMatch
  • Input: two strings message and snippet and a 2D array of possible decodings [["%65", “a”], …]
  • Output: boolean showing if snippet is tolerantly in message
  • Examples:
    • intermediateMatch("Codecademy is great", "codecademy", []) => True
    • intermediateMatch("Codec%65demy is great", "codecademy", [["%65", "A"]]) => True
    • intermediateMatch("Codec%65demy is great", "codec%65demy", [["%65", "A"]]) => True
    • intermediateMatch("Codec%65demy is great", "codecademy", [["%65", "A"], ["%67", "D"]]) => True
    • intermediateMatch("Co%67ec%65demy is great", "codec%65demy", [["%65", "A"], ["%67", "D"]]) => True
    • intermediateMatch("Co%67eca%68emy is great", "codecademy", [["%67", "D"], ["%68", "D"]]) => True
    • intermediateMatch("Co%67eca%67emy is great", "co%67ecademy", [ ["%67", "D"]]) => True
    • intermediateMatch("Codecademy is great", "codec%65demy", [["%65", "A"]]) => False
    • intermediateMatch("abc%6efg", "abcdefg", [["e", "d"], ["%6", "e"]]) => False
  • Please note the following:
    • snippet will never be wrong. I.e. if it includes the string “%65” this is the literal string
    • message may or may not be decoded wrong. I.e. “%65” could be the literal string or could be decoded as an “A”
    • the decoding function is a many-to-one mapping, so given the decoding [a, b] a can only ever decode to b, but multiple a’s could give you the same b. I.e. [["%67", "D"], ["%68", "D"]] is valid but [["%67", "D"], ["%67", "C"]] is an invalid mapping.
    • For any mapping [a, b] a can be any number of characters but b will always be a single character. I.e. ["%68", "D"] is valid but ["%67", "CODE"] isn’t
Find out more about intermediate challenges.

Hard Difficulty

Your chief network engineer tells you the message is even more garbled than it first seemed. He can’t remember how many times the message was encoded, which means we will have to decode the message an unknown number of times! He’s promised that there are no cycles in the encoding though, so there is no chain that leads back to itself i.e. this is invalid [a, b], [b,c], [c, d] .... [X, a]

  • Function Name: hardMatch
  • Input: two strings message and snippet and a 2D array of possible decodings [["%65", “a”], …]
  • Output: boolean showing if snippet is tolerantly in message
  • Examples:
    • hardMatch("Codecademy is great", "codecademy", []) => True
    • hardMatch("Codec%65demy is great", "codecademy", [["%65", "A"]]) => True
    • hardMatch("Codec%65demy is great", "codecademy", [["%65", "$"], ["$", "A"]]) => True
    • hardMatch("Codec%65demy is great", "codecademy", [["%65", "$"], ["$", "B"]]) => False
  • Don’t forget to explain your submission just as you would do in a job interview setting!
  • Also consider the Big O complexity of your code and comment on it
Find out more about hard challenges and Big O

Extreme Difficulty

As with Hard but most of our three assumptions are no longer valid. In a decode [a, b] b can be an arbitrary length. [a, b] now has a many to many relationship (one a can go to multiple bs now). Your chief engineer has realised that there may indeed by a cycle in the decoding!

  • Function Name: extremeMatch
  • Input: two strings message and snippet and a 2D array of possible decodings [["%65", “a”], …]
  • Output: boolean showing if snippet is tolerantly in message
  • Examples:
    • extremeMatch("$1cademy is great", "codecademy", [["$2", "co"], ["$1", "code"]]) => True
    • extremeMatch("$2decademy is great", "codecademy", [["$1", "code"], ["$2", "co"]]) => True
    • extremeMatch("%1 is great", "codecademy", [["%1", "codecademy"], ["%68", "D"]]) => True
    • extremeMatch("%1 is great", "codecademy", [["%1", "codecademy"], ["codecademy", "%2"], ["%2", "%1"]]) => True
    • extremeMatch("%1 is great", "code", [["%1", "codecademy"], ["%68", "D"]]) => True
    • extremeMatch("dbcd%6fg", "abc%6efg", [["%6", "d"], ["d", "e"], ["e", "%6"], ["d", "a"]]) => True
    • extremeMatch("%1 is great", "code", [["%1", "co"], ["%68", "D"]]) => False
    • extremeMatch("%1 is great", "code", [["%1", "co"], ["co", "d"], ["d", "%1"]]) => False

Bonus (for any level)

Your chief network engineer is amazed you have managed to get this far, and is extremely grateful that you’ve helped him fix his mistakes. However, he realised he also needs to know the start and end index of the snippet within the message. Redo the first three tasks including start and end indices.

  • Function Name: bonusMatch
  • Input: two strings message and snippet and a 2D array of possible decodings [["%65", “a”], …]
  • Output: boolean showing if snippet is tolerantly in message
  • Examples:
    • bonusMatch("Codecademy is great", "codecademy", []) => True, 0, 10
    • bonusMatch("Codec%65demy is great", "codecademy", [["%65", "A"]]) => True, 0, 12
    • bonusMatch("Codec%65demy is great", "codecademy", [["%65", "$10"], ["$10", "A"]]) => True, 0, 12
    • bonusMatch("Co%67ecademy teaches me to code", "code", [["%67", "d"], ["d", "a"]]) => True, 0, 6
    • bonusMatch("Codec%65demy is great", "codecademy", [["%65", "$10"], ["$10", "B"]]) => False, 0, 0
  • You must return the indices of the first possible match

Reply to this thread with a link to your code on repl.it and paste your properly formatted code to participate! Don’t just submit your code, remember to explain your solution, too! If you want to be considered as the winner, your function must have the correct name and provide output in the format specified above, you also need to abide by our other simple rules.

As always solutions using imports to do all the heavy lifting such as itertools will not be considered for the winner.

When including your repl.it link, please just give us the link and not the code preview! Just put a space before your URL so that the preview box doesn’t show – too many preview boxes can make the thread hard to read.


The fine print:

For this challenge only Python entries will be considered as valid submissions.

Feel free to solve this challenge with any language, but only a Python submission will be considered for the winner, to making validating the solution easy.

Click the links to find out more about:

Reminder this challenge gets pretty insanely hard, so don’t feel put off if you can’t get past the easy difficulty, instead have a look at some of the past challenges.

I’d definitely encourage you to post any parts of the challenge you do manage though, and any partial solutions to the others, would be great to see what you all come up with.

If I manage to get a prize together for this challenge it will be for whoever completes the highest level of the challenge the best. I.e. if no one successfully completes hard the prize will be for the best medium entry.


Just finished the extreme part of the challenge it took me a lot longer than I thought it would! Excited to see people’s approaches to the challenge and remember share your code and ideas for every single stage! If you’ve got an awesome solution to basic or intermediate then definitely share them below :smiley:

Any other thoughts on the challenge or clarification to the wording is also welcome.


Prize Update!

The amazing oduffy has just confirmed we will be able to give away a months worth of Pro for free to the winner of this challenge :grin::tada:

6 Likes

ExtremeTolerantStringMatching - Replit

That’s an absolutely awesome solution well done @willamor6753833191

I have a couple of wrinkles with more test cases that would be fun to see if you could refactor your code to handle:

print(intermediateMatch("Co%67eca%67emy is goo%67", "codeca%67emy is good", [["%67", "D"]]))
print(hardMatch("Codec%65demy is great", "codecademy", [["%65", "$10"], ["$10", "D"], ["D", "A"]]))
print(extremeMatch("%1 is great", "code", [["%1", "code"], ["co", "d"], ["d", "code"], ["code", "%1"]]))

Awesome submission though, looking forward to seeing what else you can manage :smiley:
Also, have you seen the Bonus?

https://replit.com/@VictoriaDR/TolerantStringMatching

Great challenge, I very much enjoyed solving it!

1 Like

Thanks Alex, I had wondered how to break it :slight_smile:

I’ll take a look again (borrowing from Victoria’s great solution :+1:)

All the best,

Will

2 Likes

Incredible solution @victoria_dr, I really didn’t think anyone would be able to solve all of these!

I have managed to find one slight wrinkle with your solution though:

extremeMatch("%10 is great", "codecademy", [["%1", "random"], ["%10", "codecademy"]]) => True

It did take me an awfully long time to find a bug though and you’re really close to making it work! And I found the same bug in my own solution.

Other than that do you have any idea of the complexity of your solution and if that complexity could be improved?

Those two new tests go against this assumption in the intermediate description.

And even if they didn’t, this should be false unless there’s a typo. It seems maybe you meant to use all $ or all %, not both.

Ah great spot @el_cocodrilo I got confused with my own challenge categories!
I’ve updated my comment to reflect the actual bug.


@victoria_dr @willamor6753833191 off the back of @el_cocodrilo comment I realised I hadn’t actually fully explained Expert. My intention was for it to be even harder so [a, b] had a many to many relationship.

I’ve added an example to reflect this added case, so feel free to give it a go :smiley: However, since I forgot about it I will still test your entries against the original wording of the challenge.

1 Like

I missed that b was a single character in intermediate and wrote it to handle full strings
Still working out the kinks in mine, but I think I’m rather close.

Hardest cases I’ve found to work with so far are:

extremeMatch("code", "codecademy", [["c", "codecademy"]]) => True

extremeMatch("%%656", "code", [["%65", "6"], ["%66", "code"]]) => True
1 Like

I’ve thought of an even harder example too that my code doesn’t begin to cover:

extremeMatch("1", "aaaaaaaa", [["1", "a"], ["a", "11"]]) => True

Back to the drawing board for me!

2 Likes

One week challenge update

We’ve had a fair bit of interest in this challenge and three entries so far from @willamor6753833191, @victoria_dr and @mtf.

The extremeMatch is proving even more extreme than intended, so I’d encourage people to just enter their solutions up to hard. And then experiment a bit with extreme entering any fun solutions they have found.

This was my original attempt which I did before putting the challenge up. It handles “happy” expert cases but I think my regex knowledge has reached it’s limit with the harder cases, I did manage to get all six failing cases passing but only by running a few extra passes over it which didn’t fix the generalised problem.

Looking forward to seeing everyone’s thoughts and solutions so far! Remember no one has solved expert in its entirety yet so you are still in for winning the challenge with an elegant solution to hard and if you can’t solve that then share your solutions to intermediate anyway!


Update

I’ve managed to tackle expert in a different way so it should now handle all cases that we can throw at it (please share any hard test cases). I’ll share this at the end of the challenge, so as not to detract from anyone else managing to achieve it independently.

ExtremeTolerantStringMatching - Replit

Updated so Intermediate and Hard should now work, Extreme seems to work but I haven’t figured out how to get the correct span for extremeMatch(“1”, “aaaaaaaa”, [[“1”, “a”], [“a”, “11”]]) => True

1 Like

That’s great, well done William, nice to see your take on my approach. It passes 43 of my 48 tests (with the 5 failing being the super extreme ones). Nice job! With the regex approach I found I could make contrived examples for expert that would fail, but it works really nicely for the cases below hard.

Given that “%6” has a path to “d” via “e” would this not be True, or is it False because the message was only encoded and decoded once (not explicitly stated, but perhaps implied)?

Yes that’s right Tod, I think perhaps I wasn’t as clear as I could have been when I said:

But you are right it is False since only one step is allowed. Subsequent challenges allow for multiple steps

1 Like

Still a work in progress, but here’s my approach. Works for all examples through the Hard category so far. I added a few examples as well. https://replit.com/join/yuhrmwiv-midlindner

Oh that is awesome great job Tod :grin: I also went down the graph route (though mine was quite implicit rather than explicit vertices)

My solution which covers all of my test cases including the ones I’ve dubbed “super extreme” uses a graph algorithm and a couple of other graph algorithms for additional heuristics.

Approaching it as a graph theory problem you can get a really elegant answer.

I’ll post my solution on Sunday, looking forward to seeing if people think it is good or not.

1 Like

And the competition is closed! Though by all means continue to post your solutions :smiley:

This was the solution I ended up on which (I like to think) is a fairly elegant depth first search approach, with a couple of added heuristics that aren’t needed but can speed up the program by finding the right part of the search tree first.

Would be great to hear what everyone else thinks of this solution (and each other’s solution).


And the winner is @willamor6753833191 for their original solution and reworking of the regex approach! @fede_hq or @lilybird would you please be able to get in contact with William for the prize of a months free Pro?

Congrats William!

4 Likes

Absolutely! And congrats @willamor6753833191 :tada: I will DM you so that we can get you your prize :slight_smile:

5 Likes