[Challenge] Word Transformation

Code Challenge #6: May 17, 2017

Every week, we feature the type of brain-teasing question that might be asked in a full-stack developer’s job interview at places such as Google and Facebook.

This week’s challenge was reportedly asked in interviews at Snap, Inc. (parent company of Snapchat):


Basic Difficulty

Given two words (beginWord and endWord) and a dictionary’s word list, find the length of the shortest transformation sequence from beginWord to endWord.

You may assume:

  • Both beginWord and endWord are strings of equal length (that is, equal numbers of characters).
  • beginWord and endWord consist entirely of lower-case characters (in ASCII hexadecimal, characters range from 0x61 to 0x7A).
  • Letters cannot be inserted or deleted, only substituted.
  • Only one letter can be changed at a time.
  • Each intermediate word must exist in the word list.
  • Your algorithm should be able to work with any dictionary, but for this challenge just use the one posted below. Either copy it word for word into an array, or save it into a text file and load it using File I/O.
Find out more about basic challenges.

Intermediate difficulty

beginWord and endWord may now be strings of different lengths.

You may assume:

  • Your function can now delete and insert letters.
Find out more about intermediate challenges.

Hard Difficulty

Make your submission as efficient as possible.

  • Try adding a timing function to your code, and try out different approaches to the problem to find the quickest solution. Also, try to reduce the amount of space you use.
  • You may limit your function to checking whether the edit distance between beginWord and endWord is within some threshold or maximum distance of interest, k.
Find out more about hard challenges and time complexity + Big O.
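The threshold idea above can be sketched with a banded Levenshtein check: instead of filling the whole dynamic-programming table, only a diagonal band of width 2k+1 is computed, and the search bails out as soon as every cell in the band exceeds k. This is a minimal sketch (the function name is illustrative, not part of the challenge):

```python
def within_edit_distance(a, b, k):
    """Return True if the Levenshtein distance between a and b is <= k.

    Only a band of width 2k+1 around the diagonal of the DP table is
    computed, so the check runs in O(k * max(len(a), len(b))) time.
    """
    if abs(len(a) - len(b)) > k:
        return False
    # prev[j] = edit distance between a[:i-1] and b[:j] (previous row)
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        # cells outside the band are left at the sentinel value k + 1
        cur = [i] + [k + 1] * len(b)
        lo = max(1, i - k)
        hi = min(len(b), i + k)
        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution / match
        if min(cur[lo:hi + 1]) > k:
            return False  # every cell in the band already exceeds k
        prev = cur
    return prev[len(b)] <= k
```

For the word-ladder search itself, a check like this can cheaply prune candidate pairs before running a full search.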

Scroll down and reply to this thread with your code to participate! Don’t just submit your code, remember to explain your solution, too! If you want your entry to be considered under our “best code” assessment, you must include a link to your code running on repl.it and abide by our other simple rules.



Please use the following dictionary, and check your code using the test cases below.

For the intermediate level, we assume users can prepare their own test cases and verify correctness without our help.

Dictionary:

humic
humid
wane
jade
molt
moll
want
slag
wade
mist
dolt
doll
mate
fade
maze
wart
jake
wary
mitt
wake
gate
mite
wait
faze
malt
hemic
male

Test cases:

shortestTransformation("gate", "gate")
>> 0

shortestTransformation("jake", "jade")
>> 1

shortestTransformation("humid", "hemic")
>> 2
# humid -> humic -> hemic

shortestTransformation("mist", "wary")
>> 11
# mist -> mitt -> mite -> mate -> maze -> faze -> fade -> wade -> wane -> want -> wart -> wary

shortestTransformation("wait", "doll")
>> 11
# wait -> want -> wane -> wade -> fade -> faze -> maze -> male -> malt -> molt -> moll || dolt -> doll

shortestTransformation("slug", "gate")
>> -1
# not possible

Here is a solution using a recursive approach.

Basic Difficulty:

Non-Recursive Python 3 Method:


Hard Difficulty:

Recursive Python 3 Method:

Hi @dave.n, that’s a cool solution for shortest distance, but we’re actually looking for the shortest path. Look at the first post and try to make a solution that builds a path through only words in that dictionary. Then abstract it so you can input any dictionary and it will only traverse words that are in it.

I haven’t had time to optimise it, but I think this runs in O(d^2), where d is the number of words in the dictionary.

It works by performing a breadth-first search on the dictionary, starting with the first word and treating the dictionary as a graph.
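A minimal sketch of that approach, assuming a plain list for the dictionary and a helper that compares equal-length words (both names are illustrative, not the actual submission):

```python
from collections import deque

def differ_by_one(a, b):
    """True if equal-length words a and b differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def shortest_transformation(begin, end, words):
    """Length of the shortest transformation sequence, or -1 if none exists.

    Treats the dictionary as an implicit graph and runs a breadth-first
    search; each level rescans the remaining words, so the whole search
    does roughly O(d^2) word comparisons.
    """
    if begin == end:
        return 0
    unvisited = set(words)
    unvisited.discard(begin)
    queue = deque([(begin, 0)])
    while queue:
        word, steps = queue.popleft()
        if differ_by_one(word, end):
            return steps + 1
        for cand in list(unvisited):
            if differ_by_one(word, cand):
                unvisited.remove(cand)  # a word is only ever visited once
                queue.append((cand, steps + 1))
    return -1
```

Against the dictionary and test cases posted earlier in the thread, this should reproduce the expected lengths.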

Done. Finally. :sweat_smile:

Basic Difficulty:



Hard / Intermediate (I think):

This should run somewhere between O(n) and O(n^2); is that called O(n log n)? (n = the number of words in the dictionary.)

  1. Start from the start string and find its siblings in the dictionary (1 character substituted, inserted, or deleted).
  2. Append each new sibling to the path and loop again, until the last word in the path is a sibling of the end word.
  3. Each time a sibling is found, remove it from the dictionary; this means the dictionary gets smaller and smaller.
  4. Stop when no sibling can be found, or when the end word is reached.

#!/usr/bin/env python

dictionary = ["humie","humic","mal","mail","humid","wane","jade","molt","moll","want","slag","wade","mist","dolt","doll","mate","fade","maze","wart","jake","wary","mitt","wake","gate","mite","wait","faze","malt","hemic","male"]
        
def isSibling(one,two):
	len_one = len(one)
	len_two = len(two)
	diff = len_one - len_two
	if diff == 0:
		# equal length string, check for 1 character different
		for i in range(len(one)+1):
			if one[:i] == two[:i] and one[i+1:] == two[i+1:]:
				return True
	elif diff == -1:
		# string one is shorter by 1 char,
		# skip 1 character in string two and check for equality
		for i in range(len(one)+1):
			if one[:i] == two[:i] and one[i:] == two[i+1:]:
				return True
	elif diff == 1:
		# string two is shorter by 1 char,
		# skip 1 character in string one and check for equality
		for i in range(len(two)+1):
			if two[:i] == one[:i] and two[i:] == one[i+1:]:
				return True
	return False

def checkPath(start,end):
	# string 1 == string 2
	if start == end:
		return ([start],0)

	bfs=[[start]]
	next=dictionary
	
	# remove start and end word from dictionary
	# to remove redundant loop
	if start in next:
		next.remove(start)
	if end in next:
		next.remove(end)
	
	# start with start string
	while bfs:
		path=bfs.pop(0)
		#last word is a sibling of end
		if isSibling(path[-1],end):                               
			ret=path + [end]
			return (ret,len(ret)-1)
		
		#loop all words in dict 
		lib=next
		next=[]
		for i in lib:
			#last word is a sibling of the word in the dictionary
			#   True: add the dict word to path and add path to queue for next loop
			#   False: rebuild a new dict with those not matching word
			if isSibling(path[-1],i):
				# add to the queue to continue searching next in next loop
				# dont add back dict word to dict
				bfs.append(path + [i])  
			else:
				# rebuild dict with word that is not sibling
				next.append(i)
                                        
	return (["Not Possible"], -1)
	

print(checkPath("gate", "gate"))
print(checkPath("jake", "jade"))
print(checkPath("humid", "hemic"))
print(checkPath("mist", "wary"))
print(checkPath("wait", "doll"))
print(checkPath("slug", "gate"))
print(checkPath("mail", "mal"))
print(checkPath("mal", "male"))  

If there are N words in the dictionary, and the longest word has length w, this is O(N*(w**2)), which I think is pretty good if N is large and w is small.

Define a “club” as a set of words which are all directly connected to one another. “fat”, “hat”, and “sat” all belong to the club named “.at”, because each can be converted to any other by a substitution at the position of the “.”.

For intermediate, “at” is added to the “.at” club, because “at” can be converted to any of those words, and any of those words can be converted to “at”.

Breadth-first search involves moving from a word to all of its clubs, and from clubs to all of their words. Once navigation away from a club has occurred, it is emptied (there is no more need to use it; all its members have been visited). This helps avoid the O(N^2) behavior of dense graphs (where N nodes may have N^2 edges).
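The club scheme can be sketched as follows: each club is keyed by a wildcard pattern, and the BFS pops (empties) a club the first time it is reached. This is a minimal sketch with illustrative names, covering the equal-length basic level:

```python
from collections import defaultdict, deque

def build_clubs(words):
    """Map each wildcard pattern (e.g. '.at', 'f.t', 'fa.') to its members."""
    clubs = defaultdict(set)
    for w in words:
        for i in range(len(w)):
            clubs[w[:i] + '.' + w[i + 1:]].add(w)
    return clubs

def shortest_transformation_clubs(begin, end, words):
    """BFS alternating word -> clubs -> words; a club is emptied on its
    first visit, so every word/club edge is traversed at most once."""
    if begin == end:
        return 0
    clubs = build_clubs(set(words) | {begin})
    queue = deque([(begin, 0)])
    seen = {begin}
    while queue:
        word, steps = queue.popleft()
        for i in range(len(word)):
            # pop the club: its members never need to be scanned again
            for member in clubs.pop(word[:i] + '.' + word[i + 1:], ()):
                if member == end:
                    return steps + 1
                if member not in seen:
                    seen.add(member)
                    queue.append((member, steps + 1))
    return -1
```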

Another optimization (good for dense graphs) is to simultaneously search from both “begin” and “end”, and stop when those searches first intersect.

Edit: If word lengths are long (words are longer than the alphabet length), you could do something similar. Any time you “find” a word by doing a substitution at position x, tell the word that when searching for neighbors, it doesn’t need to do substitutions at that position; any such words have already been found. Similar logic could probably be used for the intermediate code (need to think about it). This probably gives a complexity (N words, word size w, alphabet size a) of O(N * w * a).

Edit2: I downloaded an online list of English words (more than 300k words), modified the code a bit so that once the set of clubs were made, they could be reused (instead of emptying a club when visited, just mark them as visited). It turns out that connections are pretty sparse. With repeated testing of random pairs of words, the search rarely hit more than 10k words before either finding a path, or proving there was no path. That means that preprocessing the entire dictionary is not efficient, unless you are going to search for paths between multiple pairs of words. It takes my code about 100 seconds to preprocess the dictionary, and typically about 0.15 seconds to run the challenge on a pair of words. When it finds a long path (longest I saw was 18 steps), that takes about 3 seconds.

Edit3: A bit more on thought processes for this challenge. To solve this problem, there are two issues.

  1. Find the single-step neighbors of a word.
  2. Using (1) find the path (or perhaps just the cost of the path) from beginWord to endWord.

For item (2) a simple breadth-first search seems appropriate. You could do some optimistic searching (prefer to search in the direction of lowered edit-costs), but with so many dead-ends, the extra work doesn’t seem worth it.

For finding a word’s neighbors, the potential search strategies seem to be
a) Look at every word in the dictionary to see if it is a neighbor of the current word: O(dictionarySize * wordSize) to find a single set of neighbors.
b) Look at every “transformation” of the current word, and see if that transformation is in the dictionary (after putting the dictionary into a set). So for “cat”, a possible transformation is “cgt”, so we’d check to see if “cgt” is in the dictionary. Preprocessing the dictionary is O(dictionarySize * wordSize), but finding the neighbors of a word is O(wordSize * alphabetSize). If the dictionary is bigger than the alphabet, and you end up doing very many “neighbor” searches, this is a win.
c) Put words into groupings that make things more efficient. Possible groupings to put neighbors together:

  • Group words by length, since even for intermediate, neighbors have similar lengths.
  • Group words by “first half” and “second half”. Any neighbor of mine (with the same length) has either the same first half, or the same second half, and that search is easy to extend to slightly longer or shorter neighbors. Consider looking for neighbors of “football”. The dictionary I downloaded has over 49k eight-letter words, but only 88 of them either begin with “foot” or end with “ball”. This costs more preprocessing (each word is now in two hash tables), but a single hash-table lookup gives a fairly short list of candidate matches.
  • Lots of other potential groupings (group words by their count of each vowel). For all kinds of counting strategies, words can only be neighbors if their counts are no more than one apart.
  • I ended up taking the grouping a bit farther, so that two words are neighbors if-and-only-if they share a group. The disadvantage is that it takes a lot of preprocessing, O(dictionarySize * wordSize^2) in both time and space. One advantage was some simplicity (never having to check for partial matches). Also, when several words are in the same group (cat,hat,fat,mat,pat,sat,bat,rat,vat), there are O(N^2) edges between the words, but the search only ever has to spend time on O(N) of those edges (there are O(N) edges between the “group” and its words). In hindsight, large groups like this are probably uncommon for longer words.
  • Potential other optimizations (not done)
    ++ Group all words by length (fast in python), but don’t put them into other groups (hash tables) unless/until a search actually reaches their word length.
    ++ Use different techniques based on word length and number of words of a particular length. The dictionary I downloaded had less than four hundred words of length 20. Any non-trivial processing of a list that short may be wasted.
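Strategy (b) above, generating every candidate edit and testing set membership, might look like this under the intermediate rules (substitution, insertion, deletion); the function name is illustrative:

```python
import string

def neighbors(word, dictionary):
    """Yield dictionary words reachable from `word` by one substitution,
    insertion, or deletion.

    Generates every candidate edit and tests set membership, costing
    O(wordSize * alphabetSize) per word rather than a full dictionary scan.
    """
    seen = set()
    for i in range(len(word)):
        # deletions
        cand = word[:i] + word[i + 1:]
        if cand in dictionary and cand not in seen:
            seen.add(cand)
            yield cand
        # substitutions
        for c in string.ascii_lowercase:
            cand = word[:i] + c + word[i + 1:]
            if cand != word and cand in dictionary and cand not in seen:
                seen.add(cand)
                yield cand
    # insertions (one extra position at the end of the word)
    for i in range(len(word) + 1):
        for c in string.ascii_lowercase:
            cand = word[:i] + c + word[i:]
            if cand in dictionary and cand not in seen:
                seen.add(cand)
                yield cand
```

Plugging a generator like this into a breadth-first search gives the intermediate-level solution directly.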

Hi,
Please find below my solution to the basic difficulty version of this challenge.
https://repl.it/IJoN/1

I used Python 3 to implement my submission.
I decided to use sets to hold a dictionary of words, as well as the exploratory candidates.
I make a local copy of the entire dictionary, so I can prune it down as I iterate.
I initialize a “source” set, left, with the beginning word.
I initialize a “destination” set, right, with the ending word.
As the search progresses, I populate one of these sets with known valid candidate words.
Also, when adding new candidates, the old set is removed from the dictionary.
The role of source and destination can dynamically be reversed.
If these two sets ever intersect, then we have discovered a solution.
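A sketch of the frontier-swapping search described above (illustrative names; the actual submission is at the repl.it link):

```python
def one_apart(a, b):
    """Equal-length words that differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def shortest_transformation_bidir(begin, end, words):
    """Bidirectional BFS: grow the smaller frontier each round, prune
    reached words from the dictionary, and stop when the frontiers meet."""
    if begin == end:
        return 0
    unvisited = set(words) - {begin, end}
    left, right = {begin}, {end}    # "source" and "destination" frontiers
    steps = 0
    while left and right:
        if len(left) > len(right):
            left, right = right, left   # the roles can dynamically reverse
        steps += 1
        nxt = set()
        for word in left:
            for cand in unvisited | right:
                if one_apart(word, cand):
                    if cand in right:
                        return steps    # the two frontiers intersect
                    nxt.add(cand)
        unvisited -= nxt                # prune the dictionary as we go
        left = nxt
    return -1
```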

Here is my code and results:


Basic Difficulty
Hamming distance = 1
Used a Graph data structure.
Used the breadth first search algorithm on the Graph to find the shortest path.

Python 3, recursive depth-first search (I think?). Not sure if it’s appropriate to ask questions here, but I’m really confused about something. Where my code confirms a completed path from fromWord to toWord, I have placed a return statement, but it is always ignored. If I don’t place the print statement, all I get returned is my default “no go” at the bottom of my function. This means I can’t do any kind of intelligent sorting of my various paths to a solution. I’ve had this problem before with other problems I’ve tried, and I just can’t figure out what I’m doing wrong. Can anyone help out? I really appreciate anyone taking the time to look at this for me!

P.S. I forgot to add a check to see if fromWord and toWord are already the same, but oh well.



Almost all the solutions work as expected and meet most of the criteria; after the hint to use a breadth-first search on a graph, we ended up with some very nice solutions.

The winner is @ricebill for his use of a bidirectional BFS, clubs, and a great explanation. But well done to all who submitted; it was a very hard challenge.


When you recurse, you need to return the result of the recursive call, not just make the call; otherwise execution carries on around the for loop.
This is indeed an attempt at DFS, but the recursion wasn’t implemented properly, so it didn’t work.
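In code, the difference looks roughly like this (hypothetical names; the commented Replit link has the actual fix):

```python
def one_letter_apart(a, b):
    """Equal-length words that differ in exactly one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

# Buggy: the recursive call's result is discarded, so the function always
# falls through to the final return, even after finding a complete path.
def find_path_buggy(word, target, words, path):
    if word == target:
        return path
    for nxt in words:
        if nxt not in path and one_letter_apart(word, nxt):
            find_path_buggy(nxt, target, words, path + [nxt])  # result lost
    return None

# Fixed: propagate the recursive call's result back up the stack.
def find_path(word, target, words, path):
    if word == target:
        return path
    for nxt in words:
        if nxt not in path and one_letter_apart(word, nxt):
            result = find_path(nxt, target, words, path + [nxt])
            if result is not None:
                return result
    return None
```

Note that even the fixed version returns the first path DFS happens to find, not necessarily the shortest one.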

So I’ve fixed your code to run DFS: PreviousLonelyFinwhale - Replit The code is commented with all the changes I have made.

Now if you look at the results these return compared to the expected results, you can see they aren’t as low as they should be. That is because DFS finds a path between two nodes, whereas BFS finds the shortest path between two nodes.

Hope that helps.


It certainly does help! I can’t thank you enough for taking the time to do this. These sorts of recursive search problems fascinate me, despite the fact that I can’t usually seem to quite get them right… Maybe now I can get on the right track. Thanks once again!


I thought I would share this clunky workaround I managed to figure out with your help! Still DFS, but somehow I rigged it to collect all the paths and then return the shortest one. I’m not under the assumption that it’s the best solution by any means, but it seems to work. Anyway, you were kind enough to help me out, so I wanted to share.

Nice :slight_smile: Maybe try implementing a breadth-first search next.