Doesn’t that start over from the root of the trie on every single lookup? You already know where to start looking for the children.
def searchword(trie, wildcards, tiles, output):
    if trie is word:
        output.append(trie.word) # or send current letters as arguments to this function
    for each unique tile in tiles:
        if trie has tile:
            searchword(trie[tile], wildcards, (tiles - tile), output)
    if wildcards > 0:
        for each key in trie that has not already been searched in previous loop:
            searchword(trie[key], (wildcards - 1), tiles, output)
I thought your idea was exactly what I needed, so I implemented it like so:
def search_word(node, wildcards, alphabet, tiles, output)
  output.push(node.full_state) if node.terminal? # append to output if it's a word

  unless tiles.empty? && wildcards.zero?
    tiles.uniq.each do |tile|
      next_node = node.walk(tile) # choose only those letters that could possibly make a word
      unless next_node.nil?
        remaining_tiles = take_away(tiles, tile)
        search_word(next_node, wildcards, alphabet, remaining_tiles, output)
      end
    end
  end

  if wildcards > 0
    other_letters = take_away(alphabet, tiles.uniq) # letters that weren't used in the previous loop
    other_letters.each do |tile|
      next_node = node.walk(tile) # only those that could make a word
      unless next_node.nil?
        remaining_wildcards = wildcards - 1
        # tiles are passed on unchanged: this letter is covered by a wildcard
        search_word(next_node, remaining_wildcards, alphabet, tiles, output)
      end
    end
  end
end
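(take_away isn't shown here; I'm assuming it just removes each given tile from a list once, roughly like this:)

def take_away(tiles, removed)
  remaining = tiles.dup
  Array(removed).each do |letter|        # works for a single tile or a list of tiles
    index = remaining.index(letter)
    remaining.delete_at(index) if index  # drop only one occurrence per tile
  end
  remaining
end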
This is exactly what I wanted, thank you!
However, it is still slow. For the input %w[a b c d e f g h i j k l m n o] it takes about 0.4 s, which is acceptable, but for %w[a b c d e f g h i j k l m n ?] it takes 3.4 s, and for %w[a b c d e f g h i j k l m ? ?] it takes 15.1 s.
How would you make it faster? Can I somehow refactor this algorithm? Or could storing the words a different way play a bigger part? For example, store the word “fast” as “afst” and have a link to all of its anagrams -> [“fast”, “fats”, “saft”]. What do you think? A rough sketch of that storage idea is below (the wordlist filename and hash layout are just placeholders):
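# Sketch of the anagram-key idea: every word is filed under its sorted
# letters, so all anagrams of "fast" end up under the same key "afst".
anagrams = Hash.new { |hash, key| hash[key] = [] }

File.foreach("wordlist.txt") do |line|   # placeholder wordlist file
  word = line.chomp.downcase
  anagrams[word.chars.sort.join] << word
end

anagrams["afst"]   # => all stored anagrams of those four letters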
You could run a profiler on it and see if it’s spending a lot of time in something that could be done differently
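For example, something like this with the ruby-prof gem (the gem, trie.root, and the exact call are all just assumptions, not part of your code):

require "ruby-prof"   # gem install ruby-prof

result = RubyProf.profile do
  # profile the worst case: two wildcards
  search_word(trie.root, 2, ("a".."z").to_a, %w[a b c d e f g h i j k l m ? ?], [])
end

RubyProf::FlatPrinter.new(result).print($stdout)   # flat listing of where the time goes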
Would it make a difference if the order of the letters were changed? (For example most common letters first, or perhaps least common letters first; not sure if this has any effect at all or which would be better)
Grouping duplicated characters? So words with two a’s in them would have an “aa” tile
Grouping common combinations? Or some other combinations? Any common groups that could be considered the same character? Like “tion” in English, though the letters don’t necessarily need to be in order
Moving it out to a native (compiled) language might cut the time by a pretty large factor (something like 50, but it depends an awful lot on what’s being done)
Adding constraints to the input? Like already-placed tiles and how much room there is
Splitting up the wordlist by score, so that you try the best-scoring ones first and then stop early when there are enough results
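A sketch of that last idea, assuming the wordlist has been pre-split into one trie per score tier (the trie-per-tier layout, trie.root, and the result cut-off are all made up for illustration):

NEEDED_RESULTS = 20   # made-up number of results to collect

# tries is ordered from the best-scoring tier to the worst
def best_words(tries, wildcards, alphabet, tiles)
  output = []
  tries.each do |trie|
    search_word(trie.root, wildcards, alphabet, tiles, output)
    break if output.size >= NEEDED_RESULTS   # stop early once there are enough hits
  end
  output
end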