Common letters permutation

The goal is that given a list of words, what is the simplest way to generate a set of words with common letters? Here is my first attempt, though I am certain it is as naive as they come…

four_letter = "BIOS BOLD BYTE CHIP CODE COPY DATA DBMS DHCP EDIT \
FILE FONT FORM HANG HELP HOST HTML HTTP ICON JPEG LINE LOAD LYNX \
MAIL MENU NODE OPEN PING PORT POST QUIT RAID READ SORT TASK \
TEAM TEXT TREE UNIX USER".split()
u = four_letter
v = u[:]
w = []
for x in u:
    for y in x:
        for m in v:
            for n in m:
                if y == n:
                    if (x, m) not in w and (m, x) not in w:
                        w.append((x, m))

					
q = filter(lambda x: x[0] != x[1], w)
z = set(list(q))

The above set has 518 tuples.

generate a set of words with    common letters
all pairs of words      with    common letters
    combinations       filter      intersect

These concepts exist in python.

from itertools import combinations
from functools import reduce

def intersection(xs):
    return reduce(set.intersection, map(set, xs))

z = filter(intersection, combinations(four_letter, 2))
2 Likes

What about something like this:

>>> from itertools import chain
>>> four_letter = "BIOS BOLD BYTE CHIP CODE COPY DATA DBMS DHCP EDIT \
FILE FONT FORM HANG HELP HOST HTML HTTP ICON JPEG LINE LOAD LYNX \
MAIL MENU NODE OPEN PING PORT POST QUIT RAID READ SORT TASK \
TEAM TEXT TREE UNIX USER".split()

>>> words = {k:[w for w in four_letter if k in w]for k in set(chain(*four_letter))}
>>> words['A']
['DATA', 'HANG', 'LOAD', 'MAIL', 'RAID', 'READ', 'TASK', 'TEAM']
>>> words['O']
['BIOS', 'BOLD', 'CODE', 'COPY', 'FONT', 'FORM', 'HOST', 'ICON', 'LOAD', 'NODE', 'OPEN', 'PORT', 'POST', 'SORT']

Edited after suggestion from @ionatan
Thanks!

1 Like
{l: None for w in four_letter for l in w}.keys()
{l: None for w in four_letter for l in w}
{l for w in four_letter for l in w}
{l for l in ''.join(four_letter)}
set(''.join(four_letter))
set(chain(*four_letter))  # concat without being specific to string
1 Like

Great way to sum up the concepts; and, the tools you introduce are much welcome in dealing with this problem. Thank you.

You’re hit upon my second goal, that of grouping by words that contain a specific letter. The earlier collection is unique tuple pairs of intersecting words

Both are important in my imaginary tool for drawing candidates from the pool when constructing a PlayFour puzzle (which is still a ways off, for me, though maybe not for you or others). The other important component is match at position, as in, which words have an A in the third position?

1 Like

I had to google four word puzzles to get an idea of what you were getting at. Looks pretty challenging to come up with code to generate such puzzles. At least for me.

I had that in mind while I was working on my code. Thinking that finding words with a given letter in x position would be easier if I knew which of the available words contained the letter.

For now, code that can aid in the manual construction of said puzzles is more what I have in mind. We are still the author, and the program is like our thesaurus.

Between you and @ionatan we’ve got a lot to work with so far.

from itertools import chain
>>> words = {k: [w for w in four_letter if k in w] for k in set(chain(*four_letter))}
>>> words['A']
['DATA', 'HANG', 'LOAD', 'MAIL', 'RAID', 'READ', 'TASK', 'TEAM']
>>> 

Thanks to both of you for spurring this on. One thinks a few puzzles will soon emerge out of this endeavor.

Look for the topic where I posted the link to PlayFour to see how this plays out.

1 Like

You could fill in each row while keeping an eye on possible candidates in the columns and those would in turn affect what you could use for the following rows

So for example if one has

GOLF
GOLD

then possible remaining words for column one would be the intersection of g1 (g in position 1) and g2
, column 2 would be o1 & o2, l1 & l2, f1 & d1
the third row would then … uhm… something.

https://discuss.codecademy.com/t/monthly-features/438474/22

    D I E T
    E D G E
    A L O E
    L E S S

Alternating horizontal and vertical choices may rule out candidates fast enough. After having picked one of each there shouldn’t be much left to try (on average)

My /usr/share/dict/words file has 3721 words with 4 letters in them. Making two choices is then about 16 million combinations, and if there’s not much work beyond that then that’s totally fine

1 Like

That’s actually my fear, that we won’t have enough candidates to formulate a single puzzle. Still, ours is to try, now we got this far.

1 Like

I actually did this puzzle after I Googled it :slightly_smiling_face:

1 Like

Sixteen million combinations versus our 518, for this particular instance, computer science terms with four letters drawn from the University of Utah glossary, and checked against the Wikipedia version. It’s a challenge to work through in those confines.

that’s not the number of solutions possible, or the number of words

With 518 intersections, we’re still not guaranteed any usable puzzles That is for us to discover. The number of words is given in the initial list.

oh intersections. right. still not the same number though.
for that few words it doesn’t matter, just pick each combination of 4 and see if it adds up in the other direction

If we want to start with say, L Y N X, as the top line in the puzzle, how many words can we project downward from those letters? None. But we could have it on the bottom line if U N I X is on the vertical axis in the last column.

- - - U
- - - N
- - - I
L Y N X

The only word that ends in U in our list is MENU,

M E N U
- - - N
- - - I
L Y N X

and conveniently we have, M A I L,

M E N U
A - - N
I - - I
L Y N X

What’s to go in between? This is where we begin shedding options very quickly, as you suggested, earlier. M A I L may be the only word we salvage out of the four.

So we reduce the list to exclude those three since we know they will never fit with the terms we have in the list.

four_letter = "BIOS BOLD BYTE CHIP CODE COPY DATA DBMS DHCP EDIT \
FILE FONT FORM HANG HELP HOST HTML HTTP ICON JPEG LINE LOAD  \
MAIL NODE OPEN PING PORT POST QUIT RAID READ SORT TASK \
TEAM TEXT TREE USER".split()

Having defined the list with a verbose name, I still think it’s okay to assign it to a symbol,

u = four_letter

Less typos that way.

I don’t find any solutions for those words, but I won’t swear my code is correct, if anything I’d guess it isn’t :stuck_out_tongue: (it’s supposed to be a full search)

{-# LANGUAGE NoImplicitPrelude #-}

module Main where

import           Control.Arrow
import           Control.Monad
import           Data.Char       (toUpper)
import           Data.List       (foldl1', words, (!!))
import           Data.Map.Strict ((!))
import qualified Data.Map.Strict as Map
import qualified Data.Set        as Set
import qualified Data.Text       as Text
import           Protolude

letters :: [Char]
letters = ['A' .. 'Z']

conditions :: [(Char, Int)]
conditions = (,) <$> letters <*> [0 .. 3]

meetsConditions :: [[Char]] -> Map.Map (Char, Int) (Set [Char])
meetsConditions wrds =
  Map.fromList $ do
    (ch, loc) <- conditions
    let matches =
          Set.fromList $ do
            word <- wrds
            guard $ atMay word loc == Just ch
            return word
    return ((ch, loc), matches)

intersectAll :: Ord a => [Set a] -> Set a
intersectAll = foldl1' Set.intersection

findAllSolutions chlocSets words = do
  r1 <- words
  c1 <- fromSets [(r1 !! 0, 0)]
  r2 <- fromSets [(c1 !! 1, 0)]
  c2 <- fromSets [(r1 !! 1, 0), (r2 !! 1, 1)]
  c3 <- fromSets [(r1 !! 2, 0), (r2 !! 2, 1)]
  r3 <- fromSets [(c1 !! 2, 0), (c2 !! 2, 1), (c3 !! 2, 2)]
  r4 <- fromSets [(c1 !! 3, 0), (c2 !! 3, 1), (c3 !! 3, 2)]
  c4 <- fromSets [(r1 !! 3, 0), (r2 !! 3, 1), (r3 !! 3, 2), (r4 !! 3, 3)]
  return [r1, r2, r3, r4]
  where
    fromSets = Set.toList . intersectAll . map (chlocSets !)

main = do
  allwords <- (words . map toUpper . Text.unpack) <$> getContents
  let fourWords =
        allwords & filter (length >>> (== 4)) & filter (all (`elem` letters))
  print fourWords
  let chlocs = meetsConditions fourWords
      solutions = findAllSolutions chlocs fourWords
  mapM_ (\s -> mapM_ putStrLn s >> putStrLn "") solutions

The method of getting all combinations of 4 and checking whether it’s a solution would be much easier to implement correctly

...

ABBE
DORA
ALAS
MATE

ABBE
DORA
ALAS
MATT

ABBE
DORA
ALAS
MATT

ABBE
DORA
ALAR
MAYS

ABBE
DORA
ALES
MADE

ABBE
DORA
ALEC
MATH

ABBE
DORA
ALES
MATE

...
1 Like

That looks to be Haskell; is it? Looks very interesting albeit way over my head. Excellent work, just the same. Given a whole range of four character words, such as your dictionary of 3000 plus will undoubtedly yield hundreds of puzzles.

After scratching away on paper for hours I’ve not found one solution for the list of words we started with. Words with J, K, Q and X can be elliminated since there are no multiples of those letters; we can drop JPEG, QUIT, TASK and TEXT from the list. Given the horsepower of your program I’m sure we can trust the outcome if it could not produce a solution. There is just not enough to work with.

>>> four_letter = "BIOS BOLD BYTE CHIP CODE COPY DATA DBMS DHCP EDIT \
FILE FONT FORM HANG HELP HOST HTML HTTP ICON JPEG LINE LOAD  \
MAIL NODE OPEN PING PORT POST QUIT RAID READ SORT TASK \
TEAM TEXT TREE USER".split()
>>> sorted(list(set(chain(*four_letter))))
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'X', 'Y']
>>> sorted(list(filter(lambda x: len(words[x]) > 1, set(chain(*four_letter)))))
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'L', 'M', 'N', 'O', 'P', 'R', 'S', 'T', 'U', 'Y']
>>> 
1 Like