[Challenge] Anagram Detector

challenge

#34

A solution in Python which has a O(n+m) time complexity. It iterates through the first string and counts up the occurences of each character, and then does the same for the second string but decrements the count instead. If the two strings are anagrams then the count for every character should be 0.

Decided to handle all 128 ASCII codes so that the function works with all ASCII characters and not just capital letters, and it removes the need to subtract a base value from each of the ASCII values to convert it to a 0-based index.

def is_anagram(s1, s2):
    # can return 0 immediately if the strings are different lengths
    if len(s1) != len(s2): 
        return 0
    # for counting the ASCII characters 0 to 127
    counts = [0]*128
    # increment the count for each character in the first string
    for c in s1:
        counts[ord(c)] += 1
    # decrement the count for each character in the second string
    for c in s2:
        counts[ord(c)] -= 1
    # the strings are anagrams if all the counts cancel each other out
    return 0 if any(counts) else 1

There is a simple optimisation to check that the strings are the same length at the start of the function; if the strings are different lengths then they cannot be anagrams. Would be significant improvement for very big strings.

A totally inconsequential optimisation (but why not) is the use of the any() function instead of sum(). It is ever so slightly faster as it only needs to do a comparison with 0 instead of addition, and it does lazy evaluation, meaning it will return as soon as a non-zero value is found.

%timeit sum([0]*128)
100000 loops, best of 3: 2.48 µs per loop

%timeit any([0]*128)
100000 loops, best of 3: 2.07 µs per loop

%timeit sum([1]*128)
100000 loops, best of 3: 2.47 µs per loop

%timeit any([1]*128)
1000000 loops, best of 3: 764 ns per loop

Out of interest, I compared the runtime with a solution that use sorting to compare the strings (shown below). The time complexity for sorting is O(n log(n)), but for short strings the function below is faster. The point where the function above is faster is about strings of length 70 on my laptop.

def is_anagram(s1, s2):
    return 1 if len(s1) == len(s2) and sorted(s1) == sorted(s2) else 0

#35

I had this idea as well, but I don't think it works! Just adding up the ASCII values would mean it would see these two strings as anagrams, "AD" and "BC".

I thought maybe you can use the product but it would only work if each letter was represented by a prime number and the calculation of that product would probably make the function slow compared to the other methods, especially for large strings.


#36

Here is my entry using Python 3.6.1 hosted on repl.it:

https://repl.it/H6EZ/0

My first bit of thinking was to be sure to get the input from the user and prompt them as to what words they would like to compare. I then assigned another set of variables equal to the original input ran through the built-in .upper() function, to guarantee that the function would be handling uppercase letters, as per the challenge rules. Next up was converting the strings into lists with a for loop to iterate through each of the words and append each iterated item to the appropriate list. After assigning the lists to sorted versions of themselves, I then compared the sorted lists to each other and returned 1 for anagrams, and 0 for not anagrams.
Code below for those who do not wish to follow links:

def anagram():
    #ask the user for words to compare
    print('What two words would you like to compare and see if they are anagrams?')
    print('First word: ')
    word1 = str(input())
    print('Second word: ')
    word2 = str(input())

    #converts the strings from the user to uppercase
    word1Upper = word1.upper()
    word2Upper = word2.upper()

    #creation of empty lists for conversion from strings to lists
    word1u = []
    word2u = []
    #for loops do the work of making lists from the strings
    for char in word1Upper:
        word1u.append(char)

    for char in word2Upper:
        word2u.append(char)

    #sorting the letters into sequential order so that Python can compare the lists
    word1u = sorted(word1u)
    word2u = sorted(word2u)

    #condition that compares the sorted lists to each other to see if they match
    if word1u == word2u:
        #returns 1 if they match
       """ print('anagram')"""
       return 1

    else:
       """ print('not an anagram')"""
        #returns 0 if they don't
       return 0

    #print(word1u, word2u)

Time complexity was not taken into consideration when devising this solution as I’m still definitely a beginner.


#37

Here is my simple code in PHP.

  1. Then the string length, if it is same then it will proceed to next step.
  2. Then using a for loop, checked each character of the string 1 against string 2 in the reverse order.

<?php
function find_anagram ($sample_string1, $sample_string2)
{
	$string1 = str_replace(' ', '', $sample_string1);
	$string2 = str_replace(' ', '', $sample_string2);
	
	$str_len1 = strlen($string1);
	$str_len2 = strlen($string2);
	
	$flag=1;
	if($str_len1!=$str_len2)
	{	
		$flag=0;
	}
	else {	
		$c = $str_len2-1;
		for($i = 0; $i<$str_len1; $i++)
		{
			If ($string1[$i] != $string2[$c])
			{
				$flag=0;
				break;
			}			
			$c--;
		}	
	}
	
	echo $flag . "<br />";
}
find_anagram("amma", "amma");
find_anagram("asdf", "asdf");
?>

#38

Just a note to anyone using a SORTING FUNCTION this is not O(n + m), so you might want to look for a different solution.


#39

ES6 / Javascript Solution:

https://repl.it/H5it/4

  1. My function takes two strings as arguments. It loops through first one and creates a Map object to store frequencies of each letter in string (“letter => occurance” pairs).
  2. Then the function loops through second string decreasing these values - if any value is not already in our Map or the value becomes less than 0, then loop stops because we know the words can’t be anagrams.
  3. In worst case scenario the loop finishes and then function checks if any value in our Map is left greater than 0 - if yes, then the words are not anagrams. Otherwise they are.

I chose Map object, because i needed (key, value) pairs and hence i knew I wanted to iterate over the values it was easier to use Map rather than regular Object. It also provides quick access of values which was exactly what i needed.
The function takes into account that anagrams can have different space character counts. The second for loop stops immediatly if it finds a letter that does not occur in our Map or frequency of this particular letter became greater than it was in the first string.
Time complexity is O(n+m). Tests provided. Thanks for reading!


#40

My submission is in Python 3.

I decided to use a dictionary to keep track of the letters used in each of the two candidate strings.
For the first candidate string, I increment the letter count,
and for the second candidate string, I decrement the letter count.
If the two candidates are actually anagrams of each other, then all the counts should end up zero.
This can be easily checked with the any() function.

Here is my code:

def are_anagrams(a,b):
    letters = {}
    for c in a:
        if c.isupper():
            letters[c] = letters.get(c, 0) + 1
            
    for c in b:
        if c.isupper():
            letters[c] = letters.get(c, 0) - 1

    if any(letters.values()):
        return 0
    else:
        return 1
        
print("\""+"BRITNEY SPEARS"+"\"", "\""+"PRESBYTERIANS"+"\"", are_anagrams("BRITNEY SPEARS", "PRESBYTERIANS") )
print("\""+"DEDUCTION"+"\"", "\""+"DISCOUNTED"+"\"", are_anagrams("DEDUCTION", "DISCOUNTED") )
print("\""+"CLINT EASTWOOD"+"\"", "\""+"OLD WEST ACTION"+"\"", are_anagrams("CLINT EASTWOOD", "OLD WEST ACTION") )
print("\""+"DICTIONARIES"+"\"", "\""+"NEAR IODIOTIC"+"\"", are_anagrams("DICTIONARIES", "NEAR IODIOTIC") )

And here is the corresponding output:

"BRITNEY SPEARS" "PRESBYTERIANS" 1
"DEDUCTION" "DISCOUNTED" 0
"CLINT EASTWOOD" "OLD WEST ACTION" 1
"DICTIONARIES" "NEAR IODIOTIC" 0

Here is a link to my repl.it:

https://repl.it/H6gW/0

I also have a run where I capture some timing metrics:

def are_anagrams(a,b):
    letters = {}
    for c in a:
        if c.isupper():
            letters[c] = letters.get(c, 0) + 1
            
    for c in b:
        if c.isupper():
            letters[c] = letters.get(c, 0) - 1

    if any(letters.values()):
        return 0
    else:
        return 1

from timeit import timeit
f = open("bible.txt")
j = 1
for i in range(7):
    d = f.read(j)
    e = f.read(j)
    print("%d\t%e" % (len(d), timeit("are_anagrams(d, e)", setup="from __main__ import are_anagrams, d, e", number=10)))
    j *= 10

It appears to be O(N) as the size grows:

1       1.185185e-05
10      2.330863e-05
100     2.200493e-04
1000    2.495999e-03
10000   1.445293e-02
100000  1.419389e-01
1000000 1.397571e+00

#41

so the 'aab' is anagram of 'abb' in your code? :wink:


#43

Thank you artysta,
I used some seriously flawed logic there. I came up with a new solution, which seems to work fine now :smiley:


#44

This is just a proof of concept, without sorting…

>>> x = "ANAGRAM".replace(" ", "")
>>> y = "NAG A RAM".replace(" ", "")
>>> y = list(y)
>>> z = []
>>> for i in x:
	if i in y: z.append(y.pop(y.index(i)))

	
>>> len(y) == 0
True
>>> 

Will work on refinement and get this into function form. Looks like O(n + m), now.

def is_anagram(base, test):
    # are they string inputs?
    if type(base) != str or type(test) != str: raise TypeError

    # strip space characters
    base = base.replace(' ', '')
    test = list(test.replace(' ', '')

    # test is a list in order to allow `pop()`

    for char in base:
        if char in test: test.pop(test.index(char))
    # test will have no characters left if is anagram of base
    return len(test) == 0

print (is_anagram("JIM MORRISON", "MR MOJO RISIN"))     # True
print (is_anagram("JAMES MORRISON", "MR MOJO RISIN"))   # False

https://repl.it/H7DW/2


#45

This is my code in python. Anagram detection is made simply counting letters in each string, and if the same letters are equal in both strings, they are anagrams.

I'm using arrays because accessing an array member is faster than accessing a hash member. And using the ascii of the letter made it direct into the array. So, when I find an 'C' in the string, for example, I modify the 3rd element in the array (ord('C') - ord('A')), incrementing it for the first string, and decrementing it for the second. At the end, if the array of counters sums 0, the strings are anagrams.

def anagram(s1, s2):
counts = [0] * (ord('Z')-ord('A')+1)

   for i in range(len(s1)):
      if ord('A') <= ord(s1[i]) <= ord('Z'):
        counts[ord(s1[i]) - ord('A')] += 1

   for i in range(len(s2)):
      if ord('A') <= ord(s2[i]) <= ord('Z'):
        counts[ord(s2[i]) - ord('A')] -= 1

   if sum(counts) == 0:
      return 1
   else:
      return 0


s1 = input("string 1: ").upper()
s2 = input("string 2: ").upper()

if anagram(s1,s2) == 1:
   print("anagram found")
else:
   print("anagram not found")

I think this approach is O(m+n), as suggested.


#46

Ooops, I just figure an error. Code must be:

def anagram(s1, s2):
   counts = [0] * (ord('Z')-ord('A')+1)

   for i in range(len(s1)):
      if ord('A') <= ord(s1[i]) <= ord('Z'):
        counts[ord(s1[i]) - ord('A')] += 1

   for i in range(len(s2)):
      if ord('A') <= ord(s2[i]) <= ord('Z'):
        counts[ord(s2[i]) - ord('A')] -= 1


   for i in counts:
      if i != 0:
         return 0

   return 1




s1 = input("string 1: ").upper()
s2 = input("string 2: ").upper()

if anagram(s1,s2) == 1:
   print("anagram found")
else:
   print("anagram not found")

Because it's possible have a sum = 0, and members not zero (-1 y 1, for example)


#47

Used string encode in Python to find ascii sum of both strings. Spaces are ignored. Also counts characters. If ascii sum and character counts are equal in both, returns 1.

def anagram_check(str1, str2):
    ascii_total1 = 0
    ascii_total2 = 0
    char_count1 = 0
    char_count2 = 0
    for c in str1.encode('ascii'):
        if c != 32:
            ascii_total1 = ascii_total1 + c
            char_count1 += 1
    for d in str2.encode('ascii'):
        if d != 32:
            ascii_total2 = ascii_total2 + d
            char_count2 += 1
    if ascii_total1 == ascii_total2 and char_count1 == char_count2:
        return 1
    else:
        return 0

https://repl.it/H7Jc/1


#48

ignore_spaces is set to True by default so the strings "ANAGRAM" and "NAG A RAM" are anagrams. If you want to treat spaces as normal characters set it to False.


#49

Another try, this time in python. I built a dictionary from the letters of the first string with the occurrences as values and afterwards for every occurrence in the second string I reduced the value by 1. When the value reached 0, I deleted that letter. If there are letters in the second and not in the first string the corresponding letter has values lesser than or equal to 0, if there are letters in the first with fewer occurrences in the second then the value of the letter will be positive. Finally, if the two strings have the same letters exactly, the dictionary will be empty in the end.

def isAnagram(s1,s2):
    d={}
    for i in s1:
        if i in d:
            d[i]+=1
        else:
            d[i]=1
    for i in s2:
        if i in d:
            d[i]-=1
            if d[i]==0:
                del d[i]
        else:
            d[i]=0
    state=(len(d)==0)
    if state:
        return "the strings are anagrams of each other"
    else:
        return "no anagrams here"
s1="abhsiajikjhfjdfjbnaa"
s2="fbnaajdfjabhsiajikjh"
print isAnagram(s1,s2)
s1="tavar"
s2="varta"
print isAnagram(s1,s2)
s1="raaam"
s2="maaar"
print isAnagram(s1,s2)
s1="black"
s2="labcc"
print isAnagram(s1,s2)

#50

https://repl.it/H55J/16


#51

sorted goes O(nlogn), so the overall cannot be better than O(nlogn + mlogm)


#52

For the sport of it, I did a search of anagram generators and turns out there a quite few. One such generator churned out 10284 anagrarms for vanilla custard.

https://new.wordsmith.org/anagram/anagram.cgi?anagram=vanilla+custard&t=500&a=n

The speed and precision of this page made me go back and refactor the earlier submission for the satisfaction…

def is_anagram(base, test):
    if type(base) != str or type(test) != str: raise TypeError
    base = base.replace(' ', '')
    test = test.replace(' ', '')
    if len(base) != len(test): return 0
    if base.isalpha() and test.isalpha():
      test = list(test.upper())
      for char in base.upper():
        if char in test: test.pop(test.index(char))
      return 1 if len(test) == 0 else 0
    else: raise ValueError

print (is_anagram("WeE gIlLiS", "gIlE lEwIs"))

Edit: Monday, May 1

@alexcraig pointed out a couple of flaws, namely that I threw out the baby with the bath water and forgot to test string lengths. He also explained that in the worst cast scenario this is O(nm), which explanation I am ever grateful for. Thank you, Alex.

Code is edited to rectify omissions. Thankfully, this is only for practice, not the prize.

https://repl.it/H7DW/10


#53

Here is it. It transforms both string in decimal sums and compares them. If sums are equal, then is an anagram.

https://repl.it/H55J/15


#54

Looks like mtf was first with the list.pop idea, submitting my own version anyway:

def isanagramf(s1, s2):
    IGN=" .,!?"
    A=[i for i in s1 if i not in IGN]
    for i in s2:
        if i in IGN: pass
        elif i in A: A.pop(A.index(i))
        else: return 0
    return 1 if len(A)==0 else 0

Again, as the challenge description refers to wikipedia's definition of anagram where spaces can spawn or disappear into thin air (which is counterintuitive to me), I use IGN string for such magic characters. They will be ignored when comparing the strings from input.

The algorithm builds A list out of first input string minus the magic chars, then pop 's each element found in the other string. If the strings are anagrams, list A will have been emptied by the end of the iteration.

Time complexity is O(m+n) without any overhead this time (m to build list A, n to iterate over string s2) but memory consumption depends on the input string length rather than the size of alphabet input strings can be built of. For comparing whole books, I'd use my previous code instead :wink: