[Challenge] Anagram Detector

challenge

#11

Here is my solution in python. I hope it covers intermediate problem as well.

def isAnagram(s1,s2):
	# init dictionary for storing chars and their counts (has O(1) for set and get)
	myMap = {}

	# count of distinct characters
	chars = 0

	# going through string 1 (s1)
	for ch in s1:
		if ch not in myMap:
			# if current char is not in dict, add it with value 1 and inc. chars value by 1
			myMap[ch] = 1
			chars += 1
		else:
			# if current char is in dict, add 1 to its number of occurences
			myMap[ch] += 1
	
	# going through string 2 (s2)
	for ch in s2:
		if ch not in myMap:
			# if current char is not in dict (from string 1), char is new and therefore return 0
			return 0
		else:
			# if char is in dict, substract 1 from its number of occurences
			myMap[ch] -= 1

		if myMap[ch] == 0:
			# if number of occurences in the current string is for the char the same as in the s1, substruct 1 from number of expected chars
			chars -= 1

		if myMap[ch] < 0:
			# in case of number of occurences of the char in s2 is greater then s1 return 0
			return 0

	if chars == 0:
		# if all chars has been used in both strings equaly return 1
		return 1
	else:
		# some chars has not been used in s2, so return 0
		return 0

UPDATE 1: Added comments to the code


#12

Improved PHP solution stripping spaces and using a lambda for dryer code.

function stringsAreAnagrams($string1, $string2)
{
    $stringToSortedArray = function ($string) {
        $stringArray = str_split(str_replace(' ', '', $string));
        sort($stringArray);
        
        return $stringArray;
    };
    
    return $stringToSortedArray($string1) === $stringToSortedArray($string2);
}

https://repl.it/H2jv/0


#15

Hi, here is my solution in JavaScript.

I’m sorry if my english is malformed somewhere, but it is not my native language.

This works by looping one time through both strings. Inside this loop I collect the number of occurrences of each character on each string. I collect these counts in two arrays, one for each string. In these arrays the indexes correspond to the char codes, and the values correspond to the counts.

After the loop I compare the two arrays and if they have the same count for the same characters, the strings are anagrams.

I’m assuming the strings will only contain uppercase letters and spaces.

function isAnagram (string1, string2) {
    // These arrays will collect the counts.
    var arr1 = [], arr2 = [];
    var i = 0;
    // One loop to traverse both strings.
    // This loop will run n times, where n is the length of the longest string.
    while (i < string1.length || i < string2.length) {
        // It could happen that 'i' increases beyond the length of one of the two strings so first I check if there is a character at this index in the strings, and if this character is not a space. If it is a space it will just jump this character.
        if (string1.charAt(i) != '' && string1.charAt(i) != ' ') {
            // Then I increase the count for this character in the corresponding array.
            // The first time a character occurs I asign 1, the next times I increase 1.
            if (arr1[string1.charCodeAt(i)]) {
              	arr1[string1.charCodeAt(i)] += 1;
            }
            else {
              	arr1[string1.charCodeAt(i)] = 1;
            }
        }
        // Do the same for the other string.
        if (string2.charAt(i) != '' && string2.charAt(i) != ' ') {
            if (arr2[string2.charCodeAt(i)]) {
              	arr2[string2.charCodeAt(i)] += 1;
            }
            else {
              	arr2[string2.charCodeAt(i)] = 1;
            }
        }
        // Increase 'i'
        i++;
    }
    // After the loop I compare the two arrays, and if they have the same counts for the same characters, the strings are anagrams.
    i = ('A').charCodeAt(0); // Get the char code of the character 'A'
    while (arr1[i] === arr2[i] && i <= ('Z').charCodeAt(0)) {
     	  i++;
    }
    // After the loop if 'i' has increased beyond the char code of 'Z', all the counts were equal.
    return (i > ('Z').charCodeAt(0) ? '1' : '0')
}

https://jsfiddle.net/dwrdem7d/12/
https://repl.it/H2mo/4


#16

Hey all,

Just a reminder to please try and include a link to the code you submit in https://repl.it/, if possible, for easier testing - this will help us test the solutions you propose and choose the best solution out of those that work.

You may still paste in raw, markdown-formatted code in your post, but we ask that you try and give us a link to your code in repl.it for ease of testing and evaluation also.

Happy coding!
~CC mods


#17

Here's my entry, using pure Javascript. My thinking is that each word or phrase is really just a set of letters, and two anagrams have matching sets of letters. My function arranges the letters so they can be compared, and then sees if they are a match.

var is_anagram = function(str1,str2){
   str1 = str1.replace(/\s/g,'').toUpperCase().split('').sort().join('');
   str2 = str2.replace(/\s/g,'').toUpperCase().split('').sort().join('');
   if (str1 === str2){
   //It is an anagram!  Return 1
   return "1";
   }else{
   //It's not a match.
   return "0";
   };
};

On each string:
1. remove whitespace with regex
2. convert to uppercase
3. split into an array to make it possible to sort
4. sort the array (alphabetically)
5. convert back to a string (because it's harder to compare arrays without looping)
After this process, strings that are anagrams should produce the same string. If the produced strings don't match, then you don't have an anagram.

My working example is here: https://jsfiddle.net/clintongalloway/cfb0Lhxc/
I just saw that you prefer this so I took a few seconds to upload this https://repl.it/H2oZ/1


#18

Hi everyone,

Here I present you my Python code for solving the challenge so you can read it and test it yourselves. Right after it you will find a brief explanation on how the code works and how I came up with it:

https://repl.it/H2tf/4

So, if you have looked at the code you would have noticed that the most substantial part is contained in the anagrams function, which contains the logic that can tell if two strings are anagrams in O(n+m).

In order to be able to comply the time complexity constraint I assumed that both input arguments satisfied the precondition of being strings composed of uppercase ASCII letters. Then, if I wanted to accomplish the time complexity goal, I knew that I needed to iterate each argument one time at most. So, I came up with this idea of iterate over the first string and count the occurrences of each letter, and then iterate over the second string while trying to “undo” the counting I did before. This way, if the strings were actually anagrams, all my letters counts would be zero by the time I had finished iterating over the second string, and I wouldn’t have found any letter apart from the ones I encountered in the first string.

So, if we inspect the code of the anagrams function you will see that basically I use a dictionary for counting the letters occurrences. The first for loop is where I define each letter from the first word as a key in the dictionary, and I increment its value as I see them along the string:

for letter in first_string:
        letters_count[letter] = letters_count.get(letter, 0) + 1

Later on, in the second for loop, I walk through each letter of the second string and I retrieve from the dictionary its number of occurrences:

    for letter in second_string:
        occurrences = letters_count.get(letter, None)

If that value is higher than 1, then I decrease it and store it back, as I already managed to match a pair of the same letter in both strings:

if occurrences > 1:
            letters_count[letter] -= 1

Otherwise, if that value is exactly 1, that means I have just matched the last occurrence of that letter in the first string. Hence, I remove the letter from the dictionary as I shouldn’t be expecting to encounter this letter again while I finish my second string iteration.

if occurrences == 1:
            letters_count.pop(letter, None)

Lastly, if that value is None, it means that I have found a letter that I cannot match to an occurrence of the same letter in the first string. Therefore, I can immediately conclude that the strings are note anagrams and return False:

        if occurrences is None:
            return False

By the end of the loop, if no keys are left in the dictionary (meaning that I was able to match each letter from the second string with a letter from the first string and no letter from the first string remained without match), then the words are anagrams. Otherwise, they are not.

return True if not letters_count else False

On top of that, I defined a validate_string function that runs some checks to the guarantee those assumptions I made for the anagram function inputs, and it raises some custom made exceptions in case one of those checks fails.

Then I assembled both of the previous mentioned functions inside a wrapper function called anagram_detector, which also is responsible for translating the boolean value returned by anagram into a 1 or 0 accordingly.

Finally, I added some tests cases and an interactive routine that allows the user to run the whole test suite or just type in a pair of strings for detecting if they are anagrams.

Well, that is all about the code. I hope you liked it or you find it useful.
Thanks for reading it :slight_smile:


Carlos.


Exception handling
#19

A post was split to a new topic: Read those docs


#20

A post was split to a new topic: Exception handling


#21

function isAnagram(word1, word2){
   var word2Arr = word2.split("");
   for(var letter of word1){
        var index = word2Arr.indexOf(letter);
        if(index == -1){
            return "0";
        }
        word2Arr.splice(index,1);
   }

   if(word2Arr.length == 0){
       return "1";
   }else{
       return "0";
   }
}

The function isAnagram takes two arguments: word1 and word2.
Then word2 gets spilled into an array of charactes called word2Arr. Then word1 gets looped through. Each letter gets queried in the word2Arr array. If a match is found, that letter gets removed from the word2Arr array. If a match is NOT found, it means word1 contains a letter that word2 does not, and is thus not an anagram of word2, and returns "0". Once the loop has finished, the length of word2Arr is checked. If all the characters have been removed (an empty array), it means the word is an anagram and "1" is returned.


#22

My solution in Java

import java.util.Arrays;
class Input{
    static int find(String x, String y){
        String s1=x.replaceAll("\\s","");
        String s2=y.replaceAll("\\s","");
        boolean state=true;
        if(s1.length()!=s2.length()){
            state=false;
        }else{
            char[] s1Array=s1.toCharArray();
            char[] s2Array=s2.toCharArray();
            Arrays.sort(s1Array);
            Arrays.sort(s2Array);
            state=Arrays.equals(s1Array,s2Array);
        }
        if(state){
            System.out.print("Anagrams");
            return 1;
        }else{
            System.out.print("Not anagrams");
            return 0;
        }
    }
}
class Anagram{
   public static void main(String...s){
       Input.find("TOM MARVOLO RIDDLE","I AM LORD VOLDEMORT");
   }
}

Following steps are involved in the code:
1. Remove the white spaces using Regex
2. Convert the Strings into character arrays
3. Sort the characters in the array
4. Use the equals() method to check for matches
5. Return "1" if they are anagrams and "0" if they are not anagrams


#23

Since the input is guaranteed to be strings of spaces and uppercase letters, removing the spaces and comparing the sorted strings will do in Python 3:

def isanagram_slow(s1, s2):
    if [a for a in sorted(s1) if a!=" "]==[b for b in sorted(s2) if b!=" "]:
        return 1
    return 0
#    return 1 if sorted(s1)==sorted(s2) else 0

This is inefficient but compact easy version. Should whitespace be treated like any other char, the hashed line is the whole function in an even more compact form.

Intermediate difficulty version:

def isanagram(s1, s2):
    A=[0 for i in range(27)]
    B=[0 for i in range(27)]
#    A=[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
#    B=[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
    for i in s1:
        if i == " ": pass
#        if i == " ": A[0]+=1
        else: A[ord(i)-64]+=1
    for i in s2:
        if i == " ": pass
#        if i == " ": B[0]+=1
        else: B[ord(i)-64]+=1
    return 1 if A == B else 0

A & B will hold the amount of char occurencies in each string. In this case, position 0 counts spaces, others count letters. According to wikipedia, whitespaces don't count (eg "A B C" == "ABC") so I pass them, but they can be enabled by uncommenting the hashed tests.

Should list comprehension be considered to put a dent in the O(m+n) time for shorter strings, A&B arrays can be defined like the hashed examples. The code could easily be scaled to handle all the chars recognized by Python at the memory cost of 2*1,114,111 bytes (valid chr() range according to docs) for the A & B arrays.


#24

Explanations are in comments.

https://repl.it/H42L/0


#26

Hi, my first challenge :smiley: Solution on python:

def anagram_tester(s1,s2):
  if len(s1)!=len(s2):
    return 0  
  s = [0]*23
  for a in s1:
    s[ord(a)-ord('A')]+=1
  for a in s2:
    s[ord(a)-ord('A')]-=1
    if s[ord(a)-ord('A')]<0:
      return 0
  if sum(s)==0:
    return 1
  else:return 0    
    
s1,s2 = "ANAGRAMANAGRAMAMANAGR","RAMANAGRGRAMANAGAMANA"
print ("Anagram? %d" % anagram_tester(s1,s2))

It’s basically creating a list of the number of presences of each letter and increment on the first word and decrement on the second, the result should be that first there’s no negative numbers, and second the sum should be zero.

https://repl.it/H5NZ/3

Any comments appreciated.


#27

Link to Python code on repl.it:
Anagram Tester from Anupama.

Explanation:

ifelse loop logic within the function:

# 1) if length of the two strings do not match, then the strings are NOT anagrams. Stop checking. (iteration = 1)
# 2) if the number of unique letters in the two strings are NOT equal, then strings are NOT anagrams. Stop checking. (iteration = 2)
# 3) Convert to upper case and Sort the strings. The two strings should match at every index. If not, they are not anagrams. Stop checking. (iteration = 2 + length(string1) )

O = (number of iterations) = length(string1) + 2

Raw code:

def anagramfn(a, b, strflag):
  if(len(a) != len(b) ):  
    strflag = strflag + 1
  elif len(''.join(sorted(set(a.lower()))) ) != len(''.join(sorted(set(a.lower()))) ):
    strflag = strflag + 1
  else: 
    a = a.upper()
    a = ''.join(sorted(a))
      
    b = b.upper()
    b = ''.join(sorted(b))
      
    ptr = 0
      
    while(ptr < len(a)):
      if(a[ptr] != b[ptr]):
          strflag = strflag + 1
      ptr = ptr + 1
      
  if strflag > 0:
    print("NOT Anagrams")
  else :
    print("Anagrams")
    
  return;

  
# initialize test strings:
a = 'practi ce'
b = 'ce racpti;'

# initialize flag to indicate whether the two strings are anagrams or not.
# strflag = 1 implies the strings are true anagrams.
strflag = 0

# accept the two strings:
print("Anagram Tester")
a = input("Input String1: ")
b = input("Input String2: ")

anagramfn(a, b, strflag) ;

#28

A post was split to a new topic: Validation for incorrect inputs


#29

Like last week, I thought I would take a slightly more functional approach to this challenge, and like most people, I figured comparing two sorted lists is the way to go. While a lot of people just used Python’s sorted() function, I figured that if I want to make things more interesting, I could implement my own quicksort function. The quicksort I used is inspired by the elegant Haskell implementation of quicksort. Here it is:

https://repl.it/H54D/2

Now for a quick explanation:

First, I define the quickSort function, which takes a pivot (the first element in a word or list) and then creates two lists, one of which contains all the values in the list after the pivot greater than or equal to the pivot, and one of those values less than the pivot. Then, it puts the pivot between these lists and sorts both of these lists using the same method recursively.

The actual anagram function is pretty straightforward. All that this does is convert the supplied words into a usable format, sorts them, and compares them to see if they are the same. If they are, then they are anagrams, otherwise they are not.

Finally, let’s talk about efficiency. I realize that this is definitely not O(M + N), since it uses quicksort, which is at best O(NlogN). This program is probably slower than this, though, due to the other stuff involved, but the purpose of this exercise for me is to practice my functional programming skills with Python, so I don’t mind the slowness.

Note: here is the wikipedia article on quicksort for anyone looking to read about it in more depth https://en.wikipedia.org/wiki/Quicksort


#32

A post was merged into an existing topic: Validation for incorrect inputs


#34

A solution in Python which has a O(n+m) time complexity. It iterates through the first string and counts up the occurences of each character, and then does the same for the second string but decrements the count instead. If the two strings are anagrams then the count for every character should be 0.

Decided to handle all 128 ASCII codes so that the function works with all ASCII characters and not just capital letters, and it removes the need to subtract a base value from each of the ASCII values to convert it to a 0-based index.

def is_anagram(s1, s2):
    # can return 0 immediately if the strings are different lengths
    if len(s1) != len(s2): 
        return 0
    # for counting the ASCII characters 0 to 127
    counts = [0]*128
    # increment the count for each character in the first string
    for c in s1:
        counts[ord(c)] += 1
    # decrement the count for each character in the second string
    for c in s2:
        counts[ord(c)] -= 1
    # the strings are anagrams if all the counts cancel each other out
    return 0 if any(counts) else 1

There is a simple optimisation to check that the strings are the same length at the start of the function; if the strings are different lengths then they cannot be anagrams. Would be significant improvement for very big strings.

A totally inconsequential optimisation (but why not) is the use of the any() function instead of sum(). It is ever so slightly faster as it only needs to do a comparison with 0 instead of addition, and it does lazy evaluation, meaning it will return as soon as a non-zero value is found.

%timeit sum([0]*128)
100000 loops, best of 3: 2.48 µs per loop

%timeit any([0]*128)
100000 loops, best of 3: 2.07 µs per loop

%timeit sum([1]*128)
100000 loops, best of 3: 2.47 µs per loop

%timeit any([1]*128)
1000000 loops, best of 3: 764 ns per loop

Out of interest, I compared the runtime with a solution that use sorting to compare the strings (shown below). The time complexity for sorting is O(n log(n)), but for short strings the function below is faster. The point where the function above is faster is about strings of length 70 on my laptop.

def is_anagram(s1, s2):
    return 1 if len(s1) == len(s2) and sorted(s1) == sorted(s2) else 0

#35

I had this idea as well, but I don't think it works! Just adding up the ASCII values would mean it would see these two strings as anagrams, "AD" and "BC".

I thought maybe you can use the product but it would only work if each letter was represented by a prime number and the calculation of that product would probably make the function slow compared to the other methods, especially for large strings.


#36

Here is my entry using Python 3.6.1 hosted on repl.it:

https://repl.it/H6EZ/0

My first bit of thinking was to be sure to get the input from the user and prompt them as to what words they would like to compare. I then assigned another set of variables equal to the original input ran through the built-in .upper() function, to guarantee that the function would be handling uppercase letters, as per the challenge rules. Next up was converting the strings into lists with a for loop to iterate through each of the words and append each iterated item to the appropriate list. After assigning the lists to sorted versions of themselves, I then compared the sorted lists to each other and returned 1 for anagrams, and 0 for not anagrams.
Code below for those who do not wish to follow links:

def anagram():
    #ask the user for words to compare
    print('What two words would you like to compare and see if they are anagrams?')
    print('First word: ')
    word1 = str(input())
    print('Second word: ')
    word2 = str(input())

    #converts the strings from the user to uppercase
    word1Upper = word1.upper()
    word2Upper = word2.upper()

    #creation of empty lists for conversion from strings to lists
    word1u = []
    word2u = []
    #for loops do the work of making lists from the strings
    for char in word1Upper:
        word1u.append(char)

    for char in word2Upper:
        word2u.append(char)

    #sorting the letters into sequential order so that Python can compare the lists
    word1u = sorted(word1u)
    word2u = sorted(word2u)

    #condition that compares the sorted lists to each other to see if they match
    if word1u == word2u:
        #returns 1 if they match
       """ print('anagram')"""
       return 1

    else:
       """ print('not an anagram')"""
        #returns 0 if they don't
       return 0

    #print(word1u, word2u)

Time complexity was not taken into consideration when devising this solution as I’m still definitely a beginner.