FAQ: Eliminating Duplicates in an array

So recently I’ve been working on a JSON project in which I wanted to find the number of unique songs I had played on Spotify, and I ran into this problem.

How can I find out if I have already gotten that song?

Here’s my answer, simplified into letters:

let x = ["a", "b", "c", "c"] // x is the array we're looping through
let y = [] // y is the array we want to put each unique song in

for(var i = 0; i < x.length; i++) { // a for loop to loop through the array
  if(y.indexOf(x[i]) === -1) {
    y.push(x[i]) // this pushes the value to y
  }
}

The only really complicated line is the if conditional. It asks the y array for the index of x[i], which is whatever value the loop is currently on (e.g. ["a", "b"].indexOf("a") is 0 because that’s the index where “a” sits). If that item does not exist in the array, indexOf returns -1, which is why we compare against -1. This is clear if you try running something like this in the console:

y = ["a", "b", "c"]
y.indexOf('z')
-> -1

Anyways, that’s the answer and explanation. I hope it helps anyone who is building their own project and running into this problem.

Note: This concept was probably taught sometime in the Codecademy learning exercises, but I don’t remember it :)

When all we wish to do is read the values in an array, it is not necessary to access them by index. ES6 gives us the for...of loop, which iterates over the values directly and simplifies our loop for purely look-up purposes.

for (let u of x) {

}

Given we are using array.indexOf() above, it is worth noting that ES2016 offers another method to test membership… array.includes().

if (! y.includes(u)) {
    y.push(u)
}

Now the only problem here is that we have generated an array free of duplicates, which runs contrary to the name of this FAQ… Find Duplicates in an Array. It may be just a mixup in naming the exercise/challenge, meaning what we have above is the accepted solution. That is understandable, but what if the objective is to find and name the duplicates? We need completely different logic to complete that task.

Please post a link to this exercise so we can get confirmation of the expected solution.


Hey Roy,

It wasn’t really meant as an exercise in codecademy, it was a project I was working on in which this was the solution I used.

Okay, now the heading is more in line. However, one problem still remains: the duplicates have not been eliminated from the supplied array. To complete the process we would need to assign y to x.

return y

That’s assuming we have a function, else,

x = y.slice()

so it becomes detached from y.
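
To see why the copy matters, here is a minimal sketch comparing plain assignment with slice(). The names alias and copy, and the extra push, are only there for illustration.

```javascript
const y = ["a", "b", "c"];

const alias = y;         // plain assignment: same array object as y
const copy = y.slice();  // shallow copy: a new array, detached from y

y.push("d");             // mutate y afterwards

console.log(alias.length); // 4, the alias sees the change
console.log(copy.length);  // 3, the copy does not
```

So if y keeps changing after the assignment, only the slice() version of x stays as it was.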

 > const remove_duplicates = function (x) {
       const y = []
       for (let u of x) {
           if (! y.includes(u)) {
               y.push(u)
           }
       }
       return y
   }
<- ƒ (x) {
       const y = []
       for (let u of x) {
           if (! y.includes(u)) {
               y.push(u)
           }
       }
       return y
   }
 > a = [1,2,3,4,5,2,3,4,5,6,4,5,6,7,6,7,8,9,3]
<- (19) [1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 4, 5, 6, 7, 6, 7, 8, 9, 3]
 > remove_duplicates(a)
<- (9) [1, 2, 3, 4, 5, 6, 7, 8, 9]
 > a
<- (19) [1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 4, 5, 6, 7, 6, 7, 8, 9, 3]
 > 

So… Technically we haven’t yet removed any duplicates from a.

 > a = remove_duplicates(a)
<- (9) [1, 2, 3, 4, 5, 6, 7, 8, 9]
 > a
<- (9) [1, 2, 3, 4, 5, 6, 7, 8, 9]
 > 

So far we’ve seen that what we learn drives us deeper into the rabbit hole. Here’s another avenue…

 > remove_duplicates = x => Array.from(new Set(x))
<- x => Array.from(new Set(x))
 > a = [1,2,3,4,5,2,3,4,5,6,4,5,6,7,6,7,8,9,3]
<- (19) [1, 2, 3, 4, 5, 2, 3, 4, 5, 6, 4, 5, 6, 7, 6, 7, 8, 9, 3]
 > a = remove_duplicates(a)
<- (9) [1, 2, 3, 4, 5, 6, 7, 8, 9]
 > a
<- (9) [1, 2, 3, 4, 5, 6, 7, 8, 9]
 > 

We are not done with iteration. Remember the question about naming the duplicates? We still haven’t answered that one.

One idea I have is to take each element out in turn and test it against the rest of the array. All we need is a single true to confirm duplication. That value can then be pushed to a return array.
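
A rough sketch of that idea, not a definitive answer: findDuplicates is a made-up name, and the choice to list each duplicated value only once is an assumption about what "naming the duplicates" should mean.

```javascript
// Hypothetical helper: returns each value that occurs more than once in arr,
// listed once, in order of first appearance.
function findDuplicates(arr) {
  const dupes = [];
  arr.forEach((value, i) => {
    // Test the value against the rest of the array, ignoring its own position.
    const appearsElsewhere = arr.some((other, j) => j !== i && other === value);
    if (appearsElsewhere && !dupes.includes(value)) {
      dupes.push(value);
    }
  });
  return dupes;
}

const a = [1,2,3,4,5,2,3,4,5,6,4,5,6,7,6,7,8,9,3];
const found = findDuplicates(a);
console.log(found); // [2, 3, 4, 5, 6, 7]
```

This compares every element with every other element, so it is fine for small arrays but gets slow on large ones.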

Hey Roy,
I was working on this same project, but I was trying to remove duplicates from an object within an array.

So something like this:

let array = [{name:"Joe", age: 64}, {name: "Moe", age: 29}, {name: "Joe", age: 30}]

The values mean nothing; the only goal was to remove the duplicate "Joe" from the array. In the real array, I had over a thousand values, all unknown to me, so the solution had to work for unknown values as well. I think I have a suitable solution for the code, but I just wanted to know if you had any thoughts.

Guessing you caught that, by now… BRB.

Hashmaps have unique keys, if you insert your data into the same map and take the values back out then you’ll be left with only the unique ones. (Same for your original problem)

Hmmm @ionatan honestly I’ve never heard of that before… how would one do that?

do what

This right here, maybe I’m confusing your statement but what would you mean?

That is how, those are the actions you would carry out, that is already directly translatable to code

>>> list(set('abcadeb'))
['b', 'e', 'd', 'a', 'c']
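
For comparison, a possible JavaScript counterpart to the Python line above, using the built-in Set. One difference worth knowing: a JavaScript Set iterates in insertion order, so the result comes back in order of first appearance.

```javascript
// Spreading a string into a Set yields its characters, duplicates dropped.
const unique = [...new Set("abcadeb")];
console.log(unique); // ["a", "b", "c", "d", "e"]
```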

I like the NF picture. I just wanted to let you know because I am a big NF fan and everyone makes fun of me for it. I just wanted to say that I like NF too.


Did you end up making sense of that?
I get the feeling that you just gave it a blank look.

As with most problems, this breaks apart into subproblems that you can solve individually. So if there’s something you don’t understand, then what? That’d be something you could look up or, worst case, ask about.
I mentioned which data structure could be used for this. That is something you can google.
I also mentioned what you would do with it; those are actions that the documentation for such a data structure will explain how to carry out.
And making it happen in JavaScript is something you’d do by finding out whether this data structure is provided by JavaScript. Totally google-able.


All the above notwithstanding, the next time you wish to ask me a question, just DM me. All the wisdom and foresight may be lacking, but we could go down some interesting rabbit holes. We’re beginners, after all.


@ionatan I think I get what you’re saying.

@mtf I may take advantage of that sometime in the future!

So you’d create key-value pairs and insert them into an object:

Object.fromEntries(
  [ {name: "Joe", age: 64}
  , {name: "Moe", age: 29}
  , {name: "Joe", age: 30}
  ].map(p => [p.name, p])
)

And then take the values out.

Object.values(Object.fromEntries(
  [ {name: "Joe", age: 64}
  , {name: "Moe", age: 29}
  , {name: "Joe", age: 30}
  ].map(p => [p.name, p])
))

You could write that with loops instead, as making insertions with assignment might be more familiar. You could also refactor it into a uniqueOn function:

uniqueOn(person => person.name, people)
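
One way such a function might look, as a sketch only. The name and argument order come from the call above; the body and the people array are assumptions built from the earlier snippet. Note that object keys are strings, so the key function should return something that works as a string.

```javascript
// Hypothetical uniqueOn: keeps one item per key.
// On a collision the later item overwrites the earlier one.
function uniqueOn(keyFn, items) {
  const pairs = items.map(item => [keyFn(item), item]);
  return Object.values(Object.fromEntries(pairs));
}

const people = [
  {name: "Joe", age: 64},
  {name: "Moe", age: 29},
  {name: "Joe", age: 30},
];

const result = uniqueOn(person => person.name, people);
console.log(result); // [{name: "Joe", age: 30}, {name: "Moe", age: 29}]
```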

You might also want to change what happens in case of collisions. If you wanted to keep the first one instead, then you could reverse the list of key-value pairs that you’re inserting, so that entries that originally came first will now be the last ones inserted.
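
A sketch of that reversal, under the same assumptions as before (uniqueOnKeepFirst is a made-up name). Be aware that the order of the result follows the reversed insertions, so it will not always match the order of the input.

```javascript
// Hypothetical variant: on a collision the FIRST item wins.
function uniqueOnKeepFirst(keyFn, items) {
  const pairs = items.map(item => [keyFn(item), item]);
  pairs.reverse(); // earlier items are now inserted last, so they overwrite later ones
  return Object.values(Object.fromEntries(pairs));
}

const people = [
  {name: "Joe", age: 64},
  {name: "Moe", age: 29},
  {name: "Joe", age: 30},
];

const kept = uniqueOnKeepFirst(person => person.name, people);
console.log(kept); // [{name: "Joe", age: 64}, {name: "Moe", age: 29}]
```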

In addition to regular objects, there is Map with similar capabilities. It supports non-string keys, while plain JavaScript objects use strings as keys. JavaScript has severely limited comparison capabilities, though (objects used as keys are compared by identity, not by content), so you’re likely to use a string as the key anyway.

new Map(
  [ {name: "Joe", age: 64}
  , {name: "Moe", age: 29}
  , {name: "Joe", age: 30}
  ].map(p => [p.name, p])
).values()
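
One detail that may trip people up: Map's values() returns an iterator rather than an array. A small sketch of turning it into an array with the spread syntax (byName and uniquePeople are illustrative names):

```javascript
const people = [
  {name: "Joe", age: 64},
  {name: "Moe", age: 29},
  {name: "Joe", age: 30},
];

const byName = new Map(people.map(p => [p.name, p]));

// values() is an iterator, so spread it into an array before using array methods.
const uniquePeople = [...byName.values()];

console.log(Array.isArray(byName.values())); // false
console.log(uniquePeople.length);            // 2
```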

Hmmmm. My only problem with this is that it replaces the earlier value with the later one. I would still like Joe to stay at 64, but I want to remove Joe at age 30.

If ‘Joe’ could be identified by DOB, it makes the task a bit easier since Joe’s birthday could have been today and identifying by his past age would no longer work. That’s just a scenario where ‘age’ is computed by the system. It can change; DOB does not.
