FAQ: Technical Interview Problems in Python: Lists - Remove Duplicates: Optimized

This community-built FAQ covers the “Remove Duplicates: Optimized” exercise from the lesson “Technical Interview Problems in Python: Lists”.

Paths and Courses
This exercise can be found in the following Codecademy content:

Technical Interview Practice: Python

FAQs on the exercise Remove Duplicates: Optimized

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply (reply) below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.

Join the Discussion. Help a fellow learner on their journey.

Ask or answer a question about this exercise by clicking reply (reply) below!

Agree with a comment or answer? Like (like) to up-vote the contribution!

Need broader help or resources? Head here.

Looking for motivation to keep learning? Join our wider discussions.

Learn more about how to use this guide.

Found a bug? Report it!

Have a question about your account or billing? Reach out to our customer support team!

None of the above? Find out where to ask other questions here!

Make sure you have an in-place solution. Mutate the input list!

I get it that you are expecting this in the solution but isn’t this more efficient?

def move_duplicates(dupe_list):
	return len(set(dupe_list)) - 1

If so, why bother with the other test we you are just concerned with the quote? I feel like I’m being penalized for doing one step better? I just find it hard to imagine someone would want this below over what is above

def move_duplicates(dupe_list):
  unique_idx = 0
  for i in range(len(dupe_list) - 1):
    if dupe_list[i] != dupe_list[i + 1]:
      dupe_list[i], dupe_list[unique_idx] = dupe_list[unique_idx], dupe_list[i]
      unique_idx += 1
  dupe_list[unique_idx], dupe_list[len(dupe_list) - 1] = dupe_list[len(dupe_list) - 1], dupe_list[unique_idx]
  return unique_idx

more efficient for what? counting?

first of all, no, but also, that’s not the only result to produce, and it would fail the requirement of doing it using constant memory, and it also avoids implementing an algorithm for it

Correct which is why I stated from the start with

But as for

How is it not more efficient? There is no iterating, and its using what is built in which is len and set. Mean to tell me those two are less efficient than an iteration?

I’m not saying the problem is that it’s not in-place. I’m saying you’re not doing it at all. Your function cannot be used to de-duplicate something.

the input is sorted. finding unique values in sorted input is a small amount of work, you just need to iterate through it.

set necessarily iterates through whatever you give it. it also creates a set.

the required work is:
iterate through it
creating a set is:
iterate through it + build a set

To carry out the job you’d need to do something like:

from collections import Counter
from itertools import chain


def move_duplicates(xs):
    freq = Counter(xs)
    uniq = freq.keys()
    rest = freq - Counter(uniq)
    xs[:] = chain(uniq, rest.elements())
    return len(uniq) - 1


xs = [1, 1, 2, 2, 3, 3, 3, 4, 5, 6, 7, 7]
print(move_duplicates(xs))
print(xs)

but it’s not more efficient, not for time and not for space

if it was unsorted this might be a good idea. but because it’s sorted, the heavy work is already done

to make it space-constant though, you’d probably want to do an in-place sort and then iterate through that in the way the exercise intends

Think you are misunderstanding my initial post. My question was asked in regards if that would be optimal if you ignore the constraints. Yes, I understand it is incorrect with the given constraints. What I was getting at is that since it passed the test given, would that have been the optimal path taken if constraints was not a factor here?

It’s like implementing sort like this:

def sort(xs):
    return None

and then testing:

xs = [2, 1, 3]
assert sort(xs) is None

yeah it passes, it returns the right value, but there was no sorting

can’t really argue it passes tests if what it was supposed to do isn’t tested

the name of your function is move_duplicates, so it should do that, at a minimum, before talking about how efficiently it’s doing so. talking about how efficiently the task isn’t carried out … it’s not meaningful

but if you redefined the task to be to count the number of unique values and return that amount minus 1 –
then it’s still less efficient any way you can measure other than code length

THIS would be a minimal test of that function:

xs = [1, 1, 2, 2, 3, 3, 3, 4, 5, 6, 7, 7]
i = move_duplicates(xs)
assert i == 6
assert xs[:7] == [1, 2, 3, 4, 5, 6, 7]
assert sorted(xs[7:]) == [1, 2, 3, 3, 7]

all of those things is core to what the function is supposed to do, ignoring all performance constraints

the best way to implement it to satisfy the test and no other constraints would be this:

def move_duplicates(xs):
    return 6

you see why I’m insisting on that it should do a few more things than that?

Yes, yet no. What I see is that I somehow got on your bad side because those past two examples seem to continue to miss the mark on what I am asking, an I’m just giving up on trying to see the point of that when it doesn’t appear to have anything to do with this

print("{0}\n should equal \n{1}\n {2}\n".format(move_duplicates(['a', 'a', 'g', 't', 't', 'x', 'x', 'x']), 3, move_duplicates(['a', 'a', 'g', 't', 't', 'x', 'x', 'x']) == 3))

print("{0}\n should equal \n{1}\n {2}\n".format(move_duplicates(['a', 'a', 'c', 'c', 'd', 'd', 'e', 'e', 'f']), 4, move_duplicates(['a', 'a', 'c', 'c', 'd', 'd', 'e', 'e', 'f']) == 4))

print("{0}\n should equal \n{1}\n {2}\n".format(move_duplicates([13, 13, 13, 13, 13, 42]), 1, move_duplicates([13, 13, 13, 13, 13, 42]) == 1))

Like, its to the point I almost feel like its patronizing and I rather ask for a refund of what remains on what I paid in.

You’re asking if your implementation is more efficient than another one.
But it doesn’t do what the other one does.
Which is more efficient of two things that do different things? This is not comparable.

You have to establish what the function should do first. If it’s supposed to move values, then it’s supposed to move values. Right? The return value is only part of the result, and a very small part. The main part of the result is that something gets moved.

Moving those values isn’t related to HOW the task should be accomplished. It is the task.

I did answer your question - even if you did only want the return value, no that is not more efficient. I explained why it isn’t more efficient.

We disagree on whether modifying the list is a hard requirement.
Read the function name. Read what the instructions are talking about.
The function name says move. the instructions say de-duplicate. neither says count.
Finding that count is not the purpose of the function.