Thread Shed - Self referencing list comprehension

I’m working through the Thread Shed project and on the section to create a list of each color sold, I wrote the following:

colors = [item for item in thread_sold_split if item not in colors]

which is supposed to “filter” the bulk data in thread_sold_split and populate the new list colors with unique values. However, I came across the issue of a list comprehension that references itself, and was not able to figure out how to rewrite this in a way to keep the list comprehension and not have to resort to a for loop. Is it somehow possible to keep the “one-liner” solution?
Also, while on the topic of list comprehensions, is there any advantage to using it other than readability (efficiency vs. readability)?

TIA.


you wouldn’t use a list to do repeated membership testing, there are better data structures for that (for small sizes it’s efficient, as a general algorithm though, not so much)

if it’s for counting the amount of each color then there’s no need to know the colors before you start


List comprehensions tend to run a little faster than the equivalent for loop that calls .append(), though the difference is usually small.

All list comprehensions can be written as for loops, but not all for loops can be written as list comprehensions.
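As a small illustration of that equivalence, here is a sketch with made-up sample data (the names lengths_loop and lengths_comp are invented for this example and are not part of the project):

```python
# Hypothetical sample data, not the project's real list.
thread_sold_split = ["red", "blue", "red", "yellow"]

# For-loop version: collect the length of each color name.
lengths_loop = []
for item in thread_sold_split:
    lengths_loop.append(len(item))

# The same thing written as a list comprehension.
lengths_comp = [len(item) for item in thread_sold_split]

print(lengths_loop == lengths_comp)  # True
```

The reverse does not always hold: a loop whose condition depends on the list it is still building has no direct comprehension form, because the name on the left of the assignment is only bound after the whole comprehension has been evaluated.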

And in terms of readability, it’s more readable when it’s a single line, but there are limited cases where multi-lined list comprehensions are easier to read.

The intention was to avoid having to manually insert each item in the list of the different colors sold. The exercise just has us explicitly writing the list out, but it seemed to me like the ideal thing to automate (if the whole Pantone catalog of colors was sold, I wouldn’t go about it including each color myself - automate the boring stuff! :stuck_out_tongue:). So the loop would iterate through the bulk list and only append unique values.

colors = []
for item in thread_sold_split:
  if item not in colors:
    colors.append(item)

I guess that is the best solution?

I read something about a set comprehension that could be used to only retrieve unique values, but I’m not sure how or if it could work in this use case (haven’t learned about it here yet).

just delete it, there’s nothing that needs doing, it’s all redundant

look instead at what you “need” it for. what operation is it you are doing on this? what data structure supports that operation? does that operation for that data structure require this data to exist?

you do not want to do repeated membership testing on a list

that’s a property of set, not of set comprehension, set comprehension is just a way of feeding values into a new set, the comprehension aspect doesn’t remove anything, adding the same value twice to a set is what removes, or rather overwrites
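A minimal sketch of that property, using a short made-up sample list. The dict.fromkeys variant is one possible way to keep a one-liner while preserving first-seen order; it is offered as an option, not as the course's intended solution:

```python
# Hypothetical sample data standing in for the project's list.
thread_sold_split = ["blue", "yellow", "red", "yellow", "red"]

# A set keeps one copy of each value but does not remember order.
unique_colors = set(thread_sold_split)

# dict keys are unique too, and dicts keep insertion order (Python 3.7+),
# so this yields the unique colors in the order they first appear.
colors = list(dict.fromkeys(thread_sold_split))

print(colors)  # ['blue', 'yellow', 'red']
```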

What you’re actually doing, is repeatedly inserting counts of 1 for a color, combining existing entries with new ones using (+)

So you’re doing a lookup for the current value, adding 1, inserting the result.

You have opportunity there to deal with it being missing, you can lookup with a default value
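One way to sketch the lookup-with-default idea is dict.get with a default of 0. The sample list and the name counts are assumptions made for this example:

```python
# Hypothetical sample data standing in for the project's list.
thread_sold_split = ["blue", "yellow", "red", "yellow", "red", "yellow"]

counts = {}
for color in thread_sold_split:
    # Look up the current count (0 if the color is missing),
    # add 1, and insert the result back.
    counts[color] = counts.get(color, 0) + 1

print(counts)  # {'blue': 1, 'yellow': 3, 'red': 2}
```

After this single pass the keys of counts are the unique colors, so no separate colors list is needed.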

if you were to limit yourself to using list (not dict), then you can sort them

["blue","green","purple","red","red","white","yellow","yellow","yellow","yellow","yellow"]

group by value

[["blue"],["green"],["purple"],["red","red"],["white"],["yellow","yellow","yellow","yellow","yellow"]]

then for each group you have a color and a length

[("blue",1),("green",1),("purple",1),("red",2),("white",1),("yellow",5)]

Hm, I think you might be referring to a previous step where we count how many of each color was sold.

I’m talking about steps 21 and 22 where we use a list that contains the individual colors offered so then we can later print the individual color and how many of each was sold:

  21. Define a list called colors that stores all of the colored threads that Thread Shed offers:
colors = ['red','yellow','green','white','black','blue','purple']
  22. Now, using the list colors, the string method .format(), and the function color_count, iterate through thread_sold_split and print a sentence that says how many threads of each color were sold today.

I didn’t want to copy and paste the list written out in step 21, so I tried to retrieve the information from the bulk list in order to accomplish steps 21 and 22.

colors = []
for item in thread_sold_split:
  if item not in colors:
    colors.append(item)

for color in colors:
  print("A total of {} {} threads were sold today.".format(color_count(color), color))

Oh, sorry, I didn’t see this before I posted the last reply.
This looks better!
Then for the last print statement, I would just have to unpack the tuples with color and length? No need for the count function?

it’s not about which function, it’s about which actions

would you rather step through the list adding 1 to each color, or would you step through it once to figure out what colors there are, then for each color step through the list and count that color?

figuring out which colors exist is nearly the same action as counting how many of each color there is
after having carried out roughly that action, you shouldn’t have to do anything else

you start out with a bunch of colors:

["blue","yellow","red","green","purple","white","yellow","yellow","yellow","yellow","red"]

each one is worth 1

[("blue",1),("yellow",1),("red",1),("green",1),("purple",1),("white",1),("yellow",1),("yellow",1),("yellow",1),("yellow",1),("red",1)]

insert each pair, combining with (+) on conflict

If you do membership testing on a list on each insertion, then you end up doing up to N*N amount of work, where N is the number of items in the list (the worst case is when most of them are distinct colors)

If you sort them, which takes N * log N amount of work, then you can easily group and count, so this is better

But dict does insertion and lookup in constant time (1), and you have N insertions, so that would take 1 * N … which is N, amount of work

if N is a million, then the first is a trillion work units, the second is about 20 million and the third is about 1 million. 20 and 1 are roughly the same thing, so you can expect those two methods to take about the same amount of work, while the one that is doing membership testing on a list … you’d probably want to go get a coffee (unless you know it’s a small number of unique colors, which would keep the list small). past a million we might be talking about a vacation or retirement instead of a coffee.
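The rough step counts can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it counts abstract work units rather than measuring time, and the variable names are made up for illustration:

```python
import math

n = 1_000_000  # number of items in the list

list_membership = n * n              # membership test on a list per item
sort_then_group = n * math.log2(n)   # sorting dominates the cost
dict_insertions = n                  # one constant-time insertion per item

print(f"{list_membership:,}")          # 1,000,000,000,000
print(f"{round(sort_then_group):,}")   # roughly 20 million
print(f"{dict_insertions:,}")          # 1,000,000
```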


Aah, now I see. I’m not fully versed on Big-O/time complexity (and my math is a bit rusty), but I can see what you mean.
Thank you so much for the input, I’m going to rework the code with that in mind.

Ok, so I’ve been working on this and after trying to leverage your tips and what I could make sense out of several stackoverflow posts, this is what I came up with:

# Leveraging sorted(), len() and itertools.groupby()
import itertools
colors = [(color, len(list(group))) for color, group in itertools.groupby(sorted(thread_sold_split))]

print(colors) #outputs [('black', 26), ('blue', 22), ('green', 30), ('purple', 17), ('red', 24), ('white', 28), ('yellow', 34)]

for c, q in colors:
  print("A total of {0} {1} threads were sold today.".format(q, c))

#outputs A total of 26 black threads were sold today.
#A total of 22 blue threads were sold today.
#A total of 30 green threads were sold today.
#A total of 17 purple threads were sold today.
#A total of 24 red threads were sold today.
#A total of 28 white threads were sold today.
#A total of 34 yellow threads were sold today.

But I have to be honest and say that I did not fully understand how groupby() works (haven’t learned it here yet), even after reading the docs. What exactly is group before we turn it into a list?

Maybe these are simply a bit too advanced for where I am in the course right now, but the issue is that I also tried to write it out naively (how to group and count), but could not crack it.

If anyone else is also struggling, these were helpful:
[StackOverflow] How do I use itertools.groupby()?
[BinderFullofCode] Create Groups From Lists With itertools.groupby()

right but how much time does it take to plant a square field where each side is 1000m long? that’s pretty much the extent of the math involved, a rough count of the steps

that’s fair, its behaviour is a bit unfortunate, but this says it all:

create groups of consecutive equal values
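A short sketch of what that means in practice, with a made-up list. Each group that itertools.groupby yields is a lazy iterator over one run of equal values, which is why it has to be passed to list() before len() can be used on it:

```python
import itertools

# Hypothetical unsorted sample data.
sold = ["red", "red", "blue", "red"]

# Without sorting, only *consecutive* equal values are grouped,
# so "red" shows up in two separate groups.
runs = [(key, list(group)) for key, group in itertools.groupby(sold)]
print(runs)  # [('red', ['red', 'red']), ('blue', ['blue']), ('red', ['red'])]

# Sorting first puts equal values next to each other.
counts = [(key, len(list(group))) for key, group in itertools.groupby(sorted(sold))]
print(counts)  # [('blue', 1), ('red', 3)]
```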

you can implement that yourself. compared to codecademy things it’s probably really difficult but at the same time, it’s something you can trivially carry out manually, which is saying that you know how it’s done.

what matters is thinking about how the information is moved around and considering how much work that is. writing the matching code is background noise.

dict however, it can do it better, its lookup times are independent from the number of entries.

for list, sorting is a way to get equal values grouped together, making each value easy to count. dict can skip that altogether.

implementing group is probably a really good exercise. go for it. it is a little bit trickier than one might at first think, but there’s nothing difficult involved other than fully considering what’s involved (which from what I can tell codecademy doesn’t teach whatsoever). the operations involved are list.append and iterating through list, comparing two things … conditions … creating empty lists
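For anyone who wants to compare notes after attempting it, here is one possible sketch using only those operations. The function name group and the sample data are assumptions for this example, and it is just one of several reasonable ways to write it:

```python
def group(values):
    """Split values into lists of consecutive equal items."""
    groups = []
    for value in values:
        # Extend the last group if this value matches it,
        # otherwise start a new group.
        if groups and groups[-1][-1] == value:
            groups[-1].append(value)
        else:
            groups.append([value])
    return groups


# Hypothetical sample data, sorted so equal colors are adjacent.
sold = sorted(["blue", "yellow", "red", "green", "yellow", "red", "yellow"])

grouped = group(sold)
counts = [(g[0], len(g)) for g in grouped]
print(counts)  # [('blue', 1), ('green', 1), ('red', 2), ('yellow', 3)]
```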