Creating a list of tuples from a list of tuples

Hi,

I am new to the Data Scientist career path, at 21%.

I have a list of tuples, each holding 2 different objects. One object is a number value and the other is a “yes”/“no” value. How would I go about creating a separate list with only the “no” values along with their associated number values, and another list with the “yes” values and their associated number values? For example, my dataset is the following:
records = [(17085.2676, 'yes'), (17128.42608, 'no'), (17178.6824, 'yes'), (17179.522, 'yes'), (17352.6803, 'no')]

If y = “yes”, I’d like a list with (value, “yes”) datapoints.
If y = “no”, I’d like a list with (value, “no”) datapoints.

Ultimately I’d like to find the average of the numbers in all the “yes” datapoints and the average of the numbers in all the “no” datapoints. I suppose this can also be done without separating the records list I have, but I figure that may be a little more advanced.

Any guidance is appreciated. As a reference, my lessons so far include loops, dictionaries, functions, lists, strings, and classes.


If you were going to perform your stated task yourself, how would you do it with pen and paper? Would you write down two separate lists, one with just the no’s and one with the yes’s? Or would you just use a calculator, add up the no’s and calculate their average, then do the same with the yes’s? You would probably do the latter. You can do the same with code. Consider how you could iterate through the list of tuples and accumulate the values for the no’s and the values for the yes’s. Keep in mind that the elements of a tuple can be accessed by their index, or they can be unpacked:
a, b = (5, 'no')

Example that may help you get started (if needed):
# apples and oranges
bags = [(15, 'a'), (22, 'o'), (26, 'a'), (15, 'a'), (7, 'o'), (19, 'o'), (3, 'o'), (11, 'a')]

apples = 0
oranges = 0

for bag in bags:
    n, f = bag
    if f == 'a':
        apples += n
    else:
        oranges += n

print(f'Apples: {apples}\nOranges: {oranges}')

# alternatively

apples2 = 0
oranges2 = 0

for bag in bags:
    if bag[1] == 'a':
        apples2 += bag[0]
    else:
        oranges2 += bag[0]

print(f'\nApples: {apples2}\nOranges: {oranges2}')

Output:

Apples: 67
Oranges: 51

Apples: 67
Oranges: 51
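
If you do still want the two separate lists from the original question, a list comprehension is one way to build them. This is only a sketch using the records list from the first post; the names yes_records, no_records, and average are made up for illustration and are not from any lesson.

```python
records = [(17085.2676, 'yes'), (17128.42608, 'no'), (17178.6824, 'yes'),
           (17179.522, 'yes'), (17352.6803, 'no')]

# Keep only the tuples whose second element matches the label.
yes_records = [(value, label) for value, label in records if label == 'yes']
no_records = [(value, label) for value, label in records if label == 'no']


def average(pairs):
    """Mean of the first element of each tuple; None for an empty list."""
    if not pairs:
        return None
    return sum(value for value, _ in pairs) / len(pairs)


print(yes_records)
print(no_records)
print(f'Average yes: {average(yes_records)}')
print(f'Average no: {average(no_records)}')
```

The length of each new list gives you the count you need for the average, at the cost of holding two extra lists in memory.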


Hi midlindner,

This helped me so much! I’ve spent countless hours googling how to do this, hoping I could figure it out on my own. Your suggestion definitely helped me get started. I know the preferred method was to not separate the list into a “no” list and a “yes” list, but I couldn’t figure out how to determine the number of occurrences of “no” using the count() method if I kept it as a single list (I’m sure there is a way). Therefore I created a new list with only the “no” entries and their values and then determined its length (same with the “yes” list) in order to find the average. Here is how my code turned out:

Output:

I’m sure there is a shorter way of doing this without the additional lines :confused: Hopefully I get better with experience.
Any additional feedback is welcome :slight_smile:

Good work! You came up with a solution that produces expected results. That’s great. The next step, if you feel like it, is to optimize what you came up with.

One thing you might consider is how many times you are repeating some of your calculations. Do they need to be computed with each iteration of the loop, or could they be computed after the loop is finished?
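
As a rough illustration of that idea (not your actual code, just a sketch on the records list from your first post), the loop below only accumulates totals and counts, and the averages are computed once after it finishes. The dictionary names totals, counts, and averages are my own choices.

```python
records = [(17085.2676, 'yes'), (17128.42608, 'no'), (17178.6824, 'yes'),
           (17179.522, 'yes'), (17352.6803, 'no')]

# Running sum and running count for each label.
totals = {'yes': 0.0, 'no': 0.0}
counts = {'yes': 0, 'no': 0}

for value, label in records:
    totals[label] += value
    counts[label] += 1

# Computed once, after the loop; labels with no entries are skipped
# so there is no division by zero.
averages = {label: totals[label] / counts[label]
            for label in totals if counts[label] > 0}

for label, avg in averages.items():
    print(f'Average for {label}: {avg}')
```

Counting inside the loop also answers the question about count(): you get the number of “yes” and “no” entries without building a second list.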

As to creating the two additional lists, do you use those lists for anything other than their length? Turns out you do. You use them to get the highest and lowest cost, so creating the lists seems reasonable. If you wanted to reduce the amount of memory used by not creating the extra lists (not a requirement), you might consider how you accumulated the total cost for smokers and non-smokers. You could also keep track of the low and high cost for each category with a variable for each.
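
Here is a minimal sketch of tracking the low and high values without the extra lists. I am reusing the records list from your first post and assuming the number is the cost and the “yes”/“no” flag marks smokers; the dictionaries low and high are hypothetical names, and two plain variables per category would work just as well.

```python
records = [(17085.2676, 'yes'), (17128.42608, 'no'), (17178.6824, 'yes'),
           (17179.522, 'yes'), (17352.6803, 'no')]

# Lowest and highest value seen so far for each label.
low = {}
high = {}

for value, label in records:
    # The first value seen for a label is both its low and its high.
    if label not in low or value < low[label]:
        low[label] = value
    if label not in high or value > high[label]:
        high[label] = value

for label in low:
    print(f'{label}: low = {low[label]}, high = {high[label]}')
```

This makes a single pass over the data and stores only two numbers per category, whatever the size of the list.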


I took your advice!

I’ve learned a lot in this exchange with you. Thanks again!!
