Help With Election Result Project in Data Science Path

This is a question on Election Results, which is part of the “Statistics in Numpy” course.

The problem I’m having is with Step 6.

As we saw, 47% of people we surveyed said they would vote for Ceballos, but 54% of people voted for Ceballos in the actual election.

Calculate the percentage of surveys that could have an outcome of Ceballos receiving less than 50% of the vote and save it to the variable ceballos_loss_surveys.

Print the variable to the terminal.

The hint for this step is:

np.mean(array < 0.5)

Which I thought would mean this step was fairly simple. So I used this:

ceballos_loss_surveys = np.mean(possible_surveys < .5)

At first I was getting 0.0 as a result, so I checked the code in the tutorial video, and this is what the instructor provided:

possible_survey_length = float(len(possible_surveys))

incorrect_predictions = len(possible_surveys[possible_surveys < .5])

ceballos_loss_surveys = incorrect_predictions / possible_survey_length

Now when I run the two side by side, I get the same results (I can only assume I did something wrong and fixed it later but idk tbh).

Why would the code provided by the instructor be so much more complicated than the extremely simple line of code I used? If I get the same results, what’s the main difference?

Here is the entire code prior to this step:

total_ceballos = sum([1 for n in survey_responses if n == 'Ceballos'])

print(total_ceballos)

# 33

survey_length = float(len(survey_responses))

percentage_ceballos = 100 * total_ceballos/survey_length

print(percentage_ceballos)

#47.1428571429

possible_surveys = np.random.binomial(survey_length, .54, size=10000) / survey_length

plt.hist(possible_surveys, range=(0, 1), bins=20)
plt.show()

The difference here is that your instructor did not use np.mean(). The mean of an array of numbers is calculated as follows: mean = sum of all numbers / count of all numbers

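NumPy has a function called mean(), which takes an array and returns its mean. Because possible_surveys < .5 is an array of True and False values, and True counts as 1, that mean is the fraction of surveys below 50%. You might be using np.mean() instead of the code your instructor provided, but behind the scenes the same thing is happening.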
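To make that equivalence concrete, here is a minimal sketch that runs both versions on the same simulated data. It assumes a survey of 70 responses (inferred from 33 responses being about 47.14%, not stated in the thread) and uses a fixed seed only so the run is repeatable.

```python
import numpy as np

# Assumption: 70 survey responses (33 / 70 is roughly 47.14%).
survey_length = 70

# Hypothetical seed, only here so repeated runs give the same numbers.
np.random.seed(0)

# 10,000 simulated surveys, each expressed as the fraction voting for Ceballos.
possible_surveys = np.random.binomial(survey_length, 0.54, size=10000) / survey_length

# One-liner: the comparison yields an array of True/False, and the mean of
# booleans is the fraction of True values.
loss_via_mean = np.mean(possible_surveys < 0.5)

# Step by step, following the instructor's version: count, then divide.
incorrect_predictions = len(possible_surveys[possible_surveys < 0.5])
loss_via_len = incorrect_predictions / float(len(possible_surveys))

print(loss_via_mean, loss_via_len)
```

Both printed values should match, since each one is the count of surveys below 0.5 divided by the total number of surveys.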


Could you please explain what the code would look like when using np.mean()?

total_ceballos = sum([1 for responses in survey_responses if responses == 'Ceballos'])

For the above solution, I do not understand why there is a "1" in the syntax. Can someone explain? Thank you.

Hi @cyhngtw,

The purpose of your list comprehension is to determine how many responses consist of 'Ceballos'. For each response where that is the case, a 1 appears in the resulting list. For each case where the response is not 'Ceballos', nothing is added to the list.

Then, after the list comprehension has completed its work, the sum function can reveal how many occurrences of 1 appeared in the resulting list.

Hello, thank you for your answer. If I want to count another candidate's name, can I also use 1, or should I change it to another number? Thanks again!

Choose 1, so that the sum function will provide a result equivalent to a count of how many times that candidate’s name occurs in the list. You could choose a different value if you want, then pass the resulting list to the len function instead.
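A small sketch of both options, using a made-up list rather than the course's actual survey_responses (the name 'Kerrigan' is a hypothetical second candidate for illustration):

```python
# Hypothetical sample data, not the course's real survey_responses.
survey_responses = ['Ceballos', 'Kerrigan', 'Ceballos', 'Ceballos', 'Kerrigan']

# Summing 1s: each matching response contributes exactly 1 to the total.
count_with_sum = sum([1 for response in survey_responses if response == 'Ceballos'])

# Any placeholder value works if you take the length of the list instead.
count_with_len = len([7 for response in survey_responses if response == 'Ceballos'])

# Counting a different candidate only changes the name being compared.
count_kerrigan = sum([1 for response in survey_responses if response == 'Kerrigan'])

print(count_with_sum, count_with_len, count_kerrigan)  # 3 3 2
```

The value inside the comprehension only matters when it is summed; with len, only the number of items counts.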


A bit easier to remember:

total_ceballos = sum(np.char.count(survey_responses, "Ceballos"))
print(total_ceballos)
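One thing to keep in mind is that np.char.count counts substring occurrences inside each element, so it is not quite the same as an exact match. The sketch below compares it with an element-wise equality test, assuming survey_responses is a NumPy array of strings (the sample values are made up):

```python
import numpy as np

# Hypothetical sample data; 'Ceballos Jr.' is included only to show the difference.
survey_responses = np.array(['Ceballos', 'Kerrigan', 'Ceballos', 'Ceballos Jr.'])

# Counts every occurrence of the substring, so 'Ceballos Jr.' is included.
substring_total = sum(np.char.count(survey_responses, "Ceballos"))

# Counts only elements that are exactly equal to 'Ceballos'.
exact_total = np.sum(survey_responses == "Ceballos")

print(substring_total, exact_total)  # 3 2
```

If every response is exactly a candidate's name, both approaches give the same count.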

Can someone please explain what are good axis labels for the histogram? I’m confused about what the histogram means beyond the fact that the bar heights add up to 10000.
Thanks
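One possible reading, offered as a sketch rather than the course's official answer: each value in possible_surveys is the fraction of one simulated survey that chose Ceballos, so the x-axis is that fraction and the y-axis is how many of the 10,000 simulated surveys fell into each bin. The survey size of 70 and the label wording are assumptions.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Assumption: 70 survey responses (33 / 70 is roughly 47.14%).
survey_length = 70
possible_surveys = np.random.binomial(survey_length, 0.54, size=10000) / survey_length

# Each bar counts how many simulated surveys landed in that range of fractions.
counts, bin_edges, _ = plt.hist(possible_surveys, range=(0, 1), bins=20)

plt.xlabel("Fraction of a simulated survey choosing Ceballos")
plt.ylabel("Number of simulated surveys")
plt.title("Simulated survey outcomes if the true support is 54%")
```

In an interactive session, drop the matplotlib.use("Agg") line and call plt.show() to display the figure. Bars to the left of 0.5 are the surveys that would have wrongly predicted a Ceballos loss.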