FAQ: Usage Funnels - Survey Result

This community-built FAQ covers the “Survey Result” exercise from the lesson “Usage Funnels”.

Paths and Courses
This exercise can be found in the following Codecademy content:
https://www.codecademy.com/paths/data-science/tracks/analyze-data-sql-analyze-real-data/modules/analyze-data-sql-usage-funnels/lessons/sql-funnels/

FAQs on the exercise Survey Result

There are currently no frequently asked questions associated with this exercise – that’s where you come in! You can contribute to this section by offering your own questions, answers, or clarifications on this exercise. Ask or answer a question by clicking reply (reply) below.

If you’ve had an “aha” moment about the concepts, formatting, syntax, or anything else with this exercise, consider sharing those insights! Teaching others and answering their questions is one of the best ways to learn and stay sharp.

Join the Discussion. Help a fellow learner on their journey.

Ask or answer a question about this exercise by clicking reply (reply) below!

Agree with a comment or answer? Like (like) to up-vote the contribution!

Need broader help or resources? Head here.

Looking for motivation to keep learning? Join our wider discussions.

Learn more about how to use this guide.

Found a bug? Report it!

Have a question about your account or billing? Reach out to our customer support team!

None of the above? Find out where to ask other questions here!

How would we calculate the percentages using SQL?

The text says “We could use SQL to calculate the percent change between each question, but…”.

The truth is i’m not clear on how to do maths and calculations like that in SQL. Can anyone give advice on how you’d use SQL to do this?

6 Likes

how do you send sql data to a spreadsheet?

I have the same question, this would be great to have covered here for the people who are interested in doing this calculation in SQL.

To calculate the percent change between questions, you simply divided the number of responses from that question by the number of responses in the previous question and multiply by 100%.

This code is not elegant but gets the job done.

CREATE TABLE denom (

counts NUMBER,

offcounts NUMBER

);

INSERT INTO denom (counts,offcounts)

VALUES (500,500);

INSERT INTO denom (counts,offcounts)

VALUES (475,500);

INSERT INTO denom (counts,offcounts)

VALUES (380,475);

INSERT INTO denom (counts,offcounts)

VALUES (361,380);

INSERT INTO denom (counts,offcounts)

VALUES (270,361);

#Here I create a table with the counts column, and an offset count column to account for the first question during the arithmetic argument below. counts is multiplied by 1.0 to switch out of integer division.

SELECT (counts*1.0/offcounts)*100 AS “Percent change between questions”

FROM denom;

This method works for the aforementioned problem, but would not scale in larger datasets because of the manual input of table creation. I suppose an alternative method exists involving joining tables, but I ran into issues trying to join tables based on aggregated counts.

Hi @somkyd
See if this helps.

My guess is that we could connect the databases in excel/ Tableau/ Power BI (write a query there itself to retrieve data). Though would be great if anyone from @Codecademy team could weigh in as well.

Cheers,
Kabir

I tried the following but it does only calculate the percentage for the first row:

SELECT question_text
, Number
, Total
, Number / Total * 100 AS “Percentage”
FROM (
SELECT question_text
, COUNT(DISTINCT user_id) AS “Number”
, (SELECT
COUNT(DISTINCT user_id)
FROM survey_responses
WHERE question_text LIKE ‘1.%’) AS “Total”
FROM survey_responses
GROUP BY 1);

Does anyone have an idea how to tweak it?

Utilize the lag function.

Creating a funnel, I am selecting the question_text and sorting by that, selecting the number of total responses to that question,

SELECT
question AS ‘Question’ ,
COUNT(DISTINCT(user_id)) AS ‘total_num_completed’
FROM
survey
GROUP BY
1;

and then using the lag function, creating a third column that is the same values as the second column, the total count, only in this column with the lag function, it will move it down one row.

NOTE you do need to use the OVER function to create a premature ORDER BY clause so that the lag function knows in what order the rows are so it can correctly grab the value from above. I am ordering by question_text

SELECT
question AS ‘Question’ ,
COUNT(DISTINCT(user_id)) AS ‘total_num_completed’,
LAG(COUNT(DISTINCT(user_id))) OVER (ORDER BY question)
FROM
survey
GROUP BY
1;

So first question is total number of 500 responses. My third column with lag will be null as there are no values before it.

My second question has total number of 475 answers but my third column now has 500 from the previous row.

Now just throw in a calculation to divide the current value by the lag value.

Good habit to use 1.0 * (equation) as to keep the result from automatically rounding down this calculation

SELECT
question AS ‘Question’ ,
COUNT(DISTINCT(user_id)) AS ‘total_num_completed’,
1.0 * (COUNT(DISTINCT(user_id))) /** (LAG(COUNT(DISTINCT(user_id)
)) OVER (ORDER BY question))
FROM
survey
GROUP BY
1;

Then I will throw a ROUND function at the beginning, round to 4 decimals, then times the whole thing by 100 to get a percentage with two decimal places.

My end code looks like this

SELECT
question AS ‘Question’ ,
COUNT(DISTINCT(user_id)) AS ‘total_num_completed’,
ROUND(1.0 * (COUNT(DISTINCT(user_id))) / (LAG(COUNT(DISTINCT(user_id)
)) OVER (ORDER BY question)) , 4) * 100 AS
‘perc_completed_from_prev’
FROM
survey
GROUP BY
1;