Lead lag command

Hey, my question is about the following exercise:

https://www.codecademy.com/paths/data-science/tracks/analyze-data-sql-analyze-real-data/modules/analyze-data-sql-usage-funnels/projects/analyze-data-sql-usage-funnels-with-warby-parker

Im trying to solve 3.

Thats my current attempt, that works, which means it returns a table, the percentages arent the ones im looking for.

instead of deviding by 500 i want to devide the column by itsself leading 1, so i get the percentage of users who answer each Question from the previous one.
So instead of 500 I want to insert:

LEAD(funnel.qcompletet) OVER(ORDER BY funnel.question)

WITH funnel AS (
SELECT question,
COUNT(DISTINCT user_id)
AS ‘qcompleted’ FROM survey
GROUP BY 1)
SELECT funnel.question,
funnel.qcompleted,
(funnel.qcompleted * 100.0 / 500)
AS ‘pcompleted’
FROM funnel
GROUP BY 1
ORDER BY 1;

Does anybody know how it is supposed to work?

Hello :slight_smile: Welcome to the forum.

Such an interesting question!

This would select the value from the next row in relation to the current row, not from the previous one. And you should define a default value, because if the row you are querying does not exist then you will get a NULL. You can read more about window functions in the documentation -> https://www.sqlite.org/windowfunctions.html#built_in_window_functions.


I would solve this problem by writing this query:

WITH funnel AS (
  SELECT 
    question, 
    COUNT(DISTINCT user_id) AS 'qcompleted' 
  FROM 
    survey 
  GROUP BY 
    question
) 
SELECT 
  funnel.question, 
  funnel.qcompleted, 
  (
    funnel.qcompleted * 100.0 / (
      LAG(qcompleted, 1, qcompleted) OVER (
        ORDER BY 
          question
      )
    )
  ) AS 'qcompleted' 
FROM 
  funnel 
GROUP BY 
  question 
ORDER BY 
  question;

I made only a single change. Instead of 500, I used a LAG function call. The LAG function allows us to evaluate expression against the n-th row before the current row.

Let’s break down what is going on in the function call:

LAG(qcompleted, 1, qcompleted) OVER (
  ORDER BY 
    question
)

The first argument - qcompleted is the expression that we would like to execute. We want to get the value of qcompleted from the previous row, right? :slight_smile:

The second argument - 1 is the offset. We want to execute expression against the previous row, this is the 1 row before our current row.

The third argument - qcompleted is the default, if the row specified by the offset does not exist then this value will be used (if this value is not specified then the call will result in NULL); We are selecting the previous row, this row will not exist only for the first question, and in this case we would like to perform calculation funnel.qcompleted * 100.0 / 500, but 500 is the value of qcompleted, so I decided to use this instead of hard-coding 500 (in the future we might get a few new rows and this version of the query will always work but feel free to change the default value to 500 if you want).

And finally we have an OVER clause which provides context for our LAG function call - window functions must know what does it mean “first row”, “previous row” etc. So we simply define the ORDER, in this case, the ORDER of the LAG is the same as the ORDER of the whole SELECT, but this is not always the case.


Let’s run our query and see what is the result:

question							qcompleted	qcompleted
1. What are you looking for?		500			100.0
2. What's your fit?					475			95.0
3. Which shapes do you like?		380			80.0
4. Which colors do you like?		361			95.0
5. When was your last eye exam?		270			74.792243767313

That looks correct!


Unfortunately, you will not be able to run this query in this exercise. For some reason (probably something related to the security, not sure) codecademy does not allow window functions. That’s probably the reason why they left this note:

Using a spreadsheet program like Excel or Google Sheets, calculate the percentage of users who answer each question:

Nonetheless, I hope this was a bit informative :slight_smile:

1 Like

Thanks a lot, so codecademy sql is limited to the commands listed in the cheat sheet or how do i identify which commands work and which don’t? Also are there similar limitations for python or r?

You’re very welcome :slight_smile:

I don’t know. Right after I submitted my previous comment I reported this as a bug. Even if the given functions are prohibited - the corresponding error message should be returned. So I will get back to you when I will get an answer to my report :slight_smile:

I had never noticed this behaviour in the Python or R course, so I would say no, there are no limitations like this :slight_smile: