What is the difference between using the AVG(value) aggregate and the following two WHERE queries:
SELECT state, ROUND(AVG(value),0) AS 'Value'
FROM home_value_data
WHERE date LIKE '2017%'
GROUP BY 1
ORDER BY value DESC
LIMIT 10;
AND
SELECT state, ROUND(AVG(value),0) AS 'Value'
FROM home_value_data
WHERE date BETWEEN ‘2017-01’ AND ‘2017-11’
GROUP BY 1
ORDER BY value DESC
LIMIT 10;
I’m using these in my queries, assuming they should both return the average values for the entire year, but I’m wrong and I’m not sure why. Can anyone explain?
WHERE date LIKE ‘2017%’
This takes all dates that start with 2017.
WHERE date BETWEEN ‘2017-01’ AND ‘2017-11’
This takes all the data that is BETWEEN these two dates. That means the 2017-1 and 2017-11 is not counted. Not sure if it skips the whole month or just the first/last day of those months.
The diffrence is that with the second querry you are missing a few months of data.
SELECT state, ROUND(AVG(value),0) AS 'Value'
FROM home_value_data
WHERE date BETWEEN ‘2017-01’ AND ‘2017-11’
GROUP BY 1
ORDER BY value DESC
LIMIT 10;
It should be including 2017-11, but I’m not sure if it’s including 2017-01. I changed the data type of the date field to NUMERIC, so it should follow the rules listed here:
Really interesting point to emphasize again:
BETWEEN two letters is not inclusive of the 2nd letter.
BETWEEN two numbers is inclusive of the 2nd number.
Shortly after posting this, I figured out how to get two queries to display the same results, but it’s been almost a week since then and now I can’t remember!
I have my date type set as TEXT and it gave me the same values for
WHERE date BETWEEN '2017-01' AND '2017-12'
and
WHERE date LIKE '2017%'
To my mind, that means SQL assumes “numbers” to be numeric values even when they’re input as TEXT (at least for the purposes of this BETWEEN rule). Can anyone confirm or deny this, please?
In any case, the rules mentioned stress that the first element (the one before BETWEEN) is always included in the calculation.
Interesting catch, Luis. I’d be interested in a solid answer as well.
Did anyone complete this and post the queries they used to github or something similar? I’d be interested in seeing how people went about doing this one.