Data Science Independent Project #5 – Analyze Airfare Data

I actually found a way to combine the two separate tables together as one:

SELECT city1, city2, MIN(nsmiles) as distance FROM airfare
UNION
SELECT city1, city2, MAX(nsmiles) as distance FROM airfare

I had not learn about UNION when I started this project.

1 Like

Hi,
Did anyone solved the last question with the self join?

Considering only the flights that have data available on all 4 quarters of the year, which quarter has the highest overall average fare? lowest? Try breaking it down by year as well.

Your help is much appreciated!

Hi,

I think this will work. I’ve created two temporary tables to calculate the avg far difference for 2017 and 2007 using with and then calculated the percentage using (new-old)/old)*100

With newvalues as (select avg(fare_lg-fare_low) as new from airfare_data where year=2017),
oldvalues as (select avg(fare_lg-fare_low) as old from airfare_data where year=2007)
select ((new-old)/old)*100
from newvalues,oldvalues;

Hi,

I don’t think you’ve answered the question properly - it’s looking for %change in AVG(fare). You’ve calculated the AVG(difference between expensive and lowest fare)

I think I’ve worked it out:

WITH prev AS
(
SELECT Year AS ‘xyear’,
AVG(fare) AS ‘avgf’
FROM airfare_data
GROUP BY Year
),
curr AS
(
SELECT Year AS ‘xyear’,
AVG(fare) AS ‘avgf’
FROM airfare_data
GROUP BY Year
)
SELECT ROUND((curr.avgf-prev.avgf)*100/prev.avgf,2) AS ‘%Change in AVG(fare) from 1997 to 2017’
FROM prev
LEFT JOIN curr ON curr.xyear = prev.xyear + 20
WHERE prev.xyear = 1997

Hi, thanks for pointing that out you are right actually, i think that if i replace fare_lg-fare_low with fare my code would still work. Do you mind explaining the use of the left join in your code?

Hi,
Somebody did the Advance Challenge? I couldn’t filter the flights by the quarters, I’ll appreciate it, thanks.

Hi - I think I have managed to create a table that shows just the routes that occur in all 4 quarters, and then can use WITH to calculate the average fare. Not sure how to take it further and go year by year yet though:

with routes (city1, city2) AS (
select city1, city2
from airfare_data
GROUP BY 1,2
HAVING count(DISTINCT quarter)=4
union
select city1, city2
from airfare_data
GROUP BY 1,2
HAVING count(DISTINCT quarter)=4)
select quarter, round(avg(fare),2)AvgF
from airfare_data, routes
WHERE routes.city1=airfare_data.city1 AND routes.city2=airfare_data.city2
GROUP BY 1
;

2 Likes

Hey thank you! You fixed this issues of uploading the .tsv file instead of .csv file for me. Changing the Field Separator to TAB from , fixed it!

1 Like

I am just beginning so not sure if this simplified method can work as well…I did:

SELECT Year, ROUND(AVG(fare), 2) AS ‘AVG 2007 FARE’
FROM airfare_data
WHERE Year = 2007;

And then separately:

SELECT Year, ROUND(AVG(fare), 2) AS ‘AVG 2017 FARE’
FROM airfare_data
WHERE Year = 2017;

And then manually calculated the result, which came to:

((218.34 - 183.12) / 183.12) x 100 = 19.23% change.

Am I missing something? Any help much appreciated! :slight_smile: