Data Science Independent Project #1 – Watching the Stock Market

Project: Watching the Stock Market

This project will take you off-platform and get you started in your own developer environment! Never done that before? Not to worry - we’ve shared some resources to help you down below. This project can be completed entirely on your own - or, you can join the Codecademy Lifelong Learner Pro Community and find someone to work with! Jump to the community support section to hear more about this.

This project is broken down into key questions that your client or company is looking to answer. As a data scientist, you’ll often become a resource to help businesses answer the key questions about the efficacy of existing or potential strategies & projects.

Overview

Objective

You are asked by a company to help them make more informed decisions on investments. To start, you will be watching the stock market, collecting data, and identifying trends!

Pre-requisites

In order to complete this project, we suggest that you have familiarity with the content in the following courses or lessons on the Codecademy platform:

  1. What is a Relational Database Management System?
  2. Manipulation
  3. Queries

Suggested Technologies

Depending on where you are on your Path, there may be multiple technology options you can use to complete this project - we suggest the following:

  1. DB Browser for SQLite

Project Tasks

Get started - hosting your project

DB Browser for SQLite is a visual tool for working with SQLite databases. Follow the link to download the application for your computer.

  • SQLite can store an entire database in a single file, which usually has a .sqlite or .db extension. To open a database, select Open Database at the top of the window and browse for the file. Alternatively, you can choose to create a New Database by saving a file with the .sqlite or .db extension.
  • To import data from a CSV file into a table, select “File > Import > Table from CSV file…” and browse for the CSV file. (Note: All fields imported from the CSV file will have a data type of TEXT. Be sure to convert fields to numeric type as needed. See here for how to do that.)

There are several tabs near the top of the window for working with the data:

  • Database Structure: View the tables in your database and the columns they contain.
  • Browse Data: Browse the data for each table.
  • Execute SQL: Write and execute SQL queries.

Basic Requirements

Let’s break this project down into a couple different parts.

Manipulation: Collect data on your pick of 5 stocks.

  • Create a table called stocks, where you will be inserting your data.
    • Hint: See here for a review of the CREATE TABLE syntax. What data type should each field be?
  • The stocks table should have a column for symbol, name, datetime, and price.
  • Collect your data! Choose 3 times throughout the day to document the price of each stock and continue for at least 1 week. You can do this moving forward, or just take a retroactive look at the stock market by taking data historically from regular intervals (e.g. the first of the month for the last six months).
    • Hint: See here for a review of the INSERT INTO syntax. When inserting the datetime, use the standard format ‘yyyy-mm-dd hh:mm:ss’. Use the strftime() function to help you get the datetime of ‘now’.

Queries: Perform basic analysis on the data and identify trends.

  • What are the distinct stocks in the table?
  • Query all data for a single stock. Do you notice any overall trends?
  • Which rows have a price above 100? between 40 to 50, etc?
  • Sort the table by price. What are the minimum and maximum prices?

Additional Challenges

Intermediate Challenge

  • Explore using aggregate functions to look at key statistics about the data (e.g., min, max, average).
  • Group the data by stock and repeat. How do the stocks compare to each other?
  • Group the data by day or hour of day. Does day of week or time of day impact prices?
  • Which of the rows have a price greater than the average of all prices in the dataset?

Advanced Challenge

  • In addition to the built-in aggregate functions, explore ways to calculate other key statistics about the data, such as the median or variance.
    • Hint: See here and here for possible solutions.
  • Let’s refactor the data into 2 tables - stock_info to store general info about the stock itself (ie. symbol, name) and stock_prices to store the collected data on price (ie. symbol, datetime, price).
    • Hint: You can use the SQL CREATE TABLE AS statement to create a table by copying the columns of an existing table. Don’t forget to also drop certain columns from the original table and rename it.
  • Now, we do not need to repeat both symbol and name for each row of price data. Instead, join the 2 tables in order to view more information on the stock with each row of price.
  • Add more variables to the stock_info table and update the data (e.g., sector, industry, etc).

Resources & Support

Project-specific resources

  1. SQLite Data Types
  2. SQLite Date and Time Functions
  3. SQLite strftime() Function
  4. SQLite Documentation
  5. SQLite Tutorial

General Resources

  1. How to get set-up for coding on your computer
  2. What is a Relational Database Management System?
  3. What you need to know about Git, GitHub & Coding in Teams
  4. How developer teams work
  5. First steps in tackling a group project
  6. Resource on writing pseudocode to get started with off-platform projects

Community Support

Looking for additional help or someone to work with (or somewhere to brag about your finished project)? Join our Codecademy Lifelong Learner Pro Community to meet other learners like yourself!

  • Collaborate with other learners on data collection! Then, join the datasets together for more interesting analysis.
    • Each learner can collect data on different stocks for a larger sample of stocks.
    • Each learner can collect data on same 5 stocks, but at different points throughout the day in order to spot potential daily trends.

Once you’re done…

Share in the CLL Community or the Forums for feedback and to see some other ways of solving this problem!

24 Likes

Hi there,
I don’t know how to use SQLite. I typed code (manipulation) in Execute SQL but I don’t know how to insert 5 stocks in SQLite.
What can I do?

Hi method,

I havent used Execute SQL yet but I am using DB Browser for SQLite. You need to first create a table using something like:

CREATE TABLE ‘stocks’ (
‘symbol’ TEXT
);

that will get your table up and running (albeit i only wrote in code for the first column. You should add in the other 3.

After that, you need to use INSERT INTO function to manually enter in stock info row by row.

I’ve been using this cheat sheet (https://www.codecademy.com/learn/learn-sql/modules/learn-sql-manipulation/reference) for reference. Good luck!

Hi thank you for this project. I’ve decided what stocks to pick but I cannot select the create table button in DB Browser. Is there anything else I need to do first?

Screen Shot 2020-04-07 at 12.15.16 PM

Hello, Where can I find the data of stock market to input in the database?

you have to create a data base (by clicking New Database), inside it you can create as many tables as you want

Hey there, I’m new to this. How do we search for stock data to put into this project?

All the information you need to answer your question is in the Basic Requirements section of this post.

The instructions in the Basic Requirements section explains where to find stock data and which data to put into your database.

Hello,

I downloaded DB Browser for SQLite and have been able to practice SQL Manipulation and Queries. I am having trouble importing .csv file to the database on DB Broswer for SQLite so that I keep on practicing with SQL.

I clicked on File-> Import -> Table from csv file -> Select the file(s) -> Then changed the Table Name and clicked OK. But I keep getting:

" Error importing data from record number 1000.

Inserting row failed: not an error"

I clicked OK. Then another window pop up saying, “Import Complete”.

But I don’t see any table with the file(s) that I uploaded. I’ve tried with multiple files versus single file and shorten the rows to just 10 rows in a table. And I’ve tried pre-creating the table to upload the file. Nothing works.

Any suggestion would be great. Thank you!

I am unable to figure the solution to the intermediate challenge “Which of the rows have a price greater than the average of all prices in the dataset?”

Can someone help with the solution?

I am not able to open my csv file in the DBbrowser.
I created a csv file in excel and tried to import it in the browser.
It only shows the database from SQL file option and not the csv option.

Try saving the file as .TSV instead of .CSV

Then when you import the filter the files as All Files instead of only CSV files.

Then once you see table pop up on DB Browser prior to importing it, change where it says Comma, to Tab.

Thanks!
I was trying to import the csv file without creating a new database first. I was able to import the csv file after creating a new database.
thank you so much for your help! :slight_smile:

For the question below
“Which of the rows have a price greater than the average of all prices in the dataset?”
I tried the following commands but it didn’t work.
Can anyone tell me where am I going wrong?

1 Like

Hi, seems like we can’t use aggregate function in WHERE statement.

The documentation confirms this behavior (bottom of page):

Aggregate functions may only be used in a SELECT statement.

I tried this and it works (not sure this is the best way, though, i just learned this as well)

Hope this helps!

3 Likes

Looks like you are right! Thank you so much. :slight_smile:

For the variance I tried
SELECT AVG(meansqdiff) FROM
(SELECT ((price-(SELECT AVG(price) FROM stocks))* --get diff between price and mean
(price-(SELECT AVG(price) FROM stocks))) --and square it
AS ‘meansqdiff’ FROM stocks )

IS this less efficient than the hint which took a while to understand and use.

Hi @aytofaiz.h, thanks for your solution. I implemented it and I did get a message that the query finished successfully, but how do I see the average_price table? I tried clicking on Browse Data but couldn’t see anything there.

Sorry if this is a silly question as I’m very new to SQL :grin:

Hi @tag2496708003, I tried your code but I got this syntax error:

Result: near "(": syntax error
At line 1:
SELECT AVG(meansqdiff) FROM
(SELECT ((price-(SELECT AVG(price) FROM stocks))
(

This was my solution to find variance, I basically took it from the hint linked to Stack Overflow:

WITH variance_table AS (
 SELECT price
 FROM stocks)
 

 SELECT AVG((variance_table.price - sub.a) * (variance_table.price - sub.a)) as var from variance_table, 
 (SELECT AVG(price) AS a from variance_table) AS sub; 

I was wondering though, if I switch the last and second to last lines, so the code looks like this:

 WITH variance_table AS (
 SELECT price
 FROM stocks)
 
 SELECT AVG(price) AS a from variance_table) AS sub,
 SELECT AVG((variance_table.price - sub.a) * (variance_table.price - sub.a)) as var from variance_table;
 

I get a syntax error saying:

Execution finished with errors.

Result: near “)”: syntax error

At line 1:

WITH variance_table AS (

SELECT price

FROM stocks)

SELECT AVG(price) AS a from variance_table)

Does anyone know why that might be?