How to import a DataFrame into an SQL Table?

I have a dataframe, and I want to put it into an sql table. I am using DB Browser for this.

Is there another way to do this that solves my issues below?

I tried writing a function for this:

def df_to_db(df,db,table):
    path=f'C:/Users/Home/Desktop/python/SQL/sql_databases/{db}' 
    conn = sqlite3.connect(path)
    cursor = conn.cursor()
    
    rows=df.values.tolist()

    for row in rows:
        var_string = ', '.join('?' * len(row))
        query_string = f'INSERT INTO {table} VALUES ({var_string});'
        cursor.execute(query_string, row)
        
    results=pd.read_sql_query(f'select * from {table};',conn)
    results.to_sql(f"{table}", conn, if_exists="replace")

    cursor.close()
    conn.close()
    

Issues:

  • When I have a dataframe with built in columns like (id, column1, column2), and I try to use the sqlite3 module to insert this into my table, DB Browser creates another column, called index, so I have to keep deleting that or I get an error that saying I am giving n values when there are n columns.

  • Whenever I try to add something new to my DataFrame, it doesn’t automatically update in my SQL table. I want to add this functionality into my function (or write another function with it) but have no idea how to.

I also have one more question, if you go to the section of the Data Science path where you learn Pandas, it gives you the results in tabular form, to the side. Is there a way to replicate this in VSCode?

Why are you manually inserting your DataFrame rows into the sql table, then pulling them back out into a DataFrame and then putting them back in again?

I’m pretty sure this will give you the same result:

df.to_sql(f'{table}', conn, if_exists='replace')

Did you try adding in the kwarg index=False into df.to_sql()? This will prevent Pandas from creating an extra column for the index, which I think is the issue you’re facing. Documentation here.

Yeah, your DataFrame and your SQLite table aren’t connected in a way that would make it automatic. You will have to alter your table manually each time you change your DataFrame. Just run your function after each change you make to the DataFrame and you’ll be good to go. Creating a sort of listener to automate it is too complicated to be worth it unless you are making a fully-fleshed program.

The easiest way to have your Pandas DataFrames output like that is to use Jupyter notebooks. You can either view them on your localhost, or open them in VS Code. The way to format them in that way is to just type the name of the DataFrame (e.g., df) at the end of the cell instead of using the print() function.

Happy coding!

1 Like

Thanks for responding. I’ve tried to use that method actually, but it keeps giving me an error for the connection string for my database. I use something like this:

path=f'C:/Users/Home/Desktop/python/SQL/sql_databases/{db}' 
conn = sqlite3.connect(path)

And then it keeps telling me “cannot parse rftc (or something like that) from connection string.”

I will try this out thank you!

Sorry for all the many questions, I appreciate your help!

1 Like

That’s strange. Are you passing in db as a string with the file extension? If so, it should work just fine.
Here is a quick script I wrote up that works perfectly:

import sqlite3
import pandas as pd

# Never hard-code paths into your functions - 
# You can avoid this completely if you keep your project's 
# Python scripts and databases in the same directory. 
# Otherwise, set the path in SCREAM_CASE at the top of your file.
# If you plan on sharing your script, setup an environment
# variable and import that value to use for your path constant.

SQL_DIR = 'Your/Path/To/Directory/Here' # e.g., 'C:/Users/Home/Desktop/python/SQL/sql_databases'

def df_to_db(df, db, table_name):
    path = f'{SQL_DIR}/{db}'
    conn = sqlite3.connect(path)
    df.to_sql(table_name, conn, index=False, if_exists='replace')
    conn.close()

# Create test DataFrame
d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]}
test_df = pd.DataFrame(data=d)

# Create the SQLite table from the DataFrame
df_to_db(test_df, 'test_db.db', 'FirstTest') # replace 'test_db.db' with the name of your database
2 Likes

By the way, did you figure out your issues with Jupyter Notebooks? I’ve never had the password issue when I’ve run jupyter notebook from the command line.

Well, I can open them in VSCode and just type there, without having to go to the command line. But whenever I try to run them from the terminal, I get a tab opened that asks me to enter in a password.

image
Of course, I have no idea what this password might be.

By the way, why is it bad to “hard-code” paths into functions?

Have you ever had it open without a password from the terminal?

If not, I think you have to paste something in to the browser from the terminal the first time you connect.
It should look something like this:
image
(obviously your port number and token will be different than this person’s)

Once you’ve done this once, I believe Jupyter should remember you and automatically authenticate w/o needing a password.

How would I get my token? No matter how many times I open it it keeps asking me for a password.

It will show up in your terminal after using the jupyter notebook command.

If it’s not the last line, it will be a few lines up.

I can see my port number, but I can’t see my token. :confused:

Hm…without seeing your computer it’s hard to troubleshoot.

What does the message say?

It should say something like this:

To access the notebook, open this file in a browser:
file:///C:/Users/USERNAME/AppData/Roaming/jupyter/runtime/nbserver-10860-open.html
Or copy and paste one of these URLs:
http://localhost:PORT/?token=TOKEN
or http://127.0.0.1:PORT/?token=TOKEN

So, IDK what I just did, but I can now see this page:

Well that’s a good sign. That is what you should be seeing in the first place. Did it provide a token this time?

If you mean that a token showed up in my terminal, still no. But now I don’t have to enter a password anymore.

hmm…well at least it worked! :partying_face:

Hopefully it works every time from now on.

I am interested why in the first screenshot you posted, it said something about a winpty error. How are you accessing Jupyter? Through conda, or did you set up a custom alias?

1 Like

I’m honestly not sure. Maybe it has to do with the fact that I’m using Git Bash? And I did pip install jupyter notebook.

Also, thanks for helping me out with the writing to database thing! My program is finally working and I can move on to other stages now!

2 Likes

Hm, yeah, this might have something to do with it. Do you have Anaconda or Miniconda installed? If so (and if you’re interested), you can get conda running on Git Bash — it just takes some magic :mage:

No problem!

Primarily because it makes them less reusable. You don’t want to have to rewrite your functions each time you use a file located somewhere else. Honestly, I probably should have made the function more flexible by passing in the directory as a kwarg:

def df_to_db(df, db, table_name, dir=SQL_DIR):
    path = f'{dir}/{db}'
    conn = sqlite3.connect(path)
    df.to_sql(table_name, conn, index=False, if_exists='replace')
    conn.close()

That way, it defaults to SQL_DIR, but if you wanted to access a database saved outside your normal sql directory, you just pass it in to the function.

2 Likes

Nope.

I’m totally interested. I’ve been hearing people say that Anaconda is really good for Data Science. Besides, there isn’t a limit to what you can learn.

Cool. Rather than write the instructions here, I’ll try to make a post about it today in #community:tips-and-resources so that it’s easier to find for other community members.

3 Likes