Not sure if I understood Indexes correctly

Hey everyone, it’s me :nerd_face:

I’ve been taking the Design Databases with PostgreSQL skill path, and right now I’m in the Indexes section. I already finished the first lesson on indexes, but before moving on to what’s next I just wanna make sure I understood what an index is.

So, here goes my explanation:
An index is a data structure that helps you find data in your database faster. It basically takes all the values from the column you specify and uses them to create a binary tree structure. One of its cons is that it obviously takes up space, and it can increase the size of your database quite quickly. Also, whenever you perform a DELETE, UPDATE or INSERT statement that affects the values of the column(s) you defined an index for, the index will have to be completely reorganized, so it can slow down those processes.

Did I miss something? Did I get something wrong?
I’d truly appreciate some feedback :slightly_smiling_face:

3 Likes

I think you’ve got it. But, I double checked the documentation anyway. :slight_smile:
(I’ve never used them [indexes] myself.) I think in order to execute certain queries they’d be useful, but then I’d drop them. ex: “To remove an index, use the DROP INDEX command. Indexes can be added to and removed from tables at any time.”

3 Likes

Thanks for the feedback :slightly_smiling_face:

Oh right, I forgot that :upside_down_face:

1 Like

But, building an application, certain queries will happen a lot, doesn’t it make sense to keep the indexes around? Especially if your databases become large (possible millions of records)

re-building the indexes every time for a table with millions of records sounds expensive.

Yes, that’s true.
But, I’ve never been in such a position. :frowning:

From what I understood, the best time to use indexes is when you have a large dataset (like you said) that’s not going to be updated that often (because UPDATE, INSERT and DELETE statements cause the index to be completely reorganized to fit the new data). That way it will make the process of finding records way faster and not cause much trouble.

(am I right? I get really nervous trying to explain stuff so I’m trying to practice lol)

1 Like

this is very good, trying to do this will increase your skill and will lead to meaningful discussions

Take this forum for example, this forum has existed for a couple of years now. Where I would start placing indexes for example are:

topic names. we can search through topic names, searching through un-indexed records can take a long time

if you look at this page: https://discuss.codecademy.com/latest

you can see that you can sort topics, that is also something which will need index after a while.

the amount of read/get operations is very likely much higher then the amount of new topics/replies being created. People look at old topics/replies all the time

also, once an index exists, and a new record is added, I don’t think the entire index needs to be rebuild. I would argue that this process is a lot more sophisticated

2 Likes

Oh I see, I just thought so because of this:

Updates and deletes have similar drawbacks. When deleting a record that is associated with an index, it might be faster to find the record — by leveraging the index’s ability to search. However, once the record is found, removing or editing it will result in the same issue as inserting a new record. The index itself will need to be redone.

But then they don’t get completely redone, right? They’re just like…updated? But that process still takes some time?

Yes, this is correct :slight_smile:. It’s typically best to have your indexes created on unique values such as ids. For a reorg to happen, the queries would have to be running very slow. If you were ever converting/importing volumes of data into a database, you may want to consider removing the indexes and then adding them back after the data has been loaded.

2 Likes

So, in the documentation, where it states this:
“But indexes also add overhead to the database system as a whole, so they should be used sensibly.”
and this: “After an index is created, the system has to keep it synchronized with the table. This adds overhead to data manipulation operations. Therefore indexes that are seldom or never used in queries should be removed.”

What is the overhead? Does it slow down processing?

2 Likes

This makes it an intricate balance. If you have just a couple hundred of records in a table, indexes might indeed create more overhead then performance gained.

When you start a new project, you will very likely not at many indexes (if it all). Once your project and user base starts to grow, and certain queries become slow, you might want to look at the options available: optimizing queries or using/adding indexes.

overhead can also simple mean more processing power is needed, so the process might take no extra time (or very little) to complete the process, but put more strain at the CPU or storage.

we could make matters even more complicated. Some optimizations might not even be worth the time and effort, given the cost of developing/improving might be higher then the cost saved on servers.

there are so many nuances, I think we could talk about it all day.

3 Likes

Here’s an another article about it in the Analyze Data with SQL Skill Path:

https://www.codecademy.com/paths/analyze-data-with-sql/tracks/analyze-data-sql-get-started-with-sql/modules/analyze-data-sql-learn-manipulation/articles/sql-indexes

1 Like

Thanks, I’ll make sure to check that one out.