I’ve been taking the Design Databases with PostgreSQL skill path, and right now I’m in the Indexes section. I already finished the first lesson on indexes, but before moving on to what’s next I just wanna make sure I understood what an index is.
So, here goes my explanation:
An index is a data structure that helps you find data in your database faster. It basically takes all the values from the column you specify and uses them to create a binary tree structure. One of its cons is that it obviously takes up space, and it can increase the size of your database quite quickly. Also, whenever you perform a DELETE, UPDATE or INSERT statement that affects the values of the column(s) you defined an index for, the index will have to be completely reorganized, so it can slow down those processes.
Did I miss something? Did I get something wrong?
I’d truly appreciate some feedback
I think you’ve got it. But, I double checked the documentation anyway.
(I’ve never used them [indexes] myself.) I think in order to execute certain queries they’d be useful, but then I’d drop them. ex: “To remove an index, use the DROP INDEX command. Indexes can be added to and removed from tables at any time.”
From what I understood, the best time to use indexes is when you have a large dataset (like you said) that’s not going to be updated that often (because UPDATE, INSERT and DELETE statements cause the index to be completely reorganized to fit the new data). That way it will make the process of finding records way faster and not cause much trouble.
(am I right? I get really nervous trying to explain stuff so I’m trying to practice lol)
Updates and deletes have similar drawbacks. When deleting a record that is associated with an index, it might be faster to find the record — by leveraging the index’s ability to search. However, once the record is found, removing or editing it will result in the same issue as inserting a new record. The index itself will need to be redone.
But then they don’t get completely redone, right? They’re just like…updated? But that process still takes some time?
Yes, this is correct . It’s typically best to have your indexes created on unique values such as ids. For a reorg to happen, the queries would have to be running very slow. If you were ever converting/importing volumes of data into a database, you may want to consider removing the indexes and then adding them back after the data has been loaded.
So, in the documentation, where it states this:
“But indexes also add overhead to the database system as a whole, so they should be used sensibly.”
and this: “After an index is created, the system has to keep it synchronized with the table. This adds overhead to data manipulation operations. Therefore indexes that are seldom or never used in queries should be removed.”
What is the overhead? Does it slow down processing?
This makes it an intricate balance. If you have just a couple hundred of records in a table, indexes might indeed create more overhead then performance gained.
When you start a new project, you will very likely not at many indexes (if it all). Once your project and user base starts to grow, and certain queries become slow, you might want to look at the options available: optimizing queries or using/adding indexes.
overhead can also simple mean more processing power is needed, so the process might take no extra time (or very little) to complete the process, but put more strain at the CPU or storage.
we could make matters even more complicated. Some optimizations might not even be worth the time and effort, given the cost of developing/improving might be higher then the cost saved on servers.
there are so many nuances, I think we could talk about it all day.