In the context of this exercise, are there any guidelines and reasons for splitting the table as shown in the example?
Yes, although there might not necessarily be a one-fits-all solution to every possible database structure, there are some guidelines that can help you when you need to split a table into multiple tables.
A concept that most programmers use to refer to restructuring a database in this manner is “Database Normalization”. A very brief explanation of this concept is that it aims to accomplish that databases tables are structured so that they avoid data redundancy and keep the data accurate and consistent. “Data redundancy” is when we have a lot of repeated, or redundant, data.
In the example given in this exercise, it explains how we might split a table into multiple tables, which is essentially doing normalization. Initially, the table has many columns, and as a result, some of the values will most likely be repeated many times, which introduces data redundancy.
Say for example, we added 1 million rows to this table. Some values like
customer_address might end up being stored many thousands of times, when we really only need to store it once per customer.
To avoid this issue, we would split the table so that the information is stored more concisely. If we need the customer information, we can obtain it from the
customers table, by their
customer_id. And, if we need the information for a subscription, all of its information is stored in the
subscriptions table. We only need the
subscription_id in the
orders table, and we can obtain their information from their respective tables.