How to create hierarchical data sets

Hi! First-time poster. I have no programming experience and I’m teaching myself SQL on Codecademy, planning on going through the Data Science path. I have a question about how to create hierarchical data sets.

EXAMPLE: Say you were making a database showing which people have held positions in which government agencies. If someone was, say, the head of the F.B.I., you’d want to indicate that the F.B.I. is part of the Justice Department which is in turn part of the Executive Branch.

I get how you’d make, say, a People table and an Agency table. I think I get how you could do something like create, I don’t know, a Sub-Agency table to show the agencies within a larger agency. But I’m confused about how you’d do this on a larger scale, so that you’d have a vast array of agencies nested within other agencies. (And I chose government agencies because there isn’t much consistency about what’s at what level; it’s not like every Department is made up of Agencies which are made up of Bureaus which are made up of Offices, so there aren’t really common threads.)

So… how would that work? Thanks! (And I hope I’m asking this the right way and in the right place!)

1 Like

Hi @objectjumper92185, Welcome to the forums!

Good question! The first part of the DS path is SQL, so, you will be learning about DBs, and tables, PRIMARY KEY & FOREIGN KEY and how to JOIN them. :slight_smile:

To do what you are suggesting, you’d have to have some sort of ID (or other kind of) column in each table where it’s the PRIMARY KEY in that table and a FOREIGN KEY in another table.

I would assume that each table would have an identifying column that tells you what agency & dept and branch (or who they report to/their boss) it’s a part of. You could JOIN them or UNION the multiple tables into one combined table.

Here’s a SQL Server example that’s pretty advanced/detailed:

1 Like

Thanks for the reply @lisalisaj !

I do understand how to do joins using a primary key and foreign key (and have completed Codecademy’s SQL course). What I’m having trouble with is figuring out what sort of join or schema would enable hierarchical relationships.

Using my example: say I want to indicate which agencies are within the Justice Department. Maybe the Justice Department is part of a table called Cabinet and has CabinetID 4, and maybe there’s a separate table called CabinetSubdivisions, and maybe that has a column called ParentOrganization which uses a foreign key of ParentOrganizationCabinetID and this one is 4. What I don’t quite get is how you’d do that across the entire bureaucracy, particularly with the lack of consistency across bureaucratic levels.

I could imagine maybe each organization being in one big table with an OrganizationID, and having a ParentID column that refers back to a different OrganizationID from that same table… but that also sounds complicated and messy, and seems counterintuitive to me.

So… I’m still confused about how one would do this :slight_smile:

I think you answered your own question prior to this sentence. :slight_smile:

You can create tables and populate them and double check if the data is accurate. How many columns of info each table has depends on how the data is organized (if you’re the one responsible for that too).
For example,

CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    column3 datatype,

I’m not sure I follow. (I do follow how to create and populate tables; it’s the hierarchy issue I’m confused about.)

So you’d do an Organizations table with Id as the primary key and ParentID as a self-contained foreign key, so that the Justice Department has ID 4, the F.B.I. has ID 12 and ParentID 4, and a division within the F.B.I. has ID 36 and ParentID 12, and then you’d join the table to itself? Like, join organizations on organizations.ID = organizations.ParentID ? (That works?)

Now my brain hurts. :joy:

I think that’s how it’d be done. Though, maybe it’d be cleaner if each dept had their own employee table…?

If it’s easier to visualize–create a table (or several) in Excel and sort & pivot table/group them that way(?)

I thought about that-- so like, there’s a JusticeSubdivisions table, an FBISubdivisions table, etc. My concern is that that’d be a LOT of tables, which might make queries needlessly complicated. But… I’m not sure I can think of a better way.

Surely this is an issue that comes up a lot, right? Hierarchical structures exist all over the place. I’m surprised there isn’t a more obvious way to do this!

Sounds like bureaucracy to me. :slight_smile:

I posted a link above in this thread about hierarchical structures. I just googled it. StackOverflow might have some good responses, better than I could provide. :woman_technologist:t2: :woman_facepalming:t2:

1 Like