I currently working on a project using mysql where I need to generate a large new table out of other large tables. In the end I would like to have a csv file as well as a working dataset in mysql.
I have a table (table1) which contains unique names in each row, a data table which stores every hour and date for the last 7 years and large table (table2) which contains the production and consumption of each name for each hour of the last 7 years. I would now like to generate a new table which contains for each name this data in the following manner:
date | time | name1_production | name1_consumtpion | name2_production | name2_consumtpion | …
To make sure the data is correctly progrest I would also like to check that the data and time is listed in my date table as well that the name in table1 and table2 are the same.
I am however not quite sure what the best approach for this might be. My dataset is quite big, were there are over 2’000 names and has therefore each over 120’000 data entries. Should I export the data to python using panda or should I progress this in sql? And how would be the best way to do this?
Many thanks already in advance for the help and I hope I was able to describe my problem in a understandable manner
Mysql would (if done decently) very likely be faster (shorter run time), but if you are more experienced with panda and have nearly no mysql knowledge, you might complete the task faster (less development time)
Can you provide the structure of the tables you currently have? Sounds like a simple join is sufficient to export your data to csv.
Ok sorry I am still a bit new to this. I created the database myself from a bunch of csv files. table2 is the original “raw” table from this datacollection and I passed then all unique names from this table2 to table1 with some other informations.
There are no foreign keys in either one table when I check the table inspector. I added the primary keys only after the data collection.
Yes name_id_1 and name_id_2 are the names for the primary key I created.
Sorry for my mistake, I have corrected it now. How could I achieve this with joins? Do I need to change something with my data structure first for this?
Thank you very much, the search is in progress. However, it has been running already for a little over 14 hours… Now I was wondering if I can see somewhere the progress in percent of the progress or the estimated end time? And can I already check if the table is made according to my wishes while the progress is running?
Ok thanks I understand completly. I have however another syntax/methodic related questions if this is alright…
The query has finally finished after something over 15 hours. However I must have made a mistake in the query as I now got a table in the following form:
DateTime | production
Where I used the following code:
drop table if exists name_production;
CREATE TABLE name_production AS(
select table1.`DateTime`, table1.production from table2
inner join table1 on table2.name= table1.name
inner join time_table on time_table.txt = table1.`DateTime`
order by table1.`DateTime`);
However I would like somthing like:
DateTime | name1 (column of production) | name2 (column of production) | name3 (column of production) | …
How to I need my code for to achieve this?
Is the problem that my names in table2 are in rows? Should I somehow “transpose” this table first and then use the join statment? I think I also need to change the column “name” from production to the actual name which produces the production. This means to the name I actually compare in my statement.
I can’t keep working from these half examples, you need to give me something I can really work with. Given I am just a volunteer on the forum, I am getting close to the limit of how much heavy lifting I want to do
you intent to create a table with hundreds of columns? Or do I misunderstand? Because that is a dreadful way to work with databases
Sorry I didn’t mean to be a pain, I’m still new to this and therefore don’t know what information is needed. My approach was to explain the problem as logically as possible from my data set. I am really greatful for your time and effort and did not wanted to cause any distress.
My table1 in the descripton is the table “plant_information”, table2 is “act_gen_raw” and the time table is “dt”. I run the following query:
drop table if exists plants_production_test;
CREATE TABLE plants_production_test AS(
select dt.d as Date, dt.t as Time, act_gen_raw.ActualGenerationOutput as PowerSystemResourceName from plant_information
inner join act_gen_raw on plant_information.PowerSystemResourceName = act_gen_raw.PowerSystemResourceName
inner join dt on dt.txt = act_gen_raw.`DateTime`
#order by act_gen_raw.`DateTime`
LIMIT 10);
I put my data under the following link, where I extracted only some sample, which is why the tables are appended with “_sample”.
I run also:
-show create table plant_information;
‘plant_information’, ‘CREATE TABLE plant_information (\n ResolutionCode text,\n AreaCode text,\n AreaTypeCode text,\n AreaName text NOT NULL,\n MapCode text,\n GenerationUnitEIC text,\n PowerSystemResourceName text NOT NULL,\n ProductionType text NOT NULL,\n InstalledGenCapacity double DEFAULT NULL,\n id_plant int NOT NULL AUTO_INCREMENT,\n PRIMARY KEY (id_plant),\n KEY country (MapCode(500)),\n KEY installed (InstalledGenCapacity),\n KEY production_type (ProductionType(500)),\n KEY time_intervall (ResolutionCode(500)),\n KEY cta (AreaTypeCode(500))\n) ENGINE=InnoDB AUTO_INCREMENT=2154 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci’
‘dt’, ‘CREATE TABLE dt (\n txt text,\n d date DEFAULT NULL,\n t time DEFAULT NULL\n) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci’
I would however understand if you prefer not to spend anymore time on this problem And yes I itend to create hunderts of columns as it is my order of my supervisor to do so.