Data Science Independent project: predict outcome of production


I finished the machine learning courses here on Codecademy and would now like to pursue my own projects. I am still very new to this, so I thought I would ask for advice before I start coding and draw wrong conclusions because of mistakes in my code.

The problem I want to solve concerns production. A line produces different products during a shift, up to 30 different products from the same raw material. A product is only made when there is an order for it; once the order is fulfilled, that product is stopped and the raw material is used for other products. The principle in production is to use the material so that as little raw material as possible goes to waste. The raw material is natural and has various defects: some products tolerate more defects, some fewer. These principles dictate which products are produced in higher quantities and which are rarer. For example, if there is no order for top-quality products, medium-quality products will be produced in higher quantities, because the material that would have gone to top quality falls to these lower qualities, and so on. So both the set of products being made and the amount of each vary over time.

I have 3 months' worth of production data, consisting of the date, the names of the products produced on that day, and how much of each product was produced. What I would like is to feed this data into a model, then say "now I have orders for these products in these amounts" and have it tell me how many days it will take to finish each product.

I am thinking that I would have to train a model on the existing data, telling it that on this day these products were produced in these amounts.
Then I would keep an array (or something similar) holding my current orders: product name and ordered amount. I would give the active orders to the model and ask it for a prediction for one day, subtract the predicted amounts from the orders, and if the remaining amount for a product drops below one, exclude that product from the following predictions.
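
A minimal sketch of that day-by-day loop in Python. `predict_daily_output` and `AVG_DAILY_OUTPUT` are hypothetical stand-ins for the trained model: here the "model" just returns a made-up historical average per product, so it ignores how the mix of active orders changes each product's output, which is exactly what a real model would have to learn.

```python
# Hypothetical per-product averages (amount made per day on days the product
# actually ran). Made-up numbers; real values would come from the 3 months of data.
AVG_DAILY_OUTPUT = {"A": 120.0, "B": 45.0, "C": 300.0}


def predict_daily_output(active_products):
    """Placeholder for a trained model: expected output for one day,
    given the products that still have open orders."""
    return {name: AVG_DAILY_OUTPUT[name] for name in active_products}


def days_to_finish(orders, predict=predict_daily_output, max_days=365):
    """Simulate day by day: predict, subtract, drop finished products.

    Returns {product: day on which its order was completed}.
    """
    remaining = dict(orders)
    finished = {}
    day = 0
    while remaining and day < max_days:
        day += 1
        produced = predict(list(remaining))
        for name, amount in produced.items():
            remaining[name] -= amount
        # Less than one unit left counts as done; the product is then
        # excluded from the next day's prediction.
        for name in [n for n, left in remaining.items() if left < 1]:
            finished[name] = day
            del remaining[name]
    return finished
```

For example, `days_to_finish({"A": 250, "B": 90, "C": 300})` finishes C on day 1, B on day 2 and A on day 3 with these placeholder averages. Swapping in a real model only means passing a different `predict` function.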

The questions I have are: which model should I use, with which parameters, and how should I train it? Or am I completely lost, and should this problem be solved in some other way?

Welcome to the forums!

I don’t think you’re completely lost at all. I think you have some good ideas and you need to try them both out and see what works, right? IMO failure is an opportunity to improve.
I think I would first try to train the model based on existing data. Perhaps others who are more well-versed in ML could chime in here with their thoughts!

OK, I've moved on a bit, but now I am really stuck. I made a pandas DataFrame whose columns are the date followed by all the unique product names. Each row holds a date and how much of each product was produced that day; if a product was not produced, the value is NaN.
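
For reference, one way to get from one-row-per-record data to that date-by-product layout with pandas is `pivot_table`. This sketch assumes the raw data has columns named `date`, `product` and `amount` (hypothetical names, made-up numbers). Products with no record on a date come out as NaN, and `aggfunc="sum"` adds up duplicate rows, e.g. if the same product appears more than once on a day.

```python
import pandas as pd

# Hypothetical long-format records: one row per (date, product) with the
# amount produced. Column names and values are assumptions for illustration.
records = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-01", "2021-03-02", "2021-03-03", "2021-03-03"],
    "product": ["A", "B", "A", "B", "C"],
    "amount": [120, 40, 130, 50, 300],
})
records["date"] = pd.to_datetime(records["date"])

# One row per date, one column per product; missing combinations become NaN.
wide = records.pivot_table(index="date", columns="product",
                           values="amount", aggfunc="sum")
print(wide)
```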

Should I build a model with as many inputs as I have different products? Should I input 0 or 1 for each product for each day, 0 meaning the product was not produced and 1 meaning it was in production, with the produced amounts as the training labels? Then, for prediction, would I input an array covering all the products, set to 0 or 1 according to what I want to produce?
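
A sketch of that setup on a small made-up table in the layout above: the inputs are the 0/1 "in production" flags, the labels are the produced amounts (NaN filled with 0), and the model is a plain multi-output linear regression fitted with NumPy's least squares. The linear model is only an illustrative baseline I picked, not a recommendation; with real data you would want to compare it against other regressors and check it on days held out from training.

```python
import numpy as np
import pandas as pd

# Hypothetical wide table (date x product), NaN where nothing was made.
wide = pd.DataFrame(
    {"A": [120, 130, np.nan, 110], "B": [40, np.nan, 50, 60],
     "C": [np.nan, np.nan, 300, 280]},
    index=pd.to_datetime(["2021-03-01", "2021-03-02", "2021-03-03", "2021-03-04"]),
)

X = wide.notna().astype(int)  # inputs: 1 if the product ran that day, else 0
Y = wide.fillna(0)            # labels: amounts, "not produced" recorded as 0

# Linear baseline: each product's amount is a weighted sum of which products
# ran, plus an intercept column. lstsq fits all output columns at once.
X1 = np.column_stack([X.to_numpy(), np.ones(len(X))])
W, *_ = np.linalg.lstsq(X1, Y.to_numpy(), rcond=None)


def predict(active):
    """Predict one day's output for the set of product names in `active`."""
    mask = np.array([1.0 if name in active else 0.0 for name in wide.columns])
    raw = np.append(mask, 1.0) @ W
    # A linear model can output non-zero amounts for products that are not
    # running, so zero those out with the same 0/1 mask.
    return dict(zip(wide.columns, mask * raw))
```

`predict({"A", "B"})` returns a dict of expected amounts per product, which is the kind of function the day-by-day loop in the first post would call.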

Okay, so I’d suggest not using NaN and replacing it with 0 instead, since NaN means “missing data” and we usually remove it when cleaning a dataset. Here a product that wasn’t made isn’t missing; it was produced in an amount of 0.
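
A quick illustration of why this matters, on a made-up table: if the NaNs are treated as missing data and incomplete rows are dropped, nearly every day can disappear, because on any given day most products aren’t being made.

```python
import numpy as np
import pandas as pd

# Hypothetical wide table: most products are absent on most days.
wide = pd.DataFrame(
    {"A": [120, 130, np.nan], "B": [40, np.nan, 50], "C": [np.nan, np.nan, 300]},
    index=pd.to_datetime(["2021-03-01", "2021-03-02", "2021-03-03"]),
)

# Treating NaN as missing and dropping incomplete rows discards every day
# on which at least one product was not made (here: all of them).
dropped = wide.dropna()

# Here NaN really means "0 produced", so filling with 0 keeps every day.
filled = wide.fillna(0)
print(len(dropped), len(filled))
```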

And about your question: yes, if you’re trying to categorise your data into “produced” and “not produced”, you can encode them as 0 and 1!