So I’m building a personal finance tracker, and I’m using CSV files and pandas for my data manipulation. I haven’t worked with SQL in Python just yet, otherwise I think I would use it. Anyway, I have some questions about ongoing data manipulation in general. When do I update the main CSV files? Just when I close the app, or is it better to do it any time a change in the original DataFrames should be reflected in the files?
thanks guys
I think this is a great question!
TLDR Answer: It depends
It depends on how complex the processing is and how it affects the computer's resources. If you have a DataFrame of 20 rows and you append a new column with a calculated value for every row, iterating through a view will affect memory and CPU differently than using a copy. (pandas: Views and copies in DataFrame | note.nkmk.me) Or what if you needed to load all 10,000 rows to calculate a total value? What if the calculation is done one row at a time and is very convoluted, using 4 different DataFrames to calculate values?
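To make the cost difference a bit more concrete, here is a minimal sketch comparing a row-by-row loop with a whole-column operation. The column names (`amount`, `tax_rate`) and the data are made up for illustration, not taken from your tracker. Both approaches give the same result, but the loop gets noticeably slower as the row count grows.

```python
import pandas as pd

# Hypothetical transaction data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "amount": [12.50, 40.00, 7.25],
    "tax_rate": [0.08, 0.08, 0.00],
})

# Row-by-row: easy to follow, but slow once the frame gets large.
totals = []
for _, row in df.iterrows():
    totals.append(row["amount"] * (1 + row["tax_rate"]))
df["total_loop"] = totals

# Whole-column (vectorized): one operation applied to every row at once.
df["total_vec"] = df["amount"] * (1 + df["tax_rate"])
```

For a few dozen rows either version is fine; the difference only starts to matter once you are working with thousands of rows.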
Now my personal opinion:
- If you have this personal finance tracker set where you enter a transaction, I would have it save after you press enter.
- If you have this personal finance tracker set where you can enter multiple transactions, I would have it save after you press enter on the last one (assuming that you are not manually entering more than 10 or so in one go).
- If you wanted to make data visualizations for the data, I would have them save after they generate.
- I would make sure each action the user takes gives a response indicating whether the data has been saved.
- If you have the data save on close, make sure to show a closing message confirming that the data has been saved.
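Here is a rough sketch of the "save after each transaction and tell the user" idea. The function names (`save_csv`, `add_transaction`) and the columns (`date`, `description`, `amount`) are hypothetical, so adjust them to whatever your tracker uses. It writes to a temporary file first and then swaps it in with `os.replace`, so a crash in the middle of a write should not leave you with a half-written CSV.

```python
import os
import tempfile

import pandas as pd


def save_csv(df, path):
    """Write df to path via a temporary file in the same folder, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    os.close(fd)  # pandas reopens the file by name
    try:
        df.to_csv(tmp_path, index=False)
        os.replace(tmp_path, path)  # replaces the old file in one step
    except Exception:
        # Clean up the temporary file and let the caller see the error.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def add_transaction(df, path, date, description, amount):
    """Append one transaction, save right away, and return a status message."""
    # Column names here are assumptions for illustration.
    new_row = pd.DataFrame(
        [{"date": date, "description": description, "amount": amount}]
    )
    df = pd.concat([df, new_row], ignore_index=True)
    save_csv(df, path)
    return df, f"Saved {len(df)} transactions to {path}"
```

You would call `add_transaction` whenever the user presses enter and then show the returned message, so the user always knows the state of the saved data. For a batch of entries, you could append them all to the DataFrame first and call `save_csv` once at the end.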
Hope that helps!
Also, feel free to link the code for a review.