What are some common things to do when cleaning data?



In the context of this exercise, what are some other things to keep in mind when cleaning data?


Cleaning can be a very important step of the data process because it makes the data organized and usable for evaluation and analysis.

When cleaning data, we usually want to normalize and categorize it. Normalizing reduces redundancy, typically by splitting a table or dataframe into separate tables or dataframes so that each fact is stored only once. Pandas includes some useful methods for cleaning: drop_duplicates() removes repeated rows, and dropna() removes rows or columns that contain NaN or None values.
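As a rough illustration of removing duplicates and splitting a dataframe, here is a minimal sketch. The orders data and the column names are made up for this example and are not part of the exercise.

```python
import pandas as pd

# Hypothetical order data: one row is repeated, and customer details repeat
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "customer": ["Ann", "Bob", "Bob", "Ann"],
    "city": ["Oslo", "Lima", "Lima", "Oslo"],
    "amount": [10.0, 25.0, 25.0, 7.5],
})

# Remove rows that are exact duplicates of an earlier row
orders = orders.drop_duplicates().reset_index(drop=True)

# Split customer details into their own dataframe so each customer is stored once
customers = orders[["customer", "city"]].drop_duplicates().reset_index(drop=True)

# Keep only the order-specific columns in the orders dataframe
orders = orders[["order_id", "customer", "amount"]]
```

After this, orders keeps one row per order and customers keeps one row per customer, linked by the customer column.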

Furthermore, we can remove missing or invalid data, if it is reasonable to do so, since it might otherwise distort our evaluation.
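One way this could look in pandas is sketched below. The sensor readings, the -999.0 placeholder, and the plausible temperature range are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical readings: one missing value and one placeholder for a bad reading
readings = pd.DataFrame({
    "sensor": ["a", "b", "c", "d"],
    "temp_c": [21.5, np.nan, -999.0, 19.0],
})

# Drop rows where the temperature is missing
cleaned = readings.dropna(subset=["temp_c"])

# Drop rows outside an assumed plausible range (bounds are inclusive)
cleaned = cleaned[cleaned["temp_c"].between(-90, 60)]
```

Whether dropping rows is reasonable depends on how much data is lost; if many rows are affected, filling or flagging the values may be a better choice.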