Problem with date column in python pandas


Here in the dataset on using the pivot function it converts column of dates monthwise like
1st of every month , then 2nd of every month … I want it to be in ascending order instead. Like it should show all dates of january first . Please tell me how to do it .

@systemrunner30812,

If you look closely, the original rows are ordered by index, not by date. It just so happens that the index is in the same order as the date. Once you pivot the table, Pandas orders the columns as best it can by the values in the columns= column that you provide. Pandas defaults to the American way of parsing dates, which is MM/DD/YY or %m/%d/%Y if you are using strftime syntax.

Since you clearly want your date columns to be sorted by DD-MM-YYYY, there are a couple of ways to go about this. The first and easiest one is by specifying the date format when you read the CSV with Pandas. If you look at the documentation, you will see that read_csv() has many, many parameters that you can supply it. Here, you would want to use the parse_dates parameter and the dayfirst parameter. You could also try infer_datetime_format and see if it is faster than dayfirst. Either one should work in this case. Once you’ve read the csv using those parameters, your pivot table should have the columns in the correct order.

Another way you could do this is by using pd.to_datetime() to convert the date string into a datetime object (see documentation here), and assuming you supplied it the correct strftime syntax Pandas would then order the columns just fine. However, I think it is best to use the first method and not have to worry about an additional data manipulation step.

3 Likes


See there , I tried using “pd.to_datetime” but after 11-12 rows it doesn’t function properly and messes up .

PS: there are no issues in date column of original csv file … all entries are in dd-mm-yyyy format

Did you read the documentation I linked to? You should have included the strftime syntax in pd.to_datetime() . Also, I never said that there was an issue in the original csv, I think you misunderstood what I was saying.

The problem is that Pandas automatically assumes those dates are mm-dd-yyyy when it converts the csv into a DataFrame. In order to make sure that it parses those dates correctly on reading the csv, you should use the parse_dates and dayfirst parameters that I mentioned earlier. Please read and follow the documentation, as that is the best way to learn and understand how to solve these problems in the future.

1 Like

I tried using dayfirst and parse_dates … It sorts the date in pivot table but in YYYY-MM-DD format … If I use strftime funciton to convert it to DD-MM-YYYY it again changes dtype of column to “object” which again shuffles the order of dates in pivot table.
I tried surfing the documentation but couldn’t find any fix to it … :frowning:

Datetime objects are in YYYY-MM-DD format by nature. The important thing is that your columns are in the correct order.

I just tested this out and all three options I explained above give you your columns in the correct order:

# Option 1: parse_dates + dayfirst
df = pd.read_csv('test.csv', parse_dates=[0], dayfirst=True)
df_pivot = df.pivot(columns='date', index='state', values='cases')

# Option 2: parse_dates + infer_datetime_format
df = pd.read_csv('test.csv', parse_dates=[0], infer_datetime_format=True)
df_pivot = df.pivot(columns='date', index='state', values='cases')

# Option 3: Read CSV normally + convert `date` with to_datetime afterward
df = pd.read_csv('test.csv')
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
df_pivot = df.pivot(columns='date', index='state', values='cases')

These column names will look something like this if you print them out:
image

Or like this if you just call df_pivot inside the Jupyter notebook:

As you can see, the columns are clearly in the correct order by date.

Now, if for some reason you don’t like the names of the columns, those are easily changed now that we have them in the correct order. Pandas DataFrames have a .rename() method (please check out the documentation here) where we can pass in a dictionary to rename the columns. The dictionary should be in the format: {current_name1: new_name1}.

Although you could make this dictionary by hand, e.g., {'2020-01-30': '30-01-2020'}, this would take forever and be highly prone to error. So, the best method is using dictionary comprehension.

Give it a try and see if you can figure it out. If you’ve done your best and you’re still confused, check out the solution below.

Click here if you can't figure it out!
# Note: you probably don't need to import datetime to use strftime here because
# the dates are already datetime objects within Pandas.
df_pivot.rename(columns={col: col.strftime('%d-%m-%Y') for col in df_pivot.columns})

Happy coding!

3 Likes

Thank you for the help ! :star_struck: