Netflix Data Capstone

I’m working on the Netflix Data capstone project in Data Visualization for Python: https://www.codecademy.com/paths/visualize-data-with-python/tracks/capstone-projects-dvp/modules/capstone-projects-dvp/informationals/capstone-project-netflix-data

In one of the final steps in the Jupyter Notebook it has me making two subplots. I followed the directions and got it right, however I didn’t like the formatting for the ticks. So I changed the xticklabel formatting and it’s working, however it’s giving a funky warning I haven’t seen before. Is anyone able to explain what the warning means?


[Image: the error and the charts]

[codebyte]
import matplotlib.pyplot as plt

# netflix_stocks and dowjones_stocks are the DataFrames loaded earlier in the notebook
f = plt.figure(figsize=(20, 3))

# Left plot Netflix
ax1 = plt.subplot(1, 2, 1)
plt.plot(netflix_stocks['Date'], netflix_stocks['Price'])
ax1.set_title("Netflix")
ax1.set_xlabel('Date')
ax1.set_ylabel('Price')
# This is the call that produces the warning
ax1.set_xticklabels(netflix_stocks['Date'], rotation=45)

# Right plot Dow Jones
ax2 = plt.subplot(1, 2, 2)
plt.plot(dowjones_stocks['Date'], dowjones_stocks['Price'])
ax2.set_title("Dow Jones")
ax2.set_xlabel('Date')
ax2.set_ylabel('Price')
ax2.set_xticklabels(dowjones_stocks['Date'], rotation=45)

plt.subplots_adjust(wspace=.5)
[/codebyte]

You can hunt down that particular warning with a web search and you’ll probably find the information you need. It is only a warning. It’s raised because ax.set_xticklabels attaches fixed strings to the ticks (ticks that used to be labelled by value are now associated with a fixed string). For example, your strings might wind up in the wrong tick position compared to what you expect, or you might get problematic overlaps. The point is that the behaviour can be unpredictable, so you should explicitly state the tick positions if you’re using static labels (it’s not interpreting them as dates, just strings).

The docs also mention this warning, and you may find it worthwhile to have a little look into how the ticks are a) positioned and b) formatted (there are also nice tools in there like EngFormatter)- https://matplotlib.org/stable/api/ticker_api.html#module-matplotlib.ticker
I think a little look into the background might be beneficial in the long run so you know what you can do with labelling. If you want organised date formatting, check out- https://matplotlib.org/stable/api/dates_api.html#date-formatters
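
To give a rough idea of what the date formatters look like in use, here is a minimal sketch. It assumes the x values are real date objects rather than strings (for a DataFrame column you would need to convert first, e.g. with something like pandas.to_datetime), and the dates and prices below are made up purely for illustration, they are not the capstone data.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical data: one point per month of 2017, invented prices
dates = [dt.date(2017, month, 1) for month in range(1, 13)]
prices = [140, 142, 147, 152, 163, 149, 181, 174, 181, 196, 187, 191]

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(dates, prices)

# The locator decides WHERE the ticks go, the formatter decides HOW they are written
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))

# Rotate the date labels and right-align them with their ticks
fig.autofmt_xdate(rotation=45)

fig.canvas.draw()
labels = [t.get_text() for t in ax.get_xticklabels()]
```

Because matplotlib knows these are dates, it places and formats the ticks itself, so there is no need to call set_xticklabels at all.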

See the following for some workarounds if you’d rather not go as deep as the docs, or have a look around, as there are a few other similar posts-

One minor thing I personally like is when the end of the string (the date here) aligns with the tick (at the minute it’s kind of centred). You can do this by passing horizontalalignment="right" as a kwarg to .set_xticklabels. If you wanted to, that is.


In that S.O. link there’s a comment a little ways down on the page that says if one uses the set_xticks method before set_xticklabels that should sort it out(?) That same comment mentions a Pandas bug report. Wouldn’t that work?
https://github.com/pandas-dev/pandas/issues/35684


Sounds reasonable to me. The docs say it sets the tick locations (https://matplotlib.org/stable/api/_as_gen/matplotlib.axes.Axes.set_xticks.html#matplotlib.axes.Axes.set_xticks), which I guess makes the locations fixed so you get around the issue.

The issue comes about because the x-axis normally uses its units to space the ticks. As a rather silly example, when plotting values from 0 to 10 you can set the “xticks” to numerical values (increments of 0.1 from 0.4 to 0.9 here) and it will use their numerical locations as their labels-
[Image: silly]
Rather pointless in that example, but it can be beneficial when you want custom ticks and don’t want to have to set both position and label when they’re identical.
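
For anyone who wants to try the silly example, something along these lines should reproduce it. This is only a sketch: the plotted data (range(11)) is my assumption for “values from 0 to 10”.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(range(11))  # assumed stand-in for "values from 0 to 10"

# Only positions are given; matplotlib writes each tick's own value as its label
tick_positions = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
ax.set_xticks(tick_positions)

fig.canvas.draw()
print(ax.get_xticks())
print([t.get_text() for t in ax.get_xticklabels()])
```

The ticks all end up crammed together near the left of the axis, which is why it is silly, but note that no labels had to be supplied.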

When you add strings as labels directly it doesn’t automatically know where on the axis to put them. It might try to put labels on existing tick locations, but this method is unreliable (I’ve definitely seen labels go missing entirely when relying on this, outside the range for example).

What you need is to explicitly state where on the axis the ticks should be and then you can make the labels for those ticks whatever you want. The warning is just because matplotlib does not guarantee that setting labels on ticks that used to be automatically placed on an axis will work correctly (this might be a silly error like a missing label or the labels could be shifted and suddenly your graph is wrong, it’s just a bit unpredictable).
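
As a minimal sketch of “positions first, labels second” (the date strings and prices here are invented for illustration and plotted against plain integer positions, so this is not the capstone code itself):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt

# Hypothetical data: date strings and made-up prices
date_labels = ["2017-01-01", "2017-02-01", "2017-03-01",
               "2017-04-01", "2017-05-01", "2017-06-01"]
prices = [140, 142, 147, 152, 163, 149]
positions = list(range(len(date_labels)))  # 0, 1, 2, ... one slot per data point

fig, ax = plt.subplots()
ax.plot(positions, prices)

# 1) Say explicitly where the ticks go ...
ax.set_xticks(positions)
# 2) ... and only then attach the strings, one per tick position
ax.set_xticklabels(date_labels, rotation=45, horizontalalignment="right")

fig.canvas.draw()
shown = [t.get_text() for t in ax.get_xticklabels()]
print(shown)
```

With the tick locations fixed up front, each string has a known position to sit on, which is the situation the warning is asking for.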


Edit: Just found a simple example of this in practice. Depending on your viewer/screen size this could be different but I’ll show what happens on mine.

fig, ax = plt.subplots(1, 1)
ax.plot(range(10))

At this point the graph is boring but clear. There are 5 major VISIBLE ticks (0, 2, 4, 6, 8) and the range of the xaxis is perhaps around -0.2 to 9.8

What happens then if I force the tick labels…

ax.set_xticklabels(["red", "orange", "yellow", "green", "blue"])
# At this point the output of xticklabels is-
#[Text(-2.0, 0, 'red'),
# Text(0.0, 0, 'orange'),
# Text(2.0, 0, 'yellow'),
# Text(4.0, 0, 'green'),
# Text(6.0, 0, 'blue'),
# Text(8.0, 0, ''),
# Text(10.0, 0, '')]

If you note the locations and labels you’ll already see they’ve gone a bit wonky. Some are outside our normal range, others have just gone missing. Updating the graph now with fig.canvas.draw() I see-

So we’ve lost “red” and also introduced some empty ticks. As for why: the original ticks were at these locations: [-2., 0., 2., 4., 6., 8., 10.]. We can’t even see some of these ticks in the range we have, but they’re there nonetheless.

There are ways around this, but I think the warning basically says DON’T ASSUME tick locations when applying text labels. They’re a fickle beast.
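
If you want to check this on your own screen, a sketch like the following lists which automatic ticks fall outside the axis limits and then pins the labels to the visible ones only. The exact tick locations may differ from mine depending on figure size, so the labels are cycled to match however many visible ticks you get.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs anywhere; not needed in a notebook
import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)
ax.plot(range(10))
fig.canvas.draw()

# Compare the automatic tick locations with the visible x range
tick_locations = [float(x) for x in ax.get_xticks()]
xmin, xmax = ax.get_xlim()
visible = [x for x in tick_locations if xmin <= x <= xmax]
hidden = [x for x in tick_locations if not (xmin <= x <= xmax)]
print("all ticks:", tick_locations)
print("visible:  ", visible)
print("hidden:   ", hidden)

# One way around it: fix the ticks to the visible locations, then label them
colours = ["red", "orange", "yellow", "green", "blue"]
labels = [colours[i % len(colours)] for i in range(len(visible))]
ax.set_xticks(visible)
ax.set_xticklabels(labels)

fig.canvas.draw()
shown = [t.get_text() for t in ax.get_xticklabels()]
print(shown)
```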


@tgrtim and @lisalisaj Thank you for the explanation. I didn’t even think to use ax.set_xticks before labeling them. Also, thank you for the clear, deeper explanation of how the labels and ticks work! Y’all are the best!
