What are some differences between Pandas, NumPy and Matplotlib?

Question

What are some differences between the Python data science modules Pandas, NumPy and Matplotlib?

Answer

Although they may appear similar, these modules have unique purposes and functionalities.

The Pandas module is used for working with tabular data. It lets us load data that is already in table form, such as CSV files or the results of SQL database queries. We can also create tables of our own, and edit or add columns and rows. Pandas provides two powerful objects for this: the DataFrame, which represents a whole table, and the Series, which represents a single column. Both are very useful for working with and analyzing data.
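As a rough sketch of what that looks like in practice (the column names and values below are made up for illustration):

```python
import pandas as pd

# Build a small table by hand; names and scores are illustrative only.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cho"],
    "score": [82, 91, 77],
})

# Add a new column computed from an existing one.
df["passed"] = df["score"] >= 80

# Add a new row by concatenating a one-row DataFrame.
new_row = pd.DataFrame({"name": ["Dee"], "score": [88], "passed": [True]})
df = pd.concat([df, new_row], ignore_index=True)

# Selecting a single column gives back a Series.
scores = df["score"]

print(df)
print(scores.mean())
```

For data that already lives in a file, `pd.read_csv` with a file path returns a DataFrame in the same way.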

The NumPy module is mainly used for working with numerical data. It provides us with a powerful object known as an array. With arrays, we can apply a mathematical operation to every value in the array at the same time, and also perform element-wise operations between different arrays. NumPy supports matrix operations such as the dot product as well.
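A minimal sketch of those array operations, using small made-up arrays:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([10, 20, 30, 40])

doubled = a * 2   # every element is multiplied at once, no loop needed
total = a + b     # element-wise addition between two arrays
product = a * b   # element-wise multiplication, not matrix multiplication
dot = a @ b       # dot product of the two arrays

print(doubled, total, product, dot)
```

Note that `*` between two arrays works element by element; the `@` operator is the one that does matrix-style multiplication.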

Last, but not least, the Matplotlib module is used for data visualization. It provides functionality for us to draw charts and graphs, so that we can better understand and present the data visually.
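As an illustration, here is a small line chart. The data and the output file name `sales.png` are made up, and the `Agg` backend is selected only so the example runs without a display; in a notebook you would usually skip that line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt

# Illustrative data only.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 135, 160, 150]

fig, ax = plt.subplots()
ax.plot(months, sales, marker="o")  # draw a line with a marker at each point
ax.set_title("Monthly sales")
ax.set_xlabel("Month")
ax.set_ylabel("Units sold")

fig.savefig("sales.png")  # write the chart to an image file
```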

Each module excels at a different task, and together they allow us to analyze, manipulate and visualize data in very useful ways.

How is seaborn different from matplotlib?

I know this is late but I’ll try to answer anyway. I think seaborn is made to present patterns & findings to stakeholders with little to no technical knowledge (business partners, directors, etc.), and matplotlib is more like a quick-view plotting package for doing some initial exploratory data analysis.

Matplotlib is also more robust iirc; we’ll be using a lot of matplotlib if we’re to plot the performance of optimization algorithms on a training dataset.

A quick-view option :open_mouth: heresy! My opinion would land seaborn as the quick presentation option. But I’m biased heavily against some of seaborn’s defaults so take my opinion with a grain of salt :wink:.

Since we’re on an FAQ it’s worth noting that matplotlib is the actual plotting library; it does all the heavy lifting (seaborn requires matplotlib). You can create identical figures with matplotlib alone, but seaborn is intended to make this task easier.

It acts as more of a wrapper on top of matplotlib that introduces a number of defaults and useful functions designed to create a visually appealing plot quickly. In many cases you can almost treat it like a selection of handy style options for visualisation, but it is also designed to work with the pandas library, which many users find very useful.

If you want full control over customising a plot you need to work with matplotlib, whereas in some cases seaborn can reduce the amount of time and effort needed to create presentable figures.
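A small sketch of how the two fit together, assuming seaborn is installed. The DataFrame contents and the file name `scores.png` are made up for illustration. Seaborn draws the plot straight from the pandas columns, and the result is still an ordinary matplotlib Axes that you can customise further.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Illustrative data only.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 58, 65, 70, 78, 85],
    "group": ["a", "a", "a", "b", "b", "b"],
})

fig, ax = plt.subplots()

# Seaborn: one call picks colours per group and adds a legend.
sns.scatterplot(data=df, x="hours", y="score", hue="group", ax=ax)

# Matplotlib: fine-tune the same Axes afterwards.
ax.set_title("Score vs. hours studied")
ax.set_xlim(0, 7)

fig.savefig("scores.png")
```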

For anyone just starting out it’s certainly worth having a look at seaborn’s options to see if they suit your intended visualisations.

Quite a few sites discuss the two, so if you need more input have a search around. For example:
https://towardsdatascience.com/matplotlib-seaborn-pandas-an-ideal-amalgamation-for-statistical-data-visualisation-f619c8e8baa3?gi=d7a9d74e73a7

Amazing explanation, Thanks a lot!
