Suppose we have 2 dataframes of appointments for 2 doctors, and wish to have a common list. Now, in this case, the columns are identical for two dataframes, so no matter what I use, an outer merge, or a concat, the output should be the same. Right?
What is the difference between the two, and and in what scenarios is this difference highlighted?
@cloudsolver36218,
The short answer is that pd.concat()
basically stitches two DataFrames together along either the rows or the columns.
pd.merge()
, on the other hand, uses a database-style join on the specified columns (or if none are given, it will default to the columns that match each other).
When you are trying to achieve a full outer merge, you can get very similar results with the two. However, with pd.concat()
, you are selecting either axis=0
or axis=1
, instead of which column(s) to merge on. A default concat in Pandas will basically mimic a UNION ALL in SQL.
The reality, however, is that the distinction gets way more complicated depending on what you are trying to do. For a very in-depth discussion on the differences and use cases of pd.concat()
, pd.append()
, DataFrame.join()
and pd.merge()
, see the documentation here.
Another great way to determine which one is better for your use case is to experiment with a couple of small DataFrames. As a starting point, you can use this example that I wrote up on repl.it.
Hope this helps clear some of the confusion.
Happy coding!
1 Like