Suppose we have 2 dataframes of appointments for 2 doctors, and wish to have a common list. Now, in this case, the columns are identical for two dataframes, so no matter what I use, an outer merge, or a concat, the output should be the same. Right?
What is the difference between the two, and in what scenarios does this difference show up?
The short answer is that pd.concat() stitches two DataFrames together along either the rows or the columns. pd.merge(), on the other hand, performs a database-style join on the specified columns (or, if none are given, on the columns the two DataFrames have in common).
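To make that concrete for the two-doctors case, here is a minimal sketch. The column names and appointment rows are made up for illustration, and one row is deliberately identical in both frames, since that is where the two functions diverge.

```python
import pandas as pd

# Hypothetical appointment lists; names and values are illustrative only.
dr_a = pd.DataFrame({
    "patient": ["Ann", "Bob"],
    "time": ["09:00", "10:00"],
})
dr_b = pd.DataFrame({
    "patient": ["Bob", "Cara"],
    "time": ["10:00", "11:00"],
})

# concat stacks the rows as they are, so ("Bob", "10:00") shows up twice.
stacked = pd.concat([dr_a, dr_b], ignore_index=True)

# merge with no `on` argument joins on every shared column, so the
# identical row matches itself and shows up only once.
merged = pd.merge(dr_a, dr_b, how="outer")

print(len(stacked))  # 4
print(len(merged))   # 3
```

So with identical columns the outputs only agree when no row appears in both frames. An outer merge may also return the rows in a different order than concat does.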
When you are trying to achieve a full outer merge, you can get very similar results with the two. However, with pd.concat() you choose an axis (axis=0 to stack rows, or axis=1 to place columns side by side) instead of choosing which column(s) to merge on. A default concat in pandas behaves much like a UNION ALL in SQL.
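A small sketch of the axis choice and the UNION ALL comparison, using two made-up frames (the names left and right and their contents are hypothetical). Dropping duplicates after the concat is one way to approximate a plain UNION.

```python
import pandas as pd

# Hypothetical frames that share one identical row: (2, "b").
left = pd.DataFrame({"id": [1, 2], "x": ["a", "b"]})
right = pd.DataFrame({"id": [2, 3], "x": ["b", "c"]})

# axis=0 stacks rows and keeps duplicates, much like UNION ALL.
rows = pd.concat([left, right], axis=0, ignore_index=True)

# axis=1 places the frames side by side, aligned on the index
# (not on the "id" column), so the column names repeat.
cols = pd.concat([left, right], axis=1)

# Dropping duplicate rows afterwards behaves roughly like UNION.
union = rows.drop_duplicates().reset_index(drop=True)

print(rows.shape)   # (4, 2)
print(cols.shape)   # (2, 4)
print(union.shape)  # (3, 2)
```

Note that axis=1 never looks at the values in "id". If you need rows matched on a key column, that is the job of pd.merge().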
The reality, however, is that the distinction gets more complicated depending on what you are trying to do. For a very in-depth discussion of the differences and use cases of pd.concat() and pd.merge(), see the documentation here.
Another great way to determine which one is better for your use case is to experiment with a couple of small DataFrames. As a starting point, you can use this example that I wrote up on repl.it.
Hope this helps clear up some of the confusion.