In the context of this exercise, if two dataframes share more than one column name, how are they merged? Are they just merged on the first matching column, or every matching column?
Answer
If the two dataframes share more than one column name and no on argument is given, the merge uses every shared column as a join key, not just the first one.
By default, pd.merge() performs an inner join. With an inner join, a row is returned only if its values match in every shared column.
Rows that agree in one shared column but differ in another are left out of the result.
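A minimal sketch of the behaviour described above. The dataframes sales and targets and their values are made up for illustration and are not the exercise's data; the only assumption that matters is that both share the columns month and product.

```python
import pandas as pd

# Hypothetical data: both frames share the columns "month" and "product".
sales = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb"],
    "product": ["A", "B", "A"],
    "revenue": [100, 200, 150],
})
targets = pd.DataFrame({
    "month": ["Jan", "Feb", "Feb"],
    "product": ["A", "A", "B"],
    "target": [90, 160, 120],
})

# No `on` argument: pandas joins on every column name the frames share,
# and the default how="inner" keeps only rows that match on all of them.
merged = pd.merge(sales, targets)
print(merged)
```

Only the (Jan, A) and (Feb, A) rows survive here. (Jan, B) and (Feb, B) each match on month but not on product, so they are dropped. To join on a single column instead, you can pass it explicitly, for example pd.merge(sales, targets, on="month").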
I was carrying out the requested exercise and I tried to produce some code which saved the rows where revenue was more than target as a separate variable. I’m not sure why my code didn’t work, if anyone could explain that would be much appreciated.
This line evaluates to True or False for each row, not to a filtered table. It compares revenue with target row by row and produces a boolean Series that is True where revenue is greater than target and False otherwise.
Use
crushing_it = sales_vs_targets[sales_vs_targets.revenue > sales_vs_targets.target]
Or
crushing_it = sales_vs_targets.loc[sales_vs_targets.revenue > sales_vs_targets.target]
crushing_it = sales_vs_targets.revenue > sales_vs_targets.target only compares the two columns, so crushing_it ends up holding a True/False value for each row rather than the rows themselves. We still need to tell pandas which dataframe to take the rows from.
So your code needs to name the dataframe and put the comparison inside its square brackets, like this: crushing_it = sales_vs_targets[sales_vs_targets.revenue > sales_vs_targets.target]
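Here is a small sketch of the difference between the comparison on its own and the comparison used as a filter. The values in sales_vs_targets are invented for illustration, and the name mask is just a convenience variable, not something from the exercise.

```python
import pandas as pd

# Hypothetical stand-in for the exercise's dataframe.
sales_vs_targets = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar"],
    "revenue": [300, 290, 310],
    "target": [250, 300, 300],
})

# The comparison alone gives one True/False per row (a boolean Series).
mask = sales_vs_targets.revenue > sales_vs_targets.target
print(mask.tolist())

# Passing that Series back into the dataframe keeps only the True rows.
crushing_it = sales_vs_targets[mask]
print(crushing_it)
```

With this data, mask is True for Jan and Mar, so crushing_it contains those two rows with all of their columns.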