Question
In the context of this exercise, for the provided dataframe, what was the purpose of the “label” value?
Answer
In this exercise, we are given the following dataframe:
n=500
df = pd.DataFrame({
"label": ["set_one"] * n + ["set_two"] * n + ["set_three"] * n + ["set_four"] * n,
"value": np.concatenate([set_one, set_two, set_three, set_four])
})
The “label” column supplies the x value for every y value in the plot: “label” holds the x values and “value” holds the y values.
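As a rough illustration of this pairing, here is a minimal sketch with a scaled-down dataframe. The random data, the seed, and n = 3 are assumptions made for readability, and only two of the four datasets are used; the exercise's real set_one and set_two are not shown here. Grouping by "label" shows that the column records which dataset each value came from.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the exercise's datasets (assumed, not the real data)
rng = np.random.default_rng(0)
n = 3  # the exercise uses 500; 3 keeps the output easy to inspect
set_one = rng.normal(0, 1, n)
set_two = rng.normal(5, 1, n)

df = pd.DataFrame({
    "label": ["set_one"] * n + ["set_two"] * n,
    "value": np.concatenate([set_one, set_two]),
})

# Each row pairs one label (x) with one value (y).
# Counting rows per label recovers the size of each original dataset.
group_sizes = df.groupby("label").size()
print(df)
print(group_sizes)
```

Every row carries both an x ("label") and a y ("value"), so a plotting function can tell which values belong together without being handed four separate arrays.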
Each of the datasets set_one, set_two, set_three, and set_four has 500 values, which is why the variable n has been set to 500.
This gives us a total of 500 + 500 + 500 + 500 = 2000 values in the concatenated dataset. As a result, we need 500 x values for each dataset. The x values will be set as the strings "set_one", "set_two", "set_three", and "set_four". So,
["set_one"] * n + ["set_two"] * n + ["set_three"] * n + ["set_four"] * n
=
["set_one"] * 500 + ["set_two"] * 500 + ["set_three"] * 500 + ["set_four"] * 500
# gives us 500 of each label string,
# for a total of 2000 x values in a single list:
["set_one", ..., "set_two", ..., "set_three", ..., "set_four", ...]
We thus have 2000 x values, each paired one-to-one with one of the 2000 y values.
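The list arithmetic can be checked in plain Python. This is a small sketch, assuming n = 3 instead of 500 and a made-up list of y values (both chosen only to keep the output short):

```python
n = 3  # the exercise uses 500

# Multiplying a one-element list by n repeats it n times;
# + concatenates the repeated lists into one long list.
labels = ["set_one"] * n + ["set_two"] * n

# Hypothetical y values standing in for the concatenated datasets
values = [0.1, 0.2, 0.3, 5.1, 5.2, 5.3]

# zip pairs the i-th label with the i-th value, one-to-one
pairs = list(zip(labels, values))
print(labels)
print(pairs)
```

Because both lists have the same length and the same ordering, position i in the label list always describes position i in the value list, which is the same row-by-row pairing the dataframe relies on.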