For the provided dataframe, what was the purpose of the "label" value?


#1

Question

In the context of this exercise, for the provided dataframe, what was the purpose of the “label” value?

Answer

In this exercise, we are provided with the following dataframe:

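import numpy as np
import pandas as pd

# set_one ... set_four are the exercise's datasets of 500 values each
n = 500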
df = pd.DataFrame({
    "label": ["set_one"] * n + ["set_two"] * n + ["set_three"] * n + ["set_four"] * n,
    "value": np.concatenate([set_one, set_two, set_three, set_four])
})

The purpose of the “label” value is to provide an x value for every y value in the plot: the “label” column holds the x values, and the “value” column holds the y values.
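To illustrate how a label column can act as the x value for each row, here is a minimal sketch. It assumes small random stand-in arrays (the exercise's actual set_one and set_two are not reproduced here) and uses n = 5 only to keep things readable. Grouping by "label" recovers each dataset's values, which is roughly what a plotting library does when it draws one group per x category.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; the exercise's real datasets are not shown here.
rng = np.random.default_rng(0)
n = 5  # small n for readability; the exercise uses n = 500
set_one = rng.normal(0, 1, n)
set_two = rng.normal(5, 1, n)

df = pd.DataFrame({
    "label": ["set_one"] * n + ["set_two"] * n,
    "value": np.concatenate([set_one, set_two]),
})

# Each row now carries its own x value ("label") next to its y value ("value").
# Grouping by "label" splits the rows back into the original datasets.
groups = {name: group["value"].to_numpy() for name, group in df.groupby("label")}

print(sorted(groups))          # the distinct x categories
print(len(groups["set_one"]))  # number of y values under the "set_one" label
```

Without the "label" column, the concatenated "value" column alone would not record which dataset each number came from.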

Each of the datasets set_one, set_two, set_three, and set_four has 500 values, which is why the variable n is set to 500.

This gives us a total of 500 + 500 + 500 + 500 = 2000 values in the concatenated dataset. Each dataset therefore needs 500 x values, one for each of its y values. The x values are the strings "set_one", "set_two", "set_three", and "set_four". So,

["set_one"] * n + ["set_two"] * n + ["set_three"] * n + ["set_four"] * n
= 
["set_one"] * 500 + ["set_two"] * 500 + ["set_three"] * 500 + ["set_four"] * 500

# gives us 500 of each label string, 
# for a total of 2000 x values in a single list:
["set_one", ..., "set_two", ..., "set_three", ..., "set_four", ...]

We thus have 2000 x values, each paired with exactly one of the 2000 y values.
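The counting above can be checked with a short sketch. It uses plain Python lists as hypothetical stand-ins for the four datasets (constant values 1.0 to 4.0, chosen only so the pairing is easy to see), and list concatenation in place of np.concatenate.

```python
n = 500  # same n as in the exercise

# Hypothetical stand-ins for the exercise's datasets: four sequences of length n.
set_one = [1.0] * n
set_two = [2.0] * n
set_three = [3.0] * n
set_four = [4.0] * n

# Repeating a one-element list n times gives n copies of the label string;
# adding the lists joins them end to end.
labels = ["set_one"] * n + ["set_two"] * n + ["set_three"] * n + ["set_four"] * n
values = set_one + set_two + set_three + set_four  # list analogue of np.concatenate

# Both lists have 4 * n entries, so position i of labels pairs with position i of values.
pairs = list(zip(labels, values))

print(len(labels), len(values))
print(pairs[0], pairs[n - 1], pairs[n], pairs[-1])
```

The label changes exactly where one dataset ends and the next begins (index 500, 1000, and 1500), which is why the order of the labels must match the order of the concatenated values.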