Off-platform Project with R - Coronavirus Pandemic - extra challenges

Hello! I am following the Analyze Data with R skill path and I am currently a bit stuck on the coronavirus off-platform project and its extra challenges. I've managed to solve the third extra challenge, where you are supposed to create a side-by-side boxplot, but my code seems a bit ugly and long. Could you please suggest a more concise way of arriving at the same result?

Here is what I managed to do:

Firstly, I selected only the needed columns:

confirmed_day_by_day <- confirmed %>%
  group_by(`Country/Region`) %>%
  select(-Lat, -Long, -`Province/State`)

Then I transposed the data:

transposed_conf_day_by_day <- confirmed_day_by_day %>%
  t() %>%

transposed_conf_day_by_day <- transposed_conf_day_by_day %>%
  row_to_names(row_number = 1) %>%
  apply(MARGIN = 2, as.numeric)

Then I combined the columns with the same name into one:

transposed_conf_day_by_day <- t(rowsum(t(transposed_conf_day_by_day), group = colnames(transposed_conf_day_by_day), na.rm = TRUE))

transposed_conf_day_by_day <-

Finally I selected only the values for Italy:

italy_day_by_day_conf <- transposed_conf_day_by_day %>%
  select(Italy, date) %>%
  rename(confirmed = Italy)

Then I did the same with the recovered and deaths tables. Finally, I joined all the Italy tables into one:

italy_day_by_day <- italy_day_by_day_conf %>%
  full_join(italy_day_by_day_recov) %>%

Then I converted the data to long format and created a graph:

italy_day_by_day_long <- italy_day_by_day %>%
  gather(event, total, confirmed:recovered:death)

ggplot(italy_day_by_day_long, aes(x=date, y=total, fill = event)) +
  geom_bar(stat = 'identity', position = 'dodge') +
  labs(x='Days From January 20, 2020', y='Total', title = 'COVID 19 in Italy')

That's it. But it feels like there should be a better way of completing this task. I would be very grateful for your help!

Any feedback is also welcome!

Thanks and have a great day!

It looks like you have done a good job with your code! However, there is indeed a simpler way to achieve the same result.

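Instead of transposing the data and joining the tables, you can use the pivot_longer() function from the tidyr package to convert the data from wide to long format. Here's an example for the confirmed table:

library(dplyr)
library(tidyr)
library(ggplot2)

italy_day_by_day <- confirmed %>%
  filter(`Country/Region` == "Italy") %>%
  select(-Lat, -Long, -`Province/State`) %>%
  # Every remaining column except the country name is a date column
  pivot_longer(cols = -`Country/Region`, names_to = "date", values_to = "total") %>%
  # Assumes the date column names look like 1/22/20 (month/day/year)
  mutate(date = as.Date(date, format = "%m/%d/%y"), event = "confirmed")

ggplot(italy_day_by_day, aes(x = date, y = total, fill = event)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(x = "Date", y = "Total", title = "COVID-19 in Italy")

Here, we first filter the confirmed dataset to keep only the rows for Italy, and then use pivot_longer() to turn the date columns into rows. Note that the backticks around `Country/Region` and `Province/State` are required, because those names contain a slash. The cols = -`Country/Region` argument tells pivot_longer() to keep that column as is and pivot all the other columns, which at this point are the dates. The names_to argument specifies the name of the new column that will hold the old column names (the dates), and the values_to argument specifies the name of the new column that will hold the old cell values (the counts). The mutate() step parses the dates so that ggplot2 orders them chronologically, and labels every row with its event type.

With the data in long format, ggplot2 draws the bars side by side when the event variable is mapped to the fill aesthetic and position = "dodge" is set. The pipeline above covers only the confirmed table. To get all three events into one plot, run the same steps on the recovered and deaths tables and stack the three results with bind_rows(). Stacking long tables avoids both the transposing and the joins, which makes the code much simpler.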
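To show how the three tables could be combined, here is a minimal sketch rather than a definitive solution. The helper name country_long() is made up for illustration, and the toy tables only imitate the column layout I am assuming your confirmed, recovered and deaths tables share (Province/State, Country/Region, Lat, Long, then one column per date named like 1/22/20). The numbers are invented. The group_by() and summarise() step adds up the rows per date, which matters for countries that are split into several provinces.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for one project table (made-up numbers, assumed column layout)
make_toy <- function(values) {
  tibble(
    `Province/State` = NA_character_,
    `Country/Region` = c("Italy", "Spain"),
    Lat = c(41.9, 40.5),
    Long = c(12.6, -3.7),
    `1/22/20` = values[1:2],
    `1/23/20` = values[3:4],
    `1/24/20` = values[5:6]
  )
}

confirmed <- make_toy(c(10, 5, 20, 8, 40, 12))
recovered <- make_toy(c(0, 0, 1, 0, 3, 1))
deaths    <- make_toy(c(0, 0, 2, 1, 5, 2))

# Hypothetical helper: one country's rows from one table, in long format
country_long <- function(df, country, event_name) {
  df %>%
    filter(`Country/Region` == country) %>%
    select(-Lat, -Long, -`Province/State`) %>%
    # Pivot the date columns into rows
    pivot_longer(cols = -`Country/Region`, names_to = "date", values_to = "total") %>%
    # Assumes date column names in month/day/year form
    mutate(date = as.Date(date, format = "%m/%d/%y")) %>%
    # Sum over provinces so there is one row per date
    group_by(date) %>%
    summarise(total = sum(total, na.rm = TRUE), .groups = "drop") %>%
    mutate(event = event_name)
}

# Stack the three long tables instead of joining wide ones
italy_day_by_day <- bind_rows(
  country_long(confirmed, "Italy", "confirmed"),
  country_long(recovered, "Italy", "recovered"),
  country_long(deaths, "Italy", "death")
)

print(italy_day_by_day)
```

The result has one row per date and event, with columns date, total and event, so a ggplot() call with aes(x = date, y = total, fill = event) and position = "dodge" can use it directly. With your real data you would skip make_toy() and pass your own three tables.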

I hope this helps! Let me know if you have any other questions.