Museums and Nature Centers - Museums by revenue

Hi, everyone!

I am working my way through the Museums and Nature Centers project and got stuck on the second part, Museums by revenue.

The link to the project:
https://www.codecademy.com/paths/analyze-data-with-r/tracks/data-visualization-in-r-skill-path/modules/intermediate-data-visualization-with-ggplot-2/projects/data-visualization-in-r-museums

Step 11
For the next few tasks, we’ll switch to looking at how much money each institution brought in and how that varies across geographies. Because we only have revenue data at the parent organization level, we’ll want to first filter our dataset to omit any duplicates. Next, we’ll create a few data frames from our starting data to look at different groups of museums by how much money they bring in.

Create a new data frame called museums_revenue_df that retains only unique values of Legal.Name in museums_df. Additionally, filter this data frame to include only entities with Annual.Revenue greater than 0.

Create a second data frame from museums_revenue_df (the first data frame we created in this task) called museums_small_df that retains only museums with Annual.Revenue less than $1,000,000.

Create a third data frame from museums_revenue_df (the first data frame we created in this task) called museums_large_df that retains only museums with Annual.Revenue greater than $1,000,000,000.

# Filter data frame
museums_revenue_df <- museums_df %>%
  distinct(Legal.Name, .keep_all = TRUE) %>%
  filter(Annual.Revenue > 0)

# Filter for only small museums
museums_small_df <- museums_revenue_df %>%
  filter(Annual.Revenue < 1000000)

# Filter for only large museums (the task asks for strictly greater than $1,000,000,000)
museums_large_df <- museums_revenue_df %>%
  filter(Annual.Revenue > 1000000000)

Step 12
Let’s start by visualizing the distribution of annual revenue for our small museums dataset. Create a histogram called revenue_histogram using museums_small_df with Annual.Revenue mapped to the x axis. Experiment with different binwidth values to see what works best for our data, considering that our x axis variable ranges from 0 to $1,000,000.

revenue_histogram <- ggplot(museums_small_df, aes(x = Annual.Revenue)) +
  geom_histogram(binwidth = 25000) +
  scale_x_continuous(scales::dollar_format())

revenue_histogram


The resulting plot doesn't seem to match my code. The histogram's x axis extends below 0, even though I filtered museums_revenue_df to values greater than 0.

Also, changing the binwidth doesn't seem to have any effect on the histogram. I tried multiple values and refreshed the page, but the plot stays the same.

Therefore I am stuck and I’d appreciate some help with this exercise.
Thank you! :slight_smile:

Hi, welcome to the forums!

I’m not fluent in R, but I believe this is happening with your x axis numbers because of the way R formats large numbers: by default it prints them in scientific notation.

This post on number formatting may help:
https://www.r-bloggers.com/2010/05/number-formatting/
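To make the formatting idea concrete, here is a minimal sketch, not a confirmed fix for the exercise. It assumes the ggplot2 and scales packages are installed, and toy_df is a made-up stand-in for museums_small_df. Note that the first positional argument of scale_x_continuous() is name (the axis title), so a label formatter has to be passed as labels = to affect the tick labels.

```r
library(ggplot2)

# Base R prints large round numbers in scientific notation by default
sci_text   <- format(1e6)                                      # "1e+06"
plain_text <- format(1e6, big.mark = ",", scientific = FALSE)  # "1,000,000"

# scales::dollar() formats a number as currency
dollar_text <- scales::dollar(1e6)                             # "$1,000,000"

# Hypothetical toy data standing in for museums_small_df
toy_df <- data.frame(Annual.Revenue = c(50000, 250000, 400000, 750000, 900000))

# Pass the formatter by name: labels = ..., not as the first argument
toy_histogram <- ggplot(toy_df, aes(x = Annual.Revenue)) +
  geom_histogram(binwidth = 100000) +
  scale_x_continuous(labels = scales::dollar_format())

toy_histogram
```

If this matches what is happening in the exercise, changing scale_x_continuous(scales::dollar_format()) to scale_x_continuous(labels = scales::dollar_format()) would be the first thing to try.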

Thank you for the advice, I will check the extra documentation!

Let us know if you’re able to fix it. :slight_smile:
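On the x axis extending below 0: one likely explanation, offered as a sketch rather than a confirmed diagnosis, is bin alignment. By default geom_histogram() centers bins on multiples of binwidth, so with binwidth = 25000 the first bin runs from -12,500 to 12,500 even when every value is positive. Setting boundary = 0 puts a bin edge at 0 instead. The data below is made up for illustration (toy_small_df is a hypothetical stand-in for museums_small_df).

```r
library(ggplot2)

# Hypothetical stand-in for museums_small_df: all revenues are above 0
toy_small_df <- data.frame(Annual.Revenue = seq(5000, 995000, by = 5000))

# Default alignment: bins are centered on multiples of binwidth
default_hist <- ggplot(toy_small_df, aes(x = Annual.Revenue)) +
  geom_histogram(binwidth = 25000)

# boundary = 0 places a bin edge at 0, so the first bin starts there
aligned_hist <- ggplot(toy_small_df, aes(x = Annual.Revenue)) +
  geom_histogram(binwidth = 25000, boundary = 0) +
  scale_x_continuous(labels = scales::dollar_format())

# Inspect the left edge of the first bin in each plot
default_left <- min(ggplot_build(default_hist)$data[[1]]$xmin)
aligned_left <- min(ggplot_build(aligned_hist)$data[[1]]$xmin)

default_left  # negative: the first bin straddles 0
aligned_left  # the first bin should now start at 0

aligned_hist
```

The data itself has not changed between the two plots; only where the bin edges fall.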