Learning Objectives

After completing this assignment, students should be able to:

  • understand the syntax of a ggplot2 object
  • graph raw data to diagnose quality-control (QC) problems and check completeness
  • match a ggplot2 geometry function to the appropriate QC task

Reading

Lecture Notes


Exercises

  1. -- Join fish tables --

    # see what's available in this week's data folder
    # (pattern is a regular expression: "\\.csv$" matches names ending in .csv)
    list.files("DataViz-26March2018/data/", pattern = "\\.csv$")
    
    

    Using the fish families dataset in this week's data folder, create a table with the following columns:

    • fish species
    • fish family
    • count

    Keep only the species with a count greater than 7.
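    One possible approach, sketched under assumptions: it uses dplyr, assumes the fish table has one row per fish caught (so counts come from count()), and assumes both tables share a species column. The column names species and family, and the toy values below, are hypothetical stand-ins for the real CSV files; check the actual names with names() after reading the files in.

    ```r
    library(dplyr)

    # Toy stand-ins for the two CSV files (hypothetical columns: species, family).
    # In the exercise these would come from read.csv() on the files listed above.
    fish <- data.frame(species = rep(c("alewife", "bluegill", "carp"), times = c(8, 3, 10)))
    families <- data.frame(
      species = c("alewife", "bluegill", "carp"),
      family  = c("Clupeidae", "Centrarchidae", "Cyprinidae")
    )

    fish_counts <- fish %>%
      count(species) %>%                       # one row per species, count stored in column n
      left_join(families, by = "species") %>%  # attach the family for each species
      filter(n > 7) %>%                        # keep species counted more than 7 times
      select(species, family, count = n)       # order columns and rename n to count

    fish_counts
    ```

    If the real fish table already has a count column, the count() step would be replaced by filtering on that column directly.
    
    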

  2. -- Chlorophyll a histogram --

    Using the lakes dataset, generate a histogram of chlorophyll a levels in the samples. (Hint: use geom_histogram().)

    You may see a message in the console that reads:

    `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
    

    Use the binwidth argument in the geom function to group the data into bins 10 units wide.
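    A minimal sketch of the binwidth fix, assuming the chlorophyll a column is called chla (a hypothetical name) and using made-up values in place of the real lakes file. Setting binwidth = 10 makes every bar span 10 units of chlorophyll a, and because the bin size is now given explicitly, the `bins = 30` message no longer appears.

    ```r
    library(ggplot2)

    # Toy stand-in for the lakes data; the column name chla is hypothetical.
    lakes <- data.frame(chla = c(2, 5, 11, 14, 18, 23, 27, 35, 41, 58))

    p_hist <- ggplot(lakes, aes(x = chla)) +
      geom_histogram(binwidth = 10) +   # each bar spans 10 units of chlorophyll a
      labs(x = "Chlorophyll a", y = "Number of samples")
    ```

    Printing p_hist (or typing its name at the console) draws the plot; layer_data(p_hist) returns the computed bin edges and counts if you want to check them.
    
    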

  3. -- Scatterplot --

    Using the lakes dataset and ggplot2, create a bivariate scatterplot with algal species richness on the x axis and chlorophyll a on the y axis.

    What else would you add to this plot if you were trying to understand this data more deeply?
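    A minimal sketch of the basic scatterplot, again with hypothetical column names (richness, chla) and toy values standing in for the real lakes data. The aes() call maps the two variables to the axes, and geom_point() draws one point per sample.

    ```r
    library(ggplot2)

    # Toy stand-in; the column names richness and chla are hypothetical.
    lakes <- data.frame(
      richness = c(4, 6, 7, 9, 10, 12, 15, 18),
      chla     = c(30, 27, 25, 20, 22, 15, 12, 8)
    )

    p_scatter <- ggplot(lakes, aes(x = richness, y = chla)) +
      geom_point() +                                   # one point per sample
      labs(x = "Algal species richness", y = "Chlorophyll a")
    ```
    
    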

  4. -- Grouping ggplot --

    Using the lakes dataset and ggplot2, create a bivariate scatterplot with algal species richness on the x axis and chlorophyll a on the y axis.

    Using the shape aesthetic in aes(), assign a different shape to each lake in the scatterplot.

    What does this plot look like?

    What, if anything, does the message in the console say?

    What might be a better way to plot this relationship for the different lakes?
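    Mapping lake to shape inside aes() is what tells ggplot2 to give each lake its own symbol. The sketch below uses three toy lakes and hypothetical column names (lake, richness, chla). Keep in mind that ggplot2's default shape palette handles at most six distinct values, so with more lakes than that it warns and leaves some points unplotted.

    ```r
    library(ggplot2)

    # Toy stand-in with three hypothetical lakes; column names are hypothetical.
    lakes <- data.frame(
      lake     = rep(c("Lake A", "Lake B", "Lake C"), each = 4),
      richness = c(4, 6, 8, 10, 5, 7, 9, 11, 3, 6, 9, 12),
      chla     = c(30, 25, 22, 18, 28, 24, 20, 15, 35, 26, 19, 10)
    )

    p_shape <- ggplot(lakes, aes(x = richness, y = chla, shape = lake)) +
      geom_point(size = 2) +                           # shape varies by lake
      labs(x = "Algal species richness", y = "Chlorophyll a", shape = "Lake")
    ```
    
    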

  5. -- Faceting ggplot --

    Using the lakes dataset and ggplot2, create a plot with treatment (as a factor) on the x axis and chlorophyll a on the y axis.

    Make this a faceted plot with an individual panel for each lake. Make sure to display all the data.

    Which lake is the most variable in its treatment response?
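    One way to read "display all the data" is to plot every sample as a point rather than only a summary such as a boxplot. The sketch below does that with facet_wrap(~ lake), using hypothetical column names (lake, treatment, chla) and toy values. The small horizontal jitter (with a fixed seed for reproducibility) keeps overlapping points visible, while height = 0 leaves the chlorophyll a values unchanged.

    ```r
    library(ggplot2)

    # Toy stand-in: hypothetical columns lake, treatment, chla.
    lakes <- data.frame(
      lake      = rep(c("Lake A", "Lake B"), each = 6),
      treatment = rep(c("control", "nutrient"), times = 6),
      chla      = c(10, 20, 12, 25, 11, 22, 8, 40, 9, 15, 10, 30)
    )

    p_facet <- ggplot(lakes, aes(x = factor(treatment), y = chla)) +
      # plot every sample; jitter only sideways so y values stay exact
      geom_point(position = position_jitter(width = 0.1, height = 0, seed = 1)) +
      facet_wrap(~ lake) +                             # one panel per lake
      labs(x = "Treatment", y = "Chlorophyll a")
    ```
    
    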