Learning Objectives
Following this assignment students should be able to:
- understand the syntax of a ggplot2 object
- graph raw data to diagnose QC and completeness
- match a ggplot2 geometry function to the appropriate QC task
Reading
Topics
- Purposes and principles of data visualization
- ggplot
Readings
Lecture Notes
- Joins
- Principles and practice of data visualization
- ggplot
  - introduction & syntax
  - grouping & faceting
  - handling & plotting time series
Exercises
-- Join fish tables --
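# see what's available in this week's data folder
# (the pattern is a regular expression, so the dot is escaped and anchored to the end of the name)
list.files("DataViz-26March2018/data/", pattern = "\\.csv$")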
Using the fish families data set in the data folder for this week, create a table that includes as columns:
- fish species
- fish family
- count
Show only those species with a count greater than 7.
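The join-then-filter pattern can be sketched on toy data. This is a minimal illustration, assuming dplyr is the join tool in use; the data frames `species_counts` and `species_families` and their column names are hypothetical stand-ins, not the actual course tables.

```r
library(dplyr)

# Hypothetical stand-ins for the course tables; real file and column names may differ.
species_counts <- data.frame(
  species = c("bluegill", "walleye", "yellow perch", "lake trout"),
  count   = c(12, 3, 9, 8)
)
species_families <- data.frame(
  species = c("bluegill", "walleye", "yellow perch", "lake trout"),
  family  = c("Centrarchidae", "Percidae", "Percidae", "Salmonidae")
)

# Match rows on the shared key, keep the three columns, then filter on count.
fish_table <- species_counts %>%
  inner_join(species_families, by = "species") %>%
  select(species, family, count) %>%
  filter(count > 7)

print(fish_table)
```

An inner join keeps only species present in both tables; if some species might lack a family entry, `left_join` would keep them with `NA` in the family column.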
-- Chlorophyll A histogram --
Using the lakes dataset, generate a histogram of chlorophyll A levels in the samples. (Hint: use geom_histogram.)
You may see a message in the console that reads:
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
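This message appears because geom_histogram falls back to 30 bins when neither a bin count nor a bin width is given. A minimal sketch of the difference, using R's built-in `faithful` dataset rather than the lakes data (the column `waiting` and the width of 5 are illustrative assumptions, not part of the assignment):

```r
library(ggplot2)

# `faithful` ships with R and stands in for the lakes data here.
# No binwidth given: drawing this plot triggers the bins = 30 message.
p_default <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram()

# Explicit binwidth: each bar spans 5 units of `waiting` (minutes).
p_binned <- ggplot(faithful, aes(x = waiting)) +
  geom_histogram(binwidth = 5)

print(p_binned)
```

The bin width is expressed in the units of the x variable, so a sensible value depends on the range of the data being plotted.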
Use the binwidth option in the geom function to create a histogram with a bin width of 10.
-- Scatterplot --
Using the lakes dataset and ggplot2, create a bivariate scatterplot with algal species richness on the x axis and chlorophyll a on the y axis.
What else would you add to this plot if you were trying to understand this data more deeply?
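The general shape of a scatterplot call can be sketched on ggplot2's built-in `mpg` dataset. The variables `displ` and `hwy` are stand-ins for the lakes columns, and the axis labels and trend line are only one possible set of additions, not the expected answer.

```r
library(ggplot2)

# `mpg` ships with ggplot2; displ and hwy stand in for the two lakes variables.
p_scatter <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +                            # one point per row of data
  geom_smooth(method = "lm", se = FALSE) +  # optional: straight-line trend
  labs(x = "Engine displacement (L)", y = "Highway miles per gallon")

print(p_scatter)
```

Each `+` adds a layer or a setting to the same plot object, so layers can be added or removed one at a time while exploring.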
-- Grouping ggplot --
Using the lakes dataset and ggplot2, create a bivariate scatterplot with algal species richness on the x axis and chlorophyll a on the y axis.
Using the shape grouping argument in aes(), assign a different shape to each lake in the scatterplot.
What does this plot look like?
What, if anything, does the message in the console say?
What might be a better way to plot this relationship for the different lakes?
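For reference, mapping a grouping variable to point shape can be sketched as follows. This uses the built-in `mpg` dataset, with `drv` as an assumed stand-in for the lake column; it illustrates the syntax only and says nothing about how the lakes data will behave.

```r
library(ggplot2)

# Mapping shape inside aes() gives each level of `drv` its own point shape
# and adds a legend automatically.
p_shape <- ggplot(mpg, aes(x = displ, y = hwy, shape = drv)) +
  geom_point()

print(p_shape)
```

Because the mapping sits inside `aes()`, the shapes are driven by the data; a shape given outside `aes()` would instead apply one fixed shape to every point.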
-- Faceting ggplot --
Using the lakes dataset and ggplot2, create a factor plot with treatment on the x axis and chlorophyll a on the y axis.
Make this a faceted plot with an individual panel for each lake. Make sure to display all the data.
Which lake is the most variable in its treatment response?
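A faceted plot with a categorical x axis can be sketched on the built-in `mpg` dataset. Here `drv` stands in for treatment and `class` for lake, and jittering the points is one assumed way of keeping overlapping observations visible.

```r
library(ggplot2)

# One panel per level of `class`; within each panel, a discrete x axis.
p_facet <- ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_jitter(width = 0.2, height = 0) +  # spread points sideways only
  facet_wrap(~ class)

print(p_facet)
```

By default all panels share the same axes, which makes the spread of values directly comparable from panel to panel.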