Learning Objectives
After completing this assignment, students should be able to:
- understand the data manipulation functions of dplyr
- understand the
Reading
Topics
- dplyr
- data.table
Readings
Lecture Notes
Assignment Introduction: Creating a schema
Wrapping up SQL
Data manipulation in R
- dplyr
- filter
- select
- arrange
- mutate
- summarise
- pipes & syntax
- data.table
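Each of the five core dplyr verbs listed above takes a data frame as its first argument and returns a new data frame. Because of this, the verbs can be chained with a pipe. The minimal sketch below uses a small hypothetical data frame (`toy`, with made-up genus, family, and day values). It first runs each verb on its own, then chains several of them with `%>%`, the magrittr pipe that dplyr re-exports. R 4.1 and later also provide the native `|>` pipe, which behaves the same way for these calls.

```r
library(dplyr)

# Hypothetical toy data: four made-up specimens, not real BeeLab records
toy <- data.frame(
  genus  = c("Bombus", "Bombus", "Nomada", "Andrena"),
  family = c("Apidae", "Apidae", "Apidae", "Andrenidae"),
  day    = c(120, 150, 110, 95)
)

filter(toy, day > 100)                # keep rows that satisfy a condition
select(toy, genus, day)               # keep only the named columns
arrange(toy, desc(day))               # sort rows, latest day first
mutate(toy, week = day %/% 7)         # add a derived column
summarise(toy, mean_day = mean(day))  # collapse to a single summary row

# The same verbs chained with the pipe: each step's output feeds the next
by_family <- toy %>%
  filter(day > 100) %>%
  group_by(family) %>%
  summarise(n = n(), mean_day = mean(day)) %>%
  arrange(desc(mean_day))
by_family
```

The piped version reads top to bottom in the order the operations happen. The nested equivalent, `summarise(group_by(filter(toy, day > 100), family), ...)`, has to be read from the inside out.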
Exercises
-- dplyr --
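Install and familiarize yourself with the dplyr package. The library() call(s) should always be located at the very top of a script.

install.packages("dplyr")   # install once; this does not need to be in every script
library(dplyr)              # load the package at the top of the script
help(package = dplyr)       # open the package's help index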
This vignette is a great reference for data manipulation verbs to keep in mind.
-- Bee Trait Data Basics --
Connect to the BeeLab database on your local MySQL instance. In R:
- Create a connection to the database using RMariaDB.
- Import the bee_traits table into a named data frame.
- Check the column names in the data using the function names().
- Use str() to show the structure of the data frame and its individual columns.
- Print out the first few rows of the data using the function head().

Use dplyr to complete the remaining tasks.
- Select the first few rows of data from the Taxon_Author column and print them out.
- Select the data from the Family, Subfamily, and Tribe columns and print it out.
- Filter the data for all of the bees that are Parasitic and print out their genus and species.
- Create a new data frame called endangered_bees containing the bees that have a value of “endangered” in the USFWS_status column. Print it out.
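A minimal sketch of the dplyr tasks above follows. It runs on a tiny hypothetical stand-in for `bee_traits`, so it works without a database connection. Taxon_Author, Family, Subfamily, Tribe, and USFWS_status come from the task text. Everything else is an assumption: the lowercase `genus` and `species` column names, the placeholder values, and a logical (TRUE/FALSE) `Parasitic` column. Check the real names and coding with `names()` and `str()`, and adjust the conditions to match.

```r
library(dplyr)

# Hypothetical stand-in for bee_traits (placeholder rows, not BeeLab records).
# Assumes lowercase genus/species columns and a TRUE/FALSE Parasitic column.
bee_traits <- data.frame(
  Taxon_Author = c("Author A, 1850", "Author B, 1900", "Author C, 1950"),
  Family       = c("Apidae", "Apidae", "Halictidae"),
  Subfamily    = c("Apinae", "Nomadinae", "Halictinae"),
  Tribe        = c("Bombini", "Nomadini", "Halictini"),
  genus        = c("Bombus", "Nomada", "Lasioglossum"),
  species      = c("sp1", "sp2", "sp3"),
  Parasitic    = c(FALSE, TRUE, FALSE),
  USFWS_status = c("endangered", NA, NA)
)

# First few rows of the Taxon_Author column
head(select(bee_traits, Taxon_Author))

# Family, Subfamily, and Tribe columns only
taxonomy <- select(bee_traits, Family, Subfamily, Tribe)
taxonomy

# Genus and species of the parasitic bees
parasitic_bees <- bee_traits %>%
  filter(Parasitic) %>%
  select(genus, species)
parasitic_bees

# Rows whose USFWS_status is "endangered"; filter() drops rows where the test is NA
endangered_bees <- filter(bee_traits, USFWS_status == "endangered")
endangered_bees
```

If the real Parasitic column is stored as text rather than TRUE/FALSE, replace `filter(Parasitic)` with a comparison against the value the table actually uses.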
-- Bee Trait Aggregate --
This is a follow-up to Bee Trait Data Basics.
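For the data frame df, this dplyr code calculates the average day of emergence by Family:

# drop rows with a missing Pheno_mean, then group the rest by Family
Famdf <- group_by(filter(df, !is.na(Pheno_mean)), Family)
# average emergence day (emerge_Julian) for each Family
summarize(Famdf, emerge_Julian = mean(Pheno_mean))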
- Modify the code to calculate and print the average emergence day of the bee species in each Subfamily.
- Use max() to determine the latest bee emergence in each Family.
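The same `group_by()` plus `summarise()` pattern can compute several statistics per group in a single call. The sketch below uses a hypothetical `df` with made-up `Pheno_mean` values. Instead of filtering out missing values first, it passes `na.rm = TRUE` to `mean()` and `max()`.

```r
library(dplyr)

# Hypothetical toy data: made-up emergence days, one of them missing
df <- data.frame(
  Family     = c("Apidae", "Apidae", "Apidae", "Halictidae", "Halictidae"),
  Pheno_mean = c(120, 160, NA, 100, 140)
)

# Mean and latest (maximum) emergence day per Family;
# na.rm = TRUE tells mean() and max() to skip missing values
fam_summary <- df %>%
  group_by(Family) %>%
  summarise(
    emerge_Julian = mean(Pheno_mean, na.rm = TRUE),
    latest_emerge = max(Pheno_mean, na.rm = TRUE)
  )
fam_summary
```

The two approaches differ for a Family whose values are all missing. With `na.rm = TRUE`, that Family still appears, with `NaN` for the mean and `-Inf` (plus a warning) for the max. Filtering out missing rows first, as the exercise code does, drops that Family entirely.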
-- Bee Table Merge --
This is a follow-up to Bee Trait Aggregate.
From the BeeLab database, import the mn_bees table and then use inner_join() to combine it with the bee_traits table to add a Bee_Family column to the mn_bees data.

-- Bee Piping --
This is a follow-up to Bee Trait Aggregate and Bee Table Merge.
Import the sites table from the BeeLab database into a data frame. You should already have bee_traits and specimens in your R environment.

Use dplyr to perform the following:
- Find the site with the highest bee species richness.
- Calculate the number of unique Parasitic bees observed in each soil type.
- For all parasitic bees in bee_traits, make a table showing the number of species in each genus by Lecticity (the columns will be something like genus, Lecticity, count_sp).
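All three tasks above combine joins, filtering, and grouped counts in a single pipeline. The minimal sketch below works on tiny hypothetical `specimens`, `sites`, and `bee_traits` tables. The real BeeLab column names are not given here, so several things are assumptions:

- the key columns (`site_id`, `genus`, `species`),
- the `soil_type` column,
- the placeholder genus and species names,
- a logical `Parasitic` column.

`n_distinct()` counts unique combinations of its arguments, so richness here means the number of distinct genus/species pairs at a site.

```r
library(dplyr)

# Hypothetical tables: column names and values are placeholders, not BeeLab data
specimens <- data.frame(
  site_id = c(1, 1, 1, 2, 2),
  genus   = c("GenusA", "GenusA", "GenusB", "GenusA", "GenusB"),
  species = c("sp1", "sp2", "sp3", "sp1", "sp3")
)
sites <- data.frame(
  site_id   = c(1, 2),
  soil_type = c("sandy", "loam")
)
bee_traits <- data.frame(
  genus     = c("GenusA", "GenusA", "GenusB", "GenusB"),
  species   = c("sp1", "sp2", "sp3", "sp4"),
  Parasitic = c(FALSE, FALSE, TRUE, TRUE),
  Lecticity = c("polylectic", "polylectic", "oligolectic", "oligolectic")
)

# inner_join() keeps only rows whose keys match in both tables,
# so sp4 (never collected) does not appear in obs
obs <- specimens %>%
  inner_join(sites, by = "site_id") %>%
  inner_join(bee_traits, by = c("genus", "species"))

# Species richness per site, highest first; slice_max() keeps ties
site_richness <- obs %>%
  group_by(site_id) %>%
  summarise(n_species = n_distinct(genus, species)) %>%
  arrange(desc(n_species))
top_site <- slice_max(site_richness, n_species, n = 1)
top_site

# Unique parasitic species observed on each soil type
parasites_by_soil <- obs %>%
  filter(Parasitic) %>%
  group_by(soil_type) %>%
  summarise(n_parasitic = n_distinct(genus, species))
parasites_by_soil

# Species per genus and Lecticity among parasitic bees in bee_traits
parasitic_lecticity <- bee_traits %>%
  filter(Parasitic) %>%
  group_by(genus, Lecticity) %>%
  summarise(count_sp = n_distinct(species), .groups = "drop")
parasitic_lecticity
```

Counting with `n_distinct(species)` rather than `n()` guards against a species appearing in more than one row. Which of the two is correct depends on how the real bee_traits table is structured.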