Learning Objectives
Following this assignment students should be able to:
- understand the data manipulation functions of dplyr
- understand the
Reading
- Topics
  - dplyr
  - data.table
- Readings
Lecture Notes
Assignment Introduction: Creating a schema
Wrapping up SQL
Data manipulation in R
- dplyr
  - filter
  - select
  - arrange
  - mutate
  - summarise
  - pipes & syntax
- data.table
Exercises
-- dplyr --
Install and familiarize yourself with the dplyr package. The library() step(s) should always be located at the very top of a script.

    install.packages("dplyr")
    library(dplyr)
    help(package = dplyr)

This vignette is a great reference for data manipulation verbs to keep in mind.
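As a quick orientation to the verbs listed in the lecture notes, here is a minimal sketch of each one. It uses R's built-in iris data frame as a stand-in, not the BeeLab data, and the variable names are illustrative only.

```r
library(dplyr)

# iris ships with base R; it stands in here for any data frame.
setosa <- filter(iris, Species == "setosa")        # keep rows matching a condition
petals <- select(iris, Species, Petal.Length)      # keep only the named columns
sorted <- arrange(iris, desc(Sepal.Length))        # sort rows, largest first
ratio  <- mutate(iris, petal_ratio = Petal.Length / Petal.Width)  # add a column

# Group, then collapse each group to one summary row.
means <- summarise(group_by(iris, Species), mean_petal = mean(Petal.Length))

# The same summary written with the pipe, reading left to right.
means_piped <- iris %>%
  group_by(Species) %>%
  summarise(mean_petal = mean(Petal.Length))
```

The nested and piped forms return the same table; the pipe mostly helps once several steps are chained together.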
-- Bee Trait Data Basics --
Connect to the BeeLab dataset on your local MySQL instance. In R:
- Create a connection to the database using RMariaDB.
- Import the bee_traits table into a named data frame.
- Check the column names in the data using the function names().
- Use str() to show the structure of the data frame and its individual columns.
- Print out the first few rows of the data using the function head().

Use dplyr to complete the remaining tasks.
- Select the first few rows of data from the Taxon_Author column and print it out.
- Select the data from the Family, Subfamily, and Tribe columns and print it out.
- Filter the data for all of the bees that are Parasitic and print out their genus and species.
- Create a new data frame called endangered_bees containing the bees which have a value of “endangered” in the USFWS_status column. Print it out.
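The connection and import steps might look roughly like the sketch below. The database name, user name, and password are placeholders and will differ on your machine; the sketch also assumes the MySQL server is running locally and that the DBI package is installed alongside RMariaDB.

```r
library(DBI)
library(RMariaDB)

# Placeholder credentials: replace with the values for your local instance.
con <- dbConnect(
  RMariaDB::MariaDB(),
  dbname   = "BeeLab",        # assumed database name
  host     = "localhost",
  username = "your_user",     # placeholder
  password = "your_password"  # placeholder
)

# Pull a whole table into a data frame, then inspect it.
bee_traits <- dbReadTable(con, "bee_traits")
names(bee_traits)
str(bee_traits)
head(bee_traits)

# Close the connection once the data is in R.
dbDisconnect(con)
```

After dbReadTable() returns, bee_traits is an ordinary data frame, so the dplyr tasks do not need the connection to stay open.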
-- Bee Trait Aggregation --
This is a follow-up to Bee Trait Data Basics.
For data.frame df, this dplyr code calculates the average day of emergence by Family:

    Famdf <- group_by(filter(df, !is.na(Pheno_mean)), Family)
    summarize(Famdf, emerge_Julian = mean(Pheno_mean))

- Modify the code to calculate and print the average emergence day of bee species in each Subfamily.
- Use max() to determine the latest bee emergence in each Family.
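Missing values matter in grouped summaries, because mean() returns NA as soon as one value in a group is missing. The sketch below compares two ways of handling that on a tiny data frame. The rows are invented for illustration; only the column names Family and Pheno_mean come from the exercise.

```r
library(dplyr)

# Invented toy values; the real bee_traits table will differ.
df <- data.frame(
  Family     = c("Apidae", "Apidae", "Apidae", "Halictidae", "Halictidae"),
  Pheno_mean = c(120, 140, NA, 100, 110)
)

# Approach 1: drop rows with missing values, then group and summarise.
by_filter <- df %>%
  filter(!is.na(Pheno_mean)) %>%
  group_by(Family) %>%
  summarize(emerge_Julian = mean(Pheno_mean))

# Approach 2: keep every row and let mean() skip the NA values.
by_na_rm <- df %>%
  group_by(Family) %>%
  summarize(emerge_Julian = mean(Pheno_mean, na.rm = TRUE))
```

Both approaches give the same group means here. Filtering first is the safer habit when several summaries are computed from the same grouped data.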
-- Bee Table Merge --
This is a follow-up to Bee Trait Aggregation.
From the BeeLab database, import the mn_bees table and then use inner_join to combine it with the bee_traits table to add a Bee_Family column to the mn_bees data.
-- Bee Piping --
This is a follow-up to Bee Trait Aggregation and Bee Table Merge.
Import the sites table from the BeeLab database into a data.frame. You should already have bee_traits and specimens in your R environment.

Use dplyr to perform the following:
- Calculate the site that has the highest bee species richness.
- Calculate the number of unique Parasitic bees observed in each soil type.
- For all parasitic bees in bee_traits, make a table showing the number of species in each genus by Lecticity (the columns will be something like genus, Lecticity, count_sp).
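Tasks like these usually chain a join, a grouping step, and a distinct count. The sketch below shows that pattern on two invented stand-in tables. The table names, column names, and values are all hypothetical; the real BeeLab tables will have different columns and keys.

```r
library(dplyr)

# Invented stand-in tables, not the BeeLab schema.
observations <- data.frame(
  site_id = c(1, 1, 1, 2, 2, 2),
  species = c("a", "b", "a", "a", "c", "d")
)
site_info <- data.frame(
  site_id = c(1, 2),
  habitat = c("prairie", "forest")
)

richness <- observations %>%
  inner_join(site_info, by = "site_id") %>%         # attach habitat to each observation
  group_by(habitat) %>%
  summarize(n_species = n_distinct(species)) %>%    # count unique species per group
  arrange(desc(n_species))                          # richest group first
```

n_distinct() counts each species once per group, so repeated observations of the same species do not inflate the richness.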
