Data Manipulation in R

These exercises use the connection to the BeeLab dataset hosted on your local machine. To get set up:

  1. Start MAMP and make sure its servers are running.
  2. Open the RStudio project file associated with your local clone of the ENT5920_DatabaseTutorial repository.
  3. Before beginning, pull from the remote repository to get the latest version of any shared code.
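The code below reads tables through a DBI connection object named `con`. In case it helps to see what that object does, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for MAMP's MySQL server. It assumes the DBI and RSQLite packages are installed. The commented MySQL call uses hypothetical settings (MAMP's usual defaults and an assumed database name `BeeLab`), so check your own MAMP preferences before relying on them.

```r
library(DBI)

# Stand-in connection: an in-memory SQLite database (requires the RSQLite package).
# For the real exercises you would connect to MAMP's MySQL server instead, e.g.
# (hypothetical settings based on MAMP's usual defaults; check MAMP's preferences):
#   con <- dbConnect(RMySQL::MySQL(), host = "localhost", port = 8889,
#                    user = "root", password = "root", dbname = "BeeLab")
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Write a tiny, invented table so there is something to read back
dbWriteTable(con, "bee_traits",
             data.frame(genus     = c("GenusA", "GenusB"),
                        species   = c("sp1", "sp1"),
                        Parasitic = c("No", "Yes")))

dbListTables(con)                              # names of the tables in the database
bee_traits <- dbReadTable(con, "bee_traits")   # comes back as a plain data.frame
class(bee_traits)

dbDisconnect(con)                              # close the connection when finished
```

Because DBI gives every backend the same interface, `dbReadTable()` and `dbDisconnect()` are called the same way whether `con` points at SQLite or MySQL.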

Basic dplyr

Pull in data
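library(DBI)          # dbReadTable(); assumes the connection `con` already exists
library(dplyr)
library(data.table)

bee_traits <- dbReadTable(con, "bee_traits")

dat <- setDT(bee_traits)      # converts bee_traits to a data.table in place (by reference)
df <- data.frame(bee_traits)  # plain data.frame copy to use with dplyr
class(dat); class(df)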
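The exercise builds on dplyr's single-table verbs. As a minimal sketch, here they are on a small stand-in for `bee_traits`. The data frame `toy_traits` and all of its values are hypothetical; only the column names follow those used in this tutorial.

```r
library(dplyr)

# Hypothetical toy stand-in for bee_traits: column names follow this tutorial,
# but the rows and values are invented for illustration.
toy_traits <- data.frame(
  genus      = c("GenusA", "GenusA", "GenusB", "GenusC", "GenusD"),
  species    = c("sp1", "sp2", "sp1", "sp1", "sp1"),
  Family     = c("Apidae", "Apidae", "Apidae", "Andrenidae", "Halictidae"),
  Parasitic  = c("No", "No", "Yes", "No", "Yes"),
  Pheno_mean = c(150, 170, 120, 110, NA)
)

# select(): keep only the columns you need
slim <- select(toy_traits, genus, species, Parasitic, Pheno_mean)

# filter(): keep rows that meet a condition; rows where the test is NA are dropped
non_parasitic <- filter(slim, Parasitic == "No")

# mutate(): add a new column computed from existing ones
non_parasitic <- mutate(non_parasitic, binomial = paste(genus, species))

# arrange(): sort rows, here by descending Pheno_mean
non_parasitic <- arrange(non_parasitic, desc(Pheno_mean))

non_parasitic
```

Each verb takes a data frame as its first argument and returns a new data frame, which is what makes them easy to chain later on.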

Do Exercise 2 - Bee Trait Data Basics.

Aggregation

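Famdf <- group_by(df, Family)   # marks rows by Family; nothing is aggregated yet
summarize(Famdf, n = n())       # one row per Family, with its number of rows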
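`group_by()` on its own returns the same rows with a grouping attached. The aggregation happens when a summary verb such as `summarize()` runs on the grouped data. A minimal sketch on hypothetical toy data (invented values, not BeeLab records):

```r
library(dplyr)

# Hypothetical toy data mirroring a few bee_traits columns (values invented)
toy_traits <- data.frame(
  Family     = c("Apidae", "Apidae", "Apidae", "Andrenidae", "Halictidae"),
  Parasitic  = c("No", "No", "Yes", "No", "Yes"),
  Pheno_mean = c(150, 170, 120, 110, NA)
)

# group_by() only records the grouping; all five rows are still there
by_family <- group_by(toy_traits, Family)

# summarize() collapses each group to a single row
family_summary <- summarize(by_family,
                            n_rows      = n(),
                            mean_pheno  = mean(Pheno_mean, na.rm = TRUE),
                            n_parasitic = sum(Parasitic == "Yes"))

family_summary
```

The Halictidae mean comes out as `NaN`: with `na.rm = TRUE`, a group whose only value is `NA` leaves `mean()` with an empty vector, so watch for this when families have sparse trait data.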

Do Exercise 3 - Bee Trait Aggregate.

Joins

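specimens <- dbReadTable(con, "specimens")
# keep only specimen rows whose genus AND species both match a row in df
combined <- inner_join(specimens, df, by = c('genus', 'species'))
head(combined)   # first six rows of the joined table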
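`inner_join()` keeps only rows that have a match in both tables, so specimens whose genus and species do not appear in the trait table silently disappear. A minimal sketch with two hypothetical toy tables (`toy_specimens` and `toy_traits` are invented for illustration), contrasting it with `left_join()` and `anti_join()`:

```r
library(dplyr)

# Hypothetical toy tables: specimen records and species-level traits
toy_specimens <- data.frame(
  specimen_id = 1:4,
  genus       = c("GenusA", "GenusA", "GenusB", "GenusE"),
  species     = c("sp1", "sp1", "sp1", "sp9")
)
toy_traits <- data.frame(
  genus     = c("GenusA", "GenusA", "GenusB"),
  species   = c("sp1", "sp2", "sp1"),
  Parasitic = c("No", "No", "Yes")
)

# inner_join(): only specimens whose genus AND species both match a trait row
matched <- inner_join(toy_specimens, toy_traits, by = c("genus", "species"))

# left_join(): every specimen is kept; unmatched ones get NA in the trait columns
all_specimens <- left_join(toy_specimens, toy_traits, by = c("genus", "species"))

# anti_join(): specimens with no matching trait row (handy for spotting name typos)
unmatched <- anti_join(toy_specimens, toy_traits, by = c("genus", "species"))
```

If the trait table contained duplicate genus/species rows, each matching specimen would be repeated once per match, so it is worth checking that the join keys are unique in `df` before joining.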

Do Exercise 4 - Bee Table Merge.

Pipes

# Step by step: each intermediate result gets its own name
parasiticBees <- filter(df, Parasitic == 'Yes')
parasiticBees_byFam <- group_by(parasiticBees, Family)
parasitic_avg_emerge_byFam <- summarize(parasiticBees_byFam,
                                        avg_emerge = mean(Pheno_mean, na.rm = TRUE))
parasitic_avg_emerge_byFam

# The same result with pipes: each step's output becomes the first argument of the next call.
# Shortcut to insert %>%: Command-Shift-M (macOS) or Ctrl-Shift-M (Windows/Linux)
df %>%
  filter(Parasitic == 'Yes') %>%
  group_by(Family) %>%
  summarize(avg_emerge = mean(Pheno_mean, na.rm = TRUE))
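The pipe only changes how the calls are written: `x %>% f(y)` is evaluated as `f(x, y)`. A minimal sketch on hypothetical toy data, confirming that the nested, piped, and native-pipe versions give identical results. The native pipe `|>` is built into R 4.1 and later; `%>%` comes from magrittr and is re-exported by dplyr.

```r
library(dplyr)

# Hypothetical toy data with the columns used in the pipe example (values invented)
toy_traits <- data.frame(
  Family     = c("Apidae", "Apidae", "Apidae", "Halictidae", "Halictidae"),
  Parasitic  = c("No", "Yes", "Yes", "Yes", "No"),
  Pheno_mean = c(150, 120, 130, 100, NA)
)

# Nested calls: read from the inside out
nested <- summarize(group_by(filter(toy_traits, Parasitic == "Yes"), Family),
                    avg_emerge = mean(Pheno_mean, na.rm = TRUE))

# magrittr pipe: the same steps, read top to bottom
piped <- toy_traits %>%
  filter(Parasitic == "Yes") %>%
  group_by(Family) %>%
  summarize(avg_emerge = mean(Pheno_mean, na.rm = TRUE))

# Native pipe (R >= 4.1): behaves the same way for these calls
native <- toy_traits |>
  filter(Parasitic == "Yes") |>
  group_by(Family) |>
  summarize(avg_emerge = mean(Pheno_mean, na.rm = TRUE))

piped
```

Since all three forms build the same function calls, choosing between them is purely a matter of readability.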

Do Exercise 5 - Bee Piping.