Joins

Joins (or merges) of data frames follow a taxonomy largely derived from SQL practice, although packages do not always adhere to it strictly or delineate the join types clearly.
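As a rough illustration of that taxonomy, here is a minimal sketch using base R's merge. The two tables and their column names are invented for this example and are not the exercise data. The same four join types are available in dplyr as inner_join, left_join, right_join and full_join.

```r
# Hypothetical tables, invented for illustration only
counts <- data.frame(species = c("cod", "hake", "sole"),
                     count   = c(12, 7, 3))
weights <- data.frame(species = c("cod", "sole", "tuna"),
                      wt      = c(4.1, 0.6, 30.2))

# inner join: only species present in both tables
inner <- merge(counts, weights, by = "species")

# left join: every row of counts, NA where weights has no match
left <- merge(counts, weights, by = "species", all.x = TRUE)

# right join: every row of weights, NA where counts has no match
right <- merge(counts, weights, by = "species", all.y = TRUE)

# full (outer) join: every species found in either table
full <- merge(counts, weights, by = "species", all = TRUE)
```

The only difference between the four calls is which table's unmatched rows are kept, which is the distinction the SQL names (INNER, LEFT, RIGHT, FULL OUTER) encode.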

Do Exercise 1: Join fish tables

Casting into wide form

Species data is often visualized as a matrix with sampled locations for rows and observed species for columns, with abundances or other indices as the cells.
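To make the target shape concrete, here is a small sketch of a site-by-species matrix built from long-form records. It uses base R's xtabs rather than the reshaping functions covered in this section, and the data are made up for illustration.

```r
# Hypothetical long-form records: one row per site/species observation
long <- data.frame(site    = c("site1", "site1", "site2", "site2"),
                   species = c("cod", "sole", "cod", "hake"),
                   count   = c(12, 3, 5, 8))

# Cross-tabulate: sites become rows, species become columns,
# and combinations that were never observed are filled with 0
wide <- xtabs(count ~ site + species, data = long)
wide
```

Each cell holds the abundance of one species at one site, so a species that was not recorded at a site shows up as an explicit zero instead of a missing row.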

To demonstrate this we will use the fish tables from the exercise above.

First add another site with some random counts:

library(dplyr)

# original data is site 1
fish_counts$site <- "site1"

# simulate counts for a second site, one random count per species
fish_wts$COUNT <- rpois(nrow(fish_wts), lambda = 5)
# add a site name
fish_wts$site <- "site2"

# select columns, in order, for easy binding of data frames
fish_wts <- select(fish_wts, fish_species, COUNT, site)

# bind rows to create a single long-form data frame
fish2 <- rbind(fish_counts, fish_wts)

The tidyr function for reshaping from long to wide is spread:

library(tidyr)
# one column per species, filled with COUNT; missing combinations become 0
spread(fish2, fish_species, COUNT, fill = 0)
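In tidyr 1.0.0 and later, spread is superseded by pivot_wider, although spread still works. A minimal sketch of the equivalent call follows. The fish2_demo table is a hypothetical stand-in with the same three columns as fish2, invented so that the example runs on its own.

```r
library(tidyr)

# Hypothetical stand-in for fish2 (same column names, made-up values)
fish2_demo <- data.frame(
  fish_species = c("cod", "sole", "cod", "hake"),
  COUNT        = c(12, 3, 5, 8),
  site         = c("site1", "site1", "site2", "site2")
)

# names_from supplies the new column names, values_from the cell values;
# values_fill sets the value for site/species combinations with no row
wide <- pivot_wider(fish2_demo,
                    names_from  = fish_species,
                    values_from = COUNT,
                    values_fill = list(COUNT = 0))
wide
```

The named-list form of values_fill is used here because it should be accepted across tidyr versions that provide pivot_wider.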

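The data.table way is to use dcast:

library(data.table)
# just change weights into counts for the exercise
setnames(wts, 'wt', 'COUNT')
# stack the two tables; idcol adds a 'site' column identifying the source table
fish2 <- rbindlist(list(cnt, wts), idcol = 'site')
# name value.var explicitly so dcast does not have to guess the value column
dcast(fish2, formula = site ~ fish_species, value.var = 'COUNT', fill = 0)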
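Going back from wide to long in data.table is done with melt. The sketch below uses a made-up wide table (wide_demo is hypothetical, not an object from the exercise) to show the reverse of the dcast step.

```r
library(data.table)

# Hypothetical wide table: one row per site, one column per species
wide_demo <- data.table(site = c("site1", "site2"),
                        cod  = c(12, 5),
                        hake = c(0, 8),
                        sole = c(3, 0))

# id.vars stay as they are; every other column is stacked into
# a species column and a count column
long_demo <- melt(wide_demo,
                  id.vars       = "site",
                  variable.name = "fish_species",
                  value.name    = "COUNT")
long_demo
```

Note that the zeros introduced by fill = 0 come back as explicit rows, so a round trip does not necessarily reproduce the original long table row for row.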

Reshaping data from long to wide form and back again is a common task that is often frustrating. It has also led to my (EL’s) single favorite comment on all of Stack Overflow:

How to reshape data…

hadley comment