In this screencast tutorial, Pat Schloss shows how to use the join functions from the tidyverse's dplyr package to combine multiple data frames that have different structures but describe similar entities. He uses the inner_join and anti_join functions to link each genome in the rrnDB database to its species name. This episode is part of a larger arc of episodes investigating the sensitivity and specificity of amplicon sequence variants (ASVs), also known as exact sequence variants (ESVs). ASVs are growing in popularity for analyzing microbial communities using 16S rRNA gene sequences. Pat demonstrates these concepts by live coding at the command line interface using RStudio, GitHub Flow, and make.
The accompanying blog post, which contains the exercises and solutions, can be found at http://www.riffomonas.org/code_club/2....