How to split files from the command line and integrate bash and R scripts (CC252)

Published: 29 September 2022
Channel: Riffomonas Project

The split command is a useful command-line tool for splitting a file by size, by number of lines, or into a specified number of files. In this episode, Pat uses split to subdivide a problem that is too big to process easily in R. He then shows how to integrate an executable R script into a bash script to make easy work of a very large file. The overall goal of this project is to highlight reproducible research practices using a number of tools. The specific output from this project will be a map-based visual that shows the level of drought across the globe.
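
For readers who have not used split before, here is a minimal sketch of the idea, not the code from the episode. The file name `big_file.txt`, the `by_lines_` and `by_bytes_` prefixes, and the chunk sizes are all made up for illustration, and `wc -l` stands in for whatever per-chunk step (such as an R script) a real pipeline would run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so nothing in a real project is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical input: 10 numbered lines standing in for a very large file.
printf '%s\n' {1..10} > big_file.txt

# Split by number of lines: at most 4 lines per chunk.
# Chunks are named by_lines_aa, by_lines_ab, and so on.
split -l 4 big_file.txt by_lines_

# Split by size: at most 8 bytes per chunk. This can cut a line in half.
split -b 8 big_file.txt by_bytes_

# GNU coreutils split can also make a fixed number of chunks without
# breaking lines, for example: split -n l/3 big_file.txt by_count_

# Collect the chunk names so they can be counted and looped over.
line_chunks=(by_lines_*)
byte_chunks=(by_bytes_*)

# Process each line-based chunk in turn (wc -l is a placeholder step).
total=0
for chunk in "${line_chunks[@]}"; do
  n=$(wc -l < "$chunk")
  total=$((total + n))
done

echo "line chunks: ${#line_chunks[@]}, byte chunks: ${#byte_chunks[@]}, lines seen: $total"
```

Splitting by lines is usually the safer choice for tabular data because every chunk holds whole records, whereas splitting by bytes only guarantees a size limit.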

You can find my blog post for this episode at https://www.riffomonas.org/code_club/....

#split #bash #Rstats #R

Support Riffomonas by becoming a Patreon member!
  / riffomonas  

Want more practice on the concepts covered in Code Club? You can sign up for my weekly newsletter at https://shop.riffomonas.org/youtube to get practice problems, tips, and insights.

If you're interested in taking an upcoming 3-day R workshop, be sure to check out our schedule at https://riffomonas.org/workshops/

You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: https://www.riffomonas.org/minimalR/
General data: https://www.riffomonas.org/generalR/

0:00 Introduction
4:06 Updating coreutils with conda/mamba
5:11 Using the split function
9:37 Implementing split in our pipeline
13:32 Modifying R script for one file
16:48 Finding dates within a desired window
25:56 Finding total precipitation in each window
26:45 Modifying R script to process all of the files
30:36 Adding R script to bash script and Snakefile
33:21 Cleaning up directory and repository