Refactoring R code to make it faster and more memory efficient (CC281)

Published: 09 May 2024
Channel: Riffomonas Project

Building on the extensive benchmarking Pat has done in recent episodes, it's now time to apply those results so the code runs without running out of RAM. Pat uses test-driven development (TDD) while refactoring so he can be confident that the improved code is still correct. He also runs a quick benchmark comparing table and tabulate. This episode is part of an ongoing effort to develop an R package that implements the naive Bayesian classifier.
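
For readers who want to try the table versus tabulate comparison themselves, here is a minimal sketch. It is not the code from the episode: the vector kmer_index, the bin count n_bins, and the vector size are hypothetical stand-ins for integer k-mer indices. It assumes the values are positive integers, which is what tabulate expects, and it only runs the timing if the microbenchmark package happens to be installed.

```r
# Hypothetical data: 1000 integer "k-mer indices" drawn from 1..64
set.seed(1)
n_bins <- 64
kmer_index <- sample.int(n_bins, size = 1000, replace = TRUE)

# table() needs a factor with explicit levels to report zero-count bins
counts_table <- table(factor(kmer_index, levels = seq_len(n_bins)))

# tabulate() counts positive integers directly and returns a plain integer vector
counts_tabulate <- tabulate(kmer_index, nbins = n_bins)

# Both approaches should agree on every bin
stopifnot(all(as.integer(counts_table) == counts_tabulate))

# Optional timing comparison, only if microbenchmark is available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    table = table(factor(kmer_index, levels = seq_len(n_bins))),
    tabulate = tabulate(kmer_index, nbins = n_bins),
    times = 100
  ))
}
```

tabulate skips the factor conversion and the name bookkeeping that table does, which is one plausible reason it could come out ahead for integer input, but the actual timings will depend on your data and machine.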

If you want to get a physical copy of R Packages: https://amzn.to/43pMR8L
If you want a free, online version of R Packages: https://r-pkgs.org/

You can find my blog post for this episode at https://www.riffomonas.org/code_club/....

Check out the GitHub repository at the:
Beginning of the episode: https://github.com/riffomonas/phyloty...
End of the episode: https://github.com/riffomonas/phyloty...


#rstats #refactor #testthat #tdd #microbenchmark #vectors #rdp #16S #classification #classifier #microbialecology #microbiome

Support Riffomonas by becoming a Patreon member!
  / riffomonas  

Want more practice on the concepts covered in Code Club? You can sign up for my weekly newsletter at https://shop.riffomonas.org/youtube to get practice problems, tips, and insights.

If you're interested in purchasing a video workshop, be sure to check out https://riffomonas.org/workshops/

You can also find complete tutorials for learning R with the tidyverse using...
Microbial ecology data: https://www.riffomonas.org/minimalR/
General data: https://www.riffomonas.org/generalR/



0:00 Introduction
1:26 Pinpointing where memory issue occurs
6:17 Refactoring detect_kmers
10:59 Refactoring detect_kmers_across_sequences
14:57 Refactoring calc_word_specific_priors
20:38 Benchmarking table and tabulate
26:00 Refactoring calc_genus_conditional_prob
30:43 Refactoring build_kmer_database
32:24 Will the vignette run without crashing?
35:04 Replace sapply with a for loop