Tools for Reproducible Real-World Analyses
This course focuses on the concepts and tools of reproducible research and reporting in modern data analysis. The need for tools that support reproducibility in health economics and outcomes research is growing rapidly as analyses of real-world data become more frequent, involve larger datasets, and employ more complex computations.
We cover the principles of structuring and organizing a modern data analysis, literate statistical analysis tools, formal version control, software testing and debugging, and developing reproducible reports. Numerous real-world examples and interactive class exercises reinforce the concepts and tools introduced.
RStudio Cloud is used for exercises. Participants who wish to gain hands-on experience should bring their laptops.
Concepts
- What is reproducible research?
- Why is reproducibility so important?
- How do we get there?
- Organizing data
- Writing clear code
- Disseminating code & findings
- Catching mistakes
Materials and Exercises
Click here for slides, code, in-class exercises, and more fun things [PASSWORD REQUIRED].
Dataset for hands-on exercises during the class [PASSWORD REQUIRED].
Supplemental Resources
coding style & culture
- Writing system software: code comments
- Style guide from Google
- The Tidyverse style guide
- Software Carpentry: best practices for writing R code
- Code review best practices
learning R
- RStudio webinars
- RStudio cheat sheets
- R for Data Science textbook (also available as paperback via Amazon)
- Advanced R programming (covers many advanced topics)
reproducible research
- Good Practices for Real-World Data Studies of Treatment and/or Comparative Effectiveness: Recommendations from the Joint ISPOR-ISPE Special Task Force on Real-World Evidence in Health Care Decision Making
- Reproducibility checklist
- rOpenSci – a non-profit initiative that fosters reproducible research
- Coursera Reproducible Research