Single-Course English 5 ECTS

R for Bio Data Science

Overall Course Objectives

The aim of this course is to equip students with practical skills in modern bio data science using Tidyverse R, the RStudio IDE, and the Quarto reporting system. Throughout the course, students will learn how to transform messy data sets into clean and organized ones, perform data analysis, gain insights through exploratory data analysis, and communicate results via data visualization and dynamic reporting. Emphasis is placed on the importance of reproducible data analysis and on designing, organizing, and executing collaborative bio data science projects using Tidyverse R and git/GitHub. The course focuses exclusively on biological data sets.

Learning Objectives

  • explain why reproducible data analysis is important and describe the difference between replicability and reproducibility.
  • describe the basic concepts of data cleaning and transformation and how they relate to reproducible data analysis.
  • explain what the individual Tidyverse tools do and correctly identify the appropriate tools for a given task.
  • apply Tidyverse tools to convert a messy data set into a clean and consistent one in the context of exploratory data analysis, and gain insights into biological data.
  • use RStudio and git/GitHub to collaborate on bio data science projects.
  • perform basic statistical tests and fit linear models using the Tidyverse framework.
  • create a simple R package.
  • create a simple Shiny app.
  • independently identify and adapt relevant novel state-of-the-art data science tools.
  • use large language model (LLM) technology such as ChatGPT as a sparring partner in a bio data science project and assess its potential pitfalls and impact.
  • design and organize a collaborative end-to-end bio data science project using Tidyverse R and git/GitHub and present the results in a comprehensive dynamic Quarto report/presentation.
  • analyze a completed bio data science project to assess the choice of methods, reproducibility, and quality of data communication.

Course Content

Modern bio data science in Tidyverse R: data cleaning, wrangling, visualisation, and communication. Tidyverse R, RStudio, Quarto, dplyr, ggplot, reproducible bio data analysis, rstudio.cloud, shinyapps.io, R packages, git/GitHub, and bio data science project organisation, all with an applied focus. Some elements of applied basic statistics and machine learning.

Recommended prerequisites

01005/02402/27024/23214/22101/02631/02632/02633/27002/27008/22111. It is assumed that the student has existing knowledge of mathematics, statistics, basic programming (language irrelevant), life science, and bioinformatics corresponding to the level of bachelor’s courses at DTU (see relevant course numbers above).

Teaching Method

Semi-flipped classroom. Students prepare assigned written and video-based materials before class. Each class begins with a brief summary of key points from last week’s exercises, followed by a brief introduction to the key points of the topic of the day. The remainder of the class is spent on cloud-based exercises. Students are required to bring their own laptop with functioning wireless internet and a valid DTU account.

Faculty

See course in the course database.

Registration

Language

English

Duration

13 weeks

Institute

Health Tech

Place

DTU Lyngby Campus

Course code

22100

Course type

Bachelor

Semester start

Week 35

Semester end

Week 48

Days

Tuesdays 8-12

Price

7.500,00 DKK