NTU Course

R programming and application to Public Health data

Offered in 114-2
  • Serial Number

    47904

  • Course Number

    EPM5060

  • Course Identifier

    849 U0600

  • No Class

  • 3 Credits
  • Elective

    DEPARTMENT OF PUBLIC HEALTH / Health Data Analytics and Statistics / Graduate Institute of Epidemiology and Preventive Medicine


  • Amrita Chattopadhyay
  • Thu 2, 3, 4
  • 公衛212

  • Type 2

  • 30 Student Quota

    NTU 30

  • No Specialization Program

  • English
  • NTU COOL
  • Core Capabilities and Curriculum Planning
  • Notes

    The course is conducted in English.

  • NTU Enrollment Status

    Enrolled: 0/30
    Other Depts: 0/0
    Remaining: 0
    Registered: 0
  • Course Description
    This course provides a thorough introduction to R programming and gives students a comprehensive understanding of, and practical experience in, public health data analysis. It is structured in two sections. The first trains students to use R, a freely available statistical software environment, to write efficient code for data manipulation, data processing, and statistical analysis. In the second, students are given real health data and trained to carry out a step-by-step analysis protocol using the techniques learned in the first section. A theoretical introduction at the beginning of each class covers the concepts underlying that day's task.
    • R programming: importing data; data handling and manipulation; resampling strategies; statistical analysis techniques, including descriptive statistics, hypothesis testing, and regression; data visualization with ggplot2 and base R plots; and reading plots for correct interpretation.
    • Health dataset analysis: real de-identified datasets will be provided. Alternatively, students may acquire health datasets themselves or use their own research datasets. Students will be guided through hands-on, step-by-step analysis of the provided dataset(s), covering descriptive data analysis, variable selection, and association or survival analysis. Students may also apply any bioinformatics tools for visualization.
    By combining R programming, theoretical introductions, and hands-on analysis, the course equips participants to analyze public health datasets effectively and make informed, data-driven decisions in their research and practice. Most of the course is computer based.
  • Course Objective
    Upon completion, students will be able to:
    • Use R for a wide range of data manipulation and data cleaning tasks.
    • Develop statistical thinking and apply statistics in modern public health research and practice.
    • Describe a dataset using descriptive statistics and graphical methods in R as an initial step toward more advanced analysis.
    • Implement suitable methods in R to formulate and analyze statistical associations between variables in a dataset.
    • Interpret the results and provide potential explanations for the findings.
    Skills that students will gain: data analysis with R, linear regression, logistic regression, group comparison testing, survival analysis, data visualization, and statistical thinking.
  • Course Requirement
    Biostatistics, Statistics, Basic programming (optional), Data preprocessing (optional), Data acquisition (optional)
  • Expected weekly study hours before and/or after class
  • Office Hour
    by appointment
  • Designated Reading
    1. Biostatistics with R: A Guide for Medical Doctors, Marco Moscarelli, Springer
    2. A Learning Guide to R, Remko Duursma, Jeff Powell, Glenn Stone, Western Sydney University
    3. Survival Analysis in Medicine and Genetics, Jialiang Li, Shuangge Ma, Chapman and Hall
    4. Working with Data in Public Health: A Practical Pathway with R, Peng Zhao, Springer
  • References
    1. The Practical Guide to Clinical Research and Publication, Uzung Yoon, Academic Press
    2. Practical Clinical Research Design and Application: A Primer for Physicians, Surgeons, and Clinical Healthcare Professionals, Peter D. Fabricant, Springer Open
  • Grading
    20%

    Attendance, class involvement, interaction, and participation: evaluated from each week's end-of-class progress, attendance, and interaction with the instructor.

    30%

    Homework: core capacities A, B, C, E, and F will be assessed through weekly homework assignments. Completing each assignment is critical, because the following week's analysis continues from it.

    25%

    Midterm: core capacities A, C, E, and F will be evaluated through a written report and Q&A.

    25%

    Final analysis and presentation: core capacities A, B, C, E, and F will be evaluated through a dataset analysis report, an in-class oral presentation, and Q&A.


    1. NTU has not set an upper limit on the percentage of A+ grades.
    2. NTU uses a letter grade system for assessment. The grade percentage ranges and the single-subject grade conversion table in the NATIONAL TAIWAN UNIVERSITY Regulations Governing Academic Grading are for reference only. Instructors may adjust the percentage ranges according to the grade definitions. For more information, see the Assessment for Learning Section.
  • Adjustment methods for students
  • Make-up Class Information
  • Course Schedule
    Week 1: R course day 1
      a. Course introduction
      b. R, RStudio, and R package installation
      c. Data import, data frames, tibbles
    Week 2: Application day 1. Real dataset: health data
      a. Data import using R
      b. Data structure using R
      c. R objects and data types
      d. R vectors and matrices
      e. Recoding categorical data using R
      f. Missing data using R
      g. Data cleaning using R
    Week 3: R course day 2. Exploratory data analysis using R
      a. R subsetting
      b. R functions
      c. R loops
      d. Descriptive statistics using R
      e. Summary statistics using R
      f. Base R plots
    Week 4: Application day 2. Health data: categorical variables
      a. Data cleaning using R
      b. Missing data using R
      c. Data manipulation using R
      d. Stratification (by categories) using R
      e. Subsetting (desired columns and rows) using R
      f. Descriptive statistics and summary statistics using R
      g. Visualizing categorical data using R: bar plots, pie charts, dot plots
    Week 5: Application day 3. Health data: continuous (numeric) variables
      a. Data cleaning using R
      b. Missing data removal/imputation using R
      c. Data manipulation using R
      d. Subsetting using R
      e. Descriptive statistics and summary statistics using R
      f. Visualizing continuous data using R: histograms, box plots
    Week 6: Review week, mock exercise. Write R code on a practice dataset:
      a. Data import
      b. Data cleaning
      c. Data structure
      d. Identify numerical and categorical data
      e. Create subsets
      f. Descriptive/summary statistics
      g. Visualization
    Week 7: Midterm practical exam. Use R to analyze real health data:
      1. Compare subgroups: categorical variables
      2. Compare subgroups: continuous variables
      3. Descriptive analysis
      4. Visualizations
    Week 8: R course day 3
      a. Regression using R: logistic, linear, Cox proportional hazards
      b. Parametric tests using R
      c. Nonparametric tests using R
      d. ANOVA tests using R to compare more than two groups
      e. Visualization using R (base R plots, ggplot2): scatterplots, correlation plots
    Week 9: Application day 4. Health data: discrete variables. Statistical tests using R
      a. Create contingency tables
      b. Fisher's exact test, chi-square test, rank tests, Kruskal-Wallis test
      c. One-sided and two-sided tests
    Week 10: Application day 5. Health data: continuous variables. Statistical tests using R
      a. Normality testing: Shapiro-Wilk test
      b. Equality of variance test: Bartlett's test
      c. Proportion test, Z-test, t-test, Mann-Whitney test, Wilcoxon rank-sum test
      d. One-sided and two-sided tests
    Week 11: Application day 6. Health dataset
      a. Linear regression
      b. Variable selection strategies using R
      c. Correlation analysis and correlation test using R
      d. Data fitting and visualization (scatterplots), correlation plots using R
    Week 12: Application day 7. Health dataset
      a. Association analysis, logistic regression
      b. Variable selection strategies using R
      c. Correlation analysis and correlation test using R
      d. Data fitting and visualization (scatterplots), correlation plots using R
    Week 13: R course day 4. Survival data analysis using R
      a. Pre-processing
      b. Kaplan-Meier analysis using R
      c. Cox proportional hazards regression using R
      d. Univariate and multivariate analysis using R
      e. Model performance metrics using R
      f. Cross-validation using R, R loops
    Week 14: Application day 8. Survival data analysis using R
      a. Kaplan-Meier analysis
      b. Cox proportional hazards regression analysis
      c. KM plots, forest plots
      d. Discriminant analysis
      e. Cross-validation performance analysis
    Week 15: Review week, mock exercise. Use example datasets to do the following:
      a. Statistical tests using R (discrete and continuous)
      b. Regression analysis using R
      c. Survival analysis using R
    Week 16: Final practical exam. End-to-end data analysis (health data or survival data)
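
As a rough illustration of the kind of end-to-end analysis the schedule builds toward, the sketch below walks through cleaning, descriptive statistics, group comparison tests, and logistic regression in base R. It is not taken from the course materials. The data frame `health` and its variables (`age`, `sex`, `smoker`, `disease`) are simulated and hypothetical, and the effect sizes used to generate them are arbitrary assumptions.

```r
# Hypothetical, simulated data: none of these variables come from the
# course datasets; they only stand in for a small health dataset.
set.seed(42)
n <- 200
health <- data.frame(
  age    = round(rnorm(n, mean = 50, sd = 12)),
  sex    = factor(sample(c("F", "M"), n, replace = TRUE)),
  smoker = factor(sample(c("no", "yes"), n, replace = TRUE, prob = c(0.7, 0.3)))
)

# Simulate a binary outcome whose log-odds rise with age and smoking
# (the coefficients are arbitrary assumptions for illustration).
log_odds <- -4 + 0.05 * health$age + 0.8 * (health$smoker == "yes")
health$disease <- rbinom(n, size = 1, prob = plogis(log_odds))

# Data cleaning: introduce a few missing values, then keep complete rows only.
health$age[c(3, 17, 58)] <- NA
clean <- health[complete.cases(health), ]

# Descriptive statistics for a continuous and a categorical variable.
print(summary(clean$age))
print(table(smoker = clean$smoker, disease = clean$disease))

# Group comparisons: chi-square test (categorical) and t-test (continuous).
chisq_res <- chisq.test(table(clean$smoker, clean$disease))
ttest_res <- t.test(age ~ disease, data = clean)

# Logistic regression; exponentiated coefficients are odds ratios.
fit <- glm(disease ~ age + sex + smoker, data = clean, family = binomial)
odds_ratios <- exp(coef(fit))
print(round(odds_ratios, 2))
```

With a real dataset, the simulation step would be replaced by a data import (for example `read.csv()`), and dropping incomplete rows is only one of several ways to handle missing data.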