Serial Number
47904
Course Number
EPM5060
Course Identifier
849 U0600
No Class
- 3 Credits
Elective
DEPARTMENT OF PUBLIC HEALTH / Health Data Analytics and Statistics / Graduate Institute of Epidemiology and Preventive Medicine
Instructor
Amrita Chattopadhyay
COLLEGE OF PUBLIC HEALTH Graduate Institute of Epidemiology and Preventive Medicine
- Thu 2, 3, 4
Public Health 212 (公衛212)
Type 2
30 Student Quota
NTU 30
No Specialization Program
- English
- NTU COOL
- Core Capabilities and Curriculum Planning
- Notes
The course is conducted in English.
NTU Enrollment Status
Enrolled 0/30, Other Depts 0/0, Remaining 0, Registered 0
- Course Description
This course provides a thorough introduction to R programming and gives students a comprehensive understanding of, and practical experience in, public health data analysis. The course is structured in two sections. The first section trains students to use R, a freely available statistical software environment, to write efficient code for data manipulation, data processing, and statistical analysis. In the second section, students are given real health data and trained to carry out a step-by-step analysis protocol that implements the techniques learned in the first section. In addition, each class begins with a theoretical introduction to ensure a sound understanding of the concepts underlying that day's task.
• R programming: importing data; data handling and manipulation; resampling strategies; statistical analysis techniques covering descriptive statistics, hypothesis testing, and regression; data visualization with ggplot2 and base R plots; and reading plots for correct interpretation.
• Health dataset analysis: real de-identified datasets will be provided. Alternatively, students may acquire health datasets themselves or use their own research datasets. Students will be trained and guided through hands-on, step-by-step analysis covering descriptive data analysis, variable selection, and association analysis or survival analysis on the provided dataset(s). Students may also apply any bioinformatics tools for visualization.
By combining R programming, theoretical introduction, and hands-on analysis, the course equips participants with the skills to analyze public health datasets effectively and to make informed, data-driven decisions in their research and practice. The course will be computer-based for the most part.
- Course Objective
Upon completion, students will be able to do the following successfully:
• Use R for a wide range of data manipulation and data cleaning tasks.
• Develop statistical thinking and apply statistics in modern public health research and practice.
• Describe a dataset using descriptive statistics and graphical methods in R as an initial step toward more advanced analysis.
• Implement suitable methods in R to formulate and analyze statistical associations between variables in a dataset.
• Interpret the results and provide potential explanations for the findings.
Skills that students will gain:
• Data analysis with R
• Linear regression
• Logistic regression
• Group comparison testing
• Survival analysis
• Visualization of data
• Statistical thinking
- Course Requirement
Biostatistics, Statistics, Basic programming (optional), Data preprocessing (optional), Data acquisition (optional)
- Expected weekly study hours before and/or after class
- Office Hour
by appointment
- Designated Reading
1. Biostatistics with R: A Guide for Medical Doctors, Marco Moscarelli, Springer
2. A Learning Guide to R, Remko Duursma, Jeff Powell, Glenn Stone, Western Sydney University
3. Survival Analysis in Medicine and Genetics, Jialiang Li, Shuangge Ma, Chapman and Hall
4. Working with Data in Public Health: A Practical Pathway with R, Peng Zhao, Springer
- References
1. The Practical Guide to Clinical Research and Publication, Uzuung Yoon, Academic Press
2. Practical Clinical Research Design and Application: A Primer for Physicians, Surgeons, and Clinical Healthcare Professionals, Peter D. Fabricant, Springer Open
- Grading
20% Attendance, class involvement, interaction, and participation: evaluated each week on end-of-class progress, attendance, and interaction with the teacher.
30% Homework: core capacities A, B, C, E, and F will be assessed through the weekly homework assignments, each of which is critical for continuing the analysis in the following week.
25% Midterm: core capacities A, C, E, and F will be evaluated through a written report and Q&A.
25% Final analysis and presentation: core capacities A, B, C, E, and F will be evaluated through a dataset analysis report, an oral presentation in class, and Q&A.
- NTU has not set an upper limit on the percentage of A+ grades.
- NTU uses a letter grade system for assessment. The grade percentage ranges and the single-subject grade conversion table in the National Taiwan University Regulations Governing Academic Grading are for reference only. Instructors may adjust the percentage ranges according to the grade definitions. For more information, see the Assessment for Learning section.
- Adjustment methods for students
- Make-up Class Information
- Course Schedule
Week 1: R course day 1
a. Course introduction
b. R, RStudio, R package installation
c. Data import, data frames, tibbles

Week 2: Application day 1. Real dataset: health data
a. Data import using R
b. Data structure using R
c. R objects and data types
d. R vectors and matrices
e. Recoding categorical data using R
f. Missing data using R
g. Data cleaning using R

Week 3: R course day 2. Exploratory data analysis using R
a. R subsetting
b. R functions
c. R loops
d. Descriptive statistics using R
e. Summary statistics using R
f. R base plots

Week 4: Application day 2. Health data: categorical variables
a. Data cleaning using R
b. Missing data using R
c. Data manipulation using R
d. Stratification (by categories) using R
e. Subsetting (desired columns and rows) using R
f. Descriptive statistics and summary statistics using R
g. Visualizing categorical data using R: bar plots, pie charts, dot plots

Week 5: Application day 3. Health data: continuous (numeric) variables
a. Data cleaning using R
b. Missing data removal/imputation using R
c. Data manipulation using R
d. Subsetting using R
e. Descriptive statistics and summary statistics using R
f. Visualizing continuous data using R: histograms, box plots

Week 6: Review week: mock exercise. Write R code on a practice dataset
a. Data import
b. Data cleaning
c. Data structure
d. Identify numerical and categorical data
e. Create subsets
f. Descriptive/summary statistics
g. Visualization

Week 7: Midterm practical exam. Use R to analyze real health data
1. Compare data subgroups: categorical
2. Compare data subgroups: continuous
3. Descriptive analysis
4. Visualizations

Week 8: R course day 3
a. Regression using R: logistic, linear, Cox proportional hazards
b. Parametric tests using R
c. Nonparametric tests using R
d. ANOVA tests using R to compare more than two groups
e. Visualization using R: R base plots, ggplot2: scatterplots, correlation plots

Week 9: Application day 4. Dataset: health data, discrete variables. Statistical tests using R
a. Create contingency tables
b. Fisher's exact test, chi-square test, rank tests, Kruskal-Wallis test
c. One-sided test, two-sided test

Week 10: Application day 5. Dataset: health data, continuous variables. Statistical tests using R
a. Normality testing: Shapiro-Wilk test
b. Equality of variance test: Bartlett test
c. Proportion test, Z-test, t-test, Mann-Whitney test, Wilcoxon rank-sum test
d. One-sided and two-sided tests

Week 11: Application day 6. Dataset: health dataset
a. Linear regression
b. Variable selection strategies using R
c. Correlation analysis and correlation test using R
d. Data fitting and visualization (scatterplots), correlation plots using R

Week 12: Application day 7. Dataset: health dataset
a. Association analysis, logistic regression
b. Variable selection strategies using R
c. Correlation analysis and correlation test using R
d. Data fitting and visualization (scatterplots), correlation plots using R

Week 13: R course day 4. Survival data analysis using R
a. Pre-processing
b. Kaplan-Meier analysis using R
c. Cox proportional hazards regression using R
d. Univariate and multivariate analysis using R
e. Model performance metrics using R
f. Cross-validation using R, R loops

Week 14: Application day 8. Survival data analysis using R
a. Kaplan-Meier analysis
b. Cox proportional hazards regression analysis
c. KM plots, forest plots
d. Discriminant analysis
e. Cross-validation performance analysis

Week 15: Review week: mock exercise. Use example datasets to do the following
a. Statistical tests using R (discrete and continuous)
b. Regression analysis using R
c. Survival analysis using R

Week 16: Final practical exam
End-to-end data analysis (health data or survival data)