NTU Course

Cloud Computing for High Dimensional Data

Offered in 113-1
  • Serial Number

    38061

  • Course Number

    IMPS5010

  • Course Identifier

    H41 U0120

  • Class 01
  • 3 Credits
  • Elective (Master Program in Statistics of National Taiwan University)

  • CHEN, YAN-BIN
  • Tue 7, 8, 9
  • 新401

  • Type 2

  • 15 Student Quota

    NTU 15

  • No Specialization Program

  • English
  • NTU COOL
  • Core Capabilities and Curriculum Planning
  • Notes
    The course is conducted in English.
  • NTU Enrollment Status

    Enrolled: 0/15
    Other Depts: 0/0
    Remaining: 0
    Registered: 0
  • Course Description
    == Fall 2024 ==
    This course offers practical training in data science, focusing on high-dimensional data computing and dimension reduction algorithms. Its distinguishing features are hands-on experience with high-performance computers and the examination of real data from a statistical perspective. Practical exercises will be conducted on high-performance GPU servers in the cloud, possibly using resources such as the NVIDIA V100 at NTU or Google Colab. In addition to the hands-on exercises, the course introduces statistical theory related to dimension reduction algorithms, data visualization, and data interpretation. Python programming will be covered during the first month as a condensed recap. The course is taught in English, but bilingual Q&A sessions are acceptable.

    Teaching methods in each week:
    50 mins: Lecture.
    90 mins: Students engage in hands-on exercises and paper presentations.
    10 mins: Conclusion of hands-on exercises and fundamental knowledge.

    *** Notice ***
    There is no need to email the instructor about course enrollment. If you would like to take the course but were unable to enroll, please come to class in the first week; authorization codes may be distributed. Unsuccessful enrollment results will be announced after the preliminary course selection on August 29th.
  • Course Objective
    Students will learn the inherent characteristics of high-dimensional data and dimension reduction techniques. They will also gain hands-on experience in accessing and operating on high-dimensional data on high-performance GPU servers. Students are expected to complete projects that involve preprocessing, computing on, and manipulating high-dimensional data on these servers.
  • Course Requirement
    1. Students should have very basic Python programming skills before taking the course.
    2. Students should bring their laptops to each class session.
  • Expected weekly study hours before and/or after class
    3 hours
  • Office Hour
    Office hours are by appointment.
  • Designated Reading
    Month 1: Book 1, Chapters 3, 5, 9
    Month 2: Book 2, Chapters 1, 2
    Month 3: Book 2, Chapters 5, 6
    Month 4: Paper study
  • References
    Book 1: Wes McKinney, Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter, 3rd ed., 2022.
    Book 2: Sylvain Lespinats, Benoit Colange, Denys Dutykh, Nonlinear Dimensionality Reduction Techniques: A Data Structure Preservation Approach, 2021.
  • Grading
    10%  In class: Exercise in class session
    40%  Midterm: Paper presentation
    50%  Final: Final project

  • Adjustment methods for students
    Adjustment Method: Description
    A3: Provide students with flexible ways of attending courses
    B6: Mutual agreement between students and instructors to present in other forms
    C2: Written (oral) reports replace exams

  • Make-up Class Information
  • Course Schedule
    9/03   Week 1   Introduction
    9/10   Week 2   [Part 1: A Quick Recap of Python] Python Environment Setup
    9/17   Week 3   Public holiday
    9/24   Week 4   Data Structures and Functions, Pandas
    10/01  Week 5   Plot and Visualization
    10/08  Week 6   [Part 2: Dimensionality Reduction Techniques] Similarity Measure and Distance Function
    10/15  Week 7   Nearest Neighbors in Scikit-learn
    10/22  Week 8   Machine Learning for Artificial Intelligence
    10/29  Week 9   Supervised Learning
    11/05  Week 10  Unsupervised Dimensionality Reduction: PCA, t-SNE
    11/12  Week 11  Deep Learning: CNN
    11/19  Week 12  Natural Language Processing: NLTK
    11/26  Week 13  Research Issue: Feature Representation Learning
    12/03  Week 14  Final Project Presentation I
    12/10  Week 15  Final Project Presentation II
    12/17  Week 16  Real Case Study and Discussion