Deep Learning for Music Analysis and Generation

Offered in 113-1
  • Serial number

    32271

  • Course number

    CommE5070

  • Course identifier

    942 U0840

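  • No class sections

  • 3 credits
  • Elective

    Graduate Institute of Electrical Engineering / Graduate Institute of Communication Engineering

      Elective
    • Graduate Institute of Electrical Engineering

    • Graduate Institute of Communication Engineering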

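  • 楊奕軒 (Yi-Hsuan Yang)
  • Thursday, periods 6, 7, 8
  • Please contact the department office

  • Type 2 add

  • Total enrollment: 60 students

    NTU students: 60

  • No specialization program

  • Taught in Chinese
  • NTU COOL
  • Core competencies and curriculum planning map
  • Notes
    Location: 學新118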
  • Course overview
    “Music Information Research” (MIR) is an interdisciplinary research field concerned with the analysis, retrieval, processing, and generation of musical content or information. Researchers involved in MIR may have a background in signal processing, machine learning, information retrieval, human-computer interaction, musicology, psychoacoustics, psychology, or some combination of these. In this course, we are mainly interested in applying machine learning, in particular deep learning, to music-related problems. Specifically, the course is divided into two parts: analysis and generation. The first part covers the analysis of musical audio signals, with topics such as feature extraction and representation learning for musical audio, music audio classification, melody extraction, automatic music transcription, and musical source separation. The second part covers the generation of musical material, including symbolic-domain MIDI or tablatures, and audio-domain music signals such as singing voices and instrumental music. This involves deep generative models such as generative adversarial networks (GANs), variational autoencoders (VAEs), Transformers, and diffusion models.

    Here is a tentative schedule of the course:
    W1. Introduction to the course
    W2. Fundamentals & Music representation
    W3. Analysis I (timbre): Automatic music classification and representation learning (HW1: to be announced)
    W4. Generation I: Source separation
    W5. Generation II: GAN & Vocoders
    W6. Generation III: Synthesis of notes and loops (HW2: to be announced)
    W7. Analysis II (pitch): Music transcription, Melody extraction, and Chord Recognition
    W8. Generation IV: Symbolic MIDI generation
    W9. Generation V: Symbolic MIDI generation: Advanced Topics (HW3: to be announced)
    W10. Generation VI: Singing voice generation
    W11. Generation VII: Text-to-music generation
    W12. Proposal of ideas of final projects
    W13. Generation VIII: Differentiable DSP models and automatic mixing
    W14. Miscellaneous Topics
    W15. Break
    W16. Oral presentation of final projects
  • Course objectives
    1. Understanding of different aspects of music (timbre, rhythm, pitch, harmony, and structure) and the use of domain knowledge for the corresponding music signal analysis tasks.
    2. Understanding of, and hands-on experience with, deep learning techniques for music audio signal analysis.
    3. Understanding of, and hands-on experience with, deep generative models for both musical audio and text-like music data such as MIDI.
    4. A taste of the fun of research.
  • Course requirements
    Students taking this course are assumed to:
    • have a good background in machine learning and mathematics (e.g., have taken courses such as Machine Learning, Deep Learning, Signals and Systems, Digital Signal Processing, Linear Algebra, Probability and Statistics)
    • have good coding experience in Python and a deep learning framework such as PyTorch
    • have a great interest in music
  • Expected weekly study hours after class
  • Office Hour
  • Required reading
  • References
    Jakub M. Tomczak, Deep Generative Modeling. Springer, 2022. ISBN 978-3-030-93158-2.
  • Grading
    60%: Homework. 3-4 coding assignments related to building ML/DL models.

    40%: Final project. For teams of 2 or 3; oral presentation + written report.

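  • Accommodations for students with difficulties
    Description of accommodations
    Class format

    Flexible attendance options are offered to students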

  • Course schedule