Computer audition is the study of how to design a computational system that can analyze and process auditory scenes. Problems in this field include source separation (splitting audio mixtures into individual source tracks), pitch estimation (estimating the pitches played by each instrument), streaming (finding which sounds belong to a single event/source), source localization (finding where the sound comes from) and source identification (labeling a sound source).
This is a graduate course (cross-listed for senior undergraduates) covering current research in the field. The class starts with a brief review of signal processing techniques, then introduces auditory models, audio features, and audio modeling methods. It then turns to recent advances in research topics including multi-pitch analysis, source separation, source localization, and instrument identification.
In the first half of the semester, students will complete four homework assignments (Matlab programming) that cover the basics. Students are also required to read ten recently published papers in the field and write reviews of them. In the second half of the semester, each student will present a research paper in class. Students will also complete a final project, which includes selecting a topic, reading several related papers, proposing and implementing their ideas, and writing a report. Students’ presentations and final project reports will be uploaded to the course website for others to view. Students will give feedback on other students’ presentations and final projects.
Lectures: 12:30-1:45PM on Tuesdays and Thursdays
Classroom: CSB 601
Prerequisites: ECE 246/446 or ECE 272/472 or other equivalent signal processing courses, and Matlab programming. Knowledge of machine learning techniques such as Markov models, support vector machines, and neural networks is also helpful, but not required.
Textbook: No textbook is required. We will read a number of research papers in the field. The following texts are for reference and have been put on reserve at the UR library. Excerpts from them will be provided to students.
- Meinard Mueller, Fundamentals of Music Processing: Audio, Analysis, Algorithms, Applications. Springer, 2015.
- Albert S. Bregman, Auditory Scene Analysis: The Perceptual Organization of Sound. The MIT Press, 1990.
- DeLiang Wang and Guy J. Brown, editors. Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. IEEE Press / Wiley-Interscience, 2006.
- Anssi Klapuri and Manuel Davy, editors. Signal Processing Methods for Music Transcription. Springer, 2006.
- Theodoros Giannakopoulos and Aggelos Pikrakis, Introduction to Audio Analysis: A MATLAB Approach. Academic Press, 2014. Electronic version available for UR students here.
Instructor: Zhiyao Duan
Office: CSB 720
Email: zhiyao.duan (at) rochester.edu
Office hours: Wednesdays 3-5 PM
TAs and Office Hours:
Christos Benetatos <cbenetat (at) ur.rochester.edu>, Thursdays 3-5 PM in CSB 527
Ge Zhu <email@example.com>, Fridays 3-5 PM in CSB 527
Sheng Xu <firstname.lastname@example.org>, Fridays 10 AM-12 PM in CSB 527