Music Query by Video: Cross-modal Music Retrieval

This project is in collaboration with Spotify, mentored by Aparna Kumar.


Problem Statement

  • Input: A short video clip.
  • Output: A list of songs retrieved from an existing music database.


Demo Results

Demo 1

  • Query video: The test split of the Cowen2017 dataset.
  • Music Database: The test split of the AudioSet Music Mood Subset (354 music excerpts).


  • Each YouTube link includes 30 query videos, each presented with its top 5 retrieved music excerpts.
  • The rank and cross-modal distance are displayed.
  • Query videos vary in duration, but every music excerpt is 10 seconds long, so in some cases the video frames end before the audio track does.
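
The ranking shown in the demo (top 5 excerpts ordered by cross-modal distance) can be illustrated with a minimal sketch. It assumes the query video and every music excerpt have already been mapped to vectors in a shared embedding space, and it uses Euclidean distance purely for illustration; the project's actual embedding model and distance measure are not described here. The names `euclidean` and `retrieve_top_k` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors.

    Assumption: the real system may use a different cross-modal distance.
    """
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_top_k(
    query: Sequence[float],
    database: Sequence[Sequence[float]],
    k: int = 5,
) -> List[Tuple[int, int, float]]:
    """Return (rank, database_index, distance) for the k closest excerpts."""
    # Score every music excerpt against the query video embedding.
    scored = [(euclidean(query, emb), idx) for idx, emb in enumerate(database)]
    # Smaller distance means a better match; ties fall back to index order.
    scored.sort()
    return [
        (rank, idx, dist)
        for rank, (dist, idx) in enumerate(scored[:k], start=1)
    ]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for real video and music features.
    video_embedding = [0.0, 0.0]
    music_embeddings = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
    for rank, idx, dist in retrieve_top_k(video_embedding, music_embeddings, k=2):
        print(f"rank {rank}: excerpt {idx} (distance {dist:.2f})")
```

With a database of 354 excerpts, as in Demo 1, an exhaustive scan like this is inexpensive; an approximate nearest-neighbour index would only become worthwhile for much larger collections.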

Demo 2

  • Query video: Top Instagram video posts for the keywords Friends, Food, Gadget, Pets, Activities, Selfies, and Fashion.
  • Music Database: Spotify popular genres (1195 music excerpts).


  • For each video, only the top retrieved music excerpt is presented.

Last Update: June 2019