Skeleton Plays Piano: Online Generation of Pianist Body Movements from MIDI Performance

Bochen Li, Akira Maezawa, and Zhiyao Duan

This project is in collaboration with the Yamaha Corporation. This project is partially supported by the National Science Foundation under grant No. 1741472, titled "BIGDATA: F: Audio-Visual Scene Understanding".
Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.


What is the problem?

We aim to train a system that, given symbolic music in MIDI format, generates an animation of a virtual pianist with expressive performance motions.



What is our approach?

We first use two CNNs to parse the raw inputs, namely the MIDI note stream and the metric structure. We then feed the extracted feature representations to an LSTM network, which generates the body movements as a sequence of upper-body joint coordinates forming a skeleton.
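
As a rough illustration of the two raw inputs, the sketch below rasterizes a MIDI note stream into a frame-wise piano roll and marks beat positions as a simple metric-structure track. It is only a sketch under stated assumptions: the note tuple format, the frame rate, the 88-key layout, and the function names piano_roll and beat_frames are hypothetical and are not taken from the paper. The CNN and LSTM stages themselves are not shown.

```python
from typing import List, Tuple

# Hypothetical note format (an assumption): (onset_sec, offset_sec, midi_pitch)
Note = Tuple[float, float, int]

LOWEST_PITCH = 21  # MIDI number of A0, the lowest key on an 88-key piano
NUM_KEYS = 88


def piano_roll(notes: List[Note], fps: float, num_frames: int) -> List[List[int]]:
    """Rasterize notes into a binary [num_frames x NUM_KEYS] matrix."""
    roll = [[0] * NUM_KEYS for _ in range(num_frames)]
    for onset, offset, pitch in notes:
        key = pitch - LOWEST_PITCH
        if not 0 <= key < NUM_KEYS:
            continue  # skip pitches outside the piano range
        start = max(0, int(onset * fps))
        # every note occupies at least one frame
        end = min(num_frames, max(start + 1, int(offset * fps)))
        for t in range(start, end):
            roll[t][key] = 1
    return roll


def beat_frames(beat_times: List[float], fps: float, num_frames: int) -> List[int]:
    """Mark the frame nearest to each beat time with 1, all others with 0."""
    marks = [0] * num_frames
    for beat in beat_times:
        t = int(round(beat * fps))
        if 0 <= t < num_frames:
            marks[t] = 1
    return marks


if __name__ == "__main__":
    # Two overlapping notes (C4 and E4) at an assumed rate of 4 frames per second
    notes = [(0.0, 0.5, 60), (0.25, 1.0, 64)]
    roll = piano_roll(notes, fps=4, num_frames=4)
    beats = beat_frames([0.0, 0.5], fps=4, num_frames=4)
    for t, frame in enumerate(roll):
        active = [k + LOWEST_PITCH for k, v in enumerate(frame) if v]
        print(t, active, "beat" if beats[t] else "")
```

Each row of the piano roll, paired with the beat marker for the same frame, is the kind of per-frame input that the feature extractors could consume one frame at a time. A frame-wise layout like this suits online generation, because each frame depends only on notes up to that point.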

Our Results

Subjective Evaluations

We conduct subjective evaluations to rate the expressiveness and naturalness of the generated skeleton movements compared with those extracted from real human players. More specifically, we recruit 18 subjects from Yamaha to watch 32 ten-second video excerpts of "skeleton plays piano": 16 generated and 16 real. The ratings are plotted in the following figure, where tracks with significantly different ratings are marked with "*".

Demo Videos

All the generated skeleton movements for the 16 tracks, shown alongside those of real human players, are listed here:

Visit the YouTube playlist for the above 16 demo videos <here>

Visit the YouTube playlist for demo videos without the comparison with real human players <here>


Bochen Li, Akira Maezawa, and Zhiyao Duan, "Skeleton plays piano: online generation of pianist body movements from MIDI performance," in Proc. International Society for Music Information Retrieval Conference (ISMIR), 2018. <pdf>