Tailed U-Net: Multi-Scale Music Representation Learning
| Authors | |
|---|---|
| Publication date | 2022 |
| Host editors | |
| Book title | Proceedings of the 23rd International Society for Music Information Retrieval Conference |
| Book subtitle | Bengaluru, India, December 04-08, 2022 |
| ISBN (electronic) | |
| Event | 23rd International Society for Music Information Retrieval Conference |
| Pages (from-to) | 67-75 |
| Number of pages | 9 |
| Publisher | ISMIR |
| Organisations | |
| Abstract | Self-supervised learning has steadily gained traction in recent years. In music information retrieval (MIR), one promising recent application of self-supervised learning is the CLMR framework (contrastive learning of musical representations). CLMR has shown good performance, achieving results on par with state-of-the-art end-to-end classification models, but it is strictly an encoding framework. Like any encoder, it cannot explicitly combine multi-timescale information, whereas a characteristic feature of human audio perception is that we tend to perceive all frequencies simultaneously. To address this limitation, we propose a generalization of CLMR that learns to extract and explicitly combine representations across different frequency resolutions, which we call the tailed U-Net (TUNe). TUNe architectures combine multi-timescale information during a decoding phase, similar to U-Net architectures used in computer vision and source separation, but add a tail that reduces sample-level information to a smaller, pre-defined number of representation dimensions. The size of the decoding phase is a hyperparameter, and with a zero-layer decoding phase, TUNe reduces to CLMR. The best TUNe architectures, however, require less training time to match CLMR performance, have superior transfer learning performance, and are competitive with state-of-the-art models even at dramatically reduced dimensionalities. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.5281/zenodo.7316596 |
| Other links | https://ismir2022program.ismir.net/poster_109.html https://www.ismir.net/conferences/ismir2022.html |
| Downloads | 000007 (Final published version) |
| Permalink to this page | |
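
The data flow described in the abstract (an encoder, a decoding phase of configurable depth, and a tail that reduces the result to a fixed number of representation dimensions) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes average pooling and nearest-neighbour upsampling in place of learned layers, averages each skip connection with the upsampled signal rather than learning how to combine them, and treats the signal as a plain list of floats. All function names (`encode`, `tail`, `tune_forward`) are hypothetical. The only point it demonstrates is structural: with a zero-layer decoding phase, the output is simply the encoder output passed through the tail, mirroring the statement that TUNe then reduces to CLMR.

```python
from typing import List

Signal = List[float]


def downsample(x: Signal, factor: int = 2) -> Signal:
    """Average-pool by `factor` (assumed stand-in for a learned strided layer)."""
    return [sum(x[i:i + factor]) / factor
            for i in range(0, len(x) - factor + 1, factor)]


def upsample(x: Signal, factor: int = 2) -> Signal:
    """Repeat each value `factor` times (assumed stand-in for a learned upsampling layer)."""
    return [v for v in x for _ in range(factor)]


def encode(x: Signal, depth: int) -> List[Signal]:
    """Return the input followed by `depth` progressively coarser versions of it."""
    scales = [x]
    for _ in range(depth):
        scales.append(downsample(scales[-1]))
    return scales


def tail(x: Signal, n_dims: int) -> Signal:
    """Reduce a sequence to exactly `n_dims` values by averaging contiguous chunks."""
    if n_dims < 1 or len(x) < n_dims:
        raise ValueError("need 1 <= n_dims <= len(x)")
    bounds = [k * len(x) // n_dims for k in range(n_dims + 1)]
    return [sum(x[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]


def tune_forward(x: Signal, encoder_depth: int, decoder_depth: int,
                 n_dims: int) -> Signal:
    """Encoder, then `decoder_depth` decoding steps with skip connections, then the tail."""
    if not 0 <= decoder_depth <= encoder_depth:
        raise ValueError("decoder_depth must lie between 0 and encoder_depth")
    scales = encode(x, encoder_depth)
    h = scales[-1]  # coarsest scale
    for level in range(decoder_depth):
        skip = scales[-2 - level]  # encoder output one scale finer than h
        coarse = upsample(h)
        # Combine the two timescales; a real model would learn this combination.
        h = [(c + s) / 2 for c, s in zip(coarse, skip)]
    return tail(h, n_dims)


if __name__ == "__main__":
    # Input length is assumed to be a multiple of 2 ** encoder_depth.
    signal = [0.0, 1.0, 0.0, -1.0] * 4
    print(tune_forward(signal, encoder_depth=3, decoder_depth=0, n_dims=2))
    print(tune_forward(signal, encoder_depth=3, decoder_depth=2, n_dims=2))
```

In this sketch the output always has `n_dims` values regardless of how many decoding steps are used, which is the role the abstract assigns to the tail.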
