A Multimodal Text-To-MIDI Transformer Model with Special Consideration To Artist Usage
Abstract
Generative AI systems are of great interest in the field of automated music production. Current state-of-the-art music generation systems such as MusicLM, while highly versatile and tractable, do not allow direct control over the tone or texture of the generated instruments except by altering the text prompt, which also changes the structure of the generated composition. Symbolic music generation therefore remains of great use to musicians.
Building on current advances in deep learning, an off-the-shelf MIDI dataset, and a pretrained English text encoder, this work proposes a novel sequence-to-sequence music generation model that converts written descriptions of the style and artistic themes of a song into coherent MIDI musical representations.
The architecture and the synthetic dataset used to train this model were constructed with special consideration to the needs of musicians: a native musical output format, no association between any particular artist and a musical style, and the possibility of live usage with human accompaniment.
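As context for the "native musical format" claim above, sequence-to-sequence symbolic music models commonly serialize MIDI into a flat stream of note and timing tokens that a transformer decoder can emit directly. The sketch below is purely illustrative: the event names, tick granularity, and function are assumptions for exposition, not the representation actually used in this thesis.

```python
# Illustrative event-based MIDI tokenization (hypothetical, not the
# thesis's actual vocabulary): notes become NOTE_ON / NOTE_OFF events
# separated by TIME_SHIFT tokens, a common seq2seq target format.

def encode_events(notes):
    """Convert (start_tick, end_tick, pitch) notes into a flat token list."""
    events = []
    for start, end, pitch in notes:
        events.append((start, f"NOTE_ON_{pitch}"))
        events.append((end, f"NOTE_OFF_{pitch}"))
    events.sort()  # interleave all events in chronological order

    tokens, clock = [], 0
    for tick, name in events:
        if tick > clock:
            # Advance the running clock with an explicit time-shift token.
            tokens.append(f"TIME_SHIFT_{tick - clock}")
            clock = tick
        tokens.append(name)
    return tokens

# Two overlapping notes: C4 (pitch 60) then E4 (pitch 64).
print(encode_events([(0, 4, 60), (2, 6, 64)]))
# → ['NOTE_ON_60', 'TIME_SHIFT_2', 'NOTE_ON_64', 'TIME_SHIFT_2',
#    'NOTE_OFF_60', 'TIME_SHIFT_2', 'NOTE_OFF_64']
```

A decoder trained over such a vocabulary outputs MIDI-convertible sequences natively, which is what distinguishes this setting from audio-domain systems like MusicLM.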
Table of Contents
Introduction -- Background and related work -- Methodology -- Results and discussion -- Conclusion
Degree
M.S. (Master of Science)