Section 1: Baseline Model
Dataset: Maestro / Context Length: 32,768 / Segmentation: Default / Cross Attention Mask: None
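As a rough illustration, the baseline settings above could be collected into a configuration object like the sketch below. The field names (`dataset`, `context_length`, `segmentation`, `cross_attention_mask`) are hypothetical and do not mirror the repository's actual config schema.

```python
# Hypothetical config sketch for the baseline run described above.
# Field names are illustrative only, not taken from the repository.
baseline_config = {
    "dataset": "Maestro",           # MIDI performance dataset used for training
    "context_length": 32_768,       # maximum number of tokens per sequence
    "segmentation": "default",      # no Effective Segmentation in the baseline
    "cross_attention_mask": None,   # no mask applied in the cross-attention stage
}
```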
In this work, we introduce a novel model, PerceiverS, which builds on the Perceiver AR architecture by incorporating Effective Segmentation and a Multi-Scale attention mechanism. The Effective Segmentation approach progressively expands the context segment during training, aligning training more closely with autoregressive generation and enabling smooth, coherent generation across ultra-long symbolic music sequences. The Multi-Scale attention mechanism further enhances the model's ability to capture both long-term structural dependencies and short-term expressive details.
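To make the Effective Segmentation idea concrete, the sketch below shows one plausible way a context segment could be expanded progressively over training. The linear schedule, function name, and parameters are assumptions for illustration only, not the exact procedure used in the paper.

```python
# Conceptual sketch of a progressively expanding context segment.
# Assumes a simple linear schedule; the paper's actual Effective
# Segmentation strategy may differ in shape and granularity.
def segment_length(step: int, total_steps: int,
                   min_len: int = 1024, max_len: int = 32_768) -> int:
    """Return the context-segment length to use at a given training step.

    The segment grows from `min_len` toward `max_len` as training
    progresses, so late-stage training more closely resembles
    autoregressive generation over the full ultra-long context.
    """
    frac = min(step / max(total_steps, 1), 1.0)
    return int(min_len + frac * (max_len - min_len))


# Example: sample the schedule at a few checkpoints of a hypothetical run.
if __name__ == "__main__":
    for s in (0, 25_000, 50_000, 100_000):
        print(s, segment_length(s, total_steps=100_000))
```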
To cite this paper, please use the following format:
@misc{yi2024perceiversmultiscaleperceivereffective,
      title={PerceiverS: A Multi-Scale Perceiver with Effective Segmentation for Long-Term Expressive Symbolic Music Generation},
      author={Yungang Yi and Weihua Li and Matthew Kuo and Quan Bai},
      year={2024},
      eprint={2411.08307},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2411.08307},
}