Introduction
In the vast realm of artificial intelligence, deep learning has revolutionized numerous domains, including natural language processing, computer vision, and speech recognition. One fascinating area that has captivated researchers and music enthusiasts alike is the generation of music using artificial intelligence algorithms. MusicGen is a state-of-the-art controllable text-to-music model that translates textual prompts into musical compositions.
What is MusicGen?
MusicGen is a model designed for music generation that offers simplicity and controllability. Unlike existing methods such as MusicLM, MusicGen does not require a self-supervised semantic representation. The model uses a single-stage auto-regressive Transformer architecture and is trained over a 32kHz EnCodec tokenizer. Notably, MusicGen generates all 4 codebooks in a single pass, which sets it apart from conventional approaches. By introducing a small delay between the codebooks, the model can predict them in parallel, resulting in only 50 auto-regressive steps per second of audio. This makes the music generation process faster and more efficient.
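To make the codebook delay more concrete, here is a minimal sketch in plain Python. It is not audiocraft code: the helper names, the `PAD` placeholder, and the choice of shifting codebook k by exactly k steps are assumptions made for illustration. It shows how staggering the codebooks lets one decoding step emit a token for every codebook, and how that compares with predicting the codebooks one after another.

```python
# Illustrative sketch only: this is not audiocraft code.
FRAME_RATE_HZ = 50   # auto-regressive steps per second of audio (from the text)
NUM_CODEBOOKS = 4    # codebooks generated in a single pass (from the text)
PAD = None           # hypothetical placeholder for "nothing to predict yet"


def delay_pattern(num_frames, num_codebooks=NUM_CODEBOOKS):
    """Return one row per codebook; row k holds frame indices shifted right by k steps.

    Assumption: each codebook lags the previous one by exactly one step.
    """
    num_steps = num_frames + num_codebooks - 1
    rows = []
    for k in range(num_codebooks):
        row = [PAD] * num_steps
        for t in range(num_frames):
            row[t + k] = t
        rows.append(row)
    return rows


def decoding_steps(duration_s, num_codebooks=NUM_CODEBOOKS, frame_rate=FRAME_RATE_HZ):
    """Return (steps with the delay pattern, steps if codebooks were predicted one by one)."""
    frames = int(duration_s * frame_rate)
    delayed = frames + num_codebooks - 1
    flattened = frames * num_codebooks
    return delayed, flattened


# Each printed row is one codebook; each column is one decoding step.
for row in delay_pattern(num_frames=5):
    print(["-" if value is PAD else value for value in row])

print(decoding_steps(10))
```

Under these assumptions the delay adds only a few extra steps at the end of a clip, so the step count stays close to 50 per second of audio instead of growing with the number of codebooks.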
MusicGen is trained on 20K hours of licensed music: an internal dataset of 10K high-quality music tracks, plus the ShutterStock and Pond5 music data.
Prerequisites:
As per the official MusicGen GitHub repo (https://github.com/facebookresearch/audiocraft/tree/main):
- GPU with at least 16 GB of memory
Available MusicGen models
There are four pre-trained models available:
- small: 300M model, text to music only
- medium: 1.5B model, text to music only
- melody: 1.5B model, text to music and text+melody to music
- large: 3.3B model, text to music only
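The list above can also be written down as a small lookup table, which is handy when a notebook needs to pick a model by task. This is only a sketch: the dictionary layout, the task labels, and the `models_for` helper are hypothetical and not part of audiocraft. Only the names, sizes, and capabilities come from the list.

```python
# Names, sizes, and capabilities are copied from the list above.
# The dictionary layout and the helper are hypothetical, for illustration only.
MUSICGEN_MODELS = {
    "small": {"parameters": "300M", "melody_conditioning": False},
    "medium": {"parameters": "1.5B", "melody_conditioning": False},
    "melody": {"parameters": "1.5B", "melody_conditioning": True},
    "large": {"parameters": "3.3B", "melody_conditioning": False},
}


def models_for(task):
    """Return the model names usable for 'text' or 'text+melody' generation."""
    if task == "text":
        # Every model accepts a text prompt.
        return list(MUSICGEN_MODELS)
    if task == "text+melody":
        return [name for name, info in MUSICGEN_MODELS.items()
                if info["melody_conditioning"]]
    raise ValueError(f"unknown task: {task!r}")


print(models_for("text"))
print(models_for("text+melody"))
```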
Experiments
Below is the output of conditional music generation using the MusicGen "large" model.
Text Input: Jingle bell tune with violin and piano
Output: (Using MusicGen "large" model)
Below is the output of the MusicGen "melody" model. We used the audio above and the text input below to generate it.
Text Input: Add heavy drums drums and only drums
Output: (Using MusicGen "melody" model)
How to set up MusicGen on Colab
Make sure you are using a GPU for faster inference. It took ~9 minutes to generate 10 seconds of audio on a CPU, whereas on a GPU (T4) it took just 35 seconds.
- Before starting, make sure torch and torchaudio are installed in the Colab.
Install the audiocraft library from Facebook.
!python3 -m pip install -U git+https://github.com/facebookresearch/audiocraft#egg=audiocraft
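To put those timings in perspective, the short calculation below works out the speedup and the generation time per second of audio. The input numbers are the ones quoted above. The final estimate for a longer clip assumes generation time scales linearly with duration, which is an assumption and was not measured.

```python
# Timings quoted in the text for generating 10 seconds of audio.
audio_seconds = 10
cpu_seconds = 9 * 60   # ~9 minutes on CPU
gpu_seconds = 35       # 35 seconds on a T4 GPU

speedup = cpu_seconds / gpu_seconds
cpu_cost_per_audio_second = cpu_seconds / audio_seconds
gpu_cost_per_audio_second = gpu_seconds / audio_seconds

print(f"GPU speedup over CPU: {speedup:.1f}x")
print(f"CPU: {cpu_cost_per_audio_second:.1f} s of compute per second of audio")
print(f"GPU: {gpu_cost_per_audio_second:.1f} s of compute per second of audio")

# Rough estimate only: assumes time grows linearly with clip length.
target_seconds = 60
estimated_gpu_seconds = target_seconds * gpu_cost_per_audio_second
print(f"Estimated GPU time for {target_seconds} s of audio: {estimated_gpu_seconds:.0f} s")
```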
Import the necessary libraries.
from audiocraft.models import musicgen
from audiocraft.utils.notebook import display_audio
import torch
from audiocraft.data.audio import audio_write
Load the model
The list of models is as follows:
# | model types are => small, medium, melody, large |
# | size of models are => 300M, 1.5B, 1.5B, 3.3B |
model = musicgen.MusicGen.get_pretrained('large', device='cuda')
Set the parameters (Optional)
model.set_generation_params(duration=60) # this will generate 60 seconds of audio.
Conditional Music Generation (generate the music by providing text)
model.set_generation_params(duration=60)
res = model.generate(['Jingle bell tune with violin and piano'], progress=True)
# This will show the music controls in the Colab
display_audio(res, 32000)
To generate unconditional music
res = model.generate_unconditional(num_samples=1, progress=True)
# this will show the music controls on the screen (the model outputs 32 kHz audio)
display_audio(res, 32000)
To generate music continuation
To create a music continuation we need an audio file. We feed that file to the model, and the model generates more music that continues it.
from audiocraft.utils.notebook import display_audio
import torchaudio
path_to_audio = "path-to-audio-file.wav"
description = "Jazz jazz and only jazz"
# Load audio from a file. Make sure to trim the file if it is too long!
prompt_waveform, prompt_sr = torchaudio.load(path_to_audio)
prompt_duration = 15  # seconds of the file to keep as the prompt
prompt_waveform = prompt_waveform[..., :int(prompt_duration * prompt_sr)]
output = model.generate_continuation(prompt_waveform, prompt_sample_rate=prompt_sr,
                                     descriptions=[description], progress=True)
display_audio(output, sample_rate=32000)
To generate melody-conditioned music
model = musicgen.MusicGen.get_pretrained('melody', device='cuda')
model.set_generation_params(duration=20)
melody_waveform, sr = torchaudio.load("path-to-audio-file.wav")
# Add a batch dimension: one melody to match the single description below
melody_waveform = melody_waveform.unsqueeze(0)
output = model.generate_with_chroma(
    descriptions=['Add heavy drums'], melody_wavs=melody_waveform, melody_sample_rate=sr, progress=True)
display_audio(output, sample_rate=32000)
Write the audio file to the disk.
If you want to download the file from the Colab, you first need to write the wav file to disk. Here is a function that does so. It takes the model output as the first argument and the filename prefix as the second.
def write_wav(output, file_initials):
    try:
        # Write one file per generated sample: <file_initials>_<index>
        for idx, one_wav in enumerate(output):
            audio_write(f'{file_initials}_{idx}', one_wav.cpu(), model.sample_rate, strategy="loudness", loudness_compressor=True)
        return True
    except Exception as e:
        print("error while writing the file ", e)
        return None

# this will write files whose names start with "audio-file"
write_wav(res, "audio-file")
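To find the written files afterwards, it helps to know the names to look for. The sketch below lists them, assuming `audio_write` appends a `.wav` suffix to the stem it is given. The `expected_wav_names` helper is hypothetical and not part of audiocraft. The commented lines show how the files could then be downloaded with Colab's `files.download`, which only works inside a Colab session.

```python
def expected_wav_names(file_initials, num_samples):
    """File names write_wav should produce, assuming audio_write appends '.wav'."""
    return [f"{file_initials}_{idx}.wav" for idx in range(num_samples)]


names = expected_wav_names("audio-file", 1)
print(names)

# Inside Colab only (left commented out so the sketch runs anywhere):
# from google.colab import files
# for name in names:
#     files.download(name)
```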
Full Implementation (Google Colab file link)
The full implementation of Meta's MusicGen library by Pragnakalp Techlabs is given in the Colab file. Feel free to explore it and create music with it.
Pragnakalp Techlabs | Meta’s MusicGen Implementation
Conclusion
In conclusion, Audiocraft's MusicGen is a powerful and controllable music generation model. Looking ahead, Audiocraft holds exciting potential for advancements in AI-generated music. Whether you are a musician or an AI enthusiast, Audiocraft's MusicGen opens up a world of creative possibilities.