Meta Releases AudioCraft: Generative AI (AIGC) Models for Music and Sound Effects

2023-10-15 21:24:07 AIGC


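In recent years, generative AI models, language models included, have made enormous progress. The release of ChatGPT in particular showed the wider public what large language models can do. In computer vision, in generating images and video from text descriptions, and in NLP tasks such as machine translation and text generation, large models have advanced faster than most people expected. Music and audio, however, have seemed to lag behind. Can AI be used to synthesize different kinds of music or sound effects?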

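AudioCraft consists of three models: MusicGen, AudioGen, and EnCodec.

MusicGen: trained on Meta-owned and specifically licensed music; generates music from the user's text prompt.

AudioGen: trained on public sound effects; generates audio and sound effects from the user's text prompt.

EnCodec: a lossy neural audio codec (both encoder and decoder, not only a decoder), similar in purpose to audio compression. It is trained to compress any type of audio and reconstruct the original signal with high fidelity, which allows higher-quality music with fewer artifacts.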
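To give a rough sense of how strongly a neural codec compresses audio, the arithmetic below compares raw 16-bit PCM with a stream of discrete codec tokens. The figures used here (32 kHz mono audio, 50 token frames per second, 4 codebooks of 2048 entries each) are assumptions based on the configuration commonly reported for the MusicGen tokenizer, not values stated in this article; substitute the numbers for your own model.

```python
import math


def pcm_bitrate(sample_rate_hz: int, bits_per_sample: int = 16, channels: int = 1) -> int:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def token_bitrate(frame_rate_hz: int, num_codebooks: int, codebook_size: int) -> float:
    """Bits per second of a codec token stream.

    Each token selects one entry of a codebook, so it carries
    log2(codebook_size) bits.
    """
    bits_per_token = math.log2(codebook_size)
    return frame_rate_hz * num_codebooks * bits_per_token


# Assumed values (see the paragraph above); they are illustrative only.
raw_bps = pcm_bitrate(32_000)
codec_bps = token_bitrate(frame_rate_hz=50, num_codebooks=4, codebook_size=2048)
ratio = raw_bps / codec_bps

print(f"raw PCM:     {raw_bps / 1000:.1f} kbit/s")
print(f"token stream: {codec_bps / 1000:.1f} kbit/s")
print(f"compression ratio: about {ratio:.0f}x")
```

Under these assumptions the token stream needs a few kilobits per second, which is why a language model can realistically generate audio token by token.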

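The AudioCraft models can produce high-quality audio with long-term consistency and can be driven through a simple UI. AudioCraft also simplifies the overall design of audio generation models, and its open-source code can be used directly to generate music.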

%cd /content
!git clone https://github.com/facebookresearch/audiocraft
%cd /content/audiocraft
!pip install -r requirements.txt
!python -m demos.musicgen_app --share

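Running the commands above in a notebook environment launches a visual UI. Type a description into the input box and the model generates the corresponding music.

To make AudioCraft easier for developers to use, the models have been open-sourced, so music can also be synthesized directly from Python code.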

!python3 -m pip install -U git+https://github.com/facebookresearch/audiocraft#egg=audiocraft

from audiocraft.models import musicgen
from audiocraft.utils.notebook import display_audio
import torch

model = musicgen.MusicGen.get_pretrained('medium', device='cuda')
model.set_generation_params(duration=8)

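First, install AudioCraft with pip, then import the musicgen module from audiocraft.models.

Here musicgen.MusicGen.get_pretrained loads the pretrained weights. When execution reaches this call, it checks whether the model is already available locally and, if not, downloads it automatically. The call to set_generation_params(duration=8) sets the length of each generated clip to 8 seconds.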

Downloading state_dict.bin: 100% 3.68G/3.68G [03:42<00:00, 19.4MB/s]
Downloading (…)ve/main/spiece.model: 100% 792k/792k [00:00<00:00, 10.5MB/s]
Downloading (…)lve/main/config.json: 100% 1.21k/1.21k [00:00<00:00, 45.8kB/s]
Downloading model.safetensors: 100% 892M/892M [00:10<00:00, 46.2MB/s]
Downloading (…)ssion_state_dict.bin: 100% 236M/236M [03:45<00:00, 1.05MB/s]

res = model.generate([
    'crazy EDM, heavy bang',
    'classic reggae track with an electronic guitar solo',
    'lofi slow bpm electro chill with organic samples',
    'rock with saturated guitars, a heavy bass line and crazy drum break and fills.',
    'earthy tones, environmentally conscious, ukulele-infused, harmonic, breezy, easygoing, organic instrumentation, gentle grooves',
], progress=True)
display_audio(res, 32000)

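Once the download finishes, call model.generate to produce music. Several prompts can be passed at once, and the model generates one audio clip per prompt. The results can then be played with display_audio or downloaded.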
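As a quick sanity check on what a batched call should return, the sketch below estimates the shape of the output for five prompts at duration=8 and a 32000 Hz sample rate (the rate passed to display_audio). The (batch, channels, samples) layout and the mono channel count are assumptions about this checkpoint, and expected_output_shape is a hypothetical helper, not part of AudioCraft.

```python
def expected_output_shape(num_prompts: int, duration_s: float,
                          sample_rate_hz: int = 32_000, channels: int = 1) -> tuple:
    """Estimate the (batch, channels, samples) shape of a batched generation.

    Hypothetical helper for illustration; the layout and the mono default
    are assumptions, so compare against the real tensor's .shape.
    """
    num_samples = int(duration_s * sample_rate_hz)
    return (num_prompts, channels, num_samples)


shape = expected_output_shape(num_prompts=5, duration_s=8)
print(shape)

# Approximate memory for the waveform batch if stored as 32-bit floats.
num_values = shape[0] * shape[1] * shape[2]
print(f"{num_values * 4 / 1e6:.2f} MB as float32")
```

Comparing this estimate with the shape of the returned tensor is an easy way to confirm that every prompt produced a clip of the requested length.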

Text Prompt: Pop dance track with catchy melodies, tropical percussions, and upbeat rhythms, perfect for the beach

The model is also available in the Hugging Face transformers library, so the same task can be run through transformers instead.

!pip install git+https://github.com/huggingface/transformers.git

from transformers import AutoProcessor, MusicgenForConditionalGeneration

processor = AutoProcessor.from_pretrained("facebook/musicgen-small")
model = MusicgenForConditionalGeneration.from_pretrained("facebook/musicgen-small")

inputs = processor(
    text=["80s pop track with bassy drums and synth", "90s rock song with loud guitars and heavy drums"],
    padding=True,
    return_tensors="pt",
)
audio_values = model.generate(**inputs, max_new_tokens=256)

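Here AudioCraft itself does not need to be installed. Install transformers instead, import the MusicGen processor and model classes from it, load the pretrained checkpoint, pass in the text prompts, and call model.generate to produce the audio.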
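With the transformers interface the clip length is controlled by max_new_tokens rather than by a duration in seconds. The conversion below is a minimal sketch that assumes the audio encoder emits 50 token frames per second; that value is an assumption here, and in practice it is safer to read it from the model (for example model.config.audio_encoder.frame_rate) than to hard-code it. Both helper names are hypothetical.

```python
import math


def tokens_to_seconds(max_new_tokens: int, frame_rate_hz: float = 50.0) -> float:
    """Approximate audio length produced by a given token budget."""
    return max_new_tokens / frame_rate_hz


def seconds_to_tokens(seconds: float, frame_rate_hz: float = 50.0) -> int:
    """Token budget needed to cover at least `seconds` of audio."""
    return math.ceil(seconds * frame_rate_hz)


print(tokens_to_seconds(256))   # length for max_new_tokens=256
print(seconds_to_tokens(8))     # budget for an 8-second clip
```

Under this assumption, max_new_tokens=256 corresponds to roughly five seconds of audio, and an 8-second clip needs about 400 tokens.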

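from IPython.display import Audio

# Play the first generated clip; run this as its own notebook cell so the player renders.
sampling_rate = model.config.audio_encoder.sampling_rate
Audio(audio_values[0].numpy(), rate=sampling_rate)

import scipy.io.wavfile

# Save the first clip (first channel) as a WAV file.
sampling_rate = model.config.audio_encoder.sampling_rate
scipy.io.wavfile.write("musicgen_out.wav", rate=sampling_rate, data=audio_values[0, 0].numpy())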

The generated audio can be played or saved with the code above for later processing. All of the code in this article covers MusicGen only; for AudioGen and EnCodec, refer to the source code on GitHub.
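When a float array is written with scipy.io.wavfile.write, the file is stored as floating-point WAV, which some players and tools do not open. One option is to convert to 16-bit PCM before saving. The sketch below shows that conversion using only the Python standard library; a 440 Hz test tone stands in for the model output, and float_to_pcm16 and write_wav are hypothetical helpers written for this illustration.

```python
import math
import os
import struct
import tempfile
import wave


def float_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit integers, clipping out-of-range values."""
    pcm = []
    for s in samples:
        s = max(-1.0, min(1.0, s))
        pcm.append(int(round(s * 32767)))
    return pcm


def write_wav(path, samples, sample_rate_hz):
    """Write mono float samples as a 16-bit PCM WAV file."""
    pcm = float_to_pcm16(samples)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)          # mono
        wf.setsampwidth(2)          # 2 bytes = 16 bits per sample
        wf.setframerate(sample_rate_hz)
        wf.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Stand-in for one generated clip: one second of a 440 Hz sine at 32 kHz.
sample_rate = 32_000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

out_path = os.path.join(tempfile.gettempdir(), "musicgen_demo_tone.wav")
write_wav(out_path, tone, sample_rate)

# Read the header back to confirm the rate and length.
with wave.open(out_path, "rb") as wf:
    print(wf.getframerate(), wf.getnframes())
```

To apply this to real output, pass a flat list of samples for one clip (for example audio_values[0, 0].tolist()) together with the model's sampling rate.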

https://github.com/facebookresearch/audiocraft
