espnet2.beats.audio_tokenizer.AudioTokenizer
class espnet2.beats.audio_tokenizer.AudioTokenizer(codec_choice: str, codec_fs: int, device: str = 'cpu', dump_audio: bool = False, checkpoint_path: str | None = None, config_path: str | None = None, max_token_per_frame: int = 32, waveform_input: bool = True)
Bases: AbsTokenizer
Codec Tokenizer implementation
Use cases:
- Use encode for discrete (de)tokenization.
Codec Tokenizer initialization
Each codec implementation should assign all of the following attributes:
- self.n_codebook (int): the number of codec codebooks.
- self.size_codebook (int): the dimension of the codebooks.
- self.sample_rate (int): the sample rate the model was trained on.
- self.subsample (int): the subsample rate, a.k.a. the frame shift.
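The four attributes listed above determine how many tokens a codec produces per second of audio. The following is a minimal sketch of that arithmetic, not ESPnet code: ToyCodecSpec is a hypothetical container, the numeric values are illustrative only, and it assumes that subsample is a frame shift measured in input samples.

```python
from dataclasses import dataclass


@dataclass
class ToyCodecSpec:
    """Hypothetical container for the four documented attributes."""

    n_codebook: int     # number of codec codebooks
    size_codebook: int  # codebook size, as documented
    sample_rate: int    # sample rate (Hz) the model was trained on
    subsample: int      # frame shift; ASSUMED here to be in samples

    def frames_per_second(self) -> float:
        # Assumption: one frame is emitted every `subsample` input samples.
        return self.sample_rate / self.subsample

    def tokens_per_second(self) -> float:
        # Each frame carries one code per codebook.
        return self.frames_per_second() * self.n_codebook


# Illustrative values only; not taken from any real checkpoint.
spec = ToyCodecSpec(n_codebook=8, size_codebook=1024, sample_rate=16000, subsample=320)
print(spec.frames_per_second())  # 50.0
print(spec.tokens_per_second())  # 400.0
```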
encode(wavs, wav_lens=None)
Convert audio waveforms into codec codes.
Input:
- wavs (torch.Tensor): float tensor of shape [B, n_sample, D]
- wav_lens (torch.Tensor): int tensor of shape [B]
Output:
- codes (torch.Tensor): int tensor of shape [B, T, n_codebook]
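Because wavs is a single batch tensor, utterances of different lengths have to be padded to a common n_sample, with the true lengths passed as wav_lens. The sketch below shows one way to prepare such a batch using plain Python lists. The helper pad_batch is hypothetical, and zero padding with D = 1 (mono audio) is an assumption rather than something this page specifies.

```python
from typing import List, Tuple


def pad_batch(
    waves: List[List[float]],
) -> Tuple[List[List[List[float]]], List[int]]:
    """Zero-pad mono waveforms to a common length.

    Returns nested lists shaped [B, n_sample, 1] and the true lengths [B].
    """
    if not waves:
        raise ValueError("expected at least one waveform")
    wav_lens = [len(w) for w in waves]
    n_sample = max(wav_lens)
    padded = [
        # Wrap each sample in a list to create the trailing D = 1 axis,
        # then append zero frames up to the longest utterance.
        [[x] for x in w] + [[0.0] for _ in range(n_sample - len(w))]
        for w in waves
    ]
    return padded, wav_lens


wavs, wav_lens = pad_batch([[0.1, 0.2, 0.3], [0.5]])
print(wav_lens)  # [3, 1]
print(wavs[1])   # [[0.5], [0.0], [0.0]]
```

Wrapping the two results with torch.tensor would give tensors with the documented shapes [B, n_sample, D] and [B].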
forward(wavs, wav_lens=None)
Convert audio waveforms into flattened codec codes and resynthesize the audio.
Input:
- wavs (torch.Tensor): float tensor of shape [B, n_sample, D]
- wav_lens (torch.Tensor): int tensor of shape [B]
Output:
- codes (torch.Tensor): int tensor of shape [B, T * n_codebook]
- code_lengths (torch.Tensor): int tensor of shape [B]
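The relation between the [B, T, n_codebook] output of encode and the [B, T * n_codebook] output of forward can be illustrated with plain Python lists. This is a sketch under stated assumptions, not the library's implementation: flatten_codes and frame_lens are hypothetical names, the flattening is assumed to be frame-major (all codebooks of frame 0, then frame 1, and so on), and code_lengths is assumed to equal the number of valid frames times n_codebook. Any per-codebook index offset the real class may apply is not modeled.

```python
from typing import List, Tuple


def flatten_codes(
    codes: List[List[List[int]]], frame_lens: List[int]
) -> Tuple[List[List[int]], List[int]]:
    """Flatten [B, T, n_codebook] codes to [B, T * n_codebook], frame by frame."""
    n_codebook = len(codes[0][0])
    # Frame-major order (assumption): codes of one frame stay adjacent.
    flat = [[c for frame in utt for c in frame] for utt in codes]
    # Assumed length rule: valid frames times codes per frame.
    code_lengths = [t * n_codebook for t in frame_lens]
    return flat, code_lengths


# Two utterances, T = 3 frames, n_codebook = 2; the second has 1 valid frame.
codes = [
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8], [0, 0], [0, 0]],
]
flat, code_lengths = flatten_codes(codes, frame_lens=[3, 1])
print(flat)          # [[1, 2, 3, 4, 5, 6], [7, 8, 0, 0, 0, 0]]
print(code_lengths)  # [6, 2]
```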
