espnet2.beats.audio_tokenizer.AudioTokenizer


class espnet2.beats.audio_tokenizer.AudioTokenizer(codec_choice: str, codec_fs: int, device: str = 'cpu', dump_audio: bool = False, checkpoint_path: str | None = None, config_path: str | None = None, max_token_per_frame: int = 32, waveform_input: bool = True)

Bases: AbsTokenizer

Codec Tokenizer implementation

Use cases:

- use encode for discrete (de)tokenization

Codec Tokenizer initialization

Each codec implementation should assign all of the following attributes:

- self.n_codebook (int): the number of codec codebooks.
- self.size_codebook (int): the dimension of the codebooks.
- self.sample_rate (int): the sample rate the model was trained on.
- self.subsample (int): the subsample rate, a.k.a. the frame shift.
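As a rough illustration of how these four attributes relate, the sketch below derives the frame rate and token rate from them. `CodecSpec` is a hypothetical stand-in, not part of ESPnet, and the numeric values are illustrative only. It assumes that subsample counts input samples per frame and that each frame yields one code per codebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecSpec:
    """Hypothetical container mirroring the four attributes a codec assigns."""

    n_codebook: int     # number of codec codebooks
    size_codebook: int  # dimension of the codebooks
    sample_rate: int    # sample rate (Hz) the model was trained on
    subsample: int      # frame shift, assumed to be in samples per frame

    def frames_per_second(self) -> float:
        # One frame is emitted every `subsample` input samples.
        return self.sample_rate / self.subsample

    def tokens_per_second(self) -> float:
        # Assumption: each frame carries one code per codebook.
        return self.frames_per_second() * self.n_codebook


# Illustrative values only; not taken from any particular checkpoint.
spec = CodecSpec(n_codebook=8, size_codebook=1024, sample_rate=16000, subsample=320)
print(spec.frames_per_second())  # 50.0
print(spec.tokens_per_second())  # 400.0
```

With these example numbers, one second of 16 kHz audio would correspond to 50 frames, or 400 codes once all codebooks are counted.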

encode(wavs, wav_lens=None)

Convert audio waveforms into codec codes.

Input:

- wavs (torch.Tensor): float tensor of shape [B, n_sample, D]
- wav_lens (torch.Tensor): int tensor of shape [B]

Output:

- codes (torch.Tensor): int tensor of shape [B, T, n_codebook]

forward(wavs, wav_lens=None)

Convert audio waveforms into flattened codec codes and resynthesize the audio.

Input:

- wavs (torch.Tensor): float tensor of shape [B, n_sample, D]
- wav_lens (torch.Tensor): int tensor of shape [B]

Output:

- codes (torch.Tensor): int tensor of shape [B, T * n_codebook]
- code_lengths (torch.Tensor): int tensor of shape [B]
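The shape relationship between the encode output ([B, T, n_codebook]) and the forward output ([B, T * n_codebook]) can be sketched with plain nested lists standing in for tensors. This is a minimal sketch under stated assumptions, not the library's implementation: `flatten_codes` and `code_lengths` are hypothetical helpers, the frame-major ordering is assumed, and the length formula assumes one frame per subsample input samples, rounded down. The actual tokenizer may order, offset, or round differently.

```python
from typing import List


def flatten_codes(codes: List[List[List[int]]]) -> List[List[int]]:
    """Flatten [B, T, n_codebook] nested lists into [B, T * n_codebook]."""
    # Assumed frame-major order: all codes of frame 0, then frame 1, and so on.
    return [[code for frame in utt for code in frame] for utt in codes]


def code_lengths(wav_lens: List[int], subsample: int, n_codebook: int) -> List[int]:
    """Assumed mapping from waveform lengths to flattened code lengths."""
    # Assumption: one frame per `subsample` samples, rounded down.
    return [(n // subsample) * n_codebook for n in wav_lens]


# Toy batch: B=1, T=2, n_codebook=3.
codes = [[[1, 2, 3], [4, 5, 6]]]
flat = flatten_codes(codes)
print(flat)  # [[1, 2, 3, 4, 5, 6]]
print(code_lengths([640], subsample=320, n_codebook=3))  # [6]
```

In this toy case, a 640-sample waveform with a frame shift of 320 gives 2 frames, and 3 codebooks per frame give a flattened length of 6.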