espnet2.gan_codec.hificodec.hificodec.HiFiCodecGenerator
class espnet2.gan_codec.hificodec.hificodec.HiFiCodecGenerator(sample_rate: int = 16000, hidden_dim: int = 128, resblock_num: str = '1', resblock_kernel_sizes: List[int] = [3, 7, 11], resblock_dilation_sizes: List[List[int]] = [[1, 3, 5], [1, 3, 5], [1, 3, 5]], upsample_rates: List[int] = [8, 5, 4, 2], upsample_kernel_sizes: List[int] = [16, 11, 8, 4], upsample_initial_channel: int = 512, quantizer_n_q: int = 8, quantizer_bins: int = 1024, quantizer_decay: float = 0.99, quantizer_kmeans_init: bool = True, quantizer_kmeans_iters: int = 50, quantizer_threshold_ema_dead_code: int = 2, quantizer_target_bandwidth: List[float] = [7.5, 15])
Bases: Module
HiFiCodec generator module.
Initialize HiFiCodec Generator.
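The default `upsample_rates` imply a fixed ratio between waveform samples and code frames. The following is a minimal sketch of that arithmetic. It assumes (this page does not confirm it) that one code frame corresponds to the decoder's total upsampling factor in samples; `frames_for` is a hypothetical helper, not part of ESPnet.

```python
import math

# Defaults copied from the constructor signature above.
sample_rate = 16000
upsample_rates = [8, 5, 4, 2]

# Total upsampling factor of the decoder: the product of the per-stage rates.
hop_length = math.prod(upsample_rates)

# Assumption: one code frame covers hop_length waveform samples,
# so the frame rate is the sample rate divided by that factor.
frame_rate = sample_rate / hop_length


def frames_for(num_samples: int) -> int:
    """Hypothetical helper: code frames needed to cover num_samples samples."""
    return math.ceil(num_samples / hop_length)
```

Under these assumptions the defaults give a hop of 320 samples, so one second of 16 kHz audio maps to 50 frames. If a different `upsample_rates` list is passed, `upsample_kernel_sizes` presumably needs one entry per stage as well.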
decode(codes: Tensor)
HiFiCodec codec decoding.
- Parameters: codes (torch.Tensor) – neural codes in shape ().
- Returns: resynthesized audio.
- Return type: torch.Tensor
encode(x: Tensor, target_bw: float | None = None)
HiFiCodec codec encoding.
- Parameters:
- x (torch.Tensor) – Input tensor of shape (B, 1, T).
- target_bw (float | None) – Target bandwidth. Defaults to None.
- Returns: neural codes in shape ().
- Return type: torch.Tensor
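Since `encode` produces the codes that `decode` consumes, the two methods can be chained into a round trip. The sketch below is written against any object exposing those two methods, so it does not construct a real model; `roundtrip` is a hypothetical helper name, and the shape of the intermediate codes is not specified on this page.

```python
def roundtrip(generator, x):
    """Encode x to codes, then decode the codes back to audio.

    generator: any object exposing encode(x) and decode(codes),
        for example a HiFiCodecGenerator instance.
    x: input of shape (B, 1, T), as documented for encode().
    """
    # target_bw is left at its default (None).
    codes = generator.encode(x)
    # decode() takes the codes and returns the resynthesized audio.
    resynthesized = generator.decode(codes)
    return codes, resynthesized
```

With a real generator, `x` would be a `torch.Tensor`, and for inference the call would typically be wrapped in `torch.no_grad()`.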
forward(x: Tensor, use_dual_decoder: bool = False)
HiFiCodec forward propagation.
- Parameters:
- x (torch.Tensor) – Input tensor of shape (B, 1, T).
- use_dual_decoder (bool) – Whether to use the dual decoder on the encoder output. Defaults to False.
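- Returns: A tuple of four values, in this order:
- torch.Tensor – resynthesized audio.
- torch.Tensor – commitment loss.
- torch.Tensor – quantization loss.
- torch.Tensor – resynthesized audio from encoder.
- Return type: Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]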
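Because `forward` returns several values, it can help to name them at the call site. The sketch below assumes the values come back as a 4-tuple in the order listed under Returns; `ForwardOutput` and `run_forward` are hypothetical names introduced for illustration and are not part of ESPnet.

```python
from typing import Any, NamedTuple


class ForwardOutput(NamedTuple):
    """Hypothetical container naming the four values listed under Returns."""

    resyn_audio: Any
    commit_loss: Any
    quantization_loss: Any
    resyn_audio_from_encoder: Any


def run_forward(generator, x, use_dual_decoder=False):
    """Call the generator and label its outputs.

    Assumption: the call returns a 4-tuple in the documented order.
    """
    # Calling a torch.nn.Module instance dispatches to its forward() method.
    outputs = generator(x, use_dual_decoder=use_dual_decoder)
    return ForwardOutput(*outputs)
```

Named fields keep the two loss terms from being swapped by accident when they are combined into a training objective.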