espnet2.gan_codec.hificodec.hificodec.HiFiCodecGenerator

class espnet2.gan_codec.hificodec.hificodec.HiFiCodecGenerator(sample_rate: int = 16000, hidden_dim: int = 128, resblock_num: str = '1', resblock_kernel_sizes: List[int] = [3, 7, 11], resblock_dilation_sizes: List[List[int]] = [[1, 3, 5], [1, 3, 5], [1, 3, 5]], upsample_rates: List[int] = [8, 5, 4, 2], upsample_kernel_sizes: List[int] = [16, 11, 8, 4], upsample_initial_channel: int = 512, quantizer_n_q: int = 8, quantizer_bins: int = 1024, quantizer_decay: float = 0.99, quantizer_kmeans_init: bool = True, quantizer_kmeans_iters: int = 50, quantizer_threshold_ema_dead_code: int = 2, quantizer_target_bandwidth: List[float] = [7.5, 15])

Bases: Module

HiFiCodec generator module.

Initialize HiFiCodec Generator.

Parameters:TODO
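Since the parameters are not yet described, the following sketch shows what the default values in the signature imply. It assumes, as is typical for HiFi-GAN-style generators but not confirmed here, that the decoder upsamples by the product of `upsample_rates`, so that one code frame covers that many waveform samples. The names `hop_length` and `frames_per_second` are illustrative and not part of the class.

```python
from math import prod

# Defaults copied from the constructor signature above.
sample_rate = 16000
upsample_rates = [8, 5, 4, 2]
upsample_kernel_sizes = [16, 11, 8, 4]
resblock_kernel_sizes = [3, 7, 11]
resblock_dilation_sizes = [[1, 3, 5], [1, 3, 5], [1, 3, 5]]

# The paired lists have matching lengths in the defaults. Checking this before
# overriding them is a cheap sanity test (assumed requirement, not documented).
assert len(upsample_rates) == len(upsample_kernel_sizes)
assert len(resblock_kernel_sizes) == len(resblock_dilation_sizes)

# Assumption: total upsampling factor = product of the per-stage rates.
hop_length = prod(upsample_rates)              # 8 * 5 * 4 * 2 = 320
frames_per_second = sample_rate / hop_length   # 16000 / 320 = 50.0

print(hop_length, frames_per_second)
```

Under that assumption, the defaults correspond to 320 samples per code frame, or 50 frames per second at 16 kHz.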

decode(codes: Tensor)

HiFiCodec codec decoding.

Parameters: codes (torch.Tensor) – Neural codes in shape ().
Returns: resynthesized audio.
Return type: torch.Tensor

encode(x: Tensor, target_bw: float | None = None)

HiFiCodec codec encoding.

Parameters:
- x (torch.Tensor) – Input tensor of shape (B, 1, T).
- target_bw (float | None) – Target bandwidth. Defaults to None.
Returns: neural codes in shape ().
Return type: torch.Tensor
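A minimal sketch of an encode/decode round trip, using only the `encode(x, target_bw)` and `decode(codes)` signatures listed on this page. `roundtrip` and `_StubCodec` are hypothetical names introduced for illustration; the stub stands in for a `HiFiCodecGenerator` instance so the snippet runs without ESPnet or trained weights.

```python
from typing import Any, Optional, Tuple


def roundtrip(codec: Any, x: Any, target_bw: Optional[float] = None) -> Tuple[Any, Any]:
    """Encode x to codes, then decode those codes back to audio."""
    codes = codec.encode(x, target_bw=target_bw)
    audio = codec.decode(codes)
    return codes, audio


class _StubCodec:
    """Hypothetical stand-in exposing the same encode/decode signatures."""

    def encode(self, x, target_bw=None):
        # A real codec would return quantizer indices; the stub just wraps x.
        return ("codes", x)

    def decode(self, codes):
        # A real codec would resynthesize audio; the stub unwraps the input.
        return codes[1]


codes, audio = roundtrip(_StubCodec(), [[0.0, 0.1, -0.1]])
```

With ESPnet installed, passing a `HiFiCodecGenerator` instance and a tensor of shape (B, 1, T) in place of the stub should follow the same call pattern.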

forward(x: Tensor, use_dual_decoder: bool = False)

HiFiCodec forward propagation.

Parameters:
- x (torch.Tensor) – Input tensor of shape (B, 1, T).
- use_dual_decoder (bool) – Whether to use the dual decoder on the encoder output.
Returns:
- torch.Tensor: resynthesized audio.
- torch.Tensor: commitment loss.
- torch.Tensor: quantization loss.
- torch.Tensor: resynthesized audio from encoder.
Return type: torch.Tensor
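Because `forward` yields four values, giving them names at the call site can prevent mix-ups between the two losses and the two audio outputs. The sketch below assumes the values come back as a 4-element sequence in the order listed above; `ForwardOutputs` and `name_forward_outputs` are hypothetical helpers, not part of ESPnet, and the placeholder values stand in for tensors.

```python
from typing import Any, NamedTuple, Sequence


class ForwardOutputs(NamedTuple):
    """Hypothetical container naming the four documented return values."""

    audio: Any
    commit_loss: Any
    quantization_loss: Any
    audio_from_encoder: Any


def name_forward_outputs(outputs: Sequence[Any]) -> ForwardOutputs:
    """Wrap the raw outputs, failing early if the count is unexpected."""
    if len(outputs) != 4:
        raise ValueError(f"expected 4 outputs, got {len(outputs)}")
    return ForwardOutputs(*outputs)


# Placeholder values stand in for the tensors returned by forward().
named = name_forward_outputs(("wav", 0.25, 0.5, "wav_enc"))
```

A training loop could then refer to `named.commit_loss` and `named.quantization_loss` instead of positional indices.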