espnet2.gan_codec.funcodec.funcodec.FunCodecGenerator
class espnet2.gan_codec.funcodec.funcodec.FunCodecGenerator(sample_rate: int = 24000, hidden_dim: int = 128, codebook_dim: int = 8, encdec_channels: int = 1, encdec_n_filters: int = 32, encdec_n_residual_layers: int = 1, encdec_ratios: List[Tuple[int, int]] = [(4, 1), (4, 1), (4, 2), (4, 1)], encdec_activation: str = 'ELU', encdec_activation_params: Dict[str, Any] = {'alpha': 1.0}, encdec_norm: str = 'weight_norm', encdec_norm_params: Dict[str, Any] = {}, encdec_kernel_size: int = 7, encdec_residual_kernel_size: int = 7, encdec_last_kernel_size: int = 7, encdec_dilation_base: int = 2, encdec_causal: bool = False, encdec_pad_mode: str = 'reflect', encdec_true_skip: bool = False, encdec_compress: int = 2, encdec_lstm: int = 2, decoder_trim_right_ratio: float = 1.0, decoder_final_activation: str | None = None, decoder_final_activation_params: dict | None = None, quantizer_n_q: int = 8, quantizer_bins: int = 1024, quantizer_decay: float = 0.99, quantizer_kmeans_init: bool = True, quantizer_kmeans_iters: int = 50, quantizer_threshold_ema_dead_code: int = 2, quantizer_target_bandwidth: List[float] = [7.5, 15], quantizer_dropout: bool = True, codec_domain: List = ('time', 'time'), domain_conf: Dict | None = {}, audio_normalize: bool = False)
Bases: torch.nn.Module
FunCodec generator module.
Initialize FunCodec Generator.
- Parameters: TODO (jiatong)
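The constructor parameters are not documented yet, but some of the defaults can be related to each other with simple arithmetic. The sketch below rests on two assumptions that the reference does not state. First, the first element of each `encdec_ratios` pair is taken to be the temporal downsampling stride. Second, each of the `quantizer_n_q` residual codebooks is assumed to emit one index out of `quantizer_bins` per frame, which costs log2(bins) bits. The helper names are hypothetical and are not part of ESPnet.

```python
import math
from typing import List, Tuple


def frame_rate(sample_rate: int, ratios: List[Tuple[int, int]]) -> float:
    # ASSUMPTION: ratios[i][0] is the temporal stride of encoder stage i,
    # so the total hop size is the product of the first elements.
    hop = math.prod(r[0] for r in ratios)
    return sample_rate / hop


def bitrate_kbps(sample_rate: int, ratios: List[Tuple[int, int]],
                 n_q: int, bins: int) -> float:
    # ASSUMPTION: every quantizer stage emits one index in [0, bins) per frame.
    bits_per_frame = n_q * math.log2(bins)
    return bits_per_frame * frame_rate(sample_rate, ratios) / 1000.0


def n_quantizers_for_bandwidth(sample_rate: int, ratios: List[Tuple[int, int]],
                               bins: int, bandwidth_kbps: float) -> int:
    # Number of whole quantizer stages that fit into the requested bandwidth.
    bits_per_stage_per_sec = frame_rate(sample_rate, ratios) * math.log2(bins)
    return math.floor(bandwidth_kbps * 1000 / bits_per_stage_per_sec)


default_ratios = [(4, 1), (4, 1), (4, 2), (4, 1)]
print(frame_rate(24000, default_ratios))                  # 93.75 frames/s
print(bitrate_kbps(24000, default_ratios, 8, 1024))       # 7.5 kbps
print(n_quantizers_for_bandwidth(24000, default_ratios, 1024, 15))  # 16
```

Under these assumptions, the defaults (`sample_rate=24000`, `quantizer_n_q=8`, `quantizer_bins=1024`) correspond to 7.5 kbps, which is the first entry of `quantizer_target_bandwidth`.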
decode(codes: Tensor)
FunCodec codec decoding.
- Parameters: codes (torch.Tensor) – neural codecs in shape ().
- Returns: resynthesized audio.
- Return type: torch.Tensor
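The quantizer arguments (`quantizer_n_q`, `quantizer_bins`, k-means initialization, EMA dead-code threshold) follow the usual naming for a residual vector quantizer (RVQ). The toy sketch below assumes an RVQ and shows the lookup step for a single frame. Each stage's code selects one codeword from that stage's codebook, and the selected codewords are summed to form the latent that a decoder network would then map back to audio. The codebooks are made-up examples. The real `decode` works on batched tensors and also runs the convolutional decoder, and neither of those is modeled here.

```python
from typing import List

Vector = List[float]


def rvq_decode(codes: List[int], codebooks: List[List[Vector]]) -> Vector:
    """Toy residual-VQ lookup for one frame (hypothetical helper, not ESPnet API)."""
    dim = len(codebooks[0][0])
    latent = [0.0] * dim
    for stage, idx in enumerate(codes):
        codeword = codebooks[stage][idx]
        # Each stage refines the previous estimate by adding its codeword.
        latent = [l + c for l, c in zip(latent, codeword)]
    return latent


# Two stages, two codewords each, 2-dimensional latent (toy values).
toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.5, 0.0], [0.0, 0.5]],
]
print(rvq_decode([1, 0], toy_codebooks))  # [1.5, 1.0]
```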
encode(x: Tensor, target_bw: float | None = None)
FunCodec codec encoding.
- Parameters:
- x (torch.Tensor) – Input tensor of shape (B, 1, T).
- target_bw (float, optional) – Target bandwidth. Default: None.
- Returns: neural codecs in shape ().
- Return type: torch.Tensor
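Encoding in an RVQ-based codec is usually greedy. Each stage picks the codeword nearest to the current residual and then subtracts it before the next stage runs. The sketch below assumes that scheme. It also assumes that a lower `target_bw` means fewer stages are kept, so the number of stages is passed directly as `n_q`. Both the helper and the toy codebooks are hypothetical, and the sketch only illustrates the general technique, not the FunCodec implementation.

```python
from typing import List

Vector = List[float]


def sq_dist(a: Vector, b: Vector) -> float:
    # Squared Euclidean distance between two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(x: Vector, codebooks: List[List[Vector]], n_q: int) -> List[int]:
    """Greedy residual VQ for one frame, using only the first n_q stages."""
    residual = list(x)
    codes = []
    for cb in codebooks[:n_q]:
        # Nearest codeword to what the earlier stages have not yet explained.
        idx = min(range(len(cb)), key=lambda i: sq_dist(residual, cb[i]))
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes


toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.5, 0.0], [0.0, 0.5]],
]
print(rvq_encode([1.4, 1.1], toy_codebooks, n_q=2))  # [1, 0]
print(rvq_encode([1.4, 1.1], toy_codebooks, n_q=1))  # [1] (lower bandwidth)
```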
forward(x: Tensor, use_dual_decoder: bool = False)
FunCodec forward propagation.
- Parameters:
- x (torch.Tensor) – Input tensor of shape (B, 1, T).
- use_dual_decoder (bool) – Whether to also decode the unquantized encoder output.
- Returns:
- torch.Tensor: resynthesized audio.
- torch.Tensor: commitment loss.
- torch.Tensor: quantization loss.
- torch.Tensor: resynthesized audio from the encoder output.
- Return type: Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
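The toy sketch below traces the data flow behind the first, second and fourth return values. It assumes a VQ-VAE-style commitment term, defined as the mean squared error between the continuous latent and its quantized version. It also assumes that the dual-decoder output comes from decoding the unquantized latent. The encoder, scalar-rounding quantizer and decoder are hypothetical stand-ins, and the quantization loss is not modeled. Without autograd the stop-gradient that normally separates commitment and codebook losses cannot be shown, so only the loss value is computed.

```python
from typing import Callable, List, Optional, Tuple

Signal = List[float]


def mse(a: Signal, b: Signal) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def toy_forward(
    x: Signal,
    encoder: Callable[[Signal], Signal],
    quantizer: Callable[[Signal], Signal],
    decoder: Callable[[Signal], Signal],
    use_dual_decoder: bool = False,
) -> Tuple[Signal, float, Optional[Signal]]:
    z = encoder(x)        # continuous latent
    z_q = quantizer(z)    # quantized latent
    # ASSUMPTION: VQ-VAE-style commitment term (quantized side would be detached).
    commit_loss = mse(z, z_q)
    y = decoder(z_q)      # audio resynthesized from the codes
    # ASSUMPTION: the dual-decoder path decodes the unquantized latent.
    y_enc = decoder(z) if use_dual_decoder else None
    return y, commit_loss, y_enc


enc = lambda s: [2 * v for v in s]
quant = lambda z: [float(round(v)) for v in z]
dec = lambda z: [v / 2 for v in z]
y, commit, y_enc = toy_forward([0.2, 0.7], enc, quant, dec, use_dual_decoder=True)
print(y, round(commit, 6), y_enc)  # [0.0, 0.5] 0.16 [0.2, 0.7]
```

In this toy, `y_enc` reproduces the input exactly because no quantization error enters that path. This is the kind of signal a dual-decoder branch could be compared against.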
freq_to_time_transfer(x: Tensor, scale: Tensor | None = None)
time_to_freq_transfer(x: Tensor)