espnet2.gan_codec.espnet_model.ESPnetGANCodecModel


class espnet2.gan_codec.espnet_model.ESPnetGANCodecModel(codec: AbsGANCodec)

ESPnet model for GAN-based neural codec task.

Initialize ESPnetGANCodecModel module.

collect_feats(audio: Tensor, **kwargs) → Dict[str, Tensor]

Calculate features and return them as a dict.

decode(codes: Tensor)

Codec Decoding Process.

decode_continuous(z: Tensor)

Codec Decoding Process without dequantization.

encode(audio: Tensor, **kwargs)

Codec Encoding Process.

Parameters: audio (Tensor) – Audio waveform tensor (B, 1, T_wav) or (B, T_wav) or (T_wav)
Returns: Generated codes (N_stream, B, T)
Return type: Tensor
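
The three accepted waveform layouts can be related to one another with a small shape check. The sketch below is an illustration only: `to_batched_shape` is a hypothetical helper that is not part of ESPnet, it works on plain shape tuples rather than tensors, and it assumes (the source does not say) that the lower-rank layouts are interpreted as a single utterance or a missing channel axis.

```python
from typing import Tuple


def to_batched_shape(shape: Tuple[int, ...]) -> Tuple[int, int, int]:
    """Map a waveform shape onto the (B, 1, T_wav) layout.

    Hypothetical helper for illustration; not an ESPnet function.
    """
    if len(shape) == 1:
        # (T_wav,) is treated as one utterance with one channel (assumption).
        return (1, 1, shape[0])
    if len(shape) == 2:
        # (B, T_wav) gains a singleton channel axis.
        return (shape[0], 1, shape[1])
    if len(shape) == 3 and shape[1] == 1:
        # Already (B, 1, T_wav).
        return (shape[0], 1, shape[2])
    raise ValueError(f"unsupported waveform shape: {shape}")
```

For example, `to_batched_shape((4, 16000))` gives `(4, 1, 16000)`, while a multi-channel shape such as `(4, 2, 16000)` is rejected.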

encode_continuous(audio)

Codec Encoding Process without quantization.

Parameters: audio (Tensor) – Audio waveform tensor: (B, 1, T_wav) or (B, T_wav) or (T_wav)
Returns: Generated codes (B, D, T)
Return type: Tensor
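
To make the difference between the quantized and continuous paths concrete, here is a toy sketch. It assumes a residual scalar quantizer with one codebook per stream; the source does not state which quantizer the codec uses, and `quantize`, `dequantize`, and the codebook values are all hypothetical. A continuous latent value plays the role of the `encode_continuous` output, the index list plays the role of the `encode` output for one frame, and `dequantize` is the step that `decode_continuous` skips.

```python
from typing import List, Sequence


def quantize(z: float, codebooks: Sequence[Sequence[float]]) -> List[int]:
    """Return one codebook index per stream for a single latent value."""
    codes = []
    residual = z
    for book in codebooks:
        # Pick the entry closest to what earlier streams left unexplained.
        idx = min(range(len(book)), key=lambda i: abs(residual - book[i]))
        codes.append(idx)
        residual -= book[idx]
    return codes


def dequantize(codes: Sequence[int], codebooks: Sequence[Sequence[float]]) -> float:
    """Sum the selected entries to recover an approximate latent value."""
    return sum(book[i] for book, i in zip(codebooks, codes))


# Two streams: a coarse codebook followed by a finer one (made-up values).
CODEBOOKS = [[-1.0, 0.0, 1.0], [-0.25, 0.0, 0.25]]
```

With these codebooks, `quantize(0.8, CODEBOOKS)` selects one index per stream, and `dequantize` on those indices returns a value close to, but not equal to, the original 0.8. That gap is the quantization error the continuous methods avoid.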

forward(audio: Tensor, forward_generator: bool = True, **kwargs) → Dict[str, Any]

Return the generator or discriminator loss as a dict.

Parameters:
- audio (Tensor) – Audio waveform tensor (B, T_wav).
- forward_generator (bool) – Whether to forward generator.
- kwargs – Additional keyword arguments; “utt_id” is among the inputs.
Returns:
- loss (Tensor): Loss scalar tensor.
- stats (Dict[str, float]): Statistics to be monitored.
- weight (Tensor): Weight tensor to summarize losses.
- optim_idx (int): Optimizer index (0 for G and 1 for D).
Return type: Dict[str, Any]
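
The returned dict lets a training loop route each loss to the matching optimizer through `optim_idx`. The sketch below shows that dispatch with a stand-in: `fake_forward` and `run_turn` are hypothetical names, the stand-in reproduces only the documented keys, and its values (plain floats instead of tensors, batch size as the weight) are placeholders chosen for illustration rather than what the model computes.

```python
from typing import Any, Callable, Dict, List


def fake_forward(
    audio: List[List[float]], forward_generator: bool = True, **kwargs: Any
) -> Dict[str, Any]:
    """Stand-in that mimics only the documented return keys (values are made up)."""
    loss = 1.0 if forward_generator else 0.5
    name = "generator_loss" if forward_generator else "discriminator_loss"
    return {
        "loss": loss,
        "stats": {name: loss},
        "weight": len(audio),  # placeholder: number of utterances in the batch
        "optim_idx": 0 if forward_generator else 1,  # 0 for G, 1 for D
    }


def run_turn(
    forward: Callable[..., Dict[str, Any]],
    batch: Dict[str, Any],
    optimizers: Dict[int, Callable[[float], None]],
    forward_generator: bool,
) -> Dict[str, Any]:
    """Run one generator or discriminator turn and step the matching optimizer."""
    out = forward(forward_generator=forward_generator, **batch)
    # Route the loss by the index the model reports, not by the caller's flag.
    optimizers[out["optim_idx"]](out["loss"])
    return out
```

A caller would typically alternate `forward_generator=True` and `forward_generator=False` on the same batch, passing a mapping such as `{0: step_generator, 1: step_discriminator}` as `optimizers`.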

meta_info() → Dict[str, Any]

Return meta information of the codec.