espnet2.gan_codec.espnet_model.ESPnetGANCodecModel
class espnet2.gan_codec.espnet_model.ESPnetGANCodecModel(codec: AbsGANCodec)
Bases: AbsGANESPnetModel
ESPnet model for GAN-based neural codec task.
Initialize ESPnetGANCodecModel module.
collect_feats(audio: Tensor, **kwargs) → Dict[str, Tensor]
Calculate features and return them as a dict.
- Parameters: audio (Tensor) – Audio waveform tensor (B, T_wav).
- Returns: Dict of features.
- Return type: Dict[str, Tensor]
decode(codes: Tensor)
Codec Decoding Process.
- Parameters: codes (Tensor) – Codec tokens (N_stream, B, T)
- Returns: Generated waveform (B, 1, n_sample)
- Return type: Tensor
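The token layout above is stream-first: the leading axis indexes the code stream, and the batch axis comes second. The sketch below uses plain nested lists as a stand-in for a tensor to show how the codes of one batch item can be gathered across all streams. The helper name `codes_for_item` is hypothetical and is not part of ESPnet.

```python
from typing import List


def codes_for_item(codes: List[List[List[int]]], b: int) -> List[List[int]]:
    """Return the [N_stream][T] codes of batch item b from (N_stream, B, T) codes."""
    # Each element of the outer list is one stream holding B sequences of length T.
    return [stream[b] for stream in codes]


# Toy codes with N_stream=2, B=2, T=3 (values are arbitrary).
toy_codes = [
    [[1, 2, 3], [4, 5, 6]],       # stream 0
    [[7, 8, 9], [10, 11, 12]],    # stream 1
]
print(codes_for_item(toy_codes, 1))  # [[4, 5, 6], [10, 11, 12]]
```

With a real tensor the same selection would be an index on the second axis.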
decode_continuous(z: Tensor)
Codec Decoding Process without dequantization.
- Parameters: z (Tensor) – Continuous codec representation (B, D, T)
- Returns: Generated waveform (B, 1, n_sample)
- Return type: Tensor
encode(audio: Tensor, **kwargs)
Codec Encoding Process.
- Parameters: audio (Tensor) – Audio waveform tensor (B, 1, T_wav), (B, T_wav), or (T_wav)
- Returns: Generated codes (N_stream, B, T)
- Return type: Tensor
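The three accepted input layouts differ only in which leading axes are present. As a rough illustration, the sketch below maps each documented shape to a common (B, 1, T_wav) layout. It works on shape tuples only, the helper name `to_batched_shape` is hypothetical, and the idea that the model normalizes to this exact layout internally is an assumption rather than something this page states.

```python
from typing import Tuple


def to_batched_shape(shape: Tuple[int, ...]) -> Tuple[int, int, int]:
    """Map (T_wav,), (B, T_wav), or (B, 1, T_wav) to (B, 1, T_wav)."""
    if len(shape) == 1:
        # Single unbatched waveform: add batch and channel axes.
        return (1, 1, shape[0])
    if len(shape) == 2:
        # Batched waveforms without a channel axis: insert the channel axis.
        return (shape[0], 1, shape[1])
    if len(shape) == 3 and shape[1] == 1:
        # Already in the batched single-channel layout.
        return (shape[0], 1, shape[2])
    raise ValueError(f"Unsupported audio shape: {shape}")


for s in [(16000,), (4, 16000), (4, 1, 16000)]:
    print(s, "->", to_batched_shape(s))
```

Rejecting any other shape explicitly keeps a multi-channel input from being silently misread as a batch.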
encode_continuous(audio)
Codec Encoding Process without quantization.
- Parameters: audio (Tensor) – Audio waveform tensor (B, 1, T_wav), (B, T_wav), or (T_wav)
- Returns: Continuous codec representation (B, D, T)
- Return type: Tensor
forward(audio: Tensor, forward_generator: bool = True, **kwargs) → Dict[str, Any]
Return the generator or discriminator loss in dict format.
- Parameters:
- audio (Tensor) – Audio waveform tensor (B, T_wav).
- forward_generator (bool) – Whether to run the generator forward pass; if False, the discriminator loss is returned instead.
- kwargs – Additional keyword arguments; “utt_id” is among the inputs.
- Returns:
- loss (Tensor): Loss scalar tensor.
- stats (Dict[str, float]): Statistics to be monitored.
- weight (Tensor): Weight tensor to summarize losses.
- optim_idx (int): Optimizer index (0 for G and 1 for D).
- Return type: Dict[str, Any]
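The returned dict tells the caller which optimizer the loss belongs to. The sketch below shows one way a training loop might alternate generator and discriminator turns and read `optim_idx`. `StubCodecModel` is a hypothetical stand-in that mimics only the documented keys; its loss values, the stat names, and the weight are placeholders, and the loop is not ESPnet's trainer.

```python
from typing import Any, Dict, List


class StubCodecModel:
    """Hypothetical stand-in; only the keys of the returned dict follow the docs."""

    def forward(
        self, audio: List[List[float]], forward_generator: bool = True, **kwargs
    ) -> Dict[str, Any]:
        # Placeholder numbers, not real losses.
        loss = 1.0 if forward_generator else 0.5
        key = "generator_loss" if forward_generator else "discriminator_loss"
        return {
            "loss": loss,
            "stats": {key: loss},          # stat names are assumptions
            "weight": len(audio),          # placeholder weight (assumption)
            "optim_idx": 0 if forward_generator else 1,
        }


def run_one_iteration(model: StubCodecModel, audio: List[List[float]]) -> List[int]:
    """Run a generator turn, then a discriminator turn; return the optimizer indices."""
    used = []
    for forward_generator in (True, False):
        out = model.forward(
            audio, forward_generator=forward_generator, utt_id=["utt1", "utt2"]
        )
        # A trainer would backpropagate out["loss"] and step optimizer out["optim_idx"].
        used.append(out["optim_idx"])
    return used


batch = [[0.0] * 8, [0.0] * 8]  # toy (B, T_wav) = (2, 8) batch
print(run_one_iteration(StubCodecModel(), batch))  # [0, 1]
```

Reading the optimizer from `optim_idx` instead of hard-coding it keeps the loop correct if the order of turns changes.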
meta_info() → Dict[str, Any]
Return meta information of the codec.