espnet2.gan_codec.soundstream.soundstream.SoundStreamGenerator
class espnet2.gan_codec.soundstream.soundstream.SoundStreamGenerator(sample_rate: int = 24000, hidden_dim: int = 128, encdec_channels: int = 1, encdec_n_filters: int = 32, encdec_n_residual_layers: int = 1, encdec_ratios: List[int] = [8, 5, 4, 2], encdec_activation: str = 'ELU', encdec_activation_params: Dict[str, Any] = {'alpha': 1.0}, encdec_norm: str = 'weight_norm', encdec_norm_params: Dict[str, Any] = {}, encdec_kernel_size: int = 7, encdec_residual_kernel_size: int = 7, encdec_last_kernel_size: int = 7, encdec_dilation_base: int = 2, encdec_causal: bool = False, encdec_pad_mode: str = 'reflect', encdec_true_skip: bool = False, encdec_compress: int = 2, encdec_lstm: int = 2, decoder_trim_right_ratio: float = 1.0, decoder_final_activation: str | None = None, decoder_final_activation_params: dict | None = None, quantizer_n_q: int = 8, quantizer_bins: int = 1024, quantizer_decay: float = 0.99, quantizer_kmeans_init: bool = True, quantizer_kmeans_iters: int = 50, quantizer_threshold_ema_dead_code: int = 2, quantizer_target_bandwidth: List[float] = [7.5, 15])
Bases: Module
SoundStream generator module.
Initialize SoundStream Generator.
- Parameters: TODO (jiatong)
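The constructor defaults imply a couple of derived quantities that help when sizing inputs. The sketch below assumes, as in EnCodec-style codecs, that the encoder's total downsampling factor is the product of `encdec_ratios` and that the code frame rate is `sample_rate` divided by that factor, rounded up. Neither relation is stated in this reference, and `derived_rates` is a hypothetical helper, not part of ESPnet.

```python
import math
from typing import Tuple


def derived_rates(sample_rate: int = 24000,
                  ratios: Tuple[int, ...] = (8, 5, 4, 2)) -> Tuple[int, int]:
    """Return (hop_length, frame_rate) under the assumptions stated above."""
    # Assumed: each ratio is one downsampling stage, so strides multiply.
    hop_length = math.prod(ratios)
    # Assumed: one code frame is produced per hop_length input samples.
    frame_rate = math.ceil(sample_rate / hop_length)
    return hop_length, frame_rate


hop_length, frame_rate = derived_rates()
print(hop_length, frame_rate)
```

Under these assumptions, one second of 24 kHz audio would map to `frame_rate` code frames per quantizer.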
decode(codes: Tensor)
SoundStream codec decoding.
- Parameters: codes (torch.Tensor) – neural codecs in shape ().
- Returns: resynthesized audio.
- Return type: torch.Tensor
encode(x: Tensor, target_bw: float | None = None)
SoundStream codec encoding.
- Parameters:
- x (torch.Tensor) – Input tensor of shape (B, 1, T).
- target_bw (float | None) – Target bandwidth. Optional; defaults to None.
- Returns: neural codecs in shape ().
- Return type: torch.Tensor
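The role of `target_bw` can be illustrated with the rule used by EnCodec-style residual vector quantizers, where a target bandwidth in kbps selects how many codebooks are used and each codebook costs log2(bins) × frame_rate bits per second. Whether `encode` applies exactly this rule is an assumption. The 75 Hz frame rate is itself derived from the defaults as 24000 / (8·5·4·2), also an assumption, and `num_quantizers_for_bandwidth` is a hypothetical name.

```python
import math


def num_quantizers_for_bandwidth(target_bw_kbps: float,
                                 frame_rate: int = 75,
                                 bins: int = 1024,
                                 n_q: int = 8) -> int:
    """Number of codebooks that fit in `target_bw_kbps` (assumed rule)."""
    # Each codebook emits one index of log2(bins) bits per code frame.
    bits_per_second_per_codebook = math.log2(bins) * frame_rate
    wanted = math.floor(target_bw_kbps * 1000 / bits_per_second_per_codebook)
    # Use at least one codebook and never more than the quantizer owns.
    return max(1, min(n_q, wanted))


for bw in (1.5, 3.0, 7.5, 15.0):
    print(bw, num_quantizers_for_bandwidth(bw))
```

With the default `quantizer_n_q` of 8 and `quantizer_bins` of 1024, both default entries of `quantizer_target_bandwidth` would saturate at all 8 codebooks under this rule.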
forward(x: Tensor, use_dual_decoder: bool = False)
SoundStream forward propagation.
- Parameters:
- x (torch.Tensor) – Input tensor of shape (B, 1, T).
- use_dual_decoder (bool) – Whether to use the dual decoder on the encoder output. Defaults to False.
- Returns: A tuple of four values: resynthesized audio (torch.Tensor), commitment loss (torch.Tensor), quantization loss (torch.Tensor), and resynthesized audio from encoder (torch.Tensor).
- Return type: Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
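Because `forward` returns four values positionally, naming them at the call site can make training code easier to read. The following is a minimal sketch: `GeneratorOutputs` and `run_generator` are hypothetical helpers, and `_StubGenerator` stands in for a real `SoundStreamGenerator` so the snippet runs without ESPnet or PyTorch. The field order follows the order in which the return values are documented above.

```python
from typing import Any, NamedTuple


class GeneratorOutputs(NamedTuple):
    """Hypothetical names for the four values documented for forward()."""
    audio: Any                # resynthesized audio
    commit_loss: Any          # commitment loss
    quantization_loss: Any    # quantization loss
    audio_from_encoder: Any   # resynthesized audio from encoder


def run_generator(generator: Any, x: Any,
                  use_dual_decoder: bool = False) -> GeneratorOutputs:
    """Call the generator and label its positional outputs."""
    return GeneratorOutputs(*generator(x, use_dual_decoder=use_dual_decoder))


class _StubGenerator:
    """Stand-in with the documented call signature; returns placeholders."""

    def __call__(self, x, use_dual_decoder=False):
        return ("audio", 0.25, 0.5, "audio_from_encoder")


out = run_generator(_StubGenerator(), x=None, use_dual_decoder=True)
print(out.commit_loss + out.quantization_loss)
```

With a real generator, `x` would be a tensor of shape (B, 1, T) and the fields would hold tensors rather than placeholder values.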