espnet2.gan_codec.shared.decoder.seanet.SEANetDecoder

class espnet2.gan_codec.shared.decoder.seanet.SEANetDecoder(channels: int = 1, dimension: int = 128, n_filters: int = 32, n_residual_layers: int = 1, ratios: List[int] = [8, 5, 4, 2], activation: str = 'ELU', activation_params: dict = {'alpha': 1.0}, final_activation: str | None = None, final_activation_params: dict | None = None, norm: str = 'weight_norm', norm_params: Dict[str, Any] = {}, kernel_size: int = 7, last_kernel_size: int = 7, residual_kernel_size: int = 3, dilation_base: int = 2, causal: bool = False, pad_mode: str = 'reflect', true_skip: bool = False, compress: int = 2, lstm: int = 2, trim_right_ratio: float = 1.0)

Bases: Module

SEANet decoder.

Parameters:
- channels (int) – Audio channels.
- dimension (int) – Intermediate representation dimension.
- n_filters (int) – Base width for the model.
- n_residual_layers (int) – Number of residual layers.
- ratios (Sequence[int]) – Kernel size and stride ratios of the upsampling layers.
- activation (str) – Activation function.
- activation_params (dict) – Parameters to provide to the activation function.
- final_activation (str) – Final activation function after all convolutions.
- final_activation_params (dict) – Parameters to provide to the final activation function.
- norm (str) – Normalization method.
- norm_params (dict) – Parameters to provide to the underlying normalization used along with the convolution.
- kernel_size (int) – Kernel size for the initial convolution.
- last_kernel_size (int) – Kernel size for the final convolution.
- residual_kernel_size (int) – Kernel size for the residual layers.
- dilation_base (int) – How much to increase the dilation with each layer.
- causal (bool) – Whether to use fully causal convolution.
- pad_mode (str) – Padding mode for the convolutions.
- true_skip (bool) – Whether to use true skip connection or a simple (streamable) convolution as the skip connection in the residual network blocks.
- compress (int) – Reduced dimensionality in residual branches (from Demucs v3).
- lstm (int) – Number of LSTM layers in the decoder.
- trim_right_ratio (float) – Ratio for trimming at the right of the transposed convolution under the causal setup. If equal to 1.0, it means that all the trimming is done at the right.
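
The effect of ratios and trim_right_ratio on sequence length can be sketched in plain Python without instantiating the class. This is only an illustration under stated assumptions: it assumes, following the Encodec-style SEANet design, that each entry of ratios is the stride of one transposed convolution with kernel size 2 * ratio, and that the surplus kernel_size - stride samples are trimmed from the output. The helper names upsampling_factor, expected_output_length, and split_trim are hypothetical and are not part of ESPnet.

```python
import math
from typing import Sequence, Tuple


def upsampling_factor(ratios: Sequence[int]) -> int:
    """Total upsampling factor, assuming one stride per entry of ratios."""
    return math.prod(ratios)


def expected_output_length(n_frames: int, ratios: Sequence[int]) -> int:
    """Expected number of output samples for n_frames latent frames.

    Assumes every transposed convolution trims exactly kernel_size - stride
    samples, so each stage multiplies the length by its ratio.
    """
    return n_frames * upsampling_factor(ratios)


def split_trim(
    ratio: int, trim_right_ratio: float = 1.0, causal: bool = True
) -> Tuple[int, int]:
    """Return (left, right) samples trimmed after one transposed convolution.

    Assumption: kernel_size = 2 * ratio and stride = ratio, so the total
    trim is kernel_size - stride = ratio.
    """
    kernel_size = 2 * ratio
    trim_total = kernel_size - ratio
    if causal:
        # trim_right_ratio = 1.0 puts all of the trimming on the right.
        right = math.ceil(trim_total * trim_right_ratio)
    else:
        # Non-causal case: split the trim (almost) evenly.
        right = trim_total // 2
    left = trim_total - right
    return left, right


default_ratios = [8, 5, 4, 2]  # default value from the class signature
print(upsampling_factor(default_ratios))           # 320
print(expected_output_length(50, default_ratios))  # 16000
print(split_trim(8, trim_right_ratio=1.0))         # (0, 8)
print(split_trim(8, trim_right_ratio=0.5))         # (4, 4)
```

Under these assumptions, the default ratios turn 50 latent frames into 16000 samples, and trim_right_ratio only changes where the trimming happens, not the output length.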

Initializes internal Module state, shared by both nn.Module and ScriptModule.

forward(z)

Defines the computation performed at every call.

Should be overridden by all subclasses.

NOTE

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
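
The difference between calling the instance and calling forward directly can be illustrated with a simplified stand-in. The TinyModule class below is hypothetical and is not PyTorch or ESPnet code; it only mimics the pattern in which the instance call wraps forward and then runs the registered hooks.

```python
from typing import Callable, List


class TinyModule:
    """Hypothetical stand-in that mimics the call/forward split."""

    def __init__(self) -> None:
        self._forward_hooks: List[Callable] = []

    def register_forward_hook(self, hook: Callable) -> None:
        self._forward_hooks.append(hook)

    def forward(self, z: List[float]) -> List[float]:
        # The actual computation; subclasses would override this.
        return [2 * v for v in z]

    def __call__(self, z: List[float]) -> List[float]:
        out = self.forward(z)
        # Hooks run only on this path, never when forward is called directly.
        for hook in self._forward_hooks:
            hook(self, z, out)
        return out


calls: List[int] = []
module = TinyModule()
module.register_forward_hook(lambda mod, inp, out: calls.append(len(out)))

module([1.0, 2.0, 3.0])          # hook runs, calls becomes [3]
module.forward([1.0, 2.0, 3.0])  # hook is skipped, calls is unchanged
print(calls)                     # [3]
```

For the same reason, a decoder instance would normally be invoked as decoder(z) rather than decoder.forward(z).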