espnet2.gan_codec.shared.decoder.seanet_2d.SEANetDecoder2d
class espnet2.gan_codec.shared.decoder.seanet_2d.SEANetDecoder2d(channels: int = 1, dimension: int = 128, n_filters: int = 32, n_residual_layers: int = 1, ratios: List[Tuple[int, int]] = [(4, 1), (4, 1), (4, 2), (4, 1)], activation: str = 'ELU', activation_params: dict = {'alpha': 1.0}, final_activation: str | None = None, final_activation_params: dict | None = None, norm: str = 'weight_norm', norm_params: Dict[str, Any] = {}, kernel_size: int = 7, last_kernel_size: int = 7, residual_kernel_size: int = 3, dilation_base: int = 2, causal: bool = False, pad_mode: str = 'reflect', true_skip: bool = False, compress: int = 2, lstm: int = 2, trim_right_ratio: float = 1.0, res_seq=True, last_out_padding: List[int] = [(0, 1), (0, 0)], tr_conv_group_ratio: int = -1, conv_group_ratio: int = -1)
Bases: Module
SEANet decoder.
- Parameters:
- channels (int) – Audio channels.
- dimension (int) – Intermediate representation dimension.
- n_filters (int) – Base width for the model.
- n_residual_layers (int) – Number of residual layers.
- ratios (List[Tuple[int, int]]) – Kernel size and stride ratios.
- activation (str) – Activation function.
- activation_params (dict) – Parameters to provide to the activation function.
- final_activation (str | None) – Final activation function after all convolutions.
- final_activation_params (dict | None) – Parameters to provide to the final activation function.
- norm (str) – Normalization method.
- norm_params (Dict[str, Any]) – Parameters to provide to the underlying normalization used along with the convolution.
- kernel_size (int) – Kernel size for the initial convolution.
- last_kernel_size (int) – Kernel size for the final convolution.
- residual_kernel_size (int) – Kernel size for the residual layers.
- dilation_base (int) – How much to increase the dilation with each layer.
- causal (bool) – Whether to use fully causal convolution.
- pad_mode (str) – Padding mode for the convolutions.
- true_skip (bool) – Whether to use true skip connection or a simple (streamable) convolution as the skip connection in the residual network blocks.
- compress (int) – Reduced dimensionality in residual branches (from Demucs v3).
- lstm (int) – Number of LSTM layers in the decoder.
- trim_right_ratio (float) – Ratio for trimming at the right of the transposed convolution under the causal setup. If equal to 1.0, it means that all the trimming is done at the right.
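The effect of trim_right_ratio can be illustrated with a small arithmetic sketch. It is modeled on how the 1D causal transposed convolution in EnCodec splits the trimmed samples, and it assumes the layers used by this 2D decoder follow the same rule along the time axis. The helper name split_trim is hypothetical and is not part of espnet2.

```python
import math
from typing import Tuple


def split_trim(
    kernel_size: int,
    stride: int,
    trim_right_ratio: float = 1.0,
    causal: bool = True,
) -> Tuple[int, int]:
    """Return (trim_left, trim_right) for a transposed convolution output.

    Hypothetical helper: assumes the total number of samples to trim is
    kernel_size - stride, as in the EnCodec 1D transposed convolution.
    """
    padding_total = kernel_size - stride
    if causal:
        # Causal case: trim_right_ratio decides how much is cut on the right.
        padding_right = math.ceil(padding_total * trim_right_ratio)
    else:
        # Non-causal case: split as evenly as possible; the left side gets
        # the extra sample when the total is odd.
        padding_right = padding_total // 2
    padding_left = padding_total - padding_right
    return padding_left, padding_right


print(split_trim(8, 4, trim_right_ratio=1.0))  # (0, 4): everything trimmed on the right
print(split_trim(8, 4, trim_right_ratio=0.5))  # (2, 2)
print(split_trim(7, 4, causal=False))          # (2, 1)
```

With the default trim_right_ratio of 1.0, no output samples are removed on the left, which is what keeps the causal output aligned with the start of the input.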
Initializes internal Module state, shared by both nn.Module and ScriptModule.
forward(z)
Defines the computation performed at every call.
Should be overridden by all subclasses.
NOTE
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance instead of calling forward() directly, since calling the instance takes care of running the registered hooks while calling forward() directly silently ignores them.
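The difference between calling the instance and calling forward() directly can be seen with a generic PyTorch module. This is a minimal sketch using a stand-in class (TinyModule is hypothetical and unrelated to SEANetDecoder2d); the same behavior applies to any nn.Module subclass.

```python
import torch
from torch import nn


class TinyModule(nn.Module):
    """Stand-in module used only to demonstrate hook behavior."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2


calls = []
module = TinyModule()

# Record the output shape every time the forward hook fires.
handle = module.register_forward_hook(
    lambda mod, inputs, output: calls.append(tuple(output.shape))
)

x = torch.ones(1, 3)
y_call = module(x)             # goes through __call__, so the hook runs
y_forward = module.forward(x)  # bypasses __call__, so the hook is skipped
handle.remove()

print(len(calls))  # 1: only the module(x) call triggered the hook
```

Both calls return the same tensor; only the hook side effects differ.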
output_size()