espnet.nets.pytorch_backend.transformer.encoder_mix.EncoderMix
class espnet.nets.pytorch_backend.transformer.encoder_mix.EncoderMix(idim, attention_dim=256, attention_heads=4, linear_units=2048, num_blocks_sd=4, num_blocks_rec=8, dropout_rate=0.1, positional_dropout_rate=0.1, attention_dropout_rate=0.0, input_layer='conv2d', pos_enc_class=<class 'espnet.nets.pytorch_backend.transformer.embedding.PositionalEncoding'>, normalize_before=True, concat_after=False, positionwise_layer_type='linear', positionwise_conv_kernel_size=1, padding_idx=-1, num_spkrs=2)
Bases: Encoder, Module
Transformer encoder module for multi-speaker speech mixtures.
- Parameters:
- idim (int) – input dim
- attention_dim (int) – dimension of attention
- attention_heads (int) – the number of heads of multi head attention
- linear_units (int) – the number of units of position-wise feed forward
- num_blocks_sd (int) – the number of speaker-differentiating encoder blocks
- num_blocks_rec (int) – the number of shared recognition encoder blocks
- dropout_rate (float) – dropout rate
- attention_dropout_rate (float) – dropout rate in attention
- positional_dropout_rate (float) – dropout rate after adding positional encoding
- input_layer (str or torch.nn.Module) – input layer type
- pos_enc_class (class) – PositionalEncoding or ScaledPositionalEncoding
- normalize_before (bool) – whether to use layer_norm before the first block
- concat_after (bool) – whether to concatenate the attention layer's input and output. If True, an additional linear layer is applied, i.e. x -> x + linear(concat(x, att(x))). If False, no additional linear layer is applied, i.e. x -> x + att(x)
- positionwise_layer_type (str) – linear or conv1d
- positionwise_conv_kernel_size (int) – kernel size of positionwise conv1d layer
- padding_idx (int) – padding_idx for input_layer="embed"
- num_spkrs (int) – the number of speakers in the input mixture
Construct an EncoderMix object.
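A minimal construction sketch. The input dimension (83) is illustrative, not taken from this page; the remaining values mirror the constructor defaults:

```python
from espnet.nets.pytorch_backend.transformer.encoder_mix import EncoderMix

# Illustrative configuration: 83-dim input features, two speakers.
encoder = EncoderMix(
    idim=83,
    attention_dim=256,
    attention_heads=4,
    linear_units=2048,
    num_blocks_sd=4,   # speaker-differentiating blocks
    num_blocks_rec=8,  # shared recognition blocks
    input_layer="conv2d",
    num_spkrs=2,
)
```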
forward(xs, masks)
Encode input sequence.
- Parameters:
- xs (torch.Tensor) – input tensor
- masks (torch.Tensor) – input mask
- Returns: position-embedded tensor and mask
- Return type: Tuple[torch.Tensor, torch.Tensor]
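A hedged usage sketch for forward. The batch and frame counts are assumptions; with input_layer="conv2d" the time axis is subsampled by a factor of 4, and given num_spkrs=2 the encoder is expected to yield one encoded stream per speaker:

```python
import torch

# Assumed shapes: batch of 4 utterances, 128 frames, 83-dim features.
xs = torch.randn(4, 128, 83)
# Mask of non-padded positions, shape (batch, 1, time).
masks = torch.ones(4, 1, 128, dtype=torch.bool)

xs_out, masks_out = encoder(xs, masks)
# With the conv2d input layer, the encoded time axis is subsampled
# (roughly 128 // 4 frames); for num_spkrs=2 the outputs carry one
# encoded stream per speaker.
```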
forward_one_step(xs, masks, *, cache=None)
Encode input frame.
- Parameters:
- xs (torch.Tensor) – input tensor
- masks (torch.Tensor) – input mask
- cache (List[torch.Tensor]) – cache tensors
- Returns: position-embedded tensor, mask, and new cache
- Return type: Tuple[torch.Tensor, torch.Tensor, List[torch.Tensor]]
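A cached-step sketch, assuming cache starts as None and the returned cache is passed back on the next call; the chunk size and the per-block cache layout are assumptions, not documented on this page:

```python
import torch

cache = None
xs_step = torch.randn(1, 16, 83)  # assumed single-utterance chunk
masks_step = torch.ones(1, 1, 16, dtype=torch.bool)

# The first call builds the cache; later calls reuse it.
xs_enc, masks_enc, cache = encoder.forward_one_step(
    xs_step, masks_step, cache=cache
)
```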