espnet2.slu package
espnet2.slu.__init__
espnet2.slu.espnet_model
class espnet2.slu.espnet_model.ESPnetSLUModel(vocab_size: int, token_list: Union[Tuple[str, ...], List[str]], frontend: Optional[espnet2.asr.frontend.abs_frontend.AbsFrontend], specaug: Optional[espnet2.asr.specaug.abs_specaug.AbsSpecAug], normalize: Optional[espnet2.layers.abs_normalize.AbsNormalize], preencoder: Optional[espnet2.asr.preencoder.abs_preencoder.AbsPreEncoder], encoder: espnet2.asr.encoder.abs_encoder.AbsEncoder, postencoder: Optional[espnet2.asr.postencoder.abs_postencoder.AbsPostEncoder], decoder: espnet2.asr.decoder.abs_decoder.AbsDecoder, ctc: espnet2.asr.ctc.CTC, joint_network: Optional[torch.nn.modules.module.Module], postdecoder: Optional[espnet2.slu.postdecoder.abs_postdecoder.AbsPostDecoder] = None, deliberationencoder: Optional[espnet2.asr.postencoder.abs_postencoder.AbsPostEncoder] = None, transcript_token_list: Union[Tuple[str, ...], List[str]] = None, ctc_weight: float = 0.5, interctc_weight: float = 0.0, ignore_id: int = -1, lsm_weight: float = 0.0, length_normalized_loss: bool = False, report_cer: bool = True, report_wer: bool = True, sym_space: str = '<space>', sym_blank: str = '<blank>', extract_feats_in_collect_stats: bool = True, two_pass: bool = False, pre_postencoder_norm: bool = False)

Bases: espnet2.asr.espnet_model.ESPnetASRModel
CTC-attention hybrid encoder-decoder model for spoken language understanding (SLU).
collect_feats(speech: torch.Tensor, speech_lengths: torch.Tensor, text: torch.Tensor, text_lengths: torch.Tensor, transcript: torch.Tensor = None, transcript_lengths: torch.Tensor = None, **kwargs) → Dict[str, torch.Tensor]
encode(speech: torch.Tensor, speech_lengths: torch.Tensor, transcript_pad: torch.Tensor = None, transcript_pad_lens: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor]

Frontend + Encoder. Note that this method is used by asr_inference.py. A usage sketch follows the parameter list.
- Parameters:
speech – (Batch, Length, …)
speech_lengths – (Batch, )
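A minimal usage sketch, assuming `model` is an already-constructed ESPnetSLUModel (e.g., built from an espnet2 SLU task/config) whose frontend accepts raw waveforms; the shapes below are illustrative only, not part of this reference.

import torch

# Assume `model` is an ESPnetSLUModel already built elsewhere
# (construction omitted here).
model = ...  # hypothetical placeholder

speech = torch.randn(2, 16000)                 # (Batch, Length) raw waveform (assumed)
speech_lengths = torch.tensor([16000, 12000])  # (Batch,)

encoder_out, encoder_out_lens = model.encode(speech, speech_lengths)
# encoder_out: (Batch, Frames, encoder_output_size)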
forward(speech: torch.Tensor, speech_lengths: torch.Tensor, text: torch.Tensor, text_lengths: torch.Tensor, transcript: torch.Tensor = None, transcript_lengths: torch.Tensor = None, **kwargs) → Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]

Frontend + Encoder + Decoder + loss calculation. A call sketch follows the parameter list.
- Parameters:
speech – (Batch, Length, …)
speech_lengths – (Batch, )
text – (Batch, Length)
text_lengths – (Batch,)
kwargs – additional inputs; “utt_id” is among them.
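A minimal training-step sketch, again with `model` as in the encode sketch above; the token ids and shapes are assumptions for illustration.

import torch

# `model` as in the encode sketch above (an assumed ESPnetSLUModel instance).
speech = torch.randn(2, 16000)                 # (Batch, Length)
speech_lengths = torch.tensor([16000, 12000])  # (Batch,)
text = torch.randint(1, 50, (2, 8))            # (Batch, Length) SLU label ids (assumed)
text_lengths = torch.tensor([8, 6])            # (Batch,)

loss, stats, weight = model(speech, speech_lengths, text, text_lengths)
loss.backward()  # `stats` holds scalars (e.g., loss, CER/WER) for logging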
espnet2.slu.postencoder.transformer_postencoder
Encoder definition.
class espnet2.slu.postencoder.transformer_postencoder.TransformerPostEncoder(input_size: int, output_size: int = 256, attention_heads: int = 4, linear_units: int = 2048, num_blocks: int = 6, dropout_rate: float = 0.1, positional_dropout_rate: float = 0.1, attention_dropout_rate: float = 0.0, input_layer: Optional[str] = 'linear', pos_enc_class=<class 'espnet.nets.pytorch_backend.transformer.embedding.PositionalEncoding'>, normalize_before: bool = True, concat_after: bool = False, positionwise_layer_type: str = 'linear', positionwise_conv_kernel_size: int = 1, padding_idx: int = -1)

Bases: espnet2.asr.postencoder.abs_postencoder.AbsPostEncoder
Transformer encoder module.
- Parameters:
input_size – input dim
output_size – dimension of attention
attention_heads – the number of heads in multi-head attention
linear_units – the number of units in the position-wise feed-forward layer
num_blocks – the number of encoder blocks
dropout_rate – dropout rate
attention_dropout_rate – dropout rate in attention
positional_dropout_rate – dropout rate after adding positional encoding
input_layer – input layer type
pos_enc_class – PositionalEncoding or ScaledPositionalEncoding
normalize_before – whether to use layer_norm before the first block
concat_after – whether to concatenate the attention layer’s input and output. If True, an additional linear layer is applied, i.e. x -> x + linear(concat(x, att(x))); if False, no additional linear is applied, i.e. x -> x + att(x)
positionwise_layer_type – linear or conv1d
positionwise_conv_kernel_size – kernel size of positionwise conv1d layer
padding_idx – padding_idx for input_layer=embed
forward(xs_pad: torch.Tensor, ilens: torch.Tensor, prev_states: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor, Optional[torch.Tensor]]

Embed positions in tensor.
- Parameters:
xs_pad – input tensor (B, L, D)
ilens – input length (B)
prev_states – not used currently.
- Returns:
position embedded tensor and mask
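As a quick illustration, here is a hedged sketch that instantiates the post-encoder with its documented defaults and runs a dummy batch through forward; input_size=128 and the batch shapes are assumptions.

import torch
from espnet2.slu.postencoder.transformer_postencoder import TransformerPostEncoder

postencoder = TransformerPostEncoder(input_size=128, output_size=256)

xs_pad = torch.randn(4, 50, 128)        # (B, L, D)
ilens = torch.tensor([50, 42, 30, 18])  # (B,)

out, olens, _ = postencoder(xs_pad, ilens)
# out: (4, 50, 256); olens: valid output lengths; third element is unused states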
espnet2.slu.postencoder.__init__
espnet2.slu.postencoder.conformer_postencoder
Conformer PostEncoder.
class espnet2.slu.postencoder.conformer_postencoder.ConformerPostEncoder(input_size: int, output_size: int = 256, attention_heads: int = 4, linear_units: int = 2048, num_blocks: int = 6, dropout_rate: float = 0.1, positional_dropout_rate: float = 0.1, attention_dropout_rate: float = 0.0, input_layer: str = 'linear', normalize_before: bool = True, concat_after: bool = False, positionwise_layer_type: str = 'linear', positionwise_conv_kernel_size: int = 3, macaron_style: bool = False, rel_pos_type: str = 'legacy', pos_enc_layer_type: str = 'rel_pos', selfattention_layer_type: str = 'rel_selfattn', activation_type: str = 'swish', use_cnn_module: bool = True, zero_triu: bool = False, cnn_module_kernel: int = 31, padding_idx: int = -1)

Bases: espnet2.asr.postencoder.abs_postencoder.AbsPostEncoder

Conformer PostEncoder module.
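Usage mirrors the Transformer variant above; the sketch below assumes the same (output, lengths, states) forward interface, and input_size and the shapes are again illustrative.

import torch
from espnet2.slu.postencoder.conformer_postencoder import ConformerPostEncoder

postencoder = ConformerPostEncoder(input_size=128, output_size=256, num_blocks=4)

xs_pad = torch.randn(2, 30, 128)  # (B, L, D)
ilens = torch.tensor([30, 21])    # (B,)

# Assumed to return (output, output_lengths, states) like the Transformer variant.
out, olens, _ = postencoder(xs_pad, ilens)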
espnet2.slu.postdecoder.__init__
espnet2.slu.postdecoder.hugging_face_transformers_postdecoder
Hugging Face Transformers PostDecoder.
class espnet2.slu.postdecoder.hugging_face_transformers_postdecoder.HuggingFaceTransformersPostDecoder(model_name_or_path: str, output_size=256)

Bases: espnet2.slu.postdecoder.abs_postdecoder.AbsPostDecoder

Hugging Face Transformers PostDecoder.
Initialize the module.
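A hedged end-to-end sketch: tokenize a transcript with a Hugging Face tokenizer and pass the four inputs named by the abstract forward documented below. The checkpoint name, sentence, and explicit position ids are assumptions, not part of this reference.

import torch
from transformers import AutoTokenizer
from espnet2.slu.postdecoder.hugging_face_transformers_postdecoder import (
    HuggingFaceTransformersPostDecoder,
)

# "bert-base-uncased" is only an example checkpoint (assumption).
postdecoder = HuggingFaceTransformersPostDecoder("bert-base-uncased", output_size=256)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

enc = tokenizer("turn on the kitchen lights", return_tensors="pt")
position_ids = torch.arange(enc["input_ids"].shape[1]).unsqueeze(0)

out = postdecoder(
    enc["input_ids"],
    enc["attention_mask"],
    enc["token_type_ids"],
    position_ids,
)
# out is expected to be (1, seq_len, output_size) after projection (assumption)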
espnet2.slu.postdecoder.abs_postdecoder
class espnet2.slu.postdecoder.abs_postdecoder.AbsPostDecoder

Bases: torch.nn.modules.module.Module, abc.ABC
Initializes internal Module state, shared by both nn.Module and ScriptModule.
abstract forward(transcript_input_ids: torch.LongTensor, transcript_attention_mask: torch.LongTensor, transcript_token_type_ids: torch.LongTensor, transcript_position_ids: torch.LongTensor) → torch.Tensor

Defines the computation performed at every call.
Should be overridden by all subclasses.
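For a custom post-decoder, a subclass overrides forward with this signature, roughly as in the hypothetical sketch below; any further abstract members of AbsPostDecoder (the listing is truncated here) must also be overridden before the class can be instantiated.

import torch
from espnet2.slu.postdecoder.abs_postdecoder import AbsPostDecoder


class DummyPostDecoder(AbsPostDecoder):
    """Hypothetical sketch, not part of ESPnet."""

    def __init__(self, hidden_size: int = 256):
        super().__init__()
        self.embed = torch.nn.Embedding(30522, hidden_size)  # vocab size assumed

    def forward(
        self,
        transcript_input_ids: torch.LongTensor,
        transcript_attention_mask: torch.LongTensor,
        transcript_token_type_ids: torch.LongTensor,
        transcript_position_ids: torch.LongTensor,
    ) -> torch.Tensor:
        # Toy behavior: embed token ids; real implementations would also use
        # the attention mask, token type ids, and position ids.
        return self.embed(transcript_input_ids)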
-
abstract