espnet.nets.pytorch_backend.e2e_asr.E2E
class espnet.nets.pytorch_backend.e2e_asr.E2E(idim, odim, args)
Bases: ASRInterface, Module
E2E module.
- Parameters:
- idim (int) – dimension of inputs
- odim (int) – dimension of outputs
- args (Namespace) – argument Namespace containing options
Construct an E2E object.
- Parameters:
- idim (int) – dimension of inputs
- odim (int) – dimension of outputs
- args (Namespace) – argument Namespace containing options
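Since `args` is an argparse `Namespace`, construction typically goes through a parser that `E2E.add_arguments` has populated. A minimal sketch of that flow, using a stand-in parser with two illustrative options (the real ESPnet parser registers many more):

```python
import argparse

# Stand-in for the parser that E2E.add_arguments would populate; the two
# options below are illustrative, not the full ESPnet option set.
parser = argparse.ArgumentParser()
parser.add_argument("--etype", default="blstmp", type=str,
                    help="encoder architecture type")
parser.add_argument("--mtlalpha", default=0.5, type=float,
                    help="CTC/attention multitask weight")
args = parser.parse_args([])  # take the defaults

# With ESPnet installed, the Namespace is handed to the constructor:
# model = E2E(idim=83, odim=52, args=args)
print(args.etype, args.mtlalpha)
```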
static add_arguments(parser)
Add arguments.
static attention_add_arguments(parser)
Add arguments for the attention.
calculate_all_attentions(xs_pad, ilens, ys_pad)
E2E attention calculation.
- Parameters:
- xs_pad (torch.Tensor) – batch of padded input sequences (B, Tmax, idim)
- ilens (torch.Tensor) – batch of lengths of input sequences (B)
- ys_pad (torch.Tensor) – batch of padded token id sequence tensor (B, Lmax)
- Returns: attention weights with the following shape,
- multi-head case => attention weights (B, H, Lmax, Tmax),
- other case => attention weights (B, Lmax, Tmax).
- Return type: float ndarray
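The two return shapes can be illustrated with a small numpy sketch (not ESPnet code): attention weights are a softmax over the encoder-time axis `Tmax`, computed per decoder step `Lmax`, with an extra head axis `H` in the multi-head case.

```python
import numpy as np

# Illustrative shapes only; values are random scores, not real attention.
B, H, Lmax, Tmax = 2, 4, 5, 7
scores = np.random.randn(B, H, Lmax, Tmax)

# Softmax over the encoder-time axis, so each row sums to 1.
w = np.exp(scores - scores.max(axis=-1, keepdims=True))
att_multi = w / w.sum(axis=-1, keepdims=True)  # (B, H, Lmax, Tmax)
att_single = att_multi[:, 0]                   # (B, Lmax, Tmax)
print(att_multi.shape, att_single.shape)
```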
calculate_all_ctc_probs(xs_pad, ilens, ys_pad)
E2E CTC probability calculation.
- Parameters:
- xs_pad (torch.Tensor) – batch of padded input sequences (B, Tmax, idim)
- ilens (torch.Tensor) – batch of lengths of input sequences (B)
- ys_pad (torch.Tensor) – batch of padded token id sequence tensor (B, Lmax)
- Returns: CTC probability (B, Tmax, vocab)
- Return type: float ndarray
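To show what the (B, Tmax, vocab) probabilities are typically used for, here is an illustrative greedy CTC decoding sketch (not ESPnet code): collapse repeated frame-level predictions and drop the blank symbol, which is assumed to be index 0 here.

```python
import numpy as np

def ctc_greedy(ctc_probs, blank=0):
    """Collapse repeats and remove blanks from per-frame probabilities."""
    best = ctc_probs.argmax(axis=-1)  # best token per frame, (Tmax,)
    out, prev = [], None
    for tok in best:
        if tok != prev and tok != blank:
            out.append(int(tok))
        prev = tok
    return out

# Frame-level probabilities over a 3-token vocab (blank, "a", "b").
probs = np.array([[0.1, 0.8, 0.1],    # "a"
                  [0.1, 0.8, 0.1],    # "a" (repeat, collapsed)
                  [0.9, 0.05, 0.05],  # blank (dropped)
                  [0.1, 0.1, 0.8]])   # "b"
print(ctc_greedy(probs))  # [1, 2]
```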
static decoder_add_arguments(parser)
Add arguments for the decoder.
encode(x)
Encode acoustic features.
- Parameters: x (ndarray) – input acoustic feature (T, D)
- Returns: encoder outputs
- Return type: torch.Tensor
static encoder_add_arguments(parser)
Add arguments for the encoder.
enhance(xs)
Forward only in the frontend stage.
- Parameters: xs (ndarray) – input acoustic feature (T, C, F)
- Returns: enhanced feature
- Return type: torch.Tensor
forward(xs_pad, ilens, ys_pad)
E2E forward.
- Parameters:
- xs_pad (torch.Tensor) – batch of padded input sequences (B, Tmax, idim)
- ilens (torch.Tensor) – batch of lengths of input sequences (B)
- ys_pad (torch.Tensor) – batch of padded token id sequence tensor (B, Lmax)
- Returns: loss value
- Return type: torch.Tensor
get_total_subsampling_factor()
Get total subsampling factor.
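Assuming the usual ESPnet convention of per-layer subsampling factors (the "1_2_2_1_1"-style config), the total factor is their product; a minimal sketch:

```python
import math

# Hypothetical per-layer subsampling factors, e.g. from a "1_2_2_1_1"
# encoder configuration string.
subsample = [1, 2, 2, 1, 1]
total = math.prod(subsample)
print(total)  # 4
```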
init_like_chainer()
Initialize weight like chainer.
Chainer basically uses the LeCun scheme: W ~ Normal(0, fan_in ** -0.5), b = 0. PyTorch basically uses W, b ~ Uniform(-fan_in ** -0.5, fan_in ** -0.5). However, there are two exceptions as far as I know:
- EmbedID.W ~ Normal(0, 1)
- LSTM.upward.b[forget_gate_range] = 1 (but not used in NStepLSTM)
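The scheme above can be sketched in numpy (an illustration of the stated rules, not the ESPnet implementation): LeCun-normal weights, zero biases, and a forget-gate bias of 1, assuming the common (input, forget, cell, output) gate ordering so the forget slice is the second quarter of the bias vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def lecun_normal(fan_out, fan_in):
    # Chainer-style LeCun init: W ~ Normal(0, fan_in ** -0.5)
    return rng.normal(0.0, fan_in ** -0.5, size=(fan_out, fan_in))

W = lecun_normal(4, 100)  # weight matrix, std = 100 ** -0.5 = 0.1
b = np.zeros(4)           # biases start at zero

# Exception: LSTM forget-gate bias set to 1. Assuming gate order
# (input, forget, cell, output), the forget slice is the second quarter.
hidden = 8
lstm_b = np.zeros(4 * hidden)
lstm_b[hidden:2 * hidden] = 1.0
print(W.shape, lstm_b[hidden:2 * hidden])
```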
recognize(x, recog_args, char_list, rnnlm=None)
E2E beam search.
- Parameters:
- x (ndarray) – input acoustic feature (T, D)
- recog_args (Namespace) – argument Namespace containing options
- char_list (list) – list of characters
- rnnlm (torch.nn.Module) – language model module
- Returns: N-best decoding results
- Return type: list
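The N-best structure of the result can be illustrated with a toy beam search (a sketch, not the ESPnet decoder): expand each kept prefix with every token's log-probability at each step, then keep the highest-scoring `beam` prefixes.

```python
import math

def beam_search(step_logprobs, beam=2):
    """Toy beam search over a fixed table of per-step token log-probs."""
    hyps = [([], 0.0)]  # list of (token sequence, accumulated log score)
    for logprobs in step_logprobs:  # one dict per decoding step
        cand = [(seq + [tok], score + lp)
                for seq, score in hyps
                for tok, lp in logprobs.items()]
        hyps = sorted(cand, key=lambda h: h[1], reverse=True)[:beam]
    return hyps  # N-best list, best first

steps = [{"a": math.log(0.6), "b": math.log(0.4)},
         {"a": math.log(0.3), "b": math.log(0.7)}]
nbest = beam_search(steps, beam=2)
print(nbest)  # best prefix is ["a", "b"] with score log(0.42)
```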
recognize_batch(xs, recog_args, char_list, rnnlm=None)
E2E batch beam search.
- Parameters:
- xs (list) – list of input acoustic feature arrays [(T_1, D), (T_2, D), …]
- recog_args (Namespace) – argument Namespace containing options
- char_list (list) – list of characters
- rnnlm (torch.nn.Module) – language model module
- Returns: N-best decoding results
- Return type: list
scorers()
Get scorers used in beam search.
subsample_frames(x)
Subsample speech frames in the encoder.
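Frame subsampling in its simplest form keeps every k-th frame, so a (T, D) feature shrinks to roughly (T / k, D); a minimal numpy sketch under that assumption:

```python
import numpy as np

# A toy (T=10, D=2) feature matrix.
x = np.arange(20).reshape(10, 2)

# Keep every 4th frame (indices 0, 4, 8) -> shape (3, 2).
x_sub = x[::4]
print(x_sub.shape)  # (3, 2)
```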