espnet.nets.pytorch_backend.e2e_st_transformer.E2E
class espnet.nets.pytorch_backend.e2e_st_transformer.E2E(idim, odim, args, ignore_id=-1)
Bases: STInterface, Module
E2E speech translation (ST) Transformer module.
- Parameters:
- idim (int) – dimension of inputs
- odim (int) – dimension of outputs
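- args (Namespace) – argument Namespace containing options
- ignore_id (int) – label ID used to pad target sequences; padded positions are ignored when computing losses (default: -1)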
static add_arguments(parser)
Add model-specific arguments to the argument parser.
property attention_plot_class
Return PlotAttentionReport.
calculate_all_attentions(xs_pad, ilens, ys_pad, ys_pad_src)
E2E attention calculation.
- Parameters:
- xs_pad (torch.Tensor) – batch of padded input sequences (B, Tmax, idim)
- ilens (torch.Tensor) – batch of lengths of input sequences (B)
- ys_pad (torch.Tensor) – batch of padded target-language token ID sequences (B, Lmax)
- ys_pad_src (torch.Tensor) – batch of padded source-language token ID sequences (B, Lmax)
- Returns: attention weights (B, H, Lmax, Tmax)
- Return type: float ndarray
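Attention weights of this shape are typically inspected by collapsing the head axis into a single alignment. The following is a minimal sketch, assuming the weights for one utterance are available as nested lists of shape (H, Lmax, Tmax); `head_averaged_alignment` is a hypothetical helper, not part of ESPnet.

```python
from typing import List


def head_averaged_alignment(attn: List[List[List[float]]]) -> List[int]:
    """Average one utterance's attention over heads and pick, for every
    output position, the input frame with the largest averaged weight.

    attn has shape (H, Lmax, Tmax): one batch element of the
    (B, H, Lmax, Tmax) array. This helper is hypothetical.
    """
    num_heads = len(attn)
    out_len = len(attn[0])
    in_len = len(attn[0][0])
    alignment = []
    for l in range(out_len):
        # Mean weight of each input frame across all heads for output step l.
        avg = [sum(attn[h][l][t] for h in range(num_heads)) / num_heads
               for t in range(in_len)]
        # Index of the most-attended input frame.
        alignment.append(max(range(in_len), key=avg.__getitem__))
    return alignment


# Two heads, two output tokens, three input frames.
attn = [
    [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]],
    [[0.5, 0.4, 0.1], [0.2, 0.2, 0.6]],
]
print(head_averaged_alignment(attn))  # [0, 2]
```

Averaging over heads is only one convention; individual heads often attend to different frames and can also be plotted separately.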
calculate_all_ctc_probs(xs_pad, ilens, ys_pad, ys_pad_src)
E2E CTC probability calculation.
- Parameters:
- xs_pad (torch.Tensor) – batch of padded input sequences (B, Tmax, idim)
- ilens (torch.Tensor) – batch of lengths of input sequences (B)
- ys_pad (torch.Tensor) – batch of padded target-language token ID sequences (B, Lmax)
- ys_pad_src (torch.Tensor) – batch of padded source-language token ID sequences (B, Lmax)
- Returns: CTC probability (B, Tmax, vocab)
- Return type: float ndarray
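Per-frame CTC probabilities of shape (Tmax, vocab) for one utterance can be turned into a token sequence by best-path (greedy) decoding: take the most probable token per frame, collapse consecutive repeats, then drop blanks. The sketch below assumes the blank symbol has ID 0; `ctc_greedy_decode` is a hypothetical helper, not an ESPnet function.

```python
from typing import List


def ctc_greedy_decode(probs: List[List[float]], blank_id: int = 0) -> List[int]:
    """Best-path CTC decoding for one utterance.

    probs has shape (Tmax, vocab). blank_id=0 is an assumption about the
    vocabulary layout.
    """
    # Most probable token at each encoder frame.
    best_path = [max(range(len(row)), key=row.__getitem__) for row in probs]
    tokens = []
    prev = None
    for tok in best_path:
        # Keep a token only if it differs from the previous frame and is not blank.
        if tok != prev and tok != blank_id:
            tokens.append(tok)
        prev = tok
    return tokens


probs = [
    [0.1, 0.8, 0.1],    # 1
    [0.1, 0.8, 0.1],    # 1 (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank separates the repeats
    [0.1, 0.8, 0.1],    # 1 (kept, because a blank came in between)
    [0.2, 0.1, 0.7],    # 2
]
print(ctc_greedy_decode(probs))  # [1, 1, 2]
```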
encode(x)
Encode source acoustic features.
- Parameters: x (ndarray) – source acoustic feature (T, D)
- Returns: encoder outputs
- Return type: torch.Tensor
forward(xs_pad, ilens, ys_pad, ys_pad_src)
E2E forward.
- Parameters:
- xs_pad (torch.Tensor) – batch of padded source sequences (B, Tmax, idim)
- ilens (torch.Tensor) – batch of lengths of source sequences (B)
- ys_pad (torch.Tensor) – batch of padded target-language token ID sequences (B, Lmax)
- ys_pad_src (torch.Tensor) – batch of padded source-language (transcription) token ID sequences (B, Lmax)
- Returns: ctc loss value
- Return type: torch.Tensor
- Returns: attention loss value
- Return type: torch.Tensor
- Returns: accuracy in attention decoder
- Return type: float
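The batch arguments of forward can be pictured with plain nested lists standing in for torch.Tensor. The following is a minimal sketch, assuming features are zero-padded and token sequences are padded with the constructor's default ignore_id of -1; `pad_features` and `pad_tokens` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

Frame = List[float]


def pad_features(feats: Sequence[Sequence[Frame]]) -> Tuple[List[List[Frame]], List[int]]:
    """Zero-pad (T_i, idim) feature matrices to (B, Tmax, idim) and return ilens (B)."""
    ilens = [len(f) for f in feats]
    tmax = max(ilens)
    idim = len(feats[0][0])
    xs_pad = [
        [list(frame) for frame in f] + [[0.0] * idim for _ in range(tmax - len(f))]
        for f in feats
    ]
    return xs_pad, ilens


def pad_tokens(seqs: Sequence[Sequence[int]], ignore_id: int = -1) -> List[List[int]]:
    """Pad token ID sequences to (B, Lmax); positions equal to ignore_id are
    meant to be excluded from the loss."""
    lmax = max(len(s) for s in seqs)
    return [list(s) + [ignore_id] * (lmax - len(s)) for s in seqs]


feats = [[[0.1, 0.2]] * 3, [[0.3, 0.4]] * 2]  # two utterances, idim = 2
xs_pad, ilens = pad_features(feats)
ys_pad = pad_tokens([[5, 6, 7], [8]])          # target-language IDs
ys_pad_src = pad_tokens([[11, 12], [13, 14]])  # source-language IDs
print(ilens)   # [3, 2]
print(ys_pad)  # [[5, 6, 7], [8, -1, -1]]
```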
forward_asr(hs_pad, hs_mask, ys_pad)
Forward pass in the auxiliary ASR task.
- Parameters:
- hs_pad (torch.Tensor) – batch of padded encoder output sequences (B, Tmax, adim)
- hs_mask (torch.Tensor) – batch of encoder output masks (B, 1, Tmax)
- ys_pad (torch.Tensor) – batch of padded source-language (transcription) token ID sequences (B, Lmax)
- Returns: ASR attention loss value
- Return type: torch.Tensor
- Returns: accuracy in ASR attention decoder
- Return type: float
- Returns: ASR CTC loss value
- Return type: torch.Tensor
- Returns: character error rate from CTC prediction
- Return type: float
- Returns: character error rate from attention decoder prediction
- Return type: float
- Returns: word error rate from attention decoder prediction
- Return type: float
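The character and word error rates above are edit-distance ratios. The following is a minimal sketch of the usual definition (total Levenshtein distance divided by total reference length). Removing spaces before computing CER is one common convention and is an assumption here. `edit_distance` and `corpus_error_rate` are hypothetical helpers, not the ESPnet implementation.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance with unit cost for substitution, insertion, deletion."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of a reference unit
                cur[j - 1] + 1,          # insertion of a hypothesis unit
                prev[j - 1] + (r != h),  # substitution (or match at cost 0)
            )
        prev = cur
    return prev[-1]


def corpus_error_rate(refs: Sequence[str], hyps: Sequence[str], unit: str) -> float:
    """unit="char": CER over characters with spaces removed (assumed convention).
    unit="word": WER over whitespace-separated words."""
    errors = 0
    length = 0
    for ref, hyp in zip(refs, hyps):
        if unit == "char":
            r, h = list(ref.replace(" ", "")), list(hyp.replace(" ", ""))
        else:
            r, h = ref.split(), hyp.split()
        errors += edit_distance(r, h)
        length += len(r)
    return errors / length


print(corpus_error_rate(["hello world"], ["hello word"], "word"))  # 0.5
print(corpus_error_rate(["hello world"], ["hello word"], "char"))  # 0.1
```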
forward_mt(xs_pad, ys_in_pad, ys_out_pad, ys_mask)
Forward pass in the auxiliary MT task.
- Parameters:
- xs_pad (torch.Tensor) – batch of padded source-language token ID sequences (B, Tmax)
- ys_in_pad (torch.Tensor) – batch of padded decoder input sequences, prefixed with <sos> (B, Lmax)
- ys_out_pad (torch.Tensor) – batch of padded decoder target sequences, suffixed with <eos> (B, Lmax)
- ys_mask (torch.Tensor) – batch of target-side attention masks (B, Lmax, Lmax)
- Returns: MT loss value
- Return type: torch.Tensor
- Returns: accuracy in MT decoder
- Return type: float
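The decoder inputs and targets differ by a one-token shift, and the target-side mask stops each position from attending to later ones. The following is a minimal sketch of both, assuming a single ID is shared by <sos> and <eos> (for example odim - 1), inputs are padded with <eos>, and targets are padded with ignore_id. `add_sos_eos` and `subsequent_mask` are written here as hypothetical helpers, and a padding mask can additionally be combined with the causal mask.

```python
from typing import List, Sequence, Tuple


def add_sos_eos(
    ys: Sequence[Sequence[int]], sos: int, eos: int, ignore_id: int = -1
) -> Tuple[List[List[int]], List[List[int]]]:
    """Build padded decoder inputs (<sos> + y) and targets (y + <eos>)."""
    ys_in = [[sos] + list(y) for y in ys]
    ys_out = [list(y) + [eos] for y in ys]
    lmax = max(len(y) for y in ys_in)
    # Inputs are padded with eos; targets with ignore_id so the loss skips them.
    ys_in_pad = [y + [eos] * (lmax - len(y)) for y in ys_in]
    ys_out_pad = [y + [ignore_id] * (lmax - len(y)) for y in ys_out]
    return ys_in_pad, ys_out_pad


def subsequent_mask(size: int) -> List[List[bool]]:
    """Lower-triangular (size, size) mask: position i may attend to positions <= i."""
    return [[j <= i for j in range(size)] for i in range(size)]


ys_in_pad, ys_out_pad = add_sos_eos([[3, 4], [5]], sos=9, eos=9)
print(ys_in_pad)   # [[9, 3, 4], [9, 5, 9]]
print(ys_out_pad)  # [[3, 4, 9], [5, 9, -1]]
```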
get_total_subsampling_factor()
Get total subsampling factor.
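The total subsampling factor relates the number of input frames to the number of encoder output frames, which in turn sets Tmax for the CTC probabilities. The following is a minimal sketch, assuming a front end of stacked 2-D convolutions with kernel size 3, stride 2, and no padding. This is a common design, but the input layer actually used depends on args. `subsampled_length` is a hypothetical helper.

```python
def subsampled_length(t: int, num_layers: int = 2) -> int:
    """Encoder output length after num_layers convolutions (kernel 3, stride 2,
    no padding). Each layer maps T -> (T - 3) // 2 + 1, so two layers give a
    total subsampling factor of 2 ** 2 = 4."""
    for _ in range(num_layers):
        t = (t - 3) // 2 + 1
    return t


factor = 2 ** 2
print(subsampled_length(100), 100 // factor)  # 24 25
```

The exact length is slightly smaller than T // factor because the unpadded convolutions drop edge frames.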
reset_parameters(args)
Initialize parameters.
scorers()
Return the scorers used for beam search.
translate(x, trans_args, char_list=None)
Translate input speech.
- Parameters:
- x (ndarray) – input acoustic feature (B, T, D) or (T, D)
- trans_args (Namespace) – argument Namespace containing options
- char_list (list) – list of characters
- Returns: N-best decoding results
- Return type: list
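To use the N-best results, the token IDs are mapped back to text through char_list. The following is a minimal sketch, assuming each hypothesis is a dict with a token-ID list under "yseq" and a log-probability under "score", that the sequence is wrapped in a shared <sos>/<eos> ID, and that word boundaries use a "<space>" token. These are assumptions, and `nbest_to_text` is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def nbest_to_text(nbest: Sequence[Dict], char_list: Sequence[str], sos_eos_id: int) -> List[str]:
    """Convert hypotheses to strings, best (highest score) first."""
    texts = []
    for hyp in sorted(nbest, key=lambda h: h["score"], reverse=True):
        # Drop the <sos>/<eos> markers before looking up tokens.
        ids = [i for i in hyp["yseq"] if i != sos_eos_id]
        text = "".join(char_list[i] for i in ids).replace("<space>", " ")
        texts.append(text)
    return texts


char_list = ["<blank>", "<unk>", "<space>", "h", "i", "<eos>"]
nbest = [
    {"yseq": [5, 3, 4, 5], "score": -1.2},
    {"yseq": [5, 3, 2, 4, 5], "score": -0.5},
]
print(nbest_to_text(nbest, char_list, sos_eos_id=5))  # ['h i', 'hi']
```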