espnet.nets.pytorch_backend.e2e_asr.E2E
class espnet.nets.pytorch_backend.e2e_asr.E2E(idim, odim, args)
Bases: ASRInterface, Module
E2E module.
- Parameters:
- idim (int) – dimension of inputs
- odim (int) – dimension of outputs
- args (Namespace) – argument Namespace containing options
Construct an E2E object.
- Parameters:
- idim (int) – dimension of inputs
- odim (int) – dimension of outputs
- args (Namespace) – argument Namespace containing options
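Since `args` is an argparse `Namespace`, construction typically goes through a parser that `E2E.add_arguments` has populated. A minimal sketch of that flow, using a stand-in parser with two illustrative options (the real ESPnet parser registers many more):

```python
import argparse

# Stand-in for the parser that E2E.add_arguments would populate; the two
# options below are illustrative, not the full ESPnet option set.
parser = argparse.ArgumentParser()
parser.add_argument("--etype", default="blstmp", type=str,
                    help="encoder architecture type")
parser.add_argument("--mtlalpha", default=0.5, type=float,
                    help="CTC/attention multitask weight")
args = parser.parse_args([])  # take the defaults

# With ESPnet installed, the Namespace is handed to the constructor:
# model = E2E(idim=83, odim=52, args=args)
print(args.etype, args.mtlalpha)
```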
static add_arguments(parser)
Add arguments.
static attention_add_arguments(parser)
Add arguments for the attention.
calculate_all_attentions(xs_pad, ilens, ys_pad)
E2E attention calculation.
- Parameters:
- xs_pad (torch.Tensor) – batch of padded input sequences (B, Tmax, idim)
- ilens (torch.Tensor) – batch of lengths of input sequences (B)
- ys_pad (torch.Tensor) – batch of padded token id sequence tensor (B, Lmax)
- Returns: attention weights with the following shape,
- multi-head case => attention weights (B, H, Lmax, Tmax),
- other case => attention weights (B, Lmax, Tmax).
- Return type: float ndarray
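The two return shapes can be illustrated with a small numpy sketch (not ESPnet code): attention weights are a softmax over the encoder-time axis `Tmax`, computed per decoder step `Lmax`, with an extra head axis `H` in the multi-head case.

```python
import numpy as np

# Illustrative shapes only; values are random scores, not real attention.
B, H, Lmax, Tmax = 2, 4, 5, 7
scores = np.random.randn(B, H, Lmax, Tmax)

# Softmax over the encoder-time axis, so each row sums to 1.
w = np.exp(scores - scores.max(axis=-1, keepdims=True))
att_multi = w / w.sum(axis=-1, keepdims=True)  # (B, H, Lmax, Tmax)
att_single = att_multi[:, 0]                   # (B, Lmax, Tmax)
print(att_multi.shape, att_single.shape)
```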
calculate_all_ctc_probs(xs_pad, ilens, ys_pad)
E2E CTC probability calculation.
- Parameters:
- xs_pad (torch.Tensor) – batch of padded input sequences (B, Tmax, idim)
- ilens (torch.Tensor) – batch of lengths of input sequences (B)
- ys_pad (torch.Tensor) – batch of padded token id sequence tensor (B, Lmax)
- Returns: CTC probability (B, Tmax, vocab)
- Return type: float ndarray
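To show what the (B, Tmax, vocab) probabilities are typically used for, here is an illustrative greedy CTC decoding sketch (not ESPnet code): collapse repeated frame-level predictions and drop the blank symbol, which is assumed to be index 0 here.

```python
import numpy as np

def ctc_greedy(ctc_probs, blank=0):
    """Collapse repeats and remove blanks from per-frame probabilities."""
    best = ctc_probs.argmax(axis=-1)  # best token per frame, (Tmax,)
    out, prev = [], None
    for tok in best:
        if tok != prev and tok != blank:
            out.append(int(tok))
        prev = tok
    return out

# Frame-level probabilities over a 3-token vocab (blank, "a", "b").
probs = np.array([[0.1, 0.8, 0.1],    # "a"
                  [0.1, 0.8, 0.1],    # "a" (repeat, collapsed)
                  [0.9, 0.05, 0.05],  # blank (dropped)
                  [0.1, 0.1, 0.8]])   # "b"
print(ctc_greedy(probs))  # [1, 2]
```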
static decoder_add_arguments(parser)
Add arguments for the decoder.
encode(x)
Encode acoustic features.
- Parameters: x (ndarray) – input acoustic feature (T, D)
- Returns: encoder outputs
- Return type: torch.Tensor
static encoder_add_arguments(parser)
Add arguments for the encoder.
enhance(xs)
Forward only in the frontend stage.
- Parameters: xs (ndarray) – input acoustic feature (T, C, F)
- Returns: enhanced feature
- Return type: torch.Tensor
forward(xs_pad, ilens, ys_pad)
E2E forward.
- Parameters:
- xs_pad (torch.Tensor) – batch of padded input sequences (B, Tmax, idim)
- ilens (torch.Tensor) – batch of lengths of input sequences (B)
- ys_pad (torch.Tensor) – batch of padded token id sequence tensor (B, Lmax)
- Returns: loss value
- Return type: torch.Tensor
get_total_subsampling_factor()
Get total subsampling factor.
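Assuming the usual ESPnet convention of per-layer subsampling factors (the "1_2_2_1_1"-style config), the total factor is their product; a minimal sketch:

```python
import math

# Hypothetical per-layer subsampling factors, e.g. from a "1_2_2_1_1"
# encoder configuration string.
subsample = [1, 2, 2, 1, 1]
total = math.prod(subsample)
print(total)  # 4
```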
init_like_chainer()
Initialize weight like chainer.
Chainer basically uses the LeCun scheme: W ~ Normal(0, fan_in ** -0.5), b = 0. PyTorch basically uses W, b ~ Uniform(-fan_in ** -0.5, fan_in ** -0.5). However, there are two exceptions as far as I know:
- EmbedID.W ~ Normal(0, 1)
- LSTM.upward.b[forget_gate_range] = 1 (but not used in NStepLSTM)
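The scheme above can be sketched in numpy (an illustration of the stated rules, not the ESPnet implementation): LeCun-normal weights, zero biases, and a forget-gate bias of 1, assuming the common (input, forget, cell, output) gate ordering so the forget slice is the second quarter of the bias vector.

```python
import numpy as np

rng = np.random.default_rng(0)

def lecun_normal(fan_out, fan_in):
    # Chainer-style LeCun init: W ~ Normal(0, fan_in ** -0.5)
    return rng.normal(0.0, fan_in ** -0.5, size=(fan_out, fan_in))

W = lecun_normal(4, 100)  # weight matrix, std = 100 ** -0.5 = 0.1
b = np.zeros(4)           # biases start at zero

# Exception: LSTM forget-gate bias set to 1. Assuming gate order
# (input, forget, cell, output), the forget slice is the second quarter.
hidden = 8
lstm_b = np.zeros(4 * hidden)
lstm_b[hidden:2 * hidden] = 1.0
print(W.shape, lstm_b[hidden:2 * hidden])
```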
recognize(x, recog_args, char_list, rnnlm=None)
E2E beam search.
- Parameters:
- x (ndarray) – input acoustic feature (T, D)
- recog_args (Namespace) – argument Namespace containing options
- char_list (list) – list of characters
- rnnlm (torch.nn.Module) – language model module
- Returns: N-best decoding results
- Return type: list
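The N-best structure of the result can be illustrated with a toy beam search (a sketch, not the ESPnet decoder): expand each kept prefix with every token's log-probability at each step, then keep the highest-scoring `beam` prefixes.

```python
import math

def beam_search(step_logprobs, beam=2):
    """Toy beam search over a fixed table of per-step token log-probs."""
    hyps = [([], 0.0)]  # list of (token sequence, accumulated log score)
    for logprobs in step_logprobs:  # one dict per decoding step
        cand = [(seq + [tok], score + lp)
                for seq, score in hyps
                for tok, lp in logprobs.items()]
        hyps = sorted(cand, key=lambda h: h[1], reverse=True)[:beam]
    return hyps  # N-best list, best first

steps = [{"a": math.log(0.6), "b": math.log(0.4)},
         {"a": math.log(0.3), "b": math.log(0.7)}]
nbest = beam_search(steps, beam=2)
print(nbest)  # best prefix is ["a", "b"] with score log(0.42)
```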
recognize_batch(xs, recog_args, char_list, rnnlm=None)
E2E batch beam search.
- Parameters:
- xs (list) – list of input acoustic feature arrays [(T_1, D), (T_2, D), …]
- recog_args (Namespace) – argument Namespace containing options
- char_list (list) – list of characters
- rnnlm (torch.nn.Module) – language model module
- Returns: N-best decoding results
- Return type: list
scorers()
Get scorers used in beam search.
subsample_frames(x)
Subsample speech frames in the encoder.
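Frame subsampling in its simplest form keeps every k-th frame, so a (T, D) feature shrinks to roughly (T / k, D); a minimal numpy sketch under that assumption:

```python
import numpy as np

# A toy (T=10, D=2) feature matrix.
x = np.arange(20).reshape(10, 2)

# Keep every 4th frame (indices 0, 4, 8) -> shape (3, 2).
x_sub = x[::4]
print(x_sub.shape)  # (3, 2)
```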