espnet.nets.pytorch_backend.rnn.decoders.Decoder
class espnet.nets.pytorch_backend.rnn.decoders.Decoder(eprojs, odim, dtype, dlayers, dunits, sos, eos, att, verbose=0, char_list=None, labeldist=None, lsm_weight=0.0, sampling_probability=0.0, dropout=0.0, context_residual=False, replace_sos=False, num_encs=1)
Bases: Module, ScorerInterface
Decoder module.
- Parameters:
- eprojs (int) – encoder projection units
- odim (int) – dimension of outputs
- dtype (str) – gru or lstm
- dlayers (int) – decoder layers
- dunits (int) – decoder units
- sos (int) – start of sequence symbol id
- eos (int) – end of sequence symbol id
- att (torch.nn.Module) – attention module
- verbose (int) – verbose level
- char_list (list) – list of character strings
- labeldist (ndarray) – label distribution for label smoothing
- lsm_weight (float) – label smoothing weight
- sampling_probability (float) – scheduled sampling probability
- dropout (float) – dropout rate
- context_residual (bool) – if True, use the context vector for token generation
- replace_sos (bool) – if True, replace <sos> with the target language id (used for multilingual speech/text translation)
- num_encs (int) – number of encoder streams
Initialize decoder.
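A minimal construction sketch, not taken from the ESPnet docs: the decoder indexes into its attention modules, so a single location-aware attention is wrapped in a ModuleList here (in ESPnet itself, att_for()/decoder_for() build these objects). The toy dimensions and the sos = eos = odim - 1 convention are assumptions for illustration.

```python
import torch

from espnet.nets.pytorch_backend.rnn.attentions import AttLoc
from espnet.nets.pytorch_backend.rnn.decoders import Decoder

eprojs, odim, dunits = 320, 52, 300  # assumed toy dimensions

# The decoder indexes its attention modules, so wrap a single
# location-aware attention module in a ModuleList.
att = torch.nn.ModuleList(
    [AttLoc(eprojs, dunits, att_dim=320, aconv_chans=10, aconv_filts=100)]
)
dec = Decoder(
    eprojs=eprojs,
    odim=odim,
    dtype="lstm",
    dlayers=1,
    dunits=dunits,
    sos=odim - 1,  # ESPnet conventionally shares the <sos>/<eos> id
    eos=odim - 1,
    att=att,
)
```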
calculate_all_attentions(hs_pad, hlen, ys_pad, strm_idx=0, lang_ids=None)
Calculate all attention weights.
- Parameters:
- hs_pad (torch.Tensor) – batch of padded hidden state sequences (B, Tmax, D) [in the multi-encoder case, a list of torch.Tensor: [(B, Tmax_1, D), (B, Tmax_2, D), …]]
- hlen (torch.Tensor) – batch of lengths of hidden state sequences (B) [in the multi-encoder case, a list of torch.Tensor: [(B), (B), …]]
- ys_pad (torch.Tensor) – batch of padded character id sequence tensor (B, Lmax)
- strm_idx (int) – stream index for parallel speaker attention in multi-speaker case
- lang_ids (torch.Tensor) – batch of target language id tensor (B, 1)
- Returns: attention weights, with shape depending on the case:
  - multi-head case => attention weights (B, H, Lmax, Tmax),
  - multi-encoder case => [(B, Lmax, Tmax_1), (B, Lmax, Tmax_2), …, (B, Lmax, Tmax_NumEncs)],
  - other case => attention weights (B, Lmax, Tmax).
- Return type: float ndarray
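A hedged sketch of inspecting attention weights, continuing from the construction sketch above (dec, eprojs, odim); the batch shapes and the -1 padding id (ESPnet's ignore_id) are illustrative assumptions.

```python
import numpy as np
import torch

B, Tmax, Lmax = 2, 30, 7
hs_pad = torch.randn(B, Tmax, eprojs)   # padded encoder outputs
hlens = torch.tensor([30, 24])          # true lengths per utterance
ys_pad = torch.randint(0, odim - 1, (B, Lmax))
ys_pad[1, 5:] = -1                      # pad the shorter label sequence with ignore_id

att_ws = dec.calculate_all_attentions(hs_pad, hlens, ys_pad)
print(np.shape(att_ws))                 # roughly (B, Lmax, Tmax) for single-head attention
```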
forward(hs_pad, hlens, ys_pad, strm_idx=0, lang_ids=None)
Forward Decoder.
- Parameters:
- hs_pad (torch.Tensor) – batch of padded hidden state sequences (B, Tmax, D) [in the multi-encoder case, a list of torch.Tensor: [(B, Tmax_1, D), (B, Tmax_2, D), …]]
- hlens (torch.Tensor) – batch of lengths of hidden state sequences (B) [in the multi-encoder case, a list of torch.Tensor: [(B), (B), …]]
- ys_pad (torch.Tensor) – batch of padded character id sequence tensor (B, Lmax)
- strm_idx (int) – stream index for parallel speaker attention in the multi-speaker case
- lang_ids (torch.Tensor) – batch of target language id tensor (B, 1)
- Returns: attention loss value and accuracy
- Return type: tuple of torch.Tensor (loss) and float (accuracy)
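A minimal training-style call, reusing dec and the dummy batch from the sketches above; note that the forward pass returns the loss tensor and the token accuracy together.

```python
# forward returns (attention loss, accuracy); the loss tensor drives backprop.
loss, acc = dec(hs_pad, hlens, ys_pad)
loss.backward()
print(float(loss), acc)
```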
init_state(x)
Initialize states.
recognize_beam(h, lpz, recog_args, char_list, rnnlm=None, strm_idx=0)
Beam search implementation.
- Parameters:
- h (torch.Tensor) – encoder hidden state (T, eprojs) [in the multi-encoder case, a list of torch.Tensor: [(T_1, eprojs), (T_2, eprojs), …]]
- lpz (torch.Tensor) – CTC log softmax output (T, odim) [in the multi-encoder case, a list of torch.Tensor: [(T_1, odim), (T_2, odim), …]]
- recog_args (Namespace) – argument Namespace containing options
- char_list (list) – list of character strings
- rnnlm (torch.nn.Module) – language module
- strm_idx (int) – stream index for speaker parallel attention in multi-speaker case
- Returns: N-best decoding results
- Return type: list of dicts
recognize_beam_batch(h, hlens, lpz, recog_args, char_list, rnnlm=None, normalize_score=True, strm_idx=0, lang_ids=None)
Batch beam search implementation.
rnn_forward(ey, z_list, c_list, z_prev, c_prev)
Run one step of the decoder RNN.
score(yseq, state, x)
Score the hypothesis and return log probabilities for the next token (ScorerInterface).
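A sketch of the ScorerInterface usage pattern implied by init_state() and score(): build an initial state from one encoder output, then ask for next-token log probabilities for a growing hypothesis. The shapes are assumptions based on the interface contract, and dec/eprojs come from the construction sketch above.

```python
import torch

x = torch.randn(30, eprojs)      # encoder output for one utterance
state = dec.init_state(x)
yseq = torch.tensor([dec.sos])   # hypothesis prefix, starting from <sos>
logp, state = dec.score(yseq, state, x)
print(logp.shape)                # (odim,) log-probabilities for the next token
```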
zero_state(hs_pad)
Set zero states.