espnet.nets.pytorch_backend.rnn.attentions.GDCAttLoc
class espnet.nets.pytorch_backend.rnn.attentions.GDCAttLoc(eprojs, dunits, att_dim, aconv_chans, aconv_filts, han_mode=False)
Bases: Module
Global duration control attention module.
Reference: Singing-Tacotron: Global Duration Control Attention and Dynamic Filter for End-to-end Singing Voice Synthesis (https://arxiv.org/abs/2202.07907)
- Parameters:
- eprojs (int) – # projection-units of encoder
- dunits (int) – # units of decoder
- att_dim (int) – attention dimension
- aconv_chans (int) – # channels of attention convolution
- aconv_filts (int) – filter size of attention convolution
- han_mode (bool) – flag to switch on hierarchical attention mode and not store pre_compute_enc_h
Initializes internal Module state, shared by both nn.Module and ScriptModule.
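A minimal construction sketch follows; the dimension values are illustrative assumptions, not values taken from the ESPnet documentation:

    from espnet.nets.pytorch_backend.rnn.attentions import GDCAttLoc

    # Illustrative sizes only; in practice these come from the encoder/decoder config.
    eprojs, dunits = 320, 300

    att = GDCAttLoc(
        eprojs=eprojs,      # encoder projection units (D_enc)
        dunits=dunits,      # decoder units (D_dec)
        att_dim=320,        # attention dimension
        aconv_chans=10,     # channels of the location-aware convolution
        aconv_filts=100,    # filter size of the location-aware convolution
    )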
forward(enc_hs_pad, enc_hs_len, trans_token, dec_z, att_prev, scaling=1.0, last_attended_idx=None, backward_window=1, forward_window=3)
Calculate GDCAttLoc forward propagation.
- Parameters:
- enc_hs_pad (torch.Tensor) – padded encoder hidden state (B x T_max x D_enc)
- enc_hs_len (list) – padded encoder hidden state length (B)
- trans_token (torch.Tensor) – global transition token for duration (B x T_max x 1)
- dec_z (torch.Tensor) – decoder hidden state (B x D_dec)
- att_prev (torch.Tensor) – previous attention weight (B x T_max)
- scaling (float) – scaling parameter before applying softmax
- last_attended_idx (int) – index of the last attended input (used to constrain attention)
- backward_window (int) – backward window size in attention constraint
- forward_window (int) – forward window size in attention constraint
- Returns: attention weighted encoder state (B, D_enc)
- Return type: torch.Tensor
- Returns: attention weights (B x T_max), to be passed as att_prev at the next step
- Return type: torch.Tensor
reset()
Reset states.
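A minimal usage sketch for a single decoding step, assuming that (as with the other location-aware attention modules in this file) passing None as att_prev lets the module initialize the weights itself and that forward returns a (context, weights) tuple; all tensor values and sizes below are dummy illustrations:

    import torch

    from espnet.nets.pytorch_backend.rnn.attentions import GDCAttLoc

    B, T_max, eprojs, dunits = 2, 50, 320, 300  # illustrative sizes only

    att = GDCAttLoc(eprojs, dunits, att_dim=320, aconv_chans=10, aconv_filts=100)
    att.reset()  # clear any cached encoder states before attending over a new batch

    enc_hs_pad = torch.randn(B, T_max, eprojs)             # padded encoder states (B x T_max x D_enc)
    enc_hs_len = [50, 40]                                   # true encoder lengths per utterance
    trans_token = torch.sigmoid(torch.randn(B, T_max, 1))   # global transition token (B x T_max x 1)
    dec_z = torch.zeros(B, dunits)                          # decoder hidden state (B x D_dec)

    # First step: att_prev=None, so the module initializes the attention weights.
    c, w = att(enc_hs_pad, enc_hs_len, trans_token, dec_z, None)
    # c: (B, D_enc) attention-weighted encoder state, w: (B, T_max) attention weights

    # Later steps feed the returned weights back in as att_prev.
    c, w = att(enc_hs_pad, enc_hs_len, trans_token, dec_z, w)

During synthesis, last_attended_idx, backward_window, and forward_window can additionally be passed to restrict attention to a window around the previously attended input, as documented above.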