espnet.nets.pytorch_backend.rnn.attentions.GDCAttLoc
class espnet.nets.pytorch_backend.rnn.attentions.GDCAttLoc(eprojs, dunits, att_dim, aconv_chans, aconv_filts, han_mode=False)
Bases: Module
Global duration control attention module.
Reference: Singing-Tacotron: Global Duration Control Attention and Dynamic Filter for End-to-end Singing Voice Synthesis (https://arxiv.org/abs/2202.07907)
- Parameters:
- eprojs (int) – number of projection units of the encoder
- dunits (int) – number of units of the decoder
- att_dim (int) – attention dimension
- aconv_chans (int) – number of channels of the attention convolution
- aconv_filts (int) – filter size of the attention convolution
- han_mode (bool) – flag to switch on hierarchical attention mode and not store pre_compute_enc_h
Initialize GDCAttLoc.
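A minimal construction sketch; the dimension values below are illustrative assumptions, not library defaults:

```python
from espnet.nets.pytorch_backend.rnn.attentions import GDCAttLoc

# Illustrative sizes; choose values matching your encoder/decoder.
att = GDCAttLoc(
    eprojs=256,       # encoder projection units
    dunits=320,       # decoder units
    att_dim=128,      # attention dimension
    aconv_chans=10,   # attention convolution channels
    aconv_filts=100,  # attention convolution filter size
)
```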
forward(enc_hs_pad, enc_hs_len, trans_token, dec_z, att_prev, scaling=1.0, last_attended_idx=None, backward_window=1, forward_window=3)
Calculate GDCAttLoc forward propagation.
- Parameters:
- enc_hs_pad (torch.Tensor) – padded encoder hidden state (B x T_max x D_enc)
- enc_hs_len (list) – padded encoder hidden state length (B)
- trans_token (torch.Tensor) – Global transition token for duration (B x T_max x 1)
- dec_z (torch.Tensor) – decoder hidden state (B x D_dec)
- att_prev (torch.Tensor) – previous attention weight (B x T_max)
- scaling (float) – scaling parameter before applying softmax
- last_attended_idx (int) – index of the input last attended to
- backward_window (int) – backward window size in the attention constraint
- forward_window (int) – forward window size in the attention constraint
- Returns: attention weighted encoder state (B, D_enc)
- Return type: torch.Tensor
- Returns: attention weights (B x T_max), to be passed as att_prev at the next step
- Return type: torch.Tensor
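A usage sketch of one decoder step with random tensors of the documented shapes, reusing the `att` module constructed above. It assumes that, as with ESPnet's other location-aware attentions, passing att_prev=None initializes uniform weights on the first step:

```python
import torch

B, T_max, D_enc, D_dec = 2, 50, 256, 320  # batch, frames, encoder dim, decoder dim

enc_hs_pad = torch.randn(B, T_max, D_enc)              # padded encoder states
enc_hs_len = [T_max] * B                               # per-utterance lengths
trans_token = torch.sigmoid(torch.randn(B, T_max, 1))  # global transition token
dec_z = torch.randn(B, D_dec)                          # decoder hidden state

att.reset()  # clear cached state before processing a new batch
c, w = att(enc_hs_pad, enc_hs_len, trans_token, dec_z, att_prev=None)

print(c.shape)  # torch.Size([2, 256]): attention weighted encoder state
print(w.shape)  # torch.Size([2, 50]):  attention weights, feed back as att_prev
```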
reset()
Reset states.