espnet.nets.pytorch_backend.rnn.attentions.GDCAttLoc
class espnet.nets.pytorch_backend.rnn.attentions.GDCAttLoc(eprojs, dunits, att_dim, aconv_chans, aconv_filts, han_mode=False)
Bases: Module
Global duration control attention module.
Reference: Singing-Tacotron: Global Duration Control Attention and Dynamic Filter for End-to-end Singing Voice Synthesis (https://arxiv.org/abs/2202.07907)
- Parameters:
- eprojs (int) – number of projection units of the encoder
- dunits (int) – number of units of the decoder
- att_dim (int) – attention dimension
- aconv_chans (int) – number of channels of the attention convolution
- aconv_filts (int) – filter size of the attention convolution
- han_mode (bool) – flag to switch on hierarchical attention mode and not store pre_compute_enc_h
Initialize GDCAttLoc.
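A minimal construction sketch; the dimension values below are illustrative assumptions, not library defaults:

```python
from espnet.nets.pytorch_backend.rnn.attentions import GDCAttLoc

# Illustrative sizes; choose values matching your encoder/decoder.
att = GDCAttLoc(
    eprojs=256,       # encoder projection units
    dunits=320,       # decoder units
    att_dim=128,      # attention dimension
    aconv_chans=10,   # attention convolution channels
    aconv_filts=100,  # attention convolution filter size
)
```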
forward(enc_hs_pad, enc_hs_len, trans_token, dec_z, att_prev, scaling=1.0, last_attended_idx=None, backward_window=1, forward_window=3)
Calculate GDCAttLoc forward propagation.
- Parameters:
- enc_hs_pad (torch.Tensor) – padded encoder hidden state (B x T_max x D_enc)
- enc_hs_len (list) – padded encoder hidden state length (B)
- trans_token (torch.Tensor) – Global transition token for duration (B x T_max x 1)
- dec_z (torch.Tensor) – decoder hidden state (B x D_dec)
- att_prev (torch.Tensor) – previous attention weight (B x T_max)
- scaling (float) – scaling parameter before applying softmax
- last_attended_idx (int) – index of the input last attended to
- backward_window (int) – backward window size in the attention constraint
- forward_window (int) – forward window size in the attention constraint
- Returns: attention weighted encoder state (B, D_enc)
- Return type: torch.Tensor
- Returns: attention weights (B x T_max), to be passed as att_prev at the next step
- Return type: torch.Tensor
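A usage sketch of one decoder step with random tensors of the documented shapes, reusing the `att` module constructed above. It assumes that, as with ESPnet's other location-aware attentions, passing att_prev=None initializes uniform weights on the first step:

```python
import torch

B, T_max, D_enc, D_dec = 2, 50, 256, 320  # batch, frames, encoder dim, decoder dim

enc_hs_pad = torch.randn(B, T_max, D_enc)              # padded encoder states
enc_hs_len = [T_max] * B                               # per-utterance lengths
trans_token = torch.sigmoid(torch.randn(B, T_max, 1))  # global transition token
dec_z = torch.randn(B, D_dec)                          # decoder hidden state

att.reset()  # clear cached state before processing a new batch
c, w = att(enc_hs_pad, enc_hs_len, trans_token, dec_z, att_prev=None)

print(c.shape)  # torch.Size([2, 256]): attention weighted encoder state
print(w.shape)  # torch.Size([2, 50]):  attention weights, feed back as att_prev
```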
reset()
Reset states.