espnet2.tts2.fastspeech2.loss.FastSpeech2LossDiscrete

class espnet2.tts2.fastspeech2.loss.FastSpeech2LossDiscrete(use_masking: bool = True, use_weighted_masking: bool = False, ignore_id: int = -1)

Bases: torch.nn.Module

Loss function module for FastSpeech2 with discrete target features (cross-entropy loss on the output tokens).

Initialize feed-forward Transformer loss module.

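Parameters:
- use_masking (bool) – Whether to apply masking for padded parts in the loss calculation.
- use_weighted_masking (bool) – Whether to apply weighted masking in the loss calculation.
- ignore_id (int) – Target token ID to ignore in the cross-entropy loss (default: -1).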
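The difference between the two masking modes can be illustrated with a dependency-free sketch. It assumes the usual ESPnet convention for weighted masking, in which each valid frame of utterance i is weighted by 1 / (length_i * B) so that every utterance counts equally. The function names are hypothetical and this is not the module's actual code.

```python
from typing import List


def masked_mean(frame_losses: List[List[float]], lengths: List[int]) -> float:
    """Plain masking: average over every non-padded frame in the batch."""
    total, count = 0.0, 0
    for losses, n in zip(frame_losses, lengths):
        total += sum(losses[:n])  # frames at index >= n are padding
        count += n
    return total / count


def weighted_masked_mean(frame_losses: List[List[float]], lengths: List[int]) -> float:
    """Weighted masking (assumed scheme): every utterance contributes equally."""
    batch_size = len(lengths)
    total = 0.0
    for losses, n in zip(frame_losses, lengths):
        # Each valid frame of this utterance gets weight 1 / (n * B).
        total += sum(loss / (n * batch_size) for loss in losses[:n])
    return total


# Two utterances padded to T_feats = 4; the second has only 2 valid frames.
frame_losses = [[1.0, 1.0, 1.0, 1.0], [3.0, 3.0, 9.0, 9.0]]
lengths = [4, 2]
print(masked_mean(frame_losses, lengths))           # (4 * 1 + 2 * 3) / 6 ≈ 1.667
print(weighted_masked_mean(frame_losses, lengths))  # (1 + 3) / 2 = 2.0
```

With plain masking, longer utterances contribute more frames and therefore dominate the average; weighted masking rebalances the batch so that short utterances are not under-represented.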

forward(after_outs: Tensor, before_outs: Tensor, d_outs: Tensor, p_outs: Tensor, e_outs: Tensor, ys: Tensor, ds: Tensor, ps: Tensor, es: Tensor, ilens: Tensor, olens: Tensor) → Tuple[Tensor, Tensor, Tensor, Tensor]

Calculate forward propagation.

Parameters:
- after_outs (Tensor) – Batch of outputs after postnets (B, T_feats, odim).
- before_outs (Tensor) – Batch of outputs before postnets (B, T_feats, odim).
- d_outs (LongTensor) – Batch of outputs of duration predictor (B, T_text).
- p_outs (Tensor) – Batch of outputs of pitch predictor (B, T_text, 1).
- e_outs (Tensor) – Batch of outputs of energy predictor (B, T_text, 1).
- ys (Tensor) – Batch of target features in discrete space (B, T_feats).
- ds (LongTensor) – Batch of durations (B, T_text).
- ps (Tensor) – Batch of target token-averaged pitch (B, T_text, 1).
- es (Tensor) – Batch of target token-averaged energy (B, T_text, 1).
- ilens (LongTensor) – Batch of the lengths of each input (B,).
- olens (LongTensor) – Batch of the lengths of each target (B,).
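For the discrete case, `after_outs` and `before_outs` can be read as per-frame scores over `odim` token classes, and `ys` as the target token IDs. The plain-Python sketch below shows a cross-entropy masked by `olens`. It assumes that `odim` is the token vocabulary size and that targets equal to `ignore_id` are skipped, in the same way `torch.nn.CrossEntropyLoss(ignore_index=...)` skips them. The function name is hypothetical, and how the module combines the `before_outs` and `after_outs` terms is not shown.

```python
import math
from typing import List


def masked_cross_entropy(
    logits: List[List[List[float]]],  # B x T_feats x odim unnormalized scores
    targets: List[List[int]],         # B x T_feats token IDs
    olens: List[int],                 # valid frame count per utterance
    ignore_id: int = -1,
) -> float:
    """Mean token cross-entropy over non-padded, non-ignored frames."""
    total, count = 0.0, 0
    for utt_logits, utt_targets, n in zip(logits, targets, olens):
        for scores, token in zip(utt_logits[:n], utt_targets[:n]):
            if token == ignore_id:
                continue  # skipped, like an ignore index
            m = max(scores)  # subtract the max for a numerically stable log-sum-exp
            log_z = m + math.log(sum(math.exp(s - m) for s in scores))
            total += log_z - scores[token]  # -log softmax(scores)[token]
            count += 1
    return total / count


# B = 2, T_feats = 2, odim = 3 token classes.
logits = [
    [[0.0, 0.0, 0.0], [1.0, 2.0, 3.0]],
    [[0.0, 0.0, 0.0], [9.0, 0.0, 0.0]],
]
targets = [[1, -1], [2, 0]]  # second frame of utterance 0 is ignored
olens = [2, 1]               # second frame of utterance 1 is padding
print(masked_cross_entropy(logits, targets, olens))  # log(3): two uniform frames remain
```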
Returns:
- Tensor: Cross-entropy loss value.
- Tensor: Duration predictor loss value.
- Tensor: Pitch predictor loss value.
- Tensor: Energy predictor loss value.
Return type: Tuple[Tensor, Tensor, Tensor, Tensor]
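The three predictor losses work at the token level. Since their inputs have shape (B, T_text), they are presumably masked with `ilens` rather than `olens`. The sketch below assumes, following ESPnet's FastSpeech2 duration loss as we understand it, that the duration predictor outputs log-domain values compared against log(ds + 1) with mean squared error, and that pitch uses plain mean squared error with the trailing singleton dimension dropped. The `offset` parameter and all function names are hypothetical.

```python
import math
from typing import List


def masked_mse(preds: List[List[float]], targets: List[List[float]], lens: List[int]) -> float:
    """MSE over the first lens[i] tokens of each utterance (one scalar per token)."""
    total, count = 0.0, 0
    for p, t, n in zip(preds, targets, lens):
        total += sum((a - b) ** 2 for a, b in zip(p[:n], t[:n]))
        count += n
    return total / count


def duration_loss(d_outs: List[List[float]], ds: List[List[int]], ilens: List[int],
                  offset: float = 1.0) -> float:
    """Assumed log-domain duration loss: d_outs is compared with log(ds + offset)."""
    log_ds = [[math.log(d + offset) for d in utt] for utt in ds]
    return masked_mse(d_outs, log_ds, ilens)


# B = 2, T_text = 3; the second utterance has 2 valid tokens (ilens).
ilens = [3, 2]
ds = [[1, 2, 3], [3, 1, 0]]
d_outs = [[math.log(2), math.log(3), math.log(4)], [math.log(4), math.log(2), 7.0]]
# Pitch tensors of shape (B, T_text, 1) are flattened to (B, T_text) here.
ps = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
p_outs = [[0.5, 0.0, 1.0], [1.0, 2.0, 99.0]]

dur = duration_loss(d_outs, ds, ilens)  # 0.0: predictions match log(ds + 1) exactly
pitch = masked_mse(p_outs, ps, ilens)   # (0.25 + 0 + 1 + 1 + 0) / 5 = 0.45
print(dur, pitch)
```

The energy loss would follow the same pattern as pitch, using `e_outs`, `es`, and `ilens`. A training loop would typically sum the four returned values into a single objective; any per-term weighting is up to the caller.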