espnet2.svs.xiaoice.loss.XiaoiceSing2Loss

class espnet2.svs.xiaoice.loss.XiaoiceSing2Loss(use_masking: bool = True, use_weighted_masking: bool = False)

Bases: Module

Loss function module for XiaoiceSing2 (FastSpeech2-style loss).

Initialize feed-forward Transformer loss module.

Parameters:
- use_masking (bool) – Whether to mask out the padded part in the loss calculation.
- use_weighted_masking (bool) – Whether to apply weighted masking in the loss calculation.
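
The difference between the two masking options can be illustrated with a small sketch in plain PyTorch. It does not call XiaoiceSing2Loss itself; `make_non_pad_mask_sketch` is a hypothetical helper, and the weighting scheme (normalize per utterance, then by batch size and feature dimension) is an assumption modeled on FastSpeech2-style losses rather than something this page states.

```python
import torch
import torch.nn.functional as F


def make_non_pad_mask_sketch(lengths: torch.Tensor, max_len: int) -> torch.Tensor:
    """Hypothetical helper: (B, max_len) bool mask, True on non-padded steps."""
    steps = torch.arange(max_len).unsqueeze(0)  # (1, max_len)
    return steps < lengths.unsqueeze(1)         # (B, max_len)


# Toy batch: 2 utterances padded to 4 frames, 1 feature per frame.
# The 9.0 entries sit in the padded region and should not affect the loss.
pred = torch.tensor([[[1.0], [2.0], [3.0], [9.0]],
                     [[1.0], [9.0], [9.0], [9.0]]])
target = torch.zeros_like(pred)
olens = torch.tensor([3, 1])

mask = make_non_pad_mask_sketch(olens, pred.size(1)).unsqueeze(-1)  # (B, T, 1)

# No masking: padded frames are averaged in with the valid ones.
plain_l1 = F.l1_loss(pred, target)

# use_masking-style: drop padded frames, then average over the remaining ones.
masked_l1 = F.l1_loss(pred.masked_select(mask), target.masked_select(mask))

# use_weighted_masking-style (assumed scheme): every utterance contributes
# equally, regardless of how many valid frames it has.
weights = mask.float() / mask.float().sum(dim=1, keepdim=True)
weights = weights / (pred.size(0) * pred.size(2))
per_elem = F.l1_loss(pred, target, reduction="none")
weighted_l1 = (per_elem * weights).masked_select(mask).sum()

print(plain_l1.item(), masked_l1.item(), weighted_l1.item())
```

With plain masking, long utterances dominate the average because they contribute more frames; the weighted variant removes that length bias.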

forward(after_outs: Tensor, before_outs: Tensor, d_outs: Tensor, p_outs: Tensor, v_outs: Tensor, ys: Tensor, ds: Tensor, ps: Tensor, vs: Tensor, ilens: Tensor, olens: Tensor, loss_type: str = 'L1') → Tuple[Tensor, Tensor, Tensor, Tensor]

Calculate forward propagation.

Parameters:
- after_outs (Tensor) – Batch of outputs after postnets (B, T_feats, odim).
- before_outs (Tensor) – Batch of outputs before postnets (B, T_feats, odim).
- d_outs (LongTensor) – Batch of outputs of duration predictor (B, T_text).
- p_outs (Tensor) – Batch of outputs of log_f0 (B, T_text, 1).
- v_outs (Tensor) – Batch of outputs of VUV (B, T_text, 1).
- ys (Tensor) – Batch of target features (B, T_feats, odim).
- ds (LongTensor) – Batch of durations (B, T_text).
- ps (Tensor) – Batch of target log_f0 (B, T_text, 1).
- vs (Tensor) – Batch of target VUV (B, T_text, 1).
- ilens (LongTensor) – Batch of the lengths of each input (B,).
- olens (LongTensor) – Batch of the lengths of each target (B,).
- loss_type (str) – Mel loss type (“L1” (MAE), “L2” (MSE) or “L1+L2”)
Returns:
- Tensor – Mel loss value.
- Tensor – Duration predictor loss value.
- Tensor – Pitch predictor loss value.
- Tensor – VUV predictor loss value.
Return type: Tuple[Tensor, Tensor, Tensor, Tensor]
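
As a rough illustration of how `loss_type` could select the mel term, the sketch below uses plain PyTorch functions instead of the class itself. The function name `mel_loss_sketch` is hypothetical, and several details are assumptions not confirmed by this page: that the mel term sums the losses of the pre- and post-postnet outputs, that the pitch term is an MSE, and that the VUV term is a binary cross-entropy on logits. The duration term is omitted.

```python
import math

import torch
import torch.nn.functional as F


def mel_loss_sketch(after_outs, before_outs, ys, loss_type="L1"):
    """Hypothetical sketch: sum pre- and post-postnet losses per loss_type."""

    def l1():
        return F.l1_loss(before_outs, ys) + F.l1_loss(after_outs, ys)

    def l2():
        return F.mse_loss(before_outs, ys) + F.mse_loss(after_outs, ys)

    if loss_type == "L1":
        return l1()
    if loss_type == "L2":
        return l2()
    if loss_type == "L1+L2":
        return l1() + l2()
    raise NotImplementedError(f"Unsupported loss_type: {loss_type}")


# Toy tensors with shape (B=1, T=2, odim=1).
ys = torch.zeros(1, 2, 1)
before_outs = torch.ones(1, 2, 1)
after_outs = 2.0 * torch.ones(1, 2, 1)

mel_l1 = mel_loss_sketch(after_outs, before_outs, ys, "L1")
mel_l2 = mel_loss_sketch(after_outs, before_outs, ys, "L2")
mel_both = mel_loss_sketch(after_outs, before_outs, ys, "L1+L2")

# Assumed pitch term: MSE between predicted and target log-F0.
p_outs = torch.tensor([[[0.5], [1.5]]])
ps = torch.tensor([[[0.0], [1.0]]])
pitch_loss = F.mse_loss(p_outs, ps)

# Assumed VUV term: binary cross-entropy on raw logits against 0/1 targets.
v_outs = torch.zeros(1, 2, 1)  # logit 0 means probability 0.5
vs = torch.ones(1, 2, 1)
vuv_loss = F.binary_cross_entropy_with_logits(v_outs, vs.float())

print(mel_l1.item(), mel_l2.item(), mel_both.item())
print(pitch_loss.item(), vuv_loss.item())
```

In this sketch "L1+L2" is simply the sum of the two other settings, so it weights absolute and squared errors equally.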