espnet2.tts.feats_extract.ying.Ying
class espnet2.tts.feats_extract.ying.Ying(fs: int = 22050, w_step: int = 256, W: int = 2048, tau_max: int = 2048, midi_start: int = -5, midi_end: int = 75, octave_range: int = 24, use_token_averaged_ying: bool = False)
Bases: AbsFeatsExtract
Extract Ying-based Features.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
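A minimal construction sketch (not taken from the ESPnet documentation itself); the import path follows the module name above, and all keyword arguments are the defaults shown in the signature:

```python
from espnet2.tts.feats_extract.ying import Ying

# Instantiate with the defaults from the signature above; any keyword
# argument (fs, W, tau_max, midi_start, midi_end, ...) can be overridden.
ying_extractor = Ying(fs=22050, w_step=256, W=2048, tau_max=2048)

print(ying_extractor.get_parameters())  # configuration as a Dict[str, Any]
print(ying_extractor.output_size())     # integer feature dimension
```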
crop_scope(x, yin_start, scope_shift)
forward(input: Tensor, input_lengths: Tensor | None = None, feats_lengths: Tensor | None = None, durations: Tensor | None = None, durations_lengths: Tensor | None = None) → Tuple[Tensor, Tensor]
Defines the computation performed at every call.
Should be overridden by all subclasses.
NOTE
Although the recipe for the forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
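As the note says, call the module instance rather than forward directly. A hedged usage sketch with a dummy one-utterance batch; the exact output shapes are assumptions here:

```python
import torch

from espnet2.tts.feats_extract.ying import Ying

ying_extractor = Ying()  # defaults from the class signature

# One second of dummy audio at 22.05 kHz, as a batch of size 1.
wav = torch.randn(1, 22050)
wav_lengths = torch.tensor([22050])

# Calling the instance (not .forward) also runs any registered hooks.
feats, feats_lengths = ying_extractor(wav, wav_lengths)
print(feats.shape, feats_lengths)  # (batch, frames, bins) is assumed here
```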
get_parameters() → Dict[str, Any]
midi_to_lag(m: int, octave_range: float = 12)
Converts a MIDI note to the corresponding time lag, Eq. (4).
- Parameters:
- m – MIDI note number
- fs – sampling rate
- octave_range – number of MIDI steps per octave
- Returns: time lag (tau, c(m)) computed from the MIDI note, Eq. (4)
- Return type: lag
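For orientation, a standalone sketch of the conversion Eq. (4) refers to, assuming the conventional equal-temperament MIDI-to-frequency mapping with an A4 = 440 Hz reference; the exact constants used inside the class are not spelled out here, so treat this as illustrative only:

```python
import math

def midi_to_lag_sketch(m: int, fs: int = 22050, octave_range: float = 12) -> float:
    # Hypothetical helper: MIDI note -> frequency (equal temperament, A4 = 440 Hz),
    # then frequency -> time lag tau = fs / f, i.e. the period in samples.
    f = 440.0 * math.pow(2.0, (m - 69) / octave_range)
    return fs / f

print(midi_to_lag_sketch(69))  # A4 at 22.05 kHz -> lag of about 50.1 samples
```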
output_size() → int
yingram(x: Tensor)
Calculates the yingram from raw audio (multiple segments).
- Parameters:
- x – raw audio, torch.Tensor of shape (t)
- W – yingram window size
- tau_max – maximum time lag (in samples)
- fs – sampling rate
- w_step – yingram bin step size
- Returns: yingram, torch.Tensor of shape (80 x t')
- Return type: yingram
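A hedged usage sketch for yingram on a single segment of raw audio; the shape comment simply restates the documented (80 x t') output:

```python
import torch

from espnet2.tts.feats_extract.ying import Ying

extractor = Ying()               # defaults: fs=22050, W=2048, w_step=256, tau_max=2048
audio = torch.randn(22050)       # one second of dummy raw audio, shape (t,)
ygram = extractor.yingram(audio)
print(ygram.shape)               # per the docstring: (80 x t')
```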
yingram_from_cmndf(cmndfs: Tensor) → Tensor
Yingram calculator from cMNDFs (cumulative mean normalized difference functions).
- Parameters:
- cmndfs – torch.Tensor, calculated cumulative mean normalized difference functions; for details, see models/yin.py or Eqs. (1) and (2)
- ms – list of MIDI notes (int)
- fs – sampling rate
- Returns: calculated batch yingram
- Return type: y
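For context, Eqs. (1) and (2) are the YIN difference function and its cumulative mean normalization. A minimal sketch of that normalization is below; it follows the standard YIN definition and is not necessarily identical to the implementation in models/yin.py:

```python
import torch

def cumulative_mean_normalized_difference(d: torch.Tensor) -> torch.Tensor:
    # Standard YIN cMNDF: d'(0) = 1 and
    # d'(tau) = d(tau) / ((1 / tau) * sum_{j=1..tau} d(j)) for tau >= 1.
    tau = torch.arange(1, d.shape[-1], device=d.device, dtype=d.dtype)
    running_mean = torch.cumsum(d[..., 1:], dim=-1) / tau
    cmndf = torch.ones_like(d)
    cmndf[..., 1:] = d[..., 1:] / (running_mean + 1e-8)
    return cmndf
```

The yingram bin for each MIDI note in ms is then presumably read off these cMNDFs at the lag given by midi_to_lag.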