espnet2.tts.feats_extract.ying.Ying
class espnet2.tts.feats_extract.ying.Ying(fs: int = 22050, w_step: int = 256, W: int = 2048, tau_max: int = 2048, midi_start: int = -5, midi_end: int = 75, octave_range: int = 24, use_token_averaged_ying: bool = False)
Bases: AbsFeatsExtract
Extract Ying-based Features.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
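A minimal construction sketch (not taken from the ESPnet documentation itself); the import path follows the module name above, and all keyword arguments are the defaults shown in the signature:

```python
from espnet2.tts.feats_extract.ying import Ying

# Instantiate with the defaults from the signature above; any keyword
# argument (fs, W, tau_max, midi_start, midi_end, ...) can be overridden.
ying_extractor = Ying(fs=22050, w_step=256, W=2048, tau_max=2048)

print(ying_extractor.get_parameters())  # configuration as a Dict[str, Any]
print(ying_extractor.output_size())     # integer feature dimension
```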
crop_scope(x, yin_start, scope_shift)
forward(input: Tensor, input_lengths: Tensor | None = None, feats_lengths: Tensor | None = None, durations: Tensor | None = None, durations_lengths: Tensor | None = None) → Tuple[Tensor, Tensor]
Defines the computation performed at every call.
Should be overridden by all subclasses.
NOTE
Although the recipe for the forward pass needs to be defined within this function, one should call the Module
instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
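As the note says, call the module instance rather than forward directly. A hedged usage sketch with a dummy one-utterance batch; the exact output shapes are assumptions here:

```python
import torch

from espnet2.tts.feats_extract.ying import Ying

ying_extractor = Ying()  # defaults from the class signature

# One second of dummy audio at 22.05 kHz, as a batch of size 1.
wav = torch.randn(1, 22050)
wav_lengths = torch.tensor([22050])

# Calling the instance (not .forward) also runs any registered hooks.
feats, feats_lengths = ying_extractor(wav, wav_lengths)
print(feats.shape, feats_lengths)  # (batch, frames, bins) is assumed here
```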
get_parameters() → Dict[str, Any]
midi_to_lag(m: int, octave_range: float = 12)
Converts a MIDI note to the corresponding time lag, Eq. (4).
- Parameters:
- m – MIDI note number
- fs – sampling rate
- octave_range – number of MIDI steps per octave
- Returns: time lag (tau, c(m)) computed from the MIDI note, Eq. (4)
- Return type: lag
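For orientation, a standalone sketch of the conversion Eq. (4) refers to, assuming the conventional equal-temperament MIDI-to-frequency mapping with an A4 = 440 Hz reference; the exact constants used inside the class are not spelled out here, so treat this as illustrative only:

```python
import math

def midi_to_lag_sketch(m: int, fs: int = 22050, octave_range: float = 12) -> float:
    # Hypothetical helper: MIDI note -> frequency (equal temperament, A4 = 440 Hz),
    # then frequency -> time lag tau = fs / f, i.e. the period in samples.
    f = 440.0 * math.pow(2.0, (m - 69) / octave_range)
    return fs / f

print(midi_to_lag_sketch(69))  # A4 at 22.05 kHz -> lag of about 50.1 samples
```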
output_size() → int
yingram(x: Tensor)
Calculates the yingram from raw audio (multiple segments).
- Parameters:
- x – raw audio, torch.Tensor of shape (t)
- W – yingram window size
- tau_max – maximum time lag (in samples)
- fs – sampling rate
- w_step – yingram bin step size
- Returns: yingram, torch.Tensor of shape (80 x t')
- Return type: yingram
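A hedged usage sketch for yingram on a single segment of raw audio; the shape comment simply restates the documented (80 x t') output:

```python
import torch

from espnet2.tts.feats_extract.ying import Ying

extractor = Ying()               # defaults: fs=22050, W=2048, w_step=256, tau_max=2048
audio = torch.randn(22050)       # one second of dummy raw audio, shape (t,)
ygram = extractor.yingram(audio)
print(ygram.shape)               # per the docstring: (80 x t')
```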
yingram_from_cmndf(cmndfs: Tensor) → Tensor
Yingram calculator from cMNDFs (cumulative mean normalized difference functions).
- Parameters:
- cmndfs – torch.Tensor, calculated cumulative mean normalized difference functions; for details, see models/yin.py or Eqs. (1) and (2)
- ms – list of MIDI notes (int)
- fs – sampling rate
- Returns: calculated batch yingram
- Return type: y
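For context, Eqs. (1) and (2) are the YIN difference function and its cumulative mean normalization. A minimal sketch of that normalization is below; it follows the standard YIN definition and is not necessarily identical to the implementation in models/yin.py:

```python
import torch

def cumulative_mean_normalized_difference(d: torch.Tensor) -> torch.Tensor:
    # Standard YIN cMNDF: d'(0) = 1 and
    # d'(tau) = d(tau) / ((1 / tau) * sum_{j=1..tau} d(j)) for tau >= 1.
    tau = torch.arange(1, d.shape[-1], device=d.device, dtype=d.dtype)
    running_mean = torch.cumsum(d[..., 1:], dim=-1) / tau
    cmndf = torch.ones_like(d)
    cmndf[..., 1:] = d[..., 1:] / (running_mean + 1e-8)
    return cmndf
```

The yingram bin for each MIDI note in ms is then presumably read off these cMNDFs at the lag given by midi_to_lag.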