espnet2.gan_svs.visinger2.visinger2_vocoder.MelScale
class espnet2.gan_svs.visinger2.visinger2_vocoder.MelScale(n_mels: int = 128, sample_rate: int = 24000, f_min: float = 0.0, f_max: float | None = None, n_stft: int | None = None)
Bases: Module
Turn a normal STFT into a mel frequency STFT using a conversion matrix built from triangular filter banks. The user controls which device the filter bank (fb) lives on (e.g. fb.to(spec_f.device)).
- Parameters:
  - n_mels (int, optional) – Number of mel filterbanks. (Default: 128)
  - sample_rate (int, optional) – Sample rate of the audio signal. (Default: 24000)
  - f_min (float, optional) – Minimum frequency. (Default: 0.)
  - f_max (float, optional) – Maximum frequency. (Default: sample_rate // 2)
  - n_stft (int, optional) – Number of bins in the STFT (n_fft // 2 + 1 for a one-sided STFT). Calculated from the first input if None is given. See n_fft in Spectrogram. (Default: None)
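The conversion matrix described above can be sketched in NumPy as follows. This is a minimal sketch, not the ESPnet code: it assumes the HTK mel formula (m = 2595 · log10(1 + f / 700)) with no filter normalization, which mirrors torchaudio's default filterbank, and the helper names hz_to_mel, mel_to_hz and triangular_mel_filterbank are hypothetical. Each of the n_mels columns is a triangle that rises from one mel-spaced edge point to the next and falls back to zero at the one after.

```python
import numpy as np


def hz_to_mel(freq):
    # Assumed HTK mel formula: m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(freq, dtype=np.float64) / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=np.float64) / 2595.0) - 1.0)


def triangular_mel_filterbank(n_stft, n_mels=128, sample_rate=24000, f_min=0.0, f_max=None):
    """Hypothetical helper: (n_stft, n_mels) matrix mapping STFT bins to mel bands."""
    if f_max is None:
        f_max = float(sample_rate // 2)  # mirrors the documented default
    # Frequency (Hz) of every STFT bin, from 0 up to Nyquist
    all_freqs = np.linspace(0.0, sample_rate // 2, n_stft)
    # n_mels + 2 edge points, equally spaced on the mel scale, mapped back to Hz
    f_pts = mel_to_hz(np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_mels + 2))
    f_diff = np.diff(f_pts)  # (n_mels + 1,)
    slopes = f_pts[np.newaxis, :] - all_freqs[:, np.newaxis]  # (n_stft, n_mels + 2)
    down = -slopes[:, :-2] / f_diff[:-1]  # rising edge: 0 at left point, 1 at center
    up = slopes[:, 2:] / f_diff[1:]  # falling edge: 1 at center, 0 at right point
    # Clip negatives to zero so each column is a triangle
    return np.maximum(0.0, np.minimum(down, up))


# Example: n_fft = 1024 gives 513 one-sided STFT bins
fb = triangular_mel_filterbank(n_stft=513, n_mels=80, sample_rate=24000)
print(fb.shape)  # (513, 80)
```

Storing the matrix as (n_stft, n_mels) lets a single matrix multiplication over the frequency axis perform the whole conversion.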
Initializes internal Module state, shared by both nn.Module and ScriptModule.
forward(specgram: Tensor) → Tensor
Convert a linear-frequency spectrogram to the mel scale.
- Parameters: specgram (Tensor) – A spectrogram (STFT) of dimension (…, freq, time).
- Returns: Mel frequency spectrogram of size (…, n_mels, time).
- Return type: Tensor
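The shape contract of forward can be illustrated with the NumPy sketch below. It assumes the filterbank is an already-built (freq, n_mels) matrix passed in explicitly, and apply_mel_scale is a hypothetical name, not part of the ESPnet API. Because the number of frequency bins is read from specgram.shape[-2], this is also where a module created with n_stft=None could infer the bin count from its first input.

```python
import numpy as np


def apply_mel_scale(specgram, fb):
    """Hypothetical helper: project a (..., freq, time) spectrogram onto mel bands.

    fb is assumed to be a (freq, n_mels) filterbank matrix.
    """
    *batch, n_freqs, n_frames = specgram.shape
    if fb.shape[0] != n_freqs:
        raise ValueError(f"filterbank expects {fb.shape[0]} bins, got {n_freqs}")
    # Collapse all leading dimensions so one matmul handles them together
    flat = specgram.reshape(-1, n_freqs, n_frames)
    # (B, time, freq) @ (freq, n_mels) -> (B, time, n_mels) -> (B, n_mels, time)
    mel = np.matmul(flat.transpose(0, 2, 1), fb).transpose(0, 2, 1)
    # Restore the original leading dimensions
    return mel.reshape(*batch, fb.shape[1], n_frames)


rng = np.random.default_rng(0)
spec = rng.random((2, 3, 513, 100))  # two leading dims, 513 bins, 100 frames
fb = rng.random((513, 80))  # stand-in for a real mel filterbank
out = apply_mel_scale(spec, fb)
print(out.shape)  # (2, 3, 80, 100)
```

Flattening the leading dimensions before the matmul keeps the operation agnostic to how many batch or channel axes precede (freq, time).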