espnet2.layers package

espnet2.layers.utterance_mvn

class espnet2.layers.utterance_mvn.UtteranceMVN(norm_means: bool = True, norm_vars: bool = False, eps: float = 1e-20)[source]

Bases: espnet2.layers.abs_normalize.AbsNormalize

extra_repr()[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(x: torch.Tensor, ilens: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor][source]

Forward function

Parameters
  • x – (B, L, …)

  • ilens – (B,)

espnet2.layers.utterance_mvn.utterance_mvn(x: torch.Tensor, ilens: torch.Tensor = None, norm_means: bool = True, norm_vars: bool = False, eps: float = 1e-20) → Tuple[torch.Tensor, torch.Tensor][source]

Apply utterance mean and variance normalization

Parameters
  • x – (B, T, D), assumed zero padded

  • ilens – (B,)

  • norm_means

  • norm_vars

  • eps
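The normalization described above can be sketched in plain PyTorch. This is a minimal illustration under stated assumptions, not the library's implementation: `utterance_mvn_sketch` is a hypothetical name, statistics are taken over the valid frames of each utterance only, and padded frames are reset to zero.

```python
import torch


def utterance_mvn_sketch(x, ilens=None, norm_means=True, norm_vars=False, eps=1e-20):
    """Hypothetical helper: per-utterance mean/variance normalization.

    x: (B, T, ...) zero-padded features, ilens: (B,) valid lengths.
    """
    batch, max_len = x.shape[0], x.shape[1]
    if ilens is None:
        ilens = torch.full((batch,), max_len, dtype=torch.long)
    # valid[b, t] is 1 for real frames and 0 for padding
    valid = torch.arange(max_len).unsqueeze(0) < ilens.unsqueeze(1)
    valid = valid.view(batch, max_len, *([1] * (x.dim() - 2))).to(x.dtype)
    n = ilens.to(x.dtype).view(batch, *([1] * (x.dim() - 1)))

    x = x * valid                                # make sure padding is zero
    mean = x.sum(dim=1, keepdim=True) / n        # mean over valid frames only
    centered = (x - mean) * valid                # keep padding at zero
    out = centered if norm_means else x
    if norm_vars:
        var = centered.pow(2).sum(dim=1, keepdim=True) / n
        out = out / torch.clamp(var.sqrt(), min=eps)
    return out, ilens


# (B=2, T=3, D=1); the first utterance has 2 valid frames and 1 padded frame
x = torch.tensor([[[1.0], [3.0], [0.0]],
                  [[2.0], [4.0], [6.0]]])
ilens = torch.tensor([2, 3])
y, olens = utterance_mvn_sketch(x, ilens)
# y[:, :, 0] is [[-1, 1, 0], [-2, 0, 2]]: each utterance has zero mean
z, _ = utterance_mvn_sketch(x, ilens, norm_vars=True)
```

Dividing by the number of valid frames rather than by T is what keeps the padding from biasing the statistics.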

espnet2.layers.stft

class espnet2.layers.stft.Stft(n_fft: int = 512, win_length: int = None, hop_length: int = 128, window: Optional[str] = 'hann', center: bool = True, normalized: bool = False, onesided: bool = True)[source]

Bases: torch.nn.modules.module.Module, espnet2.layers.inversible_interface.InversibleInterface

extra_repr()[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(input: torch.Tensor, ilens: torch.Tensor = None) → Tuple[torch.Tensor, Optional[torch.Tensor]][source]

STFT forward function.

Parameters
  • input – (Batch, Nsamples) or (Batch, Nsamples, Channels)

  • ilens – (Batch)

Returns
  • output – (Batch, Frames, Freq, 2) or (Batch, Frames, Channels, Freq, 2)
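To make the single-channel output layout concrete, the sketch below calls `torch.stft` directly with the same defaults as the constructor signature and rearranges the result into (Batch, Frames, Freq, 2). It assumes the class behaves like a thin wrapper around `torch.stft`, which this page does not confirm.

```python
import torch

n_fft, hop_length = 512, 128
wav = torch.randn(2, 16000)                        # (Batch, Nsamples)

spec = torch.stft(
    wav,
    n_fft=n_fft,
    hop_length=hop_length,
    win_length=n_fft,
    window=torch.hann_window(n_fft),
    center=True,
    normalized=False,
    onesided=True,
    return_complex=True,
)                                                  # complex, (Batch, Freq, Frames)

# Split real/imaginary parts into a trailing axis and put Frames before Freq.
output = torch.view_as_real(spec).transpose(1, 2)  # (Batch, Frames, Freq, 2)
print(output.shape)                                # torch.Size([2, 126, 257, 2])
```

With `center=True` the frame count is `1 + Nsamples // hop_length` (here 1 + 125), and `onesided=True` keeps `n_fft // 2 + 1` frequency bins.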

inverse(input: Union[torch.Tensor, torch_complex.tensor.ComplexTensor], ilens: torch.Tensor = None) → Tuple[torch.Tensor, Optional[torch.Tensor]][source]

Inverse STFT.

Parameters
  • input – Tensor(batch, T, F, 2) or ComplexTensor(batch, T, F)

  • ilens – (batch,)

Returns
  • wavs – (batch, samples)

  • ilens – (batch,)
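A round trip illustrates what the inverse is expected to recover. The sketch uses `torch.stft` and `torch.istft` directly rather than the `Stft` class, so it shows the technique under the assumption that the class follows the same conventions.

```python
import torch

n_fft, hop_length, n_samples = 512, 128, 16000
window = torch.hann_window(n_fft)
wav = torch.randn(2, n_samples)

# Forward: complex (batch, F, T), then the real layout (batch, T, F, 2).
spec_c = torch.stft(wav, n_fft=n_fft, hop_length=hop_length, win_length=n_fft,
                    window=window, center=True, return_complex=True)
spec = torch.view_as_real(spec_c).transpose(1, 2)

# Inverse: rebuild the complex (batch, F, T) tensor, then resynthesize.
spec_back = torch.view_as_complex(spec.contiguous()).transpose(1, 2).contiguous()
wavs = torch.istft(spec_back, n_fft=n_fft, hop_length=hop_length, win_length=n_fft,
                   window=window, center=True, length=n_samples)

max_err = (wavs - wav).abs().max().item()          # small float32 rounding error
```

Passing `length=n_samples` trims the padding that `center=True` introduced, so `wavs` has the same shape as the input.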

espnet2.layers.mask_along_axis

class espnet2.layers.mask_along_axis.MaskAlongAxis(mask_width_range: Union[int, Sequence[int]] = (0, 30), num_mask: int = 2, dim: Union[int, str] = 'time', replace_with_zero: bool = True)[source]

Bases: torch.nn.modules.module.Module

extra_repr()[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(spec: torch.Tensor, spec_lengths: torch.Tensor = None)[source]

Forward function.

Parameters
  • spec – (Batch, Length, Freq)

espnet2.layers.mask_along_axis.mask_along_axis(spec: torch.Tensor, spec_lengths: torch.Tensor, mask_width_range: Sequence[int] = (0, 30), dim: int = 1, num_mask: int = 2, replace_with_zero: bool = True)[source]

Apply mask along the specified direction.

Parameters
  • spec – (Batch, Length, Freq)

  • spec_lengths – (Length): lengths are not used in this implementation

  • mask_width_range – the width of each mask is selected at random from this range
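The masking idea can be sketched as follows. `mask_along_axis_sketch` is a hypothetical helper written for illustration; it assumes a 3-D input, `dim` equal to 1 (time) or 2 (frequency), and an exclusive upper bound on the mask width, none of which is stated on this page.

```python
import torch


def mask_along_axis_sketch(spec, mask_width_range=(0, 30), dim=1,
                           num_mask=2, replace_with_zero=True):
    """Hypothetical helper: mask `num_mask` random bands along `dim`."""
    batch, size = spec.shape[0], spec.shape[dim]
    # One random width and start position per (utterance, mask).
    widths = torch.randint(mask_width_range[0], mask_width_range[1],
                           (batch, num_mask, 1))
    starts = torch.randint(0, max(1, size - int(widths.max())),
                           (batch, num_mask, 1))
    idx = torch.arange(size).view(1, 1, size)
    # True wherever an index falls inside any of the sampled bands.
    mask = ((starts <= idx) & (idx < starts + widths)).any(dim=1)  # (B, size)
    mask = mask.unsqueeze(2) if dim == 1 else mask.unsqueeze(1)
    value = 0.0 if replace_with_zero else spec.mean().item()
    return spec.masked_fill(mask, value)


spec = torch.ones(4, 100, 80)                      # (Batch, Length, Freq)
masked = mask_along_axis_sketch(spec, dim=1)       # time masking
zero_frames = (masked == 0).all(dim=2).sum(dim=1)  # masked frames per utterance
```

Because the mask is broadcast over the other axis, a masked time frame is blanked across all frequency bins (and vice versa for `dim=2`).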

espnet2.layers.time_warp

class espnet2.layers.time_warp.TimeWarp(window: int = 80, mode: str = 'bicubic')[source]

Bases: torch.nn.modules.module.Module

Time warping using torch.nn.functional.interpolate.

Parameters
  • window – time warp parameter (maximum warp distance, in frames)

  • mode – interpolation mode

extra_repr()[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(x: torch.Tensor, x_lengths: torch.Tensor = None)[source]

Forward function.

Parameters
  • x – (Batch, Time, Freq)

  • x_lengths – (Batch,)

espnet2.layers.time_warp.time_warp(x: torch.Tensor, window: int = 80, mode: str = 'bicubic')[source]

Time warping using torch.nn.functional.interpolate.

Parameters
  • x – (Batch, Time, Freq)

  • window – time warp parameter (maximum warp distance, in frames)

  • mode – interpolation mode
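One way to realize time warping with `torch.nn.functional.interpolate` is to pick a random split point, stretch the part before it and squeeze the part after it so the total length is unchanged. The sketch below follows that idea; `time_warp_sketch` and its exact sampling ranges are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F


def time_warp_sketch(x, window=80, mode="bicubic"):
    """Hypothetical helper: warp x (Batch, Time, Freq) along the time axis."""
    t, freq = x.shape[1], x.shape[2]
    if t - window <= window:
        return x                                   # too short to warp safely
    # Split point, kept at least `window` frames away from both ends.
    center = int(torch.randint(window, t - window, (1,)))
    # New position of the split point, within `window` frames of the old one.
    warped = int(torch.randint(center - window, center + window, (1,))) + 1
    img = x.unsqueeze(1)                           # (Batch, 1, Time, Freq)
    left = F.interpolate(img[:, :, :center], size=(warped, freq),
                         mode=mode, align_corners=False)
    right = F.interpolate(img[:, :, center:], size=(t - warped, freq),
                          mode=mode, align_corners=False)
    return torch.cat([left, right], dim=2).squeeze(1)


x = torch.randn(2, 200, 80)                        # (Batch, Time, Freq)
y = time_warp_sketch(x)                            # same shape, warped in time
short = torch.randn(2, 100, 80)
unchanged = time_warp_sketch(short)                # returned as-is
```

The extra channel axis is needed because `bicubic` interpolation expects 4-D, image-like input.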

espnet2.layers.abs_normalize

class espnet2.layers.abs_normalize.AbsNormalize[source]

Bases: torch.nn.modules.module.Module, abc.ABC

Initializes internal Module state, shared by both nn.Module and ScriptModule.

abstract forward(input: torch.Tensor, input_lengths: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
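The difference described in the note is easy to observe with a forward hook. The sketch uses a small stand-in module (`Scale`, hypothetical) rather than an `AbsNormalize` subclass so that it runs with PyTorch alone.

```python
import torch


class Scale(torch.nn.Module):
    """Hypothetical stand-in module that doubles its input."""

    def forward(self, x):
        return 2 * x


calls = []
module = Scale()
# The hook records each invocation; returning None leaves the output untouched.
module.register_forward_hook(lambda mod, inputs, output: calls.append("hook"))

x = torch.ones(3)
out_call = module(x)             # goes through __call__, so the hook runs
out_forward = module.forward(x)  # bypasses __call__, so the hook is skipped
```

Both calls return the same tensor, but only the first one triggers the hook, so `calls` holds a single entry.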

espnet2.layers.inversible_interface

class espnet2.layers.inversible_interface.InversibleInterface[source]

Bases: abc.ABC

abstract inverse(input: torch.Tensor, input_lengths: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor][source]

espnet2.layers.log_mel

class espnet2.layers.log_mel.LogMel(fs: int = 16000, n_fft: int = 512, n_mels: int = 80, fmin: float = None, fmax: float = None, htk: bool = False, log_base: float = None)[source]

Bases: torch.nn.modules.module.Module

Convert STFT to fbank feats

The arguments are the same as those of librosa.filters.mel.

Parameters
  • fs – number > 0 [scalar] sampling rate of the incoming signal

  • n_fft – int > 0 [scalar] number of FFT components

  • n_mels – int > 0 [scalar] number of Mel bands to generate

  • fmin – float >= 0 [scalar] lowest frequency (in Hz)

  • fmax – float >= 0 [scalar] highest frequency (in Hz). If None, use fmax = fs / 2.0

  • htk – use HTK formula instead of Slaney

extra_repr()[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(feat: torch.Tensor, ilens: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
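The conversion from a spectrum to log-Mel filterbank features can be sketched without librosa. Everything here is an assumption made for illustration: `mel_filterbank` is a hypothetical helper that builds unnormalized triangular filters on the HTK Mel scale (so its values differ from the `librosa.filters.mel` defaults), the input is taken to be a power spectrum of shape (B, T, n_fft // 2 + 1), and the clamp floor of 1e-10 is an arbitrary choice to avoid log(0).

```python
import math

import torch


def hz_to_mel(f):
    """HTK Mel scale."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_filterbank(fs=16000, n_fft=512, n_mels=80, fmin=0.0, fmax=None):
    """Hypothetical helper: triangular Mel filters, (n_mels, n_fft // 2 + 1)."""
    fmax = fs / 2.0 if fmax is None else fmax
    # n_mels + 2 edge points, equally spaced on the Mel scale.
    mels = torch.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2)
    hz = 700.0 * (10.0 ** (mels / 2595.0) - 1.0)
    freqs = torch.linspace(0.0, fs / 2.0, n_fft // 2 + 1)  # FFT bin frequencies
    lower, center, upper = hz[:-2, None], hz[1:-1, None], hz[2:, None]
    rising = (freqs[None, :] - lower) / (center - lower)
    falling = (upper - freqs[None, :]) / (upper - center)
    return torch.clamp(torch.min(rising, falling), min=0.0)


power = torch.rand(2, 50, 257)                     # (B, T, n_fft // 2 + 1)
fbank = mel_filterbank()
mel = torch.clamp(power @ fbank.T, min=1e-10)      # (B, T, n_mels)
logmel = mel.log()                                 # natural log
logmel10 = logmel / math.log(10.0)                 # same features in base 10
```

Changing the logarithm base only rescales the features by a constant, which is why it can be applied as a final division.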

espnet2.layers.__init__

espnet2.layers.global_mvn

class espnet2.layers.global_mvn.GlobalMVN(stats_file: Union[pathlib.Path, str], norm_means: bool = True, norm_vars: bool = True, eps: float = 1e-20)[source]

Bases: espnet2.layers.abs_normalize.AbsNormalize, espnet2.layers.inversible_interface.InversibleInterface

Apply global mean and variance normalization

TODO(kamo): Make this class portable somehow

Parameters
  • stats_file – npy file

  • norm_means – Apply mean normalization

  • norm_vars – Apply var normalization

  • eps

extra_repr()[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(x: torch.Tensor, ilens: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor][source]

Forward function

Parameters
  • x – (B, L, …)

  • ilens – (B,)

inverse(x: torch.Tensor, ilens: torch.Tensor = None) → Tuple[torch.Tensor, torch.Tensor][source]
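Global normalization and its inverse amount to an affine transform with fixed statistics. The sketch below computes the statistics from an in-memory tensor instead of reading `stats_file`, so the variable names and the way the statistics are obtained are assumptions, not the class's behavior.

```python
import torch

torch.manual_seed(0)

# Hypothetical "training" frames used to estimate global statistics.
train = torch.randn(1000, 80) * 3.0 + 5.0          # (Frames, Dim)
mean = train.mean(dim=0)                           # (Dim,)
std = torch.clamp(train.std(dim=0), min=1e-20)     # eps floor avoids division by zero

x = torch.randn(2, 50, 80) * 3.0 + 5.0             # (B, L, Dim)

normalized = (x - mean) / std                      # forward: roughly zero mean, unit variance
restored = normalized * std + mean                 # inverse: undo the transform
```

Unlike utterance-level normalization, the same `mean` and `std` are applied to every utterance, which is what makes the transform exactly invertible.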