espnet2.ssl.utils.mask.Masking
class espnet2.ssl.utils.mask.Masking(encoder_embed_dim: int, mask_prob: float = 0.8, mask_selection: str = 'static', mask_other: float = 0.0, mask_length: int = 10, no_mask_overlap: bool = False, mask_min_space: int = 0, mask_channel_prob: float = 0.0, mask_channel_selection: str = 'static', mask_channel_other: float = 0.0, mask_channel_length: int = 10, no_mask_channel_overlap: bool = False, mask_channel_min_space: int = 0)
Bases: Module
Generate the masks for masked prediction.
- Parameters:
- encoder_embed_dim (int) – The dimension of the transformer embedding output.
- mask_prob (float) – Probability for each token to be the start of a masked span. It is multiplied by the number of timesteps divided by the mask span length, so that approximately this fraction of all elements is masked. Because spans can overlap, the actual number of masked elements is typically smaller (unless no_mask_overlap is True).
- mask_selection (str) – How to choose the mask length. Options: [static, uniform, normal, poisson].
- mask_other (float) – Secondary mask argument (used for more complex distributions).
- mask_length (int) – The length of each masked span.
- no_mask_overlap (bool) – If True, masked spans are not allowed to overlap.
- mask_min_space (int) – Minimum space between spans (if no overlap).
- mask_channel_prob (float) – The probability of replacing a feature with 0.
- mask_channel_selection (str) – How to choose the mask length for channel masking. Options: [static, uniform, normal, poisson].
- mask_channel_other (float) – Secondary mask argument for channel masking (used for more complex distributions).
- mask_channel_length (int) – The length of each channel mask span.
- no_mask_channel_overlap (bool) – If True, channel mask spans are not allowed to overlap.
- mask_channel_min_space (int) – Minimum space between spans for channel masking (if no overlap is enabled).
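The mask_prob arithmetic above can be illustrated with a simplified, stdlib-only sketch. It is modeled loosely on fairseq-style span masking and is not ESPnet's implementation: sample_time_mask is a hypothetical helper that supports only static selection, ignores mask_other, no_mask_overlap, and mask_min_space, and works on one unpadded sequence. For example, with 100 frames, mask_prob=0.8, and mask_length=10, it draws int(0.8 * 100 / 10 + u) = 8 span starts (u uniform in [0, 1)), so at most 80 frames are masked and usually fewer because spans may overlap.

```python
import random
from typing import List, Optional


def sample_time_mask(
    seq_len: int,
    mask_prob: float = 0.8,
    mask_length: int = 10,
    rng: Optional[random.Random] = None,
) -> List[bool]:
    """Hypothetical helper: boolean time mask built from static-length spans.

    Spans may overlap, so the masked fraction is usually below mask_prob.
    """
    if mask_length <= 0:
        raise ValueError("mask_length must be positive")
    if rng is None:
        rng = random.Random()
    # Expected number of span starts is mask_prob * seq_len / mask_length;
    # adding a uniform [0, 1) draw before int() rounds it stochastically,
    # so the count is correct in expectation.
    num_spans = int(mask_prob * seq_len / mask_length + rng.random())
    # A span must fit entirely inside the sequence.
    max_start = seq_len - mask_length
    if max_start < 0:
        return [False] * seq_len
    # Distinct start positions; the spans they open can still overlap.
    num_spans = min(num_spans, max_start + 1)
    starts = rng.sample(range(max_start + 1), num_spans)
    mask = [False] * seq_len
    for start in starts:
        for t in range(start, start + mask_length):
            mask[t] = True
    return mask
```

For instance, sample_time_mask(100, mask_prob=0.8, mask_length=10, rng=random.Random(0)) returns 100 booleans with between 10 and 80 of them True. Enforcing no_mask_overlap and mask_min_space would additionally require tracking which intervals are still free before placing each span.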
forward(x: Tensor, padding_mask: Tensor | None) → Tensor
- Parameters:
- x (Tensor) – The encoded representations after feature extraction module.
- padding_mask (Tensor or None) – The padding mask which will prevent masking padded elements.
- Returns: The feature representations after masking (Tensor) and the generated mask indices (Tensor).
- Return type: Tensor
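This page does not show the tensor operations inside forward. The following is a minimal sketch, assuming behavior analogous to fairseq's wav2vec 2.0/HuBERT masking: time steps selected by the time mask are overwritten with a mask embedding of size encoder_embed_dim, and then channels selected by the channel mask are set to 0 (consistent with the mask_channel_prob description). apply_masks is a hypothetical helper that processes one utterance stored as nested Python lists instead of a torch Tensor, so it runs without dependencies. It takes both masks as inputs; in practice they would be sampled with padded frames excluded, which is the role of padding_mask.

```python
from typing import List, Sequence, Tuple


def apply_masks(
    x: Sequence[Sequence[float]],
    time_mask: Sequence[bool],
    channel_mask: Sequence[bool],
    mask_emb: Sequence[float],
) -> Tuple[List[List[float]], List[bool]]:
    """Hypothetical helper: mask one utterance stored as a (T, C) nested list.

    Returns new lists; the input x is not modified.
    """
    if len(time_mask) != len(x):
        raise ValueError("time_mask needs one entry per frame")
    if len(channel_mask) != len(mask_emb):
        raise ValueError("channel_mask and mask_emb must both have length C")
    out: List[List[float]] = []
    for frame, masked in zip(x, time_mask):
        if len(frame) != len(mask_emb):
            raise ValueError("every frame must have length C")
        # Time masking (assumed): the whole frame is replaced by the mask embedding.
        new_frame = list(mask_emb) if masked else list(frame)
        # Channel masking: the selected feature dimensions are zeroed at every time step.
        for c, drop in enumerate(channel_mask):
            if drop:
                new_frame[c] = 0.0
        out.append(new_frame)
    # Return the masked features together with the time-mask indices.
    return out, list(time_mask)
```

Copying each frame keeps the sketch free of side effects; a tensor implementation would more likely assign into x in place with boolean indexing.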