espnet2.train.preprocessor.SpkPreprocessor

class espnet2.train.preprocessor.SpkPreprocessor(train: bool, target_duration: float, spk2utt: str | None = None, sample_rate: int = 16000, num_eval: int = 10, rir_scp: str | None = None, rir_apply_prob: float = 1.0, noise_info: List[Tuple[float, str, Tuple[int, int], Tuple[float, float]]] | None = None, noise_apply_prob: float = 1.0, short_noise_thres: float = 0.5)

Bases: CommonPreprocessor

Preprocessor for Speaker tasks.

Parameters:
- train (bool) – Whether to run in training mode.
- spk2utt (str | None) – Path to the spk2utt file. Defaults to None.
- target_duration (float) – Target duration in seconds.
- sample_rate (int) – Sampling rate. Defaults to 16000.
- num_eval (int) – Number of utterances to be used for evaluation. Defaults to 10.
- rir_scp (str | None) – Path to the RIR scp file. Defaults to None.
- rir_apply_prob (float) – Probability of applying RIR. Defaults to 1.0.
- noise_info (List[Tuple[float, str, Tuple[int, int], Tuple[float, float]]]) –
  List of tuples of noise information. Each tuple represents a noise type and consists of (prob, noise_scp, num_to_mix, db_range).
  - prob (float) is the probability of applying the noise type.
  - noise_scp (str) is the path to the noise scp file.
  - num_to_mix (Tuple[int, int]) is the range of the number of noises to be mixed.
  - db_range (Tuple[float, float]) is the range of noise levels in dB.
- noise_apply_prob (float) – Probability of applying noise. Defaults to 1.0.
- short_noise_thres (float) – Threshold of short noise. Defaults to 0.5.
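
The layout of noise_info can be easier to see with a concrete value. The following is a minimal sketch that does not import ESPnet: it builds a list in the documented (prob, noise_scp, num_to_mix, db_range) shape, checks it, and draws one illustrative mixing plan. The scp paths, the numeric values, and the helpers check_noise_info and sample_mix_plan are hypothetical, and the sketch assumes both ranges are inclusive and sampled uniformly, which may differ from what SpkPreprocessor does internally.

```python
import random
from typing import List, Tuple

# One entry per noise type: (prob, noise_scp, num_to_mix, db_range).
NoiseSpec = Tuple[float, str, Tuple[int, int], Tuple[float, float]]

# Hypothetical paths and values, for illustration only.
noise_info: List[NoiseSpec] = [
    (0.4, "dump/noise.scp", (1, 1), (0.0, 15.0)),
    (0.3, "dump/music.scp", (1, 1), (5.0, 15.0)),
    (0.3, "dump/speech.scp", (3, 7), (13.0, 20.0)),
]


def check_noise_info(specs: List[NoiseSpec]) -> None:
    """Raise ValueError if an entry does not match the documented shape."""
    for prob, noise_scp, num_to_mix, db_range in specs:
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"prob must be in [0, 1], got {prob}")
        if not isinstance(noise_scp, str) or not noise_scp:
            raise ValueError("noise_scp must be a non-empty path string")
        if num_to_mix[0] < 1 or num_to_mix[0] > num_to_mix[1]:
            raise ValueError(f"invalid num_to_mix range: {num_to_mix}")
        if db_range[0] > db_range[1]:
            raise ValueError(f"invalid db_range: {db_range}")


def sample_mix_plan(spec: NoiseSpec, rng: random.Random) -> List[Tuple[str, float]]:
    """Draw how many noises to mix and a level in dB for each one.

    Assumption: both ranges are treated as inclusive and sampled uniformly.
    """
    _, noise_scp, (n_lo, n_hi), (db_lo, db_hi) = spec
    n_noises = rng.randint(n_lo, n_hi)  # random.randint includes both ends
    return [(noise_scp, rng.uniform(db_lo, db_hi)) for _ in range(n_noises)]


check_noise_info(noise_info)
plan = sample_mix_plan(noise_info[2], random.Random(0))
```

A list shaped like this would be passed as the noise_info argument alongside noise_apply_prob; the sampling helper is only there to show how the two range fields are meant to be read.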