espnet2.train.preprocessor.SpkPreprocessor
class espnet2.train.preprocessor.SpkPreprocessor(train: bool, target_duration: float, spk2utt: str | None = None, sample_rate: int = 16000, num_eval: int = 10, rir_scp: str | None = None, rir_apply_prob: float = 1.0, noise_info: List[Tuple[float, str, Tuple[int, int], Tuple[float, float]]] | None = None, noise_apply_prob: float = 1.0, short_noise_thres: float = 0.5)
Bases: CommonPreprocessor
Preprocessor for speaker tasks.
- Parameters:
train (bool) – Whether to run in training mode.
spk2utt (str) – Path to the spk2utt file.
target_duration (float) – Target duration of each audio segment, in seconds.
sample_rate (int) – Sampling rate in Hz.
num_eval (int) – Number of utterances to be used for evaluation.
rir_scp (str) – Path to the RIR scp file.
rir_apply_prob (float) – Probability of applying RIR.
noise_info (List[Tuple[float, str, Tuple[int, int], Tuple[float, float]]]) –
List of tuples of noise information. Each tuple represents a noise type. Each tuple consists of (prob, noise_scp, num_to_mix, db_range).
- prob (float) is the probability of applying the noise type.
- noise_scp (str) is the path to the noise scp file.
- num_to_mix (Tuple[int, int]) is the range of the number of noises to be mixed.
- db_range (Tuple[float, float]) is the range of noise levels in dB.
noise_apply_prob (float) – Probability of applying noise.
short_noise_thres (float) – Threshold of short noise.
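As an illustration of how target_duration, sample_rate, train, and num_eval can interact, here is a minimal NumPy sketch; it is not ESPnet's implementation. The helper crop_segments is hypothetical, and the sketch assumes that training draws one randomly placed segment, that evaluation takes num_eval evenly spaced segments from the input waveform, and that clips shorter than the target are wrap-padded.

```python
from typing import Optional

import numpy as np


def crop_segments(
    audio: np.ndarray,
    target_duration: float,
    sample_rate: int = 16000,
    train: bool = True,
    num_eval: int = 10,
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    """Cut fixed-length segment(s) of target_duration seconds from a 1-D waveform.

    Hypothetical helper for illustration only.
    """
    rng = rng if rng is not None else np.random.default_rng()
    target_len = int(target_duration * sample_rate)  # seconds -> samples

    # Assumption: a clip shorter than the target is extended by wrap-around padding.
    if len(audio) < target_len:
        audio = np.pad(audio, (0, target_len - len(audio)), mode="wrap")

    max_start = len(audio) - target_len
    if train:
        # Training: a single segment at a random offset, shape (target_len,).
        start = int(rng.integers(0, max_start + 1))
        return audio[start : start + target_len]

    # Evaluation (assumption): num_eval evenly spaced segments,
    # stacked into shape (num_eval, target_len).
    starts = np.linspace(0, max_start, num=num_eval).astype(int)
    return np.stack([audio[s : s + target_len] for s in starts])
```

For example, crop_segments(np.zeros(48000), 2.0, train=False, num_eval=5) returns an array of shape (5, 32000): five 2-second windows at 16 kHz, spread evenly over a 3-second input.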
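The role of rir_apply_prob can be sketched as follows, assuming the room impulse responses listed in rir_scp have already been loaded into NumPy arrays. maybe_apply_rir is a hypothetical helper; the unit-energy normalization of the RIR and the truncation of the output to the input length are assumptions of this sketch, not documented behavior of SpkPreprocessor.

```python
from typing import Optional, Sequence

import numpy as np


def maybe_apply_rir(
    speech: np.ndarray,
    rirs: Sequence[np.ndarray],
    rir_apply_prob: float = 1.0,
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    """Reverberate speech with a randomly chosen RIR with probability rir_apply_prob."""
    rng = rng if rng is not None else np.random.default_rng()
    # rng.random() lies in [0, 1), so prob 1.0 always applies and 0.0 never does.
    if len(rirs) == 0 or rng.random() >= rir_apply_prob:
        return speech

    rir = np.asarray(rirs[int(rng.integers(len(rirs)))], dtype=np.float64)
    rir = rir / (np.linalg.norm(rir) + 1e-10)  # assumption: unit-energy RIR
    # Full linear convolution, truncated so the output keeps the input length.
    return np.convolve(speech, rir, mode="full")[: len(speech)]
```

Because the decision is a single Bernoulli draw per call, an rir_apply_prob of 0.3 means roughly 30% of training samples are reverberated.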
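To make the shape of noise_info concrete, the sketch below defines an example value and one way it could drive noise mixing. The scp paths and numbers are placeholders, and mix_noise and its noise_bank argument (noise clips pre-loaded per scp file) are hypothetical. The sketch assumes that noise_apply_prob gates noise addition as a whole, that each prob weights the choice of a single noise type, and that db_range is a signal-to-noise ratio range in dB.

```python
from typing import Dict, List, Optional, Sequence, Tuple

import numpy as np

NoiseInfo = List[Tuple[float, str, Tuple[int, int], Tuple[float, float]]]

# Hypothetical configuration: (prob, noise_scp, num_to_mix, db_range) per noise type.
example_noise_info: NoiseInfo = [
    (0.5, "data/noise_music.scp", (1, 1), (5.0, 15.0)),
    (0.5, "data/noise_babble.scp", (3, 7), (13.0, 20.0)),
]


def mix_noise(
    speech: np.ndarray,
    noise_bank: Dict[str, Sequence[np.ndarray]],
    noise_info: NoiseInfo,
    noise_apply_prob: float = 1.0,
    rng: Optional[np.random.Generator] = None,
) -> np.ndarray:
    """Add noise from one randomly chosen noise type (illustrative sketch)."""
    rng = rng if rng is not None else np.random.default_rng()
    # Overall gate: skip noise entirely with probability 1 - noise_apply_prob.
    if len(noise_info) == 0 or rng.random() >= noise_apply_prob:
        return speech

    # Assumption: each prob weights the choice of a single noise type.
    probs = np.array([info[0] for info in noise_info], dtype=np.float64)
    idx = int(rng.choice(len(noise_info), p=probs / probs.sum()))
    _, noise_scp, (n_min, n_max), (db_low, db_high) = noise_info[idx]

    clips = noise_bank[noise_scp]
    n_mix = int(rng.integers(n_min, n_max + 1))  # num_to_mix is inclusive here
    speech_power = np.mean(speech.astype(np.float64) ** 2) + 1e-10
    out = speech.astype(np.float64)  # astype returns a new array

    for clip_idx in rng.choice(len(clips), size=n_mix, replace=True):
        # Repeat or trim the clip to the speech length.
        noise = np.resize(np.asarray(clips[clip_idx], dtype=np.float64), speech.shape)
        noise_power = np.mean(noise**2) + 1e-10
        snr_db = rng.uniform(db_low, db_high)  # assumption: db_range is an SNR range
        # Scale so that 10 * log10(speech_power / scaled_noise_power) == snr_db.
        scale = np.sqrt(speech_power / (noise_power * 10.0 ** (snr_db / 10.0)))
        out += scale * noise
    return out
```

Drawing the SNR per mixed clip, rather than once per sample, is a design choice of this sketch; with num_to_mix greater than 1 the combined noise ends up louder than any single drawn SNR suggests.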