espnet2.train package

espnet2.train.iterable_dataset

Iterable dataset module.

class espnet2.train.iterable_dataset.IterableESPnetDataset(path_name_type_list: Collection[Tuple[str, str, str]], preprocess: Optional[Callable[[str, Dict[str, numpy.ndarray]], Dict[str, numpy.ndarray]]] = None, float_dtype: str = 'float32', int_dtype: str = 'long', key_file: Optional[str] = None)[source]

Bases: torch.utils.data.dataset.IterableDataset

PyTorch Dataset class for ESPnet.

Examples

>>> dataset = IterableESPnetDataset([('wav.scp', 'input', 'sound'),
...                                  ('token_int', 'output', 'text_int')],
...                                )
>>> for uid, data in dataset:
...     data
{'input': per_utt_array, 'output': per_utt_array}
has_name(name) → bool[source]
names() → Tuple[str, ...][source]
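
The preprocess hook transforms each example on the fly as it is yielded. A minimal sketch, assuming wav.scp and token_int exist on disk; the peak normalization inside preprocess is purely illustrative:

>>> import numpy as np
>>> from espnet2.train.iterable_dataset import IterableESPnetDataset
>>> def preprocess(uid, data):
...     # Illustrative only: peak-normalize the input waveform
...     peak = max(float(np.abs(data['input']).max()), 1e-8)
...     data['input'] = data['input'] / peak
...     return data
>>> dataset = IterableESPnetDataset([('wav.scp', 'input', 'sound'),
...                                  ('token_int', 'output', 'text_int')],
...                                 preprocess=preprocess)
>>> for uid, data in dataset:
...     pass  # data['input'] is now scaled to [-1, 1]
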
espnet2.train.iterable_dataset.load_kaldi(input)[source]

espnet2.train.uasr_trainer

Trainer module for GAN-based UASR training.

class espnet2.train.uasr_trainer.UASRTrainer[source]

Bases: espnet2.train.trainer.Trainer

Trainer for GAN-based UASR training.

If you’d like to use this trainer, the model must inherit espnet2.train.abs_gan_espnet_model.AbsGANESPnetModel.

classmethod add_arguments(parser: argparse.ArgumentParser)[source]

Add additional arguments for GAN-trainer.

classmethod build_options(args: argparse.Namespace) → espnet2.train.trainer.TrainerOptions[source]

Build options consumed by train(), eval(), and plot_attention().

classmethod train_one_epoch(model: torch.nn.modules.module.Module, iterator: Iterable[Tuple[List[str], Dict[str, torch.Tensor]]], optimizers: Sequence[torch.optim.optimizer.Optimizer], schedulers: Sequence[Optional[espnet2.schedulers.abs_scheduler.AbsScheduler]], scaler: Optional[torch.cuda.amp.grad_scaler.GradScaler], reporter: espnet2.train.reporter.SubReporter, summary_writer, options: espnet2.train.uasr_trainer.UASRTrainerOptions, distributed_option: espnet2.train.distributed_utils.DistributedOption) → bool[source]

Train one epoch for UASR.

classmethod validate_one_epoch(model: torch.nn.modules.module.Module, iterator: Iterable[Dict[str, torch.Tensor]], reporter: espnet2.train.reporter.SubReporter, options: espnet2.train.uasr_trainer.UASRTrainerOptions, distributed_option: espnet2.train.distributed_utils.DistributedOption) → None[source]

Validate one epoch.

class espnet2.train.uasr_trainer.UASRTrainerOptions(ngpu: int, resume: bool, use_amp: bool, train_dtype: str, grad_noise: bool, accum_grad: int, grad_clip: float, grad_clip_type: float, log_interval: Optional[int], no_forward_run: bool, use_matplotlib: bool, use_tensorboard: bool, use_wandb: bool, adapter: str, use_adapter: bool, save_strategy: str, output_dir: Union[pathlib.Path, str], max_epoch: int, seed: int, sharded_ddp: bool, patience: Optional[int], keep_nbest_models: Union[int, List[int]], nbest_averaging_interval: int, early_stopping_criterion: Sequence[str], best_model_criterion: Sequence[Sequence[str]], val_scheduler_criterion: Sequence[str], unused_parameters: bool, wandb_model_log_interval: int, create_graph_in_tensorboard: bool, generator_first: bool, max_num_warning: int)[source]

Bases: espnet2.train.trainer.TrainerOptions

Trainer option dataclass for UASRTrainer.

espnet2.train.gan_trainer

Trainer module for GAN-based training.

class espnet2.train.gan_trainer.GANTrainer[source]

Bases: espnet2.train.trainer.Trainer

Trainer for GAN-based training.

If you’d like to use this trainer, the model must inherit espnet2.train.abs_gan_espnet_model.AbsGANESPnetModel.

classmethod add_arguments(parser: argparse.ArgumentParser)[source]

Add additional arguments for GAN-trainer.

classmethod build_options(args: argparse.Namespace) → espnet2.train.trainer.TrainerOptions[source]

Build options consumed by train(), eval(), and plot_attention().

classmethod train_one_epoch(model: torch.nn.modules.module.Module, iterator: Iterable[Tuple[List[str], Dict[str, torch.Tensor]]], optimizers: Sequence[torch.optim.optimizer.Optimizer], schedulers: Sequence[Optional[espnet2.schedulers.abs_scheduler.AbsScheduler]], scaler: Optional[torch.cuda.amp.grad_scaler.GradScaler], reporter: espnet2.train.reporter.SubReporter, summary_writer, options: espnet2.train.gan_trainer.GANTrainerOptions, distributed_option: espnet2.train.distributed_utils.DistributedOption) → bool[source]

Train one epoch.

classmethod validate_one_epoch(model: torch.nn.modules.module.Module, iterator: Iterable[Dict[str, torch.Tensor]], reporter: espnet2.train.reporter.SubReporter, options: espnet2.train.gan_trainer.GANTrainerOptions, distributed_option: espnet2.train.distributed_utils.DistributedOption) → None[source]

Validate one epoch.

class espnet2.train.gan_trainer.GANTrainerOptions(ngpu: int, resume: bool, use_amp: bool, train_dtype: str, grad_noise: bool, accum_grad: int, grad_clip: float, grad_clip_type: float, log_interval: Optional[int], no_forward_run: bool, use_matplotlib: bool, use_tensorboard: bool, use_wandb: bool, adapter: str, use_adapter: bool, save_strategy: str, output_dir: Union[pathlib.Path, str], max_epoch: int, seed: int, sharded_ddp: bool, patience: Optional[int], keep_nbest_models: Union[int, List[int]], nbest_averaging_interval: int, early_stopping_criterion: Sequence[str], best_model_criterion: Sequence[Sequence[str]], val_scheduler_criterion: Sequence[str], unused_parameters: bool, wandb_model_log_interval: int, create_graph_in_tensorboard: bool, generator_first: bool)[source]

Bases: espnet2.train.trainer.TrainerOptions

Trainer option dataclass for GANTrainer.

espnet2.train.collate_fn

class espnet2.train.collate_fn.CommonCollateFn(float_pad_value: Union[float, int] = 0.0, int_pad_value: int = -32768, not_sequence: Collection[str] = ())[source]

Bases: object

Functor class of common_collate_fn()
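
A sketch of the functor in use: it stores the padding options and applies common_collate_fn() (documented below) when called on a mini-batch of (uid, dict) pairs. The toy arrays are illustrative:

>>> import numpy as np
>>> from espnet2.train.collate_fn import CommonCollateFn
>>> collate_fn = CommonCollateFn(float_pad_value=0.0, int_pad_value=-1)
>>> batch = [('utt1', {'speech': np.zeros(16000, dtype=np.float32)}),
...          ('utt2', {'speech': np.zeros(8000, dtype=np.float32)})]
>>> utt_ids, tensors = collate_fn(batch)
>>> tensors['speech'].shape   # padded to the longest utterance
torch.Size([2, 16000])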

class espnet2.train.collate_fn.HuBERTCollateFn(float_pad_value: Union[float, int] = 0.0, int_pad_value: int = -32768, label_downsampling: int = 1, pad: bool = False, rand_crop: bool = True, crop_audio: bool = True, not_sequence: Collection[str] = (), window_size: float = 25, window_shift: float = 20, sample_rate: float = 16)[source]

Bases: espnet2.train.collate_fn.CommonCollateFn

Functor class of common_collate_fn(), extended for HuBERT pretraining (e.g. optional audio cropping and label downsampling; see the constructor arguments above).

espnet2.train.collate_fn.common_collate_fn(data: Collection[Tuple[str, Dict[str, numpy.ndarray]]], float_pad_value: Union[float, int] = 0.0, int_pad_value: int = -32768, not_sequence: Collection[str] = ()) → Tuple[List[str], Dict[str, torch.Tensor]][source]

Concatenate a list of ndarrays into a single padded array and convert it to torch.Tensor.

Examples

>>> from espnet2.samplers.constant_batch_sampler import ConstantBatchSampler
>>> from espnet2.train.dataset import ESPnetDataset
>>> sampler = ConstantBatchSampler(...)
>>> dataset = ESPnetDataset(...)
>>> keys = next(iter(sampler))
>>> batch = [dataset[key] for key in keys]
>>> utt_ids, batch = common_collate_fn(batch)
>>> model(**batch)

Note that the dict keys of the batch are propagated from those of the dataset as they are.
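
To make the note concrete, a sketch with toy arrays: in addition to propagating the keys, the function emits a *_lengths entry for each sequence key, and pads with the given pad values:

>>> import numpy as np
>>> from espnet2.train.collate_fn import common_collate_fn
>>> batch = [('utt1', {'text': np.array([3, 5, 7])}),
...          ('utt2', {'text': np.array([2, 4])})]
>>> utt_ids, tensors = common_collate_fn(batch, int_pad_value=-1)
>>> tensors['text']
tensor([[ 3,  5,  7],
        [ 2,  4, -1]])
>>> tensors['text_lengths']
tensor([3, 2])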

espnet2.train.reporter

Reporter module.

class espnet2.train.reporter.Average(value: Union[float, int, complex, torch.Tensor, numpy.ndarray])[source]

Bases: espnet2.train.reporter.ReportedValue

class espnet2.train.reporter.ReportedValue[source]

Bases: object

class espnet2.train.reporter.Reporter(epoch: int = 0)[source]

Bases: object

Reporter class.

Examples

>>> reporter = Reporter()
>>> with reporter.observe('train') as sub_reporter:
...     for batch in iterator:
...         stats = dict(loss=0.2)
...         sub_reporter.register(stats)
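
A fuller toy sketch of a multi-epoch loop, using only the methods listed below; the statistics are fabricated:

>>> from espnet2.train.reporter import Reporter
>>> reporter = Reporter()
>>> for epoch in range(1, 4):
...     reporter.set_epoch(epoch)
...     with reporter.observe('train') as sub_reporter:
...         for _step in range(10):
...             sub_reporter.register({'loss': 1.0 / epoch})  # toy statistic
...             sub_reporter.next()  # close up this step
>>> # Epochs sorted by the best (smallest) training loss:
>>> best_epoch, best_value = reporter.sort_epochs_and_values('train', 'loss', 'min')[0]
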
check_early_stopping(patience: int, key1: str, key2: str, mode: str, epoch: int = None, logger=None) → bool[source]
finish_epoch(sub_reporter: espnet2.train.reporter.SubReporter) → None[source]
get_all_keys(epoch: int = None) → Tuple[Tuple[str, str], ...][source]
get_best_epoch(key: str, key2: str, mode: str, nbest: int = 0) → int[source]
get_epoch() → int[source]
get_keys(epoch: int = None) → Tuple[str, ...][source]

Returns keys1, e.g. ('train', 'eval').

get_keys2(key: str, epoch: int = None) → Tuple[str, ...][source]

Returns keys2, e.g. ('loss', 'acc').

get_value(key: str, key2: str, epoch: int = None)[source]
has(key: str, key2: str, epoch: int = None) → bool[source]
load_state_dict(state_dict: dict)[source]
log_message(epoch: int = None) → str[source]
matplotlib_plot(output_dir: Union[str, pathlib.Path])[source]

Plot stats using Matplotlib and save images.

observe(key: str, epoch: int = None) → AbstractContextManager[espnet2.train.reporter.SubReporter][source]
set_epoch(epoch: int) → None[source]
sort_epochs(key: str, key2: str, mode: str) → List[int][source]
sort_epochs_and_values(key: str, key2: str, mode: str) → List[Tuple[int, float]][source]

Return the epochs and their values, sorted from the best value to the worst.

Example

>>> val = reporter.sort_epochs_and_values('eval', 'loss', 'min')
>>> e_1best, v_1best = val[0]
>>> e_2best, v_2best = val[1]
sort_values(key: str, key2: str, mode: str) → List[float][source]
start_epoch(key: str, epoch: int = None) → espnet2.train.reporter.SubReporter[source]
state_dict()[source]
tensorboard_add_scalar(summary_writer, epoch: int = None, key1: Optional[str] = None)[source]
wandb_log(epoch: int = None)[source]
class espnet2.train.reporter.SubReporter(key: str, epoch: int, total_count: int)[source]

Bases: object

This class is used in Reporter.

See the docstring of Reporter for the usage.

finished() → None[source]
get_epoch() → int[source]
get_total_count() → int[source]

Returns the number of iterations over all epochs.

log_message(start: int = None, end: int = None) → str[source]
measure_iter_time(iterable, name: str)[source]
measure_time(name: str)[source]
next()[source]

Close up this step and reset the state for the next step.

register(stats: Dict[str, Union[float, int, complex, torch.Tensor, numpy.ndarray, Dict[str, Union[float, int, complex, torch.Tensor, numpy.ndarray]], None]], weight: Union[float, int, complex, torch.Tensor, numpy.ndarray, None] = None) → None[source]
tensorboard_add_scalar(summary_writer, start: int = None)[source]
wandb_log(start: int = None)[source]
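
A sketch of the two timing helpers, assuming measure_time() is a context manager and measure_iter_time() wraps an iterable, each registering the elapsed time under the given name:

>>> import time
>>> from espnet2.train.reporter import Reporter
>>> reporter = Reporter()
>>> with reporter.observe('train') as sub_reporter:
...     # Reports the time spent pulling each item from the iterable
...     for _ in sub_reporter.measure_iter_time(range(3), name='iter_time'):
...         # Reports the duration of the enclosed block
...         with sub_reporter.measure_time('work_time'):
...             time.sleep(0.01)
...         sub_reporter.next()
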
class espnet2.train.reporter.WeightedAverage(value: Tuple[Union[float, int, complex, torch.Tensor, numpy.ndarray], Union[float, int, complex, torch.Tensor, numpy.ndarray]], weight: Union[float, int, complex, torch.Tensor, numpy.ndarray])[source]

Bases: espnet2.train.reporter.ReportedValue

espnet2.train.reporter.aggregate(values: Sequence[ReportedValue]) → Union[float, int, complex, torch.Tensor, numpy.ndarray][source]
espnet2.train.reporter.to_reported_value(v: Union[float, int, complex, torch.Tensor, numpy.ndarray], weight: Union[float, int, complex, torch.Tensor, numpy.ndarray, None] = None) → espnet2.train.reporter.ReportedValue[source]
espnet2.train.reporter.wandb_get_prefix(key: str)[source]

espnet2.train.abs_espnet_model

class espnet2.train.abs_espnet_model.AbsESPnetModel(*args, **kwargs)[source]

Bases: torch.nn.modules.module.Module, abc.ABC

The common abstract class among all tasks.

“ESPnetModel” refers to a class that inherits torch.nn.Module, holds the DNN models as its member fields and forwards to them (a.k.a. the delegate pattern), and defines “loss”, “stats”, and “weight” for the task.

If you intend to implement a new task in ESPnet, the model must inherit this class. In other words, the only “mediator” objects between our training system and your task class are these three values: loss, stats, and weight.

Example

>>> from espnet2.tasks.abs_task import AbsTask
>>> class YourESPnetModel(AbsESPnetModel):
...     def forward(self, input, input_lengths):
...         ...
...         return loss, stats, weight
>>> class YourTask(AbsTask):
...     @classmethod
...     def build_model(cls, args: argparse.Namespace) -> YourESPnetModel:
...         ...

Initializes internal Module state, shared by both nn.Module and ScriptModule.

abstract collect_feats(**batch) → Dict[str, torch.Tensor][source]
abstract forward(**batch) → Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
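
A minimal sketch of a conforming model; the names speech/speech_lengths, the toy loss, and the batch-size weight are illustrative choices, not part of the interface:

>>> import torch
>>> from espnet2.train.abs_espnet_model import AbsESPnetModel
>>> class ToyESPnetModel(AbsESPnetModel):
...     def __init__(self, idim=80):
...         super().__init__()
...         self.linear = torch.nn.Linear(idim, 1)
...
...     def forward(self, speech, speech_lengths):
...         # The three "mediator" values: loss, stats, weight
...         loss = self.linear(speech).pow(2).mean()
...         stats = {'loss': loss.detach()}
...         weight = speech.new_tensor(speech.size(0))  # e.g. the batch size
...         return loss, stats, weight
...
...     def collect_feats(self, speech, speech_lengths, **kwargs):
...         return {'feats': speech, 'feats_lengths': speech_lengths}
>>> model = ToyESPnetModel()
>>> loss, stats, weight = model(torch.randn(2, 80), torch.tensor([80, 80]))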

espnet2.train.preprocessor

class espnet2.train.preprocessor.AbsPreprocessor(train: bool)[source]

Bases: abc.ABC

class espnet2.train.preprocessor.CommonPreprocessor(train: bool, use_lang_prompt: bool = False, use_nlp_prompt: bool = False, token_type: Optional[str] = None, token_list: Union[pathlib.Path, str, Iterable[str]] = None, bpemodel: Union[pathlib.Path, str, Iterable[str]] = None, text_cleaner: Collection[str] = None, g2p_type: Optional[str] = None, unk_symbol: str = '<unk>', space_symbol: str = '<space>', non_linguistic_symbols: Union[pathlib.Path, str, Iterable[str]] = None, delimiter: Optional[str] = None, rir_scp: Optional[str] = None, rir_apply_prob: float = 1.0, noise_scp: Optional[str] = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, aux_task_names: Collection[str] = None, speech_volume_normalize: float = None, speech_name: str = 'speech', text_name: str = 'text', fs: int = 0, nonsplit_symbol: Iterable[str] = None, data_aug_effects: List = None, data_aug_num: List[int] = [1, 1], data_aug_prob: float = 0.0, whisper_language: Optional[str] = None, whisper_task: Optional[str] = None)[source]

Bases: espnet2.train.preprocessor.AbsPreprocessor

class espnet2.train.preprocessor.CommonPreprocessor_multi(train: bool, use_lang_prompt: bool = False, use_nlp_prompt: bool = False, token_type: Optional[str] = None, token_list: Union[pathlib.Path, str, Iterable[str]] = None, bpemodel: Union[pathlib.Path, str, Iterable[str]] = None, text_cleaner: Collection[str] = None, g2p_type: Optional[str] = None, unk_symbol: str = '<unk>', space_symbol: str = '<space>', non_linguistic_symbols: Union[pathlib.Path, str, Iterable[str]] = None, delimiter: Optional[str] = None, rir_scp: Optional[str] = None, rir_apply_prob: float = 1.0, noise_scp: Optional[str] = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, aux_task_names: Collection[str] = None, speech_volume_normalize: float = None, speech_name: str = 'speech', text_name: List[str] = ['text'], fs: int = 0, speaker_change_symbol: Iterable[str] = None, data_aug_effects: List = None, data_aug_num: List[int] = [1, 1], data_aug_prob: float = 0.0, whisper_language: Optional[str] = None, whisper_task: Optional[str] = None)[source]

Bases: espnet2.train.preprocessor.CommonPreprocessor

class espnet2.train.preprocessor.DynamicMixingPreprocessor(train: bool, source_scp: Optional[str] = None, ref_num: int = 2, dynamic_mixing_gain_db: float = 0.0, speech_name: str = 'speech_mix', speech_ref_name_prefix: str = 'speech_ref', mixture_source_name: Optional[str] = None, utt2spk: Optional[str] = None, categories: Optional[List] = None)[source]

Bases: espnet2.train.preprocessor.AbsPreprocessor

class espnet2.train.preprocessor.EnhPreprocessor(train: bool, rir_scp: Optional[str] = None, rir_apply_prob: float = 1.0, noise_scp: Optional[str] = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, speech_volume_normalize: float = None, speech_name: str = 'speech_mix', speech_ref_name_prefix: str = 'speech_ref', noise_ref_name_prefix: str = 'noise_ref', dereverb_ref_name_prefix: str = 'dereverb_ref', use_reverberant_ref: bool = False, num_spk: int = 1, num_noise_type: int = 1, sample_rate: int = 8000, force_single_channel: bool = False, channel_reordering: bool = False, categories: Optional[List] = None, data_aug_effects: List = None, data_aug_num: List[int] = [1, 1], data_aug_prob: float = 0.0, speech_segment: Optional[int] = None, avoid_allzero_segment: bool = True, flexible_numspk: bool = False)[source]

Bases: espnet2.train.preprocessor.CommonPreprocessor

Preprocessor for Speech Enhancement (Enh) task.

class espnet2.train.preprocessor.MutliTokenizerCommonPreprocessor(train: bool, token_type: List[str] = [None], token_list: List[Union[pathlib.Path, str, Iterable[str]]] = [None], bpemodel: List[Union[pathlib.Path, str, Iterable[str]]] = [None], text_cleaner: Collection[str] = None, g2p_type: Union[List[str], str] = None, unk_symbol: str = '<unk>', space_symbol: str = '<space>', non_linguistic_symbols: Union[pathlib.Path, str, Iterable[str]] = None, delimiter: Optional[str] = None, rir_scp: Optional[str] = None, rir_apply_prob: float = 1.0, noise_scp: Optional[str] = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, speech_volume_normalize: float = None, speech_name: str = 'speech', text_name: List[str] = ['text'], tokenizer_encode_conf: List[Dict] = [{}, {}], fs: int = 0, data_aug_effects: List = None, data_aug_num: List[int] = [1, 1], data_aug_prob: float = 0.0, whisper_language: List[str] = None, whisper_task: Optional[str] = None)[source]

Bases: espnet2.train.preprocessor.CommonPreprocessor

class espnet2.train.preprocessor.S2TPreprocessor(train: bool, token_type: Optional[str] = None, token_list: Union[pathlib.Path, str, Iterable[str]] = None, bpemodel: Union[pathlib.Path, str, Iterable[str]] = None, text_cleaner: Collection[str] = None, g2p_type: Optional[str] = None, unk_symbol: str = '<unk>', space_symbol: str = '<space>', non_linguistic_symbols: Union[pathlib.Path, str, Iterable[str]] = None, delimiter: Optional[str] = None, rir_scp: Optional[str] = None, rir_apply_prob: float = 1.0, noise_scp: Optional[str] = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, speech_volume_normalize: float = None, speech_name: str = 'speech', text_name: str = 'text', text_prev_name: str = 'text_prev', text_ctc_name: str = 'text_ctc', fs: int = 16000, na_symbol: str = '<na>', speech_length: float = 30, speech_resolution: float = 0.02, speech_init_silence: float = 1.0, text_prev_apply_prob: float = 0.5, time_apply_prob: float = 0.5, notime_symbol: str = '<notimestamps>', first_time_symbol: str = '<0.00>', last_time_symbol: str = '<30.00>')[source]

Bases: espnet2.train.preprocessor.CommonPreprocessor

class espnet2.train.preprocessor.SLUPreprocessor(train: bool, token_type: Optional[str] = None, token_list: Union[pathlib.Path, str, Iterable[str]] = None, transcript_token_list: Union[pathlib.Path, str, Iterable[str]] = None, bpemodel: Union[pathlib.Path, str, Iterable[str]] = None, text_cleaner: Collection[str] = None, g2p_type: Optional[str] = None, unk_symbol: str = '<unk>', space_symbol: str = '<space>', non_linguistic_symbols: Union[pathlib.Path, str, Iterable[str]] = None, delimiter: Optional[str] = None, rir_scp: Optional[str] = None, rir_apply_prob: float = 1.0, noise_scp: Optional[str] = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, speech_volume_normalize: float = None, speech_name: str = 'speech', text_name: str = 'text', fs: int = 0, data_aug_effects: List = None, data_aug_num: List[int] = [1, 1], data_aug_prob: float = 0.0)[source]

Bases: espnet2.train.preprocessor.CommonPreprocessor

class espnet2.train.preprocessor.SVSPreprocessor(train: bool, token_type: Optional[str] = None, token_list: Union[pathlib.Path, str, Iterable[str]] = None, bpemodel: Union[pathlib.Path, str, Iterable[str]] = None, text_cleaner: Collection[str] = None, g2p_type: Optional[str] = None, unk_symbol: str = '<unk>', space_symbol: str = '<space>', non_linguistic_symbols: Union[pathlib.Path, str, Iterable[str]] = None, delimiter: Optional[str] = None, singing_volume_normalize: float = None, singing_name: str = 'singing', text_name: str = 'text', label_name: str = 'label', midi_name: str = 'score', fs: numpy.int32 = 0, hop_length: numpy.int32 = 256, phn_seg: dict = {1: [1], 2: [0.25, 1], 3: [0.1, 0.5, 1], 4: [0.05, 0.1, 0.5, 1]})[source]

Bases: espnet2.train.preprocessor.AbsPreprocessor

Preprocessor for the Singing Voice Synthesis (SVS) task.

class espnet2.train.preprocessor.SpkPreprocessor(train: bool, target_duration: float, spk2utt: Optional[str] = None, sample_rate: int = 16000, num_eval: int = 10, rir_scp: Optional[str] = None, rir_apply_prob: float = 1.0, noise_info: List[Tuple[float, str, Tuple[int, int], Tuple[float, float]]] = None, noise_apply_prob: float = 1.0, short_noise_thres: float = 0.5)[source]

Bases: espnet2.train.preprocessor.CommonPreprocessor

Preprocessor for Speaker tasks.

Parameters:
  • train (bool) – Whether to use in training mode.

  • spk2utt (str) – Path to the spk2utt file.

  • target_duration (float) – Target duration in seconds.

  • sample_rate (int) – Sampling rate.

  • num_eval (int) – Number of utterances to be used for evaluation.

  • rir_scp (str) – Path to the RIR scp file.

  • rir_apply_prob (float) – Probability of applying RIR.

  • noise_info (List[Tuple[float, str, Tuple[int, int], Tuple[float, float]]]) –

    List of tuples of noise information. Each tuple represents a noise type and consists of (prob, noise_scp, num_to_mix, db_range); see the sketch after this list.

    • prob (float) is the probability of applying the noise type.

    • noise_scp (str) is the path to the noise scp file.

    • num_to_mix (Tuple[int, int]) is the range of the number of noises to be mixed.

    • db_range (Tuple[float, float]) is the range of noise levels in dB.

  • noise_apply_prob (float) – Probability of applying noise.

  • short_noise_thres (float) – Threshold of short noise.
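
A sketch of how the noise_info tuples documented above might be assembled; the scp paths and spk2utt file are placeholders and must exist on disk:

>>> from espnet2.train.preprocessor import SpkPreprocessor
>>> noise_info = [
...     # (prob, noise_scp, num_to_mix, db_range)
...     (0.5, 'musan_music.scp', (1, 1), (5.0, 15.0)),
...     (0.5, 'musan_babble.scp', (3, 7), (13.0, 20.0)),
... ]
>>> preprocessor = SpkPreprocessor(
...     train=True,
...     target_duration=3.0,   # seconds
...     spk2utt='spk2utt',     # placeholder path
...     sample_rate=16000,
...     noise_info=noise_info,
...     noise_apply_prob=0.8,
... )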

class espnet2.train.preprocessor.TSEPreprocessor(train: bool, train_spk2enroll: Optional[str] = None, enroll_segment: int = None, load_spk_embedding: bool = False, load_all_speakers: bool = False, rir_scp: Optional[str] = None, rir_apply_prob: float = 1.0, noise_scp: Optional[str] = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, speech_volume_normalize: float = None, speech_name: str = 'speech_mix', speech_ref_name_prefix: str = 'speech_ref', noise_ref_name_prefix: str = 'noise_ref', dereverb_ref_name_prefix: str = 'dereverb_ref', use_reverberant_ref: bool = False, num_spk: int = 1, num_noise_type: int = 1, sample_rate: int = 8000, force_single_channel: bool = False, channel_reordering: bool = False, categories: Optional[List] = None, data_aug_effects: List = None, data_aug_num: List[int] = [1, 1], data_aug_prob: float = 0.0, speech_segment: Optional[int] = None, avoid_allzero_segment: bool = True, flexible_numspk: bool = False)[source]

Bases: espnet2.train.preprocessor.EnhPreprocessor

Preprocessor for Target Speaker Extraction.

espnet2.train.preprocessor.any_allzero(signal)[source]
espnet2.train.preprocessor.detect_non_silence(x: numpy.ndarray, threshold: float = 0.01, frame_length: int = 1024, frame_shift: int = 512, window: str = 'boxcar') → numpy.ndarray[source]

Power based voice activity detection.

Parameters:

x – (Channel, Time)

>>> import numpy as np
>>> x = np.random.randn(1000)
>>> detect = detect_non_silence(x)
>>> assert x.shape == detect.shape
>>> assert detect.dtype == np.bool_
espnet2.train.preprocessor.framing(x, frame_length: int = 512, frame_shift: int = 256, centered: bool = True, padded: bool = True)[source]

espnet2.train.abs_gan_espnet_model

ESPnetModel abstract class for GAN-based training.

class espnet2.train.abs_gan_espnet_model.AbsGANESPnetModel(*args, **kwargs)[source]

Bases: espnet2.train.abs_espnet_model.AbsESPnetModel, torch.nn.modules.module.Module, abc.ABC

The common abstract class among all GAN-based tasks.

“ESPnetModel” refers to a class that inherits torch.nn.Module, holds the DNN models as its member fields and forwards to them (a.k.a. the delegate pattern). Its “forward” must accept the argument “forward_generator” and return a dict of “loss”, “stats”, “weight”, and “optim_idx”. “optim_idx” must be 0 for the generator and 1 for the discriminator.

Example

>>> from espnet2.tasks.abs_task import AbsTask
>>> class YourESPnetModel(AbsGANESPnetModel):
...     def forward(self, input, input_lengths, forward_generator=True):
...         ...
...         if forward_generator:
...             # return loss for the generator
...             # optim idx 0 indicates generator optimizer
...             return dict(loss=loss, stats=stats, weight=weight, optim_idx=0)
...         else:
...             # return loss for the discriminator
...             # optim idx 1 indicates discriminator optimizer
...             return dict(loss=loss, stats=stats, weight=weight, optim_idx=1)
>>> class YourTask(AbsTask):
...     @classmethod
...     def build_model(cls, args: argparse.Namespace) -> YourESPnetModel:
...         ...

Initializes internal Module state, shared by both nn.Module and ScriptModule.

abstract collect_feats(**batch) → Dict[str, torch.Tensor][source]
abstract forward(forward_generator: bool = True, **batch) → Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor], int]][source]

Return the generator loss or the discriminator loss.

This method must have the argument “forward_generator” to switch between the generator loss calculation and the discriminator loss calculation. If forward_generator is true, return the generator loss with optim_idx 0. If forward_generator is false, return the discriminator loss with optim_idx 1.

Parameters:

forward_generator (bool) – Whether to return the generator loss or the discriminator loss. This argument must have a default value.

Returns:

  • loss (Tensor): Loss scalar tensor.

  • stats (Dict[str, float]): Statistics to be monitored.

  • weight (Tensor): Weight tensor to summarize losses.

  • optim_idx (int): Optimizer index (0 for G and 1 for D).

Return type:

Dict[str, Any]
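
A simplified sketch of the alternating update that this contract enables; this is illustrative, not the actual GANTrainer code (which additionally handles AMP, gradient accumulation, and distributed training). The model, iterator, and optimizers arguments are assumed to follow the shapes described above:

>>> def gan_steps(model, iterator, optimizers):
...     # optimizers[0] updates the generator (optim_idx 0),
...     # optimizers[1] the discriminator (optim_idx 1)
...     for batch in iterator:
...         for forward_generator in (True, False):  # cf. generator_first
...             retval = model(forward_generator=forward_generator, **batch)
...             optimizer = optimizers[retval['optim_idx']]
...             optimizer.zero_grad()
...             retval['loss'].backward()
...             optimizer.step()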

espnet2.train.trainer

Trainer module.

class espnet2.train.trainer.Trainer[source]

Bases: object

Trainer having an optimizer.

If you’d like to use multiple optimizers, inherit this class and override the methods as necessary, at least “train_one_epoch()”.

>>> class TwoOptimizerTrainer(Trainer):
...     @classmethod
...     def add_arguments(cls, parser):
...         ...
...
...     @classmethod
...     def train_one_epoch(cls, model, optimizers, ...):
...         optimizers[0].zero_grad()
...         loss1 = model.model1(...)
...         loss1.backward()
...         optimizers[0].step()
...
...         optimizers[1].zero_grad()
...         loss2 = model.model2(...)
...         loss2.backward()
...         optimizers[1].step()
classmethod add_arguments(parser: argparse.ArgumentParser)[source]

Reserved for future development of another Trainer.

classmethod build_options(args: argparse.Namespace) → espnet2.train.trainer.TrainerOptions[source]

Build options consumed by train(), eval(), and plot_attention().

classmethod plot_attention(model: torch.nn.modules.module.Module, output_dir: Optional[pathlib.Path], summary_writer, iterator: Iterable[Tuple[List[str], Dict[str, torch.Tensor]]], reporter: espnet2.train.reporter.SubReporter, options: espnet2.train.trainer.TrainerOptions) → None[source]
static resume(checkpoint: Union[str, pathlib.Path], model: torch.nn.modules.module.Module, reporter: espnet2.train.reporter.Reporter, optimizers: Sequence[torch.optim.optimizer.Optimizer], schedulers: Sequence[Optional[espnet2.schedulers.abs_scheduler.AbsScheduler]], scaler: Optional[torch.cuda.amp.grad_scaler.GradScaler], ngpu: int = 0, strict: bool = True)[source]
classmethod run(model: espnet2.train.abs_espnet_model.AbsESPnetModel, optimizers: Sequence[torch.optim.optimizer.Optimizer], schedulers: Sequence[Optional[espnet2.schedulers.abs_scheduler.AbsScheduler]], train_iter_factory: espnet2.iterators.abs_iter_factory.AbsIterFactory, valid_iter_factory: espnet2.iterators.abs_iter_factory.AbsIterFactory, plot_attention_iter_factory: Optional[espnet2.iterators.abs_iter_factory.AbsIterFactory], trainer_options, distributed_option: espnet2.train.distributed_utils.DistributedOption) → None[source]

Perform training. This method runs the main training loop.

classmethod train_one_epoch(model: torch.nn.modules.module.Module, iterator: Iterable[Tuple[List[str], Dict[str, torch.Tensor]]], optimizers: Sequence[torch.optim.optimizer.Optimizer], schedulers: Sequence[Optional[espnet2.schedulers.abs_scheduler.AbsScheduler]], scaler: Optional[torch.cuda.amp.grad_scaler.GradScaler], reporter: espnet2.train.reporter.SubReporter, summary_writer, options: espnet2.train.trainer.TrainerOptions, distributed_option: espnet2.train.distributed_utils.DistributedOption) → bool[source]
classmethod validate_one_epoch(model: torch.nn.modules.module.Module, iterator: Iterable[Dict[str, torch.Tensor]], reporter: espnet2.train.reporter.SubReporter, options: espnet2.train.trainer.TrainerOptions, distributed_option: espnet2.train.distributed_utils.DistributedOption) → None[source]
class espnet2.train.trainer.TrainerOptions(ngpu: int, resume: bool, use_amp: bool, train_dtype: str, grad_noise: bool, accum_grad: int, grad_clip: float, grad_clip_type: float, log_interval: Union[int, NoneType], no_forward_run: bool, use_matplotlib: bool, use_tensorboard: bool, use_wandb: bool, adapter: str, use_adapter: bool, save_strategy: str, output_dir: Union[pathlib.Path, str], max_epoch: int, seed: int, sharded_ddp: bool, patience: Union[int, NoneType], keep_nbest_models: Union[int, List[int]], nbest_averaging_interval: int, early_stopping_criterion: Sequence[str], best_model_criterion: Sequence[Sequence[str]], val_scheduler_criterion: Sequence[str], unused_parameters: bool, wandb_model_log_interval: int, create_graph_in_tensorboard: bool)[source]

Bases: object

espnet2.train.dataset

class espnet2.train.dataset.AbsDataset[source]

Bases: torch.utils.data.dataset.Dataset, abc.ABC

abstract has_name(name) → bool[source]
abstract names() → Tuple[str, ...][source]
class espnet2.train.dataset.AdapterForLabelScpReader(loader)[source]

Bases: collections.abc.Mapping

keys() → a set-like object providing a view on D's keys[source]
class espnet2.train.dataset.AdapterForSingingScoreScpReader(loader)[source]

Bases: collections.abc.Mapping

keys() → a set-like object providing a view on D's keys[source]
class espnet2.train.dataset.AdapterForSoundScpReader(loader, dtype=None, allow_multi_rates=False)[source]

Bases: collections.abc.Mapping

keys() → a set-like object providing a view on D's keys[source]
class espnet2.train.dataset.ESPnetDataset(path_name_type_list: Collection[Tuple[str, str, str]], preprocess: Optional[Callable[[str, Dict[str, numpy.ndarray]], Dict[str, numpy.ndarray]]] = None, float_dtype: str = 'float32', int_dtype: str = 'long', max_cache_size: Union[float, int, str] = 0.0, max_cache_fd: int = 0, allow_multi_rates: bool = False)[source]

Bases: espnet2.train.dataset.AbsDataset

PyTorch Dataset class for ESPnet.

Examples

>>> dataset = ESPnetDataset([('wav.scp', 'input', 'sound'),
...                          ('token_int', 'output', 'text_int')],
...                         )
>>> uttid, data = dataset['uttid']
>>> data
{'input': per_utt_array, 'output': per_utt_array}
has_name(name) → bool[source]
names() → Tuple[str, ...][source]
class espnet2.train.dataset.H5FileWrapper(path: str)[source]

Bases: object

espnet2.train.dataset.kaldi_loader(path, float_dtype=None, max_cache_fd: int = 0, allow_multi_rates=False)[source]
espnet2.train.dataset.label_loader(path)[source]
espnet2.train.dataset.multi_columns_sound_loader(path, float_dtype=None, allow_multi_rates=False)[source]
espnet2.train.dataset.rand_int_loader(filepath, loader_type)[source]
espnet2.train.dataset.score_loader(path)[source]
espnet2.train.dataset.sound_loader(path, float_dtype=None, multi_columns=False, allow_multi_rates=False)[source]
espnet2.train.dataset.variable_columns_sound_loader(path, float_dtype=None, allow_multi_rates=False)[source]

espnet2.train.__init__

espnet2.train.class_choices

class espnet2.train.class_choices.ClassChoices(name: str, classes: Mapping[str, Type], type_check: Optional[Type] = None, default: Optional[str] = None, optional: bool = False)[source]

Bases: object

Helper class to manage the options for variable objects and their configurations.

Example:

>>> class A:
...     def __init__(self, foo=3):  pass
>>> class B:
...     def __init__(self, bar="aaaa"):  pass
>>> choices = ClassChoices("var", dict(a=A, b=B), default="a")
>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> choices.add_arguments(parser)
>>> args = parser.parse_args(["--var", "a", "--var_conf", "foo=4"])
>>> args.var
'a'
>>> args.var_conf
{'foo': 4}
>>> class_obj = choices.get_class(args.var)
>>> a_object = class_obj(**args.var_conf)
add_arguments(parser)[source]
choices() → Tuple[Optional[str], ...][source]
get_class(name: Optional[str]) → Optional[type][source]

espnet2.train.spk_trainer

Trainer module for speaker recognition. In speaker recognition (embedding extractor training/inference), calculating the validation loss on a closed set is not informative, since generalization to unseen utterances from known speakers is good in most cases. Thus, we measure the open-set equal error rate (EER) on unknown speakers by overriding validate_one_epoch.
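
For reference, a self-contained sketch of how an equal error rate could be computed from scored trials. This mirrors the idea only, not SpkTrainer’s exact implementation; the function and toy trials are illustrative:

>>> import numpy as np
>>> def equal_error_rate(scores, labels):
...     order = np.argsort(scores)[::-1]      # sort trials by score, descending
...     target = labels[order].astype(bool)   # True = same-speaker trial
...     far = np.cumsum(~target) / max((~target).sum(), 1)    # false acceptance
...     frr = 1.0 - np.cumsum(target) / max(target.sum(), 1)  # false rejection
...     i = int(np.argmin(np.abs(far - frr)))  # point where the two rates cross
...     return float((far[i] + frr[i]) / 2)
>>> round(equal_error_rate(np.array([0.9, 0.8, 0.4, 0.3, 0.2]),
...                        np.array([1, 1, 0, 1, 0])), 3)
0.417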

class espnet2.train.spk_trainer.SpkTrainer[source]

Bases: espnet2.train.trainer.Trainer

Trainer designed for speaker recognition.

Training is done as closed-set classification. Validation computes the open-set EER.

classmethod extract_embed(model: torch.nn.modules.module.Module, iterator: Iterable[Dict[str, torch.Tensor]], reporter: espnet2.train.reporter.SubReporter, options: espnet2.train.trainer.TrainerOptions, distributed_option: espnet2.train.distributed_utils.DistributedOption, output_dir: str, custom_bs: int, average: bool = False) → None[source]
classmethod validate_one_epoch(model: torch.nn.modules.module.Module, iterator: Iterable[Dict[str, torch.Tensor]], reporter: espnet2.train.reporter.SubReporter, options: espnet2.train.trainer.TrainerOptions, distributed_option: espnet2.train.distributed_utils.DistributedOption) → None[source]

espnet2.train.distributed_utils

class espnet2.train.distributed_utils.DistributedOption(distributed: bool = False, dist_backend: str = 'nccl', dist_init_method: str = 'env://', dist_world_size: Union[int, NoneType] = None, dist_rank: Union[int, NoneType] = None, local_rank: Union[int, NoneType] = None, ngpu: int = 0, dist_master_addr: Union[str, NoneType] = None, dist_master_port: Union[int, NoneType] = None, dist_launcher: Union[str, NoneType] = None, multiprocessing_distributed: bool = True)[source]

Bases: object

dist_backend = 'nccl'
dist_init_method = 'env://'
dist_launcher = None
dist_master_addr = None
dist_master_port = None
dist_rank = None
dist_world_size = None
distributed = False
init_options()[source]
init_torch_distributed()[source]
local_rank = None
multiprocessing_distributed = True
ngpu = 0
espnet2.train.distributed_utils.free_port()[source]

Find free port using bind().

There is some interval between finding this port and using it, and another process might claim the port in the meantime. Thus it is not guaranteed that the port is really free.
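
The underlying bind() trick looks like the following sketch; as noted, the port may be taken again between closing the socket and using the number:

>>> import socket
>>> with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
...     sock.bind(('', 0))   # port 0: let the OS pick a currently free port
...     port = sock.getsockname()[1]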

espnet2.train.distributed_utils.get_local_rank(prior=None, launcher: Optional[str] = None) → Optional[int][source]
espnet2.train.distributed_utils.get_master_addr(prior=None, launcher: Optional[str] = None) → Optional[str][source]
espnet2.train.distributed_utils.get_master_port(prior=None) → Optional[int][source]
espnet2.train.distributed_utils.get_node_rank(prior=None, launcher: Optional[str] = None) → Optional[int][source]

Get Node Rank.

Use for “multiprocessing distributed” mode. The initial RANK equals the node ID in this case, and the real rank is set as (nGPU * NodeID) + LOCAL_RANK in torch.distributed; e.g., with 4 GPUs per node, node ID 2 and local rank 3 give rank 11.

espnet2.train.distributed_utils.get_num_nodes(prior=None, launcher: Optional[str] = None) → Optional[int][source]

Get the number of nodes.

Use for “multiprocessing distributed” mode. RANK equals the node ID in this case, and the real rank is set as (nGPU * NodeID) + LOCAL_RANK in torch.distributed.

espnet2.train.distributed_utils.get_rank(prior=None, launcher: Optional[str] = None) → Optional[int][source]
espnet2.train.distributed_utils.get_world_size(prior=None, launcher: Optional[str] = None) → int[source]
espnet2.train.distributed_utils.is_in_slurm_job() → bool[source]
espnet2.train.distributed_utils.is_in_slurm_step() → bool[source]
espnet2.train.distributed_utils.resolve_distributed_mode(args)[source]