espnet2.ps2st.espnet_model.ESPnetQwen2AudioModel
class espnet2.ps2st.espnet_model.ESPnetQwen2AudioModel(model_name: str = 'Qwen/Qwen2-Audio-7B-Instruct', vocab_size: int = 50000, token_list: Tuple[str, ...] | List[str] = (), ignore_id: int = -1, decode_config_path: str | None = None, pytest_mode: bool | None = False)
Bases: AbsESPnetModel
ESPnet model integrating Qwen2-Audio from the Hugging Face transformers library.
Initialize internal Module state, shared by both nn.Module and ScriptModule.
collect_feats(speech: Tensor, speech_lengths: Tensor, text: Tensor, text_lengths: Tensor, **kwargs) → Dict[str, Tensor]
Collect features for statistics computation
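A plausible shape for this method is sketched below. This is a hypothetical illustration, not the actual implementation: in many ESPnet models, collect_feats simply repackages the raw speech and its lengths under standard keys for statistics computation. Plain lists stand in for torch.Tensor so the sketch stays self-contained.

```python
from typing import Dict

# Hypothetical sketch of a collect_feats implementation: return the raw
# speech and its lengths under the conventional "feats"/"feats_lengths"
# keys used for feature-statistics computation.
def collect_feats(speech, speech_lengths, text, text_lengths, **kwargs) -> Dict[str, object]:
    return {"feats": speech, "feats_lengths": speech_lengths}

out = collect_feats(
    speech=[[0.1, 0.2], [0.3, 0.4]],
    speech_lengths=[2, 2],
    text=[[1], [2]],
    text_lengths=[1, 1],
)
```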
forward(speech: Tensor, speech_lengths: Tensor, text: Tensor, text_lengths: Tensor, **kwargs) → Tuple[Tensor, Dict[str, Tensor], Tensor]
Forward pass required by the AbsESPnetModel interface
- Returns: loss – scalar tensor (dummy value; this model is inference-only), stats – dictionary of statistics for logging, weight – batch size used for normalization
- Return type: Tuple[Tensor, Dict[str, Tensor], Tensor]
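The (loss, stats, weight) return contract can be sketched with a minimal stub. This is an illustration of the interface only, not the real model: plain floats and lists replace torch.Tensor so the example is self-contained.

```python
from typing import Dict, List, Tuple

# Minimal stand-in illustrating the AbsESPnetModel forward contract:
# a scalar loss, a stats dict for logging, and a weight used by the
# trainer to normalize the accumulated statistics.
class ForwardContractStub:
    def forward(
        self,
        speech: List[List[float]],
        speech_lengths: List[int],
        text: List[List[int]],
        text_lengths: List[int],
        **kwargs,
    ) -> Tuple[float, Dict[str, float], int]:
        batch_size = len(speech)
        loss = 0.0  # dummy loss: the wrapped model is inference-only
        stats = {"loss": loss, "batch_size": float(batch_size)}
        weight = batch_size  # normalization weight for the logged stats
        return loss, stats, weight

loss, stats, weight = ForwardContractStub().forward(
    speech=[[0.1, 0.2], [0.3, 0.4]],
    speech_lengths=[2, 2],
    text=[[1], [2]],
    text_lengths=[1, 1],
)
```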
inference(input_ids: Tensor, attention_mask: Tensor, input_features: Tensor, feature_attention_mask: Tensor, **kwargs) → str
Custom inference method using Qwen2-Audio
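A caller is expected to pass the four tensors named in the signature above, typically produced by the Qwen2-Audio processor (e.g. transformers.AutoProcessor). The helper below is hypothetical; it only assembles the expected keyword arguments, since actually running the 7B model is out of scope for a doc example.

```python
from typing import Any, Dict

# Hypothetical helper: packs the keyword arguments that
# ESPnetQwen2AudioModel.inference expects, per its signature.
# In practice each value would be a torch.Tensor from the
# Qwen2-Audio processor; any list-like stands in here.
def build_inference_inputs(
    input_ids: Any,
    attention_mask: Any,
    input_features: Any,
    feature_attention_mask: Any,
) -> Dict[str, Any]:
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "input_features": input_features,
        "feature_attention_mask": feature_attention_mask,
    }

inputs = build_inference_inputs(
    input_ids=[[1, 2, 3]],
    attention_mask=[[1, 1, 1]],
    input_features=[[0.0, 0.1]],
    feature_attention_mask=[[1, 1]],
)
# Usage sketch: text = model.inference(**inputs)  # returns a str
```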
