espnet2.text.qwen2audio_tokenizer.Qwen2AudioTokenizer


class espnet2.text.qwen2audio_tokenizer.Qwen2AudioTokenizer(model_name: str = 'Qwen/Qwen2-Audio-7B-Instruct')

Qwen2-Audio tokenizer that handles both text and audio inputs.

create_multimodal_query(text_input: str, audio_input: Tuple[List[ndarray], int] | None = None) → Dict

Create a query that combines text and audio inputs for Qwen2-Audio.

This is the core tokenization process from the original example.
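The signature shows that `audio_input` is an optional pair of a list of arrays and an integer. The sketch below illustrates one way to build such a pair and pass it to `create_multimodal_query`. It rests on assumptions not stated in this reference: the integer is taken to be the sampling rate in Hz, each array is taken to be a mono 1-D waveform, and 16 kHz is used only as a placeholder rate. `build_audio_input` and `run_query` are hypothetical helper names, not part of ESPnet.

```python
from typing import List, Optional, Tuple

import numpy as np


def build_audio_input(
    waveforms: List[np.ndarray], sample_rate: int
) -> Tuple[List[np.ndarray], int]:
    """Hypothetical helper: pack waveforms into a (list of arrays, int) pair."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    arrays = []
    for waveform in waveforms:
        array = np.asarray(waveform, dtype=np.float32)
        # Assumption: each waveform is mono, i.e. one-dimensional.
        if array.ndim != 1:
            raise ValueError("expected a 1-D (mono) waveform")
        arrays.append(array)
    return arrays, int(sample_rate)


def run_query(
    text: str, audio_input: Optional[Tuple[List[np.ndarray], int]] = None
) -> dict:
    """Hypothetical wrapper around the documented create_multimodal_query."""
    # Imported lazily so build_audio_input works without ESPnet installed.
    from espnet2.text.qwen2audio_tokenizer import Qwen2AudioTokenizer

    # Uses the default model_name, 'Qwen/Qwen2-Audio-7B-Instruct'.
    # Instantiation may download files from the Hugging Face Hub.
    tokenizer = Qwen2AudioTokenizer()
    return tokenizer.create_multimodal_query(text, audio_input)


# One second of silence at an assumed 16 kHz rate as a placeholder waveform.
audio_input = build_audio_input([np.zeros(16000)], 16000)
```

Calling `run_query("Describe this audio.", audio_input)` would then return the `Dict` described above; its keys are not documented here, so inspect the result rather than relying on specific field names.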

text2tokens(line: str) → List[str]

Convert text to tokens using the Qwen2-Audio processor.

tokens2text(tokens: Iterable[str]) → str

Convert tokens back to text.
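`text2tokens` and `tokens2text` form a pair, so a round trip is a natural sanity check. The sketch below shows that pattern with a hypothetical `WhitespaceTokenizer` stand-in that exposes the same two method signatures, so it runs without loading any model. It does not reproduce Qwen2-Audio's actual tokenization, and `round_trip` is an illustrative helper, not an ESPnet function.

```python
from typing import Iterable, List


class WhitespaceTokenizer:
    """Hypothetical stand-in with the same two method signatures."""

    def text2tokens(self, line: str) -> List[str]:
        # Split on runs of whitespace; the real tokenizer uses subword units.
        return line.split()

    def tokens2text(self, tokens: Iterable[str]) -> str:
        # Join tokens with single spaces.
        return " ".join(tokens)


def round_trip(tokenizer, line: str) -> str:
    """Tokenize a line, then convert the tokens back to text."""
    return tokenizer.tokens2text(tokenizer.text2tokens(line))


tokens = WhitespaceTokenizer().text2tokens("transcribe the audio")
restored = round_trip(WhitespaceTokenizer(), "transcribe the audio")
```

Because `round_trip` relies only on the two documented methods, it should also accept a `Qwen2AudioTokenizer` instance. Whether the real tokenizer restores the input exactly is not stated in this reference, so treat any mismatch as something to investigate rather than as an error.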