espnet2.speechlm.dataloader.multimodal_loader.text_loader.TextReader
class espnet2.speechlm.dataloader.multimodal_loader.text_loader.TextReader(text_file: str, valid_ids: list | None = None)
Bases: object
Dict-like text reader supporting plain and JSONL formats.
Plain format: <id> <text content>
JSONL format: {"id": "<id>", "text": "<text content>"}
Format is determined by file suffix (.jsonl for JSONL, otherwise plain).
- Parameters:
  - text_file – Path to text file (plain or JSONL format)
  - valid_ids – List of valid IDs to keep (optional, keeps all if None)
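The two input formats and the valid_ids filter can be illustrated with a minimal sketch. This is not the ESPnet implementation: load_text_entries is a hypothetical helper, and splitting a plain-format line at the first run of whitespace is an assumption about how the ID is separated from the text.

```python
import json
from pathlib import Path
from typing import Dict, Iterable, Optional


def load_text_entries(
    text_file: str, valid_ids: Optional[Iterable[str]] = None
) -> Dict[str, str]:
    """Hypothetical stand-in for the loading step; not ESPnet code."""
    # None means "keep every entry"; otherwise keep only the listed IDs.
    keep = set(valid_ids) if valid_ids is not None else None
    # Format is chosen by suffix: .jsonl means JSONL, anything else is plain.
    is_jsonl = Path(text_file).suffix == ".jsonl"

    entries: Dict[str, str] = {}
    with open(text_file, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if is_jsonl:
                record = json.loads(line)
                utt_id, text = record["id"], record["text"]
            else:
                # Assumption: the ID ends at the first run of whitespace.
                parts = line.split(maxsplit=1)
                utt_id = parts[0]
                text = parts[1] if len(parts) > 1 else ""
            if keep is None or utt_id in keep:
                entries[utt_id] = text
    return entries
```

Under these assumptions, a file named text is parsed as plain format and a file named text.jsonl as JSONL, and passing valid_ids=["utt1"] would keep only the entry whose ID is utt1.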
items()
Return iterator over (id, text) pairs.
keys()
Return iterator over IDs.
values()
Return iterator over texts.
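The three accessors mirror the dict methods of the same name. The sketch below shows how such an interface is typically consumed. TextTable is a hypothetical stand-in built on an in-memory dict, not the TextReader class itself, and it implements only the three documented methods.

```python
from typing import Dict, Iterator, Tuple


class TextTable:
    """Hypothetical stand-in exposing the three documented accessors."""

    def __init__(self, entries: Dict[str, str]) -> None:
        self._entries = dict(entries)

    def keys(self) -> Iterator[str]:
        # Iterator over IDs.
        return iter(self._entries.keys())

    def values(self) -> Iterator[str]:
        # Iterator over texts.
        return iter(self._entries.values())

    def items(self) -> Iterator[Tuple[str, str]]:
        # Iterator over (id, text) pairs.
        return iter(self._entries.items())


reader = TextTable({"utt1": "hello world", "utt2": "good morning"})

# Each call returns a fresh iterator, so it can be looped over directly.
for utt_id, text in reader.items():
    print(utt_id, text)

# Materialize with list() when the result is needed more than once.
all_ids = list(reader.keys())
all_texts = list(reader.values())
```

Because the methods are documented as returning iterators, it is safest to wrap the result in list() before indexing it or traversing it a second time.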
