espnetez.dataset.ESPnetEZDataset
class espnetez.dataset.ESPnetEZDataset(dataset, data_info)
Bases: AbsDataset
A dataset class for handling ESPnet data with easy access to data information.
This class extends the AbsDataset class and provides functionalities to manage a dataset and its associated metadata. It allows users to retrieve dataset items using unique identifiers and check for available names in the dataset.
dataset
The dataset containing the actual data entries.
- Type: Union[list, Tuple]
data_info
A dictionary mapping attribute names to functions that extract those attributes from the dataset.
- Type: Dict[str, callable]
Parameters:
- dataset (Union[list, Tuple]) – The dataset from which data will be extracted.
- data_info (Dict[str, callable]) – A dictionary where keys are attribute names and values are functions that process the dataset entries.
has_name(name: str) → bool
Checks if the given name exists in the data_info dictionary.
names() → Tuple[str, ...]
Returns a tuple of all names in the data_info.
__getitem__(uid: Union[str, int]) → Tuple[str, Dict]
Retrieves the data entry corresponding to the provided unique identifier.
__len__() → int
Returns the total number of entries in the dataset.
######### Examples
>>> dataset = [
...     ("audio1.wav", "transcription1"),
...     ("audio2.wav", "transcription2"),
... ]
>>> data_info = {
... "audio": lambda x: x[0],
... "transcription": lambda x: x[1]
... }
>>> ez_dataset = ESPnetEZDataset(dataset, data_info)
>>> ez_dataset.has_name("audio")
True
>>> ez_dataset.names()
('audio', 'transcription')
>>> ez_dataset[0]
('0', {'audio': 'audio1.wav', 'transcription': 'transcription1'})
>>> len(ez_dataset)
2
####### NOTE The dataset and data_info must be compatible: every function in data_info is called with a single entry of the dataset, so it must be able to handle the structure of those entries.
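To make the relationship between dataset and data_info concrete, the following is a minimal sketch that imitates the behaviour shown in the examples above. MiniEZDataset is a hypothetical stand-in, not the ESPnet implementation, and the way it converts a string identifier to an integer index is an assumption made for illustration.

```python
from typing import Any, Callable, Dict, Sequence, Tuple, Union


class MiniEZDataset:
    """Hypothetical stand-in that imitates the documented behaviour.

    It is not the ESPnet implementation; it only reproduces the documented
    inputs and outputs so the example runs without ESPnet installed.
    """

    def __init__(
        self,
        dataset: Sequence[Any],
        data_info: Dict[str, Callable[[Any], Any]],
    ) -> None:
        self.dataset = dataset
        self.data_info = data_info

    def has_name(self, name: str) -> bool:
        return name in self.data_info

    def names(self) -> Tuple[str, ...]:
        return tuple(self.data_info)

    def __getitem__(self, uid: Union[str, int]) -> Tuple[str, Dict[str, Any]]:
        # Assumption: a string uid such as "1" is interpreted as an integer index.
        entry = self.dataset[int(uid)]
        # Apply every extractor function to the same raw entry.
        return str(uid), {name: fn(entry) for name, fn in self.data_info.items()}

    def __len__(self) -> int:
        return len(self.dataset)


raw = [("audio1.wav", "transcription1"), ("audio2.wav", "transcription2")]
info = {"audio": lambda x: x[0], "transcription": lambda x: x[1]}
mini = MiniEZDataset(raw, info)

uid, item = mini["1"]
print(uid, item)
```

The point of the sketch is that the wrapper stores no per-attribute data of its own: each attribute is computed on demand by calling the matching data_info function on the raw entry.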
has_name(name: str) → bool
Check if the specified name exists in the dataset’s data information.
This method searches the data_info attribute of the dataset to determine if the given name is present as a key. It is useful for validating whether certain attributes or features are available in the dataset.
- Parameters: name (str) – The name to search for in the dataset’s data information.
- Returns: True if the name exists in the data information; False otherwise.
- Return type: bool
######### Examples
>>> dataset = ESPnetEZDataset(dataset=[...],
...                           data_info={'feature1': ..., 'feature2': ...})
>>> dataset.has_name('feature1')
True
>>> dataset.has_name('feature3')
False
####### NOTE The method performs a simple membership check using the in operator, which is efficient for dictionaries.
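One place such a check might be useful is validating, before any data is loaded, that every attribute a recipe expects is present. The helper below is a sketch under that assumption: missing_names is a hypothetical function, not part of ESPnet, and it operates on a plain dictionary standing in for data_info.

```python
from typing import Callable, Dict, Iterable, List


def missing_names(data_info: Dict[str, Callable], required: Iterable[str]) -> List[str]:
    """Return the required names that are absent from data_info (hypothetical helper)."""
    # `name not in data_info` is the same kind of dictionary membership test
    # that the note describes for has_name.
    return [name for name in required if name not in data_info]


data_info = {"feature1": lambda x: x[0], "feature2": lambda x: x[1]}
print(missing_names(data_info, ["feature1", "feature3"]))  # ['feature3']
```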
names() → Tuple[str, ...]
Return the names of all attributes defined in the dataset’s data information.
This method returns the keys of the data_info attribute as a tuple. Each name identifies one attribute that is extracted from every dataset entry, and the same names appear as keys in the dictionary returned by __getitem__.
- Returns: A tuple of all the names available in the data_info.
- Return type: Tuple[str, ...]
######### Examples
>>> dataset = ESPnetEZDataset(dataset=[...],
...                           data_info={'feature': lambda x: x.feature, 'label': lambda x: x.label})
>>> dataset.has_name('feature')
True
>>> dataset.names()
('feature', 'label')
>>> entry = dataset[0]
>>> print(entry)
('0', {'feature': ..., 'label': ...})
>>> len(dataset)
100
####### NOTE The functions provided in the data_info should be callable and should accept a single argument corresponding to an entry from the dataset.
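As an illustration of the single-argument requirement, the sketch below defines one extractor as a lambda and one as a named function, then applies both to a sample entry by hand. The entry layout and the field names "path" and "text" are hypothetical, as is the tokenize function; none of them come from ESPnet.

```python
from typing import Any, Callable, Dict, List

# Hypothetical entries; the field names "path" and "text" are illustrative only.
entries = [
    {"path": "audio1.wav", "text": "hello world"},
    {"path": "audio2.wav", "text": "good morning"},
]


def tokenize(entry: Dict[str, Any]) -> List[str]:
    """Named extractor: takes one entry and returns its whitespace tokens."""
    return entry["text"].split()


data_info: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "path": lambda entry: entry["path"],  # plain field access
    "tokens": tokenize,                   # a transformation of the entry
}

# Call each function with the first entry as its only argument.
first = {name: fn(entries[0]) for name, fn in data_info.items()}
print(first)
```

A function that needs extra configuration can still satisfy the one-argument rule by binding the other arguments ahead of time, for example with functools.partial.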