espnet2.sds.llm.hugging_face_llm.HuggingFaceLLM
class espnet2.sds.llm.hugging_face_llm.HuggingFaceLLM(access_token: str, tag: str = 'meta-llama/Llama-3.2-1B-Instruct', device: str = 'cuda', dtype: str = 'float16')
Bases: AbsLLM
Hugging Face LLM
A class for initializing a text response generator using the Transformers library.
- Parameters:
- access_token (str) – The access token required for downloading models from Hugging Face.
- tag (str, optional) – The model tag for the pre-trained language model. Defaults to "meta-llama/Llama-3.2-1B-Instruct".
- device (str, optional) – The device to run the inference on. Defaults to "cuda".
- dtype (str, optional) – The data type for model computation. Defaults to "float16".
- Raises: ImportError – If the transformers library is not installed.
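A minimal instantiation sketch. It assumes espnet2 and transformers are installed; the access token is a placeholder, and the import guard mirrors the documented ImportError behavior when the dependencies are missing.

```python
# Sketch: guard the import so environments without espnet2 fail gracefully.
try:
    from espnet2.sds.llm.hugging_face_llm import HuggingFaceLLM
    have_espnet = True
except ImportError:
    have_espnet = False

if have_espnet:
    # device="cpu" and dtype="float32" avoid requiring a GPU for this sketch.
    llm = HuggingFaceLLM(
        access_token="hf_xxx",  # placeholder Hugging Face token, not a real credential
        tag="meta-llama/Llama-3.2-1B-Instruct",
        device="cpu",
        dtype="float32",
    )
    llm.warmup()  # pre-load the model with a dummy forward pass
```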
forward(chat_messages: List[dict]) → str
Generate a response from the language model based on the provided chat messages.
- Parameters: chat_messages (List[dict]) – A list of chat messages, where each message is a dictionary containing the conversation history. Each dictionary should have keys like "role" (e.g., "user", "assistant") and "content" (the message text).
- Returns: The generated response text from the language model.
- Return type: str
Notes
- The model generates at most 64 new tokens using deterministic (greedy) decoding: do_sample is set to False and temperature to 0.
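The expected input format can be sketched as follows; the role names and message text here are illustrative, and the `llm` instance is assumed to have been constructed as described above.

```python
# Conversation history in the format forward() expects:
# a list of dicts, each with "role" and "content" keys.
chat_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What does ESPnet do?"},
]

# With an initialized HuggingFaceLLM instance `llm` (not constructed here),
# the call would be:
# response = llm.forward(chat_messages)  # returns a str, at most 64 new tokens
```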
warmup()
Perform a single forward pass with dummy input to pre-load and warm up the model.