espnet2.sds.llm.hugging_face_llm.HuggingFaceLLM
class espnet2.sds.llm.hugging_face_llm.HuggingFaceLLM(access_token: str, tag: str = 'meta-llama/Llama-3.2-1B-Instruct', device: str = 'cuda', dtype: str = 'float16')
Bases: AbsLLM
Hugging Face LLM
A class for initializing a text response generator using the Transformers library.
- Parameters:
- access_token (str) – The access token required for downloading models from Hugging Face.
- tag (str, optional) – The model tag for the pre-trained language model. Defaults to "meta-llama/Llama-3.2-1B-Instruct".
- device (str, optional) – The device to run the inference on. Defaults to "cuda".
- dtype (str, optional) – The data type for model computation. Defaults to "float16".
- Raises: ImportError – If the transformers library is not installed.
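A minimal instantiation sketch. It assumes espnet2 and transformers are installed; the access token is a placeholder, and the import guard mirrors the documented ImportError behavior when the dependencies are missing.

```python
# Sketch: guard the import so environments without espnet2 fail gracefully.
try:
    from espnet2.sds.llm.hugging_face_llm import HuggingFaceLLM
    have_espnet = True
except ImportError:
    have_espnet = False

if have_espnet:
    # device="cpu" and dtype="float32" avoid requiring a GPU for this sketch.
    llm = HuggingFaceLLM(
        access_token="hf_xxx",  # placeholder Hugging Face token, not a real credential
        tag="meta-llama/Llama-3.2-1B-Instruct",
        device="cpu",
        dtype="float32",
    )
    llm.warmup()  # pre-load the model with a dummy forward pass
```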
forward(chat_messages: List[dict]) → str
Generate a response from the language model based on the provided chat messages.
- Parameters: chat_messages (List[dict]) – A list of chat messages, where each message is a dictionary containing the conversation history. Each dictionary should have keys like "role" (e.g., "user", "assistant") and "content" (the message text).
- Returns: The generated response text from the language model.
- Return type: str
Notes
- The model generates at most 64 new tokens using deterministic (greedy) decoding: do_sample is set to False and temperature to 0.
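The expected input format can be sketched as follows; the role names and message text here are illustrative, and the `llm` instance is assumed to have been constructed as described above.

```python
# Conversation history in the format forward() expects:
# a list of dicts, each with "role" and "content" keys.
chat_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What does ESPnet do?"},
]

# With an initialized HuggingFaceLLM instance `llm` (not constructed here),
# the call would be:
# response = llm.forward(chat_messages)  # returns a str, at most 64 new tokens
```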
warmup()
Perform a single forward pass with dummy input to pre-load and warm up the model.