LLM Wrapper of LangChain

Hugging Face transformers Format

IPEX-LLM provides TransformersLLM and TransformersPipelineLLM, which implement LangChain's standard LLM wrapper interface.

class ipex_llm.langchain.llms.transformersllm.TransformersLLM(*args: Any, **kwargs: Any)

Bases: langchain.llms.base.LLM

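Wrapper around the BigDL-LLM Transformer-INT4 model.

Example:

from ipex_llm.langchain.llms import TransformersLLM
# chatglm-6b ships custom modeling code, so trust_remote_code is required
llm = TransformersLLM.from_model_id(model_id="THUDM/chatglm-6b",
                                    model_kwargs={"trust_remote_code": True})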

classmethod from_model_id(model_id: str, model_kwargs: Optional[dict] = None, device_map: str = 'cpu', tokenizer_id: Optional[str] = None, **kwargs: Any) -> langchain.llms.base.LLM

Construct a TransformersLLM object from model_id.

Parameters:

  • model_id – Hugging Face repo ID to download, or path to a local Hugging Face checkpoint folder.

  • model_kwargs – Keyword arguments that will be passed to the model and tokenizer.

  • kwargs – Extra arguments that will be passed to the model and tokenizer.

Returns:

An object of TransformersLLM.

classmethod from_model_id_low_bit(model_id: str, model_kwargs: Optional[dict] = None, device_map: str = 'cpu', tokenizer_id: Optional[str] = None, **kwargs: Any) -> langchain.llms.base.LLM

Construct a low-bit TransformersLLM object from model_id.

Parameters:

  • model_id – Path to the BigDL transformers low-bit model checkpoint folder.

  • model_kwargs – Keyword arguments that will be passed to the model and tokenizer.

  • kwargs – Extra arguments that will be passed to the model and tokenizer.

Returns:

An object of TransformersLLM.
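
Because from_model_id_low_bit expects a folder that already holds low-bit weights, a typical workflow quantizes once, saves the result, and reloads the folder on later runs. The sketch below assumes ipex_llm.transformers.AutoModelForCausalLM (loaded with load_in_4bit=True) and its save_low_bit method, plus transformers.AutoTokenizer. It also assumes the wrapper looks for the tokenizer in model_id when tokenizer_id is not given, which is why the tokenizer is saved into the same folder. The helper save_and_load_low_bit and its *_cls parameters are hypothetical, added only so the call sequence can be exercised with stand-ins.

```python
from typing import Any, Optional


def save_and_load_low_bit(src_model_id: str, low_bit_dir: str,
                          model_cls: Optional[Any] = None,
                          tokenizer_cls: Optional[Any] = None,
                          llm_cls: Optional[Any] = None):
    """Quantize a checkpoint to INT4 once, save it, and reload it via LangChain.

    Hypothetical helper: the *_cls parameters default to the real classes and
    exist only so the call sequence can be tested without a model download.
    """
    # Lazy imports: these require ipex-llm, transformers and LangChain.
    if model_cls is None:
        from ipex_llm.transformers import AutoModelForCausalLM as model_cls
    if tokenizer_cls is None:
        from transformers import AutoTokenizer as tokenizer_cls
    if llm_cls is None:
        from ipex_llm.langchain.llms import TransformersLLM as llm_cls

    # 1. Quantize the original checkpoint to INT4 while loading it.
    model = model_cls.from_pretrained(src_model_id, load_in_4bit=True,
                                      trust_remote_code=True)
    # 2. Persist the low-bit weights. The tokenizer goes into the same folder
    #    on the assumption that the wrapper loads it from model_id when no
    #    tokenizer_id is passed.
    model.save_low_bit(low_bit_dir)
    tokenizer = tokenizer_cls.from_pretrained(src_model_id, trust_remote_code=True)
    tokenizer.save_pretrained(low_bit_dir)
    # 3. Later runs can skip quantization and load the saved folder directly.
    return llm_cls.from_model_id_low_bit(
        model_id=low_bit_dir,
        model_kwargs={"trust_remote_code": True},
    )
```

After one call such as save_and_load_low_bit("THUDM/chatglm-6b", "./chatglm-6b-int4"), subsequent runs only need TransformersLLM.from_model_id_low_bit(model_id="./chatglm-6b-int4", model_kwargs={"trust_remote_code": True}).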

Native Model

For the llama, chatglm, bloom, gptneox, and starcoder model families, you can also use the following LLM wrappers, which use the native (C++) implementation for maximum performance.

class ipex_llm.langchain.llms.LlamaLLM(*args: Any, **kwargs: Any)

Bases: ipex_llm.langchain.llms.bigdlllm._BaseCausalLM

validate_environment(values: Dict) -> Dict

Validate that bigdl-llm is installed and that the model family is supported.

stream(prompt: str, stop: Optional[List[str]] = None, run_manager: Optional[langchain.callbacks.manager.CallbackManagerForLLMRun] = None) -> Generator[Dict, None, None]

Yields result objects in real time as they are generated.

BETA: this is a beta feature while we figure out the right abstraction. Once that happens, this interface could change.

It also calls the callback manager’s on_llm_new_token event with similar parameters to the OpenAI LLM class method of the same name.

Parameters:

  • prompt – The prompt to pass into the model.

  • stop – Optional list of stop words to use when generating.

Returns:

A generator representing the stream of tokens being generated.

Yields:

Dictionary-like objects, each containing a string token and metadata. See the llama-cpp-python docs and the example below for more.

Example:

from ipex_llm.langchain.llms import LlamaLLM
llm = LlamaLLM(
    model_path="/path/to/local/model.bin",  # placeholder: path to a native model file
    temperature=0.5
)
for chunk in llm.stream("Ask 'Hi, how are you?' like a pirate:'",
                        stop=["'", "\n"]):
    result = chunk["choices"][0]
    print(result["text"], end='', flush=True)
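
When the full reply is also needed after streaming (for example, to keep it in a chat history), the chunks can be accumulated while they are printed. A minimal sketch, assuming each chunk has the llama-cpp-python shape used in the example above, with its new text at chunk["choices"][0]["text"]; collect_stream is a hypothetical helper, not part of ipex-llm.

```python
from typing import Dict, Iterable, List


def collect_stream(chunks: Iterable[Dict], echo: bool = False) -> str:
    """Join the text of llama-cpp-style streaming chunks into one string."""
    pieces: List[str] = []
    for chunk in chunks:
        # Each chunk carries its newly generated text at choices[0]["text"].
        text = chunk["choices"][0]["text"]
        if echo:
            # Show tokens as they arrive, as the stream example does.
            print(text, end="", flush=True)
        pieces.append(text)
    return "".join(pieces)
```

With the LlamaLLM instance from the example, reply = collect_stream(llm.stream(prompt, stop=["\n"]), echo=True) prints tokens as they arrive and returns the joined text.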

get_num_tokens(text: str) -> int

Get the number of tokens present in the text.

Useful for checking if an input will fit in a model’s context window.


Parameters:

text – The string input to tokenize.

Returns:

The number of tokens in the text.
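
A common use of get_num_tokens is budgeting the context window before generation. The sketch below assumes the caller knows the model's context size (passed in as n_ctx) and how many new tokens it will request. fits_in_context and trim_history are hypothetical helpers that work with any object exposing get_num_tokens, such as a LlamaLLM instance.

```python
from typing import List


def fits_in_context(llm, prompt: str, n_ctx: int, max_new_tokens: int) -> bool:
    """Return True if the prompt plus the generation budget fits in n_ctx tokens."""
    # get_num_tokens reports how many tokens the prompt occupies.
    return llm.get_num_tokens(prompt) + max_new_tokens <= n_ctx


def trim_history(llm, turns: List[str], question: str, n_ctx: int,
                 max_new_tokens: int, sep: str = "\n") -> str:
    """Build a prompt from chat turns, dropping the oldest ones until it fits.

    If even the question alone is too long, it is returned as-is; callers
    should re-check the result with fits_in_context.
    """
    kept = list(turns)  # copy so the caller's list is not modified
    while True:
        prompt = sep.join(kept + [question])
        if not kept or fits_in_context(llm, prompt, n_ctx, max_new_tokens):
            return prompt
        kept.pop(0)  # discard the oldest turn first
```

Dropping the oldest turns first keeps the most recent context, which is usually the most relevant to the next answer.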

Embeddings Wrapper of LangChain

Hugging Face transformers AutoModel

Wrapper around BigDL-LLM embedding models.

class ipex_llm.langchain.embeddings.transformersembeddings.TransformersEmbeddings(*args: Any, **kwargs: Any)

Bases: pydantic.BaseModel, langchain.embeddings.base.Embeddings

Wrapper around bigdl-llm transformers embedding models.

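To use it, you must have the transformers Python package installed.

Example:

from ipex_llm.langchain.embeddings import TransformersEmbeddings
model_id = "/path/to/embedding/model"  # placeholder: Hugging Face repo ID or local folder
embeddings = TransformersEmbeddings.from_model_id(model_id)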

classmethod from_model_id(model_id: str, model_kwargs: Optional[dict] = None, device_map: str = 'cpu', **kwargs: Any)

Construct a TransformersEmbeddings object from model_id.

Parameters:

  • model_id – Hugging Face repo ID to download, or path to a local Hugging Face checkpoint folder.

  • model_kwargs – Keyword arguments that will be passed to the model and tokenizer.

  • kwargs – Extra arguments that will be passed to the model and tokenizer.

Returns:

An object of TransformersEmbeddings.

embed(text: str, **kwargs)

Compute the embedding of a single text using a Hugging Face transformer model.

Parameters:

text – The text to embed.

Returns:

Embeddings for the text.

embed_documents(texts: List[str]) -> List[List[float]]

Compute doc embeddings using a Hugging Face transformer model.

Parameters:

texts – The list of texts to embed.

Returns:

List of embeddings, one for each text.

embed_query(text: str) -> List[float]

Compute query embeddings using a bigdl-llm transformer model.

Parameters:

text – The text to embed.

Returns:

Embeddings for the text.
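
embed_documents and embed_query are typically used together for retrieval: documents are embedded once, the query is embedded at search time, and documents are ranked by vector similarity. A minimal sketch using cosine similarity in plain Python; rank_documents and cosine are hypothetical helpers, and any object with these two methods (for example a TransformersEmbeddings or LlamaEmbeddings instance) can be passed in.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(embeddings, query: str, docs: List[str],
                   top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (document, score) pairs most similar to query."""
    doc_vectors = embeddings.embed_documents(docs)  # one vector per document
    query_vector = embeddings.embed_query(query)
    scored = [(doc, cosine(query_vector, vec)) for doc, vec in zip(docs, doc_vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:top_k]
```

Plain-Python loops keep the sketch dependency-free; for large collections, a LangChain vector store or NumPy would do the same ranking far faster.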

class ipex_llm.langchain.embeddings.transformersembeddings.TransformersBgeEmbeddings(*args: Any, **kwargs: Any)

Bases: ipex_llm.langchain.embeddings.transformersembeddings.TransformersEmbeddings

embed(text: str, **kwargs)

Compute the embedding of a single text using a Hugging Face transformer model.

Parameters:

text – The text to embed.

Returns:

Embeddings for the text.

Native Model

For the llama, bloom, gptneox, and starcoder model families, you can also use the following embedding wrappers, which use the native implementation.

class ipex_llm.langchain.embeddings.LlamaEmbeddings(*args: Any, **kwargs: Any)

Bases: ipex_llm.langchain.embeddings.bigdlllm._BaseEmbeddings

validate_environment(values: Dict) -> Dict

Validate that the bigdl-llm library is installed.

embed_documents(texts: List[str]) -> List[List[float]]

Embed a list of documents using the optimized int4 model.


Parameters:

texts – The list of texts to embed.

Returns:

List of embeddings, one for each text.

embed_query(text: str) -> List[float]

Embed a query using the optimized int4 model.


Parameters:

text – The text to embed.

Returns:

Embeddings for the text.