entrypoints | Modular

Python module

Provides a high-level Python interface for running inference with large language models. The LLM class handles model loading, session management, and text generation with a simple API.

LLM

class max.entrypoints.llm.LLM(pipeline_config)

A high-level interface for interacting with large language models (LLMs).

Parameters:

pipeline_config (PipelineConfig)
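A minimal construction sketch. This assumes the max package is installed; the model path shown is illustrative (substitute any model your MAX installation supports), and the exact import location of PipelineConfig may vary between releases:

```python
from max.entrypoints.llm import LLM
from max.pipelines import PipelineConfig

# Hypothetical model path; replace with a model supported by your MAX install.
pipeline_config = PipelineConfig(model_path="modularai/Llama-3.1-8B-Instruct-GGUF")

# Loads the model and prepares it for inference.
llm = LLM(pipeline_config)
```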

generate()

generate(prompts, max_new_tokens=100, use_tqdm=True)

Generates text completions for the given prompts.

This method is thread safe and may be used on the same LLM instance from multiple threads concurrently with no external synchronization.

Parameters:

  • prompts (str | Sequence[str]) – The input string or list of strings to generate completions for.
  • max_new_tokens (int | None) – The maximum number of tokens to generate in each response. Defaults to 100.
  • use_tqdm (bool) – Whether to display a progress bar during generation. Defaults to True.

Returns:

A list of generated text completions corresponding to each input prompt.

Raises:

  • ValueError – If prompts is empty or contains invalid data.
  • RuntimeError – If the model fails to generate completions.

Return type:

Sequence[str]
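A usage sketch for generate(). The model path and prompts are illustrative assumptions, and the PipelineConfig import location may vary between MAX releases; the returned sequence has one completion per input prompt, in order:

```python
from max.entrypoints.llm import LLM
from max.pipelines import PipelineConfig

# Hypothetical model path; replace with a model supported by your MAX install.
llm = LLM(PipelineConfig(model_path="modularai/Llama-3.1-8B-Instruct-GGUF"))

prompts = [
    "In one sentence, what is the Mojo programming language?",
    "Name three common uses of a GPU.",
]

# Returns a Sequence[str] with one completion per prompt.
responses = llm.generate(prompts, max_new_tokens=50, use_tqdm=False)

for prompt, response in zip(prompts, responses):
    print(f"Prompt: {prompt}\nResponse: {response}\n")
```

Because generate() is thread-safe, the same LLM instance can also be shared across worker threads, with each thread issuing its own generate() calls without external locking.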