GitHub - NVIDIA-AI-IOT/inference_builder: A tool for generating python inference pipeline

Inference Builder

Overview

Inference Builder is a tool that automatically generates inference pipelines and integrates them into either a microservice or a standalone application. It takes an inference configuration file and an OpenAPI specification (when integrated with an HTTP server) as inputs, and may also require custom code snippets in certain cases.

The output of the tool is a Python package that can be used to build a microservice container image with a customized Dockerfile.

The Inference Builder consists of three major components:

Code templates: These are reusable modules for various inference backends and frameworks, as well as for API servers. They are optimized and tested, making them suitable for any model with specified inputs, outputs, and configuration parameters.
Common inference flow: It serves as the core logic that standardizes the end-to-end inference process—including data loading and pre-processing, model inference, post-processing, and integration with the API server. It supports pluggable inference backends and frameworks, enabling flexibility and performance optimization.
Command line tool: It generates a source code package by combining predefined code templates with the Common Inference Flow. It also automatically produces corresponding test cases and evaluation scripts to support validation and performance assessment.

Visit our documentation for more details:

Getting started

First, be sure your system meets the requirement.

Operating System	Python	CPU	GPU*
Ubuntu 24.04	3.12	x86, aarch64	Nvidia ADA, Hopper, Blackwell

*: If you only generate the inference pipeline without running it, no GPU is required.

Next, follow these steps to get started:

Install prerequisites

sudo apt update
sudo apt install protobuf-compiler
sudo apt install python3.12-venv python3.12-dev

Note for TEGRA users: If you're using a TEGRA device, you'll also need to install the Docker buildx plugin:

sudo apt install docker-buildx

Clone the repository

git clone https://github.com/NVIDIA-AI-IOT/inference_builder

Set up the virtual environment

cd inference_builder
git submodule update --init --recursive
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt

Play with the examples

Now you can try our examples to learn more. These examples span all supported backends and demonstrate their distinct inference flows.

Benefit of using Inference Builder

Compared to manually crafting inference source code, Inference Builder offers developers the following advantages:

Separation of concerns: Introduces a new programming paradigm that decouples inference data flow and server logic from the model implementation, allowing developers to focus solely on model behavior.
Backend flexibility: Standardizes data flow across different inference backends, enabling developers to switch to the optimal backend for their specific requirement without rewriting the entire pipeline.
Hardware acceleration: Automatically enables GPU-accelerated processing to boost performance.
Streaming support: Provides built-in support for streaming protocols such as RTSP with minimal configuration.
Standardized testing: Automates and standardizes test case generation to simplify validation and evaluation workflows.

Contributing

Contributions are welcome! Please feel free to submit a PR.

Project status and roadmap

The project is under active development and the following features are expected to be supported in the near future:

Support for more backends and frameworks such as VLLM and onnx runtime.
Support for more model types such as speech models.