Quickstart — nsight-python

Here’s a minimal example to get started with Nsight Python. Add the @nsight.analyze.kernel decorator to your function, then wrap the code that launches the kernel you want to profile in an nsight.annotate() context manager:

import torch
import nsight

@nsight.analyze.kernel
def benchmark_matmul(n):
    """
    The simplest possible benchmark.
    We create two matrices and multiply them.
    """
    # Create two NxN matrices on GPU
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")

    # Mark the kernel we want to profile
    with nsight.annotate("matmul"):
        c = a @ b

    return c

if __name__ == "__main__":
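    # Run the benchmark; the decorated call returns profiling results
    # (not the tensor c), which is why to_dataframe() is available below
    result = benchmark_matmul(1024)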
    print(result.to_dataframe())

That’s it. Calling the decorated function profiles the code inside the nsight.annotate() block and collects metrics for the kernel it launches. The to_dataframe() method returns those results as a pandas DataFrame, ready for filtering, aggregation, or plotting.
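Because the result is an ordinary pandas DataFrame, standard pandas operations apply to it. The helper below is a minimal sketch and is not part of Nsight Python: the function name mean_by_annotation and the default column names "Annotation" and "AvgValue" are hypothetical placeholders, so inspect df.columns on your own output and pass the real names in.

```python
import pandas as pd


def mean_by_annotation(
    df: pd.DataFrame,
    annotation_col: str = "Annotation",  # hypothetical column name
    value_col: str = "AvgValue",  # hypothetical column name
) -> pd.Series:
    """Return the mean of value_col for each distinct annotation label.

    The default column names are assumptions, not Nsight Python's
    documented schema; check df.columns and pass the actual names.
    """
    # Fail early with a clear message if the assumed columns are absent
    missing = {annotation_col, value_col} - set(df.columns)
    if missing:
        raise KeyError(f"columns not found: {sorted(missing)}")
    # Group rows by annotation label and average the chosen metric column
    return df.groupby(annotation_col)[value_col].mean()
```

With the quickstart above, this would be called as mean_by_annotation(result.to_dataframe(), annotation_col=..., value_col=...), substituting the column names your results actually contain.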

For more advanced examples, including parameter sweeps, custom metrics, and visualization, see the examples directory in the nsight-python repository on GitHub.