NASM lexer fixes by seanthegeek · Pull Request #3059 · pygments/pygments

Note: This patch was written by Claude Sonnet 4.6 via the Claude Code CLI. I know the Pygments project may be cautious about LLM-generated contributions, and I'd genuinely welcome feedback on the quality of this work, both the code itself and how well it follows Pygments conventions. I'm using this as a real-world test of how well Claude handles a non-trivial open-source contribution given a detailed prompt. Any review comments, even harsh ones, are appreciated.

Original prompt (verbatim)

Pygments Lexer: NASM (Netwide Assembler)

Task

Fix the existing NASM (Netwide Assembler) lexer in Pygments. Work inside my local fork of the pygments/pygments repo on a separate branch.

Official references

MANDATORY: Before writing or modifying the lexer, you MUST fetch and read every URL in this list. This is not background reading; it is a required prerequisite step. Fetch each page, extract the keywords or function names, and verify them against the lexer before declaring any work complete.

Pygments references

Phase 1: Setup and audit

  1. Confirm you're in the root of a Pygments repo checkout (look for pygments/lexers/, tests/, setup.py).

  2. Run git checkout -b fix/nasm main to create a dedicated branch.

  3. Set up a venv: python -m venv venv && source venv/bin/activate && pip install -e ".[dev]".

  4. Run tox -e py to confirm the existing test suite passes.

  5. Establish a baseline by running the existing lexer against a sample and counting Error tokens:

    echo '<sample code>' | python -m pygments -l nasm -f html | grep -o 'class="err"' | wc -l
  6. Read the existing lexer end-to-end. Understand the current states, token patterns, and keyword sets.

Phase 1b: Research

Before writing any code, fetch and read the official references listed above.

Do not invent or assume any syntax elements. If something is ambiguous in the docs, web-search to verify before including it.

Phase 2: Fix the lexer

Apply fixes to the existing lexer at pygments/lexers/asm.py (the NasmLexer class). Known issues to fix:

  1. Register matching greediness: The lexer matches register names like sp inside longer words (e.g., sprintf). Fix by using word boundary anchors or negative lookahead.
  2. Macro whitespace: %define and other preprocessor directives must be recognized even when preceded by whitespace, not just at column 0.
  3. Missing registers: Audit and add any missing x86-64 extended registers, AVX-512 registers, mask registers.
  4. Missing directives: Ensure all NASM preprocessor and assembler directives are covered.
  5. Disassembly compatibility: Consider gracefully handling <symbol@plt> patterns and hex address prefixes.
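As a sketch of fix 1, a word-boundary anchor keeps a short register name from matching inside a longer identifier. This is a minimal illustration with plain re, not the actual lexer change (the register names are a tiny sample, not the full set):

```python
import re

# Greedy version: matches 'sp' anywhere, even inside 'sprintf'.
loose = re.compile(r'(sp|ax|eax)')
# Bounded version: \b anchors prevent partial-word matches.
strict = re.compile(r'\b(sp|ax|eax)\b')

print(bool(loose.search('call sprintf')))   # True: 'sp' found inside 'sprintf'
print(bool(strict.search('call sprintf')))  # False: word boundary blocks it
print(bool(strict.search('mov sp, ax')))    # True: real register operands match
```

A negative lookahead such as `sp(?![a-zA-Z0-9_])` achieves the same effect when a trailing `\b` is not precise enough.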

After each fix, run the tests to confirm no regressions:

tox -e py -- tests/snippets/nasm/

Phase 3: Expand tests

Review and expand the existing test snippets in tests/snippets/nasm/. Add snippets that cover the syntax that was previously broken.

Each snippet file is a .txt file containing source code. Run:

tox -- --update-goldens tests/snippets/nasm/new_test.txt

This auto-populates expected tokens. Review them for correctness, then check them in.
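For orientation, a regenerated snippet file roughly follows this two-section layout; the token names below are illustrative, not copied from the real NASM goldens:

```
---input---
mov eax, 1

---tokens---
'mov'         Name.Builtin
' '           Whitespace
'eax'         Name.Variable
','           Punctuation
' '           Whitespace
'1'           Number.Integer
'\n'          Whitespace
```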

Phase 4: Test and iterate

This is the critical phase. Use pygmentize as the feedback loop.

  1. Run tox -e py. Fix any failures.

  2. Test your lexer on the example file and count Error tokens:

    cat tests/examplefiles/nasm/* | python -m pygments -l nasm -f html | grep -o 'class="err"' | wc -l
  3. If there are Error tokens, identify the unmatched text:

    cat tests/examplefiles/nasm/* | python -m pygments -l nasm -f testcase | grep "Token.Error"
  4. For each Error token:
    a. Identify what syntax element the unmatched text represents.
    b. Web-search the official docs to confirm the syntax is valid.
    c. Fix the lexer rule.
    d. Re-run tox -e py -- tests/snippets/nasm/ to confirm no regressions.
    e. Re-test with pygmentize to verify the Error is gone.

  5. Repeat until the Error token count is zero.

  6. Run the full test suite one more time: tox -e py.

  7. Visually inspect the HTML output for sanity:

    cat tests/examplefiles/nasm/* | python -m pygments -l nasm -f html -O full,style=monokai > /tmp/preview.html
    open /tmp/preview.html  # or xdg-open on Linux

    Confirm that keywords, functions, operators, strings, numbers, and comments are each highlighted distinctly.
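The shell pipeline above can also be expressed in a few lines of Python, which makes it easy to see the offending text alongside each error. This is a convenience sketch assuming a local pygments install, not a required part of the workflow:

```python
from pygments.lexers import get_lexer_by_name
from pygments.token import Error


def count_errors(source: str) -> int:
    """Lex source with the NASM lexer and count Error tokens."""
    lexer = get_lexer_by_name("nasm")
    errors = [val for tok, val in lexer.get_tokens(source) if tok is Error]
    for val in errors:
        print(f"unmatched: {val!r}")  # the exact text the lexer could not handle
    return len(errors)
```

On a healthy lexer, a trivial instruction like `count_errors("mov eax, 1\n")` should come back as 0.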

Phase 5: Finalize

  1. Run tox -e py one final time; it must be a full pass with zero failures.
  2. Review the diff: git diff --stat. You should have these files:
    • pygments/lexers/asm.py (the fixes)
    • tests/snippets/nasm/ (new or updated test snippets)
    • Possibly tests/examplefiles/nasm/ (expanded example)
  3. Commit: git add -A && git commit -m "Fix NASM (Netwide Assembler) lexer: <summarize fixes>".
  4. Report what you've done: list the keyword count, function count, token types used, and confirm zero Error tokens.

Constraints (apply to all phases)

  • No hallucinated syntax. Every keyword, function, operator, and language construct must come from the official documentation listed above. If you're unsure, web-search the docs before adding it.
  • Follow Pygments conventions exactly. Read existing lexers (especially sql.py and the lexer development guide) for patterns. Use words(), bygroups(), include(), and default() helpers appropriately.
  • Python code must include type hints and pass ruff linter checks.
  • The Error token count is the ground truth. tox passing is necessary but not sufficient — you must also have zero Token.Error in both test snippets and example files.
  • Iterate until clean. Do not declare the task complete until both tox -e py passes AND the Error token count is zero.
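As a sketch of the words() convention the constraints refer to, a register rule bounded with suffix=r'\b' avoids the greediness bug from Phase 2. The toy lexer below is illustrative only, not the actual asm.py change, and its register list is a placeholder:

```python
from pygments.lexer import RegexLexer, words
from pygments.token import Name, Punctuation, Whitespace, Error


class ToyRegisterLexer(RegexLexer):
    """Minimal sketch: register names only match as whole words."""
    name = 'toy'
    tokens = {
        'root': [
            # suffix=r'\b' keeps 'sp' from matching inside 'sprintf'
            (words(('sp', 'ax', 'eax'), suffix=r'\b'), Name.Builtin),
            (r'[A-Za-z_][A-Za-z0-9_]*', Name),
            (r'[,:]', Punctuation),
            (r'\s+', Whitespace),
            (r'.', Error),
        ],
    }
```

With this rule order, `list(ToyRegisterLexer().get_tokens('sprintf'))` yields a plain Name token for the whole identifier rather than a spurious Name.Builtin for its first two characters.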