NASM lexer fixes by seanthegeek · Pull Request #3059 · pygments/pygments
Note: This patch was written by Claude Sonnet 4.6 via the Claude Code CLI. I know the Pygments project may be cautious about LLM-generated contributions, and I'd genuinely welcome feedback on the quality of this work — both the code itself and how well it follows Pygments conventions. I'm using this as a real-world test of how well Claude handles a non-trivial open-source contribution task given a detailed prompt. Any review comments, even harsh ones, are appreciated.
Pygments Lexer: NASM (Netwide Assembler)
Task
Fix the existing NASM (Netwide Assembler) lexer in Pygments. Work inside my local fork of the pygments/pygments repo on a separate branch.
Official references
MANDATORY: Before writing or modifying the lexer, you MUST fetch and read every
URL in this list. This is not background reading — it is a required prerequisite
step. Fetch each page, extract the keywords or function names, and verify them
against the lexer before declaring any work complete.
- NASM documentation: https://www.nasm.us/doc/
- NASM instruction reference: https://www.nasm.us/doc/nasmdocb.html
- NASM preprocessor: https://www.nasm.us/doc/nasmdoc4.html
- NASM directives: https://www.nasm.us/doc/nasmdoc7.html
- NASM expressions: https://www.nasm.us/doc/nasmdoc3.html
- x86/x86-64 instruction reference: https://www.felixcloutier.com/x86/
- Pygments issue #1231 (sprintf@plt bug): Fix NasmLexer to support syntax like <sprintf@plt>
- Pygments issue #728 (macro whitespace bug): NASM lexer: Macros with whitespace before it are not recognized
Pygments references
- Write your own lexer: https://pygments.org/docs/lexerdevelopment/
- Contributing to Pygments: https://pygments.org/docs/contributing/
- Builtin tokens: https://pygments.org/docs/tokens/
- Available lexers: https://pygments.org/docs/lexers/
- Pygments GitHub repo: https://github.com/pygments/pygments
- Existing ASM lexers (structural reference): pygments/lexers/asm.py
- SQL lexer (structural reference for query languages): pygments/lexers/sql.py
Phase 1: Setup and audit
- Confirm you're in the root of a Pygments repo checkout (look for pygments/lexers/, tests/, setup.py).
- Run git checkout -b fix/nasm main to create a dedicated branch.
- Set up a venv: python -m venv venv && source venv/bin/activate && pip install -e ".[dev]".
- Run tox -e py to confirm the existing test suite passes.
- Establish a baseline: run the existing lexer against a sample and count Error tokens:
  echo '<sample code>' | python -m pygments -l nasm -f html | grep -o 'class="err"' | wc -l
- Read the existing lexer end-to-end. Understand the current states, token patterns, and keyword sets.
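The baseline count from the list above can also be taken through the Pygments Python API instead of grepping HTML. This is a minimal sketch, assuming Pygments is importable in the active venv; the helper name count_error_tokens is hypothetical and not part of Pygments.

```python
from pygments.lexers import NasmLexer
from pygments.token import Error


def count_error_tokens(source: str) -> int:
    """Return how many tokens NasmLexer emits as Token.Error (or a subtype)."""
    return sum(1 for ttype, _value in NasmLexer().get_tokens(source) if ttype in Error)


if __name__ == "__main__":
    # Replace this sample with the code you want to use as a baseline.
    sample = "mov eax, 1\n"
    print(count_error_tokens(sample))
```

Recording this number before touching the lexer gives a reference point for the later phases.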
Known issues to fix
- Issue #1231 (Fix NasmLexer to support syntax like <sprintf@plt>): <sprintf@plt> causes sp inside sprintf to be tokenized as a register.
- Issue #728 (NASM lexer: Macros with whitespace before it are not recognized): %define preceded by whitespace produces Error tokens.
- Missing registers: Coverage of x86-64 extended registers, AVX-512 registers, and mask registers may have gaps.
- Missing preprocessor directives: Some % directives may not be covered.
Phase 1: Research
Before writing any code, fetch and read the official references listed above.
Do not invent or assume any syntax elements. If something is ambiguous in the docs, web-search to verify before including it.
Phase 2: Fix the lexer
Apply fixes to the existing lexer file.
Review the existing lexer at pygments/lexers/asm.py (the NasmLexer class) and fix:
- Register matching greediness: The lexer matches register names like sp inside longer words (e.g., sprintf). Fix by using word boundary anchors or negative lookahead.
- Macro whitespace: %define and other preprocessor directives must be recognized even when preceded by whitespace, not just at column 0.
- Missing registers: Audit and add any missing x86-64 extended registers, AVX-512 registers, mask registers.
- Missing directives: Ensure all NASM preprocessor and assembler directives are covered.
- Disassembly compatibility: Consider gracefully handling <symbol@plt> patterns and hex address prefixes.
After each fix, run the tests to confirm no regressions:
tox -e py -- tests/snippets/nasm/
Phase 3: Expand tests
Review and expand the existing test snippets in tests/snippets/nasm/. Add snippets that cover the syntax that was previously broken.
Each snippet file is a .txt file containing source code. Run:
tox -- --update-goldens tests/snippets/nasm/new_test.txt
This auto-populates expected tokens. Review them for correctness, then check them in.
Phase 4: Test and iterate
This is the critical phase. Use pygmentize as the feedback loop.
- Run tox -e py. Fix any failures.
- Test your lexer on the example file and count Error tokens:
  python -m pygments -l nasm -f html tests/examplefiles/nasm/* | grep -o 'class="err"' | wc -l
- If there are Error tokens, identify the unmatched text:
  python -m pygments -l nasm -f testcase tests/examplefiles/nasm/* | grep "Token.Error"
- For each Error token:
  a. Identify what syntax element the unmatched text represents.
  b. Web-search the official docs to confirm the syntax is valid.
  c. Fix the lexer rule.
  d. Re-run tox -e py -- tests/snippets/nasm/ to confirm no regressions.
  e. Re-test with pygmentize to verify the Error is gone.
- Repeat until the Error token count is zero.
- Run the full test suite one more time: tox -e py.
- Visually inspect the HTML output for sanity:
  python -m pygments -l nasm -f html -O full,style=monokai tests/examplefiles/nasm/* > /tmp/preview.html
  open /tmp/preview.html  # or xdg-open on Linux
  Confirm that keywords, functions, operators, strings, numbers, and comments are each highlighted distinctly.
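When triaging individual Error tokens in the loop above, it can help to see the line each one comes from. The following is a rough sketch using get_tokens_unprocessed, which yields (index, token type, value) triples; the helper name find_error_tokens is hypothetical.

```python
from pygments.lexers import NasmLexer
from pygments.token import Error


def find_error_tokens(source: str) -> list[tuple[int, str]]:
    """Return (line_number, unmatched_text) for every Error token in source."""
    hits: list[tuple[int, str]] = []
    for index, ttype, value in NasmLexer().get_tokens_unprocessed(source):
        if ttype in Error:
            # Convert the character offset into a 1-based line number.
            line = source.count("\n", 0, index) + 1
            hits.append((line, value))
    return hits


if __name__ == "__main__":
    sample = "mov eax, 1\n"
    for line, text in find_error_tokens(sample):
        print(f"line {line}: {text!r}")
```

An empty result for a file corresponds to a zero count from the grep-based commands.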
Phase 5: Finalize
- Run tox -e py one final time: full pass, zero failures.
- Review the diff: git diff --stat. You should have these files:
  - pygments/lexers/asm.py (the fixes)
  - tests/snippets/nasm/ (new or updated test snippets)
  - Possibly tests/examplefiles/nasm/ (expanded example)
- Commit: git add -A && git commit -m "Fix NASM (Netwide Assembler) lexer: <summarize fixes>".
- Report what you've done: list the keyword count, function count, token types used, and confirm zero Error tokens.
Constraints (apply to all phases)
- No hallucinated syntax. Every keyword, function, operator, and language construct must come from the official documentation listed above. If you're unsure, web-search the docs before adding it.
- Follow Pygments conventions exactly. Read existing lexers (especially sql.py and the lexer development guide) for patterns. Use words(), bygroups(), include(), and default() helpers appropriately.
- Python code must include type hints and pass ruff linter checks.
- The Error token count is the ground truth. tox passing is necessary but not sufficient: you must also have zero Token.Error in both test snippets and example files.
- Iterate until clean. Do not declare the task complete until both tox -e py passes AND the Error token count is zero.
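For orientation, here is how the four helpers named in the conventions constraint fit together in a deliberately tiny lexer. ToyAsmLexer, its register list, and its states are all hypothetical and far simpler than the real NasmLexer; treat it as a sketch of the helper usage, not as the fix itself.

```python
from pygments.lexer import RegexLexer, bygroups, default, include, words
from pygments.token import Comment, Name, Punctuation, Whitespace


class ToyAsmLexer(RegexLexer):
    """Hypothetical miniature lexer used only to illustrate the helpers."""

    name = "ToyAsm"

    tokens = {
        "whitespace": [
            (r"\s+", Whitespace),
            (r";.*", Comment.Single),
        ],
        "root": [
            # include(): reuse the shared whitespace/comment rules.
            include("whitespace"),
            # bygroups(): one token type per capture group.
            (r"(%define)([ \t]+)(\w+)",
             bygroups(Comment.Preproc, Whitespace, Name.Constant)),
            # An identifier starts an instruction and enters the operand state.
            (r"[A-Za-z_]\w*", Name.Function, "operands"),
        ],
        "operands": [
            (r"[ \t]+", Whitespace),
            # words(): build an alternation from a list, with \b on both sides.
            (words(("rax", "rsp", "sp"), prefix=r"\b", suffix=r"\b"), Name.Builtin),
            (r"[A-Za-z_]\w*", Name.Variable),
            (r",", Punctuation),
            # default(): leave the state without consuming input.
            default("#pop"),
        ],
    }


if __name__ == "__main__":
    for ttype, value in ToyAsmLexer().get_tokens("call sprintf, sp\n"):
        print(ttype, repr(value))
```

Because the register alternation is wrapped in word boundaries, sprintf falls through to the generic identifier rule while a standalone sp is still tagged as a register.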