Fixed issue with single-line comments in TransactSqlLexer by will-hinson · Pull Request #2717 · pygments/pygments

This pull request fixes an issue with the regex for the Comment.Single token type in the TransactSqlLexer class.

In the current version of pygments (2.18.0), the existing regex causes single-line comments to be lexed incorrectly whenever they are immediately followed by a non-comment token on the next line. For example, consider the following code:

from pygments.lexers.sql import TransactSqlLexer

# Lex a single-line comment that is followed by a keyword on the next line.
for token_type, token_contents in TransactSqlLexer().get_tokens(
    """
        -- this is a single line comment
        select
        """
):
    print(token_type, repr(token_contents))

When run, this produces the following stream of tokens. Note that the comment is not recognized as a comment; it is split into operator, name, keyword, and whitespace tokens:

Token.Text.Whitespace '        '
Token.Operator '-'
Token.Operator '-'
Token.Text.Whitespace ' '
Token.Name 'this'
Token.Text.Whitespace ' '
Token.Keyword 'is'
Token.Text.Whitespace ' '
Token.Name 'a'
Token.Text.Whitespace ' '
Token.Name 'single'
Token.Text.Whitespace ' '
Token.Name 'line'
Token.Text.Whitespace ' '
Token.Name 'comment'
Token.Text.Whitespace '\n        '
Token.Keyword 'select'
Token.Text.Whitespace '\n        \n'
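
One mechanism that can produce this kind of failure in Python's `re` module is a `$` anchor compiled without `re.MULTILINE`. In that mode `$` matches only at the very end of the input (or just before a final trailing newline), so a comment pattern ending in `$` cannot match while more lines follow. The sketch below illustrates the mechanism with standalone patterns. `ANCHORED`, `NEWLINE_TERMINATED`, and the sample strings are illustrative assumptions, not the patterns from the lexer or from this pull request's diff.

```python
import re

# Illustrative patterns only; not copied from TransactSqlLexer.
# Without re.MULTILINE, `$` matches only at the end of the whole string
# (or just before a final trailing newline).
ANCHORED = re.compile(r"--.*?$\n?")
# Dropping the anchor and consuming an optional newline removes that dependency.
NEWLINE_TERMINATED = re.compile(r"--.*\n?")

followed_by_code = "-- this is a single line comment\nselect\n"
last_line = "-- this is a single line comment\n"

# The anchored pattern fails when another line follows the comment...
print(ANCHORED.match(followed_by_code))  # None
# ...but succeeds when the comment is the final line of the input.
print(repr(ANCHORED.match(last_line).group()))
# The newline-terminated pattern matches the comment in both cases.
print(repr(NEWLINE_TERMINATED.match(followed_by_code).group()))
```

If a comment rule fails in this way and the lexer falls through to its remaining rules, the text of the comment would be picked up piecemeal, which is consistent with the stream shown above.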

With the modified Comment.Single regex from this commit, the TransactSqlLexer example above produces the following stream of tokens. The comment, including its trailing newline, is now lexed as a single Comment.Single token:

Token.Text.Whitespace '        '
Token.Comment.Single '-- this is a single line comment\n'
Token.Text.Whitespace '        '
Token.Keyword 'select'
Token.Text.Whitespace '\n        \n'
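
A regression check for this behavior could look like the sketch below. The helper names `single_line_comments` and `lex_tsql` are hypothetical. The only pygments API used is `TransactSqlLexer().get_tokens()`, as in the example above, and it is imported lazily so the helper can also be exercised on recorded streams without pygments installed.

```python
def single_line_comments(tokens):
    """Return the text of every token whose type prints as Token.Comment.Single.

    Comparing on str(token_type) lets the helper accept both live pygments
    tokens and recorded (name, text) pairs such as the listings above.
    """
    return [
        text
        for token_type, text in tokens
        if str(token_type) == "Token.Comment.Single"
    ]


def lex_tsql(source):
    """Lex T-SQL source with pygments (hypothetical helper; requires pygments)."""
    from pygments.lexers.sql import TransactSqlLexer

    return list(TransactSqlLexer().get_tokens(source))


# Recorded stream from the fixed lexer, as listed in this description.
fixed_stream = [
    ("Token.Text.Whitespace", "        "),
    ("Token.Comment.Single", "-- this is a single line comment\n"),
    ("Token.Text.Whitespace", "        "),
    ("Token.Keyword", "select"),
    ("Token.Text.Whitespace", "\n        \n"),
]
print(single_line_comments(fixed_stream))
```

With pygments installed, `single_line_comments(lex_tsql(source))` applied to the example input should, going by the streams listed above, return the one comment on a build that includes this fix and an empty list on 2.18.0.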