Various improvements to the BQN lexer by slotThe · Pull Request #2789 · pygments/pygments

added 5 commits

October 7, 2024 08:39
\w matches all alphanumeric Unicode characters, including ones (e.g., 𝕊)
that BQN treats specially. This is especially troublesome for variables;
previously, something like

    𝕊i

would have returned

    (Token.Operator, '𝕊'),
    (Token.Error, 'i'),

instead of

    (Token.Operator, '𝕊'),
    (Token.Name.Variable, 'i').

This extends to special sequences like \b, which care about the
difference between \w and \W.
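
The interaction between \w and \b described above can be reproduced with Python's `re` module alone. The following is a minimal sketch, not the lexer's actual rules: the `NAME` pattern is a hypothetical, deliberately narrow ASCII class used only to illustrate the idea of spelling out word characters explicitly instead of relying on \w.

```python
import re

# U+1D54A MATHEMATICAL DOUBLE-STRUCK CAPITAL S, written as an escape
# so the example does not depend on the file's encoding.
SPECIAL = "\U0001D54A"
source = SPECIAL + "i"

# For str patterns, \w is Unicode-aware, so it also matches the special name.
matches_special = re.fullmatch(r"\w", SPECIAL) is not None

# \b only matches between a \w and a \W character (or at the string edges),
# so no boundary is reported between the special name and the following "i".
boundaries = [m.start() for m in re.finditer(r"\b", source)]

# A \b-anchored rule therefore cannot start matching at "i".
anchored = re.search(r"\bi", source)

# Hypothetical alternative: an explicit character class instead of \w.
# A real BQN name class would need to cover more characters than this.
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
name_after_special = NAME.match(source, 1)
```

Here `boundaries` contains only the two string edges and `anchored` is `None`, while the explicit class finds `i` directly after the special character.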
BQN does not actually use * (ASTERISK) anywhere,
but there is a primitive function ⋆ (STAR OPERATOR).

@slotThe changed the title from "Fix BQN lexer treating special characters as word chars" to "Various improvements to the BQN lexer"

Nov 15, 2024

@slotThe deleted the bqn/inter-word-chars branch

January 5, 2025 16:00