bpo-31672: Fix string.Template accidentally matched non-ASCII identif… · python/cpython@7060380

4 files changed


@@ -746,8 +746,18 @@ to parse template strings. To do this, you can override these class attributes:
 
 * *idpattern* -- This is the regular expression describing the pattern for
   non-braced placeholders (the braces will be added automatically as
-  appropriate). The default value is the regular expression
-  ``[_a-z][_a-z0-9]*``.
+  appropriate). The default value is the regular expression
+  ``(?-i:[_a-zA-Z][_a-zA-Z0-9]*)``.
+
+  .. note::
+
+     Since default *flags* is ``re.IGNORECASE``, pattern ``[a-z]`` can match
+     with some non-ASCII characters. That's why we use local ``-i`` flag here.
+
+     While *flags* is kept to ``re.IGNORECASE`` for backward compatibility,
+     you can override it to ``0`` or ``re.IGNORECASE | re.ASCII`` when
+     subclassing.
+
 
 * *flags* -- The regular expression flags that will be applied when compiling
   the regular expression used for recognizing substitutions. The default value
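The note in the documentation hunk above says a subclass may override *flags* with ``re.IGNORECASE | re.ASCII``. A minimal sketch of what that could look like follows; the class name `AsciiTemplate` and the sample strings are hypothetical and not part of the commit.

```python
import re
from string import Template


class AsciiTemplate(Template):
    # Hypothetical subclass: identifiers stay case-insensitive, but the
    # ASCII flag keeps [a-z] from matching non-ASCII letters.
    idpattern = r'[_a-z][_a-z0-9]*'
    flags = re.IGNORECASE | re.ASCII


# Mixed-case ASCII identifiers are still recognized.
result = AsciiTemplate('$Who likes $what').substitute(Who='tim', what='pie')
print(result)

# A non-ASCII character after the delimiter is an invalid placeholder.
try:
    AsciiTemplate('$\u0131').substitute()
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

With this override the plain ``[_a-z]`` pattern is sufficient, because the ASCII flag restricts case-insensitive matching to ASCII letters.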


@@ -78,7 +78,11 @@ class Template(metaclass=_TemplateMetaclass):
     """A string class for supporting $-substitutions."""
 
     delimiter = '$'
-    idpattern = r'[_a-z][_a-z0-9]*'
+    # r'[a-z]' matches to non-ASCII letters when used with IGNORECASE,
+    # but without ASCII flag. We can't add re.ASCII to flags because of
+    # backward compatibility. So we use local -i flag and [a-zA-Z] pattern.
+    # See https://bugs.python.org/issue31672
+    idpattern = r'(?-i:[_a-zA-Z][_a-zA-Z0-9]*)'
     flags = _re.IGNORECASE
 
     def __init__(self, template):
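To illustrate the comment added in this hunk, the following sketch compares the old and new patterns directly with the `re` module, outside of `string.Template`. The variable names `old` and `new` are chosen for illustration only; both patterns are compiled with `re.IGNORECASE`, as the class does by default.

```python
import re

# The pattern before the change and the pattern after it, both compiled
# with IGNORECASE to mirror Template.flags.
old = re.compile(r'[_a-z][_a-z0-9]*', re.IGNORECASE)
new = re.compile(r'(?-i:[_a-zA-Z][_a-zA-Z0-9]*)', re.IGNORECASE)

# U+0131 (dotless i) and U+0130 (capital I with dot above) are the two
# characters exercised by the new tests in this commit.
for ch in ('\u0131', '\u0130'):
    print(ascii(ch), bool(old.fullmatch(ch)), bool(new.fullmatch(ch)))

# Ordinary ASCII identifiers match under both patterns.
print(bool(old.fullmatch('Who_1')), bool(new.fullmatch('Who_1')))
```

The local ``(?-i:...)`` group switches case-insensitivity off for the identifier only, so the explicit ``a-zA-Z`` range is what keeps upper-case ASCII identifiers working.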


@@ -271,6 +271,12 @@ def test_invalid_placeholders(self):
         raises(ValueError, s.substitute, dict(who='tim'))
         s = Template('$who likes $100')
         raises(ValueError, s.substitute, dict(who='tim'))
+        # Template.idpattern should match to only ASCII characters.
+        # https://bugs.python.org/issue31672
+        s = Template("$who likes $\u0131") # (DOTLESS I)
+        raises(ValueError, s.substitute, dict(who='tim'))
+        s = Template("$who likes $\u0130") # (LATIN CAPITAL LETTER I WITH DOT ABOVE)
+        raises(ValueError, s.substitute, dict(who='tim'))
 
     def test_idpattern_override(self):
         class PathPattern(Template):


@@ -0,0 +1,2 @@
+``idpattern`` in ``string.Template`` matched some non-ASCII characters. Now
+it uses ``-i`` regular expression local flag to avoid non-ASCII characters.
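The user-visible effect described in this news entry can be checked with a short script. This is a sketch that assumes a Python version containing the fix; the helper name `is_rejected` is hypothetical. On an interpreter without the fix, the non-ASCII placeholder would presumably be treated as a named identifier and raise `KeyError` instead, which this helper deliberately does not catch.

```python
from string import Template


def is_rejected(text):
    """Return True if substituting into `text` raises ValueError."""
    try:
        Template(text).substitute(who='tim')
    except ValueError:
        return True
    return False


# The two templates used by the tests added in this commit.
print(is_rejected('$who likes $\u0131'))
print(is_rejected('$who likes $\u0130'))

# A template with only a valid ASCII placeholder is accepted.
print(is_rejected('$who likes pie'))
```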