Issue33881
Created on 2018-06-16 18:02 by eric.smith, last changed 2022-04-11 14:59 by admin. This issue is now closed.
| Pull Requests | |||
|---|---|---|---|
| URL | Status | Linked | Edit |
| PR 7916 | closed | valer, 2018-06-25 21:58 | |
| PR 20837 | closed | thatiparthy, 2020-06-12 18:49 | |
| Messages (8) | |||
|---|---|---|---|
| msg319766 - (view) | Author: Eric V. Smith (eric.smith) * ![]() |
Date: 2018-06-16 18:02 | |
See issue 33880 for the same issue with namedtuple. This shows up on dataclasses only through make_dataclass. This is an expected ValueError: >>> make_dataclass('a', ['a', 'b', 'c', 'a']) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/cygdrive/c/home/eric/.local/lib/python3.6/site-packages/dataclasses.py", line 1123, in make_dataclass raise TypeError(f'Field name duplicated: {name!r}') TypeError: Field name duplicated: 'a' But this is a SyntaxError: >>> make_dataclass('a', ['\u00b5', '\u03bc']) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/cygdrive/c/home/eric/.local/lib/python3.6/site-packages/dataclasses.py", line 1133, in make_dataclass unsafe_hash=unsafe_hash, frozen=frozen) File "/cygdrive/c/home/eric/.local/lib/python3.6/site-packages/dataclasses.py", line 958, in dataclass return wrap(_cls) File "/cygdrive/c/home/eric/.local/lib/python3.6/site-packages/dataclasses.py", line 950, in wrap return _process_class(cls, init, repr, eq, order, unsafe_hash, frozen) File "/cygdrive/c/home/eric/.local/lib/python3.6/site-packages/dataclasses.py", line 871, in _process_class else 'self', File "/cygdrive/c/home/eric/.local/lib/python3.6/site-packages/dataclasses.py", line 490, in _init_fn return_type=None) File "/cygdrive/c/home/eric/.local/lib/python3.6/site-packages/dataclasses.py", line 356, in _create_fn exec(txt, globals, locals) File "<string>", line 1 SyntaxError: duplicate argument 'μ' in function definition |
|||
| msg320446 - (view) | Author: Valeriya Sinevich (valer) * | Date: 2018-06-25 22:01 | |
I sent a PR and measured how it affected the performance by creating a dataclass with 10000 members.
python.bat -m timeit -s "from dataclasses import make_dataclass" -s "arg_list = [chr(k) * i for k in range(97, 123) for i in range(1, 500)]" "make_dataclass('a', arg_list)"
The performance didn't change, both before and after the changes it takes around 14.5s.
|
|||
| msg320454 - (view) | Author: Steve Dower (steve.dower) * ![]() |
Date: 2018-06-25 23:21 | |
The benchmark may not be triggering that much work. NFKC normalization only applies for characters outside of the basic Latin characters (0-255).
I ran the below benchmarks and saw a huge difference. Granted, it's a very degenerate case with collections this big, but it appears to be linear with len(NAMES), suggesting that the normalization is the expensive part.
>>> CHRS=[c for c in (chr(i) for i in range(65535)) if c.isidentifier()]
>>> def makename():
... return ''.join(random.choice(CHRS) for _ in range(10))
...
>>> NAMES = [makename() for _ in range(10000)]
>>> timeit.timeit('len(set(NAMES))', globals=globals(), number=100000)
38.04007526000004
>>> timeit.timeit('len(set(unicodedata.normalize("NFKC", n) for n in NAMES))', globals=globals(), number=100000)
820.2586788580002
I wonder if it's better to catch the SyntaxError and do the check there? That way we don't really have a performance impact, since it's only going to show up in exceptional cases anyway.
|
|||
| msg320455 - (view) | Author: Steve Dower (steve.dower) * ![]() |
Date: 2018-06-25 23:23 | |
> outside of the basic Latin characters (0-255) This should be (0-127). The Latin characters in 128-255 are considered extended ones, and may be decomposed by normalization. |
|||
| msg371356 - (view) | Author: Cheryl Sabella (cheryl.sabella) * ![]() |
Date: 2020-06-12 12:00 | |
The pull request that was opened for this has been closed due to inactivity, so this issue is available for anyone to work on. |
|||
| msg371424 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2020-06-12 22:19 | |
I recommend leaving the code as-is. AFAICT, the situation almost never arises in practice, and if it did, the existing SyntaxError is clear.
Given that the rest of that language uses a SyntaxError, there isn't a net benefit for switching to TypeError:
>>> def f(µ=1, μ=2):
pass
SyntaxError: duplicate argument 'μ' in function definition
>>> def f(**kwds):
return kwds
>>> f(µ=1, μ=2)
SyntaxError: keyword argument repeated
|
|||
| msg371435 - (view) | Author: Srinivas Reddy Thatiparthy(శ్రీనివాస్ రెడ్డి తాటిపర్తి) (thatiparthy) * | Date: 2020-06-13 04:28 | |
I am convinced by Raymond’s argument. Hence closing the PR. |
|||
| msg371463 - (view) | Author: Eric V. Smith (eric.smith) * ![]() |
Date: 2020-06-13 16:42 | |
I agree with Raymond. There's no real harm being caused here. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:59:01 | admin | set | github: 78062 |
| 2020-06-13 16:42:00 | eric.smith | set | status: open -> closed resolution: wont fix messages: + msg371463 stage: patch review -> resolved |
| 2020-06-13 04:28:47 | thatiparthy | set | messages: + msg371435 |
| 2020-06-12 22:19:45 | rhettinger | set | nosy:
+ rhettinger messages: + msg371424 |
| 2020-06-12 18:49:29 | thatiparthy | set | nosy:
+ thatiparthy pull_requests: + pull_request20031 |
| 2020-06-12 12:00:51 | cheryl.sabella | set | nosy:
+ cheryl.sabella messages:
+ msg371356 |
| 2018-06-25 23:23:09 | steve.dower | set | messages: + msg320455 |
| 2018-06-25 23:21:33 | steve.dower | set | messages: + msg320454 |
| 2018-06-25 22:01:48 | valer | set | nosy:
+ valer, steve.dower messages: + msg320446 |
| 2018-06-25 21:58:52 | valer | set | keywords:
+ patch stage: patch review pull_requests: + pull_request7519 |
| 2018-06-16 19:08:50 | eric.smith | set | title: dataclasses should use NFKD to find duplicate members -> dataclasses should use NFKC to find duplicate members |
| 2018-06-16 18:02:08 | eric.smith | create | |
