Message 319934 - Python tracker

Author	ammar2
Recipients	ammar2
Date	2018-06-19.07:41:51
Message-id	<1529394112.27.0.56676864532.issue33899@psf.upfronthosting.co.za>

Content

As was pointed out in https://bugs.python.org/issue33766, there is an edge case in the C tokenizer whereby it implicitly treats the end of input as a newline. The tokenize module in the stdlib does not mirror the C code's behavior in this case: for input that lacks a trailing newline, it emits no NEWLINE token before ENDMARKER.

tokenizer.c:

  ~/cpython $ echo -n 'x' | ./python
  ----------
  NAME ("x")
  NEWLINE
  ENDMARKER

tokenize module:

  ~/cpython $ echo -n 'x' | ./python -m tokenize
  1,0-1,1:            NAME           'x'            
  2,0-2,0:            ENDMARKER      ''

The instrumentation that makes the C tokenizer dump its tokens is my own; I can provide a diff that produces this output if needed.
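
The tokenize module side of this can also be checked from Python without a shell pipe. The following is a minimal sketch, not part of any patch: `token_names` is a hypothetical helper name, and the lists it returns depend on the interpreter it runs under, so no expected output is shown. On an interpreter that has the discrepancy described above, the list for "x" should contain no NEWLINE entry, while the list for "x\n" does.

```python
import io
import tokenize


def token_names(source):
    """Return the token type names that the tokenize module yields for source.

    Hypothetical helper for illustration only.
    """
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]


# Same input as `echo -n 'x'`: no trailing newline.
without_newline = token_names("x")
# Same input with an explicit trailing newline, for comparison.
with_newline = token_names("x\n")

print("no trailing newline:  ", without_newline)
print("with trailing newline:", with_newline)
print("implicit NEWLINE emitted:", "NEWLINE" in without_newline)
```

If the last line prints False, the module is not inserting the implicit NEWLINE that tokenizer.c produces for the same input.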

History
Date	User	Action	Args
2018-06-19 07:41:52	ammar2	set	recipients: + ammar2
2018-06-19 07:41:52	ammar2	set	messageid: <1529394112.27.0.56676864532.issue33899@psf.upfronthosting.co.za>
2018-06-19 07:41:52	ammar2	link	issue33899 messages
2018-06-19 07:41:51	ammar2	create