bpo-25324: Move the description of tokenize tokens to token.rst. (#1911) · python/cpython@5cefb6c

@@ -17,7 +17,7 @@ as well, making it useful for implementing "pretty-printers," including
 colorizers for on-screen displays.
 
 To simplify token stream handling, all :ref:`operators` and :ref:`delimiters`
-tokens are returned using the generic :data:`token.OP` token type. The exact
+tokens are returned using the generic :data:`~token.OP` token type. The exact
 type can be determined by checking the ``exact_type`` property on the
 :term:`named tuple` returned from :func:`tokenize.tokenize`.
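The ``exact_type`` distinction documented in the hunk above can be seen in a short session (a sketch; the source string is arbitrary):

```python
import io
import token
import tokenize

# All operators and delimiters come back with the generic token.OP type;
# the specific operator is exposed via the exact_type property of the
# TokenInfo named tuple returned by tokenize.tokenize().
source = io.BytesIO(b"x = 1 + 2\n")
ops = [t for t in tokenize.tokenize(source.readline) if t.type == token.OP]

for t in ops:
    # tok_name maps numeric token values back to their names.
    print(token.tok_name[t.type], token.tok_name[t.exact_type], repr(t.string))
# prints:
#   OP EQUAL '='
#   OP PLUS '+'
```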

@@ -44,7 +44,7 @@ The primary entry point is a :term:`generator`:
 
    The returned :term:`named tuple` has an additional property named
    ``exact_type`` that contains the exact operator type for
-   :data:`token.OP` tokens. For all other token types ``exact_type``
+   :data:`~token.OP` tokens. For all other token types ``exact_type``
    equals the named tuple ``type`` field.
 
    .. versionchanged:: 3.1
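For non-``OP`` tokens, as the hunk notes, ``exact_type`` simply mirrors the ``type`` field. A minimal check (the identifier is arbitrary):

```python
import io
import token
import tokenize

# For every token type other than token.OP, the exact_type property
# of the returned named tuple equals its type field.
toks = list(tokenize.tokenize(io.BytesIO(b"answer = 42\n").readline))
name_tok = next(t for t in toks if t.type == token.NAME)

assert name_tok.exact_type == name_tok.type == token.NAME
```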

@@ -58,26 +58,7 @@ The primary entry point is a :term:`generator`:
 
 
 All constants from the :mod:`token` module are also exported from
-:mod:`tokenize`, as are three additional token type values:
-
-.. data:: COMMENT
-
-   Token value used to indicate a comment.
-
-
-.. data:: NL
-
-   Token value used to indicate a non-terminating newline. The NEWLINE token
-   indicates the end of a logical line of Python code; NL tokens are generated
-   when a logical line of code is continued over multiple physical lines.
-
-
-.. data:: ENCODING
-
-   Token value that indicates the encoding used to decode the source bytes
-   into text. The first token returned by :func:`.tokenize` will always be an
-   ENCODING token.
-
+:mod:`tokenize`.
 
 Another function is provided to reverse the tokenization process. This is
 useful for creating tools that tokenize a script, modify the token stream, and
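The hunk above only moves the *descriptions* of ``COMMENT``, ``NL`` and ``ENCODING`` to ``token.rst``; the names themselves remain importable from :mod:`tokenize`. A sketch of the behavior the removed text described (the source snippet is arbitrary):

```python
import io
import tokenize

# COMMENT, NL and ENCODING are still exported by tokenize.
src = b"# a comment\ntotal = (1 +\n         2)\n"
toks = list(tokenize.tokenize(io.BytesIO(src).readline))

# The first token emitted is always the ENCODING token.
print(toks[0].string)  # 'utf-8' by default

kinds = {t.type for t in toks}
# The comment yields COMMENT; the continuation line inside the
# parentheses yields a non-terminating NL rather than NEWLINE.
print(tokenize.COMMENT in kinds, tokenize.NL in kinds)  # True True
```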

@@ -96,8 +77,8 @@ write back the modified script.
    token type and token string as the spacing between tokens (column
    positions) may change.
 
-   It returns bytes, encoded using the ENCODING token, which is the first
-   token sequence output by :func:`.tokenize`.
+   It returns bytes, encoded using the :data:`~token.ENCODING` token, which
+   is the first token sequence output by :func:`.tokenize`.
 
 
 :func:`.tokenize` needs to detect the encoding of source files it tokenizes. The
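The round trip the hunk describes can be sketched as follows; because the token list includes the leading ``ENCODING`` token and full position information, :func:`tokenize.untokenize` returns bytes (the input line here is arbitrary):

```python
import io
import tokenize

source = b"x = 1 + 2\n"
toks = list(tokenize.tokenize(io.BytesIO(source).readline))

# untokenize() returns bytes, encoded according to the leading
# ENCODING token produced by tokenize().
round_trip = tokenize.untokenize(toks)
assert isinstance(round_trip, bytes)
# With full 5-tuples the positions preserve the original spacing.
assert round_trip == source
```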

@@ -115,7 +96,7 @@ function it uses to do this is available:
 
    It detects the encoding from the presence of a UTF-8 BOM or an encoding
    cookie as specified in :pep:`263`. If both a BOM and a cookie are present,
-   but disagree, a SyntaxError will be raised. Note that if the BOM is found,
+   but disagree, a :exc:`SyntaxError` will be raised. Note that if the BOM is found,
    ``'utf-8-sig'`` will be returned as an encoding.
 
    If no encoding is specified, then the default of ``'utf-8'`` will be
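Both cases from the hunk above, the BOM and the default, can be exercised directly (a sketch; the byte strings are arbitrary):

```python
import codecs
import io
import tokenize

# A UTF-8 BOM makes detect_encoding() report 'utf-8-sig'.
bom_source = codecs.BOM_UTF8 + b"x = 1\n"
enc, lines = tokenize.detect_encoding(io.BytesIO(bom_source).readline)
print(enc)  # utf-8-sig

# With no BOM and no PEP 263 coding cookie, the default is 'utf-8'.
default_enc, _ = tokenize.detect_encoding(io.BytesIO(b"x = 1\n").readline)
print(default_enc)  # utf-8
```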

@@ -147,8 +128,8 @@ function it uses to do this is available:
       3
 
    Note that unclosed single-quoted strings do not cause an error to be
-   raised. They are tokenized as ``ERRORTOKEN``, followed by the tokenization of
-   their contents.
+   raised. They are tokenized as :data:`~token.ERRORTOKEN`, followed by the
+   tokenization of their contents.
 
 
 .. _tokenize-cli:
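On the CPython versions this commit targets, the behavior reads as the hunk describes: an unclosed single-quoted string yields ``ERRORTOKEN`` followed by the tokens of its contents. Later CPython releases reworked the tokenizer and may raise instead, so this sketch tolerates either outcome:

```python
import io
import token
import tokenize

# Unclosed single-quoted string: historically ERRORTOKEN plus the
# tokenization of its contents; newer interpreters may raise instead.
source = io.BytesIO(b"x = 'abc\n")
try:
    kinds = [token.tok_name[t.exact_type]
             for t in tokenize.tokenize(source.readline)]
    outcome = "ERRORTOKEN" if "ERRORTOKEN" in kinds else "clean"
except (SyntaxError, tokenize.TokenError):
    outcome = "raised"
print(outcome)
```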

@@ -260,7 +241,7 @@ the name of the token, and the final column is the value of the token (if any)
     4,11-4,12: NEWLINE '\n'
     5,0-5,0: ENDMARKER ''
 
-The exact token type names can be displayed using the ``-e`` option:
+The exact token type names can be displayed using the :option:`-e` option:
 
 .. code-block:: sh
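The ``-e`` flag can be tried directly; with no filename argument the CLI tokenizes standard input (the input line is arbitrary):

```shell
# With -e, the CLI prints exact token names (EQUAL, PLUS, ...) in
# place of the generic OP.
echo 'x = 1 + 2' | python -m tokenize -e
```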