tfds.deprecated.text.TokenTextEncoder

TextEncoder backed by a list of tokens.

Inherits From: TextEncoder

tfds.deprecated.text.TokenTextEncoder(
    vocab_list,
    oov_buckets=1,
    oov_token='UNK',
    lowercase=False,
    tokenizer=None,
    strip_vocab=True,
    decode_token_separator=' '
)

By default (when `tokenizer` is `None`), tokenization splits on (and drops) non-alphanumeric characters using the regex "\W+".
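A minimal sketch of the splitting rule just described, written with Python's standard `re` module. `tokenize` is a hypothetical helper, not a TFDS function. It applies the documented regex literally and ignores any extra handling the library's default tokenizer may perform. In Python, `\W` matches anything that is not a word character, and `_` counts as a word character, so underscores stay inside tokens.

```python
import re


def tokenize(text: str) -> list[str]:
    """Hypothetical helper: split on runs of non-word characters and drop them."""
    # re.split yields empty strings when the text begins or ends with a
    # separator run; filter those out so only real tokens remain.
    return [tok for tok in re.split(r"\W+", text) if tok]


print(tokenize("Hello, world! It's 2024."))  # ['Hello', 'world', 'It', 's', '2024']
```

Because separators are dropped, punctuation never reaches the vocabulary lookup, and a contraction such as "It's" becomes two tokens.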

Args
`vocab_list`	`list<str>`, list of tokens.
`oov_buckets`	`int`, the number of ids to reserve for out-of-vocabulary (OOV) hash buckets. In `encode`, each OOV token is hashed, and the hash modulo `oov_buckets` selects its bucket.
`oov_token`	`str`, the string to use for OOV ids in `decode`.
`lowercase`	`bool`, whether to make all text and tokens lowercase.
`tokenizer`	`Tokenizer`, responsible for converting incoming text into a list of tokens.
`strip_vocab`	`bool`, whether to strip whitespace from the beginning and end of elements of `vocab_list`.
`decode_token_separator`	`str`, the string used to separate tokens when decoding.
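A sketch of how `strip_vocab` and `lowercase` could shape the token-to-id mapping. `build_token_to_id` is a hypothetical helper, not part of TFDS. It assumes, consistent with the `vocab_size` range starting at 1, that id 0 is reserved for padding and that vocabulary tokens are numbered from 1 in list order.

```python
def build_token_to_id(vocab_list, lowercase=False, strip_vocab=True):
    """Hypothetical helper mirroring the documented vocabulary arguments."""
    vocab = list(vocab_list)
    if strip_vocab:
        # Remove leading/trailing whitespace from each vocabulary entry.
        vocab = [tok.strip() for tok in vocab]
    if lowercase:
        # Lowercase the vocabulary so it matches lowercased input text.
        vocab = [tok.lower() for tok in vocab]
    # Id 0 is assumed to be reserved for padding, so the first token gets id 1.
    return {tok: i + 1 for i, tok in enumerate(vocab)}


print(build_token_to_id([" Hello", "world\n"], lowercase=True))
# {'hello': 1, 'world': 2}
```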

Attributes
`lowercase`
`oov_token`
`tokenizer`
`tokens`
`vocab_size`	Size of the vocabulary. `encode` produces ints in [1, vocab_size); id 0 is reserved for padding.
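The arithmetic behind `vocab_size`, under the assumption that the id space is one padding id, then one id per vocabulary token, then one id per OOV bucket. `expected_vocab_size` is a hypothetical helper written for illustration, not a TFDS function.

```python
def expected_vocab_size(num_tokens: int, oov_buckets: int) -> int:
    """Hypothetical helper: 1 padding id + one id per token + one per OOV bucket."""
    return 1 + num_tokens + oov_buckets


size = expected_vocab_size(num_tokens=3, oov_buckets=1)
print(size)                  # 5
print(list(range(1, size)))  # [1, 2, 3, 4]: three token ids, then one OOV id
```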

Methods

`decode`

decode(
    ids
)

Decodes a list of integers into text.
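A sketch of what decoding involves, assuming the id layout of padding id 0, vocabulary ids starting at 1, and OOV bucket ids after the vocabulary. The `decode` function here is hypothetical and simply skips id 0; the library's exact handling of padding may differ.

```python
def decode(ids, vocab_list, oov_token="UNK", separator=" "):
    """Hypothetical sketch of decoding ids back into text."""
    tokens = []
    for token_id in ids:
        if token_id == 0:
            continue  # assumed padding id; dropped in this sketch
        index = token_id - 1  # undo the shift that reserves id 0
        if index < len(vocab_list):
            tokens.append(vocab_list[index])
        else:
            # Ids past the vocabulary belong to OOV buckets; the original
            # token cannot be recovered, so emit the placeholder instead.
            tokens.append(oov_token)
    return separator.join(tokens)


print(decode([1, 3, 2, 0, 0], ["hello", "world"]))  # hello UNK world
```

Decoding is lossy: separators dropped during tokenization are not restored (tokens are joined with `decode_token_separator` instead), and every OOV id comes back as `oov_token`, so decoding an encoded string generally does not reproduce the original text.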

`encode`

encode(
    s
)

Encodes text into a list of integers.
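A sketch of encoding that follows the behavior described on this page: split the text with "\W+", look each token up in the vocabulary, and hash OOV tokens into one of `oov_buckets` buckets placed after the vocabulary ids. `encode` is hypothetical. MD5 is used only because it is deterministic across runs (Python's built-in `hash()` is salted per process); the hash function TFDS actually uses is not specified here. Raising `ValueError` when `oov_buckets` is 0 is likewise an assumption of this sketch.

```python
import hashlib
import re


def encode(text, vocab_list, oov_buckets=1, lowercase=False):
    """Hypothetical sketch of encoding text into ids."""
    if lowercase:
        text = text.lower()
        vocab_list = [tok.lower() for tok in vocab_list]
    token_to_id = {tok: i for i, tok in enumerate(vocab_list)}
    ids = []
    for tok in (t for t in re.split(r"\W+", text) if t):
        index = token_to_id.get(tok)
        if index is None:
            if oov_buckets <= 0:
                raise ValueError(f"Out of vocabulary token: {tok!r}")
            # Stable hash of the token, reduced modulo the bucket count.
            digest = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
            index = len(vocab_list) + digest % oov_buckets
        ids.append(index + 1)  # shift by 1 so id 0 stays free for padding
    return ids


print(encode("Hello, world! Foo", ["hello", "world"], lowercase=True))  # [1, 2, 3]
```

With `oov_buckets=1`, every unknown token shares a single id; more buckets spread unknown tokens across several ids, so different OOV words are less likely to collide.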

`load_from_file`

@classmethod
load_from_file(
    filename_prefix
)

Load from file. Inverse of save_to_file.

`save_to_file`

save_to_file(
    filename_prefix
)

Store to file. Inverse of load_from_file.
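The actual file name and on-disk format that TFDS writes under `filename_prefix` are not described on this page. The sketch below uses a hypothetical JSON file (`<prefix>.vocab.json`) and hypothetical helpers `save_vocab` and `load_vocab` purely to illustrate the inverse relationship: loading what was saved yields the same encoder settings.

```python
import json
import os
import tempfile


def save_vocab(filename_prefix, tokens, oov_buckets, oov_token):
    """Hypothetical saver: writes encoder settings to <prefix>.vocab.json."""
    with open(filename_prefix + ".vocab.json", "w", encoding="utf-8") as f:
        json.dump({"tokens": tokens, "oov_buckets": oov_buckets,
                   "oov_token": oov_token}, f)


def load_vocab(filename_prefix):
    """Hypothetical loader: inverse of save_vocab."""
    with open(filename_prefix + ".vocab.json", encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "my_encoder")
    save_vocab(prefix, ["hello", "world"], oov_buckets=1, oov_token="UNK")
    restored = load_vocab(prefix)

print(restored)  # {'tokens': ['hello', 'world'], 'oov_buckets': 1, 'oov_token': 'UNK'}
```

In the real API, `load_from_file` is a classmethod, so it is called on the class rather than on an existing encoder instance.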
