tfds.deprecated.text.TextEncoder

Abstract base class for converting between text and integers.

A note on padding:

Because text data is typically variable-length and nearly always requires padding during training, ID 0 is always reserved for padding. To accommodate this, all TextEncoders follow these conventions:

  • encode: never returns id 0 (all ids are 1+)
  • decode: drops 0 in the input ids
  • vocab_size: includes ID 0

New subclasses should be careful to match this behavior.
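As a rough illustration of these conventions, here is a minimal sketch of an encoder that follows them. It is a standalone toy class, not a tfds class and not a subclass of TextEncoder; the name ToyWordEncoder, its whitespace tokenization, and its constructor argument are all assumptions made for the example.

```python
class ToyWordEncoder:
    """Hypothetical whitespace-token encoder; illustrative only, not part of tfds."""

    def __init__(self, vocab_list):
        # The token at list index i gets id i + 1, leaving id 0 free for padding.
        self._id_to_token = list(vocab_list)
        self._token_to_id = {t: i + 1 for i, t in enumerate(self._id_to_token)}

    @property
    def vocab_size(self):
        # The extra 1 accounts for the reserved padding id 0.
        return len(self._id_to_token) + 1

    def encode(self, s):
        # Never returns 0. Unknown tokens raise KeyError in this toy version;
        # a real encoder would need an out-of-vocabulary strategy.
        return [self._token_to_id[tok] for tok in s.split()]

    def decode(self, ids):
        # Padding ids (0) are dropped before mapping ids back to tokens.
        return " ".join(self._id_to_token[i - 1] for i in ids if i != 0)


encoder = ToyWordEncoder(["hello", "world"])
ids = encoder.encode("hello world")
padded = ids + [0, 0]          # simulate padding added during batching
text = encoder.decode(padded)  # padding is ignored on the way back
```

Offsetting every id by one is only one way to keep 0 free; any mapping works as long as encode never emits 0 and vocab_size still counts it.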

Attributes

vocab_size Size of the vocabulary, including the reserved padding ID 0. Encode produces ints in [1, vocab_size).
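Because vocab_size counts the padding ID, every ID in a zero-padded batch falls in [0, vocab_size), so a lookup table with vocab_size rows can index all of them. The sketch below shows this with a hypothetical pad_batch helper and made-up ids; neither comes from tfds.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pads each id list with pad_id to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


# Assumed for illustration: an encoder with 3 real tokens, so vocab_size is 4.
vocab_size = 4
encoded = [[1, 2, 3], [2]]  # encoder output never contains 0
batch = pad_batch(encoded)

# Every id, padding included, is a valid index into a table of vocab_size rows.
in_range = all(0 <= i < vocab_size for row in batch for i in row)
```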

Methods

decode

@abc.abstractmethod
decode(
    ids
)

Decodes a list of integers into text.

encode

@abc.abstractmethod
encode(
    s
)

Encodes text into a list of integers.

load_from_file

@classmethod
@abc.abstractmethod
load_from_file(
    filename_prefix
)

Load from file. Inverse of save_to_file.

save_to_file

@abc.abstractmethod
save_to_file(
    filename_prefix
)

Store to file. Inverse of load_from_file.
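The inverse relationship between save_to_file and load_from_file can be sketched with a toy class. Everything here is an assumption made for illustration: the class name, the ".vocab" suffix appended to filename_prefix, and the one-token-per-line file format are not taken from tfds, and real subclasses may store their state differently.

```python
import os
import tempfile


class ToyWordEncoder:
    """Hypothetical encoder showing only the save/load round trip."""

    _SUFFIX = ".vocab"  # assumed suffix, chosen for this example

    def __init__(self, vocab_list):
        self._tokens = list(vocab_list)

    @property
    def tokens(self):
        return list(self._tokens)

    def save_to_file(self, filename_prefix):
        # One token per line; the prefix is extended with the assumed suffix.
        with open(filename_prefix + self._SUFFIX, "w", encoding="utf-8") as f:
            f.write("\n".join(self._tokens))

    @classmethod
    def load_from_file(cls, filename_prefix):
        # Reads the same file back and rebuilds an equivalent encoder.
        with open(filename_prefix + cls._SUFFIX, encoding="utf-8") as f:
            return cls(f.read().splitlines())


with tempfile.TemporaryDirectory() as tmp_dir:
    prefix = os.path.join(tmp_dir, "toy_encoder")
    original = ToyWordEncoder(["hello", "world"])
    original.save_to_file(prefix)
    restored = ToyWordEncoder.load_from_file(prefix)
    restored_tokens = restored.tokens
```

Note that both methods take a prefix rather than a full path, which leaves the subclass free to choose its own suffix or to write more than one file.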

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.

Last updated 2024-04-26 UTC.