This module imports a copy of html.parser.HTMLParser and modifies it heavily through monkey-patches. A copy is imported, rather than the module itself, so that users can still import and use the unmodified library for their own needs.
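One way to obtain such a private copy is to load the module a second time from its import spec. The sketch below is a minimal illustration under that assumption, not necessarily the exact code this module uses; the names parser_copy and PATCHED are hypothetical.

```python
import html.parser
import importlib.util

# Load a second, independent module object from the same source file.
spec = importlib.util.find_spec("html.parser")
parser_copy = importlib.util.module_from_spec(spec)
spec.loader.exec_module(parser_copy)

# Patch only the copy (PATCHED is a hypothetical marker attribute).
parser_copy.PATCHED = True

# The module returned by a normal import is left untouched.
print(hasattr(html.parser, "PATCHED"))                   # False
print(parser_copy.HTMLParser is html.parser.HTMLParser)  # False
```

Because the copy is never registered under the name html.parser, code elsewhere that runs import html.parser keeps getting the unpatched standard library module.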

Classes:

  • HTMLExtractor –

    Extract raw HTML from text.

Bases: HTMLParser

Extract raw HTML from text.

The raw HTML is stored in the htmlStash of the Markdown instance passed to md and the remaining text is stored in cleandoc as a list of strings.

Methods:

Attributes:

  • line_offset (int) –

    Returns char index in self.rawdata for the start of the current line.

Reset this instance. Loses all unprocessed data.

Handle any buffered data.

Returns char index in self.rawdata for the start of the current line.
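The idea behind this value can be sketched with a standalone helper. This is an illustrative assumption about the computation, not the library's implementation; line_start_index is a hypothetical name, and lineno is taken to be 1-based, as reported by the standard library's HTMLParser.getpos().

```python
def line_start_index(rawdata: str, lineno: int) -> int:
    """Return the index in rawdata at which 1-based line `lineno` begins."""
    pos = 0
    for _ in range(lineno - 1):
        newline = rawdata.find("\n", pos)
        if newline == -1:
            # Fewer lines than requested: treat end of data as the line start.
            return len(rawdata)
        pos = newline + 1
    return pos


rawdata = "one\ntwo\nthree"
starts = [line_start_index(rawdata, n) for n in (1, 2, 3)]
print(starts)  # [0, 4, 8]
```

Adding the parser's column offset to this value gives the absolute position of the current token in the raw data.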

Returns True if current position is at start of line.

Allows for up to three blank spaces at start of line.
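The rule described above can be sketched as a standalone function. This is a hedged illustration only: the function takes the raw data, the index of the start of the current line, and the column offset as explicit arguments (hypothetical parameters), whereas the method reads them from the parser instance.

```python
def at_line_start(rawdata: str, line_offset: int, offset: int) -> bool:
    """Return True if column `offset` counts as the start of its line."""
    if offset == 0:
        return True
    if offset > 3:
        # More than three characters precede the position.
        return False
    # One to three preceding characters: they must all be whitespace.
    return rawdata[line_offset:line_offset + offset].strip() == ""


print(at_line_start("   <div>", 0, 3))   # True
print(at_line_start("    <div>", 0, 4))  # False
print(at_line_start("ab<div>", 0, 2))    # False
```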

Returns the text of the end tag.

If the actual text cannot be extracted from the raw data, a closing tag is built from the tag argument instead.
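A minimal sketch of this extract-or-rebuild behaviour follows. It assumes the end tag is found by searching for the next ">" from a known start index; END_OF_TAG, end_tag_text, and the explicit rawdata and start parameters are hypothetical names used for illustration.

```python
import re

# Assumed pattern for the end of a closing tag.
END_OF_TAG = re.compile(r">")


def end_tag_text(rawdata: str, start: int, tag: str) -> str:
    """Return the end tag as written in rawdata, or rebuild it from `tag`."""
    match = END_OF_TAG.search(rawdata, start)
    if match:
        # Preserve the original spelling, including case and whitespace.
        return rawdata[start:match.end()]
    # Nothing to extract: fall back to a plain closing tag.
    return "</{}>".format(tag)


print(end_tag_text("<p>text</P  >", 7, "p"))  # </P  >
print(end_tag_text("<p>text</p", 7, "p"))     # </p>
```

Returning the source text where possible keeps the stored raw HTML identical to what the author wrote.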

markdown.htmlparser.HTMLExtractor.handle_empty_tag(data: str, is_block: bool)

Handle empty tags (<data>).

Return full source of start tag: <...>.
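For comparison, the standard library's HTMLParser exposes a method of the same name. The sketch below uses only that base class to show what "full source" means; StartTagRecorder is a hypothetical subclass, and this does not exercise HTMLExtractor itself.

```python
from html.parser import HTMLParser


class StartTagRecorder(HTMLParser):
    """Record each start tag's normalized name next to its original source."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # `tag` is lowercased by the parser; the source text is not.
        self.sources.append((tag, self.get_starttag_text()))


recorder = StartTagRecorder()
recorder.feed('<A HREF="x">link</A>')
recorder.close()
print(recorder.sources)  # [('a', '<A HREF="x">')]
```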