htmst is a Python library for parsing HTML into an AST whose nodes carry source positions.
Installation
or
Usage
    from htmst import HtmlAst

    html = """<span foo="bar">hi</span>"""
    ast = HtmlAst(html)

    print(ast.root.children[0].tag)  # span
    print(ast.root.children[0].start.row)  # 0
    print(ast.root.children[0].start.col)  # 0
    print(ast.root.children[0].end.row)  # 0
    print(ast.root.children[0].end.col)  # 25

    print(ast.root.children[0].attrs[0].name)  # foo
    print(ast.root.children[0].attrs[0].value)  # bar
    print(ast.root.children[0].attrs[0].start.row)  # 0
    print(ast.root.children[0].attrs[0].start.col)  # 6
    print(ast.root.children[0].attrs[0].end.row)  # 0
    print(ast.root.children[0].attrs[0].end.col)  # 15
Nodes
- DoubleTagNode: represents a double tag
- SingleTagNode: represents a single tag
- AttrNode: represents an attribute
- TextNode: represents text
- CommentNode: represents a comment
- DoctypeNode: represents a doctype
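Each node has a start and an end position, and each position exposes a row and a col. In the usage example both are zero-based, and end.col points one past the node's last character.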
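Judging from the usage example, positions are zero-based rows and columns with an exclusive end, so a node's start and end can be mapped back to the exact source text it covers. The following is a minimal sketch under that assumption. It does not import htmst: Pos, to_offset and slice_source are hypothetical helpers written for illustration, not part of the library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pos:
    """Hypothetical stand-in for a node position (zero-based row and col)."""
    row: int
    col: int


def to_offset(source: str, pos: Pos) -> int:
    """Convert a (row, col) position into an absolute offset into source."""
    lines = source.splitlines(keepends=True)
    # Add up the full length of every line before the target row,
    # then add the column within that row.
    return sum(len(line) for line in lines[:pos.row]) + pos.col


def slice_source(source: str, start: Pos, end: Pos) -> str:
    """Return the text from start (inclusive) up to end (exclusive)."""
    return source[to_offset(source, start):to_offset(source, end)]


html = '<span foo="bar">hi</span>'

# Positions copied from the usage example: the element spans 0:0 to 0:25,
# and its first attribute spans 0:6 to 0:15.
element_text = slice_source(html, Pos(0, 0), Pos(0, 25))
attr_text = slice_source(html, Pos(0, 6), Pos(0, 15))

print(element_text)
print(attr_text)
```

With a real tree, the same helper could presumably be fed node.start and node.end directly, as long as they expose row and col the way the usage example shows.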
Contributing
Contributions are welcome! Please read the contributing guidelines for more information.
License
This project is licensed under the MIT License.