Name Description Size
__init__.py A collection of modules for building different kinds of trees from HTML documents. To create a treebuilder for a new type of tree, you need to implement several things: 1. A set of classes for various types of elements: Document, Doctype, Comment, Element. These must implement the interface of ``treebuilders.base.Node`` (although comment nodes have a different signature for their constructor; see ``treebuilders.etree.Comment``). Textual content may also be implemented as another node type, or not, as your tree implementation requires. 2. A treebuilder object (called ``TreeBuilder`` by convention) that inherits from ``treebuilders.base.TreeBuilder``. It has four required attributes: * ``documentClass`` - the class to use for the bottommost node of a document * ``elementClass`` - the class to use for HTML elements * ``commentClass`` - the class to use for comments * ``doctypeClass`` - the class to use for doctypes It also has one required method: * ``getDocument`` - returns the root node of the complete document tree 3. If you wish to run the unit tests, you must also create a ``testSerializer`` method on your treebuilder that accepts a node and returns a string containing the node and its children, serialized according to the format used in the unit tests. 3592
base.py Represents an item in the tree 14553
dom.py 8925
etree.py Return true if the node has children or text 12824
etree_lxml.py Module for supporting the lxml.etree library. The idea here is to use as much of the native library as possible, without using fragile hacks like custom element names that break between releases. The downside of this is that we cannot represent all possible trees; specifically, the following are known to cause problems: text or comments as siblings of the root element, and doctypes with no name. When any of these things occur, we emit a DataLossWarning. 14754
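
The three requirements listed for __init__.py can be sketched roughly as follows. This is a minimal, dependency-free illustration of the shape of the contract, not the library's implementation: the ``Node`` class here is a hypothetical stand-in rather than ``treebuilders.base.Node``, the ``TreeBuilder`` does not inherit from ``treebuilders.base.TreeBuilder`` as a real one would, and the indented output of ``testSerializer`` is a format invented for this sketch, not necessarily the one the unit tests expect.

```python
class Node:
    """Hypothetical stand-in for the node interface (not html5lib's own class)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.childNodes = []

    def appendChild(self, node):
        node.parent = self
        self.childNodes.append(node)

    def hasContent(self):
        # True if the node has any children.
        return bool(self.childNodes)


class Document(Node):
    def __init__(self):
        super().__init__("#document")


class Doctype(Node):
    def __init__(self, name, publicId="", systemId=""):
        super().__init__(name)
        self.publicId = publicId
        self.systemId = systemId


class Comment(Node):
    # Comment nodes take their text instead of a tag name, which is the
    # "different constructor signature" the description mentions.
    def __init__(self, data):
        super().__init__("#comment")
        self.data = data


class Element(Node):
    def __init__(self, name, namespace=None):
        super().__init__(name)
        self.namespace = namespace
        self.attributes = {}


class TreeBuilder:
    # The four required attributes.
    documentClass = Document
    elementClass = Element
    commentClass = Comment
    doctypeClass = Doctype

    def __init__(self):
        self.document = self.documentClass()

    def getDocument(self):
        # The one required method: return the root of the finished tree.
        return self.document

    def testSerializer(self, node, depth=0):
        # Assumed format for this sketch: one node per line, two spaces per level.
        if isinstance(node, Comment):
            label = "<!-- %s -->" % node.data
        elif isinstance(node, Doctype):
            label = "<!DOCTYPE %s>" % node.name
        elif isinstance(node, Element):
            label = "<%s>" % node.name
        else:
            label = node.name
        lines = ["  " * depth + label]
        for child in node.childNodes:
            lines.append(self.testSerializer(child, depth + 1))
        return "\n".join(lines)


# Build a tiny tree by hand, using only the classes the builder advertises.
builder = TreeBuilder()
doc = builder.getDocument()
doc.appendChild(builder.doctypeClass("html"))
html = builder.elementClass("html")
doc.appendChild(html)
html.appendChild(builder.commentClass("hi"))
serialized = builder.testSerializer(doc)
print(serialized)
```

In a real treebuilder the parser, not hand-written calls, would drive node creation through the four class attributes; the manual construction above only shows how they fit together.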