parser/htmlparser - mozsearch

mozilla-central/parser/htmlparser

Name	Description	Size
CNavDTD.cpp		1321
CNavDTD.h		680
CParserContext.cpp		2071
CParserContext.h	MODULE NOTES: @update gess 4/1/98	1480
metrics.yaml		2259
moz.build		1201
nsElementTable.cpp		6496
nsElementTable.h		587
nsExpatDriver.cpp	RLBOX HELPERS	64707
nsExpatDriver.h	Pass a buffer to Expat. If Expat is blocked aBuffer should be null and aLength should be 0. The result of the call will be stored in mInternalState. Expat will parse as much of the buffer as it can and store the rest in its internal buffer. @param aBuffer the buffer to pass to Expat. May be null. @param aLength the length of the buffer to pass to Expat (in number of char16_t's). Must be 0 if aBuffer is null and > 0 if aBuffer is not null. @param aIsFinal whether this is the last chunk in a row passed to ParseChunk, and if so whether it's the last chunk and buffer passed to ParseChunk (meaning there will be no more calls to ParseChunk for the document being parsed). @param aConsumed [out] the number of char16_t's that Expat consumed. This doesn't include the char16_t's that Expat stored in its buffer but didn't parse yet. @param aLastLineLength [out] the length of the last line that Expat has consumed. This will only be computed if aIsFinal is not None or mInternalState is set to a failure.	10993
nsHTMLTagList.h	This file contains the list of all HTML tags. See nsHTMLTags.h for access to the enum values for tags. It is designed to be used as input to various places that will define the HTML_TAG macro in useful ways through the magic of C preprocessing. Additionally, it is consumed by the self-regeneration code in ElementName.java from which nsHtml5ElementName.cpp/h is translated. See parser/html/java/README.txt. If you edit this list, you need to re-run ElementName.java self-regeneration and the HTML parser Java to C++ translation. All entries must be enclosed in the macro HTML_TAG which will have cruel and unusual things done to it. It is recommended (but not strictly necessary) to keep all entries in alphabetical order. The first argument to HTML_TAG is the tag name. The second argument is the "creator" method of the form NS_New$TAGNAMEElement, that will be used by nsHTMLContentSink.cpp to create a content object for a tag of that type. Use NOTUSED if the particular tag has a non-standard creator. The third argument is the interface name specified for this element in the HTML specification. It can be empty if the relevant interface name is "HTMLElement". The HTML_OTHER macro is for values in the nsHTMLTag enum that are not strictly tags. Entries must use only lowercase characters. Don't forget to update /editor/libeditor/HTMLEditUtils.cpp as well. Break these invariants and bad things will happen.	6367
nsHTMLTags.cpp		4831
nsHTMLTags.h	Declare the enum list using the magic of preprocessing. Enum values are "eHTMLTag_foo" (where foo is the tag). To change the list of tags, see nsHTMLTagList.h. These enum values are used as array indices in various places. If we change the structure of the enum by adding entries to it or removing entries from it _directly_, not via nsHTMLTagList.h, don't forget to update dom/bindings/BindingUtils.cpp and dom/html/nsHTMLContentSink.cpp as well.	2643
nsIContentSink.h	MODULE NOTES: @update gess 4/1/98 This pure virtual interface is used as the "glue" that connects the parsing process to the content model construction process. The icontentsink interface is a very lightweight wrapper that represents the content-sink model building process. There is another one that you may care about more, which is the IHTMLContentSink interface. (See that file for details).	4257
nsIDTD.h	MODULE NOTES: @update gess 7/20/98 This interface defines standard interface for DTD's. Note that this isn't HTML specific. DTD's have several functions within the parser system: 1) To coordinate the consumption of an input stream via the parser 2) To serve as proxy to represent the containment rules of the underlying document 3) To offer autodetection services to the parser (mainly for doc conversion)	2534
nsIExpatSink.idl	This interface should be implemented by any content sink that wants to get output from expat and do something with it; in other words, by any sink that handles some sort of XML dialect.	4525
nsIFragmentContentSink.h	The fragment sink allows a client to parse a fragment of content, possibly surrounded in context. Also see nsParser::ParseFragment(). Note: once you've parsed a fragment, the fragment sink must be re-set on the parser in order to parse another fragment.	2710
nsIHTMLContentSink.h	This interface is OBSOLETE and in the process of being REMOVED. Do NOT implement! This file declares the concrete HTMLContentSink class. This class is used during the parsing process as the primary interface between the parser and the content model. After the tokenizer completes, the parser iterates over the known token list. As the parser identifies valid elements, it calls the contentsink interface to notify the content model that a new node or child node is being created and added to the content model. The HTMLContentSink interface assumes 4 underlying containers: HTML, HEAD, BODY and FRAMESET. Before accessing any of these, the parser will call the appropriate Open method (OpenHTML, OpenHead, OpenBody, OpenFrameSet); likewise, the matching Close method will be called when the parser is done with a given section. IMPORTANT: The parser may Open each container more than once! This is due to the irregular nature of HTML files. For example, it is possible to encounter plain text at the start of an HTML document (that precedes the HTML tag). Such text is treated as if it were part of the body. In such cases, the parser will Open the body, pass the text node in and then Close the body. The body will likely be re-Opened later when the actual <BODY> tag has been seen. Containers within the body are Opened and Closed using the OpenContainer(...) and CloseContainer(...) calls. It is assumed that the document or contentSink is maintaining its state to manage where new content should be added to the underlying document. NOTE: OpenHTML() and OpenBody() may get called multiple times in the same document. That's fine, and it doesn't mean that we have multiple bodies or HTML's. NOTE: I haven't figured out how sub-documents (non-frames) are going to be handled. Stay tuned.	3328
nsIParser.h	This GECKO-INTERNAL interface is on track to being REMOVED (or refactored to the point of being near-unrecognizable). Please DO NOT #include this file in comm-central code, in your XULRunner app or binary extensions. Please DO NOT #include this into new files even inside Gecko. It is more likely than not that #including this header is the wrong thing to do.	5987
nsParser.cpp	The parser can be explicitly interrupted by passing a return value of NS_ERROR_HTMLPARSER_INTERRUPTED from BuildModel on the DTD. This will cause the parser to stop processing and allow the application to return to the event loop. The data which was left at the time of interruption will be processed the next time OnDataAvailable is called. If the parser has received its final chunk of data then OnDataAvailable will no longer be called by the networking module, so the parser will schedule a nsParserContinueEvent which will call the parser to process the remaining data after returning to the event loop. If the parser is interrupted while processing the remaining data it will schedule another nsParserContinueEvent. The processing of data followed by scheduling of the continue events will proceed until either: 1) All of the remaining data can be processed without interrupting 2) The parser has been cancelled. This capability is currently used in CNavDTD and nsHTMLContentSink. The nsHTMLContentSink is notified by CNavDTD when a chunk of tokens is going to be processed and when each token is processed. The nsHTMLContentSink records the time when the chunk has started processing and will return NS_ERROR_HTMLPARSER_INTERRUPTED if the token processing time has exceeded a threshold called max tokenizing processing time. This allows the content sink to limit how much data is processed in a single chunk which in turn gates how much time is spent away from the event loop. Processing smaller chunks of data also reduces the time spent in subsequent reflows. This capability is most apparent when loading large documents. If the maximum token processing time is set small enough the application will remain responsive during document load. A side-effect of this capability is that document load is not complete when the last chunk of data is passed to OnDataAvailable since the parser may have been interrupted when the last chunk of data arrived. The document is complete when all of the document has been tokenized and there aren't any pending nsParserContinueEvents. This can cause problems if the application assumes that it can monitor the load requests to determine when the document load has been completed. This is what happens in Mozilla. The document is considered completely loaded when all of the load requests have been satisfied. To delay the document load until all of the parsing has been completed the nsHTMLContentSink adds a dummy parser load request which is not removed until the nsHTMLContentSink's DidBuildModel is called. The CNavDTD will not call DidBuildModel until the final chunk of data has been passed to the parser through the OnDataAvailable and there aren't any pending nsParserContinueEvents. Currently the parser ignores requests to be interrupted during the processing of script. This is because a document.write followed by JavaScript calls to manipulate the DOM may fail if the parser was interrupted during the document.write. For more details @see bugzilla bug 76722	38020
nsParser.h	MODULE NOTES: This class does two primary jobs: 1) It iterates the tokens provided during the tokenization process, identifying where elements begin and end (doing validation and normalization). 2) It controls and coordinates with an instance of the IContentSink interface, to coordinate the production of the content model. The basic operation of this class assumes that an HTML document is non-normalized. Therefore, we don't process the document in a normalized way. Don't bother to look for methods like: doHead() or doBody(). Instead, in order to be backward compatible, we must scan the set of tokens and perform this basic set of operations: 1) Determine the token type (easy, since the tokens know) 2) Determine the appropriate section of the HTML document each token belongs in (HTML,HEAD,BODY,FRAMESET). 3) Insert content into our document (via the sink) into the correct section. 4) In the case of tags that belong in the BODY, we must ensure that our underlying document state reflects the appropriate context for our tag. For example, if we see a <TR>, we must ensure our document contains a table into which the row can be placed. This may result in "implicit containers" created to ensure a well-formed document.	9030
nsParserBase.h		472
nsParserConstants.h		762
nsParserMsgUtils.cpp		1950
nsParserMsgUtils.h		1078
nsRLBoxExpatDriver.h		909
nsScanner.cpp	Use this constructor if you want i/o to be based on a single string you hand in during construction. This short cut was added for Javascript. @update gess 5/12/98 @param aMode represents the parser mode (nav, other) @return	9434
nsScanner.h	MODULE NOTES: @update gess 4/1/98 The scanner is a low-level service class that knows how to consume characters out of an (internal) stream. This class also offers a series of utility methods that most tokenizers want, such as readUntil() and SkipWhitespace().	5275
nsScannerString.cpp	nsScannerBufferList	11370
nsScannerString.h	NOTE: nsScannerString (and the other classes defined in this file) are not related to nsAString or any of the other xpcom/string classes. nsScannerString is based on the nsSlidingString implementation that used to live in xpcom/string. Now that nsAString is limited to representing only single fragment strings, nsSlidingString can no longer be used. An advantage to this design is that it does not employ any virtual functions. This file uses SCC-style indenting in deference to the nsSlidingString code from which this code is derived ;-)	13721
tests
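
The descriptions of nsHTMLTagList.h and nsHTMLTags.h above refer to expanding one HTML_TAG list several ways "through the magic of C preprocessing". That is commonly called the X-macro technique, and a minimal sketch of it may help when reading those two files. Everything named Demo* or DEMO_* below is hypothetical and not part of the tree: the three-entry list stands in for the real one, and it is written as a list macro taking the per-entry macro as a parameter, whereas the real list is described as a header consumed by places that define HTML_TAG themselves.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the tag list. Each entry carries the three
// arguments the listing describes: tag name, creator suffix, interface name.
#define DEMO_TAG_LIST(HTML_TAG) \
  HTML_TAG(a, Anchor, Anchor)   \
  HTML_TAG(body, Body, Body)    \
  HTML_TAG(table, Table, Table)

// First expansion: one enum value per entry, named eDemoTag_<name>.
#define DEMO_ENUM_ENTRY(name, creator, iface) eDemoTag_##name,
enum DemoTag { DEMO_TAG_LIST(DEMO_ENUM_ENTRY) eDemoTag_count };
#undef DEMO_ENUM_ENTRY

// Second expansion: a name table in the same order, so an enum value can be
// used directly as an array index.
#define DEMO_NAME_ENTRY(name, creator, iface) #name,
static const char* const kDemoTagNames[] = {DEMO_TAG_LIST(DEMO_NAME_ENTRY)};
#undef DEMO_NAME_ENTRY

// Both expansions come from the same list, so their lengths must agree.
static_assert(sizeof(kDemoTagNames) / sizeof(kDemoTagNames[0]) ==
                  static_cast<unsigned>(eDemoTag_count),
              "enum and name table are generated from the same list");

// Linear lookup from a lowercase tag name to its enum value.
// Returns eDemoTag_count when the name is not in the list.
DemoTag LookupDemoTag(const char* name) {
  assert(name != nullptr);
  for (int i = 0; i < eDemoTag_count; ++i) {
    if (std::strcmp(kDemoTagNames[i], name) == 0) {
      return static_cast<DemoTag>(i);
    }
  }
  return eDemoTag_count;
}
```

With these definitions, LookupDemoTag("body") yields eDemoTag_body and kDemoTagNames[eDemoTag_table] is "table". Because the enum and the name table are generated from one list, adding an entry in a single place keeps them in step, which is presumably why the listing warns against editing the enum directly instead of going through nsHTMLTagList.h.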