<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: XML: Parser</title>
	<atom:link href="http://findlay.net.nz/paul/2007/07/04/xml-parser/feed" rel="self" type="application/rss+xml" />
	<link>http://findlay.net.nz/paul/2007/07/04/xml-parser</link>
	<description>Paul Findlay and his online content</description>
	<lastBuildDate>Fri, 18 Apr 2008 21:08:24 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Cowtowncoder</title>
		<link>http://findlay.net.nz/paul/2007/07/04/xml-parser/comment-page-1#comment-35</link>
		<dc:creator>Cowtowncoder</dc:creator>
		<pubDate>Fri, 12 Oct 2007 23:41:16 +0000</pubDate>
		<guid isPermaLink="false">http://findlay.net.nz/paul/2007/07/04/xml-parser#comment-35</guid>
		<description>Hey Paul, good luck with the development! Like I said, it&#039;s a non-trivial task. But I think using a state machine is a good idea (in case it wasn&#039;t obvious from my first comment), and it was something I was thinking about too. In the end, my choice had more to do with the possibility of recycling some existing pieces than with the feasibility of either approach.
I hope to learn more about the implementation if and when you get that far.</description>
		<content:encoded><![CDATA[<p>Hey Paul, good luck with the development! Like I said, it&#8217;s a non-trivial task. But I think using a state machine is a good idea (in case it wasn&#8217;t obvious from my first comment), and it was something I was thinking about too. In the end, my choice had more to do with the possibility of recycling some existing pieces than with the feasibility of either approach.<br />
I hope to learn more about the implementation if and when you get that far.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul Findlay &#187; XML: Impossible to be original</title>
		<link>http://findlay.net.nz/paul/2007/07/04/xml-parser/comment-page-1#comment-33</link>
		<dc:creator>Paul Findlay &#187; XML: Impossible to be original</dc:creator>
		<pubDate>Thu, 19 Jul 2007 22:46:59 +0000</pubDate>
		<guid isPermaLink="false">http://findlay.net.nz/paul/2007/07/04/xml-parser#comment-33</guid>
		<description>[...] combination of these XPath expressions gets applied like a giant state-machine/trie to the incoming XML events in something approximating log(n) time, so large numbers of XML streams can be handled [...]</description>
		<content:encoded><![CDATA[<p>[...] combination of these XPath expressions gets applied like a giant state-machine/trie to the incoming XML events in something approximating log(n) time, so large numbers of XML streams can be handled [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul</title>
		<link>http://findlay.net.nz/paul/2007/07/04/xml-parser/comment-page-1#comment-32</link>
		<dc:creator>Paul</dc:creator>
		<pubDate>Thu, 19 Jul 2007 14:05:40 +0000</pubDate>
		<guid isPermaLink="false">http://findlay.net.nz/paul/2007/07/04/xml-parser#comment-32</guid>
		<description>I&#039;m glad you have done it, and thanks for the encouragement :) Woodstox is the parser mine looks up to.

I hope the FSM approach works out for me, since it&#039;s already at around 400 states (I could reduce that by making it a pushdown automaton). UTF-encoded characters get decoded to Unicode only as they are needed. I plan on using what I call a &quot;Tim Bray technique&quot;, where non-ASCII chars (code points &gt;= 128) get mapped to a corresponding ASCII character to keep the number of transition characters in the FSM low, while inside the state the original Unicode character is checked for whether it is valid in whatever part of the document we are looking at. If profiling shows too much work is being done, I could apply the check over longer runs of characters instead.

Yes, I want it to be a niche thing (and it will be, if only because it is written in the &lt;a href=&quot;http://www.digitalmars.com/d/&quot; rel=&quot;nofollow&quot;&gt;D programming language&lt;/a&gt;), and you did read my intentions right. I want the parser to be suitable for environments like Jabber servers or handling thousands of concurrent REST requests. I&#039;m not sure anyone cares to do this at such a low level, so I may end up building a recursive descent parser generator that fits with an asynchronous I/O framework.

Hopefully someone will want to pay me for the end use, to compensate for being the only person in the world who cares :)</description>
		<content:encoded><![CDATA[<p>I&#8217;m glad you have done it, and thanks for the encouragement <img src='http://findlay.net.nz/paul/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  Woodstox is the parser mine looks up to.</p>
<p>I hope the FSM approach works out for me, since it&#8217;s already at around 400 states (I could reduce that by making it a pushdown automaton). UTF-encoded characters get decoded to Unicode only as they are needed. I plan on using what I call a &#8220;Tim Bray technique&#8221;, where non-ASCII chars (code points >= 128) get mapped to a corresponding ASCII character to keep the number of transition characters in the FSM low, while inside the state the original Unicode character is checked for whether it is valid in whatever part of the document we are looking at. If profiling shows too much work is being done, I could apply the check over longer runs of characters instead.</p>
<p>Yes, I want it to be a niche thing (and it will be, if only because it is written in the <a href="http://www.digitalmars.com/d/" rel="nofollow">D programming language</a>), and you did read my intentions right. I want the parser to be suitable for environments like Jabber servers or handling thousands of concurrent REST requests. I&#8217;m not sure anyone cares to do this at such a low level, so I may end up building a recursive descent parser generator that fits with an asynchronous I/O framework.</p>
<p>Hopefully someone will want to pay me for the end use, to compensate for being the only person in the world who cares <img src='http://findlay.net.nz/paul/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Cowtowncoder</title>
		<link>http://findlay.net.nz/paul/2007/07/04/xml-parser/comment-page-1#comment-31</link>
		<dc:creator>Cowtowncoder</dc:creator>
		<pubDate>Wed, 18 Jul 2007 20:29:50 +0000</pubDate>
		<guid isPermaLink="false">http://findlay.net.nz/paul/2007/07/04/xml-parser#comment-31</guid>
		<description>Yes. But definitely doable.
Now, assuming I understand you right, one main benefit would be the ability to do non-blocking parsing: that is, either get the next event or an indication of &quot;not enough data yet to know&quot;. For what it&#039;s worth, I have actually written the core of such a parser, one that implements the StAX API too. At this point the main challenge is figuring out who would care enough to want to get it ready for real use. ;-)

There are already enough good-but-blocking parsers for this to be a bit of a niche thing, I suppose.
Oh, also, I didn&#039;t use an FSM, but that&#039;s mostly because it gets rather hairy when one also wants to integrate UTF decoding with parsing for further performance gains.</description>
		<content:encoded><![CDATA[<p>Yes. But definitely doable.<br />
Now, assuming I understand you right, one main benefit would be the ability to do non-blocking parsing: that is, either get the next event or an indication of &#8220;not enough data yet to know&#8221;. For what it&#8217;s worth, I have actually written the core of such a parser, one that implements the StAX API too. At this point the main challenge is figuring out who would care enough to want to get it ready for real use. <img src='http://findlay.net.nz/paul/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>There are already enough good-but-blocking parsers for this to be a bit of a niche thing, I suppose.<br />
Oh, also, I didn&#8217;t use an FSM, but that&#8217;s mostly because it gets rather hairy when one also wants to integrate UTF decoding with parsing for further performance gains.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

