How does anyone actually do XML parsing in Ruby? I’m talking about real XML parsing with namespaces, entities, XML-encoded text nodes: the whole nine yards. I don’t need XPath: I’m willing to walk the tree à la Groovy’s XmlSlurper. But I need something that doesn’t suck.
I’ve been trying to do some simple XML parsing: read in an XML document, extract some data elements (including the namespaces that are declared), and then take one particular node and store all of its child nodes in a CLOB (a.k.a. “text”) database field.
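To make that concrete, here is a minimal sketch of the task, assuming REXML from the standard library and a made-up document. The element names, the namespace URIs, and the `find_child` helper are all hypothetical, and this is only an illustration of what I'm after, not a recommendation.

```ruby
require 'rexml/document'

# Hypothetical input: element names and namespace URIs are invented for illustration.
SAMPLE_XML = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <order xmlns="http://example.com/order" xmlns:m="http://example.com/meta">
    <m:id>42</m:id>
    <customer>Smith &amp; Sons</customer>
    <payload><item qty="2">Widget</item><note>a &lt; b</note></payload>
  </order>
XML

# Hypothetical helper: find a direct child by namespace URI and local name,
# walking the children rather than using XPath.
def find_child(parent, ns_uri, local_name)
  parent.children.grep(REXML::Element).find do |el|
    el.namespace == ns_uri && el.name == local_name
  end
end

doc  = REXML::Document.new(SAMPLE_XML)
root = doc.root

# Namespace declarations on the root, keyed by prefix ("xmlns" is the default one).
declared = root.namespaces

order_id = find_child(root, 'http://example.com/meta', 'id').text
customer = find_child(root, 'http://example.com/order', 'customer').text # entity-decoded

# Serialize every child node of <payload> back to markup for the text column.
payload = find_child(root, 'http://example.com/order', 'payload')
clob    = payload.children.map(&:to_s).join

puts declared.inspect
puts order_id
puts customer
puts clob
```

One thing to watch in a sketch like this: the serialized children do not appear to carry the namespace declarations they inherited from the root, so the stored fragment may not be namespace-complete on its own.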
REXML has been a nightmare. It just barely works for parsing. Its XPath, while nominally supported, is giving me all kinds of weird results. It’s never clear whether it has resolved entities or left the text XML-encoded. The pretty printer will wrap in the middle of an XML-encoded entity. The deprecated #write method completely fails. And, best of all, the formatters ship with ruby 1.8.6 patchlevel 111 for darwin, but not with ruby 1.8.6 patchlevel 36 for x86_64-linux. And the website is down, so I can’t pull down the code manually.
The alternatives I’ve looked at haven’t been awesome. LibXML-ruby is experimental, requires nonportable library installations, and has the libxml interface that we know and hate. Hpricot half-supports XML, and doesn’t support namespaces.
Quite frankly, XML parsing is one of those things that needs to be a solved problem before any language can be considered ready for the real world. As far as I can tell, Ruby fails at this point.
I know that it’s the open source world, and I should quit my bitchin’ and write up my own XML parser which doesn’t suck. There are two problems with this: 1) I’m not an XML expert, I don’t want to be an XML expert, and so if I were to write something, it’d take me a very long, very unhappy time and probably be wrong in nonobvious ways; 2) in the same amount of time, I could probably rewrite my application in Groovy/Grails, where XML parsing is easy. That amount of time, though, is a lot more than I want to spend. So if I end up flushing the time I’ve spent building my application down the toilet because something as common as XML parsing is broken in Ruby, I’m going to be really unhappy.
So someone please, please, please correct me by pointing out a great and glorious and easy-to-use XML library in Ruby.