May 27 2008
The Status of Ruby’s libxml
When I was struggling with XML parsing in Ruby, the consensus was to try out libxml. I got on the devel mailing list in preparation for giving it a shot. Unfortunately, this exchange on the mailing list has changed my mind:
I’m curious about the development status of libxml.
My application still core dumps fairly regularly though not in any way
that has proven useful for tracking down problems.
Is there any active development looking into the memory problems?
What is the status of libxml and libxsl for ruby 1.9?
Thanks.
__
Marc
I have been unable to continue work, my personal life not allowing. However,
the amount of effort that I have put into this has not cleared the library
of all its problems and it really needs active shared involvement of more
than just a single developer. This is sort of asking for manna, but the
number of people using this library and the number of issues with it are too
much for a single developer needing to make a living.
Dan
Since I don’t feel like becoming the developer of libxml right now (I’ve got plenty of other stuff on my plate), I’m going to pass on trying it out.
Hopefully, Hpricot really will handle namespaces as nicely as Garrick says. Otherwise, I’m going to have to rewrite my app in Groovy/Grails and install a bunch of new infrastructure to support it (e.g. JVM, Tomcat/Winstone). That would be unfortunate.
17 Responses to “The Status of Ruby’s libxml”

You should probably be using REXML, which ships with Ruby, instead of libxml. This advice comes to you from the author of NQXML (http://nqxml.sourceforge.net/) :-)
Using REXML is certainly an option, but its performance is terrible compared to libxml. If that’s a factor in your application, you’re right to be concerned. That said, Hpricot does a pretty good job in my limited experience.
You two must be new here.
My Frustrations with REXML: Ruby’s Standard Library for Reading/Writing XML; or, Ruby’s Problem Is Its Type System, and Don’t Try to Tell Me Otherwise
I will never use REXML again. Ever. Period. It is dead to me.
Posted more on this issue over here on Peter Cooper’s request.
My recommendation for writing xml in ruby is to use Builder (http://builder.rubyforge.org/), a core part of rails (but not ruby). I’m not sure how its speed compares to other xml packages, but thankfully I don’t have to do a lot of writing. To give you an idea of Builder’s speed: on my dual core amd development box it takes around 4 minutes to generate a google sitemap with 300k links in it, and that includes other work (database access, URI escaping, etc).
As for reading xml, as you’ve already discovered rexml is next to useless if speed is any concern of yours. It’s fine for small xml docs (like those returned from google’s data apis), but for a project I work on involving parsing itunes xml files it is unusable. I moved from rexml’s saxparser to libxml’s saxparser which works fine, but I ended up rolling my own regexp based parser to get the flexibility and speed I needed. I don’t recommend that for any kind of general xml parsing, but luckily I was dealing with a small, predictable subset.
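To make the regexp approach concrete, here is a minimal sketch of what such a parser might look like. It is not the commenter's actual code. It assumes the input is a plist-style fragment of `<key>` elements, each followed by a `<string>`, `<integer>`, or `<date>` value, and the names `PAIR`, `ENTITIES`, `unescape`, `parse_track`, and `SAMPLE` are all made up for illustration.

```ruby
# Hypothetical sketch: read key/value pairs from a small, predictable
# plist-style fragment with a regular expression instead of an XML parser.
# The input layout is an assumption, not a real iTunes export.

# Matches <key>NAME</key> followed by a <string>, <integer>, or <date> element.
# \2 requires the closing tag to match the opening tag's name.
PAIR = %r{<key>([^<]+)</key>\s*<(string|integer|date)>([^<]*)</\2>}

# The five predefined XML entities.
ENTITIES = {
  '&lt;' => '<', '&gt;' => '>', '&quot;' => '"', '&apos;' => "'", '&amp;' => '&'
}.freeze

# Decode the predefined entities in a single pass.
def unescape(text)
  text.gsub(/&(?:lt|gt|quot|apos|amp);/) { |match| ENTITIES[match] }
end

# Build a Hash from every key/value pair found in the fragment.
def parse_track(fragment)
  fragment.scan(PAIR).each_with_object({}) do |(key, type, raw), track|
    value = unescape(raw)
    track[key] = type == 'integer' ? value.to_i : value
  end
end

SAMPLE = <<~XML
  <dict>
    <key>Track ID</key><integer>1234</integer>
    <key>Name</key><string>Rock &amp; Roll</string>
    <key>Artist</key><string>The Velvet Underground</string>
  </dict>
XML

track = parse_track(SAMPLE)
puts track['Name'] # prints "Rock & Roll"
```

A sketch like this ignores nesting, CDATA, comments, numeric character references, and attributes, which is why it only suits a small, known subset of XML and not general parsing.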
rexml is a quick ticket to a non-performant app. It works, sure, but it works in the same way that a car that you push around to go places works. You’ll get there eventually, but it’ll take you ages and it won’t really have been worth the effort.
The problem with REXML isn’t just performance. It’s also got problems with documentation and stability; see my link above.
[...] libraries. As most of you know REXML is far from being issue-free (performance in primis), and in The Status of Ruby’s libxml Robert uncovers that the author of LibXml Ruby is unable to actively pursue the development of his [...]
I know your comment was tongue-in-cheek, but wouldn’t you consider JRuby on Rails before Groovy on Grails? I mean, if you’re using Rails already. If not, I’d certainly like to know why.
It wasn’t tongue-in-cheek.
I don’t want to use most of Java’s XML libraries, because they’re really a pain. The functionality I’m looking for is something succinct like Groovy’s XmlSlurper and (on a totally different note) Java’s multithreading capabilities. I’m willing to work around the multithreading issue, but the XML issue is a deal breaker.
And while wrapping the XmlSlurper in a driver class and compiling to byte code and then calling it through JRuby sounds like fun, it also sounds like a lot of indirection and potential problems when I could just do Groovy/Grails directly. I don’t think Ruby/Rails gains me enough over Groovy/Grails to be worth that kind of hackiness.
I’m open to being convinced otherwise, though.
Facts:
* JRuby lets you use Java classes from Ruby
* Groovy compiles to Java bytecode, so you can run from Java
Therefore:
* JRuby lets you use Groovy from Ruby!
It might not be the most, uh, direct thing, but I believe it should be possible. I don’t think I’ve really seen anyone talking about doing this. If I were to guess, it’d go something like:
* Implement a class in Groovy using XmlSlurper
* Use groovyc to compile it to bytecode
* Use the resulting bytecode from Ruby
* …
* PROFIT?
Coming from a Java background, and now being used to Rails, I think I would much, much prefer this route rather than rewriting and switching the entire platform to a Java/Groovy stack.
[...] is quite a bit of disgruntlement about XML and Ruby right at this point in time; see The Status of Ruby’s libxml and My Frustrations with REXML: Ruby’s Standard Library for Reading/Writing XML; or, Ruby’s [...]
@Josh
Yeah, I considered that. Not a huge fan of experimenting with trying to call Groovy from within Rails, but if you get an XmlSlurper port working on Rails, let me know — I’ll be more than happy to spend some cycles debugging it and working on it (like I’m doing with OCurl right now).
The app was small enough, and I wanted the ability to throw off threads in addition to good XML parsing, so I’ve ported it over to Groovy/Grails at this point. Something interesting I’ve discovered is that having Java available makes me want to do more: since it’s easy to do things like “active objects” (objects with heavyweight processing hidden in threads) and caching (especially with JConch), I’m finding it hard to resist going that route with my code.
So the app is taking longer to re-write in Grails, but that’s mainly because the scope crept up pretty fast. The XML parsing and website stuff went very, very quickly.
[...] leaves just the more thorough blog posts on topics (like Groovy’s list#flatten and the status of Ruby’s libxml) and reasonably significant announcements (like presentations I’m [...]
Hi Robert,
Thought you’d like to know we just pushed out libxml-ruby 0.8.0, which adds windows support and fixes lots of old outstanding bugs. More information is at my blog:
http://cfis.savagexi.com/articles/2008/07/16/resurrecting-libxml-ruby
If you have a chance, give it a try. We’d love to hear your feedback
[...] twenty responses on what the problem is, and what we could do about it. Robert Fischer was lamenting on the state of Ruby’s libxml library, and didn’t seem to like REXML much either. Tim Bray has also had a few complaints about REXML. It [...]