Mar 24 2008

My Frustrations with REXML: Ruby’s Standard Library for Reading/Writing XML

Published by Robert Fischer at 2:59 pm under To Be Categorized

Edit: Once you get the gist of this rant, jump to the comments for a slightly more reasoned approach. Or read my follow-up post, which attempts to re-open the dialog.


So, I’m trying to do a little bit of XML reading/writing. Nothing major: read in an XML document, grab out some values, and then store the raw XML into the database. I’m doing pretty much the same thing in Groovy, and the XmlSlurper made that blissfully easy.

Since the core library comes with the REXML parser, I figured that it was a nice, stable library, and I’d roll with it. The interface wasn’t as nice as XmlSlurper, but it seemed like it would do.

This was the start of the pain.

In fact, the pain pissed me off enough to share my frustrations with the world. Hopefully someone finds this useful, and they can avoid the pain and suffering I put up with. And yes, I could spend the time I’m griping going through and fixing up all the bugs, but I shouldn’t have to for a language as mature as Ruby. Core libraries are supposed to be stable, reliable beasties. If I wanted to spend all my time debugging half-baked implementations or rolling my own solutions, I’d never leave OCaml. I come to Ruby for the community support. That’s supposed to be the big advantage.

Anyway, here we go:

Problem #1 came along when I tried to parse XML. First of all, the API documentation completely sucks: if you look at the top-level REXML package, it’s totally worthless. If you manage to figure out that it’s REXML::Document that you probably want, you’re still not much better off. If you check out #new, which is really what you probably want, you’re rewarded with one word: “Constructor”. You also have some “@param” tags that ran together and tell you things like the second argument, called “context”, should be a Hash of the context. That clears up a lot! And, seriously, if you’re telling me that it should be a Hash in the documentation, why aren’t we just doing implied static typing and being done with it?

Anyway, I retreated to Google, found the REXML tutorial, and managed to figure it out from there.
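
For what it’s worth, the basic read-and-extract flow looks roughly like the sketch below. The XML payload and its element names (`order`, `customer`, `item`) are made up for illustration; only `REXML::Document.new`, `REXML::XPath`, and `Element#attributes` come from the library itself.

```ruby
require "rexml/document"

# Hypothetical payload; the element and attribute names are invented.
raw_xml = <<~XML
  <order id="42">
    <customer>Ada</customer>
    <item sku="A1">Widget</item>
    <item sku="B2">Gadget</item>
  </order>
XML

# Document.new accepts a String (or an IO) holding the XML source.
doc = REXML::Document.new(raw_xml)

# Pull out a few values: an attribute on the root, one element's text,
# and one attribute from every matching element.
order_id = doc.root.attributes["id"]
customer = REXML::XPath.first(doc, "/order/customer").text
skus     = REXML::XPath.match(doc, "/order/item").map { |item| item.attributes["sku"] }

puts order_id
puts customer
puts skus.join(",")
```

The untouched `raw_xml` string is still around afterwards, which is the part that would go into the database.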

But then I kept having this annoying bug: when I called Element#text(), it was not only ignoring my instructions to leave entities alone (i.e. don’t turn “&lt;” into “<”), but it then seemed to go through and attempt to re-parse it, because it was complaining about unbalanced tags! Principle of Least Surprise my ass(1)! I’m not sure why the second part of that was happening, but the first part is apparently documented, so I stopped using the easy-to-read convenience method and went to Element#write.
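
The documented half of that behavior can be seen in a few lines. This is a sketch against a made-up one-element document, not the XML from my project: `Element#text` hands back the unescaped string, while asking the underlying `REXML::Text` node for `to_s` keeps the entity as it was written.

```ruby
require "rexml/document"

doc  = REXML::Document.new("<expr>1 &lt; 2</expr>")  # made-up sample input
node = doc.root

unescaped = node.text           # Element#text resolves the entity to a literal "<"
escaped   = node.get_text.to_s  # Text#to_s keeps the escaped form from the source

puts unescaped
puts escaped
```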

This is where the real pain began. See, Element#write is broken. Deprecated and broken, actually. But the tutorial still tells you to use it. The solution is to use their Formatter approach. Except (ready for it?) that’s broken, too! No, I’m not kidding. In this language’s core library, both versions are broken! The solution is for me to reach in and make a change to the core library so that we avoid a null. In the standard Ruby deployment, using the standard core XML processing library, there is no way to write out XML. It is impossible because of bugs in the library.
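
For reference, the Formatter route is meant to look something like the following. This is only a sketch of the intended usage on a made-up document; on the release described above it is exactly the kind of call that fails, so treat it as an assumption that your installed REXML behaves.

```ruby
require "rexml/document"
require "rexml/formatters/default"

doc = REXML::Document.new("<note><to>World</to></note>")  # made-up sample input

# Formatters append to anything that responds to <<, such as a String.
out = +""
REXML::Formatters::Default.new.write(doc.root, out)

puts out
```

The default formatter writes the tree back without adding whitespace, which is what you want when the goal is to store the XML rather than pretty-print it.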

The worst part?

THAT STUPID BUG IN THEIR CORE LIBRARY WOULD HAVE BEEN FIXED WITH STATIC TYPING(2). Even more so if you have a type system which can check nulls for you. Null pointers/“nil when you didn’t expect it!” errors are totally solvable problems. The fact that our industry hasn’t moved past this painful left-over from C is driving me crazy. The next person who tries to tell me that dynamic typing is the best thing since sliced bread is going to get an earful. It is a flat-out wrong position, and I’m done hearing otherwise from anyone.

(1) As much as I’d love to claim that quote, it actually comes from Paul Cantrell’s excellent exploration of closures in Ruby.
(2) Or with the right test and a CI server guarding the production-bound branch. But that’s apparently not happening…which is where static typing comes in.

21 Responses to “My Frustrations with REXML: Ruby’s Standard Library for Reading/Writing XML”

  1. Nick on 24 Mar 2008 at 7:48 pm

    The answer, my friend, is hpricot. Or to just abandon Ruby altogether since it’s dynamically typed and all the airbag ranting in the world isn’t going to change that.

  2. Brian Hammond on 24 Mar 2008 at 8:40 pm

    otherwise

  3. Justin Bozonier on 24 Mar 2008 at 8:50 pm

    There’s a difference between dynamic typing being bad versus strong typing being necessary at times.

    I think what we’re seeing is a growing need to be able to type loosely at some times, statically at others, and sometimes dynamically.

  4. Robert Fischer on 24 Mar 2008 at 9:08 pm

    @Justin

    “There’s a difference between dynamic typing being bad versus strong typing being necessary at times.”
    Care to elaborate? I’m missing the distinction you’re making. And when is it necessary to dynamically type?

    Personally, I’m just burnt out on the extra work and flakiness that comes with dynamic languages. I had these problems back with CPAN, but Ruby is flat-out worse.

  5. James on 24 Mar 2008 at 10:25 pm

    Ruby’s not for everyone.

    Find a tool that better fits your world-view and you’ll be happier.

  6. tg on 25 Mar 2008 at 4:19 am

    I had problems with REXML too. This time in XPATH predicate processing. I wrote an angry post about it here: http://arrogantgeek.blogspot.com/2008/01/why-ruby-sucks-1.html

  7. [...] Enfranchised Mind » My Frustrations with REXML: Ruby’s Standard Library for Reading/Writin… [...]

  8. Robert Fischer on 25 Mar 2008 at 6:47 am

    Yeah, I’d like to get off Ruby, but the reality is that I haven’t encountered anything else out there better than Rails. And it’s not that Rails is the best web framework out there, but that the community provides a lot of strong support. I just don’t see another framework with the level of support, advanced set of plugins, etc., etc.

    My ranting was really 3-fold:
    1) To feel better. I was really frustrated with REXML, and although I had long considered Rails to be flaky, discovering that the Ruby core library is flaky, too, really upset me.
    2) To push back against the overwhelming popularity of dynamic languages right now. Ruby/Rails, like most things in IT, is just a temporary solution until there’s some better framework. It’d be really nice if the next framework was a bit smarter about typing and safety.
    3) To document my problem. There wasn’t a lot of documentation for my problem on the net, so I wanted to add some more. Also, my memory is horrible, so I wanted to have the story locked down somewhere for posterity — people seem to act like I’m just a reactionary freak when I gripe about dynamic languages, and it’s nice to have a solid example to demonstrate the issue.

  9. Robert Fischer on 25 Mar 2008 at 7:50 am

    If you’re reading this, you probably want to also check out the comments for this post on Reddit. That walled garden has a nascent (although potentially interesting) thread.

  10. Zach on 25 Mar 2008 at 9:26 am

    What interests me here is how you manage to go from “there is a major bug in this core library” to “an entire model for programming is wrong” in a single sentence. “THAT STUPID BUG IN THEIR CORE LIBRARY WOULD HAVE BEEN FIXED WITH STATIC TYPING.” doesn’t, to me, seem to be a useful enough point to warrant the conclusion you have drawn. You could write that sentence a hundred different ways - “the stupid bug could have been avoided if they used a hash instead of a set,” “they completely would have avoided readability problems if they had used black instead of blue!” - that doesn’t immediately nullify the value of the two choices. No one would claim that a set or the color blue were inherently poor simply because they were not proper solutions to specific problems.

    This frustrates me because I have run into many things in Ruby and Ruby’s core libraries that make me wary of considering it a “mature” language or one I’d really be interested in designing production systems in, and you were well on your way to aptly pointing that out, until you chose to use your rhetorical sword to slay a dragon far out of range.

  11. Robert Fischer on 25 Mar 2008 at 1:37 pm

    Oh, and let’s not forget that this comes hot on the heels of another problem I had with name collisions at runtime.

  12. Robert Fischer on 25 Mar 2008 at 1:39 pm

    Check the Reddit comments for other links to complaints and pithy comments about REXML. It’s apparently just a complete mess.

  13. Robert Fischer on 25 Mar 2008 at 9:55 pm

    @Zach

    You’re welcome to tackle Ruby and its problems. For my part, I was mainly trying to vent (see above for more on that). I’d encourage you to post your stuff on Ruby and Ruby’s core libraries: I’d be curious to see what you’ve bumped into.

    Now that I’m a little calmer, let me add some more. I may rework this and some other thoughts into a more coherent post in the near future — I haven’t decided.

    Here’s the thing: open source libraries are one thing. I’m used to those being of dubious quality, and I’m willing to put some effort in to fix them. Even more, I’ve cut Ruby on Rails a lot of slack, because the bugs and the awkwardness are the price that you pay to hang out with the beer-swilling hipsters that make up that community and roll with the rapid rate of development. And, really, it is the best web framework I’ve dealt with, mainly because it provides the most comprehensive and extensible set of functionality of any framework, and it’s got a solid community to back it up.

    However, it’s doing this all by constantly just barely working. That’s the price you pay: the development constantly pushes to the very edge of tolerance, and APIs you took for granted in the last release may fade away, and there’s this constant concern about stability.

    But, the argument ran, static typing really doesn’t get you anything. It’s the unit tests that do it. The testing will exercise the API and give you — for all practical purposes, anyway — the protection that static typing will. So all this clinging to static typing is just old fogies who can’t pry their insecurities away from their enterprisey languages long enough to see how people really get productive.

    And that argument isn’t one I really buy. But I hear it a lot. And I hear people griping about how bad static typing is, for reasons that have zero applicability to any language whose type system postdates the Reagan administration. And I try to mention things to engage in dialog, and it’s really gotten nowhere. But, y’know, hey, if the unit tests really do provide the same protection, then it’s no big deal.

    But then this went down. If Ruby cannot even keep their standard library in check — if they, as the leaders of the language, can’t manage to keep themselves together, and keep something as common as reading and writing XML working — if they fall apart in ways that straightforward static type checking would solve, and if it burns a full day of my precious and limited time trying to figure it out, then I feel like all that tolerance and engaging in dialog just bit me in my ass.

  14. Jeremy D Pavleck on 28 Mar 2008 at 9:09 am

    What a timely post actually. So I decided to poke around with Ruby finally, and see if it could work with one of my projects I have in mind, which is mainly grabbing an RSS feed, parsing it, re-writing it to a database, which I’ll use later to construct some neat things - just for the hell of it.

    Glad I saw this!

  15. [...] for my blogging: after some soul-searching prompted by insightful comments, I’ve decided that my last post on Ruby came a bit too close to violating my own rule #1. It [...]

  16. Robert Fischer on 29 Mar 2008 at 11:43 pm

    @Jeremy D Pavleck

    Here’s the summary of my experience: I haven’t used the core library’s built-in RSS feed, so I don’t know how that works out. REXML is to be avoided. Hpricot is pretty nice, as long as you don’t need to deal with namespaces.

  17. Jonas on 14 Apr 2008 at 1:53 pm

    As someone pointed out earlier: don’t use REXML. Its documentation is terrible, it doesn’t really work, and it’s ridiculously complicated to use. Also, don’t use XML-Simple, since it suffers from the exact same problems.

    Instead, use Hpricot.XML, which is awesome, seriously fast, ridiculously easy to use and awesome.

    http://code.whytheluckystiff.net/hpricot/

    You can select stuff with CSS or XPATH selectors, change it, print it out. It doesn’t get any easier.

  18. Robert Fischer on 14 Apr 2008 at 2:36 pm

    @jonas

    Got some documentation on Hpricot and namespaces? I haven’t found any support.

  19. Robert Fischer on 14 Apr 2008 at 2:52 pm

    @jonas

    Oh, and if REXML sucks so bad, why does the Ruby team make sure that it ships with every Ruby distribution? What does it say about the language that they allow something so woefully broken to be a backbone of their language?

  20. [...] what we could do about it. Robert Fischer was lamenting on the state of Ruby’s libxml library, and didn’t seem to like REXML much either. Tim Bray has also had a few complaints about REXML. It seemed there was a problem to be [...]

  21. Caligula on 22 Aug 2008 at 4:19 am

    People that complain about dynamic languages because they found a bug that static typing would have solved make me sleepy.
