Mar 24 2008
My Frustrations with REXML: Ruby’s Standard Library for Reading/Writing XML; or, Ruby’s Problem Is Its Type System, and Don’t Try to Tell Me Otherwise
Edit: Once you get the gist of this rant, jump to the comments for a slightly more reasoned approach, or see my follow-up post, which attempts to re-open the dialog.
So, I’m trying to do a little bit of XML reading/writing. Nothing major: read in an XML document, grab out some values, and then store the raw XML into the database. I’m doing pretty much the same thing in Groovy, and the XmlSlurper made that blissfully easy.
Since the core library comes with the REXML parser, I figured that it was a nice, stable library, and I’d roll with it. The interface wasn’t as nice as XmlSlurper, but it seemed like it would do.
This was the start of the pain.
In fact, the pain pissed me off enough to share my frustrations with the world. Hopefully someone finds this useful, and they can avoid the pain and suffering I put up with. And yes, I could spend the time I’m griping going through and fixing up all the bugs, but I shouldn’t have to for a language as mature as Ruby. Core libraries are supposed to be stable, reliable beasties. If I wanted to spend all my time debugging half-baked implementations or rolling my own solutions, I’d never leave OCaml. I come to Ruby for the community support. That’s supposed to be the big advantage.
Anyway, here we go:
Problem #1 came along when I tried to parse XML. First of all, the API documentation completely sucks: if you look at the top-level REXML package, it’s totally worthless. If you manage to figure out that it’s REXML::Document that you probably want, you’re still not much better off. If you check out #new, which is really what you probably want, you’re rewarded with one word: “Constructor.” You also have some “@param” tags that ran together and tell you things like the second argument, called “context”, should be a Hash of the context. That clears up a lot! And, seriously, if you’re telling me that it should be a Hash in the documentation, why aren’t we just doing implied static typing and being done with it?
Anyway, I retreated to Google, found the REXML tutorial, and managed to figure it out from there.
But then I kept having this annoying bug: when I called Element#text(), it was not only ignoring my instructions to leave entities alone (i.e. don’t turn “&lt;” into “<”), but it then seemed to go through and attempt to re-parse it, because it was complaining about unbalanced tags! Principle of Least Surprise my ass(1)! I’m not sure why the second part of that was happening, but the first part is apparently documented, so I stopped using the easy-to-read convenience method and went to Element#write.
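For anyone who wants to see the expanded-versus-escaped distinction in isolation, here is a minimal sketch. It assumes a current REXML release, which may not behave like the version described in this post, and the sample document is made up for illustration.

```ruby
require 'rexml/document'

# Hypothetical sample document; the element name and content are made up.
doc = REXML::Document.new('<note>1 &lt; 2</note>')
note = doc.root

# Element#text returns the value of the first text node, entities expanded.
expanded = note.text

# Element#get_text returns the Text node itself; Text#to_s keeps the
# entities escaped, so the string is still valid XML text.
escaped = note.get_text.to_s

puts expanded  # prints: 1 < 2
puts escaped   # prints: 1 &lt; 2
```

If the goal is to store markup rather than display it, the escaped form from the Text node is the one to keep.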
This is where the real pain began. See, Element#write is broken. Deprecated and broken, actually. But the tutorial still tells you to use it. The solution is to use their Formatter approach. Except (ready for it?) that’s broken, too! No, I’m not kidding. In this language’s core library, both versions are broken! The only fix is for me to reach in and change the core library so that we avoid a null. In the standard Ruby deployment, using the standard core XML processing library, there is no way to write out XML. It is impossible because of bugs in the library.
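For reference, this is roughly what the Formatter route looks like. It is a sketch against a current REXML release (an assumption; it says nothing about whether the release described above could run it), and the input document is hypothetical.

```ruby
require 'rexml/document'
require 'rexml/formatters/default'

# Hypothetical input; in the scenario above this would be the XML read in.
source = '<order><item>tea &amp; milk</item></order>'
doc = REXML::Document.new(source)

# Serialize the root element into a String: the formatter's
# write(node, output) appends to anything that responds to <<.
formatter = REXML::Formatters::Default.new
raw_xml = String.new
formatter.write(doc.root, raw_xml)

# Document#to_s is a shorter route that also goes through a formatter.
whole_document = doc.to_s

puts raw_xml
puts whole_document
```

The string in `raw_xml` is what would go into the database column; the escaped `&amp;` survives because the formatter writes text nodes in their escaped form.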
The worst part?
THAT STUPID BUG IN THEIR CORE LIBRARY WOULD HAVE BEEN FIXED WITH STATIC TYPING. Even more if you have a type system which can check nulls for you. Null pointers/“nil when you didn’t expect it!” errors are totally solvable problems. The fact that our industry hasn’t moved past this painful left-over from C is driving me crazy. The next person who tries to tell me that dynamic typing is the best thing since sliced bread is going to get an earful. It is a flat-out wrong position, and I’m done hearing otherwise from anyone.
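To make the “nil when you didn’t expect it” complaint concrete, here is a minimal sketch of that class of failure. The helper names are hypothetical and are not taken from REXML; this illustrates the general pattern and is not a reproduction of the actual bug.

```ruby
# Hypothetical formatter helper; none of these names come from REXML.
# It assumes the caller always supplies :indent.
def child_indent(options)
  options[:indent] + 2
end

ok = child_indent({ indent: 0 })

# With the key missing, options[:indent] is nil, and the failure only
# shows up when this line actually runs.
error = begin
  child_indent({})
rescue NoMethodError => e
  e
end

# A guarded variant: Hash#fetch with a default never yields nil here.
def safe_child_indent(options)
  options.fetch(:indent, 0) + 2
end

puts ok
puts error.class
puts safe_child_indent({})
```

Nothing flags the first helper until a caller omits the key at runtime, which is the gap a null-aware type checker is meant to close.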
(1) As much as I’d love to claim that quote, it actually comes from Paul Cantrell’s excellent exploration of closures in Ruby.
The answer, my friend, is hpricot. Or to just abandon Ruby altogether since it’s dynamically typed and all the airbag ranting in the world isn’t going to change that.
otherwise
There’s a difference between dynamic typing being bad versus strong typing being necessary at times.
I think what we’re seeing is a growing need to be able to type loosely at some times, statically at others, and dynamically at still others.
@Justin
There’s a difference between dynamic typing being bad versus strong typing being necessary at times.
Care to elaborate? I’m missing the distinction you’re making. And when is it necessary to dynamically type?
Personally, I’m just burnt out on the extra work and flakiness that comes with dynamic languages. I had these problems back with CPAN, but Ruby is flat-out worse.
Ruby’s not for everyone.
Find a tool that better fits your world-view and you’ll be happier.
I had problems with REXML too. This time in XPath predicate processing. I wrote an angry post about it here: http://arrogantgeek.blogspot.com/2008/01/why-ruby-sucks-1.html
Yeah, I’d like to get off Ruby, but the reality is that I haven’t encountered anything else out there better than Rails. And it’s not that Rails is the best web framework out there, but that the community provides a lot of strong support. I just don’t see another framework with the level of support, advanced set of plugins, etc., etc.
My ranting was really 3-fold:
1) To feel better. I was really frustrated with REXML, and although I had long considered Rails to be flaky, discovering that the Ruby core library is flaky, too, really upset me.
2) To push back against the overwhelming popularity of dynamic languages right now. Ruby/Rails, like most things in IT, is just a temporary solution until there’s some better framework. It’d be really nice if the next framework was a bit smarter about typing and safety.
3) To document my problem. There wasn’t a lot of documentation for my problem on the net, so I wanted to add some more. Also, my memory is horrible, so I wanted to have the story locked down somewhere for posterity — people seem to act like I’m just a reactionary freak when I gripe about dynamic languages, and it’s nice to have a solid example to demonstrate the issue.
If you’re reading this, you probably want to also check out the comments for this post on Reddit. That walled garden has a nascent (although potentially interesting) thread.
What interests me here is how you manage to go from “there is a major bug in this core library” to “an entire model for programming is wrong” in a single sentence. “THAT STUPID BUG IN THEIR CORE LIBRARY WOULD HAVE BEEN FIXED WITH STATIC TYPING.” doesn’t, to me, seem to be a useful enough point to warrant the conclusion you have drawn. You could write that sentence a hundred different ways - “the stupid bug could have been avoided if they used a hash instead of a set,” “they completely would have avoided readability problems if they had used black instead of blue!” - and that doesn’t immediately nullify the value of either choice. No one would claim that a set or the color blue was inherently poor simply because it was not the proper solution to a specific problem.
This frustrates me because I have run into many things in Ruby and Ruby’s core libraries that make me wary of considering it a “mature” language or one I’d really be interested in designing production systems in, and you were well on your way to aptly pointing that out, until you chose to use your rhetorical sword to slay a dragon far out of range.
Oh, and let’s not forget that this comes hot on the heels of another problem I had with name collisions at runtime.
Check the Reddit comments for other links to complaints and pithy comments about REXML. It’s apparently just a complete mess.
@Zach
You’re welcome to tackle Ruby and its problems. For my part, I was mainly trying to vent (see above for more on that). I’d encourage you to post your stuff on Ruby and Ruby’s core libraries: I’d be curious to see what you’ve bumped into.
Now that I’m a little calmer, let me add some more. I may rework this and some other thoughts into a more coherent post in the near future — I haven’t decided.
Here’s the thing: open source libraries are one thing. I’m used to those being of dubious quality, and I’m willing to put some effort in to fix them. Even more, I’ve cut Ruby on Rails a lot of slack, because the bugs and the awkwardness are the price that you pay to hang out with the beer-swilling hipsters that make up that community and roll with the rapid rate of development. And, really, it is the best web framework I’ve dealt with, mainly because it provides the most comprehensive and extensible set of functionality of any framework, and it’s got a solid community to back it up.
However, it’s doing this all by constantly just barely working. That’s the price you pay: the development constantly pushes to the very edge of tolerance, and APIs you took for granted in the last release may fade away, and there’s this constant concern about stability.
But, the argument ran, static typing really doesn’t get you anything. It’s the unit tests that do it. The testing will exercise the API and give you — for all practical purposes, anyway — the protection that static typing will. So all this clinging to static typing is just old fogies who can’t pry their insecurities away from their enterprisey languages long enough to see how people really get productive.
And that argument isn’t one I really buy. But I hear it a lot. And I hear people griping about how bad static typing is, for reasons that have zero applicability to any language whose type system postdates the Reagan administration. And I try to mention things to engage in dialog, and it’s really gotten nowhere. But, y’know, hey, if the unit tests really do provide the same protection, then it’s no big deal.
But then this went down. If Ruby cannot even keep their standard library in check — if they, as the leaders of the language, can’t manage to keep themselves together, and keep something as common as reading and writing XML working — if they fall apart in ways that straightforward static type checking would solve, and if it burns a full day of my precious and limited time trying to figure it out, then I feel like all that tolerance and engaging in dialog just bit me in my ass.
What a timely post actually. So I decided to poke around with Ruby finally, and see if it could work with one of my projects I have in mind, which is mainly grabbing an RSS feed, parsing it, re-writing it to a database, which I’ll use later to construct some neat things - just for the hell of it.
Glad I saw this!
[...] for my blogging: after some soul-searching prompted by insightful comments, I’ve decided that my last post on Ruby came a bit too close to violating my own rule #1. It [...]
@Jeremy D Pavleck
Here’s the summary of my experience: I haven’t used the core library’s built-in RSS feed, so I don’t know how that works out. REXML is to be avoided. Hpricot is pretty nice, as long as you don’t need to deal with namespaces.
As someone pointed out earlier: don’t use REXML. Its documentation is terrible, it doesn’t really work, and it’s ridiculously complicated to use. Also, don’t use XML-Simple, since it suffers from the exact same problems.
Instead, use Hpricot.XML, which is awesome, seriously fast, ridiculously easy to use and awesome.
http://code.whytheluckystiff.net/hpricot/
You can select stuff with CSS or XPATH selectors, change it, print it out. It doesn’t get any easier.
@jonas
Got some documentation on Hpricot and namespaces? I haven’t found any support.
@jonas
Oh, and if REXML sucks so bad, why does the Ruby team make sure that it ships with every Ruby distribution? What does it say about the language that they allow something so woefully broken to be a backbone of their language?