Archive for June, 2006

Jun 25 2006

Ruby on Rails: One Thumb Up, One Thumb Down

Published by Robert Fischer under To Be Categorized

I’ve been playing around with Ruby on Rails, and my conclusion is that while it’s still an improvement over Java/Tomcat, it’s not the glorious grail of web development that the buzzword media would have you believe.

On the one hand, there’s Ruby. Ruby is a language which is almost awesome. Here are the pros:

  • It is genuinely object-oriented, as opposed to Java (e.g.: try “2.toString()” in Java). Literals are objects, methods are objects, etc., etc. The result is that a lot of the crap that developers shouldn’t be worrying about (e.g.: boxing and unboxing) can finally be genuinely ignored instead of littering your code. I’ve always wondered why Java didn’t do this from the get-go, and why C# bothered implementing easy means for user code to force boxing/unboxing. If everything’s an object, programmers should simply treat everything like an object, and leave it to the compiler/VM to do the funny optimizations. That’s Ruby’s take, too, and I like it.
  • Ruby provides Perlish array and hash constructors. This is not just some syntactic sugar, but a major win for functional programming. In Ocaml, for instance, it’s easy to talk in the code about arrays of tuples of tuples of arrays. The beautiful part of this is that you can then define recursive functions that work over the whole structure and apply them efficiently. In Java, you get object bloat when you start defining a large number of classes to address all these relatively simple cases, and coding those classes takes up a lot of time (particularly if you have a strict unit testing regimen).
  • Ruby allows you to pass code (including closures) around with ease. It’s actually easier than in Perl, which is nifty. And the functional/applicative programming style is built so deeply into the language that the Pickaxe book (Programming Ruby, from the Pragmatic Programmers) declares that loops are almost always the wrong answer in Ruby. About time people figured out how ugly loops are…
  • Ruby has a nifty way of making reference to “that thing called Foo” (the symbol :Foo), similar to Perl’s typeglob (“*Foo”). This makes it very easy to write code about other code, which makes it easy to implement and work with attributes and the like. It’s also kinda like passing a variable by reference, but without some of the nightmares that causes.
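To make the OCaml comparison in the second point concrete, here is a minimal sketch (the data and the name `sum_all` are invented for illustration): an array of tuples whose parts are a tuple and an array, written as a literal with no class declarations, plus one function applied across the whole structure.

```ocaml
(* An array of (tuple, array) pairs written directly as a literal. No class
   or record declarations are needed; the type is spelled out here only for
   readability, OCaml would infer it. *)
let data : ((int * int) * int array) array =
  [| ((1, 2), [| 3; 4 |]); ((5, 6), [| 7; 8; 9 |]) |]

(* Sum every integer in the structure by pattern matching on its shape. *)
let sum_all (d : ((int * int) * int array) array) : int =
  Array.fold_left
    (fun acc ((a, b), xs) -> acc + a + b + Array.fold_left ( + ) 0 xs)
    0 d

let () = Printf.printf "%d\n" (sum_all data)
```

Running it prints 45. The Java equivalent would typically start with a class for the inner pair and another for the outer element.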

But there are some catches.

  • There is no type safety system. I mean, everything’s an object, and you’ve got inheritance for objects, and all this awesome stuff, but you’ve got strictly dynamic typing. And dynamic typing taken to the point where you can’t even suggest that a particular argument should abide by some kind of contract. All your standard maintenance nightmares ensue: the biggest one being the “sleeper cell code”, which is only executed when the leap day lands on a Wednesday for the year you’re calculating, and THEN your application explodes for seemingly no good reason at all.
  • The syntax is a little bit too clean. If I am having to parse the language based on whitespace and formatting (which the compiler kinda-sorta-but-not-really ignores), then the compiler is going to have trouble. I never considered putting a semicolon at the end of the line to be an onerous duty; in fact, I kinda liked it, because I’m the king of multiline statements in Java. That’s out the door. Parentheses can sometimes be used to denote which arguments belong to which method, but making them optional is just begging for trouble.

And Ruby on Rails — that nifty web application development system currently seducing managers and buzzword-susceptible geeks — has got a similar kind of pros and cons.

Pros:

  • MVC structure is forced, which is nice. The natural way to do things in Rails is the pragmatic and sane way to do web development.
  • It provides all the nice features that a web applications server provides — database-driven sessions, preprocessor classes, postprocessor classes, mark-up language level code for dynamic *ML generation, application language level code for heavy lifting, etc., etc. And it’s easier to implement this stuff and plug it into Rails than it is to develop it on Java and plug it in through Tomcat.
  • The convention-over-configuration model is a major pro. It means that as long as you’re staying within the rules, it’s easy to plug new features in, to get the system talking to a database, to respond to changes quickly, and generally to do rapid development.

Cons:

  • Yes, it is easy to get something up and running right away. That is, as long as you’re using Linux and MySQL, you like looking at the ugly template code generated by Rails, and you use either WEBrick or lighttpd. I have yet to be convinced that the long-term maintenance cost of Rails-via-Apache is any cheaper than Tomcat-via-Apache, and I’m almost certain the performance is worse, even with FastCGI.
  • The strong emphasis on convention-over-configuration leads to some serious with-us-or-against-us issues. Any deviation from convention is penalized by arbitrary, awkward, and generally confusing error messages. My favorite is the “undefined constant” error message that you get if you try to define a model with a plural name — a “users” model (representing the collection of all users) is a violation of convention, because models should always have singular names. If you try to do it, the error message is a stack trace through code the user never wrote with the message “undefined constants ‘users’ in users.rb”.

Ultimately, my conclusion is that Ruby on Rails is nice for small web portals, and while it’s better than Tomcat, it’s not better for all the reasons being touted. You still have to be just as much of an adept to get a Tomcat web application built as to get a Ruby on Rails web application built: the difference is that Rails has its pain in the actual coding/development neck of the woods, whereas the pain of Tomcat is all in the infrastructure.

Jun 17 2006

The Functional-Relational Impedance Match

One of the big topics of discussion recently among developers has been the Object-Relational Impedance Mismatch. Or maybe not; OO developers may have just decided that it’s a permanent fact of life, like the primacy of the church of Rome, the naval power of Spain, or smallpox. Something you live with and survive (or not, as the case may be), but not anything you can really do something about. Until, of course, some fool does go and do something about it.

Generally, how that fool (and he’s obviously a fool for trying to change the unchangeable) changes the unchangeable is by first changing the paradigm- changing how the problem is thought about. When the plague was God’s punishment for the wickedness of the earth, there wasn’t really much of anything you could do about it (except try not to torque off God quite so much). But once the plague was understood to be caused by bacteria, killing off the bacteria and not the host (and thus curing the plague) became a possibility. The important point here is that a change in paradigm can change an impossibility into a possibility.

Those who know me know where I’m going with this. I’ve been spending the last few months working with SQL databases and Ocaml. The true mark of a genius is the ability to create new paradigms, specifically to turn a given impossibility into a possibility- I claim no such great intellect here. Instead, I have my new paradigm already created by greater minds than I- the functional paradigm. What follows is just my first attempt to view, through that paradigm, the problem of interfacing a "normal" programming language (for loose enough definitions of normal that include Ocaml) to a relational/SQL database. It should not be considered the last word on this subject, but only the first word.

As the title of the article should suggest, I think that the new paradigm clears up a number of deep problems that the classic Object Oriented approaches to databases struggle with. Along the way I hope to shed new light on the mistakes made by most (to my knowledge, all to date) Object Oriented database libraries, which give rise to what is known as the Object-Relational Impedance Mismatch.

But first, some review. Specifically, some review of Relational theory- trust me, this is important. The core of relational calculus operates on relations, aka tables. There are three main operators, all of which work on relations/tables-

  • join, which takes two relations and creates a third relation which consists of all possible pairings of the rows of the two original relations,
  • selection, which takes a relation and creates a new relation with only a subset of rows of the original relation, and
  • projection, which takes a relation and creates a new relation with a subset of the columns of the old relation

These three operations are the core, the heart and soul, of relational calculus- which is the core of SQL. To faithfully model SQL we must, on some level, faithfully model the relational calculus. And this is where I think the Object Oriented programmers go astray in trying to interface to SQL. In their hurry to make things into objects, they immediately (and without fail) declare the rows to be objects- and thus miss the fact that relational calculus and thus SQL is about relations, not rows. They’re abstracting at the wrong level.

The allure of making rows objects can’t be denied- especially as it encourages the programmer to think of the database as a persistent object store- a magical storage space where objects don’t go away when the program dies. The problem is that this is something that SQL is manifestly not. And this fundamental misconception leads to a number of symptoms. For example- when are changed objects written back to the database? If the application is too eager in storing objects back into the database after changing them, performance suffers as the number of slow SQL queries skyrockets. If it isn’t eager enough in writing them back, a sudden crash of the program could lose too much data.

Another problem that treating the database as a persistent object store incurs is the inheritance problem. All rows may well be objects- but not all objects are rows. I am human, Socrates is human, therefore I am Socrates. One important restriction the relational calculus and SQL impose is that all rows have exactly the same structure- the same elements. This flies directly in the face of the fundamental assumption of object oriented programming- that two objects may be of the same type even if they have different internal representations. So, for example, lines and points may have different internal representations- different numbers, and even types, of internal members. But in OO programming, they can quite happily cohabitate in a list of Drawables. However, SQL strictly forbids different rows of a table from having different structures, from having different numbers or types of members, and discourages those members from even having different meanings from one row to the next.

But I’ve already alluded to the even bigger problem with this approach to databases- they’re abstracting at the wrong level. By concentrating on the row level and demoting relations to second-class citizens, it totally misses the power and point of relational calculus. Again, there is a cause for this failure- relations are data structures, and the relational operations are operations on data structures as a whole. In the terms used by functional programmers, relations are composable data structures. While not disallowed by Object Oriented programming, creating and working with composable data structures is not something it encourages. OK, set implementations generally have some composable operations (union, intersection, difference)- but most data structure operations are accessors- insertions, deletions, and retrievals of various flavors. Operating on whole data structures is generally not done.

And this is the first major advantage that approaching the problem in the functional paradigm brings: operating on whole data structures is natural and intuitive to a functional programmer. Operating on whole data structures started as a necessary optimization. Retrieving all elements from a tree one at a time, altering them, and inserting them into another tree is O(N log N). But I can generally do the same operation in O(N) by operating on the whole tree, a speedup by a factor of log N that becomes substantial for large trees. But even though this paradigm shift was originally born of necessity, it is now bearing fruit in a different realm. The natural way to model SQL in a functional language is to directly model the relational calculus, modelling relations/tables as abstract composable data structures.
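A small illustration of the complexity point, using OCaml's standard `Map` (a balanced binary tree) as the tree; the function names are invented for this sketch. Rebuilding a map by re-inserting each altered binding costs O(N log N), while `Map.map` transforms the whole tree in one pass because it keeps the tree's shape.

```ocaml
module IntMap = Map.Make (Int)

(* Element-at-a-time: fold over every binding and re-insert the altered
   value into a fresh map. Each IntMap.add costs O(log N), so the whole
   pass is O(N log N). *)
let double_by_reinsertion (m : int IntMap.t) : int IntMap.t =
  IntMap.fold (fun k v acc -> IntMap.add k (2 * v) acc) m IntMap.empty

(* Whole-structure: IntMap.map rebuilds the tree node for node with the
   same shape, visiting each node once, so the pass is O(N). *)
let double_whole_tree (m : int IntMap.t) : int IntMap.t =
  IntMap.map (fun v -> 2 * v) m

let sample : int IntMap.t =
  IntMap.of_seq (List.to_seq [ (1, 10); (2, 20); (3, 30) ])

let () =
  (* Both strategies produce the same map; only the cost differs. *)
  assert (IntMap.equal ( = ) (double_by_reinsertion sample) (double_whole_tree sample))
```

The O(N) route works here because the keys, and therefore the tree's ordering, are unchanged; a transformation that changes keys needs a different whole-structure technique.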

Let me be somewhat more specific here. At a 10,000 foot level, the SQL interface would have some abstract data type representing a relation, call it a relation_t. There would be a join function with type relation_t -> relation_t -> relation_t, there would be a selection function of type relation_t -> sql_selection_t -> relation_t, and there would be a projection function relation_t -> column_selection_t -> relation_t. The types sql_selection_t and column_selection_t still need to be defined (I’m still working on that).

These functions would not actually fire anything off to the database; they’d just build local data structures gathering the information that gets turned into the appropriate SQL query. Again, I’d like to emphasize that there is nothing here that couldn’t be done in an object oriented fashion- it just isn’t done this way.
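Here is a minimal sketch of what such a query-building interface could look like. Since the text leaves `sql_selection_t` and `column_selection_t` open, this assumes (hypothetically) that a selection is a raw SQL boolean expression and a column selection is a list of column names; `table`, `render`, `from_item`, and `to_sql` are invented helper names. Building a relation only constructs a tree, and `to_sql` turns that tree into a single statement.

```ocaml
(* Hypothetical placeholders for the types the text leaves open. *)
type sql_selection_t = string          (* a raw SQL boolean expression *)
type column_selection_t = string list  (* column names to keep *)

(* A relation describes a query; building one never contacts the database. *)
type relation_t =
  | Table of string
  | Join of relation_t * relation_t
  | Select of relation_t * sql_selection_t
  | Project of relation_t * column_selection_t

let table (name : string) : relation_t = Table name
let join (a : relation_t) (b : relation_t) : relation_t = Join (a, b)
let select (r : relation_t) (p : sql_selection_t) : relation_t = Select (r, p)
let project (r : relation_t) (cols : column_selection_t) : relation_t =
  Project (r, cols)

(* Render the accumulated description as one SQL statement. Nested
   relations become aliased subqueries; aliases are derived from the
   position in the tree, so the output is deterministic. Join is a
   CROSS JOIN, matching the "all possible pairings" definition above. *)
let rec render (alias : string) (r : relation_t) : string =
  match r with
  | Table name -> "SELECT * FROM " ^ name
  | Join (a, b) ->
      Printf.sprintf "SELECT * FROM %s CROSS JOIN %s"
        (from_item (alias ^ "l") a) (from_item (alias ^ "r") b)
  | Select (src, p) ->
      Printf.sprintf "SELECT * FROM %s WHERE %s" (from_item (alias ^ "s") src) p
  | Project (src, cols) ->
      Printf.sprintf "SELECT %s FROM %s" (String.concat ", " cols)
        (from_item (alias ^ "p") src)

and from_item (alias : string) (r : relation_t) : string =
  match r with
  | Table name -> name
  | _ -> Printf.sprintf "(%s) AS %s" (render alias r) alias

let to_sql (r : relation_t) : string = render "t" r
```

For example, `to_sql (project (select (table "users") "age > 30") ["name"])` yields `SELECT name FROM (SELECT * FROM users WHERE age > 30) AS tp`. A real library would also have to quote identifiers and escape values, which this sketch ignores.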

How does one actually get elements (rows) out of these abstract data structures? Here again, the natural inclinations of a functional programmer give a radically different answer. In this situation, the functional programmer will reach for a lazily evaluated list.

Some introduction for those who aren’t familiar with lazy evaluation is in order. In Ocaml, a lazily evaluated expression is introduced with the lazy keyword. What follows the lazy keyword is an expression- which isn’t evaluated at that point, but instead is saved, creating a lazy value, or thunk. The first time the thunk is forced (by calling Lazy.force on it), the expression is evaluated, and cached inside the thunk- so on the second and all subsequent times the cached value is returned.

An example makes this more obvious:

# let f () =
        print_string "Enter a number: ";
        flush stdout;
        int_of_string (read_line ())
  ;;
val f : unit -> int = <fun>
# f ();;
Enter a number: 3
- : int = 3
# let z = lazy (f ());;
val z : int lazy_t = <lazy>
# Lazy.force z;;
Enter a number: 3
- : int = 3
# Lazy.force z;;
- : int = 3
# Lazy.force z;;
- : int = 3
#

Notice how the prompt to enter a number is not issued when the lazy thunk is created (on the let z = ... line), nor on the second or later times when it’s forced- it’s only on the first time it’s forced that the expression is evaluated, the function is called, and the user is prompted to enter a value. Lazy evaluation is a form of mutable data that "plays nice with" purely functional programming. A lazy list, then, is a singly linked list where the next element of a node is not a reference, but instead a lazy thunk that can be forced to get the next element.
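The lazy list just described can be written down directly. This is a minimal sketch (the names `from`, `take`, and `computed` are invented for illustration) in which a counter shows that nodes are computed only when forced, and only once.

```ocaml
(* A lazy list: each node's tail is a thunk, forced (and cached) on demand. *)
type 'a node = Nil | Cons of 'a * 'a lazy_list
and 'a lazy_list = 'a node Lazy.t

(* Counts how many nodes have actually been computed. *)
let computed = ref 0

(* The integers from n upward, each node built only when first forced. *)
let rec from (n : int) : int lazy_list =
  lazy (incr computed; Cons (n, from (n + 1)))

(* Force at most k nodes and collect their elements into an ordinary list. *)
let rec take (k : int) (l : 'a lazy_list) : 'a list =
  if k <= 0 then []
  else
    match Lazy.force l with
    | Nil -> []
    | Cons (x, rest) -> x :: take (k - 1) rest

let () =
  let nums = from 1 in
  assert (!computed = 0);             (* nothing evaluated yet *)
  assert (take 3 nums = [ 1; 2; 3 ]); (* forces exactly three nodes *)
  assert (!computed = 3);
  ignore (take 3 nums);               (* cached: no new evaluation *)
  assert (!computed = 3)
```

The list is conceptually infinite, which is harmless because only the forced prefix ever exists in memory.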

Lazy lists are in many ways not unlike the iterators and enumerators of classic OO programming, but with one critical difference- many operations on the lazy list are applied lazily. Specifically, maps and filters can be applied lazily. A map just takes a conversion function that converts from type a to type b, and converts a lazy list (or other data structure) of type a into a lazy list of type b. But rather than going through and changing all the elements immediately, the map function is just folded into the lazy thunk- forcing an element of the list of type b first forces the next element of the list of type a, and then applies the mapping function to it.

This is a critical difference because it moves when the computation happens. So we have two functions- one function that converts a relation_t into a lazy list of elements, which actually issues the select statement to the database, and a function which inserts a lazy list of elements into a relation_t, which performs an insert or update. So we start with a base table or two, perform various joins, selects, and projections on them to produce our source and destination relations. We convert the source relation to a lazy list. We then perform several maps and filters on the lazy list, then insert it into our destination table. Now the library goes to work- pulling items from the select, passing them through the various maps and filters, and feeding them to the destination insert.
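A minimal sketch of lazy map and filter over the same kind of lazy list (the helper names `of_list`, `filter_node`, `to_list`, and `calls` are invented for illustration). The counter confirms that building the pipeline applies nothing; the mapping function runs only as the result is consumed.

```ocaml
type 'a node = Nil | Cons of 'a * 'a lazy_list
and 'a lazy_list = 'a node Lazy.t

let rec of_list (xs : 'a list) : 'a lazy_list =
  lazy (match xs with [] -> Nil | x :: rest -> Cons (x, of_list rest))

(* Lazy map: forcing a node of the result forces the corresponding
   source node, then applies f to its element. *)
let rec map (f : 'a -> 'b) (l : 'a lazy_list) : 'b lazy_list =
  lazy
    (match Lazy.force l with
     | Nil -> Nil
     | Cons (x, rest) -> Cons (f x, map f rest))

(* Lazy filter: forcing a node of the result pulls source nodes until
   one satisfies p, or the source runs out. *)
let rec filter (p : 'a -> bool) (l : 'a lazy_list) : 'a lazy_list =
  lazy (filter_node p l)

and filter_node p l =
  match Lazy.force l with
  | Nil -> Nil
  | Cons (x, rest) -> if p x then Cons (x, filter p rest) else filter_node p rest

let rec to_list (l : 'a lazy_list) : 'a list =
  match Lazy.force l with Nil -> [] | Cons (x, rest) -> x :: to_list rest

let calls = ref 0

let () =
  let squares = map (fun x -> incr calls; x * x) (of_list [ 1; 2; 3; 4; 5 ]) in
  assert (!calls = 0);  (* building the pipeline applied nothing *)
  let evens = filter (fun x -> x mod 2 = 0) squares in
  assert (to_list evens = [ 4; 16 ]);
  assert (!calls = 5)   (* each element mapped exactly once, on demand *)
```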
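The whole pipeline can be simulated in memory. In this sketch, which is an assumption-laden stand-in rather than a database binding, `rows_of_list` plays the role of converting a relation into a lazy list (a SELECT), `insert_all` plays the role of inserting a lazy list into a relation, and the `row` record type is hypothetical. Each row is pulled through the filter and map only when the sink asks for it.

```ocaml
type 'a node = Nil | Cons of 'a * 'a lazy_list
and 'a lazy_list = 'a node Lazy.t

(* Hypothetical row type; record fields are immutable by default. *)
type row = { id : int; name : string }

(* Stand-in for "relation to lazy list": rows come from an in-memory list
   instead of a SELECT against a real table. *)
let rec rows_of_list (xs : row list) : row lazy_list =
  lazy (match xs with [] -> Nil | r :: rest -> Cons (r, rows_of_list rest))

let rec map f l =
  lazy (match Lazy.force l with Nil -> Nil | Cons (x, rest) -> Cons (f x, map f rest))

let rec filter p l = lazy (filter_node p l)
and filter_node p l =
  match Lazy.force l with
  | Nil -> Nil
  | Cons (x, rest) -> if p x then Cons (x, filter p rest) else filter_node p rest

(* Stand-in for "insert lazy list into relation": pulls one row at a time
   through the whole pipeline and hands it to [insert]. *)
let rec insert_all (insert : row -> unit) (l : row lazy_list) : unit =
  match Lazy.force l with
  | Nil -> ()
  | Cons (r, rest) -> insert r; insert_all insert rest

let inserted = ref []

let () =
  let source =
    rows_of_list [ { id = 1; name = "ann" }; { id = 2; name = "bob" }; { id = 3; name = "cy" } ]
  in
  let pipeline =
    source
    |> filter (fun r -> r.id <> 2)
    (* "Changing" a row builds a new record; the source row is untouched. *)
    |> map (fun r -> { r with name = String.uppercase_ascii r.name })
  in
  insert_all (fun r -> inserted := r :: !inserted) pipeline;
  assert (List.rev !inserted = [ { id = 1; name = "ANN" }; { id = 3; name = "CY" } ])
```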

It is the sign of a good abstraction layer that new forms of optimization can be added easily and automatically. And that’s what we’re seeing here. Many databases- including PostgreSQL- have the idea of a cursor, that allows us to bring results of a query back in small chunks, rather than all at once. This is a natural thing to use when converting a select into a lazy list- only pull in elements as they’re needed. On inserting in PostgreSQL, you can do a COPY, which is basically a form of insert where multiple elements are inserted at once, which has much higher performance than individual inserts. If you can batch multiple inserts into a single copy, you have much better performance. Note that all of this can go on without the intervention of the user of the library- this optimization goes on under the covers.
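Both optimizations can be hidden behind the same lazy-list interface. In this sketch, `fetch_chunk` is a hypothetical stand-in for fetching the next chunk from a cursor (it reads from a fake 10-row table and counts round trips), and the `flush` callback passed to `insert_batched` stands in for sending a batch as one COPY. Neither is a real PostgreSQL client call; the point is that the consumer of the lazy list never sees the chunking or batching.

```ocaml
type 'a node = Nil | Cons of 'a * 'a lazy_list
and 'a lazy_list = 'a node Lazy.t

(* Hypothetical cursor fetch: up to [size] rows starting at [offset] from
   a fake 10-row table. Counts round trips to the "database". *)
let fetches = ref 0
let fetch_chunk (offset : int) (size : int) : int list =
  incr fetches;
  List.filter (fun i -> i >= offset && i < offset + size) (List.init 10 (fun i -> i))

(* Lazy list over the cursor: the next chunk is fetched only after every
   row of the previous chunk has been consumed. *)
let of_cursor (chunk_size : int) : int lazy_list =
  let rec next_chunk pos : int node =
    match fetch_chunk pos chunk_size with
    | [] -> Nil
    | rows -> emit (pos + List.length rows) rows
  and emit pos rows : int node =
    match rows with
    | [] -> next_chunk pos
    | x :: rest -> Cons (x, lazy (emit pos rest))
  in
  lazy (next_chunk 0)

(* Group consecutive elements into batches of at most [batch_size] and
   pass each batch to [flush], the way several inserts could become one COPY. *)
let insert_batched (batch_size : int) (flush : int list -> unit) (l : int lazy_list) : unit =
  let rec go buf n l =
    match Lazy.force l with
    | Nil -> if buf <> [] then flush (List.rev buf)
    | Cons (x, rest) ->
        let buf = x :: buf and n = n + 1 in
        if n = batch_size then (flush (List.rev buf); go [] 0 rest)
        else go buf n rest
  in
  go [] 0 l

let batches = ref []

let () =
  let rows = of_cursor 4 in
  assert (!fetches = 0);  (* creating the lazy list fetches nothing *)
  insert_batched 3 (fun b -> batches := b :: !batches) rows;
  assert (List.rev !batches = [ [ 0; 1; 2 ]; [ 3; 4; 5 ]; [ 6; 7; 8 ]; [ 9 ] ]);
  assert (!fetches = 4)   (* three non-empty chunks plus one empty end-of-data fetch *)
```

Chunk size and batch size are independent knobs here, which is the kind of tuning a library could do under the covers.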

Another advantage functional programming confers on interacting with a database is immutable data structures- the elements of an instantiated relation/table are immutable, and thus the question of what happens when you change a row of a table is completely avoided, because you can’t change a row of the table except by explicitly inserting a new one.

Again, there is nothing here that can’t be done in an object oriented language- but functional programming encourages it to a greater extent. Like composable data structures, lazy evaluation, especially lazy evaluation of data structures, was mainly forced on functional programming. But the paradigm is more broadly applicable than just overcoming the self-imposed limitations of functional programming. It is truly a different paradigm, in that it makes the impossible (or at least excessively difficult) easy, or at least easier. Several things still need to be worked out- for example, the form of the sql_selection_t and column_selection_t types, and what a row data structure (the member of the lazy lists casually tossed around above) looks like. But I think the broad strokes I have laid down here are fundamentally correct, and will lead to an interface layer that looks less like Hibernate, and more like SQL.

Jun 08 2006

“Stoking the Beast”

Published by Robert Fischer under To Be Categorized

There’s an interesting article in June’s Atlantic. Although it prophesies the death of Roe v. Wade in big letters on the cover, the more realistic and interesting discussion is just inside. By “just inside”, I mean past the first 25 pages, which are drowning in advertisements that smother the otherwise greatly entertaining Calendar.

The article is by Jonathan Rauch, the Atlantic’s introvert-in-chief, and a Brookings Institution name, and it has a single point that it delivers with impressive force: cutting taxes to “starve” the government has consistently led to an expansion of government.

In the Atlantic’s true “Don’t believe me? Try this on for size!” style, there are a bunch of juicy numbers and historical evidence for this thesis. The big one for me is this: over the last 25 years, a tax cut of 1% of the GDP corresponds directly to an increase in spending of about 0.15% of the GDP. Better yet, an equivalent tax hike has corresponded to a decrease in spending of about the same amount.

This isn’t hypothetical nonsense about what might happen; these are hard and fast numbers looking at the correlation between tax cuts and spending over the last 25 years, and they’re already controlled for unemployment (the biggest third variable in this equation). The reality is that, in our current climate, it seems like raising taxes corresponds directly to shrinking government. I buy this, although for reasons other than Rauch’s. I think that people who raise taxes have their fiscal policies consistently coming under heat: after all, if you’re raising taxes, you must be spending wildly!! However, people who cut taxes are assumed to be fiscally responsible and in control of government growth, which has been demonstrably false.

And here’s the quote that I love.

…for the modern conservative coalition, the implications of his findings are discomforting, and in a sense tragic.

…the most effective constraint of all is to raise taxes and cut spending: exactly the sort of anti-deficit package that anti-tax conservatives pummeled the first President Bush and President Clinton for approving, and exactly the sort of package that the current President Bush and his anti-tax allies are sworn to block.

Any disenfranchised conservative should already have a subscription to the Atlantic, and if they don’t have one, this is a good month to start. Any liberal who wants to try to win over the fiscal conservatives should read this article, too.
