Aug 13 2007
It’s not just about multicores
There has been a lot of talk recently, with programmers waking up and realizing that with the advent of multicore CPUs, multithreading is here to stay. I’ve just come to the realization that they’re missing the point- it’s not just about the multicores. Even if you take multiprocessing out of the picture, and just look at, say, a web app hitting a database, you run into many of the same problems. Problems that functional programming solves.
The problem is that even if the system naturally scales like web apps do (one connection = one process), there are still advantages to having shared state. Database connection pools, if nothing else. But even within the context of a single process, there is a great deal of advantage to be gained in being able to do more than one thing at a time. And once you go multithreaded for whatever reason, all of the advantages of functional programming for multithreaded code come to the forefront.
But it’s not just about being multithread. It’s about expressing asynchronous computation. Which is usefull even in a single threaded environment.
The reasons I’ve been all about monads recently is that we’ve started using them at work. Dave and Stephen wrote a Deferred monad implementation, and I (and others) have started building libraries on top of it. For those of you who didn’t read my Monad tutorial, a Deferred monad is an object, which represents a computation in process, which will (sooner or later) produce a value of the given type. The two most important operations on a Deferred monad are bind and wait_for. You give the bind operation a function to execute when the deferred value is complete and exists, using the value created by the process. It creates a whole new Deferred monad, which is complete when both the original monad is complete, and the passed-in function (which can block itself) is also complete.
Now, Ocaml is still single-threaded, in that the run time holds a single global lock when executing Ocaml code, so only one thread can be running (Ocaml) at a time. You can have lots of threads running, but most of them will be blocking, and you won’t be utilizing multiple cores in a single program.
But that’s not what we’re using Deferred for- we’re using Deferred for big, blocking operations- talking to the filesystem, or the network, or the database. Processes that spend most of their time blocking waiting for I/O. Exactly the sort of problem Chia says will be the last to go functional.
Take the following scenario: you have a database query set up as a cursor, so you can return the results in small chunks (the number of results is way too large to bring back as a single result set). You’re going to be processing the results in some way, and then inserting them into another database connection.
This code is probably normally structured like this: first the query is fired off to read the next chunk of data from the cursor. The program then blocks waiting for the database to return the data from the query. When it returns, the program does it’s processing, and then fires off a query to the second database connection to write the results. The program then blocks waiting for the write to complete, before looping back to the begining. Repeat until you have no more elements to process.
The problem with this structure is that it’s incredibly inefficient- at any given time you are only doing either a read query or a write query- you’re only ever using half your available bandwidth. Now, imagine we did instead: when we’re done reading block n, we immediately fire off the read for block n+1, but we don’t wait for it to complete. Instead, we do whatever processing we need to do on block n, then we wait for the write of block n-1 to complete, before starting the write of block n. But we don’t wait for that write to continue, we immediately loop and wait for the read of block n+1 to complete. Now we’re doing both a read and a write simultaneously, doubling our throughput (assuming the network and databases can keep up).
Assuming you have a sufficiently AJAX/SOAP/XMLRPC/Web 2.0 -ish environment, you can end up in the same situation, where you have two (or more) blocking I/Os that you want to overlap. Using the deferred monad, overlapping the I/Os is easy- indeed, it’s the natural way to express the problem. The dependency that the read of block n+1 can not start until the write of block n is complete is unnecessary and “accidental”.
But this means that the program is being executed asynchronously. Even if it is being executed in a single thread (as in Ocaml), you still have the problems of multithreading- the problems aren’t about multithreading, they’re about asynchronous code execution. As C programmers have discovered with signals.
Note that the problems tend to be much coarser grain. You don’t have to worry about x := x + 1; being a race condition. But the problems still occur.
Functional programming, on the other hand, is all about asynchronous programming. Take a look at my later monad implementations- how many of them were building up large continuations, functions to be executed at a later point? Or consider lazy evaluation. In both cases you have peices of code which are not executed immediately, but quietly saved to be executed at some unspecified later point.
Yeah, yeah, I can here all the imperitive programmers now going “Callbacks aren’t unknown!” Not unknown, yes. But still pretty rare, compared to Ocaml.
Now, consider Haskell.
Everything in Haskell is lazily evaluated. It’s all callbacks, and asynchronously executed code. This is why Haskell is an important language to know, even if you’re not programming in Haskell. Because Haskell puts the asynchronicity front and center. No dodging the question, no avoiding the issue. The whole concept of monads as a design pattern came out of Haskell. Most of our purely functional lazy data structures came out of Haskell. More ideas are brewing in Haskell, such as Arrows (which I freely admit I don’t understand yet). And STM came out of Haskell, as Haskell is a natural language for thinking about concurrency in.
Oh, and multicore chips are here, and are just getting more cores. We’ve delayed long enough in solving the problems of asynchronous program execution, now we have no choice. But let us not be blind to the fact that there are advantages to solving this problem even if we didn’t have too.
Popularity: 29% [?]







The deferred monad you’re talking about I think would more normally be called a future.
Futures are pretty good (I use them all the time), but they do have two practical problems. The simplest to understand is that any cycle in the dependency graph can cause a deadlock so futures aren’t in general deadlock free. Depending on how you’re monad is programmed it may side step this problem by itself not allowing cyclic dependencies — although this may stop you from being able to do useful things like pass lambdas into functions that can block on a future.
The second has to do with the number of threads that get consumed. The problem here is that you end up consuming a lot of threads and sharing calls between threads means that you may deadlock where otherwise you wouldn’t — because you may block on a future that happens to be in a thread that’s blocked waiting on you.
Neither of these problems will become apparent until you actually start to use the futures in a multi-threaded environment.
If you want to experiment with futures I can send you an alpha copy of Mahlee which will let you play around with them in JavaScript. It can run in either a single threaded or multi-threaded environment and you’ll find it quite easy to write code that will block when run multi-threaded but not single threaded.
Mahlee is object oriented rather than functional, but you’ll find that makes a lot less difference than you might expect — the issues around the use of futures are identical.
http://www.kirit.com/Categories:/Mahlee%E2%84%A2
I’m aware of this, and putting some time into mastering Erlang at the moment.
Out of curiosity, what do you think of Haskell & Erlang? They both seem similar from my newbie perspective other than the fact that one is static/dynamic typing, and I know Haskell has a thing called type classes which is apparently a pretty good thing.
Chris
I’m still not sure how valuable these kinds of things are for a web application. In that structure, you’re usually unable to advance much (if any) computation until you get the result of the SQL query or file read back. And letting the process block in order to let another process go ahead is a nice, simple way to get the scaling you need on a busy web server. Your system would go faster in a non-busy web server, but that’s not the case we’re optimizing for.
And if you’re digging at this spot, I’m not sure why you don’t just go to JoCaml, particularly considering its recent resurrection and the increased interest.
(BTW, Java technically has a Future, with some implementations, but I’ve never seen a particularly good use of them. Does anyone have an example with code?)
AFAICS, what you’re talking about is a monadic style (or say CPS, as next op passed in as arguments of bind) user-space threads implementation. If that’s true, there are several such kind of OCaml libraries existed for long time. E.g. lwt is probably the most popular one and adopted in many projects (unison, ocsigen, ocamlp3l etc). Could you elaborate the difference here?
I’m also very curious what your thoughts on Erlang specifically are and working on learning some at present.
Robert: You are missing the big picture because you seem to be thinking with the assumption that 1 process is handling 1 request. I want 1 process to handle hundreds of requests at the same time. It could easily have 30 or 40 queries (all from different users) that have been sent to the database and are waiting for results. Meanwhile its still going ahead and doing the rest of the work for other requests: serving pages, rendering templates, etc. Just because you can’t advance the computation of the request that triggered the database query, that doesn’t mean you can’t advance the computation of the other hundred requests while you’re waiting on the db.
A single process application server using userland threading and non-blocking I/O is much faster and more effecient than a process per request model, and it allows for much easier caching and sharing data. This allows you to write your app in a better performing way instead of being forced into using a bad implimentation by the architecture limitations.
Lets say I want to have in my ruby on rails or PHP app a page that opens an 8MB file and sends its contents to clients while doing a search and replace on its contents (the replaced value varies per user). There is no good way to do this using a process per request model like rails or PHP, you have to do it a bad way just because of the limitations imposed by the multi process architecture. In a good app server you can read in the file on startup, search for the value and split the file contents on that in advance, and then when serving each request it can share that data and just plug in the per-user replacement data along the way.
Joe –
My point was that 1) I’m not sure if I buy the complexity/benefit trade-off of using futures to have one single-threaded process going to town on all these things, and 2) if you are already heading this direction, why not just just go to JoCaml?
You didn’t approach #2, so let’s focus on #1, and specifically, let’s consider your case. To handle that situation in the process-per-request Rails implementation, you could provide a service sitting on that file which the actual request handling would call out to (see REST). That way, you don’t have to even think about (or fork) the big file munging code while you’re working on the user management code. And if your user management or file service goes down, it doesn’t take the other with it. Plus you get enhanced and more targetable scalability, generally cleaner code, a versioning balwark, etc., etc. — all the normal service-oriented architecture goodness.
Even if I had a fancy single-process-multiple-request structure, I would probably structure things that way. So your example doesn’t have any architecture consequence.
And how is your REST service architected? Pushing the problem back with another level of unwanted overhead isn’t solving the problem, its throwing up your hands and saying “my tools are incapable of solving the problem, I will use some other tool instead”. I have no idea why you think having the user management and file service depend on each other is bad, they have to. If the user management is broken, then the file service can’t get the user specific info to insert into the file, so its broken too. Seperating them (as in seperate processes) buys you none of the things you mentioned, I can’t think of any benefits at all. Obviously seperating them code wise (as in modules) would still be done when using a single process, so the idea that the code would be cleaner is silly.
And there is nothing fancy or complicated about this architecture, its perfectly normal. In C/C++ land its done using callbacks in an event driven style (notice how all the high performance webservers are like this?), but for the sake of making it easier for programmers, using a lightweight thread/task abstraction is nicer (although equivilent).
As for why not jocaml, there is no real reason why not. Its the same idea, just using an existing implimentation vs writing your own. In practical “I am writing this as we speak” terms, jocaml may or may not perform adequately, or offer the flexibility needed, I haven’t looked into it much yet. But in the big picture “can’t jocaml do this now or at some point in the future?”, yes it can/will.
@Joe
Check into the advantages of service-oriented architectures for information on why you would want to break them apart. It is a standard way of approaching business applications, particularly those on the net, and the advantages are primarily advantages of scale, resiliency, and maintainability.
I grant that you could have one process doing all that stuff, and that’s a nifty technological feat — I just don’t see a lot of practical gain out of it. Sure, Rails scales using process-per-request…but the only problem with that is that the process is being too many things at once (user communication tool, database communication tool, file system communication tool). I’d rather fix the underlying problem (have the request handler do one thing and do it well) than gloss over it with a cool trick.
I’ve got nothing against cool tricks per se — I’ve done more of them in my time than I probably should have — I just don’t see this one as a revolutionary new capability.
Bulk replies once again. Chia- I do kinda wish we had a more threaded model for comments, so replys can go with the original comments.
Kirit: That’s the big advantage Deferreds have over Futures- you can’t express circular dependencies. Which is good, because circular dependencies are always deadlocks. If monad B depends upon monad A, monad A can’t depend upon monad B. Also note that we’re doing functional programming’s approach of having lots of immutable objects- the deferred monad for reading line n and the deferred monad for reading line n+1 are different objects. So you can express that some computation requires line n to be read, but must be completed before line n+1 is read.
Chris Khoo, Brit Butler: I’m learning Haskell at the moment. Primarily because the lack of static typing is annoying. Both have interesting things to say about asynchronous programming. Also, I’d recommend this post of mine for more indepth discussion of functional programming and parallelism.
Chia: You’re right- it’s a good sign when a web server takes a full minute to process your query because then you know the web server is making full utilization of it’s I/O capabilities. More seriously: simply because that is what people people are optimizing for doesn’t mean it’s what they should be optimizing for. They may just be making lemonaid from the lemons they’ve built. Note that the monad implementation has the same performance as the naive implementation in the loaded case (the monad implementation will simply devolve into doing everything synchronously). The big advantage the monad implementation will have is that it will be signifigantly more responsive in the common case where the server is unloaded, and I/O can be done in parallel.
As for Jocaml, I’m not sure the two are completely contiguous. Most of the computations being deferred are really, really small (a few dozen, maybe a few hundred, clock cycles). Forcing a task switch costing thousands to tens of thousands of clock cycles to perform a few hundred clock cycles is just silly.
code17: a more accurate comparison would be with old-style big event loops. The heart of the deferred monad is a big ol’ select call.
“Check into the advantages of service-oriented architectures for information on why you would want to break them apart. It is a standard way of approaching business applications, particularly those on the net, and the advantages are primarily advantages of scale, resiliency, and maintainability.”
Layering 10 levels of java crap on top of each other is a standard way of approaching business applications too, that doesn’t mean its a good idea. Again, in this example seperating file download part out of your app just moves the problem. You still have to impliment the file download, and you are either going to do it nicely using an event driven or userland thread/task approach, or you are going to do it poorly using a process per request approach (or a little less poorly using a thread per process approach). Admitting that there is no way to do this with PHP/rails doesn’t exactly support the claim that a process per request model is ideal.
“I grant that you could have one process doing all that stuff, and that’s a nifty technological feat — I just don’t see a lot of practical gain out of it.”
Scaling and performance are the practical gains. A well architected app server will scale and perform signficantly better than a process per request server will in some cases, and it will perform slightly better in other cases. There’s no trade off where its worse for anything, so why not do it this way? Compare ocsigen to apache+php, there’s a whole slew cases where php just can’t do things effeciently because of its process per request model, where ocsigen will handle it just fine.
“Sure, Rails scales using process-per-request…but the only problem with that is that the process is being too many things at once (user communication tool, database communication tool, file system communication tool).”
The other problems are it makes caching far more complex and slower, it is terrible for handling large numbers of simultaneous connections, it makes persistant connections for long lived multi-request processing so complex that its simply not done, it makes you either waste tons of db connections, start and tear down db connections on every request, or add yet another layer to handle db connection pooling for you because rails can’t since its architected wrong, and I’m sure there’s other problems I am not thinking of off the top of my head. Considering the fact that there is no downside to using an effecient architecture, why would you want to use a poor one like process per request?
“I’ve got nothing against cool tricks per se — I’ve done more of them in my time than I probably should have — I just don’t see this one as a revolutionary new capability.”
Its not supposed to be a revolutionary new capability, or a cool trick. Its just demonstrating one way to create a nicer abstraction that gives you the architectural benefits of event driven servers, while being simpler and easier to understand/debug/code than using an event driven style directly.
This is why DHH’s hand-waving dismissal of rails poor scalability is so pathetic. He pretends that “its just like every other shared nothing architecture so it scales perfectly”, while completely ignoring the practical implications of using such a bad architecture in reality.
Brian: Does jocaml actually use heavyweight threads/processes that would force a context switch? Having not looked at the code, I would have hoped it was implimented using a single threaded userland scheduler like erlang does, allowing for tens of thousands of “processes” (tasks) with very low context switch costs. How useful jocaml will be depends a lot on the implimentation details.
Also guys, you’ve got a PHP warning on the bottom of the page.