There has been a lot of talk recently, with programmers waking up and realizing that with the advent of multicore CPUs, multithreading is here to stay. I’ve just come to the realization that they’re missing the point- it’s not just about the multicores. Even if you take multiprocessing out of the picture, and just look at, say, a web app hitting a database, you run into many of the same problems. Problems that functional programming solves.
The problem is that even if the system naturally scales like web apps do (one connection = one process), there are still advantages to having shared state. Database connection pools, if nothing else. But even within the context of a single process, there is a great deal of advantage to be gained in being able to do more than one thing at a time. And once you go multithreaded for whatever reason, all of the advantages of functional programming for multithreaded code come to the forefront.
But it’s not just about being multithread. It’s about expressing asynchronous computation. Which is usefull even in a single threaded environment.
The reasons I’ve been all about monads recently is that we’ve started using them at work. Dave and Stephen wrote a Deferred monad implementation, and I (and others) have started building libraries on top of it. For those of you who didn’t read my Monad tutorial, a Deferred monad is an object, which represents a computation in process, which will (sooner or later) produce a value of the given type. The two most important operations on a Deferred monad are bind and wait_for. You give the bind operation a function to execute when the deferred value is complete and exists, using the value created by the process. It creates a whole new Deferred monad, which is complete when both the original monad is complete, and the passed-in function (which can block itself) is also complete.
Now, Ocaml is still single-threaded, in that the run time holds a single global lock when executing Ocaml code, so only one thread can be running (Ocaml) at a time. You can have lots of threads running, but most of them will be blocking, and you won’t be utilizing multiple cores in a single program.
But that’s not what we’re using Deferred for- we’re using Deferred for big, blocking operations- talking to the filesystem, or the network, or the database. Processes that spend most of their time blocking waiting for I/O. Exactly the sort of problem Chia says will be the last to go functional.
Take the following scenario: you have a database query set up as a cursor, so you can return the results in small chunks (the number of results is way too large to bring back as a single result set). You’re going to be processing the results in some way, and then inserting them into another database connection.
This code is probably normally structured like this: first the query is fired off to read the next chunk of data from the cursor. The program then blocks waiting for the database to return the data from the query. When it returns, the program does it’s processing, and then fires off a query to the second database connection to write the results. The program then blocks waiting for the write to complete, before looping back to the begining. Repeat until you have no more elements to process.
The problem with this structure is that it’s incredibly inefficient- at any given time you are only doing either a read query or a write query- you’re only ever using half your available bandwidth. Now, imagine we did instead: when we’re done reading block n, we immediately fire off the read for block n+1, but we don’t wait for it to complete. Instead, we do whatever processing we need to do on block n, then we wait for the write of block n-1 to complete, before starting the write of block n. But we don’t wait for that write to continue, we immediately loop and wait for the read of block n+1 to complete. Now we’re doing both a read and a write simultaneously, doubling our throughput (assuming the network and databases can keep up).
Assuming you have a sufficiently AJAX/SOAP/XMLRPC/Web 2.0 -ish environment, you can end up in the same situation, where you have two (or more) blocking I/Os that you want to overlap. Using the deferred monad, overlapping the I/Os is easy- indeed, it’s the natural way to express the problem. The dependency that the read of block n+1 can not start until the write of block n is complete is unnecessary and “accidental”.
But this means that the program is being executed asynchronously. Even if it is being executed in a single thread (as in Ocaml), you still have the problems of multithreading- the problems aren’t about multithreading, they’re about asynchronous code execution. As C programmers have discovered with signals.
Note that the problems tend to be much coarser grain. You don’t have to worry about x := x + 1; being a race condition. But the problems still occur.
Functional programming, on the other hand, is all about asynchronous programming. Take a look at my later monad implementations- how many of them were building up large continuations, functions to be executed at a later point? Or consider lazy evaluation. In both cases you have peices of code which are not executed immediately, but quietly saved to be executed at some unspecified later point.
Yeah, yeah, I can here all the imperitive programmers now going “Callbacks aren’t unknown!” Not unknown, yes. But still pretty rare, compared to Ocaml.
Now, consider Haskell.
Everything in Haskell is lazily evaluated. It’s all callbacks, and asynchronously executed code. This is why Haskell is an important language to know, even if you’re not programming in Haskell. Because Haskell puts the asynchronicity front and center. No dodging the question, no avoiding the issue. The whole concept of monads as a design pattern came out of Haskell. Most of our purely functional lazy data structures came out of Haskell. More ideas are brewing in Haskell, such as Arrows (which I freely admit I don’t understand yet). And STM came out of Haskell, as Haskell is a natural language for thinking about concurrency in.
Oh, and multicore chips are here, and are just getting more cores. We’ve delayed long enough in solving the problems of asynchronous program execution, now we have no choice. But let us not be blind to the fact that there are advantages to solving this problem even if we didn’t have too.
Related posts: