I was going to write a blog post flaming this blog post, but I’ve change my mind. The only real sin he’s committed is making up his mind before all the facts are in my opinion.
But I don’t think he’s right. Not because message passing is bad or is going to go away- it’s not. It’s that it’s not a case of either or. The more I think about it, the more I think the future is going to be a combination of STM and message passing. Both have their advantages and disadvantages.
The advantage of message passing is that it’s distributable. You don’t care if the other process you’re sending a message to is on the same CPU or the other side of the planet, it doesn’t matter. The disadvantage of message passing is that it requires, at minimum, at least one copy of the data, and often multiple copies of the data. Plus task switching and cache flushing. It is heavy weight, at least compared to STM and shared memory multithreading.
This is a non-trivial problem. Parallelism comes in a gradiant, based on how much data needs to be transmitted between the processing elements per time period. One on extreme end of the gradiant you have very little data that needs to be distributed. The gold standard here is stuff done by distributed.net takes on- brute force crypto key searches and the like. We’re talking about a few hundred bytes every couple of days. This makes it trivial to distribute the computational load over widely dispersed machines.
Note that this also explains why supercomputers died in the 90’s, but big iron database servers didn’t. Supercomputers were, at heart, about solving either large numbers of small linear algebra problems, or a small number of large linear algebra problems. In either case, the problems were fairly easily distributed with small communication demands- a few megabytes per second- that supercomputers could be replaced by racks of PCs communicating via ethernet. Databases, however, require more communication between nodes (especially if you want to parallelize a single query, or allow multiple queries to access the same table simultaneously), and as such didn’t distribute via ethernet. So Sun and IBM are still profitable, Cray and SGI are dead.
Distributed databases are possible, for at least some workloads, I comment (before half a dozen people do). The problem is that only certain types of workloads distribute- write a query that doesn’t distribute, and all your work has to be done on a single box. Which is why they haven’t caught on- and Sun and IBM are still in business.
The problem is that everything needs to be parallelized. But being distributed isn’t quite as big an issue. What’s driving the sudden obsession with parallelization is the sudden appearance of multicore chips. But notice something about the multicore chips- how close the CPUs are to each other. Often they even share cache. If two processes can share a data structure by just sharing a pointer to the datastructure, communication speeds are nearly infinite. This allows for multithreaded programs that have incredibly huge communication requirements. Tightly coupled programs are possible in this environment.
Tightly coupled parallelism is most likely to arise with some form of automatic parallelism. I think the fullest power of STM will be found in a Haskell-like purely functional language (with STM built into the monads automatically, meaning you can’t accidentally forget the STM) combined with cilk-style microthreading. Lazy evaluation and parallelism are very similiar concepts- both are dealing with asynchronous computation. Lazy evaluation moves the computation to the first time the value is needed, while parallel evaluation the evaluation happens (conceptually, at least) in parallel. But this level of sharing requires very tight binding, very high communication rates. But I’m not doing a lot of computation per byte of communication. This is even more the case if the different CPUs are, in fact, running on an SMT CPU (aka Hyperthreading by Intel).
I actually think that a lot of problems will be ameniable to both levels approaches. Do message passing, ala MPI or Erlang, at the top level, breaking the problem up into big chunks which can then be distributed over a local network or a highly non-uniform ccNUMA architecture. But then each chunk is solved in a microthread+STM approach, allowing for good utilization of local resources and tightly coupled parallel processes.
As a side note, the “emergent complexity” he worries about having with STM is also a problem with message passing. Emergent complexity is always a problem- and the fundamental simplicity of the individual components doesn’t help (much). And, unlike STM, message passing does admit the possibility of a deadlock- thread A waiting for a message from thread B which is waiting for a message from thread A. It’s much less likely than in the classic (and unarguably awful) threads+locks paradigm, but it’s still possible. The only advantage message passing has over STM in the emergent complexity department is that it encourages bigger chunks per thread (due to the relatively high cost of communication)- fewer interacting components means less emergent complexity.
But fewer interacting components also means less parallelism, and less performance on future hardware. The new Moore’s law is that the number of cores will doubly every couple of years now. However parallel you program is today, it’ll need to be twice as parallel in a couple of years- and dozens to thousands of times more parallel in a decade or so! Actually, the situation is even worse than this. Once multithreaded programming becomes common, the new performance metric will be performance per transistor. If I can trade off 10% performance and fit the CPU in 1/5th the die size on the same process, I can fit 5 cores where 1 used to fit and the CPU is 4.5x faster for programs that can take advantage of 5 cores.
Of coruse, 5 cores doesn’t mean I get 5x the memory bandwidth. It actually becomes more important in this case to be tightly bound, so gaggles of threads can share cache efficiently. Five threads all talking to memory and not to each other would die on performance. On the other hand, this also means that moderately size machines will start being more non-uniform in the memory architecture. Talking to the thread on the next core over is very cheap- but talking to that thread microseconds away is painfull and slow. The former is a job for STM, the latter for message passing.
That said, I could be wrong. I know what damned well doesn’t work, but I don’t know what does work yet. One good thing does arise out of this debate, I comment. Wether it’s Erlang and message passing, Haskell and STM, or some hybrid, it looks like the solution is going to be functional programming plus something else. What really drove the adoption of OO programming was GUIs. Note that Doug Englebart was one of the inventors of OO programming at the same time he was inventing the modern GUI interface. Heck, the first Smalltalk dates from 1972. But it wasn’t until GUIs hit the mainstream in the mid/late 80’s that OO programming also took off. In a similiar way, I think parallelism is going to drive the adoption of functional programming. Which will be for the betterment of programming in general.
Popularity: 17% [?]