Sep 09 2007
Use vs. Reuse, or The Second Derivative of Programming
I think I’ve come to the conclusion that anything Raganwald writes is required reading, even when I don’t agree with what he’s writing. So it is with this post on abstraction vs. abbreviation in programming. Well, “disagree with” is a strong term: when the question is phrased that way, I definitely agree with him; I just don’t think it’s illuminating to phrase the question in that way. Productivity really has very little to do with either abstraction or abbreviation.
The trick with programming is not writing code; the trick with programming is not writing code. That is, the fastest code to write and debug, the fastest path to working code, is no code at all. So it’s all about the reuse, as opposed to the use, of code. But most languages have to become popular before the large libraries show up, and that’s where the power comes in: not the power to write code, but the power to not write code.
Let me unpack that a bit.
First, let me define a term. What do I mean by a use of some code vs. a reuse of some code? In common usage, these two terms mean exactly the same thing, except that reuse implies that it is not the first use: there was some other use, somewhere, at some time before. What I am appropriating these terms to mean is this: when the code was written (or last refactored), there was some set of known use cases that the code was expected to be used in. These are the uses of the code. Note that the set of use cases the code is written for is often larger than the technical specification might require. And the set might only exist in the programmer’s head. But nevertheless, it exists. And, since we humans are finite, it is finite. Then there are all the other possible uses of the code, uses which are (to some extent) a surprise to the architect of the code. These are the reuses of the code.
Now, any code in any programming language can be made usable. That’s not the question. The problem is that requirements change, needs change, APIs change. As Raganwald reminds us, we can’t know every purpose we’re going to use the code for ahead of time. And the more we can (re)use the code, the more valuable it is- and the less code we have to write. We don’t have to (re)write the code- it’s already written and debugged, we can just (re)use it as is. Not needing to write the code is going to be faster than writing code. No matter how easy it is to write the code, the cost is still non-zero, so zero cost still wins.
At this point, several people are probably jumping up and down and hollering “refactor!” A question to those people: do you think that reuse, as I’ve defined it, even exists? Or is at least common enough to have an impact on development speed? Because, by the definition I’ve laid out above, refactoring is not reuse. Refactoring is the process of adding use cases to the set of use cases a piece of code is designed to be used with. You can tell this by the fact that “rearchitect” and “redesign” are commonly used as synonyms for refactor.
Another synonym for refactor is “rewrite”. And herein lies the rub. Programmers are always making the judgement call, when some previously written piece of code isn’t exactly what they need, of comparing how hard it is to fix the code (refactor/rearchitect/redesign the code) vs. how hard it is to simply rewrite the code. Everyone who has ever worked on a large scale project has stories to tell of the massively duplicated function: the 100 different ways to plot a pixel on the screen, for instance.
Rewriting code is simply the most egregious example of the problem: when the programmer decides to scrap and rewrite a bunch of code, the code being scrapped now has zero value. It had non-zero cost: it cost time to write and to debug (to the extent it got debugged). And rewrites are rarely contained; generally the code using the rewritten code has to be rearchitected and refactored to take the new code into account. But all the different degrees of refactoring have this problem: the time you spend refactoring code is time not spent writing new code, solving new problems, or adding new features.
And here’s the point of this whole diatribe: that various language features help or hurt reuse. The more features a language has to help reuse, and the fewer features a language has to hurt reuse, the more powerful that language is- as the less time you will spend refactoring and the more time you will spend writing new code.
Indeed, I’d argue that encouraging reuse is the reason some languages are more powerful than others. To explain why, I’m going to bring up an insight Robert Fischer here had a while back on project management. He said, “it’s all about the second derivative of code.”
For those of you who have forgotten (or never taken) your calculus, let’s say you have a function P(t) which represents your position at time t. The derivative of your position is your velocity: how fast your position is changing. The second derivative of your position is then your acceleration (or deceleration, if the second derivative is negative): how fast your velocity is changing. Applying this to code, then, the position is how much working code you have, by whatever measurement standard you want to use: lines of code, function points, working features, whatever (just like it doesn’t matter if you’re measuring your position in millimeters or furlongs). The derivative of this is then how fast you are adding new code. The second derivative is how fast that rate of adding new code is changing. The second derivative, then, is the leading indicator of future code productivity: if acceleration is decreasing, or worse yet, turning into deceleration, this forecasts long term problems for the project.
Early on in the project, most teams have a high velocity, and at least a non-negative acceleration. They’re adding new code/features/whatever all the time. And the code being added is, at least initially, making it easier to add yet more code. This is because the new code is, by and large, using the old(er) code, and not reusing it. As time goes on, however, the use vs. reuse ratio falls, and more time has to be spent refactoring code. For most projects, it seems, acceleration goes negative fairly quickly, and code development slows down. Acceleration continues being negative until velocity goes to zero, and the project halts. Shortly thereafter, velocity goes negative, and features that used to work stop working. At which point, software development becomes not unlike the Red Queen’s race: running faster and faster just to stay in place.
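To make the calculus concrete, here is a minimal sketch in Python. It treats weekly measurements as a discrete stand-in for P(t) and uses successive differences in place of derivatives. The feature counts are made-up numbers for illustration, not data from any real project, and the helper name differences is my own.

```python
def differences(values):
    """Return successive differences: a discrete stand-in for a derivative."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Hypothetical cumulative count of working features at the end of each week.
position = [0, 5, 12, 20, 26, 30, 32, 32]

velocity = differences(position)      # features added per week
acceleration = differences(velocity)  # change in the weekly velocity

print(velocity)      # [5, 7, 8, 6, 4, 2, 0]
print(acceleration)  # [2, 1, -2, -2, -2, -2]
```

In this invented series the acceleration turns negative in week four, while the project is still visibly shipping features. Velocity does not reach zero until three weeks later, which is the sense in which the second derivative is a leading indicator.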
For the lucky projects, acceleration goes negative for a while, then returns to zero, and velocity does not fall to zero but instead limits out at some positive value. Forward progress continues being made. These projects are exceptional enough that we tend not to ask how much forward progress is being made, at what cost, and what is preventing yet more forward progress from being made.
So the question is, where does the code velocity of a project limit out? This is where reuse comes in. Consider the steady state of constant code velocity: the project with the higher velocity will, sooner or later, overtake and surpass the project with the lower velocity. It may take a while, depending upon the difference in asymptotic velocity and how much of an initial deficit needs to be made up, but it will happen. And the more time the programmers spend refactoring, that is, the less reuse there is, the lower the code velocity.
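The overtaking argument is simple arithmetic, and a rough sketch may help. It assumes both projects really have settled at constant velocities, which is an idealization. The function name and the numbers are hypothetical.

```python
def weeks_to_catch_up(head_start, slow_velocity, fast_velocity):
    """Weeks until the faster project draws level, assuming constant velocities.

    head_start is how far ahead the slower project begins, in the same
    units (features, function points, ...) the velocities are measured in.
    """
    if fast_velocity <= slow_velocity:
        return None  # the gap never closes
    return head_start / (fast_velocity - slow_velocity)


# A 200-feature head start, 3 features/week against 5 features/week.
print(weeks_to_catch_up(head_start=200, slow_velocity=3, fast_velocity=5))  # 100.0
```

The head start only sets how long the wait is. Whether the overtaking happens at all depends on the difference in steady-state velocity.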
So what sorts of features encourage reuse, or discourage reuse? To answer this question, it’s useful to think of non-use cases: ways to use the code that may not work, or won’t work as expected. Take, for example, the gold standard for breaking reuse: Perl’s global variables. It’s very easy (even when using strict) for a Perl function to be dependent upon the value of $* say, or $/, without this being documented or obvious upon casual inspection of the code. Use of a function where these variables do not have “correct” values is incorrect.
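The same trap can be sketched in Python rather than Perl. Here RECORD_SEPARATOR is a hypothetical module-level setting standing in for a Perl global like $/, and both function names are invented for illustration.

```python
# Module-level setting: a hypothetical stand-in for a Perl global such as $/.
RECORD_SEPARATOR = "\n"


def split_records(text):
    """Split text into records; silently depends on RECORD_SEPARATOR."""
    return text.split(RECORD_SEPARATOR)


def split_records_explicit(text, separator="\n"):
    """Same job, but the dependency is visible in the signature."""
    return text.split(separator)


data = "a\nb\nc"
before = split_records(data)   # three records, as the author intended

RECORD_SEPARATOR = ";"         # some unrelated code changes the global
after = split_records(data)    # now a single record: the whole string

explicit = split_records_explicit(data)  # unaffected by the global

print(before, after, explicit)
```

Nothing in the call split_records(data) hints that its result depends on state set elsewhere. That makes it fine inside the use cases its author had in mind and fragile outside them.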
Or take C++’s lack of garbage collection. This is another feature that limits reuse, as opposed to use. Any object, any function, needs to specify the memory management requirements of the objects it passes in or out (those that aren’t passed by value). The calling function and the called function have to agree, otherwise you get memory leaks, double frees, and/or dangling pointers. Again, this doesn’t mean that you cannot use the code; it means there is a limit on the reuse of the code.
Another example: synchronized methods in Java. If you don’t synchronize a method, then that method cannot be called simultaneously by multiple different threads (or if it is, it’s a race condition waiting to happen). If you do, you’ve littered your code with performance-killing synchronized in many places where you don’t need it. Sun itself kind of figured this out with its standard library, eventually opting to provide two complete APIs, one synchronized, one not.
Another example: mutable data in any programming language. When you pass mutable data between objects/modules/functions, there is an implicit contract between the sender and receiver on how (and whether) the mutable data will be changed. Violating this contract by either party is liable to lead to nothing but trouble.
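A small Python sketch of that implicit contract. Both functions and the score data are invented for illustration: the first quietly mutates its argument, the second does not.

```python
def top_three(scores):
    """Return the three highest scores, sorting the caller's list in place."""
    scores.sort(reverse=True)  # side effect: reorders the argument
    return scores[:3]


def top_three_pure(scores):
    """Same result, but works on a sorted copy and leaves the argument alone."""
    return sorted(scores, reverse=True)[:3]


arrival_order = [40, 95, 70, 85]
best = top_three(arrival_order)
print(best)           # [95, 85, 70]
print(arrival_order)  # [95, 85, 70, 40] -- the original ordering is gone

original = [40, 95, 70, 85]
best_pure = top_three_pure(original)
print(original)       # [40, 95, 70, 85] -- untouched
```

A caller who never cared about the order of the list can use top_three without trouble. A caller who did care, a reuse the author never pictured, gets a bug that shows up far from its cause.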
Another important difference between languages: how easy is it to “paper over” reuse problems? Here’s an example I run into constantly with OCaml: function foo takes a function which it calls. The function you pass in takes two arguments. I really want to pass function bar in for foo to call, but dang it, the arguments are in the wrong order. No problem, just whip out a quick anonymous function: foo (fun x y -> bar y x) and you’re done.
This is an important example. There was an obstacle to the reuse of bar in this case. Obviously I wasn’t in a use case; if I were, the arguments would be in the right order. But it was easy enough to work around the problem and still reuse bar. Reuse doesn’t just occur in big chunks, it also occurs in small bits.
Duck typing, aka structural type equivalence, is also a big feature helping code reuse. All of the various interfaces and base classes a Java or C# class implements are use cases. The developer had to explicitly state “yes, it’s OK to use this class as that interface or class” when the code was (re-)written. Use of the class outside of the explicitly white-listed use cases, reuse of the class, is explicitly verboten. It doesn’t matter if the class can quack and waddle: if the programmer didn’t say you could use it as a duck, you can’t use it as a duck. At least, not without a lot of work. I fully agree with the dynamic typing people that this is an important feature, I just disagree that it’s limited to dynamically typed languages.
Large libraries are useful, but they only change the initial starting point of the project. There is a power to a language above and beyond just its library collection; otherwise, we’d all still be using Fortran and Cobol. And large libraries are more a sign of popularity than power in any event. This is why large corporate backing is so useful, as it creates an “artificial” popularity.
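The same papering-over, sketched in Python instead of OCaml. The bodies of foo and bar are invented here only so the example runs; the post does not say what they actually do.

```python
def foo(callback):
    """Hypothetical higher-order function: calls callback(x, y)."""
    return callback("x", "y")


def bar(first, second):
    """Hypothetical existing function whose arguments are the other way round."""
    return f"bar({first}, {second})"


direct = foo(bar)                      # arguments arrive in foo's order
flipped = foo(lambda x, y: bar(y, x))  # one-line adapter swaps them

print(direct)   # bar(x, y)
print(flipped)  # bar(y, x)
```

The adapter costs one line and changes neither foo nor bar. In a language without cheap anonymous functions, the same fix tends to need a named wrapper or a change to bar itself.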
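As one illustration that structural typing can be checked statically, here is a sketch using Python’s typing.Protocol (Python 3.8 or later). The class and function names are invented. The static check only happens if you run a separate type checker such as mypy over the code; at run time Python is duck typed regardless.

```python
from typing import Protocol


class Duck(Protocol):
    """Anything with these two methods counts as a Duck for a type checker."""

    def quack(self) -> str: ...

    def waddle(self) -> str: ...


class Robot:
    """Never declares that it is a Duck; it just has the right methods."""

    def quack(self) -> str:
        return "beep-quack"

    def waddle(self) -> str:
        return "whirr-waddle"


def parade(duck: Duck) -> str:
    return f"{duck.quack()} {duck.waddle()}"


result = parade(Robot())
print(result)  # beep-quack whirr-waddle
```

Robot was written with no knowledge of Duck, yet it can be passed to parade without editing either class. That is the white-listing step a nominal interface would have required of Robot’s author.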
One thing I wanted to work into this post, but that just didn’t fit in, is Haskell’s deforestation and its importance to reuse. Consider the code:
    map f (map g lst)
What the Haskell compiler does (well, the GHC compiler with the right options turned on, anyways) is convert the above code to:
    map (f . g) lst
where the period is the composition operator, (f . g) x = f (g x).
Now, what’s important about this is that the original nested maps are not something programmers tend to write straight up (although maybe we should). Code like that doesn’t tend to show up in the use cases programmers tend to think about. But it is a common pattern that shows up when you’re reusing code.
A common excuse programmers give when breaking reuse is performance. It’s important for the compiler to eliminate that excuse as much as possible, and that encourages reuse. Deforestation is one of the many reasons Haskell is a powerful language.
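For readers who don’t know Haskell, the equivalence behind that rewrite can be sketched in Python. This only demonstrates that the two forms give the same answer for side-effect-free functions. Python itself performs no such rewrite, and compose, f, and g are made-up examples.

```python
def compose(f, g):
    """Return the function x -> f(g(x)), like Haskell's (f . g)."""
    return lambda x: f(g(x))


def f(x):
    return x + 1


def g(x):
    return x * 2


lst = [1, 2, 3]

# Nested form: builds a throwaway intermediate list [2, 4, 6] first.
nested = [f(y) for y in [g(x) for x in lst]]

# Fused form: one pass, no intermediate list.
fused = [compose(f, g)(x) for x in lst]

print(nested)  # [3, 5, 7]
print(fused)   # [3, 5, 7]
```

The nested form is what falls out when two independently written pieces are glued together. The fused form is what you would write by hand for speed, and the point above is that the compiler can get from one to the other for you.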
That’s not refactoring like I would use the term.
Wikipedia (the real-life The Guide) has a better definition: “A code refactoring is any change to a computer program’s code which improves its readability or simplifies its structure without changing its results.”
The purpose of refactoring a piece of code is not to tack on additional use cases — it’s to make maintenance of the code easier. This is why you don’t have to write or change test cases while you’re refactoring: the API is staying stable.
For more on this, see the 2nd and 3rd paragraphs of my “Implementation Exposure Through Inheritance” post.
Development Acceleration: The Second Derivative of Functionality…
So, Brian beat me to the punch: he’s posted about The Second Derivative of Programming, which is an idea that we’ve been kicking around for a while over the phone. It’s a good post, and I highly suggest tackling it.
(Note to self #1:…
[...] second derivative. So I started reading ‘Use vr.s Reuse, or The Second Derivitive of Programming’ (sic), and soon enough bumped into some points I disagree with, I thought they were eager [...]
[...] code, and just the core logic is left behind. It also enables that wonderful productivity boost of surprising reuse in more places: you don’t have to be a Java coder for too long before you discover a place [...]