Archive for the ‘languages’ Category

The Verbosity Difference

Tuesday, August 18th, 2009

I like conciseness. Syntactic sugar increases the amount of code I can fit on a single screen, which increases the amount of code I can read without scrolling. Eye saccades are a hell of a lot faster than keyboard scrolling, so not having to scroll is a good thing.

However, I recently realized that simple absolute size is actually the wrong metric with which to judge language verbosity, or at least not the most important one.

Consider the evolution of a chunk of C++ code. We start with a single idea, and encode it as a single class to encapsulate the structure. We add a class declaration, some constructors and a destructor, perhaps even a private operator= to disallow copying. Fine. After this boilerplate, we add various methods to the class to encode the actual behavior. The class also develops a few fields, because fields let us easily share data between the related methods.

Next we have another idea. Conceptually the new idea is distinct from the original one, so we should really make a new class. However, we’ve just gone through all the work of setting up a C++ class, with it’s constructors, destructor, private operator=, access specifiers, etc., and it’d be a shame to have to redo all that effort. Maybe it won’t be so bad if we just add the new idea into the same class…

Boom. Now we have two ideas merged into the same class. You can’t pass around one idea without passing around the other. You can’t rewrite one without analyzing dependency chains to make sure the class fields doesn’t overlap between concepts. After a while, we start to forget that the ideas were ever really distinct. That’s right: the language has actually made us stupider.

You can’t blame the programmer here. We were only maximizing our local utility. We might be smart, but we’re not omniscient, and we can’t always be bothered to follow style manuals. The problem also can’t be ascribed to the overall verbosity of C++; it’s quite possible that the code would be larger if it was written in C, since C++ class syntax, fields, etc. really can make for smaller (source) code.

The problem is that the marginal cost of adding a new class is greater than the marginal cost of extending an existing class. If it was easier to make a new class, we would have done so. But we would also have made a new class if it was harder to add methods to an existing class, because then the trade-off would have been different. In other words, what matters is the difference in verbosity between the “right way” and the “wrong way”, not the absolute level of verbosity.

Therefore, the conclusion is that any new abstraction with a large startup cost but a low marginal cost is bad, because people end up merging them in disgusting ways. Examples include interfaces (adding one more method is easier than splitting one interface into two), Haskell type classes (see fail), and monads (once you’ve converted your code into monadic form, making the monad do something else is easy).

Similarly, any abstraction which merges two benefits into one language construct is also bad, even if the extra benefits are free. The best example of this is inheritance, which merges the benefits of code reuse and subtyping. If I’m making a new class, and it would be really convenient to be able to call one of them methods in an old class, I may end up inheriting from that class in order to save typing even if subtyping makes no sense. By contrast, if I’d been writing the same code in C, that function I really wanted to call would probably just be a function, and I’d just call it. Object oriented programming makes you stupider.

Happily, it’s easy to notice when you’re running into one of these language flaws. Most of us have a good sense for what the right way of doing things is. If we set out to write a new piece of code, the right way will generally be the first thing that comes to mind, but then we’ll remember that doing it the right way is hard. We’ve probably trained ourselves not to notice this conflict after years of painful compromise, so all we have to do is untrain ourselves.

Haskell vs. C

Tuesday, June 30th, 2009

Here’s a summary of the differences between typed functional languages and unsafe languages:

  • Difficulty of easy things: Haskell ~ C
  • Difficulty of hard things: Haskell < C
  • Difficulty of impossible things: Haskell >> C

Kudos to anyone who knows what this means.

Consciousness vs. efficiency

Thursday, June 25th, 2009

Imagine a computer stored in a box with a single small hole connecting it to the outside world. We are able to run programs inside the box and receive the results through the hole. In fact, in a sense results are all we can see; if the program makes efficient use of the hardware inside, the size of the hole will prevent us from knowing exactly what went on inside the box (unless we simulate the workings of the box somewhere else, but then the box is useless).

The human brain is a good example. To an outside observer, the hole is speech; other people can’t know what you’re thinking any faster than you can say it. However, speech is also the hole to ourselves. As I write these words, I am only fully aware of them as they appear on the screen in front of me. Until then, I do not consciously know what they are. I can choose to become conscious of them if I say the words to myself internally, but I must slow down in order to do this. The reason is that I am capable of being fully conscious of only one thing at a time, and it is more efficient to be conscious of the words visually on the screen rather than as I “think” of them.

Thus, we have a trade-off between consciousness and efficiency. In order to be fully aware of our thoughts, we must slow them down until they fit through the hole of consciousness. Conscious thought is necessary in order to correct mistakes in our thinking, remember our conclusions, and communicate with others. However, since our brains are internally structured as a massive parallel computer, the only way to use our brains efficiently is to not be aware of what we’re thinking.

This trade-off applies to thoughts of all kinds, and most areas of human endeavor require carefully switching between the different modes. For example, working out a mathematical proof is often said to require intuitive “leaps” of thought. The reasons these appear as leaps is not because they are actually sudden; our brain does not sit around doing nothing and then suddenly have the idea. Rather, our unconscious mind is considering various different avenues of thought in parallel. If an avenue appears fruitful, the unconscious part will make it available to the conscious part, and it seems to “pop into our minds”. Similarly, it is very difficult to consciously try to remember a particular fact. If you let your mind wander, the unconscious part is better able to search around for what we need in a massively parallel fashion, providing us with the answer asynchronously.

Movement is the same. When you dodge a ball thrown at you, your unconscious sees the ball and moves out of the way before there is time to articulate what is happening. Soon afterwards, your brain retroactively explains what happened: “If anyone asks, say we saw the ball coming towards us and dodged.” One of my hobbies is swiveling to fit through doors as they close without touching them. The last time I did this, I had a vague memory of my vision dimming during this motion, almost as if I was passing out. I think this effect was my consciousness temporarily shutting down to avoid interfering with the reflexive motion.

The same trade-off applies to computers and algorithms. Consider the problem of checking a C++ program for type errors. Current compilers do this by running full template instantiation, which generates code for each instance of the template. In other words, the compiler is “conscious” of the entire process. If we all want is the type errors, it would be much faster to use a specially tailored algorithm that checked for errors only, remembering only enough of the results to speech up the rest of the error checking process. It would be even faster if want to know only whether errors exist, not what they are, since the program could forget line numbers and other details. The cool part is that it is possible to combine the different approaches to get the best of all words: we can running error checking before code generation to reduce the latency of reporting errors back to the user, and we can speed up error checking by first running the stripped down yes/no algorithm and reprocessing any portion that has an error. In a decent programming language, all three variants can be generated from the same source using partial specialization.

This last point is very important, since it means that the trade-off between consciousness and efficiency can often be eliminated; we can start with “fully conscious” code (which remembers everything it does) and apply various “forgetfulness” transformations to shift towards the efficiency side. The different versions can be seamlessly interleaved so that it looks from the outside as if the fully conscious version is operating at the speed of the fastest version; the missing detail is recovered only when we need it. This is similar to human vision; any object we look at appears sharp, so we generally imagine that we see with uniformly high detail. What is actually happening is that almost all of our visual field is blurry, with one high resolution point in the center. We can shift this high resolution point to anywhere we wish, so we get the illusion of uniform sharpness for free.

Taking full advantage of this trade-off to speed up programs will require languages that combine low level and high level features and make it easy for the program to inspect and transform its own code. I’ll write more about this later.

Two Quotes

Friday, April 17th, 2009

These are very refreshing:

From Hudak et al., “A History of Haskell” [1]:

The fact that Haskell has, thus far, managed the tension between these two strands of development [as a mature language, and as a laboratory in which to explore advanced language design ideas] is perhaps due to an accidental virtue: Haskell has not become too successful. The trouble with runaway success, such as that of Java, is that you get too many users, and the language becomes bogged down in standards, user groups, and legacy issues. In contrast, the Haskell community is small enough, and agile enough, that it usually not only absorbs language changes but positively welcomes them: it’s like throwing red meat to hyenas.

And Alan Perlis [2]:

“I think that it’s extraordinarily important that we in computer science keep fun in computing. When it started out, it was an awful lot of fun. Of course, the paying customers got shafted every now and then, and after a while we began to take their complaints seriously. We began to feel as if we really were responsible for the successful, error-free perfect use of these machines. I don’t think we are. I think we’re responsible for stretching them, setting them off in new directions, and keeping fun in the house. I hope the field of computer science never loses its sense of fun. Above all, I hope we don’t become missionaries. Don’t feel as if you’re Bible salesmen. The world has too many of those already. What you know about computing other people will learn. Don’t feel as if the key to successful computing is only in your hands. What’s in your hands, I think and hope, is intelligence: the ability to see the machine as more than when you were first led up to it, that you can make it more.”

Thanks to Benjamin Russell for pointing these out on the Haskell-cafe list.

[1] Hudak, Paul, Hughes, John, Peyton Jones, Simon, and Wadler, Philip. “A History of Haskell: Being Lazy With Class.” San Diego, California: The Third ACM SIGPLAN History of Programming Languages Conference (HOPL-III) (2007): 12-1 - 12-55, 2007. http://research.microsoft.com/en-us/um/people/simonpj/papers/history-of-haskell/history.pdf

[2] Abelson, Harold and Sussman, Gerald Jay with Sussman, Julie. Structure and Interpretation of Computer Programs, Second Edition. Cambridge, MA: The MIT Press and New York: McGraw-Hill, 1996. http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-3.html

Sorting in linear time

Wednesday, April 15th, 2009

If you assume fixed word length, integers can be sorted in linear time using radix sort. This paper gives a stunningly elegant generalization of this to essentially arbitrary data structures, including lists, sets, bags, etc.:

Henglein, F. (2008). “Generic Discrimination: Sorting and Partitioning Unshared Data in Linear Time”.

Wow. The most interesting aspect of this to me is that it provides some extra hope about the practicality of functional containers. Right now everyone is using hash tables, which are fast but extremely lacking in functionality (e.g., they’re not sorted). In many languages they’re also nondeterministic across multiple program runs for security reasons, which is horrible for debugging. Maybe some future abstraction along the lines of this paper (generalized tries?) will make ordering-based containers competitive with hash tables.

Unfortunately, Henglein notes that efficient implementations of these techniques are heavily imperative, which is fine for sorting but bad for use in flexible containers. I’ll keep hoping.

Note: Containers and tries are mentioned as future work in the paper. There are lots of challenges in the way, but Henglein seems optimistic. I expect that some fairly serious compiler support would be required to make an implementation of these ideas both clean and fast.

Lazy vs. strict vs. lenient?

Sunday, March 1st, 2009

Here’s an interesting short paper by Wadler:

To paraphrase, the simplest model of lazy evaluation is given by Church’s original λ calculus, and the simplest model of strict evaluation is given by Plotkin’s λ v calculus. λ is problematic because it gives a poor notion of the cost of programs, and λ v is problematic because it is not complete. These flaws can be removed at the cost of slightly more complicated models, and the resulting models turn out to be extremely similar. Specifically, the only difference between them is the following law, let x = M in N -> N which holds with laziness and fails for strictness. In other words, the different between strict and lazy languages is exactly what one would expect: in a lazy model unused terms can be ignored, and in a strict model they must be evaluated to make sure they don’t loop forever.

This is a fairly obvious result: all it says is that lazy languages are the same as strict except that it’s okay to have an infinite loop if the result is unused. It’s still an interesting point to emphasize though, since it highlights the importance of trying to come up with alternate evaluate schemes that combine the advantages of lazy and strict (e.g., Tim Sweeney’s discussion of lenient evaluation).

Strict aliasing and std::vector

Tuesday, February 10th, 2009

Something just occurred to me about the gcc implementation of std::vector in C++. Internally, an instance of vector needs to store (1) a pointer to the start of the memory, (2) the current vector size, and (3) the current buffer size. However, (2) and (3) can be represented either as integers relative to the start pointer or as absolute pointers past the end of the active and buffer spaces. Specifically, the two representations are

    template <class T>
    struct relative {
        T* start;
        size_t size, buffer_size;
    };

and

    template <class T>
    struct absolute {
        T* start;
        T* end; // = start + size
        T* buffer_end; // = start + buffer_size
    };

The relative integer representation seemed more natural, so it’s always struck me as odd that gnu stdc++’s vector class used the absolute representation.

What just occurred to me is that the absolute representation is better from a strict aliasing standpoint, since arrays of pointers are much less common than arrays of integers. This potentially gives the compiler more knowledge with which to optimize. For example, consider the following implementation of copy:

    void copy(vector<size_t>& dest, const vector<size_t>& src)
    {
        for(size_t i=0;i<src.size();i++)
            dest[i] = src[i];
    }

If the compiler is allowed to assume strict aliasing and vector<size_t> uses the absolute representation, it knows that the write to dest can’t change the value of src.size(), which allows the loop bound to be computed at the start of the loop. However, if the relative representation is used, nothing tells the compiler that dest.start != &src.size. If a write to dest[i] might change the value of size, the compiler has to repeat the memory lookup during every iteration of the loop, which prevents all interesting loop optimizations.

P.S. Someone pointed out that I forgot to define “strict aliasing”. Strict aliasing means that it is invalid in C++ to access the same memory through pointers of different types, and in particular that the compiler is allowed to make this assumption when optimizing. See the Wikipedia page for more details.

User Error

Sunday, February 8th, 2009

Gah. I’m trying to write some straightforward combinatorial code in Haskell, and I finally got to the point where I had to start using nontrivial algorithms to make it acceptably fast. When I run those algorithms the first time, I get the following gem:

    *Main> drops 100 2
    *** Exception: divide by zero

The “drops” function is now 15 lines of mutually recursive Haskell, and the interactive interpreter conveniently informs me that there was a division by zero. No hint as to where it might be, of course. Come to think of it, this was probably the reason I dropped back to Python after my last Haskell stint.

This came up all the time in OCaml compiler work too. Deep in the middle of an optimization pass, I’d mess up a symbol context. The compiler would conveniently inform me that

    NotFound

OCaml exceptions are monomorphic, so the default exception thrown by map classes couldn’t even include the key, much less something like a stack trace.

The conclusion: any language that claims to be high level should be able to give the exact context in which errors occurred. This means either some form of trace reporting and/or the ability to rule out the possibility of certain errors (dependent types). Yes, both Haskell and OCaml have debuggers for this purpose, but that isn’t good enough; in my case the history function in the Haskell prints out gobs and gobs of tail recursion frames before it ever gets to the real source of the problem. Debugging in high level languages should be easy, not just possible.

Update: the problem turns out to be integer overflow, which means I’m hitting the same language flaw Guido van Rossum described here. Converting my code from Int to Integer requires adding a slew of conversion calls and type annotations because I’m using integers as both container indices and as the results of large combinatorial values. Hopefully Tim Sweeney’s idea will work out and this dichotomy can go away.

Update (16mar2009): Possible hope for the future:

http://ghcmutterings.wordpress.com/2008/12/04/explicit-stack-traces.