Archive for February, 2009

Strict aliasing and std::vector

Tuesday, February 10th, 2009

Something just occurred to me about the gcc implementation of std::vector in C++. Internally, an instance of vector needs to store (1) a pointer to the start of the memory, (2) the current vector size, and (3) the current buffer size. However, (2) and (3) can be represented either as integers relative to the start pointer or as absolute pointers past the end of the active and buffer spaces. Specifically, the two representations are

    template <class T>
    struct relative {
        T* start;
        size_t size, buffer_size;
    };

and

    template <class T>
    struct absolute {
        T* start;
        T* end; // = start + size
        T* buffer_end; // = start + buffer_size
    };

The relative integer representation seemed more natural, so it’s always struck me as odd that gnu stdc++’s vector class used the absolute representation.

What just occurred to me is that the absolute representation is better from a strict aliasing standpoint, since arrays of pointers are much less common than arrays of integers. This potentially gives the compiler more knowledge with which to optimize. For example, consider the following implementation of copy:

    void copy(vector<size_t>& dest, const vector<size_t>& src)
    {
        for(size_t i=0;i<src.size();i++)
            dest[i] = src[i];
    }

If the compiler is allowed to assume strict aliasing and vector<size_t> uses the absolute representation, it knows that the write to dest can’t change the value of src.size(), which allows the loop bound to be computed at the start of the loop. However, if the relative representation is used, nothing tells the compiler that dest.start != &src.size. If a write to dest[i] might change the value of size, the compiler has to repeat the memory lookup during every iteration of the loop, which prevents all interesting loop optimizations.

P.S. Someone pointed out that I forgot to define “strict aliasing”. Strict aliasing means that it is invalid in C++ to access the same memory through pointers of different types, and in particular that the compiler is allowed to make this assumption when optimizing. See the Wikipedia page for more details.

User Error

Sunday, February 8th, 2009

Gah. I’m trying to write some straightforward combinatorial code in Haskell, and I finally got to the point where I had to start using nontrivial algorithms to make it acceptably fast. When I run those algorithms the first time, I get the following gem:

    *Main> drops 100 2
    *** Exception: divide by zero

The “drops” function is now 15 lines of mutually recursive Haskell, and the interactive interpreter conveniently informs me that there was a division by zero. No hint as to where it might be, of course. Come to think of it, this was probably the reason I dropped back to Python after my last Haskell stint.

This came up all the time in OCaml compiler work too. Deep in the middle of an optimization pass, I’d mess up a symbol context. The compiler would conveniently inform me that

    NotFound

OCaml exceptions are monomorphic, so the default exception thrown by map classes couldn’t even include the key, much less something like a stack trace.

The conclusion: any language that claims to be high level should be able to give the exact context in which errors occurred. This means either some form of trace reporting and/or the ability to rule out the possibility of certain errors (dependent types). Yes, both Haskell and OCaml have debuggers for this purpose, but that isn’t good enough; in my case the history function in the Haskell prints out gobs and gobs of tail recursion frames before it ever gets to the real source of the problem. Debugging in high level languages should be easy, not just possible.

Update: the problem turns out to be integer overflow, which means I’m hitting the same language flaw Guido van Rossum described here. Converting my code from Int to Integer requires adding a slew of conversion calls and type annotations because I’m using integers as both container indices and as the results of large combinatorial values. Hopefully Tim Sweeney’s idea will work out and this dichotomy can go away.

Update (16mar2009): Possible hope for the future:

http://ghcmutterings.wordpress.com/2008/12/04/explicit-stack-traces.

Externalities

Saturday, February 7th, 2009

I got The Age of Turbulence for Christmas. Very interesting so far, especially to read with hindsight given recent history. This passage is fascinating:

“…According to objectivist precepts, taxation was immoral because it allowed for government appropriation of private property by force. Yet if taxation was wrong, how could you reliably finance the essential functions of government, including the protection of individuals’ rights through police power? The Randian answer, that those who rationally saw the need for government would contribute voluntarily, was inadequate. People have free will; suppose they refused?

I still found the broader philosophy of unfettered market competition compelling, as I do to this day, but I reluctantly began to realize that if there were qualifications to my intellectual edifice, I couldn’t argue that others should readily accept it.”

Greenspan does not come off as an ideologue throughout most of the book; he seems for the most part rational, thoughtful, and balanced. This makes it all the more fascinating and disturbing to me that, confronted with a flaw in his model of the world, his response would be to accept that other people might not be able to accept it!

The issue here is very simple. In the case of a public good such as individuals’ rights (or a public negative such as pollution), rational individual self-interest implies that one should attempt to shift the cost of the public good onto someone else. Markets are brilliant (most of the time) at optimizing functionals, but here the functional is incorrect. If a theoretical free market system optimizes one functional, it is entirely irrational to believe that it will also magically optimize a very difficult functional at the same time.

Moreover, this flaw isn’t even hard to fix! If one believes that free markets are the best system ignoring externalities, the solution is to make the minimal regulatory change required to replace the externalities with individual self-interest. E.g., criminal penalties for property rights violations, carbon taxes for global warming, etc. To my eyes, at least, the market system preserves all of its theoretical and moral elegance even with this correction, and has the benefit of not having a brutally obvious flaw.

Imagine a physicist who believes in the Newtonian theory of relativity, but discovers that it is inconsistent with the theory of electromagnetics. The analogous response to Greenspan here would be for the physicist to accept that the contradiction meant that other physicists would disagree, but to continue to believe in both theories personally! What actually happened, of course, is that physicists came up with a better theory, one that turned out to be even more elegant and beautiful than the first one. The correction required was quite small; the Newtonian theory is still almost right and is still useful in calculations.

One difference between economics and physics is that economics is often filtered through politics, and having a “pure” theory can be advantageous from a marketing perspective. However, Greenspan does not appear driven by marketing concerns; he frequently admits to trying to convey a nuanced view even when afraid that it will be misinterpreted. Therefore, I am not sure why he resists it.