Even Fewer Books

March 23rd, 2010

Another move (now to New Zealand), another round of book disposal. I’ve been transitioning to ebooks anyways, so my physical collection will keep diminishing over time. Now I just need to lug them all to The Strand:

Update: A few more:

Insurance choice can be bad

August 30th, 2009

This is a followup to the previous post about health insurance elaborating on the fact that it can be bad to let individuals make choices about their insurance policy. I stated without much detail that “assuming sufficient options and perfect competition, the result of this individual choice would be exactly the same as if the insurers were allowed to use knowledge of K.” The “sufficient options” assumption is important (and not necessarily realistic), so more explanation is warranted.

Imagine there’s a genetic test that predicts the occurrence of a particular disease with overwhelming probability. Let’s call this disease H (for Huntington’s disease or maybe HIV/AIDS). Further imagine that the disease is treatable, but the treatment is expensive (not true for Huntington’s yet, unfortunately). Say the price of treatment is c.

If insurers are allowed to administer the genetic test and adjust policy prices accordingly, prices will converge towards being c greater for those with H. If you have H, you’ll pay the entire cost yourself. This situation is clearly bad, so we’ll ban insurers from knowing about the genetic test.

However, individuals still know about the genetic test, and are allowed to make decisions accordingly. Let’s say an insurer provides two insurance policies, identical except that one pays for the treatment for H and one does not. Anyone who knows that they’re H-negative will buy the policy that doesn’t treat H, and anyone who has H will buy the other. If the insurer is allowed to charge different amounts for the two policies, they will adjust the prices to match the different expectations of cost. This difference is c.
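
To make the arithmetic concrete, here is a small worked version. The symbols $q$ (the fraction of customers who test positive for H) and $b$ (the expected cost of everything other than H) are assumptions introduced only for this sketch. Overhead is ignored and the test is treated as certain.

$$
\begin{aligned}
\text{one shared policy:} \quad & P_{\text{pooled}} = b + qc \\
\text{policy without H coverage:} \quad & P_{-} = b \\
\text{policy with H coverage:} \quad & P_{+} = b + c \\
\text{difference:} \quad & P_{+} - P_{-} = c
\end{aligned}
$$

Relative to the shared policy, each H-positive customer pays $(1 - q)c$ more and each H-negative customer pays $qc$ less, which is the same split that insurer-side testing produced.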

Can we ban insurers from having one policy that covers H and one that doesn’t? Possibly, but it’s hard. First, we have to choose all or none; if one insurer covers H and a different one doesn’t, the non-H people will flock to the second insurer, and the same thing happens. Second, the connection between H and the treatment for H may be far from obvious, and certainly can’t be expected to be known at the time we pass any particular law. For example, Down’s syndrome increases the likelihood of recurrent ear infections, so allowing policies to not cover recurrent ear infections would penalize anyone with Down’s syndrome. This might be harmless by itself, but a thousand similar options could add up quickly.

There are probably examples of insurance policy choices which wouldn’t be problematic, but figuring out which these are is an extremely subtle proposition. Moreover, this issue will become rapidly more important as our knowledge about genetic risk factors and relations between different diseases expands. I have no confidence that these nuances can be encoded in any kind of government regulation.

Does this mean we can only have one insurance policy for everyone if we want to be fair? Unfortunately, to a first approximation, it seems like the answer is yes. I’d love to hear details if anyone knows of a type of policy choice which doesn’t suffer from this problem, though.

Free market insurance is incompatible with knowledge

August 30th, 2009

Yes, it’s an extreme title, but it’s true. The idea of insurance is to average risk over a large group of people. If advance information exists about the outcomes of individuals, it’s impossible for a fully competitive free market to provide insurance.

In particular, free markets cannot provide health insurance.

To see this, consider a function $u : S \to \mathbb{R}$ which assigns a utility value to each point of a state space $S$. For example, one of the elements of $S$ could be “you will have cancer in 23 years”. This outcome is bad, so the corresponding $u(s)$ would be a large, negative number.

We also have a probability distribution $p : S \to \mathbb{R}$ over $S$. Without insurance, the expected value of $u$ is $E(u) = \sum_{s \in S} p(s)\,u(s)$. With insurance, we can average over a large number of people to change the utility function to be closer to the average. For simplicity, we’ll consider only the case of perfect insurance, where the new utility function is exactly the average. In the perfect insurance model, we pay an insurance company $E(u) + o$, and in return they agree to pay us $u(s)$ depending on the particular outcome $s$. Here $o$ is an extra amount to cover administrative costs, risks due to lack of independence and finite numbers of customers, and profit (in the case of imperfect competition).

Assuming no one has any prior knowledge of the state $s$, the only way for different insurers to compete in the perfect insurance model is to reduce overhead. Everyone looks the same, so there’s no advantage in charging different amounts to different people. The insurers profit from anyone with $u(s) > E(u)$ and lose money from anyone with $u(s) < E(u)$, but there’s nothing they can do about it if they can’t tell the difference in advance.

Now assume there’s some prior knowledge about the state, say $S = K \times U$ where $K$ is known in advance and $U$ is unknown. In the absence of regulation, it becomes possible for an insurance company to charge different amounts based on the different $k \in K$. In particular, it’s possible for an insurer to sell policies only to people with a favorable value of $k$, and charge $E(u \mid k) > E(u)$. In a free market, anyone with a favorable value of $k$ will flock to these cheaper policies. Insurers offering policies to those with unfavorable values of $k$ will have to raise rates in order to stay in business, since they will have lost the customers from which they make money. Assuming a sufficient level of competition, the price of all insurance policies will converge on $E(u \mid k) + o(k)$.

The result is that we’re now insuring only over the uncertainty contained in $U$, not $K$. In the worst case, if $K = S$, then $E(u \mid k) = u(s)$ and insurance vanishes completely.
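
As a rough numerical illustration of the pooled versus conditional expectations, here is a minimal sketch. The two values of k, the probabilities, and the cost of being sick are all made-up numbers chosen for easy arithmetic, not data.

```python
# Hypothetical numbers, for illustration only.
# Each state is a pair (k, rest): k is known in advance, rest is not.
p = {
    ("favorable", "healthy"): 0.45,
    ("favorable", "sick"): 0.05,
    ("unfavorable", "healthy"): 0.25,
    ("unfavorable", "sick"): 0.25,
}
# Utility of each state: being sick costs 100,000, being healthy costs nothing.
u = {state: (-100_000.0 if state[1] == "sick" else 0.0) for state in p}


def expected_utility(p, u):
    """E(u): average over the whole state space S."""
    return sum(p[s] * u[s] for s in p)


def conditional_expected_utility(p, u, k):
    """E(u | k): average over only the states that share the known factor k."""
    p_k = sum(prob for (kk, _), prob in p.items() if kk == k)
    return sum(p[s] * u[s] for s in p if s[0] == k) / p_k


pooled = expected_utility(p, u)                                     # about -30,000
favorable = conditional_expected_utility(p, u, "favorable")         # about -10,000
unfavorable = conditional_expected_utility(p, u, "unfavorable")     # about -50,000
```

With these numbers, people with the favorable k leave a pool priced around 30,000 for one priced around 10,000, and the remaining pool has to be repriced around 50,000. Only the uncertainty within each k group is still shared.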

Whether this is good or bad policy-wise depends on what K and U look like. For car insurance, K includes whether the driver was considered at fault in accidents in the past, whether they’ve driven drunk, whether they drive a muscle car or a Honda Civic, etc. Charging different amounts depending on these factors seems fair, since intuitively these factors can be considered the “fault” of the individual. Similarly, charging more for home owners insurance if you live in the path of a hurricane is also (arguably) reasonable.

In the case of car insurance, even with these known factors out of the way, the space of uncertainty U is still quite large. It includes the actions of other drivers, random equipment failure, invisible road conditions, etc. It is impossible for insurers to predict these factors, which means that private, free market insurance can efficiently insure against them.

For health insurance, the space of known factors includes all past medical history and preexisting conditions, public genetic information including gender and race, healthy or unhealthy lifestyle, etc. In many cases, it includes information about the current medical problem, since insurers have significant control over what kind of treatment people can receive once they are diagnosed. Now, we can argue about whether it’s fair to blame people for unhealthy lifestyles, but I highly doubt anyone will argue that black men should be held responsible for their higher rates of prostate cancer.

If we accept that the space of known factors K is too large, the only way to reduce it is to apply some type of regulation to reduce the effective size of K. A fair amount of subtlety is required to make such regulation effective. For example, let’s say we ban insurers from discriminating based on race, but still allow them to collect information about healthy lifestyle. It’s healthy to play sports, so the insurer might ask whether the person plays basketball. People who play basketball are more likely to be black than those who don’t (caveat: I’m just guessing here), and therefore it’s quite possible that they have higher risks of prostate cancer. Unless the government is smarter than the insurers (impossible, since the insurers have access to the text of laws), the only reliable way to solve this is to ban knowledge of K entirely.

However, banning insurers from using knowledge of K is dangerous unless you also ban customers from using knowledge of K. In an extreme case, it would be very bad to allow people to buy insurance policies in response to accidents or unexpected diagnoses. Everyone would wait until they needed medical coverage to buy insurance, and all insurers would rapidly go out of business.

In general, if individuals are allowed to use any information prohibited to insurers, and the space of available policies is large enough, sufficiently diligent individuals with favorable k values can use this information to lower their insurance premiums without raising their risk. Insurers will have to raise their premiums in response, which results in an increase in cost for those with unfavorable k values.

In fact, assuming sufficient options and perfect competition, the result of this individual choice would be exactly the same as if the insurers were allowed to use knowledge of K! Wow. I didn’t fully understand that point before writing this post.

The conclusion is that if we believe true health insurance is a good thing, and that health insurance means insuring over factors which can be known in advance, free markets don’t work either for insurers or for individuals. We can’t allow insurers to base prices on prior knowledge, and we can’t even allow individuals to choose which policy they buy based on their knowledge of their own medical history.

Hmm. The individual side of this is somewhat unfortunate, but I don’t see any way around this argument.

Followup: Here are more details about the individual side.

Duck talk

August 21st, 2009

I gave a short presentation on the ideas behind duck at DESRES today. Here are the slides. Caveat: I made these slides in the two hours before the presentation.

The Verbosity Difference

August 18th, 2009

I like conciseness. Syntactic sugar increases the amount of code I can fit on a single screen, which increases the amount of code I can read without scrolling. Eye saccades are a hell of a lot faster than keyboard scrolling, so not having to scroll is a good thing.

However, I recently realized that simple absolute size is actually the wrong metric with which to judge language verbosity, or at least not the most important one.

Consider the evolution of a chunk of C++ code. We start with a single idea, and encode it as a single class to encapsulate the structure. We add a class declaration, some constructors and a destructor, perhaps even a private operator= to disallow copying. Fine. After this boilerplate, we add various methods to the class to encode the actual behavior. The class also develops a few fields, because fields let us easily share data between the related methods.

Next we have another idea. Conceptually the new idea is distinct from the original one, so we should really make a new class. However, we’ve just gone through all the work of setting up a C++ class, with its constructors, destructor, private operator=, access specifiers, etc., and it’d be a shame to have to redo all that effort. Maybe it won’t be so bad if we just add the new idea into the same class…

Boom. Now we have two ideas merged into the same class. You can’t pass around one idea without passing around the other. You can’t rewrite one without analyzing dependency chains to make sure the class fields don’t overlap between concepts. After a while, we start to forget that the ideas were ever really distinct. That’s right: the language has actually made us stupider.

You can’t blame the programmer here. We were only maximizing our local utility. We might be smart, but we’re not omniscient, and we can’t always be bothered to follow style manuals. The problem also can’t be ascribed to the overall verbosity of C++; it’s quite possible that the code would be larger if it was written in C, since C++ class syntax, fields, etc. really can make for smaller (source) code.

The problem is that the marginal cost of adding a new class is greater than the marginal cost of extending an existing class. If it was easier to make a new class, we would have done so. But we would also have made a new class if it was harder to add methods to an existing class, because then the trade-off would have been different. In other words, what matters is the difference in verbosity between the “right way” and the “wrong way”, not the absolute level of verbosity.

Therefore, the conclusion is that any new abstraction with a large startup cost but a low marginal cost is bad, because people end up merging them in disgusting ways. Examples include interfaces (adding one more method is easier than splitting one interface into two), Haskell type classes (see fail), and monads (once you’ve converted your code into monadic form, making the monad do something else is easy).

Similarly, any abstraction which merges two benefits into one language construct is also bad, even if the extra benefits are free. The best example of this is inheritance, which merges the benefits of code reuse and subtyping. If I’m making a new class, and it would be really convenient to be able to call one of the methods in an old class, I may end up inheriting from that class in order to save typing even if subtyping makes no sense. By contrast, if I’d been writing the same code in C, that function I really wanted to call would probably just be a function, and I’d just call it. Object oriented programming makes you stupider.
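
The inheritance trap can be sketched in a few lines. This is a toy in Python rather than C++ or C, and the class and function names (Matrix, Invoice, PlainInvoice, format_number) are invented for the example.

```python
class Matrix:
    """An existing class that happens to contain a handy helper method."""

    def __init__(self, rows):
        self.rows = rows

    def format_number(self, x):
        return f"{x:.2f}"


class Invoice(Matrix):
    """The tempting shortcut: inherit only to reach format_number.

    An invoice is not a matrix, but the subtype relation comes along anyway.
    """

    def __init__(self, total):
        super().__init__(rows=[])
        self.total = total

    def render(self):
        return "Total: " + self.format_number(self.total)


def format_number(x):
    """The same helper as a plain function: reuse with no subtyping attached."""
    return f"{x:.2f}"


class PlainInvoice:
    def __init__(self, total):
        self.total = total

    def render(self):
        return "Total: " + format_number(self.total)
```

Both versions render the same text, but only the first one can be passed to code that expects a Matrix, which is exactly the accidental subtyping described above.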

Happily, it’s easy to notice when you’re running into one of these language flaws. Most of us have a good sense for what the right way of doing things is. If we set out to write a new piece of code, the right way will generally be the first thing that comes to mind, but then we’ll remember that doing it the right way is hard. We’ve probably trained ourselves not to notice this conflict after years of painful compromise, so all we have to do is untrain ourselves.

Marshmallows and Achievement Gaps

August 16th, 2009

Here are links related to a few interesting studies that came up in a discussion with Ross. I figured I’d post them here so I have somewhere to point other people:

Marshmallows and Delayed Gratification

Walter Mischel did a study where he put children in a room, gave them a single marshmallow, and told them that if they held off from eating the marshmallow for a while they would get two marshmallows later. He then left the room and watched via hidden camera to see how long they would hold out. Several years later, he happened to do a follow-up study on the same kids, and discovered that the time they held out was strongly correlated to their grades, whether they went to college, SAT scores, etc. Here are some links:

Racial and Gender Achievement Gaps

Claude Steele and Joshua Aronson did a study where they gave the GRE exam to African American and European American students. The two groups performed at roughly the same level. However, if they told the students they were taking an intelligence test, the black students performed significantly worse. There are a lot of variants of this experiment for different kinds of tests or gaps (physical activities, gender, etc.) with similar results. I.e., you can dramatically change test scores by saying or not saying a single particular sentence to the test takers before the test.

I agree with Dan Ariely that these studies can and should be interpreted extremely optimistically. If the driving factors behind success or failure are this simple or fragile, we should be able to find easy ways to make huge improvements.

The Anonymous, Recursive Suggestion Box

August 16th, 2009

Good discussion with Ross today, resulting in one nice, concrete idea.

Consider the problem of suggesting policy improvements to the government. In particular, let’s imagine someone has a specific, detailed policy change related to health care, financial regulation, etc. Presumably, the people who know the most about these industries are (or were) in the industries themselves, so you could argue that they can’t be trusted to propose ideas that aren’t just self-serving. Maybe it’s possible for someone to build a reputation of trustworthiness, but that’s hard and would ideally be unrelated to the actual ideas proposed. Instead of relying on reputation, we’ll remove the issue entirely by making the suggestion box anonymous.

Now we have an anonymous suggestion box on a website. People go to it and propose ideas. There are a few good ideas, and a vast amount of bad, malicious, and nonsense ideas (including spam). Eliminating the spam is easy (I have a single, completely public email address and get roughly one spam message per day, from which I conclude that the spam issue is solved). In order to eliminate the bad or malicious ideas, we need to be able to judge their correctness in a logical manner. For this, we rely on distributed intelligence: other people are allowed to judge whether each idea is good or bad. To get loaded words out of the picture, let’s replace “good” and “bad” with the words “true” and “false”. “Ideas” become propositions of the form “Implementing this idea would be good” (yes, “good” is still there, but keep reading).

Let’s assume voting isn’t a completely reliable system for determining the truth or falsity of ideas (otherwise, we’re done). Therefore, some true propositions will get a lot of false votes, and vice versa. To solve this, we allow people to propose arguments for or against each proposition. These have the form of some statement, like “That proposition is false because the author is a moron”, together with a more detailed argument for why the statement is true. Now we let people vote on two more things:

  1. Whether the truth of the statement would imply that the original proposition is true or false.
  2. Whether the argument for the truth of the statement itself is sound.

If we get enough votes in favor of both (1) and (2), we conclude that the original idea is true (or false), and discount the votes for or against the original idea.

This is the key part, so I’ll restate it. If we have propositions A, B, and $B \Rightarrow A$, then enough votes for both B and $B \Rightarrow A$ override any votes against A. You can’t kill a good idea unless you can kill the arguments for it as well.
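
The override rule can be sketched roughly as follows. The Argument record, the resolve function, and the fixed vote threshold are all assumptions made for this illustration. The post only says “enough votes”, so the threshold of 10 is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Argument:
    concludes: bool         # True if statement B argues that A is true, False if that A is false
    soundness_votes: int    # net votes that the argument for B itself is sound
    implication_votes: int  # net votes that B really implies that conclusion about A


def accepted(arg, threshold):
    """An argument counts only if both of its vote tallies reach the threshold."""
    return arg.soundness_votes >= threshold and arg.implication_votes >= threshold


def resolve(votes_for, votes_against, arguments, threshold=10):
    """Return True, False, or None (unresolved) for the original proposition A."""
    conclusions = {arg.concludes for arg in arguments if accepted(arg, threshold)}
    if len(conclusions) == 1:
        # Accepted arguments all point one way: they override the direct votes on A.
        return conclusions.pop()
    if len(conclusions) == 2:
        # Accepted arguments contradict each other: this needs another level of arguments.
        return None
    # No accepted arguments: fall back to the direct votes.
    if votes_for == votes_against:
        return None
    return votes_for > votes_against
```

For example, `resolve(3, 50, [Argument(True, 20, 15)])` comes out true even though the direct vote was 3 to 50 against. To kill the idea, opponents would have to defeat the argument itself.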

Now we have to get recursive. What if B gets a lot of votes, but is actually wrong? Then you let people propose arguments for the falsity of B, and so on. What if there are two competing arguments which appear to contradict? Then you let people propose arguments about why there isn’t a contradiction. There are a lot of logical issues to deal with, but people can post arbitrary arguments written in normal human languages and we have the full power of human intelligence to judge them, so we’re not limited by artificial logical restrictions. This isn’t a formal proof system.

Unfortunately, we are limited by what happens along the full recursive tree. If people lie about the propositions all the way down, and manage to flood away all the counter arguments, the system will fail. However, this is basically a problem of spam, and can be solved in the usual way. If you detect that someone is consistently voting opposite the correct answer, you flag them as malicious and discount their votes. This rule is circular, but that’s what probabilistic analysis is for: we take all the data and compute the most likely assignment of truth values to propositions and spam flags to people. There’s some threshold of validity that you need to achieve in order for such a solver to converge to the correct answer, but that level of trust is often quite low due to network effects and self-reinforcement. In other words, contradictions don’t fit together.
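
Here is a crude sketch of the circular rule. It is a simple fixed-point iteration standing in for the real probabilistic analysis, not an actual most-likely-assignment solver. The function name, the reweighting formula, and the sample votes are all assumptions made for illustration.

```python
def solve(votes, rounds=10):
    """votes maps user -> {proposition: bool}. Every ballot is assumed non-empty.

    Alternates between two steps: decide each proposition by weighted vote,
    then reweight each user by how often they agree with those decisions.
    """
    weights = {user: 1.0 for user in votes}
    truth = {}
    for _ in range(rounds):
        # Step 1: weighted vote on every proposition (a tie counts as False).
        score = {}
        for user, ballot in votes.items():
            for prop, vote in ballot.items():
                signed = weights[user] if vote else -weights[user]
                score[prop] = score.get(prop, 0.0) + signed
        truth = {prop: total > 0 for prop, total in score.items()}
        # Step 2: users who agree less than half the time are discounted to zero.
        for user, ballot in votes.items():
            agree = sum(truth[prop] == vote for prop, vote in ballot.items())
            weights[user] = max(0.0, 2.0 * agree / len(ballot) - 1.0)
    return truth, weights


# Three honest users and two spammers who always vote the opposite way.
# On p4 the spammers outnumber the single honest voter.
votes = {
    "a": {"p1": True, "p2": False, "p3": True, "p4": True},
    "b": {"p1": True, "p2": False, "p3": True},
    "c": {"p1": True, "p2": False, "p3": True},
    "s1": {"p1": False, "p2": True, "p3": False, "p4": False},
    "s2": {"p1": False, "p2": True, "p3": False, "p4": False},
}
truth, weights = solve(votes)
```

In this sample the first round gets p4 wrong, because the spammers outvote the one honest voter. Their disagreement on p1 through p3 then drops their weight to zero, and the next round flips p4. If the spammers had outnumbered the honest users on most propositions, the same iteration would have converged on the wrong answers, which is the threshold issue mentioned above.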

Since this is a website, we have to identify whether the “users” are actually people. We could do this conventionally with a system like ReCAPTCHA, but since we’re in recursive mode it’s much cooler to instead ask users to judge the correctness of randomly selected propositions. If you want to vote on whether a proposition is true or false, or propose a new proposition, you need to spend a little time judging the ideas of others. If someone comes up with a way to trick this system by writing a program that can judge the truth or falsity of arbitrary English propositions, this discussion may be obsolete (thanks to Ross for this particular bit of reasoning).

Other issues probably abound, but they can be fixed by allowing people to suggest improvements to the system. If deemed reasonable, these ideas can be implemented and tested in parallel with the existing system, resulting in a potentially large number of competing systems for determining truth values from the same data set. The data set itself could probably be made freely available (under a suitable license), so that others could build competing systems.

I don’t think this system would be all that difficult to implement. Thanks to the previous paragraph, if it reached a sufficient level of quality it would start to improve itself. Maybe that would even get scary.

Of course, if we apply this to a realm like politics, the truth or falsity of various statements will be very controversial, and different people will have legitimately different opinions. This can be solved by adding side conditions to the statements, like “If you believe in flat tax systems, we should do this” or “If you believe that health care is a basic human right, we should do that.” More importantly, however, there is a vast range of ideas that any rational person should agree to. These include statements like “the proposed health care bills do not include death panels” and “given an otherwise equivalent choice between taxing a public good and a public evil, we should tax the latter.” I think it’s fair to say that the U.S. would be better off if we could agree on the statements that don’t need side conditions.

Note: I’ve done zero checking to see if this has been proposed or implemented before (this discussion happened just now), so I’m curious if anyone knows related references or links.

Another note: presumably this would be set up as a nonprofit supported by donations of some kind. If this system actually existed, I would probably be willing to donate at least $1000.

Haskell vs. C

June 30th, 2009

Here’s a summary of the differences between typed functional languages and unsafe languages:

  • Difficulty of easy things: Haskell ~ C
  • Difficulty of hard things: Haskell < C
  • Difficulty of impossible things: Haskell >> C

Kudos to anyone who knows what this means.

Externalities and tariffs

June 29th, 2009

Krugman correcting a flaw in an Obama speech about the energy bill. It’s very unfortunate that the president didn’t get this right, since externalities are the whole point behind cap and trade legislation. If he isn’t able to articulate this point consistently and correctly, he won’t (and shouldn’t) be able to convince anyone. Moreover, any bill that comes out of this process that isn’t based on people understanding (and admitting to understanding) externalities will likely be hopelessly flawed.

Consciousness vs. efficiency

June 25th, 2009

Imagine a computer stored in a box with a single small hole connecting it to the outside world. We are able to run programs inside the box and receive the results through the hole. In fact, in a sense results are all we can see; if the program makes efficient use of the hardware inside, the size of the hole will prevent us from knowing exactly what went on inside the box (unless we simulate the workings of the box somewhere else, but then the box is useless).

The human brain is a good example. To an outside observer, the hole is speech; other people can’t know what you’re thinking any faster than you can say it. However, speech is also the hole to ourselves. As I write these words, I am only fully aware of them as they appear on the screen in front of me. Until then, I do not consciously know what they are. I can choose to become conscious of them if I say the words to myself internally, but I must slow down in order to do this. The reason is that I am capable of being fully conscious of only one thing at a time, and it is more efficient to be conscious of the words visually on the screen rather than as I “think” of them.

Thus, we have a trade-off between consciousness and efficiency. In order to be fully aware of our thoughts, we must slow them down until they fit through the hole of consciousness. Conscious thought is necessary in order to correct mistakes in our thinking, remember our conclusions, and communicate with others. However, since our brains are internally structured as a massively parallel computer, the only way to use our brains efficiently is to not be aware of what we’re thinking.

This trade-off applies to thoughts of all kinds, and most areas of human endeavor require carefully switching between the different modes. For example, working out a mathematical proof is often said to require intuitive “leaps” of thought. The reason these appear as leaps is not that they are actually sudden; our brain does not sit around doing nothing and then suddenly have the idea. Rather, our unconscious mind is considering various different avenues of thought in parallel. If an avenue appears fruitful, the unconscious part will make it available to the conscious part, and it seems to “pop into our minds”. Similarly, it is very difficult to consciously try to remember a particular fact. If you let your mind wander, the unconscious part is better able to search around for what you need in a massively parallel fashion, providing you with the answer asynchronously.

Movement is the same. When you dodge a ball thrown at you, your unconscious sees the ball and moves out of the way before there is time to articulate what is happening. Soon afterwards, your brain retroactively explains what happened: “If anyone asks, say we saw the ball coming towards us and dodged.” One of my hobbies is swiveling to fit through doors as they close without touching them. The last time I did this, I had a vague memory of my vision dimming during this motion, almost as if I was passing out. I think this effect was my consciousness temporarily shutting down to avoid interfering with the reflexive motion.

The same trade-off applies to computers and algorithms. Consider the problem of checking a C++ program for type errors. Current compilers do this by running full template instantiation, which generates code for each instance of the template. In other words, the compiler is “conscious” of the entire process. If all we want is the type errors, it would be much faster to use a specially tailored algorithm that checked for errors only, remembering only enough of the results to speed up the rest of the error checking process. It would be even faster if we want to know only whether errors exist, not what they are, since the program could forget line numbers and other details. The cool part is that it is possible to combine the different approaches to get the best of all worlds: we can run error checking before code generation to reduce the latency of reporting errors back to the user, and we can speed up error checking by first running the stripped down yes/no algorithm and reprocessing any portion that has an error. In a decent programming language, all three variants can be generated from the same source using partial specialization.
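
The “yes/no first, details only where needed” structure might look something like this toy sketch. It is not a C++ type checker: the “program” is just a dictionary of units, each a list of (line, expected type, value) triples, and every name here is invented for the example. In this toy the two passes cost about the same, so it shows only the shape of the idea, not the speedup.

```python
def has_error(unit):
    """Stripped-down yes/no pass: forgets which line failed and why."""
    return any(not isinstance(value, expected) for _, expected, value in unit)


def find_errors(unit):
    """Detailed pass: remembers line numbers and builds messages."""
    return [
        f"line {line}: expected {expected.__name__}, got {type(value).__name__}"
        for line, expected, value in unit
        if not isinstance(value, expected)
    ]


def check(units):
    """Run the cheap pass on every unit and the detailed pass only where it fails."""
    report = {}
    for name, unit in units.items():
        if has_error(unit):
            report[name] = find_errors(unit)
    return report


units = {
    "good": [(1, int, 3), (2, str, "x")],
    "bad": [(1, int, "oops"), (2, str, "fine")],
}
report = check(units)
```

From the outside, `check` behaves like the fully detailed checker, but units without errors never pay for collecting line numbers or messages.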

This last point is very important, since it means that the trade-off between consciousness and efficiency can often be eliminated; we can start with “fully conscious” code (which remembers everything it does) and apply various “forgetfulness” transformations to shift towards the efficiency side. The different versions can be seamlessly interleaved so that it looks from the outside as if the fully conscious version is operating at the speed of the fastest version; the missing detail is recovered only when we need it. This is similar to human vision; any object we look at appears sharp, so we generally imagine that we see with uniformly high detail. What is actually happening is that almost all of our visual field is blurry, with one high resolution point in the center. We can shift this high resolution point to anywhere we wish, so we get the illusion of uniform sharpness for free.

Taking full advantage of this trade-off to speed up programs will require languages that combine low level and high level features and make it easy for the program to inspect and transform its own code. I’ll write more about this later.