Archive for March, 2009

Coin Perfection

Saturday, March 28th, 2009

Say you want to flip a perfect coin, but the only coins available are imperfect. First, let’s consider the case where we don’t know the details of the available coins, and would prefer not to measure them.

The simplest strategy is to flip a bunch of coins and determine whether the total number of heads is even or odd. As long as there’s a reasonable number of both heads and tails, the probability of an even number of heads will be close to one half.

Here is a measure of “coin perfection” which behaves additively with respect to the parity algorithm:

$G\left(p\right)=-\mathrm{log}|2p-1|$

I.e., if you have two coins with probabilities of heads $p$ and $q$, and $r$ is the probability that flipping both coins gives the same result, then

$G(r) = G(p) + G(q)$

since $\begin{aligned} e^{-G(r)} &= |2r - 1| \\ &= |2(pq + (1-p)(1-q)) - 1| \\ &= |2pq + 2 - 2p - 2q + 2pq - 1| \\ &= |4pq - 2p - 2q + 1| \\ &= |(2p-1)(2q-1)| \\ &= e^{-G(p)} e^{-G(q)} \end{aligned}$

In particular, $G\left(0\right)=G\left(1\right)=0$ and $G\left(1/2\right)=\infty$ as one would expect; flipping a deterministic coin gets you no closer to perfection, and flipping a single perfect coin makes the parity perfectly random regardless of the other coins.
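The multiplicative behavior of the bias $|2p-1|$ is easy to check numerically. A small sketch (the probabilities 0.6 and 0.7 are arbitrary):

```python
from math import log

def G(p):
    # coin perfection: G(p) = -log|2p - 1|
    return -log(abs(2 * p - 1))

def parity(p, q):
    # probability that two independent flips give the same result
    # (i.e., an even number of heads among the two)
    return p * q + (1 - p) * (1 - q)

p, q = 0.6, 0.7
r = parity(p, q)
print(abs(2 * r - 1))        # ≈ 0.08 = |2p - 1| * |2q - 1|
print(G(r) - (G(p) + G(q)))  # ≈ 0: G is additive under parity
```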

If we know the exact probability of heads $p$, we can do better than this and produce a perfect coin in a finite expected number of flips. By information theory the best we can hope for is an expected $1/H\left(p\right)$ flips per perfect flip, where

$H\left(p\right)=-p\mathrm{lg}p-\left(1-p\right)\mathrm{lg}\left(1-p\right)$

is the entropy of the coin. Unfortunately determining the exact minimum appears rather complicated, so I’m not going to try to finish the analysis right now.
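For comparison, the classical von Neumann trick already achieves a finite expected number of flips without knowing $p$ at all: flip twice, output heads on heads-tails, tails on tails-heads, and retry on a tie. It costs an expected $1/(p(1-p))$ flips per perfect flip, generally worse than the entropy bound but finite. A sketch:

```python
import random

def fair_flip(biased_flip):
    # von Neumann's trick: HT -> heads, TH -> tails, retry on HH or TT.
    # Works for any fixed 0 < p < 1, no knowledge of p required.
    while True:
        a, b = biased_flip(), biased_flip()
        if a != b:
            return a

random.seed(0)
p = 0.7  # arbitrary bias
biased = lambda: random.random() < p
n = 100_000
heads = sum(fair_flip(biased) for _ in range(n))
print(heads / n)  # ≈ 0.5 despite the biased source
```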

Wow

Thursday, March 26th, 2009

The energy and momentum of the ground

Tuesday, March 17th, 2009

Imagine that you throw a ball in the air, and it lands on the ground and stops (i.e., the collision is inelastic). What happened to the energy and momentum? The answer for energy is completely different from that for momentum, which is a trivial but interesting illustration of the differences between them. Also we get fun with infinities.

Assume a ball of mass $m$ and velocity $v$ hits the ground and stops. Before it hits, it has energy $\frac{1}{2}m{v}^{2}$ and momentum $mv$. After it hits, the energy and momentum of the ball are both zero, so by conservation they must have gone elsewhere.

Momentum is easy: the momentum went into the ground. This is possible without the ground having noticeably moved, since the ground’s mass is some huge value $M\approx \infty$. The resulting velocity of the ground is the minute $mv/M\approx 0$, but this zero is canceled out by the huge mass to get the finite momentum $Mmv/M=mv$. Thus, it is perfectly consistent to model the ground as a rigid body with infinite mass, zero velocity, and nonzero momentum $p$. Only 3 new variables are required to account for this ground momentum and make our system closed with respect to conservation of momentum.

Energy is different: it is impossible to transfer energy to a large, rigid object. The kinetic energy of the ground after the impact is $\frac{1}{2}M\left(\frac{mv}{M}\right)^2 = \frac{m^2v^2}{2M} \approx 0$. In other words, $M = \infty$ and $V = 0$ give $\frac{1}{2}MV^2 = \infty \cdot 0 \cdot 0 = 0$, since the zeros outnumber the infinities. It is impossible to store energy in the ground by moving it as a whole.

What happens instead is that the energy goes into smaller scale phenomena, typically sound waves or heat. Both cases involve small amounts of matter moving extremely rapidly, which we can roughly characterize as mass $0$, velocity $\sqrt{\infty }$. The key is that we can store as much energy as we like in these tiny phenomena without ever making a dent in momentum, since $0\sqrt{\infty }=0$.
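Plugging in real numbers makes the asymmetry concrete. A quick sketch (the ball's mass and speed are made up; $M$ is roughly the Earth's mass):

```python
m, v = 0.1, 10.0   # hypothetical ball: 0.1 kg at 10 m/s
M = 5.97e24        # mass of the Earth in kg

p = m * v                   # momentum the ground must absorb: 1 kg m/s
V = p / M                   # resulting ground velocity: ~1.7e-25 m/s
ke_ball = 0.5 * m * v**2    # 5 J of kinetic energy to dispose of
ke_ground = 0.5 * M * V**2  # = p^2/(2M): ~8.4e-26 J, essentially nothing

print(V, ke_ball, ke_ground)
```

The ground soaks up all of the momentum at an immeasurable velocity, but only about $10^{-25}$ joules of the energy; essentially all 5 J must go into sound and heat.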

We can summarize this situation as follows:

1. It is possible to hide momentum in a large, motionless object without expending any energy.
2. It is possible to hide energy in a bunch of small objects without using any momentum.

The fact that energy hides in small places makes it harder to deal with in general; there are a lot of small places, which is why the law of conservation of energy has been modified many times to account for new terms no one had previously noticed. Momentum is simpler: if you lose some momentum, all you have to do is look around for the gigantic object (e.g., the Earth).

Multigrid is the future

Wednesday, March 11th, 2009

Currently, we (or at least I) don’t know how to do multigrid on general problems, so we’re stuck using conjugate gradient. The problem with conjugate gradient is that it is fundamentally about linear systems: given $Ax = b$, construct the Krylov subspace $b, Ab, A^2b, \dots$ and pick out the best available linear combination. It’s all in terms of linear spaces.
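For reference, the whole algorithm fits in a few lines. A minimal pure-Python sketch for a symmetric positive definite system (no preconditioning, dense matrix as a list of lists):

```python
def conjugate_gradient(A, b, iters=50, tol=1e-10):
    # Solve Ax = b for symmetric positive definite A via conjugate gradient.
    n = len(b)
    matvec = lambda v: [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    x = [0.0] * n
    r = b[:]        # residual b - Ax, with initial guess x = 0
    d = r[:]        # search direction
    rr = dot(r, r)
    for _ in range(iters):
        Ad = matvec(d)
        alpha = rr / dot(d, Ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rr_new = dot(r, r)
        if rr_new < tol:
            break
        d = [ri + (rr_new / rr) * di for ri, di in zip(r, d)]
        rr = rr_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
print(x)  # ≈ [1/11, 7/11]
```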

Interesting human scale physics is mostly not about linear spaces: it’s about half-linear subspaces, or linear spaces with inequality constraints. As we start cramming more complicated collision handling into algorithms, these inequalities play a larger and larger role in the behavior, and linearizing everything into a CG solve hurts.

Enter multigrid. Yes, it’s theoretically faster, but the more important thing is that the intuition for why multigrid works is less dependent on linearity: start with a fine grid, smooth it a bit, and then coarsen to handle the large scale cheaply. Why shouldn’t this work on a system that’s only half linear? There are probably a thousand reasons, but happily I haven’t read enough multigrid theory yet to know what they are.

So, I predict that eventually someone will find a reasonably flexible approach to algebraic multigrid that generalizes to LCPs, and we’ll be able to advance beyond the tyranny of linearization.

Wednesday, March 4th, 2009

A few days ago I was in another discussion where someone raised the question of why presidential elections are so close. In the interests of avoiding these discussions in the future, here’s a histogram of the popular vote margin over all presidential elections (data from [1]):

Conclusion: presidential elections aren’t very close, and we should stop looking for explanations of why they are.

Plates

Sunday, March 1st, 2009

Here’s a combinatorial problem I found recently: given a 100 story building and two identical plates, what is the worst case number of drops required to determine the lowest floor from which the plates break when dropped? (If a plate is dropped and does not break, it is undamaged.) The answer is 14, which I’ll state without proof.

In the interest of completely useless generalization, we can ask what happens if we add more plates. Specifically, what is the cost $C\left(s\right)$ to analyze an $s$ story building if we can buy as many plates as we like? Say each plate costs as much as one drop.

First we need to figure out how many drops it takes for a given number of plates, or (roughly) equivalently the number of floors we can analyze for a given number of plates and drops. If we count the zeroth story as a floor to make the formulas slightly prettier, the answer is that $n$ drops and $k$ plates can analyze a building with $S(n,k)$ floors, where

$S(n,k) = \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{k} = \sum_{p \le k} \binom{n}{p}$

(This can be proved by induction on the number of plates.) Thus, $S(n,k)$ is the number of subsets of $[1..n]$ with at most $k$ elements, which turns out to have no closed form expression (Graham et al.). Therefore, we’ll have to settle for asymptotics. The best estimates I could find are given in Worsch 1994. Since the details vary depending on the relative growth of $n$ and $k$, we first need a rough idea of the growth rate of $n/k$.
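The formula is easy to check numerically, including the two-plate answer from the original puzzle (the 100 story building has 101 distinguishable outcomes, since the zeroth floor is counted):

```python
from math import comb

def S(n, k):
    # floors analyzable with n drops and k plates (zeroth floor included)
    return sum(comb(n, p) for p in range(k + 1))

def min_drops(floors, plates):
    # smallest n with S(n, plates) >= floors
    n = 0
    while S(n, plates) < floors:
        n += 1
    return n

print(min_drops(101, 2))  # 14, matching the two-plate answer
print(S(14, 2))           # 1 + 14 + 91 = 106
```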

For a building of $s$ floors, brute force binary search requires $n = k = \lg s + 1$, so $n, k \le n + k = C(s) < 2 + 2\lg s$. Since at least $\lg s$ tests are required regardless of the number of plates, we have $\lg s \le n < 2 + 2\lg s$. Lacking an obvious lower bound for $k$, I’ll assume for the moment that $k = \Theta(\lg s) = \Theta(n)$. Numerical evidence indicates that $n/k$ is in fact between 2 and 3.

In the case where $n/k$ is a constant of at least 2, Worsch 1994 gives

$S(n,k) = (C(x) + O(1))^n \approx C(x)^n$

where

$C(n/k) = C(x) = x^{\frac{1}{x}} \left(\frac{x}{x-1}\right)^{\frac{x-1}{x}}$

We can now use this approximation to find the optimal ratio $x$ by minimizing $n + k$ subject to $S = s$. For convenience, let $D(x) = C'(x)/C(x)$ be the logarithmic derivative of $C$. Applying Lagrange multipliers, we have

$\begin{aligned} E &= n + k - \lambda\left(C\left(\tfrac{n}{k}\right)^n - s\right) \\ \frac{\partial E}{\partial n} &= 1 - \lambda\left(n C^{n-1} C D \tfrac{1}{k} + C^n \log C\right) = 1 - \lambda C^n (Dx + \log C) \\ \frac{\partial E}{\partial k} &= 1 + \lambda n C^{n-1} C D \tfrac{n}{k^2} = 1 + \lambda C^n D x^2 \end{aligned}$

Equating both derivatives to zero and subtracting gives $\log C + Dx + Dx^2 = 0$. Filling in the details,

$\begin{aligned} \log C &= \tfrac{1}{x}\log x + \tfrac{x-1}{x}\log\tfrac{x}{x-1} = \log x + \tfrac{1}{x}\log(x-1) - \log(x-1) \\ D &= \tfrac{C'}{C} = \tfrac{1}{x} - \tfrac{1}{x^2}\log(x-1) + \tfrac{1}{x(x-1)} - \tfrac{1}{x-1} = -\tfrac{1}{x^2}\log(x-1) \\ \log C + Dx + Dx^2 &= \log x + \tfrac{1}{x}\log(x-1) - \log(x-1) - \tfrac{1}{x}\log(x-1) - \log(x-1) \\ &= \log x - 2\log(x-1) = 0 \end{aligned}$

so

$\begin{aligned} \log x &= 2\log(x-1) \\ x &= (x-1)^2 \\ 0 &= x^2 - 3x + 1 \\ x &= \frac{3+\sqrt{5}}{2} \approx 2.618 \end{aligned}$

Thus, $n \approx 2.62k$, $S(n, n/2.62) \approx 1.94^n$, and $C(s) \approx (1 + 1/x)\log s/\log C(x) \approx 1.44 \lg s$.
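As a sanity check on the arithmetic, the constants can be recomputed directly from the formula for $C(x)$:

```python
from math import log2, sqrt

x = (3 + sqrt(5)) / 2  # optimal drops-to-plates ratio, root of x^2 - 3x + 1
assert abs(x * x - 3 * x + 1) < 1e-12

# C(x) = x^(1/x) * (x/(x-1))^((x-1)/x), the base of S(n, n/x) ~ C(x)^n
C = x ** (1 / x) * (x / (x - 1)) ** ((x - 1) / x)

print(x)                      # ≈ 2.618
print(C)                      # ≈ 1.944
print((1 + 1 / x) / log2(C))  # ≈ 1.44, the coefficient of lg s in the cost
```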

Lazy vs. strict vs. lenient?

Sunday, March 1st, 2009

Here’s an interesting short paper by Wadler:

To paraphrase, the simplest model of lazy evaluation is given by Church’s original $\lambda$ calculus, and the simplest model of strict evaluation is given by Plotkin’s ${\lambda}_{v}$ calculus. $\lambda$ is problematic because it gives a poor notion of the cost of programs, and ${\lambda}_{v}$ is problematic because it is not complete. These flaws can be removed at the cost of slightly more complicated models, and the resulting models turn out to be extremely similar. Specifically, the only difference between them is the following law (where x is unused in N), let x = M in N -> N, which holds with laziness and fails for strictness. In other words, the difference between strict and lazy languages is exactly what one would expect: in a lazy model unused terms can be ignored, and in a strict model they must be evaluated to make sure they don’t loop forever.
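The law can be mimicked in Python, where call-by-value is the default and laziness has to be simulated with thunks (all the names here are made up for illustration):

```python
def loop():
    # a diverging computation
    while True:
        pass

def strict_let(bound, body):
    # call-by-value: the bound expression is evaluated even if body ignores it
    x = bound()
    return body()

def lazy_let(bound, body):
    # call-by-name: x stays an unevaluated thunk, never forced if unused
    x = bound
    return body()

# let x = loop in 42:
print(lazy_let(loop, lambda: 42))  # 42 -- the unused diverging term is dropped
# strict_let(loop, lambda: 42) would hang forever
```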

This is a fairly obvious result: all it says is that lazy languages are the same as strict ones except that it’s okay to have an infinite loop if the result is unused. It’s still an interesting point to emphasize though, since it highlights the importance of trying to come up with alternative evaluation schemes that combine the advantages of lazy and strict (e.g., Tim Sweeney’s discussion of lenient evaluation).