Archive for the ‘math’ Category

Haskell vs. C

Tuesday, June 30th, 2009

Here’s a summary of the differences between typed functional languages and unsafe languages:

  • Difficulty of easy things: Haskell ~ C
  • Difficulty of hard things: Haskell < C
  • Difficulty of impossible things: Haskell >> C

Kudos to anyone who knows what this means.

A Wikipedia-Style Proof

Friday, April 3rd, 2009

We wish to show that for small x, 1 +x1 +x2 To derive this formula, we use the well known fact that (1 +x) α= n=0 x nn! k=0 n1 (αk) The result follows by substituting α=1 /2 and truncating the series after two terms.

Coin Perfection

Saturday, March 28th, 2009

Say you want to flip a perfect coin, but the only coins available are imperfect. First, let’s consider the case where we don’t know the details of the available coins, and would prefer not to measure them.

The simplest strategy is to flip a bunch of coins and determine whether the total number of heads is even or odd. As long as there’s a reasonable number of both heads and tails, the probability of an even number of heads will be close to one half.

Here is a measure of “coin perfection” which behaves additively with respect to the parity algorithm:

G(p)=log2 p1

I.e., if you have two coins with probabilities of heads p and q, and r is the probability that flipping both coins gives the same result, then

G(r)=G(p)+G(r)

since e G(r) =2 r1 =2 (pq+(1 p)(1 q))1 =2 pq+2 2 p2 q+2 pq1 =4 pq2 p2 q+1 =(2 p1 )(2 q1 ) =e G(p)e G(q)

In particular, G(0 )=G(1 )=0 and G(1 /2 )= as one would expect; flipping a deterministic coin gets you no closer to perfection, and flipping a single perfect coin makes the parity perfectly random regardless of the other coins.

If we know the exact probability of heads p, we can do better than this and produce a perfect coin in a finite expected number of flips. By information theory the best we can hope for is 1 /H(p) where

H(p)=plgp(1 p)lg(1 p)

is the entropy of the coin. Unfortunately determining the exact minimum appears rather complicated, so I’m not going to try to finish the analysis right now.

Multigrid is the future

Wednesday, March 11th, 2009

Currently, we (or at least I) don’t know how to do multigrid on general problems, so we’re stuck using conjugate gradient. The problem with conjugate gradient is that it is fundamentally about linear systems: given Ax=b, construct the Krylov subspace b,Ax,A 2 x, and pick out the best available linear combination. It’s all in terms of linear spaces.

Interesting human scale physics is mostly not about linear spaces: it’s about half-linear subspaces, or linear spaces with inequality constraints. As we start cramming more complicated collision handling into algorithms, these inequalities play a larger and larger role in the behavior, and linearizing everything into a CG solve hurts.

Enter multigrid. Yes, it’s theoretically faster, but the more important thing is that the intuition for why multigrid works is less dependent on linearity: start with a fine grid, smooth it a bit, and then coarsen to handle the large scale cheaply. Why shouldn’t this work on a system that’s only half linear? There are probably a thousand reasons, but happily I haven’t read enough multigrid theory yet to know what they are.

So, I predict that eventually someone will find a reasonably flexible approach to algebraic multigrid that generalizes to LCPs, and we’ll be able to advance beyond the tyranny of linearization.

Plates

Sunday, March 1st, 2009

Here’s a combinatorial problem I found recently: given a 100 story building and two identical plates, what is the worst case number of drops required to determine the lower floor from which the plates break when dropped (if a plate drops and does not break, it is undamaged). The answer is 14, which I’ll state without proof.

In the interest of completely useless generalization, we can ask what happens if we add more plates. Specifically, what is the cost C(s) to analyze an s story building if we can buy as many plates as we like? Say each plate costs as much as one drop.

First we need to figure out how many drops it takes for a given number of plates, or (roughly) equivalently the number of floors we can analyze for a given number of plates and drops. If we count the zeroth story as a floor to make the formulas slightly prettier, the answer is that n drops and k plates can analyze a building with S(n,k) floors, where S(n,k)=(n0 )+(n1 )++(nk)= pk(nk) (This can be proved by induction on the number of plates.) Thus, S(n,k) is the number of subsets of [1..n] with at most k elements, which turns out to have no closed form expression (Graham et al.). Therefore, we’ll have to settle for asymptotics. The best estimates I could find are given in Worsch 1994. Since they details vary depending on the relative growth of n and k, we first need a rough idea of the growth rate of n/k.

For a building of s floors, brute force binary search requires n=k=lgs+1 , so C(s)<2 +2 lgs and n,kn+k=C(s)<2 lgs. Since at least lgs tests are required regardless of the number of plates, we have lgsn<2 +2 lgs. Lacking an obvious lower bound for k, I’ll assume for the moment that k=Θ(lgs)=Θ(n). Numerical evidence indicates that n/k is in fact between 2 and 3.

In the case where n/k is a constant of at least 2, Worsch 1994 gives S(n,k)=(C(x)+O(1 )) nC(x) n where C(n/k)=C(x)=x 1 x(xx1 ) x1 x We can now use this approximation to find the optimal ratio x by optimizing n+k subject to S=s. For convenience, let D(x)=C(x)/C(x) be the logarithmic derivative of C. Applying Lagrange multipliers, we have E= n+kλ(C(nk) ns) En= 1 λ(nC n1 CD1 k+C nlogC) = 1 λC n(Dx+logC) Ek= 1 λnC n1 CDnk 2 = 1 λC nDx 2 Equating derivatives to zero and solving gives logC+Dx+Dx 2 =0 . Filling in the details, logC= 1 xlogx+x1 xlogxx1 = logx+1 xlog(x1 )log(x1 ) D= CC=1 x1 x 2 log(x1 )+1 x(x1 )1 x1 = 1 x 2 log(x1 ) logC+Dx+Dx 2 = logx+1 xlog(x1 )log(x1 )1 xlog(x1 )log(x1 ) = logx2 log(x1 )=0 logx= 2 log(x1 ) x= (x1 ) 2 0 = x 2 3 x+1 x= 3 +5 2 Thus, n2.62 k, S(n,n/2.62 )1.94 n, and C(s)(1 +1 /x)logs/logx1.44 lgs.

Sieving Primes 2

Saturday, January 10th, 2009

Success! I found a reference to Mertens’ theorem in a paper by Adam Kalai, which is exactly what I needed to analyze the complexity of the prime sieve from the previous post.

Alas, my previous complexity guess was wrong. The actual complexity is O(n 3 /2 /log 3 n). Apparently my numerical estimates didn’t go out far enough (also I was biased towards finding O(nlog mn)).

From before, the cost of the algorithm is O(C(n)) where C(n)= c<nπ(lpf(c)) where c<n means all composite numbers up to n. Note that any number with lpf(c)=p can be written as pm, where m is not divisible by any prime smaller than p. By Mertens’ theorem, the density of numbers not divisible by primes less than p is q<p(1 1 q) e γlogp We can now estimate C(n) as C(n)= c<nπ(lpf(c)) = p<nπ(p) k=p n/p1 ifq<p,qk p<nπ(p)(npp) q<p(1 1 q) e γ p<nπ(p)(npp)1 logp e γ 2 nπ(x)(nxx)1 logxdxlogx e γ 2 nxlog 3 x(nxx) e γ 2 nnx 2 log 3 x Since the integrand is O(n/log 3 n) for x>n 1 /3 , we get C(n)=O(n 3 /2 /log 3 n). On the other hand, the integrand is ω(n/log 3 n) for x<n/4 , so in fact C(n)=Θ(n 3 /2 /log 3 n).

This adds another entry to the set of listful Sieve of Eratostheness-like algorithms that turn out to cost as much as trial division (up to a polylog factor). It’d be cool if a faster list based algorithm existed, but I expect that the answer is probably no.

Sieving Primes

Saturday, January 3rd, 2009

Here’s an interesting lazy variant of the sieve of Eratosthenes by Nick Chapman, from a discussion on LTU:

    primes = 2 : diff [3..] (composites primes)

    composites (p:ps) = cs
        where cs = (p*p) : merge (map (p*) (merge ps cs)) (composites ps)

    merge (x:xs) (y:ys) | x<y           = x : merge xs (y:ys)
                        | otherwise     = y : merge (x:xs) ys

    diff (x:xs) (y:ys)  | x==y          = diff xs ys
                        | x<y           = x : diff xs (y:ys)
                        | x>y           = diff (x:xs) ys

In words, the algorithm computes the list of primes as all numbers which are not composites, and the computes the list of composites by extremely constructing and merging all prime factorizations.

As the discussion shows, this is also a great example of why laziness hasn’t caught on: no one could figure out how fast it was.

To see that the algorithm computes each prime p after finite time, note that all recursive calls are “upwards” towards higher prime factors. If q 2 >p, composites(q:...) won’t be expanded beyond the first term q 2 , and therefore for r>q composites(r:...) will never be called.

Only O(n) multiplications are used (one for each composite), so the cost is the construction and merging of the partial lists of composite numbers. The total runtime complexity of the algorithm is equal to the total number of terms expanded out of the following lists:

  1. composites(p:ps) for each prime p.
  2. Each mergepscs.
  3. Each map(p*)(mergepscs).

Say we evaluate all primes less than some integer n. The cost of (2) and (3) are at most a constant factor greater than the corresponding (1), so we need to compute the number of terms computed from each composites(p:...). At most one number >n will appear in each, so we can restrict considering to numbers <n. Since a given composite k<n appears in composites(p:...) iff all primes dividing k are p, the total cost due to k is roughly π(lpf(k)) where lpf(k) is the least prime factor of k.

Thus the complexity of the algorithm is given by the partial sums of π(lpf()) over the composite numbers. That is,

C(n)= n>kπ(lpf(k))

I can’t find much information about the lpf function online, and I didn’t manage to estimate it analytically. Numerically, it looks like C(n) is between O(nlog 2 n) and O(nlog 3 n).

So, not O(n), but O˜(n) is pretty good, and it’s cool that you can do it with lists alone.

Update: my complexity estimate is wrong. Here’s the correct analysis.

Balancing circles

Monday, December 15th, 2008

After extensive testing with glasses of water, I’ve determined that it’s impossible to stably balance one infinitely thin circle on top of another. This is true whether the circles are the same size or different, and does not depend on the mass distribution of the circles. This is because two circles intersect at either zero or two points (unless they are the same, and that still isn’t a stable configuration).

Interestingly, two circles seem to be the only two dimensional simple closed curves for which this property holds; any other pair of curves can be scaled or rotated until they intersect at four points, and I believe these intersections can be arranged so that their convex hull contains the center of mass (for balance). I spent a while on the plane trying to prove this, but no luck yet.

Matrix plus scalar

Wednesday, November 26th, 2008

If a is a scalar and M is a square matrix, it is very convenient to be able to write a+M. Usually people know immediately what this means, but are uneasy about “abusing” notation, so here’s the detailed justification for why this is perfectly legitimate:

Matrices should be considered first and foremost as linear transformations. You know what a matrix is if you know what it does to vectors. A scalar is also a linear transform on vectors: multiplying a scalar times a vector is a linear operation. Therefore, scalars can also be thought of as linear transformations, and therefore as matrices. It is immediate which matrix the scalar should be: the result of multiplying by a scalar k is that all components are scaled by k; the matrix that does this is just kI.

Here’s another way of saying that. Let A be an associative algebra with unity over a field K. We can define a homomorphism of algebras f:KA via f(k)=k1 A. f is an isomorphism since if jk, f(j)=f(k) implies 0 =(jk) 1 (jk)1 A=1 A, a contradiction. f is also the unique nontrivial homomorphism from K to A, since algebra homomorphisms must send 1 to either 0 or 1 . Therefore, there is a unique copy of K inside A, and we can identify K with that copy.

This is a general principle: no one writes (a+0 i)+z when a is real and z is complex, because the complex numbers are considered a superset of the reals. The same is true of matrices: once we identify scalars with scalar diagonal matrices, matrices are just another superset of the reals.

Singular values are not the magnitudes of eigenvalues

Sunday, November 23rd, 2008

A week ago someone asked whether the singular values of a general (real) matrix are the magnitudes of its eigenvalues. There are various ways to see that the answer is no, but here’s an amusingly nonconstructive proof:

Consider a random matrix A taken from GL(n) with some smooth distribution. With probability 1 all singular values of A will be unique. However, with nonzero probability A will be near a rotation matrix and will have a complex conjugate pair of eigenvalues with the same magnitude. Therefore, the singular values of A are not always the magnitudes of the eigenvalues.