## Archive for the ‘math’ Category

### Minimal continuous poker

Sunday, November 13th, 2011

Consider the following poker-like game, played with two players: Alice and Bob.  Bob posts a blind of 1.  Both players are dealt a single, continuous hand chosen uniformly at random from $\left[0,1\right]$.  Alice can fold, call, or raise any amount $b>0$ (calling means $b=0$).  Bob either calls or folds.

My original plan was to work out the Nash equilibrium for this game, and therefore derive interesting smooth curves describing the optimal way for Alice to bluff and generally obscure her hand.  It didn’t quite work out that way, but the result is still interesting.

Let $A$ be the optimal expected payoff for Alice, and $A(x)$ the optimal expected payoff given Alice’s specific hand $x$.  Clearly $A(x)$ is a weakly increasing function of $x$: if $x' > x$ but $A(x') < A(x)$, Alice can improve $A(x')$ to $A(x)$ by pretending she has hand $x$ and using the same strategy.  If $A(x) > 0$ for some $x$, so that Alice’s optimal strategy is better than folding, then Alice will never fold with that hand, since any nonzero probability of folding would reduce her utility.  Thus there is a threshold hand $x_t$ below which Alice always folds (technically, always “might as well” fold, but we’ll gloss over this detail for the moment), and above which Alice always calls but may vary the amount she raises.  A similar argument applied to Bob gives a threshold function $y_t(b)$ s.t. Bob folds if $y < y_t(b)$ and calls if $y > y_t(b)$.

These and other similar preliminaries aside, I spent a while banging my head against the wall before temporarily giving up and picking a simpler game.  Specifically, make the bet $b$ a fixed constant.  If $b=0$, Bob always checks, and Alice’s optimal strategy is (unsurprisingly) to call if $x > 1/2$.  This gives a payoff of $A = 1/4$.  If $b > 0$, Bob’s threshold function $y_t(b)$ is a constant $y_t$.  The values of $x_t$, $y_t$, and $A$ at the Nash equilibrium turn out to be

$\begin{aligned} x_t &= \frac{2+2b+b^2}{(2+b)^2} \\ y_t &= \frac{1+b}{2+b} \\ A &= \frac{1+b}{(2+b)^2} \end{aligned}$

A few notes: for any $b>0$, $x_t < y_t$, since Bob knows Alice’s threshold and gains nothing by calling with hands only slightly over $x_t$.  Interestingly, Alice’s payoff remains unchanged if she moves her threshold anywhere satisfying $x_t < y_t$; the particular value of $x_t$ only prevents Bob from increasing his utility with a different value of $y_t$.  However, the most important property of this result is that $A$ is a decreasing function of $b$.  If the bet is fixed, Alice would prefer it to be zero, so that $A = 1/4$.
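As a numerical sanity check, both players should be indifferent at their threshold hands at the equilibrium.  Here’s a quick Python sketch of that check (function names are mine):

```python
def thresholds(b):
    # claimed equilibrium thresholds for a fixed bet b
    x_t = (2 + 2*b + b*b) / (2 + b)**2
    y_t = (1 + b) / (2 + b)
    return x_t, y_t

def alice_payoff(x, b, y_t):
    # Alice bets b with hand x; Bob calls iff y > y_t.
    # Alice wins 1 on a fold, wins/loses 1+b when called.
    win_call = max(0.0, x - y_t)
    lose_call = 1 - max(x, y_t)
    return y_t + (1 + b) * (win_call - lose_call)

def bob_call_payoff(y, b, x_t):
    # Bob calls with hand y; Alice bets iff x > x_t, so x is uniform on (x_t, 1)
    p_win = max(0.0, y - x_t) / (1 - x_t)
    return (1 + b) * (2 * p_win - 1)

for b in [0.5, 1, 2, 5]:
    x_t, y_t = thresholds(b)
    # Alice is indifferent between folding (0) and betting at x = x_t
    assert abs(alice_payoff(x_t, b, y_t)) < 1e-12
    # Bob is indifferent between folding (-1) and calling at y = y_t
    assert abs(bob_call_payoff(y_t, b, x_t) - (-1)) < 1e-12
```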

Now back to the general game, where Alice chooses the bet.  We have $A \ge 1/4$, since Alice can choose to always call or fold.  Can she do better, varying the bet in some exotic fashion so as to milk more money out of Bob while keeping her hand disguised?  Well, no.  Update: After disproving a conjecture (see below), the answer is maybe.

Let’s see what happens if Bob follows the optimal strategy from the fixed $b$ game, calling if $y > (1+b)/(2+b)$ and folding otherwise.  Since Bob’s strategy is fixed, Alice can maximize her utility without any extra randomness, and we need only consider bets which are a deterministic function of $x$.  For $x < x_t$ she folds and $A(x) = 0$; for $x > x_t$ she bets $b = b(x)$ with payoff

$\begin{aligned} A(x) &= 1\cdot\Pr(y < y_t) + (1+b)\Pr(x > y > y_t) - (1+b)\Pr(y > \max(x, y_t)) \\ &= y_t + (1+b)\max(0, x-y_t) - (1+b)(1-\max(x, y_t)) \\ &= y_t + (1+b)\max(0, x-y_t) - (1+b) + (1+b)x + (1+b)\max(0, y_t-x) \\ &= y_t + (1+b)(x-1) + (1+b)|x-y_t| \end{aligned}$

Differentiating w.r.t. $b$,

$\begin{aligned} A(x)' &= x - 1 + y_t' + |x - y_t| + (1+b)\,\mathrm{sgn}(y_t - x)\,y_t' \\ y_t' &= \frac{1}{2+b} - \frac{1+b}{(2+b)^2} = \frac{1-y_t}{2+b} \end{aligned}$

If $x<{y}_{t}$, this becomes

$\begin{aligned} A(x)' &= x - 1 + y_t' + y_t - x + (1+b)y_t' \\ &= -1 + (2+b)(1-y_t)/(2+b) + y_t \\ &= -1 + 1 - y_t + y_t = 0 \end{aligned}$

so if $x<{y}_{t}$, Alice’s bet doesn’t matter.  If $x>{y}_{t}$, we have

$\begin{aligned} A(x)' &= x - 1 + y_t' + x - y_t - (1+b)y_t' \\ &= 2x - 1 - b y_t' - y_t \\ &= 2x - 1 - b(1-y_t)/(2+b) - y_t \\ &= 2x - 1 - \frac{b}{(2+b)^2} - \frac{1+b}{2+b} \\ &= 2x - 1 - \frac{2+4b+b^2}{(2+b)^2} \\ &= 2x - 2 + \frac{2}{(2+b)^2} \end{aligned}$

which approaches $2x - 2 < 0$ as $b \to \infty$.  Thus $A(x)' = 0$ iff $(2+b)^2 = 1/(1-x)$, or

$b = \frac{1}{\sqrt{1-x}} - 2$

Unfortunately for Bob, $A(x)' > 0$ at $b = 0$ iff $2x - 2 + 1/2 > 0$, i.e. iff $x > 3/4$.  Thus Alice can beat $A = 1/4$ by increasing her bet whenever $x > 3/4$.  Abandoning hand calculation and turning to Mathematica, it turns out Alice’s optimal strategy achieves $A = 7/24 \approx 0.29$.
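Here’s a numerical check of that value, assuming Bob plays the threshold strategy $y_t(b) = (1+b)/(2+b)$ and Alice checks up to $x = 3/4$, then raises $b = 1/\sqrt{1-x} - 2$ (Python, names mine):

```python
import math

def payoff(x, b):
    # Alice's expected payoff with hand x and bet b, against Bob's
    # threshold strategy y_t(b) = (1+b)/(2+b)
    yt = (1 + b) / (2 + b)
    return yt + (1 + b) * (x - 1) + (1 + b) * abs(x - yt)

def best_bet(x):
    # check (b = 0) up to 3/4, then raise b = 1/sqrt(1-x) - 2
    return 0.0 if x <= 0.75 else 1 / math.sqrt(1 - x) - 2

# midpoint-rule integral of A(x) over the calling region (1/2, 1);
# below 1/2 Alice folds for zero payoff
n = 100000
A = sum(payoff(0.5 + (i + 0.5) / (2 * n), best_bet(0.5 + (i + 0.5) / (2 * n)))
        for i in range(n)) / (2 * n)
print(A)  # ≈ 7/24 ≈ 0.2917
```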

So, we haven’t found the Nash equilibrium yet, but at least our minimal poker game isn’t (necessarily) incredibly boring.  More analysis is required.  For pretty pictures’ sake, here’s a plot of Alice’s outcome as $x$ and $b$ vary:

*[Figure: Alice’s expected outcome for hand $x$ and bet $b$]*

### Cars and logs

Sunday, October 16th, 2011

Consider $n$ cars arranged in random order along a road, all moving in the same direction.  Each car has its own distinct ideal speed, and will travel at that speed unless blocked by another car ahead.  No passing is allowed (this is a windy, mountain road, say).  After some time, the cars will have settled into a series of chains of cars, each composed of a bunch of cars trapped behind a slower car in the front of the chain.  The chains will be sorted in order of speed with the fastest chain in front.

The question: what is the expected number of chains?

Rather than giving the answer directly, I’ll describe the steps it took me to get to the exact answer, because I think they’re an interesting example of how one moves from guessing through approximation to precision.

1. So, let’s see…what about the slowest car?  The slowest car will be exactly in the middle on average, and will turn all the cars behind it into a single, gigantic chain.  Only $n/2$ cars remain in front, so the answer should be $O\left(\mathrm{log}n\right)$.

2. What’s the base?  $\log$ is concave down, so the slowest car landing near the front matters more than it landing near the back.  Thus the number of chains is probably smaller than $\log_2 n$, which is what we’d get if the slowest car were always exactly in the middle.  That means the base should be greater than 2.  There are only two sensible bases for logarithms: 2 and $e$, one of which is conveniently greater than 2.  So we expect the answer is about $\log n = \log_e n$.

3. The answer also has to be a rational number, so we need something rational close to $\log n$.  That usually means harmonic numbers: $H_n = 1 + 1/2 + \cdots + 1/n$.  To check: if there is one car, the number of chains is $1 = H_1$.  If there are two cars, the faster car will be in front half the time, so the expected number of chains is $\frac{1}{2}(1+2) = 3/2 = H_2$.  So far so good.

4. Now to prove it.  This step took me a rather long time given the simplicity of the answer (still an enjoyable way to highlight part of a hike, though).  Here’s the proof: Consider the $k$th car, counting from 1.  This car will be the leader of a chain (a dubious distinction) if and only if it’s slower than all cars in front.  But the relative speed order of the cars from 1 to $k$ is purely random, so this happens with probability $1/k$.  Summing over all positions of cars, we get ${H}_{n}$.
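The $H_n$ claim is easy to verify exhaustively for small $n$; here’s a quick Python check (helper names are mine):

```python
import itertools
from fractions import Fraction

def chain_count(order):
    # a car leads a chain iff it is slower than every car ahead of it;
    # order[0] is the frontmost car
    count, slowest = 0, None
    for speed in order:
        if slowest is None or speed < slowest:
            slowest, count = speed, count + 1
    return count

def expected_chains(n):
    # exact expectation, averaging over all n! orderings
    perms = list(itertools.permutations(range(n)))
    return Fraction(sum(chain_count(p) for p in perms), len(perms))

def harmonic(n):
    return sum(Fraction(1, k) for k in range(1, n + 1))

for n in range(1, 7):
    assert expected_chains(n) == harmonic(n)
```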

Fun.  This is one of the nicest examples I’ve seen of how natural logs crop up (cough) naturally in purely combinatorial settings.  Another cool example is the treap data structure (a combination of an ordered binary search tree and a random heap), where the analysis actually involves both the harmonic numbers and the sums of inverse squares (which converge to $\pi^2/6 = O(1)$).

Which raises the obvious followup question: is it possible to tweak the problem so that the expected number of chains is also asymptotic to ${\pi }^{2}/6$?  E.g., by allowing passing in certain situations?

### Tai chi and calculus

Monday, August 29th, 2011

Tai chi as a martial art seems far removed from actual fighting, since the motions are slow to the point of static.  However, a few years ago during a conversation with Debra Page we came up with a cool analogy to describe part of why studying tai chi is valuable to the full speed case.  I never got around to writing it up, so here goes: tai chi is like calculus.

The fundamental idea of calculus is that change is simpler when viewed in small amounts.  A smooth function, evaluated at a variety of points spaced a finite distance apart, can contain all manner of nonlinearities, jumps, wiggles, etc.  However, if you look close enough, every smooth function turns into the simplest possible case: a straight line.  The beauty is that knowledge gained by looking closely around each point can be extrapolated from infinitesimal up to finite to understand the global behavior of the function.

Now from calculus to physics: if we understand the behavior of a dynamical system for small intervals of time and space, calculus lets us jump from infinitesimal to finite to understand the behavior of the system for larger motions.  In fact, we can go one step further: d’Alembert’s principle tells us that in order to understand the infinitesimal behavior of a dynamical system, it suffices to understand the infinitesimal behavior of an equivalent static system.  To do this, we start with a snapshot of the position of a system at a given time.  In the general case our snapshot won’t be static since the forces won’t sum to zero, but we can compensate for this by subtracting so-called “inertial forces” from our system to balance the rest of the forces.  Once everything is in balance, and our snapshot has been reduced to a static system, we can read off the dynamical behavior immediately: the inertial forces turn out to be exactly mass times acceleration.

Now say we’re in a fast martial art, such as karate, and we’d like to understand the process of stepping from one stance to another.  We start with weight on two feet, and in one smooth motion remove the weight from one, move it through some chosen path to a new position, and redistribute weight into the new stance.  There are innumerable aspects we’d like to understand about this process: the energy required, its stability under unexpected external stimuli, the effort required to change direction if suddenly necessary, the amount of force that can be exerted in any particular direction at any point along the motion, etc.  Calculus teaches us that to gain this understanding, it suffices to study the behavior of the system over small periods in time and space.  Unfortunately, a human is a fairly high dimensional system, and the linear functions we have to understand are high dimensional as well.  It’s very difficult to analyze a high dimensional space while moving quickly between stances.

D’Alembert to the rescue: instead of applying calculus to the dynamical system, we move from one stance to another extremely slowly, taking time to analyze the linear space around each nearly static pose.  Additional muscle action may be required, corresponding to the inertial forces necessary to turn the dynamic case static.  Now that we have time, many of the questions we’d like to understand about the fast motion translate into experiments we’re almost forced to run: e.g., compensating for any wiggles from our path corresponds to running a stability analysis.  Want to know if it’s possible to change direction on a dime halfway through the motion?  Just stop in the middle, move a little bit in the given direction, and feel the floor to see if the required force is normal or tangential.  With enough repetition, and enough slowness, we’ll inevitably sample enough of each linear space to get an intuitive feel for the detailed structure of the motion, knowledge that can be applied directly if we ever need to scale back up to fast.

Caveat: my total hours of tai chi training can be counted on two hands, but Debra knows far more and liked this analogy, so hopefully this post is reasonable.

Tuesday, June 30th, 2009

Here’s a summary of the differences between typed functional languages and unsafe languages:

• Difficulty of easy things: Haskell ~ C
• Difficulty of hard things: Haskell < C
• Difficulty of impossible things: Haskell >> C

Kudos to anyone who knows what this means.

### A Wikipedia-Style Proof

Friday, April 3rd, 2009

We wish to show that for small $x$,

$\sqrt{1+x} \approx 1 + \frac{x}{2}$

To derive this formula, we use the well known fact that

$(1+x)^{\alpha} = \sum_{n=0}^{\infty} \frac{x^n}{n!} \prod_{k=0}^{n-1} (\alpha - k)$

The result follows by substituting $\alpha = 1/2$ and truncating the series after two terms.

### Coin Perfection

Saturday, March 28th, 2009

Say you want to flip a perfect coin, but the only coins available are imperfect. First, let’s consider the case where we don’t know the details of the available coins, and would prefer not to measure them.

The simplest strategy is to flip a bunch of coins and determine whether the total number of heads is even or odd. As long as there’s a reasonable number of both heads and tails, the probability of an even number of heads will be close to one half.

Here is a measure of “coin perfection” which behaves additively with respect to the parity algorithm:

$G\left(p\right)=-\mathrm{log}|2p-1|$

I.e., if you have two coins with probabilities of heads $p$ and $q$, and $r$ is the probability that flipping both coins gives the same result, then

$G(r) = G(p) + G(q)$

since

$\begin{aligned} e^{-G(r)} &= |2r-1| \\ &= |2(pq + (1-p)(1-q)) - 1| \\ &= |2pq + 2 - 2p - 2q + 2pq - 1| \\ &= |4pq - 2p - 2q + 1| \\ &= |(2p-1)(2q-1)| \\ &= e^{-G(p)} e^{-G(q)} \end{aligned}$

In particular, $G\left(0\right)=G\left(1\right)=0$ and $G\left(1/2\right)=\infty$ as one would expect; flipping a deterministic coin gets you no closer to perfection, and flipping a single perfect coin makes the parity perfectly random regardless of the other coins.
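Here’s a quick numerical check of the additivity (Python, names mine):

```python
import math

def G(p):
    # coin perfection G(p) = -log|2p - 1|
    return -math.log(abs(2 * p - 1))

def agree(p, q):
    # probability that two independent biased flips give the same result,
    # i.e. an even number of heads
    return p * q + (1 - p) * (1 - q)

p, q = 0.7, 0.6
r = agree(p, q)
assert abs(G(r) - (G(p) + G(q))) < 1e-12
print(G(p), G(q), G(r))
```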

If we know the exact probability of heads $p$, we can do better than this and produce a perfect coin in a finite expected number of flips. By information theory the best we can hope for is $1/H\left(p\right)$ where

$H\left(p\right)=-p\mathrm{lg}p-\left(1-p\right)\mathrm{lg}\left(1-p\right)$

is the entropy of the coin. Unfortunately determining the exact minimum appears rather complicated, so I’m not going to try to finish the analysis right now.
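For comparison, the classical von Neumann trick already produces a perfect coin in a finite expected number of flips, and doesn’t even need to know $p$; it just runs far below the entropy bound.  A quick Python sketch:

```python
import random

def von_neumann(flip):
    # flip in pairs: HT -> heads, TH -> tails, anything else retry;
    # HT and TH each occur with probability p(1-p), so the output is fair
    while True:
        a, b = flip(), flip()
        if a != b:
            return a

random.seed(0)
biased = lambda: random.random() < 0.7  # a coin with p = 0.7
mean = sum(von_neumann(biased) for _ in range(100000)) / 100000
print(mean)  # ≈ 0.5
```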

### Multigrid is the future

Wednesday, March 11th, 2009

Currently, we (or at least I) don’t know how to do multigrid on general problems, so we’re stuck using conjugate gradient. The problem with conjugate gradient is that it is fundamentally about linear systems: given $Ax = b$, construct the Krylov subspace $b, Ab, A^2 b, \dots$ and pick out the best available linear combination. It’s all in terms of linear spaces.

Interesting human scale physics is mostly not about linear spaces: it’s about half-linear subspaces, or linear spaces with inequality constraints. As we start cramming more complicated collision handling into algorithms, these inequalities play a larger and larger role in the behavior, and linearizing everything into a CG solve hurts.

Enter multigrid. Yes, it’s theoretically faster, but the more important thing is that the intuition for why multigrid works is less dependent on linearity: start with a fine grid, smooth it a bit, and then coarsen to handle the large scale cheaply. Why shouldn’t this work on a system that’s only half linear? There are probably a thousand reasons, but happily I haven’t read enough multigrid theory yet to know what they are.

So, I predict that eventually someone will find a reasonably flexible approach to algebraic multigrid that generalizes to LCPs, and we’ll be able to advance beyond the tyranny of linearization.

### Plates

Sunday, March 1st, 2009

Here’s a combinatorial problem I found recently: given a 100-story building and two identical plates, what is the worst case number of drops required to determine the lowest floor from which the plates break when dropped (if a plate is dropped and does not break, it is undamaged). The answer is 14, which I’ll state without proof.

In the interest of completely useless generalization, we can ask what happens if we add more plates. Specifically, what is the cost $C\left(s\right)$ to analyze an $s$ story building if we can buy as many plates as we like? Say each plate costs as much as one drop.

First we need to figure out how many drops it takes for a given number of plates, or (roughly) equivalently the number of floors we can analyze for a given number of plates and drops. If we count the zeroth story as a floor to make the formulas slightly prettier, the answer is that $n$ drops and $k$ plates can analyze a building with $S(n,k)$ floors, where

$S(n,k) = \binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{k} = \sum_{p \le k} \binom{n}{p}$

(This can be proved by induction on the number of plates.) Thus, $S(n,k)$ is the number of subsets of $[1..n]$ with at most $k$ elements, which turns out to have no closed form expression (Graham et al.). Therefore, we’ll have to settle for asymptotics. The best estimates I could find are given in Worsch 1994. Since the details vary depending on the relative growth of $n$ and $k$, we first need a rough idea of the growth rate of $n/k$.
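As a quick check, this formula reproduces the two-plate answer above; a Python sketch (helper names are mine):

```python
from math import comb

def S(n, k):
    # floors (including floor 0) distinguishable with n drops and k plates
    return sum(comb(n, p) for p in range(k + 1))

def min_drops(stories, plates):
    # smallest n with S(n, plates) >= stories + 1 (the +1 counts floor 0)
    n = 0
    while S(n, plates) < stories + 1:
        n += 1
    return n

assert S(14, 2) == 106            # covers floors 0..105, so 100 stories fit
assert min_drops(100, 2) == 14    # the answer stated without proof
assert min_drops(100, 1) == 100   # one plate forces linear search
print(min_drops(100, 2))
```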

For a building of $s$ floors, brute force binary search requires $n = k = \lg s + 1$, so $n, k \le n + k = C(s) < 2 + 2\lg s$. Since at least $\lg s$ tests are required regardless of the number of plates, we have $\lg s \le n < 2 + 2\lg s$. Lacking an obvious lower bound for $k$, I’ll assume for the moment that $k = \Theta(\lg s) = \Theta(n)$. Numerical evidence indicates that $n/k$ is in fact between 2 and 3.

In the case where $n/k$ is a constant of at least 2, Worsch 1994 gives $S(n,k) = (C(x) + O(1))^n \approx C(x)^n$, where $x = n/k$ and

$C(x) = x^{\frac{1}{x}} \left(\frac{x}{x-1}\right)^{\frac{x-1}{x}}$

We can now use this approximation to find the optimal ratio $x$ by minimizing $n + k$ subject to $S = s$. For convenience, let $D(x) = C'(x)/C(x)$ be the logarithmic derivative of $C$. Applying Lagrange multipliers, we have

$\begin{aligned} E &= n + k - \lambda\left(C\left(\frac{n}{k}\right)^n - s\right) \\ \frac{\partial E}{\partial n} &= 1 - \lambda\left(n C^{n-1} C D \frac{1}{k} + C^n \log C\right) = 1 - \lambda C^n (Dx + \log C) \\ \frac{\partial E}{\partial k} &= 1 + \lambda n C^{n-1} C D \frac{n}{k^2} = 1 + \lambda C^n D x^2 \end{aligned}$

Setting both derivatives to zero and eliminating $\lambda$ gives $\log C + Dx + Dx^2 = 0$. Filling in the details,

$\begin{aligned} \log C &= \frac{1}{x}\log x + \frac{x-1}{x}\log\frac{x}{x-1} = \log x + \frac{1}{x}\log(x-1) - \log(x-1) \\ D &= \frac{C'}{C} = \frac{1}{x} - \frac{1}{x^2}\log(x-1) + \frac{1}{x(x-1)} - \frac{1}{x-1} = -\frac{1}{x^2}\log(x-1) \end{aligned}$

so

$\begin{aligned} \log C + Dx + Dx^2 &= \log x + \frac{1}{x}\log(x-1) - \log(x-1) - \frac{1}{x}\log(x-1) - \log(x-1) \\ &= \log x - 2\log(x-1) = 0 \end{aligned}$

Hence $x = (x-1)^2$, i.e. $x^2 - 3x + 1 = 0$, giving

$x = \frac{3+\sqrt{5}}{2}$

Thus, $n \approx 2.62k$, $S(n, n/2.62) \approx 1.94^n$, and $C(s) \approx (1 + 1/x)\log s/\log C(x) \approx 1.44 \lg s$.
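The asymptotics can be sanity-checked against brute force, minimizing $n + k$ directly over all drop/plate combinations covering $s$ stories; the convergence to the limiting constants is slow, but the ratio already lands in the predicted range.  A Python sketch (names mine):

```python
from math import comb, ceil, log2

def S(n, k):
    # floors (including floor 0) covered by n drops and k plates
    return sum(comb(n, p) for p in range(min(n, k) + 1))

def best(s):
    # minimize n + k subject to S(n, k) >= s + 1;
    # binary search (n = k = ceil(lg(s+1))) guarantees a solution in range
    out = None
    for n in range(2 * ceil(log2(s + 1)) + 2):
        for k in range(n + 1):
            if S(n, k) >= s + 1:
                if out is None or n + k < out[0]:
                    out = (n + k, n, k)
                break
    return out

cost, n, k = best(10**6)
print(cost, n, k, n / k, cost / log2(10**6))
assert 2 <= n / k <= 3              # the ratio claimed from numerics
assert cost < 2 + 2 * log2(10**6)   # beats brute force binary search
```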

### Sieving Primes 2

Saturday, January 10th, 2009

Success! I found a reference to Mertens’ theorem in a paper by Adam Kalai, which is exactly what I needed to analyze the complexity of the prime sieve from the previous post.

Alas, my previous complexity guess was wrong. The actual complexity is $O\left({n}^{3/2}/{\mathrm{log}}^{3}n\right)$. Apparently my numerical estimates didn’t go out far enough (also I was biased towards finding $O\left(n{\mathrm{log}}^{m}n\right)$).

From before, the cost of the algorithm is $O(C(n))$ where

$C(n) = \sum_{c < n,\; c \text{ composite}} \pi(\mathrm{lpf}(c))$

the sum running over all composite numbers up to $n$. Note that any number with $\mathrm{lpf}(c) = p$ can be written as $pm$, where $m$ is not divisible by any prime smaller than $p$. By Mertens’ theorem, the density of numbers not divisible by primes less than $p$ is

$\prod_{q < p} \left(1 - \frac{1}{q}\right) = \Theta\left(\frac{1}{\log p}\right)$

We can now estimate $C(n)$ as

$C(n) = \sum_{c < n} \pi(\mathrm{lpf}(c)) \approx \sum_{p \le \sqrt{n}} \pi(p)\,\Theta\!\left(\frac{n}{p \log p}\right) = \Theta\left(\sum_{p \le \sqrt{n}} \frac{n}{\log^2 p}\right) = \Theta\left(\int_2^{\sqrt{n}} \frac{n}{\log^3 x}\,dx\right)$

Since the integrand is $O(n/\log^3 n)$ for $x > n^{1/3}$, we get $C(n) = O(n^{3/2}/\log^3 n)$. On the other hand, the integrand is $\Omega(n/\log^3 n)$ for $x < \sqrt{n/4}$, so in fact $C(n) = \Theta(n^{3/2}/\log^3 n)$.

This adds another entry to the set of listful Sieve of Eratosthenes-like algorithms that turn out to cost as much as trial division (up to a polylog factor). It’d be cool if a faster list based algorithm existed, but I expect the answer is no.

### Sieving Primes

Saturday, January 3rd, 2009

Here’s an interesting lazy variant of the sieve of Eratosthenes by Nick Chapman, from a discussion on LTU:

```haskell
primes = 2 : diff [3..] (composites primes)

composites (p:ps) = cs
  where cs = (p*p) : merge (map (p*) (merge ps cs)) (composites ps)

merge (x:xs) (y:ys) | x < y     = x : merge xs (y:ys)
                    | otherwise = y : merge (x:xs) ys

diff (x:xs) (y:ys) | x == y = diff xs ys
                   | x < y  = x : diff xs (y:ys)
                   | x > y  = diff (x:xs) ys
```

In words, the algorithm computes the list of primes as all numbers which are not composites, and then computes the list of composites by lazily constructing and merging the lists of multiples of each prime.

As the discussion shows, this is also a great example of why laziness hasn’t caught on: no one could figure out how fast it was.

To see that the algorithm computes each prime $p$ after finite time, note that all recursive calls are “upwards” towards higher prime factors. If $q^2 > p$, `composites (q:...)` won’t be expanded beyond the first term $q^2$, and therefore for $r > q$, `composites (r:...)` will never be called.

Only $O\left(n\right)$ multiplications are used (one for each composite), so the cost is the construction and merging of the partial lists of composite numbers. The total runtime complexity of the algorithm is equal to the total number of terms expanded out of the following lists:

1. `composites (p:ps)` for each prime $p$.
2. Each `merge ps cs`.
3. Each `map (p*) (merge ps cs)`.

Say we evaluate all primes less than some integer $n$. The cost of (2) and (3) is at most a constant factor greater than the corresponding (1), so we need to compute the number of terms expanded from each `composites (p:...)`. At most one number $> n$ will appear in each, so we can restrict consideration to numbers $< n$. Since a given composite $k < n$ appears in `composites (p:...)` iff all primes dividing $k$ are $\ge p$, the total cost due to $k$ is roughly $\pi(\mathrm{lpf}(k))$, where $\mathrm{lpf}(k)$ is the least prime factor of $k$ and $\pi$ is the prime counting function.

Thus the complexity of the algorithm is given by the partial sums of $\pi \left(\mathrm{lpf}\left(\cdot \right)\right)$ over the composite numbers. That is,

$C(n) = \sum_{k < n,\; k \notin \mathbb{P}} \pi(\mathrm{lpf}(k))$

I can’t find much information about the $\mathrm{lpf}$ function online, and I didn’t manage to estimate it analytically. Numerically, it looks like $C\left(n\right)$ is between $O\left(n{\mathrm{log}}^{2}n\right)$ and $O\left(n{\mathrm{log}}^{3}n\right)$.
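For anyone who wants to reproduce the numerics, here’s a Python sketch that computes $C(n)$ exactly using a least-prime-factor sieve (names mine):

```python
from bisect import bisect_right

def lpf_sieve(n):
    # least prime factor of every k < n; lpf[k] == k marks k prime
    lpf = list(range(n))
    for p in range(2, int(n ** 0.5) + 1):
        if lpf[p] == p:  # p is prime
            for m in range(p * p, n, p):
                if lpf[m] == m:
                    lpf[m] = p
    return lpf

def C(n):
    # sum of pi(lpf(k)) over composite k < n
    lpf = lpf_sieve(n)
    primes = [p for p in range(2, n) if lpf[p] == p]
    return sum(bisect_right(primes, lpf[k])   # bisect_right = pi(lpf(k))
               for k in range(4, n) if lpf[k] != k)

assert C(10) == 5  # composites 4, 6, 8, 9 contribute 1 + 1 + 1 + 2
for n in [10**3, 10**4, 10**5]:
    print(n, C(n))
```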

So, not $O\left(n\right)$, but $\stackrel{˜}{O}\left(n\right)$ is pretty good, and it’s cool that you can do it with lists alone.

Update: my complexity estimate is wrong. Here’s the correct analysis.