Archive for January, 2009

Sieving Primes 2

Saturday, January 10th, 2009

Success! I found a reference to Mertens’ theorem in a paper by Adam Kalai, which is exactly what I needed to analyze the complexity of the prime sieve from the previous post.

Alas, my previous complexity guess was wrong. The actual complexity is O(n 3 /2 /log 3 n). Apparently my numerical estimates didn’t go out far enough (also I was biased towards finding O(nlog mn)).

From before, the cost of the algorithm is O(C(n)) where C(n)= c<nπ(lpf(c)) where c<n means all composite numbers up to n. Note that any number with lpf(c)=p can be written as pm, where m is not divisible by any prime smaller than p. By Mertens’ theorem, the density of numbers not divisible by primes less than p is q<p(1 1 q) e γlogp We can now estimate C(n) as C(n)= c<nπ(lpf(c)) = p<nπ(p) k=p n/p1 ifq<p,qk p<nπ(p)(npp) q<p(1 1 q) e γ p<nπ(p)(npp)1 logp e γ 2 nπ(x)(nxx)1 logxdxlogx e γ 2 nxlog 3 x(nxx) e γ 2 nnx 2 log 3 x Since the integrand is O(n/log 3 n) for x>n 1 /3 , we get C(n)=O(n 3 /2 /log 3 n). On the other hand, the integrand is ω(n/log 3 n) for x<n/4 , so in fact C(n)=Θ(n 3 /2 /log 3 n).

This adds another entry to the set of listful Sieve of Eratostheness-like algorithms that turn out to cost as much as trial division (up to a polylog factor). It’d be cool if a faster list based algorithm existed, but I expect that the answer is probably no.

Rmvng ll vwls

Wednesday, January 7th, 2009

whl g w wr dscssng th vrs trnsfrmtns y cn pply t txt wtht mkng t nrdbl.  Fr xmpl,  sm t rcll xmpls shwng tht txt wth ll bt th frst nd lst lttr f ch wrd rrrngd s stll rdbl.  Smn ls rclld sng rdbl xmpl wth ll vwls rmvd, bt thght t mght hv bn spclly cnstrctd fr tht prps.  Thrfr,  md  nt t try strppng th vwls t f  lrg blck f txt.  Ths s tht txt.

Cnclsn: nt vry rdbl t ll.

Sieving Primes

Saturday, January 3rd, 2009

Here’s an interesting lazy variant of the sieve of Eratosthenes by Nick Chapman, from a discussion on LTU:

    primes = 2 : diff [3..] (composites primes)

    composites (p:ps) = cs
        where cs = (p*p) : merge (map (p*) (merge ps cs)) (composites ps)

    merge (x:xs) (y:ys) | x<y           = x : merge xs (y:ys)
                        | otherwise     = y : merge (x:xs) ys

    diff (x:xs) (y:ys)  | x==y          = diff xs ys
                        | x<y           = x : diff xs (y:ys)
                        | x>y           = diff (x:xs) ys

In words, the algorithm computes the list of primes as all numbers which are not composites, and the computes the list of composites by extremely constructing and merging all prime factorizations.

As the discussion shows, this is also a great example of why laziness hasn’t caught on: no one could figure out how fast it was.

To see that the algorithm computes each prime p after finite time, note that all recursive calls are “upwards” towards higher prime factors. If q 2 >p, composites(q:...) won’t be expanded beyond the first term q 2 , and therefore for r>q composites(r:...) will never be called.

Only O(n) multiplications are used (one for each composite), so the cost is the construction and merging of the partial lists of composite numbers. The total runtime complexity of the algorithm is equal to the total number of terms expanded out of the following lists:

  1. composites(p:ps) for each prime p.
  2. Each mergepscs.
  3. Each map(p*)(mergepscs).

Say we evaluate all primes less than some integer n. The cost of (2) and (3) are at most a constant factor greater than the corresponding (1), so we need to compute the number of terms computed from each composites(p:...). At most one number >n will appear in each, so we can restrict considering to numbers <n. Since a given composite k<n appears in composites(p:...) iff all primes dividing k are p, the total cost due to k is roughly π(lpf(k)) where lpf(k) is the least prime factor of k.

Thus the complexity of the algorithm is given by the partial sums of π(lpf()) over the composite numbers. That is,

C(n)= n>kπ(lpf(k))

I can’t find much information about the lpf function online, and I didn’t manage to estimate it analytically. Numerically, it looks like C(n) is between O(nlog 2 n) and O(nlog 3 n).

So, not O(n), but O˜(n) is pretty good, and it’s cool that you can do it with lists alone.

Update: my complexity estimate is wrong. Here’s the correct analysis.