This page contains a few theory/algorithm questions, some of which have answers.
If you have firefox or another MathML capable browser and you are not already viewing it, here is a much prettier version of this page generated with itex2MML.
The intersection of a regular language and a context free language is always context free. The question is whether the converse of this is true: if $L$ is a language such that the intersection of $L$ with every CFL is a CFL, is $L$ necessarily regular?
I believed this is true, but did not know how to prove it. $L$ must be a CFL since it is the intersection of itself and the language of all strings. Intuitively, a language which is a nonregular CFL must require some particular use of the stack for correct recognition, and two stacks cannot be simulated with one stack except in very special cases...
While writing the above paragraph, I just figured out one of the special cases: $L = \{a^n b^n | n \ge 0\}$. All intersections of $L$ and a CFL are equivalent to intersections of $L$ and regular languages, since PDAs are equivalent to finite automata when applied to alphabets with only one element. So the answer is no.
Theorem. Let $L$ be a language recognized by a Turing machine which always halts in time linear in the input size. Then $L$ is regular.
Proof:
Let $M$ be a TM, $L = L(M)$, $T(w)$ the time to halt on input $w$, and assume $T(w) \le c\left|w\right|$ for some integer $c$. Here we count only the number of
tape moves required, so recognizing or not recognizing the empty string takes zero time. We will construct an NFA $N$ which simulates the
action of $M$ on any substring where $M$ visits each cell at most $k$ times for some integer $k$ which will be specified later. The states of $N$ will
include $k$ copies of the states of $M$, corresponding to the state of the Turing machine all the times it visited the current cell, and $k$ bits
describing which direction the machine moved to or from the left each time. At each moment, $N$ nondeterministically guesses the values of the tape
during each visitation, checks that all transitions to and from the left cell over the history are consistent, and predicts the transitions to
the cell to the right. Some of the state copies may be the halt state if $N$ guesses that $M$ will visit a cell fewer than $k$ times.
Let q(M) be the number of states of $M$, so that $q = q(N) = 2^k q(M)^k$ is the number of states of $N$. Given an input string $w$, we can label each position in w visited at most $k$ times with the corresponding state of $N$. For any two positions with matching states, we can remove that substring of $w$ without affecting the execution history on the rest of the tape. By the pidgeon hole principle, any string $w$ containing at least $q$ positions visited at most $k$ times can be decomposed as $w = xaybz$, where $a$ and $b$ are single letters visited at most $k$ times with matching histories, and each letter in $x$ and $z$ is visited more than $k$ times except for up to $q$ positions total between $x$ and $z$. In the first case, $T(w) \ge (k+1)\left|w\right| - kq$, and in the second case we can remove $yb$ to get $w'=xaz$ with $T(w') \ge (k+1)\left|w'\right| - q - k$. In either case, any string $w$ can be reduced to a string $w'$ with $T(w') \ge (k+1)\left|w'\right| - q - k$ where $w$ and $w'$ have the same execution history at their left and right ends.
If the number of visits $M$ makes to a cell is bounded, we can exactly simulate $M$ with $N$ for some $k$. Otherwise, set $k = c$ and choose a string w = uav with some position a visited more than $2q+2c$ times. Applying the above reduction to $u$ and $v$, we get $w' = u'av'$ with
\[T(w') \ge (c+1) \left|u'\right| - q - c + 2q + 2c + (c+1) \left|v'\right| - q - c \ge (c+1)\left|w'\right| \]which is a contradiction. Thus $M$ can be simulated by a DFA, and $L$ is regular.
Say I have a large amount of information, $d$, and would like to give the information to someone else to store for me. I want to be able to compute a small hash value $h = H(d)$ that allows me to make an exponential number of challenge response queries proving the other party is still storing all the data (with large probability).
Note that if only a few challenge response iterations are required, it suffices to compute several cryptographic hashes of the data combined with a few random seed values, and store the hash values and seeds. If the hashed result of each hash was used as the next seed, and the hashes were used as challenges in reversed order, the cost of the storing the seeds is constant. A megabyte of this hash data would be sufficient to ask challenge questions every day for over 100 years if 160-bit SHA1 hashes were used, so this question is probably of purely academic interest.
Consider a set of points in the plane. We want to find two orthogonal lines which partition the set into four nearly equal
pieces (differing in size by at most 1). To avoid degeneracies, we assign points lying on the chosen lines to whichever side is convenient.
The following result can be found in
S. Roy and W. Steiger. Combinatorial and Algorithmic Implications of the Borsuk-Ulam
Theorem. In Fall Workshop on Computational Geometry, 2005:
Theorem. Let $P$ be a set of $n$ points in $\mathbb{R}^2$. Then an orthogonal equipartition of $P$ exists, and can be found in $O(n log n)$ time.
Proof:
Let $L$ be the set of oriented lines in the plane, and consider the function $f : S^1 \to L$ mapping an angle $\alpha$ to the line of angle $\alpha$
dividing $P$ into equal subsets. For any line $l$, let $P_L(l)$ and $P_R(l)$ be the sets of points to the left and right of $l$, and define
a function
The two lines $f(\alpha)$ and $f(\alpha + \pi/4)$ form an orthogonal equipartition iff $\left|g(\alpha)\right| \le 1$. Since $g(\alpha+\pi/4) = -g(\alpha)$ and $g$ has jumps of at most 2, such an $\alpha$ always exists by discrete continuity.
To find an orthogonal equipartition in $O(n log n)$ time, we perform a binary search on the set of angles formed by all pairs of points in the set. This takes $O(log n^2) = O(log n)$ iterations each of which costs $O(n)$ time to find the two median lines and evaluate $g$, for a total of $O(n log n)$ time.
The same paper also shows that any algorithm which operates by testing whether there exists orthgonal equipartitions through specific points must take at least $\Omega(n log n)$ time. However, this result does not apply to all algorithms, or even to the algorithm given in the above proof. In fact, the expensive part of the above algorithm, computing values of $g$, seems to involve a large amount of redundant work. In order to speed up this algorithm, we require the following conjecture, which is hopefully true.
Conjecture. Let $P$ be a set of $n$ points in $\mathbb{R}^2$ in general position, and $L$ the set of $n(n-1)/2$ lines between the points. Then for sufficiently small $r$, say $r = O(log n)$, the arrangement formed by choosing $O(r log r)$ random lines from $L$ has at most $n/r^2$ points of $P$ in each face of the arrangement, with probability at least $1/2$.
This is plausible since if we divide the partition the set of points into subsets of size $O(n/r^2)$, the random set of lines will hit each one of these subsets several times with probability at least $1/2$ (with appropriately chosen constants).
Assuming this conjecture is true, we can speed up the previous algorithm as follows. Set $r = log n$, $s = O(r log r)$, and choose an arrangement $A$ of $m$ lines between points of $P$ such that each face of A contains at most $n/r^2$ points in $P$. Guessing such a set of lines and computing the arrangement $A$ takes $O(m^2) = O(\polylog n)$ time, and locating each point of $P$ in the arrangement costs $O(n log m) = O(n log log n)$. This process can be repeated until the desired bound is achieved. We also compute and store the number of points in $P$ contained in each face.
By the zone theorem, every line touches at most $O(m)$ faces of $A$, containing at most $O(m n/r^2) = O(n log log n/log n)$ points of $P$. Thus, given any line, we can compute the number of points in $P$ above the line to a guaranteed accuracy of $O(n log log n/log n)$ in $O(m^2)$ time. Moreover, given a fixed angle, performing this approximate count for the line of that angle through every vertex in the arrangement takes $O(m^4) = O(\polylog n)$ time and gives us two lines bracketing the median line with at most $O(n log log n/log n)$ points between them. The exact median line can then be found using kth element selection, for a total cost of $O(n log log n/log n)$. Counting the number of points in a quadrant formed by two lines (i.e., computing $g(\alpha)$) has the same cost, so each step of the binary search in the previous algorithm takes time $O(n log log n/log n)$, for a total cost of $O(n log log n)$.
Thus, if the conjecture is true, an orthogonal equipartition can be found in $O(n log log n)$ time using a randomized algorithm.
Assume stars of radius $r$ are distributed throughout $\mathbb{R}^3$ with independent uniform density $\rho$. I.e., the expected number of star centers within a set $U$ is $\rho \left|U\right|$, and the distributions of stars in disjoint sets are independent.
Let $d_n = k n log n$, with the constant $k$ to be chosen later, and consider the spherical shell A centered at the origin with radii between $d_{n-1}$ and $d_n$. The set of stars within this shell will be opaque if their projection onto to the unit sphere centered at the origin is the entire sphere. With probability $\Omega(1)$, the number of stars in the shell is
\[\begin{aligned} c =& \rho \left|A\right| = \Omega(\rho (d_n^3 - d_{n-1}^3)) = \Omega(\rho (d_n-d_{n-1}) d_{n-1}^2) \\ =& \Omega(\rho (d_n - d_{n-1}) d_n^2) \end{aligned}\]The projection of each star has area $\Omega(r^2 / d_n^2)$ on the sphere, and we can pick $m = O(d_n^2 / r^2)$ points on the sphere s.t. the projection covers the sphere if there is a star center within $r/{2d_n}$ of each point. Since the probability of containing any such point is $\Omega(r^2 / d_n^2)$, the expected number of randomly placed stars required to cover all such points is $O(m log m)$. Thus with probability $\Omega(1)$, the sphere will be covered and the shell will be opaque if there are at least $O(m log m)$ stars in the shell. This requires
\[\begin{aligned} O(m log m) \lt& \Omega(c) \\ O(d_n^2/r^2 log (d_n/r)) \lt& \Omega(\rho (d_n-d_{n-1}) d_n^2) \\ O(r^{-2} log d_n) \lt& \Omega(\rho (d_n-d_{n-1})) \end{aligned}\]Plugging in $d_n = k n log n$ gives
\[\begin{aligned} O(r^{-2} log n) \lt& \Omega(k \rho log n) \\ k \lt& O(\rho / r^2) \end{aligned}\]A sufficiently large $k$ ensures that the probability that each such shell is opaque is $\Omega(1)$ (i.e., is bounded away from zero), so the probability that some shell is opaque is 1.
The assumptions on the distributions can probably be weakened to the assumption that the correlation between sets goes to zero as the separation between them increases (for example, this is true of a Poisson disk distribution where stars are not allowed to overlap), since projecting onto a sphere reduces this correlation as much as desired for sufficiently large log n.
Thus, in a static homogenous universe, you can only see finitely many stars.