Near optimal holdem play with one round of betting

Mon, Dec 26, 2011 code, economics, games, poker, duality, linear program, Nash equilibria, poker

Consider the following simplified version of Texas holdem, with two players Alice and Bob:

Alice and Bob are each dealt two private cards.
Alice posts a small blind of 1, Bob posts a big blind of 2.
Alice either folds, calls, or raises by any amount $\ge 2$.
Bob either calls or folds.
Five more shared cards are dealt, and the winner is determined as usual.
Both Alice and Bob have infinite stack sizes, so only expected value matters.

Here’s Alice’s and Bob’s near optimal strategies if Alice only raises integers $\le 40$. It turns out Alice can leverage her small blind, better advantage to gain an average of at least 0.39776:

one-round-strategy.txt

For Alice, the strategy is a linear combination of actions (‘f’ for fold or ‘b’ for raise by n) for each hand. For Bob, the strategy is the probability of calling with each hand and each bet by Alice. These results are definitely optimal for integer bets $\le 40$ to within an expected equity of 1e-4, which I can say with reasonable confidence since it’s much easier to check Nash equilibrium computations then it is to perform them. It’s also unlikely that Alice could gain all that much more by varying her bet further; she rarely uses the full range of betting options (raising by 40 only with AA and even then only 0.1% of the time, which is likely a numerical error), and I doubt that using fractional bets would dramatically increase her equity.

A few interesting aspects of the results:

Alice never folds. This is clearly an artifact of Bob’s inability to reraise.
We have crazy bluffing! With 63s Alice raises by 15 70% of the time! 63s is slightly worse equity-wise than 85o, but apparently the flush possibilities the bluff worthwhile.
Alice’s strategy is extremely nonsmooth, presumably to hide information. For example, with 87s Alice’s most likely raises are 13 (40% of the time), 16 (10% of the time), and 22 (29% of the time), with small probabilities in between and on either side. In other words, remembering anything like the optimal strategy looks pretty hopeless (but let me know if you think otherwise!).

Overall, Alice tends to either flat call (around 43% of the time), raise a moderate amount between 4 and 7 (trailing off up to 16), or go all out with a raise of 20 or 21. The plot follows. Alice almost never raises by 2 or 3 or more than 23.

It turns out to be possible to compute the optimal strategy for both players by solving a single (though fairly large) linear program. The case of a simple zero sum game with outcomes given by a payoff matrix is described on Wikipedia; the key is to let one of the players (Alice) rescale their probability vector so that their “expected” outcome ignoring the rescale is at least 1, then minimize the amount of scaling necessary (the sum of the elements of the vector). A given strategy of Alice’s has utility at least 1 if the utility is at least one for each of Bob’s pure strategies, which gives one linear constraint for each of Bob’s strategies.

In our simplified poker game, the strategies for Alice and Bob can be expressed as a huge Cartesian product over the situations in which they have to decide. Alice has a choice of bet for each hand she receives, and Bob has a choice of call/fold for each hand and value of Alice’s bet. The rescaling trick still works in this more general context; we simply let Alice’s probability vector in each situation scale and constrain them to all have the same sum to provide a single global scale. The constraint of “utility at least 1” is more complicated, though, since the total number of Bob’s pure strategies is exponential due to the Cartesian product. To avoid the exponential blowup, we formulate Bob’s response to Alice as its own nested linear program, which gives us a problem of the form

$$\min_x c^T x \quad s.t. \quad Ax=b, \forall y . C y = d \implies x^T G y \ge 1$$

Here $x$ and $y$ are Alice’s and Bob’s strategies, $Ax=b$ and $Cy = d$ enforce that probabilities sum to a fixed scaled amount or one, respectively, and $x^T G y$ is Alice’s payoff for a given pair of strategies. See the code for more details if you’re curious; it’s fairly well commented.

Now we get to the cool part: we can apply linear programming duality to turn the universal quantification in the above problem into an existential inequality, which turns the whole thing into a standard linear program. Thanks to Tomas Tinoco De Rubira for this idea, after I naively asked if it was possible to express nested linear programs in his general convex programming package cvxpy. Flipping back and forth between universal and existential quantification is the whole point of convex duality: you can prove that a linear program can’t possibly have value greater than $v$ by exhibiting a single feasible solution to the dual program. Strong duality has to hold to get an exact match, which we have in our case since the subprograms are always feasible. Again, I’ll leave all the details to the comments in the code.

The unfortunate part is lack of sparsity: the payoff matrix $G$ in the above convex program (which remains after we dualize into a linear program) has $O(n^2 m)$ nonzeros, where $n = 169$ is the number of holdem hands and $m$ is the number of Alice’s possible bets. It’s possible to store the relevant information in only $O(n^2 + m)$ space, but impossible to compute a matrix vector multiply with it faster than $O(n^2 m)$ without using asymptotically faster (meaning slow) matrix multiplication. If I write a bunch of structure-aware code it’s possible to take advantage of structure to solve the resulting KKT systems required for linear programming in low constant $O(n^2 m)$ time and $O(n^2 + m)$ space as well, but write now I’m expanding the whole thing out into an explicit sparse matrix and passing it to cvxopt, which causes a bunch of extra blowup via sparse factorization. Bad.

Thus, without (fun but time consuming) structure awareness, my laptop is unable to scale beyond 40 bet options for Alice, and more importantly to the more interesting case of Bob retaliating with another round of betting. However, the current structure wouldn’t suffer any additional cost for two rounds of betting each with $\lfloor \sqrt{40} \rfloor = 6$ rounds of betting, so that’s something to try next.