Archive for the ‘poker’ Category

Near optimal holdem play with one round of betting

Sunday, December 25th, 2011

Consider the following simplified version of Texas holdem, with two players Alice and Bob:

  1. Alice and Bob are each dealt two private cards.

  2. Alice posts a small blind of 1, Bob posts a big blind of 2.

  3. Alice either folds, calls, or raises by any amount 2.

  4. Bob either calls or folds.

  5. Five more shared cards are dealt, and the winner is determined as usual.

  6. Both Alice and Bob have infinite stack sizes, so only expected value matters.

Here’s Alice’s and Bob’s near optimal strategies if Alice only raises integers 40.  It turns out Alice can leverage her small blind, better advantage to gain an average of at least 0.39776:

one-round-strategy.txt

For Alice, the strategy is a linear combination of actions (‘f’ for fold or ‘b<n>’ for raise by n) for each hand.  For Bob, the strategy is the probability of calling with each hand and each bet by Alice.  These results are definitely optimal for integer bets 40 to within an expected equity of 1e-4, which I can say with reasonable confidence since it’s much easier to check Nash equilibrium computations then it is to perform them.  It’s also unlikely that Alice could gain all that much more by varying her bet further; she rarely uses the full range of betting options (raising by 40 only with AA and even then only 0.1% of the time, which is likely a numerical error), and I doubt that using fractional bets would dramatically increase her equity.

A few interesting aspects of the results:

  1. Alice never folds.  This is clearly an artifact of Bob’s inability to reraise.

  2. We have crazy bluffing!  With 63s Alice raises by 15 70% of the time!  63s is slightly worse equity-wise than 85o, but apparently the flush possibilities the bluff worthwhile.

  3. Alice’s strategy is extremely nonsmooth, presumably to hide information.  For example, with 87s Alice’s most likely raises are 13 (40% of the time), 16 (10% of the time), and 22 (29% of the time), with small probabilities in between and on either side.  In other words, remembering anything like the optimal strategy looks pretty hopeless (but let me know if you think otherwise!).

Overall, Alice tends to either flat call (around 43% of the time), raise a moderate amount between 4 and 7 (trailing off up to 16), or go all out with a raise of 20 or 21.  The plot follows.  Alice almost never raises by 2 or 3 or more than 23.

It turns out to be possible to compute the optimal strategy for both players by solving a single (though fairly large) linear program.  The case of a simple zero sum game with outcomes given by a payoff matrix is described on Wikipedia; the key is to let one of the players (Alice) rescale their probability vector so that their “expected” outcome ignoring the rescale is at least 1, then minimize the amount of scaling necessary (the sum of the elements of the vector).  A given strategy of Alice’s has utility at least 1 if the utility is at least one for each of Bob’s pure strategies, which gives one linear constraint for each of Bob’s strategies.

In our simplified poker game, the strategies for Alice and Bob can be expressed as a huge Cartesian product over the situations in which they have to decide.  Alice has a choice of bet for each hand she receives, and Bob has a choice of call/fold for each hand and value of Alice’s bet.  The rescaling trick still works in this more general context; we simply let Alice’s probability vector in each situation scale and constrain them to all have the same sum to provide a single global scale.  The constraint of “utility at least 1″ is more complicated, though, since the total number of Bob’s pure strategies is exponential due to the Cartesian product.  To avoid the exponential blowup, we formulate Bob’s response to Alice as its own nested linear program, which gives us a problem of the form

min xc Txs.t.Ax=b,y.Cy=d&Rightarrow;x TGy1

Here x and y are Alice’s and Bob’s strategies, Ax=b and Cy=d enforce that probabilities sum to a fixed scaled amount or one, respectively, and x TGy is Alice’s payoff for a given pair of strategies.  See the code for more details if you’re curious; it’s fairly well commented.

Now we get to the cool part: we can apply linear programming duality to turn the universal quantification in the above problem into an existential inequality, which turns the whole thing into a standard linear program.  Thanks to Tomas Tinoco De Rubira for this idea, after I naively asked if it was possible to express nested linear programs in his general convex programming package cvxpy.  Flipping back and forth between universal and existential quantification is the whole point of convex duality: you can prove that a linear program can’t possibly have value greater than v by exhibiting a single feasible solution to the dual program.  Strong duality has to hold to get an exact match, which we have in our case since the subprograms are always feasible.  Again, I’ll leave all the details to the comments in the code.

The unfortunate part is lack of sparsity: the payoff matrix G in the above convex program (which remains after we dualize into a linear program) has O(n 2m) nonzeros, where n=169 is the number of holdem hands and m is the number of Alice’s possible bets.  It’s possible to store the relevant information in only O(n 2+m) space, but impossible to compute a matrix vector multiply with it faster than O(n 2m) without using asymptotically faster (meaning slow) matrix multiplication.  If I write a bunch of structure-aware code it’s possible to take advantage of structure to solve the resulting KKT systems required for linear programming in low constant O(n 2m) time and O(n 2+m) space as well, but write now I’m expanding the whole thing out into an explicit sparse matrix and passing it to cvxopt, which causes a bunch of extra blowup via sparse factorization.  Bad.

Thus, without (fun but time consuming) structure awareness, my laptop is unable to scale beyond 40 bet options for Alice, and more importantly to the more interesting case of Bob retaliating with another round of betting.  However, the current structure wouldn’t suffer any additional cost for two rounds of betting each with 40=6 rounds of betting, so that’s something to try next.

Exact poker win probabilities

Monday, December 5th, 2011

Brief summary in case you just want the data: here’s a table of exact win/loss/tie probabilities for every pair of two card preflop hands in Texas holdem:

exact.txt

Eugene d’Eon and I have been playing around computing Nash equilibria for extremely simplified versions of heads-up Texas holdem poker.  For those who don’t know the details, in Texas holdem each player is dealt two cards face down, followed by five cards face up which are shared between all players (with betting at various points in between these cards).  More details are available on Wikipedia.

To keep the problem manageable, we considered only “preflop” betting, after the private hands have been dealt but before the shared cards are known.  As in the previous post, consider two players Alice and Bob, where Bob posts a blind of 1.  Alice looks at her two cards, and decides whether to bet or fold.  If Alice bets, Bob then looks at his cards and decides in turn.

In order to compute the Nash equilibrium, we need to know the probability of Alice or Bob winning for each pair of possible poker hands (there are 169 unique two card hands).  Initially, we were using the table from pokerstove.com for this purpose.  However, their data only has six significant digits, which, frankly, is not cool.  If we’re going to fritter away time analyzing simplified poker games, we might as well do an excellent job while we’re at it.

Therefore, I set out to compute the exact answers.  There are only a finite number of possible hands, after all, so each probability is a rational number.  And the number of hands isn’t too large: there are 48 cards left out of which we choose 5, so for each set of private cards we need to consider

(485)=1712304

possible sets of shared cards.  We also need to consider various ways the suits of the preflop cards can overlap.  For example, if you have a king and the opponent has an ace, it’s better to have different suits so that you win if your king hits a flush.  (This doesn’t matter too much in practice, but is everything if you want the exact numbers.)  Taking preflop suits into account adds a factor of between 1 and 7 depending on the pair of hands (1 for AKs vs. KQs, 7 for AKo vs. QJo).  There are (1692)=14196 pairs of hands, for a total of 20 to 170 billion games to loop over (the exact number turns out to be 87303531744 = 87.3 billion).  Assuming we can optimize comparison of two hands down to a few hundred cycles (say 400), this would require on the order of an hour on my 2.3 GHz laptop with four hyper-threaded cores.

That’s a while, so initially I thought I could massively speed up the computation by decoupling computation of flushes and straight flushes, the hands involving suit, from the remaining suit-agnostic hands.  An initial computation ignoring suits entirely (parallelized using OpenMP) managed all 14196 pairs of hands in only 1.5 seconds.  Unfortunately, when I went to add suit support, I realized that at the exact level suit isn’t decoupled from card value after all.  For example, the probability of a flush tie depends on the value of each players’ cards since it determines whether their cards will be higher than the five suited cards that appear.  Dealing with such intricacies with decoupled value and suit is possible, but would end up dramatically increasing the complexity of the code, raising the chance of a bug and lack of our treasured exactness.  Therefore, I gave up on decoupling and went back to full suit aware traversal.

52 cards cry out for packing into 64-bit integers as a bit set, providing a fast way to represent and compute with poker hands.  For example, straights can be detected by shifting our bit set four times and or’ing to get the set of values ignoring suit, then shifting 5 times and and’ing to check if we have 5 consecutive cards.  In order to decouple the evaluation of Alice and Bob, the core routine takes one player’s 7 total cards as input and produces an integer score representing the value of the hand, taking into account the relevant number of kickers for each.  This requires lexicographic comparison of the type of hand first, followed by one or more different aspects depending on the type of hand.  For examples, full houses are compared first by triple then by pair, and two pair is compared first by pair value and then by the value of the kicker.

Parallelized with OpenMP to take advantage of all available CPU cores, the full computation of exact win probabilities for all hands (“exact.txt”) took about 45 minutes on my laptop.

My initial code stored the bit vector in value-suit major order (four aces followed by four kings and so on down to twos). Unfortunately, value-major order prevented certain optimizations, such as detecting flushes in all suits using a simultaneous parallel prefix circuit instead of 4 separate ones, and also forced the use of 128 bits for the score value computed for each hand since the score needed two blocks of 52 bits each to represent two levels of lexicographic comparison (trips followed by pairs or pairs followed by kickers).  Switching to suit-major order (4 blocks of 13 bits), fixed both these problems, allowing the use of 32 bit ints for most of the score computation and the final result, and reduced the time required for exact.txt to 42.5 minutes.

Next up, GPUs.  Translating the score routine into naive OpenCL, I got only mild speedups of about 17% compared with 8 CPU cores.  Replacing all branching with OpenCL comparison, select, and max intrinsics reduced this by a further 21%.  Replacing scalar computation with four wide vector operations (easy given branch free code) gave another 17%.  exact.txt now took 16 minutes to compute (these numbers don’t quite add up, so my git logs are missing something).

The minor annoyance of the OpenCL translation was choosing how to do the reduction (summing up the number of wins and losses over all hands).  I eventually settled on the simple two phase scheme of reducing blocks of 256 entries down to 1 on the GPU, then completing the reduction on the CPU.  The major annoyance is the complete lack of OpenCL profiling tools for ATI chips on Mac, which means it’s highly likely further optimization possibilities exist.  One additional weirdness: if I do a bunch of OpenCL kernel invocations while simultaneously printing only a few hundred total lines to standard out, I get a 30% performance hit.  This is likely due to context switching on the GPU, but I have no evidence for such.

To shorten the rest of the story, remaining optimizations included memoizing over the various combinations of preflop suits, avoiding an unnecessary extra kernel invocation to deal with the fact that 1712304 isn’t a multiple of 256 (small kernel invocations turn out to be quite expensive), and shaving cycles off the hand score routine down to a total of about 174 operations per 7 card hand (ignoring vectorization).  The final bit twiddling, branch free, vectorized poker hand evaluation routine is here.  The full git repository is

https://github.com/girving/poker

5 minutes on my laptop to evaluate 175 billion 7 card hards means roughly 1/2 billion hand evaluations per second.  At roughly 175 operations per hand, our utilization is under 100 Gops.  AMD advertises 480-696 single precision Gflops for my card, so we’re significantly under peak.  Many of our operations are 64 bit, and they’re all integers, but this still hints at possible further speedup.  Nvidia seems to have way better profiling tools, so I may see if I can get those running on some of Amazon’s EC2 GPU instances.

Throughout the process of rewriting and optimizing the code, exact.txt never changed, which provides some confidence that it’s correct.  exact.txt also agrees with the pokerstove results to within their sadly lacking finite precision.

As an example of fun things we can say armed with exact win probabilities, Alice’s optimal equity in the simplified game mentioned above with a fixed bet level of 2 is exactly

630077851725712719561173829500.231684

Good to know.

Minimal continuous poker

Sunday, November 13th, 2011

Consider the following poker-like game, played with two players: Alice and Bob.  Bob posts a blind of 1.  Both players are dealt a single, continuous hand chosen uniformly at random from [0,1].  Alice can fold, call, or raise any amount b>0 (calling means b=0).  Bob either calls or folds.

My original plan was to work out the Nash equilibrium for this game, and therefore derive interesting smooth curves describing the optimal way for Alice to bluff and generally obscure her hand.  It didn’t quite work out that way, but the result is still interesting.

Let A be the optimal expected payoff for Alice, A(x) the optimal expected payoff given Alice’s specific hand x.  Clearly A(x) is a weakly increasing function of x; if x’ > x but A(x’) < A(x), Alice can improve A(x’) to A(x) by pretending she has hand x and using the same strategy.  If A(x) > 0 for some x, so that Alice’s optimal strategy is better than folding, then Alice will never fold since any nonzero probability of folding would reduce her utility.  Thus there is a threshold hand x t below which Alice always folds (technically, always “might as well” fold, but we’ll gloss over this detail for the moment), and above which Alice always calls but might vary the amount she raises by.  A similar argument applied to Bob gives a threshold function y t(b) s.t. Bob folds if y<y t(b) and calls if y>y t(b).

These and other similar preliminaries aside, I spent a while banging my head against the wall before temporarily giving up and picking a simpler game.  Specifically, make the bet b a fixed constant.  If b=0, Bob always checks, and Alice’s optimal strategy is (unsurprisingly) to call if x>1/2.  This gives a payoff of A=1/4.  If b>0, Bob’s function y t(b) is a constant. The values of x t, y t, and A at the Nash equilibrium turn out to be

x t= 2+2b+b 2(2+b) 2 y t= 1+b2+b A= 2+2b(2+b) 3

A few notes: for any b>0, x t<y t, since Bob knows Alice’s threshold and gains nothing by calling with hands only slightly over x t.  Interestingly, Alice’s payoff remains unchanged if she varies her threshold anywhere satisfying x t<y t; the particular value of x t is only prevents Bob from increasing his utility with a different value of y t.  However, the most important property of this result is that A is a decreasing function of b.  If the bet is fixed, Alice would prefer it to be zero so that A=1/4.

Now back the general game, where Alice chooses the bet.  We have A1/4, since Alice can choose to always call or fold.  Can she do better, varying the bet in some exotic fashion so as to milk more money out of Bob while keeping her hand disguised?  Well, no.  Update: After disproving a conjecture (see below), the answer is maybe.

Let’s see what happens if Bob follows the optimal strategy from the fixed b game, calling if y>(1+b)/(2+b) and folding otherwise. Since Bob’s strategy is fixed, Alice can maximize her utility without any extra randomness, and we need only consider bets which are a deterministic function of x.  For x<x t she folds and A(x)=0, for x>x t she bets b=b(x) with payoff

A(x)= 1Pr(y t>y)+(1+b)Pr(x>y>y t)(1+b)Pr(y>x,y t) = y t+(1+b)max(0,xy t)(1+b)(1max(x,y t)) = y t+(1+b)max(0,xy t)(1+b)+(1+b)x+(1+b)max(0,y tx) = y t+(1+b)(x1)+(1+b)|xy t|

Differentiating w.r.t. b,

A(x)= x1+y t+|xy t|+(1+b)sgn(y tx)y t y t= 12+b1+b(2+b) 2=1y t2+b

If x<y t, this becomes

A(x)= x1+y t+y tx+(1+b)y t = 1+(2+b)(1y t)/(2+b)+y t = 1+1y t+y t=0

so if x<y t, Alice’s bet doesn’t matter.  If x>y t, we have

A(x)= x1+y t+xy t(1+b)y t = 2x1by ty t = 2x1b(1y t)/(2+b)y t = 2x1b/(2+b)+b(1+b)/(2+b) 2(1+b)/(2+b) = 2x12b+b 2bb 2+2+3b+b 2(2+b) 2 = 2x12+4b+b 2(2+b) 2 = 2x2+2(2+b) 2

which is negative as b&rightarrow; and zero at b=1/(1x)2.

Thus A(x)=0 iff (2+b) 2=1/(1x), or

b=11x2

Unfortunately for Bob, A(x)>0 at b=0 iff 2x2+1/2>0 iff x>3/4.  Thus Alice can beat A=1/4 by increasing her bet whenever x>3/4.  Abandoning hand calculation and turning to Mathematica, it turns out Alice’s optimal strategy achieves A=7/24.29.

So, we haven’t found the Nash equilibrium yet, but at least our minimal poker game isn’t (necessarily) incredibly boring.  More analysis is required.  For pretty pictures’ sake, here’s a plot of Alice’s outcome as x and b vary:

Alice's expected outcome for hand x and bet b

Alice's expected outcome for hand x and bet b