
An even better strategy would be to weight your guesses towards the most information gain. If a letter appears often but is in the same position most of the time, getting it won't help you as much as getting a letter that appears in fewer words but in more varied positions. Doing a quick check of 5-letter words in my /usr/share/dict/words file, I find (counts by position):

S: 659, 62, 267, 260, 2282
E: 122, 829, 448, 1254, 679
A: 252, 1201, 678, 410, 307

That is, S, when it appears, is heavily skewed towards the last letter of the word; finding it won't help you figure out the word nearly as well as finding the more evenly distributed E or A.
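A quick sketch of how those positional counts can be computed (the function name is my own, and the sample words are just stand-ins for a real dictionary file):

```python
from collections import Counter

def positional_counts(words, letter):
    # Count how often `letter` appears at each position of a 5-letter word.
    counts = Counter()
    for w in words:
        for i, c in enumerate(w):
            if c == letter:
                counts[i] += 1
    return [counts[i] for i in range(5)]

# Running this over all 5-letter words in a real dictionary would
# reproduce the positional skew described above.
print(positional_counts(["early", "stone", "arose"], "e"))
```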



Doesn't this all assume that the opponent is choosing a random word? If they're trying to be evil, I'm sure they can pick words where each letter you get gives little information.

For example, which letter do you guess if the puzzle is _og? You still have to go through bcdfhjln even after you've burned however many guesses getting that o & g.


I once wrote a hangman game that did this in reverse. The computer made sure to make you as wrong as possible, while still ensuring your guesses were "correct".

It had one problem: it always tried to make you guess wrong, but by carefully choosing your letters you could force it to exclude tons of words because they contained the letter you guessed, leaving only a small number of possible words.

It needed a refinement to occasionally allow you to guess correctly if that maximized the number of possible words remaining that still fit the prior guesses.
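A minimal sketch of that "evil" host, with the refinement built in: partition the remaining candidates by the pattern a guess would reveal, then keep whichever group is largest, even when that means conceding a correct guess (the names here are my own, not the original code):

```python
from collections import defaultdict

def evil_response(candidates, guess):
    # Group the remaining candidate words by the positions the guess
    # would reveal, then keep the largest group -- even if that means
    # letting the guess be "correct", per the refinement above.
    groups = defaultdict(list)
    for w in candidates:
        pattern = tuple(i for i, c in enumerate(w) if c == guess)
        groups[pattern].append(w)
    pattern, remaining = max(groups.items(), key=lambda kv: len(kv[1]))
    return pattern, remaining

# Guessing "o" against these candidates concedes the letter at position 1,
# because that keeps three words alive instead of one.
pattern, remaining = evil_response(["dog", "log", "cat", "cot"], "o")
```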


This is a good insight. To formalize it: each question segments the words into 2^max_word_length groups of various sizes, and we seek to minimize the worst case number of questions still needed, regardless of which answer we get back.

Which, for the ispell word list, appears to be 'e' :( (edit: the same letter found by the suboptimal method in the article, and one that is absent from 25% of the words in the corpus)

code at http://pastie.org/3693750


Not quite; you don't want to minimize the worst case, but the expected case. That comes from maximizing the entropy of the result set ( https://en.wikipedia.org/wiki/Entropy_(information_theory) ).

To calculate this, take the sum of -p * log(p) for all patterns of the guessed letter, where p is the proportion of words in the candidate set that have this pattern.
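In code, that sum works out to something like this (a sketch; `guess_entropy` is my own name, and each pattern is keyed by the tuple of positions the letter would occupy):

```python
import math
from collections import Counter

def guess_entropy(candidates, letter):
    # Entropy (in bits) of the reveal pattern for `letter` over the
    # candidate set: p is the proportion of candidates sharing a pattern,
    # and we sum -p * log2(p) over all patterns.
    patterns = Counter(
        tuple(i for i, c in enumerate(w) if c == letter) for w in candidates
    )
    n = len(candidates)
    return -sum((k / n) * math.log2(k / n) for k in patterns.values())
```

The best guess is then the unguessed letter maximizing this value over the current candidate set.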


> Not quite; you don't want to minimize the worst case, but the expected case.

That depends on your opponent. In game theory you often assume that your opponent plays against you as well as they can, so you'd go with the worst case.


That's only true if the worst case still lets you win the game. Otherwise, your opponent will pick the worst-case word every time and you lose. Since it's a zero-sum game, if the players agree on a dictionary beforehand there is some Nash equilibrium; I don't have the skill to figure out what it would be, but there would have to be some nonzero probability of guessing every word in the dictionary before running out of guesses.


Interesting. Thanks for your comment here and the one above. Although I firmly believe that hangman is a game, and hence the assumptions of game theory are in full effect, I had not considered the information theory take on things :)

Since you asked about the Nash equilibrium for this case, here it is:

In the event that you do not have enough questions to uniquely identify every possible word (and assuming that words with more letters than guesses are not allowed, i.e. every word can be guessed with the right questions), the game becomes a matching game [http://en.wikipedia.org/wiki/Matching_pennies].

First both players will work out the minimum number of question paths that can identify every allowed word. Any word in more than one set will be ignored as a dominated strategy by the hosting player. Then the guessing player will pick randomly between question paths, and the hosting player will pick randomly among the disjoint sets of words at the end of those question paths, with the specific word not mattering.

You can see that once both players are picking between several options, with the entire game riding on their choice, the actual letter guessing becomes unimportant. The game itself resembles an unfair rock paper scissors.

If the players were betting $1 per round, the expected value to the guesser is ($2 divided by the number of disjoint sets) - $1.
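As a sanity check on that formula (with n disjoint sets, the guesser wins the stake with probability 1/n and loses it otherwise; the function is just my arithmetic, not anything from the thread):

```python
def guesser_ev(num_disjoint_sets, stake=1.0):
    # EV = (+stake) * (1/n) + (-stake) * (1 - 1/n) = 2*stake/n - stake
    n = num_disjoint_sets
    return 2 * stake / n - stake
```

So with one set the guesser always wins (+$1), with two sets the game is fair, and the guesser's edge shrinks as the number of disjoint sets grows.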


How does this work if there are multiple covering sets with the same (minimal) number of question paths? Does it matter which minimal covering set you evaluate, or does your strategy need to be some kind of superposition derived from all the minimal sets?

Also, if there are some words that are never picked by your strategy, doesn't that open the door for an alternative set of question paths that only covers the now-reduced dictionary? (cue reference to the Unexpected Hanging Paradox)

In many ways, the entropy is a heuristic for finding a good question path given the known probabilities for each word in the dictionary. The formula that I quoted above can be trivially extended for nonuniform distributions of the chosen words. It sounds like the Nash equilibrium would be designed such that (after eliminating dominated strategies), the expected information gain at each step would be the same for all question paths. I wouldn't want to have to prove it, though.


I buy that maximizing information gain is the same as the optimal strategy, so long as you are taking into account your opponent's optimal strategy with a weighted probability distribution.

The unexpected hanging paradox is a great link, thanks :) awesomely enough though, game theory handles this problem! This is the "why" behind minimax. If John Nash were sentenced by that judge, Nash would decide what day to expect by flipping a coin, and the judge would segfault :p

To answer your question though, if there are multiple paths to the same words (eg Tree(1) leads to Union(a,b), Tree(2) leads to Union(b,c), and Tree(3) leads to Union(a,c)), that surprisingly doesn't make the solution any harder... though my links above aren't particularly useful for it.

To find the optimal strategy for the player choosing the word: for every possible set of words (so all 2^words-1 options), calculate what the optimal strategy would be for your opponent if they were playing against someone picking randomly from that group, and then choose the group causing the lowest expected value for your opponent.

Make sense? [edit: NO! I didn't show that optimal would necessarily come from a uniform random selection from the word bag... and a proof doesn't jump to mind, though my intuition tells me so]

[edit 2: YES! Just assume that the strategy doesn't need to be uniform [though I still maintain it will be], and use gradient descent to find the optimal weighting]

You seem like the sort of person who would LOVE reading about the computational roshambo competition run out of UBC a while back (specifically the iocaine powder overview at http://webdocs.cs.ualberta.ca/~darse/rsb-results1.html). You might also like http://wiki.mafiascum.net/index.php?title=WIFOM

Also, if you haven't been corrupted by LessWrong yet I highly recommend it :p


That's not even close to giving a practical algorithm because (1) the set of words is huge and the powerset of words is far huger, and (2) you didn't describe how to find the optimal counter-strategy. And actually I don't think it's necessary to consider every subset of words.

The usual way to find a Nash equilibrium is with linear programming. But for that you have to construct a matrix of size #(deterministic strategies for player 1) * #(deterministic strategies for player 2). Player 1 does not have so many strategies (just the number of words in the dictionary), but player 2 has an enormous number of strategies.

So something else is needed. We can try to write an algorithm for finding the expected value of the optimal counter-strategy when player 1 is playing a mixed strategy; call it ev(s), where `s` is a game state. Then we would have ev(s) = max_{s' in successors of s} ev(s'). Perhaps this can be done using dynamic programming, but it could lead to far too many states being evaluated. If you could give an approximate upper bound, then perhaps some of the paths could be discarded early. In any case, the Nash equilibrium is then found by minimizing this ev over all mixed strategies of player 1, i.e. all probability distributions over the words. How one would do that is another tricky question.
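A toy version of that ev(s) recursion, memoized on the game state (all names are my own illustration; it scores a state as the guesser's win probability against a fixed mixed strategy over the words, and does not attempt the outer minimization):

```python
from functools import lru_cache

def best_counter_ev(word_probs, wrong_left,
                    alphabet="abcdefghijklmnopqrstuvwxyz"):
    # Win probability for the guesser playing the optimal counter-strategy
    # against a fixed mixed strategy `word_probs` (word -> probability).
    @lru_cache(maxsize=None)
    def ev(candidates, wrong, guessed):
        if len(candidates) == 1:
            return 1.0  # the pattern pins down the word: guesser wins
        total = sum(word_probs[w] for w in candidates)
        best = 0.0
        for g in alphabet:
            if g in guessed:
                continue
            # Split candidates by the positions `g` would reveal.
            groups = {}
            for w in candidates:
                pat = tuple(i for i, c in enumerate(w) if c == g)
                groups.setdefault(pat, []).append(w)
            if len(groups) == 1:
                continue  # no split: guessing g gains nothing
            val = 0.0
            for pat, grp in groups.items():
                p = sum(word_probs[w] for w in grp) / total
                if pat == ():  # a miss costs one wrong guess
                    sub = ev(tuple(grp), wrong - 1, guessed | {g}) \
                        if wrong > 0 else 0.0
                else:
                    sub = ev(tuple(grp), wrong, guessed | {g})
                val += p * sub
            best = max(best, val)
        return best

    return ev(tuple(sorted(word_probs)), wrong_left, frozenset())
```

Even this toy enumerates every reachable (candidate set, guessed set) pair, which is exactly the state blowup described above; the memoization only helps when different guess orders reach the same state.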

One positive note is that we can consider each word length separately, as the final Nash equilibrium would always choose words of a single length for player 1, i.e. the optimal strategy for hangman would always choose a word of length n (but we don't know n yet, and n will depend on the dictionary).


Sorry about that. I got too caught up in showing that there was in fact an optimal strategy, and that there were no hangman paradoxes. In response to (2), that was done in the GGP -- you select the question that, across all words, gives you the least distance from the answer (and if that doesn't cover every answer, then the minimal number of questions to do so). I posted Ruby code in the GGP for a heuristic version; the non-heuristic version would look similar.

What follows is my solution to this game after having a night to sleep on it. It's O(26^26 times words) [because it requires you to construct the optimal move for the guesser], so perhaps only of interest to myself.

Problem restatement: Imagine instead that we are playing a game where, for doors 1 through m, we must pick a door and hide some prize behind that door. Our opponent must then choose from a list of moves that open ranges of doors.

Solving that problem is relatively easy. First form the set of transitive closures on moves, such that any move that shares a door with another move is in the same closure. Next, for each closure, mark every door with the number of moves that can reach it, and progressively remove the highest-marked doors until doing so would leave no doors. If at any point this process leaves those moves no longer sharing a closure, re-form the closures, add them to the list, and process them again in this manner.

Observe that for any closure remaining, for every pair of doors (a,b) reachable by the same moves, we can discard the second without changing the optimal strategy (because any move the guesser made would be the same to us for both a and b).

Now observe that for any closure remaining, if the dictionary only included words reachable from that closure, picking with uniform randomness among the doors not removed by counting or pairing would be optimal (as the same number of moves can reach any remaining door). To prove that doors with higher marks would only decrease our EV, simply observe that if we did include any door with more marks, then the moves reaching it could be favored by the guesser to increase the guesser's chance of winning.

To complete the solution, we work out the expected value of playing each closure independently, and select between them by the ratio of their inverse EV with the whole. (So if we had two closures, one with EV -1 [if we could only choose from words behind this door we would always lose], and the other with EV -4/5 [all doors can be reached by 4 out of the 5 moves in this closure], we would pick between the two with a ratio of 4:5.)

For a proof that the above mixing is optimal, see the analysis for the game "Bluff".

Anyone who understood all that, lives in San Francisco, Santa Barbara, or San Diego, and would like to meet up to discuss economics, game theory, algorithms, machine learning, computer science, or world domination over coffee is cordially invited to email me at dls@lithp.org (yes, that's "lisp with a lisp"; perhaps puns should have been on that topic list as well :p)


That would be trading risk for gain.

Remember, if you pick a correct letter you will not lose any guesses; in other words, you get another guess if your guess is right. It's not about finding the word in a fixed number of guesses.

But otherwise you could also extend your method to increase the weight of letters appearing more than once in a word since that gives even more information gain.


I won't call out the company because, at least until now, they were still using the question (I got accidentally cc'ed on a subsequent candidate's email), but last summer I got an interview homework question from a "who is hiring" post that was to solve hangman for a given dictionary. I implemented all of the strategies outlined here, and while they seemed impressed with the execution speed of my solution, they implied it could have scored better, saying it was only "average" in that regard.

The thing is, at least for the dictionaries I had available, trying to weight for information gain rather than just guess correctness wasn't all that great a strategy. Even if you know with 100% certainty that a letter will be correct, it's still very likely to cut the search space down by virtue of the pattern of letters, and in aggregate my strategy made no statistical difference (+/- 1% depending on random seed).

If I were going to try to improve my scores, I guess the next thing would be to look at the distribution of letter patterns for each remaining letter, but my intuition is that it just wouldn't be a big deal.



