I've been playing a lot of the card game Dominion recently and it has got me thinking about some statistics problems. The details don't matter, but in the game you always play with 10 out of the 25 possible action cards. These are chosen at random and so make it a different game every time you play.
When someone is first introduced to the game they will always be playing with 10 cards that they have never met before, but in their second game they will probably be playing with some cards that they met in the first. After relatively few games the new player will have seen most of the 25, but it usually takes quite a few games to see their last card. I spent the morning wondering about the average number of games until you have seen them all.
Even dealing with scaled down versions of the problem quickly leads to large probability trees, but after a bit of research I found out about a related problem called the Coupon Collector's Problem. In it a person is trying to collect k things from a collection of n things, but they come randomly. This has well worked out probabilities and for k = 25 and n = 25 it takes 96 cards, i.e. 10 games. As n grows the number of draws until you have them all is of the order nlog(n). This is actually really useful to know and it has lots of applications.
However it isn't quite the situation for Dominion since in every game you have 10 cards which are guaranteed to be distinct. This means that the 10 games suggested by the solution to the classic solution is only an upper bound. However a statistician named Wolfgang Stadje found an analytical solution to the problem in the 1990s and published a general result. He was considering buying packets of stickers for a collection where each packet was guaranteed to not contain duplicates. As someone who collects magic cards I think I'm going to make some use out of knowledge of this formula.
For the dominion case in particular, the probability that you get all 25 cards in k games is:
We want the value of k which gives at least 0.5 as a probability. Using Wolfram Alpha I evaluated this awful expression and the first value of k that brings it above that probability is 8. So 8 games are needed (and result in a 64% chance of having seen them all), which is indeed under the upper bound we layed out above. The larger the "packets of stickers" relative to the total number you are trying to collect, the further the actual the number of games you need is from the upper bound. To be 90% sure you have seen all of them you need 11 games and for 99% you need 16 games.