Copernican Principle
Copernican principle
Problem
Sampling \(m\) numbers \(X = \{x_1, \dots, x_m\}\)from a discrete uniform distribution \(U(1,N)\) where \(N\) is unknown. Let \(k\) be the largest number of the \(m\) samples. The problem is estimating \(N\).
Solution under the Copernican principle
Basic idea
To explain the basic idea, let \(m = 1\), then \(k = x_1\). Because \(P(n/4 < k < 3n/4) = 1/2\), we will get \(P(4k/3 < n < 4k) = 1/2\).
An another example, \(P(0 < k < n/2) = 1/2\), then \(P(n > 2k) = 1/2\).
Generalize the idea
The probability of all samples are not greater than \(kn/y\),
\[P(X \le kn/y) = P(k/y)^m = (k/y)^m = P(k \le kn/y)\]where \(k\) is the largest number in \(X\), and \(y\) is an arbitrary number greater than \(k\). Then,
\[P(N \ge y) = (k/y)^m\]Also, \(P(n = y) = P(n \ge y) - P(n \ge y + 1)\), then,
\[P(n = y | m, k) = (k/y)^m - (k/(y+1))^m, \quad y \ge k\] \[\begin{eqnarray} \mathrm{E}(n | m, k) &=& \sum_{i=k}^{\infty} i((\frac{k}{i})^m - (\frac{k}{i+1})^m) \\ &=& k^m\zeta(m, k) + k-1 \\ \mathrm{Var}(n | m, k) &=& \mathrm{E}(n^2 | m, k) - \mathrm{E}(n | m, k)^2 \\ &=& k^2 + 2k^m\zeta(m-1,k+1) - k^m\zeta(m, k+1) - \mathrm{E}(n | m, k)^2 \end{eqnarray}\]where
\[\zeta(s,a) = \sum_{k=0}^\infty \frac{1}{(k+a)^s}\]Examples of expectations and variances:
\[\begin{eqnarray} \mathrm{E}(n | 2, 100) \approx 199.502 &\quad& \mathrm{Var}(n | 2, 100) = \infty \\ \mathrm{E}(n | 3, 100) \approx 149.502 &\quad& \mathrm{Var}(n | 3, 100) \approx 7499.833 \\ \mathrm{E}(n | 6, 100) \approx 119.505 &\quad& \mathrm{Var}(n | 6, 100) \approx 599.883 \\ \mathrm{E}(n | 9, 100) \approx 112.007 &\quad& \mathrm{Var}(n | 9, 100) \approx 200.789 \\ \mathrm{E}(n | 100, 100) \approx 100.592 &\quad& \mathrm{Var}(n | 100, 100) \approx 0.960 \\ \end{eqnarray}\]The wolfram code to calculate the expectation and the variance.