# Rao-Blackwell theorem

The Rao-Blackwell theorem provides a process by which a possible improvement in the efficiency of an estimator is obtained by taking its conditional expectation with respect to a sufficient statistic.

## Definition

Let $$x=(x_1,\ldots,x_n)$$ be a random sample from a probability distribution $$p(x,\theta)$$ where $$\theta=(\theta_1,\ldots,\theta_q)$$ is an unknown vector parameter. Consider an estimator $$t(x)=(t_1(x),\ldots,t_q(x))$$ of $$\theta$$ and the $$q\times q$$ mean square and product matrix $$C(t)=(c_{ij})\ ,$$ where

$c_{ij}=E\left[(t_i(x)-\theta_i)(t_j(x)-\theta_j)\right].$

Let $$S$$ be a statistic, which may be vector-valued, such that the conditional expectation, $$E(t|S)=T(x)\ ,$$ is independent of $$\theta\ .$$ A general version of the Rao-Blackwell theorem (or Rao-Blackwellization) is

$C(t)-C(T)\,\,\text{is nonnegative definite}$

where $$C(T)$$ is the corresponding $$C$$ matrix for $$T\ .$$

If $$S$$ is a sufficient statistic, the condition that $$T$$ is independent of $$\theta$$ is automatically satisfied. A consequence of this result is $$E(t_r-\theta_r)^2\ge E(T_r-\theta_r)^2\ ,$$ so that there is an improvement in efficiency, in terms of mean-squared-error loss, in replacing $$t_r$$ by $$T_r\ .$$ A more general version of the theorem is

$E\left[\Phi(t-\theta)\right]\ge E\left[\Phi(T-\theta)\right]$

where $$\Phi$$ is a convex function. In particular

$E\left[\Phi(t_r-\theta_r)\right]\ge E\left[\Phi(T_r-\theta_r)\right],\,r=1,\ldots,q$

for any univariate convex function $$\Phi\ .$$ If $$t_r$$ is unbiased for $$\theta_r\ ,$$ then $$T_r=E(t_r|S)$$ is unbiased for $$\theta_r$$ and has smaller variance. $$T_r$$ is the unique minimum variance unbiased estimator of $$\theta_r$$ if no function of $$S$$ other than zero has expectation zero for all $$\theta$$ (that is, if $$S$$ is complete) (Rao, 1946).
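As a concrete illustration, consider the classical Poisson example (a simulation sketch, with parameters chosen here for illustration): to estimate $$e^{-\theta}$$ from a Poisson($$\theta$$) sample, the crude unbiased estimator $$t=1\{x_1=0\}$$ is conditioned on the sufficient statistic $$S=x_1+\ldots+x_n\ ,$$ giving the Rao-Blackwellized estimator $$T=E(t|S)=((n-1)/n)^S\ .$$

```python
import numpy as np

# Simulation sketch (illustrative parameters): estimate e^{-theta} from a
# Poisson(theta) sample of size n.
rng = np.random.default_rng(0)
theta, n, reps = 1.0, 10, 100_000
target = np.exp(-theta)

x = rng.poisson(theta, size=(reps, n))
S = x.sum(axis=1)                    # sufficient statistic
t = (x[:, 0] == 0).astype(float)     # crude unbiased estimator 1{x_1 = 0}
T = ((n - 1) / n) ** S               # Rao-Blackwellized: E(t | S), still unbiased

print(t.mean(), T.mean(), target)    # both close to e^{-theta}
print(t.var(), T.var())              # T has much smaller variance
```

The identity $$E(t|S=s)=(1-1/n)^s$$ follows because, given $$S=s\ ,$$ the counts $$(x_1,\ldots,x_n)$$ are multinomial with equal cell probabilities $$1/n\ .$$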

## History

The result for one parameter appeared in Rao (1945) and in Blackwell (1947). Lehmann and Scheffé (1950) named the result the Rao-Blackwell theorem (RBT), and the process was described as Rao-Blackwellization (RB) by Berkson (1955). In computational terminology it is called the Rao-Blackwellized filter (RBF).

## Applications

If $$x_1,\ldots,x_n$$ are independent observations from any distribution, the order statistics $$x_{(1)},\ldots,x_{(n)}$$ are sufficient. If there is an estimator $$f(x_1)$$ which depends, say, only on the first observation, its expectation conditional on the order statistics is $$[f(x_1)+\ldots+f(x_n)]/n\ ,$$ which has smaller variance than $$f(x_1)\ .$$ This also shows that a good estimator should be a symmetric function of the observations. An example given by Kolmogorov is the estimation of the probability that an observation from a distribution does not exceed a certain value, say $$c\ .$$ An estimator is

$f(x_1,c) = \begin{cases} 1, & \text{if } x_1 \le c\\ 0, & \text{otherwise} \end{cases}$

Taking the average over the order statistics, we have the estimate

$T(x,c)=\frac{N_<}{n}\ ,$ where $$N_<$$ is the number of observations not exceeding $$c\ .$$ This estimate is the empirical distribution function evaluated at $$c\ .$$
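A simulation sketch of this example (uniform data assumed here purely for illustration) compares the single-observation indicator with its Rao-Blackwellized version, the empirical distribution function at $$c\ :$$

```python
import numpy as np

# Simulation sketch of the Kolmogorov example, with uniform(0, 1) data
# assumed for illustration, so that P(x <= c) = c.
rng = np.random.default_rng(1)
n, reps, c = 20, 100_000, 0.5
x = rng.uniform(size=(reps, n))

t = (x[:, 0] <= c).astype(float)   # indicator based on the first observation only
T = (x <= c).mean(axis=1)          # average over the order statistics: N_< / n

print(t.mean(), T.mean())          # both unbiased for P(x <= c) = 0.5
print(t.var(), T.var())            # var(T) is roughly var(t)/n
```

Since the observations are i.i.d. here, the variance drops by a factor of $$n\ ,$$ exactly as averaging $$f(x_1)$$ over the order statistics suggests.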

Rao-Blackwellization is extensively used in making estimates from sample survey data, as illustrated in papers by Casella and Robert (1996) and Thompson (2002). For instance, in adaptive sampling, observations are obtained in a sequence, with the selection of each unit depending on some characteristics of previous observations. It may be simpler to construct an estimator which depends on the order of the observations drawn. In such a case, the Rao-Blackwellized estimator is obtained by averaging the estimator over all possible orders, resulting in an improved estimator. An interesting application is due to Efron (2004), where a Rao-Blackwellized version of cross-validation is used to estimate prediction error. Further applications can be found in the engineering literature on switching dynamic systems, where Rao-Blackwellization is used (as the Rao-Blackwellized filter) in conjunction with Markov chain Monte Carlo (MCMC) methods.
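The idea of averaging over draw orders can be sketched in a toy setting (the population values, sample size, and estimators below are hypothetical, not taken from the cited papers): draw $$n$$ units in order from a small population, start from an order-dependent unbiased estimator of the population mean, and Rao-Blackwellize by averaging it over all orderings of the unordered sample.

```python
import numpy as np
from itertools import permutations

# Hypothetical toy example: estimate the mean of a small population from an
# ordered draw of n units without replacement.
rng = np.random.default_rng(2)
y = np.array([1.0, 3.0, 4.0, 8.0, 9.0])   # population values (illustrative)
N, n, reps = len(y), 2, 20_000
mu = y.mean()

t_vals, T_vals = [], []
for _ in range(reps):
    draw = rng.choice(N, size=n, replace=False)   # ordered draw of unit labels
    t_vals.append(y[draw[0]])                     # depends on the draw order
    # Rao-Blackwellize: average the order-dependent estimator over every
    # ordering of the unordered sample (the "reduced data").
    T_vals.append(np.mean([y[p[0]] for p in permutations(draw)]))

t, T = np.array(t_vals), np.array(T_vals)
print(t.mean(), T.mean(), mu)   # both unbiased for the population mean
print(t.var(), T.var())         # the order-averaged estimator has smaller variance
```

With $$n=2$$ the averaged estimator is simply the sample mean of the two drawn values, which no longer depends on the draw order.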