Consider a set of independent and identically distributed samples \(\{x^{(1)},\dots,x^{(m)}\}\) drawn from a Gaussian distribution \(p(x^{(i)})=\mathcal{N}(x^{(i)};\mu,\sigma^2)\), where \(i \in \{1,\dots,m\}\).

1. Sample variance

The sample variance is defined as

\[ \hat{\sigma}^{2}_{m} = \frac{1}{m}\sum_{i=1}^{m}(x^{(i)}-\hat{\mu}_{m})^2 \]

where \(\hat{\mu}_{m}=\frac{1}{m}\sum_{i=1}^{m}x^{(i)}\) is the sample mean.

Its bias is then \(\mathrm{bias}(\hat{\sigma}^{2}_{m})=\mathbb{E}[\hat{\sigma}^{2}_{m}]-\sigma^2\).

The known facts are (writing \(D(\cdot)\) for the variance):

  • \(\mathbb{E}(x^{(i)})=\mu\)
  • \(D(x^{(i)})=\sigma^2\)
  • \(D(x^{(i)})=\sigma^2=\mathbb{E}[(x^{(i)})^2]-\mathbb{E}[x^{(i)}]^2\)
  • \(\mathbb{E}(\hat{\mu}_{m})=\mu\)
  • \(D(\hat{\mu}_{m})=D(\frac{1}{m}\sum_{i=1}^{m}x^{(i)})=\frac{1}{m^2}D(\sum_{i=1}^{m}x^{(i)})=\frac{1}{m^2}\cdot m\sigma^2=\frac{\sigma^2}{m}\), where the third equality uses the independence of the samples

It follows that:

\[ \begin{aligned} \mathbb{E}[\hat{\sigma}^{2}_{m}] &=\mathbb{E}[\frac{1}{m}\sum_{i=1}^{m}(x^{(i)}-\hat{\mu}_{m})^2]\\ &=\frac{1}{m}\mathbb{E}[\sum_{i=1}^{m}((x^{(i)})^2-2x^{(i)}\hat{\mu}_m+\hat{\mu}^2_{m})]\\ &=\frac{1}{m}\mathbb{E}[\sum_{i=1}^{m}(x^{(i)})^2]-\frac{2}{m}\mathbb{E}[\hat{\mu}_m\sum_{i=1}^{m}x^{(i)}]+\frac{1}{m}\mathbb{E}[\sum_{i=1}^{m}\hat{\mu}^{2}_{m}]\\ &=\frac{1}{m}\mathbb{E}[\sum_{i=1}^{m}(x^{(i)})^2]-\frac{1}{m}\mathbb{E}[2m\hat{\mu}^{2}_{m}]+\frac{1}{m}\mathbb{E}[m\hat{\mu}^{2}_{m}]\\ &=\frac{1}{m}\mathbb{E}[\sum_{i=1}^{m}(x^{(i)})^2]-\frac{1}{m}\mathbb{E}[m\hat{\mu}^{2}_{m}]\\ &=\frac{1}{m}\sum_{i=1}^{m}(\mathbb{E}[(x^{(i)})^2]-\mathbb{E}[\hat{\mu}^{2}_{m}])\\ &=\frac{1}{m}\sum_{i=1}^{m}(D(x^{(i)})+\mathbb{E}^2(x^{(i)})-D(\hat{\mu}_m)-\mathbb{E}^2(\hat{\mu}_m))\\ &=\frac{1}{m}\sum_{i=1}^{m}(\sigma^2+\mu^2-\frac{\sigma^2}{m}-\mu^2)\\ &=\frac{m-1}{m}\sigma^2 \end{aligned} \]

The fourth line uses \(\sum_{i=1}^{m}x^{(i)}=m\hat{\mu}_m\). Note that \(\hat{\mu}_m\) is itself a random variable, so it must stay inside the expectation. The bias defined earlier is therefore \(\mathrm{bias}(\hat{\sigma}^{2}_{m})=\frac{m-1}{m}\sigma^2-\sigma^2=-\frac{\sigma^2}{m}\), so this estimator is biased: on average it underestimates \(\sigma^2\).
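The result \(\mathbb{E}[\hat{\sigma}^{2}_{m}]=\frac{m-1}{m}\sigma^2\) can be checked numerically. The following is a minimal Monte Carlo sketch using only the Python standard library; the function names, the parameter values (`m = 5`, `mu = 1.0`, `sigma = 2.0`), the trial count, and the seed are all illustrative assumptions, not part of the derivation.

```python
import random


def biased_variance(xs):
    """Sample variance with the 1/m normalisation."""
    m = len(xs)
    mean = sum(xs) / m
    return sum((x - mean) ** 2 for x in xs) / m


def average_biased_variance(m, mu, sigma, trials, seed=0):
    """Average the 1/m sample variance over many independent Gaussian samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(trials):
        xs = [rng.gauss(mu, sigma) for _ in range(m)]
        total += biased_variance(xs)
    return total / trials


# Illustrative (assumed) parameters.
m, mu, sigma = 5, 1.0, 2.0
empirical = average_biased_variance(m, mu, sigma, trials=50_000)
theoretical = (m - 1) / m * sigma ** 2  # (m-1)/m * sigma^2

print(f"empirical mean of the 1/m estimator: {empirical:.3f}")
print(f"theoretical (m-1)/m * sigma^2:       {theoretical:.3f}")
print(f"empirical bias:                      {empirical - sigma ** 2:.3f}")
```

With a small `m` the gap between the average estimate and \(\sigma^2\) is easy to see; it should shrink roughly like \(\sigma^2/m\) as `m` grows.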

2. Unbiased sample variance

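Since \(\mathbb{E}[\hat{\sigma}^{2}_{m}]=\frac{m-1}{m}\sigma^2\), rescaling by \(\frac{m}{m-1}\) removes the bias, so the following estimator is unbiased: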

\[ \tilde{\sigma}_m^2=\frac{1}{m-1}\sum_{i=1}^{m}(x^{(i)}-\hat{\mu}_m)^2 \]

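Indeed, \(\mathbb{E}[\tilde{\sigma}_m^2]=\frac{m}{m-1}\mathbb{E}[\hat{\sigma}^{2}_{m}]=\frac{m}{m-1}\cdot\frac{m-1}{m}\sigma^2=\sigma^2\). Note that the equality holds in expectation; for any particular sample, \(\tilde{\sigma}_m^2\) generally differs from \(\sigma^2\).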
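As a rough numerical illustration of unbiasedness, the sketch below averages the \(\frac{1}{m-1}\) estimator over many simulated Gaussian samples and compares the result with \(\sigma^2\). The helper name `unbiased_variance`, the parameter values, the trial count, and the seed are assumptions chosen for the example. It also cross-checks the helper against `statistics.variance` from the Python standard library, which uses the same \(m-1\) denominator.

```python
import math
import random
import statistics


def unbiased_variance(xs):
    """Sample variance with the 1/(m-1) normalisation."""
    m = len(xs)
    mean = sum(xs) / m
    return sum((x - mean) ** 2 for x in xs) / (m - 1)


rng = random.Random(42)  # fixed seed so the run is reproducible
m, mu, sigma, trials = 5, 0.0, 3.0, 50_000  # illustrative (assumed) values

# Cross-check against the standard library on a single sample.
sample = [rng.gauss(mu, sigma) for _ in range(m)]
matches_stdlib = math.isclose(
    unbiased_variance(sample), statistics.variance(sample), rel_tol=1e-9
)

# Average the estimator over many independent samples.
total = 0.0
for _ in range(trials):
    xs = [rng.gauss(mu, sigma) for _ in range(m)]
    total += unbiased_variance(xs)
mean_unbiased = total / trials

print(f"matches statistics.variance: {matches_stdlib}")
print(f"average of the 1/(m-1) estimator: {mean_unbiased:.3f}")
print(f"true variance sigma^2:            {sigma ** 2:.3f}")
```

Any single call to `unbiased_variance` can still be far from \(\sigma^2\); only the average over many samples is expected to settle near it.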