Statistics Note 3


This is Note 3, based on Linear Algebra And Learning From Data by Gilbert Strang.


1. Covariance Matrices and Joint Probabilities

Linear algebra enters when we run $M$ different experiments at once. For example, we measure age, height, and weight for each child, so every child gives a vector of $M=3$ measurements.

A matrix becomes involved when we look at variances and covariances. Each experiment (dimension) has its own sample or expected variance, but if we measure age, height, and weight for children, the results will be strongly correlated: older children tend to be taller and heavier. The connection between variables is measured by the covariance.

\(\sigma_{12}=E[(x-m_1)(y-m_2)]\), where $m_1$ and $m_2$ are the means of experiments 1 and 2.
Equivalently, in $X,Y$ notation: \(\sigma_{xy}=E[(X-\bar X)(Y-\bar Y)]\)
Joint probability $p_{ij}$: the probability that experiment 1 produces outcome $x_i$ and experiment 2 produces outcome $y_j$.
\(\sigma_{12}=\sum_{i}\sum_{j}p_{ij}(x_i-m_1)(y_j-m_2)\)

Probability Matrix:
\(\begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \\ \end{pmatrix}\)

For independent experiments, the covariance vanishes: $\sigma_{12}=0$, because $p_{ij}=p_ip_j$ lets the double sum factor into $\sum_i p_i(x_i-m_1)\cdot\sum_j p_j(y_j-m_2)=0\cdot 0$.
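
As a quick numerical check, here is a minimal sketch (the outcomes and probabilities are made up for illustration) that computes $\sigma_{12}$ from a joint probability matrix and confirms that the independent case $p_{ij}=p_ip_j$ gives zero covariance:

```python
import numpy as np

# Joint probability matrix P, with P[i, j] = p_ij (all entries add to 1).
x = np.array([0.0, 1.0])          # outcomes of experiment 1
y = np.array([0.0, 1.0])          # outcomes of experiment 2
P = np.array([[0.3, 0.2],
              [0.2, 0.3]])

# Marginals and means m1, m2
p_x = P.sum(axis=1)               # p_i = sum_j p_ij
p_y = P.sum(axis=0)               # p_j = sum_i p_ij
m1 = p_x @ x
m2 = p_y @ y

# sigma_12 = sum_i sum_j p_ij (x_i - m1)(y_j - m2)
sigma12 = ((x - m1)[:, None] * (y - m2)[None, :] * P).sum()
print(sigma12)                    # 0.05 > 0: the experiments are positively correlated

# Independent case p_ij = p_i p_j: covariance is zero
P_indep = np.outer(p_x, p_y)
sigma12_indep = ((x - m1)[:, None] * (y - m2)[None, :] * P_indep).sum()
print(sigma12_indep)              # 0.0
```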

Covariance Matrix (By definition):
\(V = \sum \sum p_{ij} \begin{pmatrix} (x_i-m_1)^2 & (x_i-m_1)(y_j-m_2) \\ (x_i-m_1)(y_j-m_2) & (y_j-m_2)^2 \\ \end{pmatrix}\)

On the diagonal, we get the ordinary variances $\sigma_1^2, \sigma_2^2$, because summing $p_{ij}$ over $j$ leaves the marginal probability $p_i$:
\(V_{11} = \sum\sum p_{ij} (x_i-m_1)^2=\sum p_i (x_i-m_1)^2=\sigma_{1}^2\)

We can write each term of $V$ as an outer product (column times row):

\(\begin{bmatrix} (x_i-m_1)^2 & (x_i-m_1)(y_j-m_2) \\ (x_i-m_1)(y_j-m_2) & (y_j-m_2)^2 \end{bmatrix} = \begin{bmatrix} x_i-m_1 \\ y_j-m_2 \end{bmatrix} \begin{bmatrix} x_i-m_1 & y_j-m_2 \end{bmatrix} = (X-\bar X)(X-\bar X)^T\)

So $V$ is a sum of rank-1 symmetric matrices with nonnegative weights $p_{ij}$, which makes $V$ symmetric and positive semidefinite:
\(V=\sum p_{ij}(X-\bar X)(X-\bar X)^T\)
For continuous random variables, the double sum becomes an integral over the joint density $p(x,y)$:
\(V=\iint p(x,y)\,(X-\bar X)(X-\bar X)^T\,dx\,dy\)
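
To see the rank-1 structure concretely, here is a sketch (reusing the illustrative outcomes and probabilities from above) that builds $V$ as a probability-weighted sum of outer products and checks that its eigenvalues are nonnegative:

```python
import numpy as np

# Same illustrative discrete setup as before.
x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0])
P = np.array([[0.3, 0.2],
              [0.2, 0.3]])
m1 = P.sum(axis=1) @ x
m2 = P.sum(axis=0) @ y

# V = sum_ij p_ij (X - mean)(X - mean)^T, one rank-1 piece per outcome pair
V = np.zeros((2, 2))
for i in range(len(x)):
    for j in range(len(y)):
        d = np.array([x[i] - m1, y[j] - m2])   # the column (x_i - m1, y_j - m2)
        V += P[i, j] * np.outer(d, d)          # rank-1 piece, weight p_ij >= 0

print(V)                       # diagonal: sigma_1^2, sigma_2^2; off-diagonal: sigma_12
print(np.linalg.eigvalsh(V))   # all eigenvalues >= 0, so V is positive semidefinite
```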

The covariance matrix for $Z=AX$

If $Z=AX$, then $\bar Z=A\bar X$, so \(V_Z=E[(Z-\bar Z)(Z-\bar Z)^T]=E[A(X-\bar X)(X-\bar X)^TA^T]=AV_XA^T\)

Example:
$A=\begin{bmatrix} 1 & 1 \end{bmatrix}$, $Z=X+Y$

The variance of $Z$, denoted as $\sigma_z^2$, can be computed as:

\(\begin{bmatrix}1&1 \end{bmatrix}\begin{bmatrix}\sigma_x^2&\sigma_{xy}\\\sigma_{xy}& \sigma_y^2 \end{bmatrix}\begin{bmatrix}1\\1 \end{bmatrix}=\sigma_x^2+\sigma_y^2+2\sigma_{xy}\)
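
A quick numerical check of this example, using the illustrative $V$ computed in the sketch above:

```python
import numpy as np

# V from the discrete example: sigma_x^2 = sigma_y^2 = 0.25, sigma_xy = 0.05
V = np.array([[0.25, 0.05],
              [0.05, 0.25]])
A = np.array([[1.0, 1.0]])     # Z = AX = X + Y

V_z = A @ V @ A.T              # V_Z = A V_X A^T
print(V_z)                     # [[0.6]] = 0.25 + 0.25 + 2 * 0.05
```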

The Correlation $\rho$

The correlation coefficient between $X$ and $Y$ is $\rho_{xy}$, and it satisfies
\(-1\le \rho_{xy}=\frac{\sigma_{xy}}{\sigma_x\sigma_y} \le 1\)
We can standardize $X$ and $Y$ by subtracting their means and dividing by their standard deviations, so $\tilde X=\frac{X-\bar X}{\sigma_x}$ and $\tilde Y=\frac{Y-\bar Y}{\sigma_y}$ have zero mean and unit variance. Standardizing does not change the correlation: $\rho_{xy}$ is exactly the covariance of $\tilde X$ and $\tilde Y$.
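
A small simulation sketch (the data are made up) showing that standardizing leaves the correlation unchanged, and that after standardizing, the correlation is just the covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=10_000)
Y = 0.6 * X + 0.8 * rng.normal(size=10_000)   # built to have rho near 0.6

rho = np.cov(X, Y)[0, 1] / (np.std(X, ddof=1) * np.std(Y, ddof=1))

# Standardize: subtract means, divide by standard deviations
Xs = (X - X.mean()) / np.std(X, ddof=1)
Ys = (Y - Y.mean()) / np.std(Y, ddof=1)
rho_std = np.cov(Xs, Ys)[0, 1]                # covariance of standardized variables

print(rho, rho_std)                           # both close to 0.6
```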
