Before we look at the concept of statistical distance, it is worth pointing out first what is meant by distance in mathematics.

**Definition**

Suppose that V is a non-empty set and d is a function from V×V to $\mathbb{R}$. Then d is a distance function, or a metric, if for every p, q, r ∈ V the following conditions hold:

- d(p,q) ≥ 0
- d(p,q) = 0 ⇔ p = q
- d(p,q) = d(q,p)
- d(p,q) ≤ d(p,r) + d(r,q)

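For instance, in $\mathbb{R}$ we can define a distance function *d* where $d(p,q) = |p - q|$ for every p, q ∈ $\mathbb{R}$. (Equivalently, $d(p,q) = \sqrt{(p - q)^2}$.) It can be shown that for every p, q, r ∈ $\mathbb{R}$ the following hold: 1) $|p - q| \ge 0$, 2) $|p - q| = 0 \Leftrightarrow p = q$, 3) $|p - q| = |q - p|$, and 4) $|p - q| \le |p - r| + |r - q|$.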

As another example, in $\mathbb{R}^2$ we can define a distance function as follows. The distance between A(a_{1},a_{2}) and B(b_{1},b_{2}) is $d(A,B) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2}$. It can be proved that *d* defined in this fashion also satisfies the four conditions 1), 2), 3), and 4) above.

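The distance function in $\mathbb{R}^2$ above can be extended to $\mathbb{R}^p$, where the distance between $A(a_1, a_2, \ldots, a_p)$ and $B(b_1, b_2, \ldots, b_p)$ is defined by $D(A,B) = \sqrt{\sum_{i=1}^{p} (a_i - b_i)^2}$. It can be proved that *D* satisfies the four conditions 1), 2), 3), and 4) above.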
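The four conditions can be spot-checked numerically. The following is a minimal sketch, assuming the usual Euclidean distance on $\mathbb{R}^p$; the sample points and the helper name `check_metric_conditions` are made up for illustration, and checking a finite set of points is of course not a proof.

```python
import math
from itertools import product


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


# Hypothetical sample points in R^3, chosen only for illustration.
points = [(0.0, 0.0, 0.0), (1.0, 2.0, 2.0), (-3.0, 0.5, 4.0), (2.0, -1.0, 0.0)]


def check_metric_conditions(d, pts, tol=1e-12):
    """Spot-check the four metric conditions on a finite set of points."""
    for p, q, r in product(pts, repeat=3):
        assert d(p, q) >= 0                         # 1) non-negativity
        assert (d(p, q) == 0) == (p == q)           # 2) zero only for identical points
        assert abs(d(p, q) - d(q, p)) <= tol        # 3) symmetry
        assert d(p, q) <= d(p, r) + d(r, q) + tol   # 4) triangle inequality
    return True


print(euclidean((0.0, 0.0, 0.0), (1.0, 2.0, 2.0)))  # 3.0
print(check_metric_conditions(euclidean, points))
```

The small tolerance `tol` only absorbs floating-point rounding in the symmetry and triangle-inequality checks.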

In statistics, quantitative data can be plotted on a coordinate plane. Univariate data can be plotted on an axis (e.g. the x-axis or the real number line). Bivariate data can be plotted on a coordinate plane with two axes perpendicular to each other (e.g. the x and y axes on the xy-plane). As an example of bivariate data, consider the following sample data.

The data can be plotted on the xy-plane as follows.

**Figure 1**

The figure above demonstrates how bivariate data are plotted on the xy-plane. Such a presentation is possible for bivariate data, but not for data with more than 2 variates. Therefore, rather than stating that every bivariate data set can be plotted on the xy-plane, it is more fruitful to assert that there is a one-to-one correspondence between bivariate data and their coordinate vectors relative to an orthonormal basis for $\mathbb{R}^2$. Stated this way, the relationship generalizes to data with more than 2 variates: “There is a one-to-one correspondence between p-variate data and their coordinate vectors relative to an orthonormal basis for $\mathbb{R}^p$.”

Now, how do we define the statistical distance referred to in this post? The Euclidean distance *D* between two points in $\mathbb{R}^p$ (i.e. between two observations with p variates), as defined above, takes into account neither the variance of each variable nor the covariance between the variables. In contrast, statistical distance “compensates” for the variance and covariance in the multivariate data. More precisely, the statistical distance *d* between the data vectors $\mathbf{x}$ and $\mathbf{y}$ is defined as follows.

$$d(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})^{T} A \, (\mathbf{x} - \mathbf{y})}$$

where

A = a positive definite symmetric matrix of order p

The matrix A above defines a statistical distance. In principal component analysis, A is taken to be a variance-covariance matrix Σ (as in Example 3 below) or its inverse Σ^{-1} (as in Example 4 below).
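As a rough illustration of the definition, the sketch below evaluates the quadratic-form distance $\sqrt{(\mathbf{x}-\mathbf{y})^{T} A (\mathbf{x}-\mathbf{y})}$, which is the form assumed here. The matrix `A` and the points are hypothetical values chosen for easy arithmetic; they are not the ones used in the examples of this post.

```python
import math


def statistical_distance(x, y, A):
    """sqrt((x - y)^T A (x - y)) for a symmetric positive definite A (list of rows)."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    # Matrix-vector product A (x - y).
    A_diff = [sum(a_ij * d_j for a_ij, d_j in zip(row, diff)) for row in A]
    # Inner product (x - y)^T [A (x - y)].
    q = sum(d_i * ad_i for d_i, ad_i in zip(diff, A_diff))
    return math.sqrt(q)


# Hypothetical positive definite matrix (det = 7 > 0, trace = 6 > 0).
A = [[4.0, 1.0], [1.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]

x, y = (2.0, 3.0), (1.0, 1.0)  # hypothetical points, x - y = (1, 2)

print(statistical_distance(x, y, identity))  # Euclidean case, sqrt(5)
print(statistical_distance(x, y, A))         # 4.0
```

With the identity matrix the function reduces to the Euclidean distance, which makes the effect of A easy to see: the same pair of points is at distance $\sqrt{5}$ in the Euclidean sense but at distance 4 under this particular A.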

**Example 1**

Given a positive definite matrix , find the statistical distance between K(2,1) and L(-1,0).

**Answer**

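Let $\mathbf{x} = (2, 1)^{T}$ and $\mathbf{y} = (-1, 0)^{T}$. Accordingly, $\mathbf{x} - \mathbf{y} = (3, 1)^{T}$ and $(\mathbf{x} - \mathbf{y})^{T} = (3 \;\; 1)$. From how the statistical distance is defined, $d(\mathbf{x}, \mathbf{y}) = \sqrt{(\mathbf{x} - \mathbf{y})^{T} A \, (\mathbf{x} - \mathbf{y})}$, we get: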

Thus, the statistical distance between K and L is .

**The Shape of “Circle” by Statistical Distance**

In general, a circle is defined as the set of all points that are equidistant from a fixed point. The fixed point is called the center of the circle and the common distance is called the radius of the circle. It follows from the definition that the resulting circle depends on the domain of the distance function and on the way we define the distance function. To illustrate this, let the distance function *d* be defined on $\mathbb{R}^2$ such that the distance between A(a_{1},a_{2}) ∈ $\mathbb{R}^2$ and B(b_{1},b_{2}) ∈ $\mathbb{R}^2$ is $d(A,B) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2}$. By this definition, the shape of a circle with center O and radius 1 is as follows.

**Figure 2**

But **what would the circle look like if the statistical distance were applied?** Look at the following example.

**Example 2**

Find the equation of a circle with center O and radius 1 if the statistical distance is applied with . Sketch the graph of the circle.

**Answer**

A circle with center O and radius 1 satisfies the equation . If then the equation can be expressed as follows:

The locus of the points with the equation is sketched as follows.

**Figure 3**

Note that the resulting circle has the form of an ellipse.

How can we determine **the directions and the lengths of the major and minor axes of the ellipse** when the statistical distance function is given? The answer can be inferred from Theorem 1 below.

**Theorem 1**

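If A is a positive definite symmetric matrix with spectral decomposition $A = \sum_{i=1}^{p} \lambda_i \mathbf{e}_i \mathbf{e}_i^{T}$ then:

- the set of points at a distance of c from the origin O has the equation $\mathbf{x}^{T} A \mathbf{x} = c^2$, which is equivalent to $\sum_{i=1}^{p} \lambda_i (\mathbf{e}_i^{T} \mathbf{x})^2 = c^2$,
- $\mathbf{x} = c \lambda_i^{-1/2} \mathbf{e}_i$ satisfies $\mathbf{x}^{T} A \mathbf{x} = c^2$; i = 1, 2, …, p, and
- the $\mathbf{e}_i$'s, where i = 1, 2, 3, …, p, are the direction vectors of the axes of the hyperellipsoids $\mathbf{x}^{T} A \mathbf{x} = c^2$.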
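The theorem can be tried out numerically in the bivariate case. The sketch below is only an illustration under stated assumptions: it uses a closed-form eigen-decomposition that is valid for symmetric 2×2 matrices, the helper names (`eigen_sym_2x2`, `quad_form`, `ellipse_axes`) are invented here, and the matrix `A` is a hypothetical one with eigenvalues 3 and 1, not the matrix of Example 2.

```python
import math


def eigen_sym_2x2(A):
    """Eigenpairs (lambda, unit eigenvector) of a symmetric 2x2 matrix, largest first."""
    (a, b), (_, d) = A
    if b == 0:
        # Already diagonal: the coordinate axes are eigenvectors.
        pairs = [(a, (1.0, 0.0)), (d, (0.0, 1.0))]
        return sorted(pairs, key=lambda pair: pair[0], reverse=True)
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        vx, vy = b, lam - a  # solves (A - lam I) v = 0 when b != 0
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs


def quad_form(A, x):
    """x^T A x for a 2x2 matrix A."""
    return A[0][0] * x[0] ** 2 + (A[0][1] + A[1][0]) * x[0] * x[1] + A[1][1] * x[1] ** 2


def ellipse_axes(A, c):
    """For each eigenpair return (direction, half-length c / sqrt(lambda), endpoint)."""
    axes = []
    for lam, e in eigen_sym_2x2(A):
        half = c / math.sqrt(lam)
        axes.append((e, half, (half * e[0], half * e[1])))
    return axes


A = [[2.0, 1.0], [1.0, 2.0]]  # hypothetical matrix, eigenvalues 3 and 1
c = 1.0
for e, half, endpoint in ellipse_axes(A, c):
    # The last printed value should be c^2, up to rounding.
    print(e, half, quad_form(A, endpoint))
```

Note that the larger eigenvalue gives the shorter axis, because the half-length is $c/\sqrt{\lambda_i}$.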

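In Example 2 above, $A = \lambda_1 \mathbf{e}_1 \mathbf{e}_1^{T} + \lambda_2 \mathbf{e}_2 \mathbf{e}_2^{T}$ with eigenvalues λ_{1} = 10 and λ_{2} = 5 and corresponding unit eigenvectors $\mathbf{e}_1$ and $\mathbf{e}_2$. Since the radius is c = 1, half the length of the axis of the ellipse in the direction $\mathbf{e}_1$ is $1/\sqrt{10}$ and half the length of the axis of the ellipse in the direction $\mathbf{e}_2$ is $1/\sqrt{5}$. This situation is depicted as follows.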

**Figure 3**

As Figure 3 shows, and . As a consequence, the minor axis is in length and has the same direction as determined above. On the other hand, the major axis is in length and has the same direction as .

**Example 3**

In a bivariate population, a statistical distance is defined by the variance-covariance matrix .

1. Find the distance from any point with the coordinates (x_{1},x_{2}) to the origin O in the form of .
2. Find the distance between and O.
3. Let the distance of T from O be *c*. Find the equation of the ellipse corresponding to the locus of the points that are at a distance of *c* from O and sketch the ellipse.
4. Let the spectral decomposition of Σ be with λ_{1} > λ_{2}. Determine and draw the new coordinate axes and on conditions that the direction vector of the axis is and the direction vector of axis is .
5. Express the equation of the ellipse in part 3 of this example in and .
6. Determine the coordinates of T relative to the ordered basis .
7. Let the coordinates of T in part 6 of this example be (k_{1},k_{2}). Verify that and satisfy the equation of the ellipse in part 5.

Answer to part 1

Answer to part 2

Substituting and into d as obtained in the answer to part 1, we have the following.

Therefore, the distance from T to O is 50.

Answer to part 3

The desired ellipse equation is:

The characteristic equation of Σ is:

This yields the eigenvalues λ_{1} = 100 and λ_{2} = 25.

λ_{1} = 100 gives the eigenvector .

λ_{2} = 25 gives the eigenvector .

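According to Theorem 1, $\mathbf{e}_1$ and $\mathbf{e}_2$ are the direction vectors of the axes of the ellipse. From part 2 of Theorem 1, with c = 50, it can be inferred that half the length of the axis in the direction $\mathbf{e}_1$ is $c/\sqrt{\lambda_1} = 50/\sqrt{100} = 5$ and half the length of the axis in the direction $\mathbf{e}_2$ is $c/\sqrt{\lambda_2} = 50/\sqrt{25} = 10$. This situation can be depicted as follows.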

**Figure 4**

Answer to part 4

**Figure 5**

Answer to part 5

To express the ellipse equation in and , we diagonalize Σ. If then the diagonal matrix produced is .

In this case, . Therefore,

Thus, the required equation is , which is equivalent to .

Answer to part 6

To determine the coordinates of T relative to which is used as the ordered basis for the -plane, we can use the formula where . Here is the coordinate matrix of T relative to basis B’ and is the coordinate matrix of T relative to basis B.

Accordingly, the coordinates of T relative to the ordered basis is .
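The change of coordinates in parts 5 and 6 can be sketched in code. Because the eigenvectors form an orthonormal basis, the new coordinates are simply the dot products of the point with each eigenvector (equivalently $\mathbf{y} = P^{T}\mathbf{x}$, where the columns of P are the eigenvectors; this is the standard relation and is assumed here). The matrix, its eigenpairs and the point `T` below are hypothetical values, not those of Example 3.

```python
import math

s = 1.0 / math.sqrt(2.0)

# Hypothetical symmetric positive definite matrix and its known eigenpairs.
A = [[2.0, 1.0], [1.0, 2.0]]
lam1, e1 = 3.0, (s, s)    # A e1 = 3 e1
lam2, e2 = 1.0, (s, -s)   # A e2 = 1 e2


def to_eigen_coords(x, basis):
    """Coordinates of x relative to an orthonormal basis: y_i = e_i . x."""
    return tuple(e[0] * x[0] + e[1] * x[1] for e in basis)


T = (3.0, 1.0)  # hypothetical point in the original coordinates
y1, y2 = to_eigen_coords(T, (e1, e2))

# The quadratic form has no cross term in the new coordinates:
# x^T A x = lam1 * y1^2 + lam2 * y2^2.
lhs = A[0][0] * T[0] ** 2 + 2 * A[0][1] * T[0] * T[1] + A[1][1] * T[1] ** 2
rhs = lam1 * y1 ** 2 + lam2 * y2 ** 2

print((y1, y2))
print(lhs, rhs)  # both equal 26 up to rounding
```

This mirrors the verification in part 7: substituting the new coordinates into the diagonalized equation gives the same value as the original quadratic form.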

Answer to part 7

Substitute and into the equation . This results in the following.

2500 = 2500 (a true statement)

So, and satisfy the ellipse equation in part 5. (See Figure 6 below.)

**Figure 6**

**Example 4**

A random vector has a bivariate normal density with . Sketch a constant density ellipse . Find the length of the major and minor axes of the ellipse. Determine its principal components.

**Answer**

It can be shown that the spectral decomposition of Σ is with λ_{1} = 10, , λ_{2} = 5, .

Consequently, the spectral decomposition of Σ^{-1} is:

By part 2 of Theorem 1, it can be concluded that half the length of the axis of the ellipse in the direction is and half the length of the axis of the ellipse in the direction is . Consequently, the length of the major and minor axes are and , respectively.

The first principal component:

The second principal component:

Example 4 illustrates an application of the following general theorem on a multivariate population involving p variates.

**Theorem 2**

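If the random vector $\mathbf{X}$ has a multivariate normal distribution with the mean $\boldsymbol{\mu}$ and the covariance matrix Σ then the density of $\mathbf{X}$ is constant on the $\boldsymbol{\mu}$-centered ellipsoids $(\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) = c^2$ which have axes $\pm c \sqrt{\lambda_i}\, \mathbf{e}_i$, i = 1, 2, …, p, where the $(\lambda_i, \mathbf{e}_i)$ are the eigenvalue-eigenvector pairs of Σ.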
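Theorem 2 can be checked numerically in the bivariate case. The sketch below evaluates the bivariate normal density at the four axis endpoints $\boldsymbol{\mu} \pm c\sqrt{\lambda_i}\,\mathbf{e}_i$ and compares each value with $\exp(-c^2/2) / (2\pi\sqrt{\det \Sigma})$. The mean, the covariance matrix with its eigenpairs, and the constant `c` are hypothetical values picked for illustration; they are not the ones from Example 4.

```python
import math


def bivariate_normal_pdf(x, mu, Sigma):
    """Density of N(mu, Sigma) at x for a 2x2 symmetric positive definite Sigma."""
    (a, b), (_, d) = Sigma
    det = a * d - b * b
    dx0, dx1 = x[0] - mu[0], x[1] - mu[1]
    # (x - mu)^T Sigma^{-1} (x - mu), using Sigma^{-1} = [[d, -b], [-b, a]] / det.
    q = (d * dx0 ** 2 - 2 * b * dx0 * dx1 + a * dx1 ** 2) / det
    return math.exp(-0.5 * q) / (2 * math.pi * math.sqrt(det))


s = 1.0 / math.sqrt(2.0)

# Hypothetical parameters with known eigenpairs of Sigma.
mu = (1.0, -1.0)
Sigma = [[2.0, 1.0], [1.0, 2.0]]
eigenpairs = [(3.0, (s, s)), (1.0, (s, -s))]
c = 1.5

# Endpoints of the axes: mu +/- c * sqrt(lambda_i) * e_i.
endpoints = []
for lam, e in eigenpairs:
    half = c * math.sqrt(lam)
    for sign in (1.0, -1.0):
        endpoints.append((mu[0] + sign * half * e[0], mu[1] + sign * half * e[1]))

densities = [bivariate_normal_pdf(p, mu, Sigma) for p in endpoints]
expected = math.exp(-0.5 * c ** 2) / (2 * math.pi * math.sqrt(3.0))

print(densities)
print(expected)
```

Here the larger eigenvalue of Σ gives the longer axis (half-length $c\sqrt{\lambda_i}$), the opposite of Theorem 1, because the ellipse is defined through $\Sigma^{-1}$, whose eigenvalues are $1/\lambda_i$.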