Computational mathematics, assignment help

Anonymous
Asked: May 22nd, 2016

Question Description

It's a computational mathematics assignment. The question file is attached.

Unformatted Attachment Preview

Final Exam CSC I6731: Methods in Computational Science
May 17, 2016

1. The Mean-Value Theorem for Derivatives states: If f(x) is a continuous function on the (closed and finite) interval [a, b] and differentiable on (a, b), then

   \frac{f(b) - f(a)}{b - a} = f'(\xi), \quad \text{for some } \xi \in (a, b).

   Let f(x) = \sqrt{x} and [a, b] = [0, 4]. Find the point(s) \xi specified by the theorem. Find the second Taylor polynomial P_2(x) for this function about x_0 = 1. Give an upper bound on the absolute value of the interpolating error.

2. Show that the following inequalities hold for any vector x:

   \frac{1}{\sqrt{n}}\,\|x\|_2 \le \|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le \sqrt{n}\,\|x\|_2 \le n\,\|x\|_\infty.

   Hint: Use the Cauchy-Schwarz inequality.

3. Consider the sample version of Principal Component Analysis: Y = (X - \mathbf{1}\bar{x}^T)G, where X (n \times p) is the data matrix, \mathbf{1} is a vector of length n consisting of ones, and G is the orthogonal matrix containing the standardized eigenvectors corresponding to the eigenvalues l_1 \ge \cdots \ge l_p of S, the sample covariance matrix of X.
   (a) Prove that the columns of Y have mean zero;
   (b) Prove that the sample variance of the ith column of Y equals l_i;
   (c) Prove that the sample correlation between any two columns of Y is zero.

4. Given the following histogram (r = gray level, n = number of occurrences):

   r:  0    1/7  2/7  3/7  4/7  5/7  6/7  1
   n:  400  700  800  900  500  400  196  200

   (a) Perform histogram equalization;
   (b) Perform histogram matching using the specified histogram in the following table (r = gray level, p = probability of occurrence):

   r:  0     1/7   2/7  3/7  4/7   5/7  6/7   1
   p:  0.05  0.05  0.1  0.1  0.15  0.2  0.25  0.1

5. Explain why the Sobel and the Prewitt matrices compute horizontal and vertical gradients. Which of the two masks evaluates vertical gradients?

6. Derive the normal equations for the c^* = [c_1^*, c_2^*] that minimize the approximation error in the least-squares sense

   c^* = \arg\min_c \sum_{n=1}^{N} [f_n - F(x_n, c)]^2

   in the case of F(x, c) = c_1 e^{c_2 x}. Are the normal equations still linear?
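For problem 4(a), the standard discrete equalization rule maps each gray level r_k to s_k = round((L-1) \cdot \sum_{j \le k} n_j / N), where N is the total pixel count. A minimal sketch of that rule (using the level count L = 8 read off the table; this is the generic textbook procedure, not the tutor's posted solution):

```python
from itertools import accumulate

def equalize(counts, levels=8):
    """Discrete histogram equalization: map each gray level to
    round((L-1) * CDF(level)), using the running sum of the counts."""
    total = sum(counts)
    cdf = [c / total for c in accumulate(counts)]
    return [round((levels - 1) * f) for f in cdf]

# Histogram from question 4 (levels 0, 1/7, ..., 1 as indices 0..7).
counts = [400, 700, 800, 900, 500, 400, 196, 200]
print(equalize(counts))  # new level assigned to each original level
# -> [1, 2, 3, 5, 6, 6, 7, 7]
```

Note that several input levels can map to the same output level (here 6 and 7 each receive two), which is why discrete equalization only approximates a flat histogram.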

Tutor Answer

nyajust
School: UIUC

1, 4 & 5 first

1.)
There are several cases to consider:
a) f(x) = k, a constant; thus f'(x) = 0, hence c can be any number in (a, b).
b) f(x) > f(a) for some x in (a, b): this shows that f attains a maximum at some point in [a, b]. Since f(a) = f(b), the maximum occurs at an interior point c, where f is differentiable, so f'(c) = 0.
c) f(x) < f(a) for some x in (a, b): by the symmetric argument applied to the minimum of f, there is again a point c in (a, b) with f'(c) = 0.

Chapter 1 Aspects of Multivariate Analysis: The Organization of Data

Let x_{11}, x_{21}, \ldots, x_{n1} be n measurements on the first variable. Then the arithmetic average of these measurements is

\bar{x}_1 = \frac{1}{n}\sum_{j=1}^{n} x_{j1}    (1-3)

The square root of the sample variance, \sqrt{s_{11}}, is known as the sample standard deviation. This measure of variation uses the same units as the observations.
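These definitions translate directly into code; a minimal sketch (note the divisor n rather than n - 1, matching the convention used for S_n in this chapter; library defaults often differ):

```python
import math

def sample_mean(x):
    """Arithmetic average, equation (1-3)."""
    return sum(x) / len(x)

def sample_var_n(x):
    """Sample variance with divisor n, the convention used for S_n."""
    m = sample_mean(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)

x = [42, 52, 48, 58]                 # the dollar-sales data used in Example 1.2
print(sample_mean(x))                # 50.0
print(sample_var_n(x))               # 34.0
print(math.sqrt(sample_var_n(x)))    # sample standard deviation, same units as x
```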
Consider n pairs of measurements on each of variables 1 and 2:

\begin{bmatrix} x_{11} \\ x_{12} \end{bmatrix}, \begin{bmatrix} x_{21} \\ x_{22} \end{bmatrix}, \ldots, \begin{bmatrix} x_{n1} \\ x_{n2} \end{bmatrix}

That is, x_{j1} and x_{j2} are observed on the jth experimental item (j = 1, 2, \ldots, n). A measure of linear association between the measurements of variables 1 and 2 is provided by the sample covariance

s_{12} = \frac{1}{n}\sum_{j=1}^{n} (x_{j1} - \bar{x}_1)(x_{j2} - \bar{x}_2)

or the average product of the deviations from their respective means. If large values for one variable are observed in conjunction with large values for the other variable, and the small values also occur together, s_{12} will be positive. If large values from one variable occur with small values for the other variable, s_{12} will be negative. If there is no particular association between the values for the two variables, s_{12} will be approximately zero.
The sample covariance

s_{ik} = \frac{1}{n}\sum_{j=1}^{n} (x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k), \quad i = 1, 2, \ldots, p, \quad k = 1, 2, \ldots, p    (1-4)

measures the association between the ith and kth variables. We note that the covariance reduces to the sample variance when i = k. Moreover, s_{ik} = s_{ki} for all i and k.
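Equation (1-4) can be sketched in a few lines (again with divisor n; the special case i = k recovers the variance):

```python
def sample_cov(x, y):
    """Sample covariance s_ik of equation (1-4), divisor n."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / n

x1 = [42, 52, 48, 58]        # dollar sales (Example 1.2)
x2 = [4, 5, 4, 3]            # books sold (Example 1.2)
print(sample_cov(x1, x2))    # s_12 = -1.5
print(sample_cov(x1, x1))    # reduces to the sample variance 34.0 when i = k
```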
The final descriptive statistic considered here is the sample correlation coefficient (or Pearson's product-moment correlation coefficient; see [14]). This measure of the linear association between two variables does not depend on the units of measurement. The sample correlation coefficient for the ith and kth variables is defined as

r_{ik} = \frac{s_{ik}}{\sqrt{s_{ii}}\sqrt{s_{kk}}} = \frac{\sum_{j=1}^{n}(x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)}{\sqrt{\sum_{j=1}^{n}(x_{ji} - \bar{x}_i)^2}\,\sqrt{\sum_{j=1}^{n}(x_{jk} - \bar{x}_k)^2}}    (1-5)

for i = 1, 2, \ldots, p and k = 1, 2, \ldots, p. Note r_{ik} = r_{ki} for all i and k.

The sample correlation coefficient is a standardized version of the sample covariance, where the product of the square roots of the sample variances provides the standardization. Notice that r_{ik} has the same value whether n or n - 1 is chosen as the common divisor for s_{ii}, s_{kk}, and s_{ik}.

The sample correlation coefficient r_{ik} can also be viewed as a sample covariance. Suppose the original values x_{ji} and x_{jk} are replaced by standardized values (x_{ji} - \bar{x}_i)/\sqrt{s_{ii}} and (x_{jk} - \bar{x}_k)/\sqrt{s_{kk}}. The standardized values are commensurable because both sets are centered at zero and expressed in standard deviation units. The sample correlation coefficient is just the sample covariance of the standardized observations.

Although the signs of the sample correlation and the sample covariance are the same, the correlation is ordinarily easier to interpret because its magnitude is bounded. To summarize, the sample correlation r has the following properties:

1. The value of r must be between -1 and +1 inclusive.
2. Here r measures the strength of the linear association. If r = 0, this implies a lack of linear association between the components. Otherwise, the sign of r indicates the direction of the association: r < 0 implies a tendency for one value in the pair to be larger than its average when the other is smaller than its average; and r > 0 implies a tendency for one value of the pair to be large when the other value is large and also for both values to be small together.
3. The value of r_{ik} remains unchanged if the measurements of the ith variable are changed to y_{ji} = a x_{ji} + b, j = 1, 2, \ldots, n, and the values of the kth variable are changed to y_{jk} = c x_{jk} + d, j = 1, 2, \ldots, n, provided that the constants a and c have the same sign.

The quantities s_{ik} and r_{ik} do not, in general, convey all there is to know about the association between two variables. Nonlinear associations can exist that are not revealed by these descriptive statistics. Covariance and correlation provide measures of linear association, or association along a line. Their values are less informative for other kinds of association. On the other hand, these quantities can be very sensitive to "wild" observations ("outliers") and may indicate association when, in fact, little exists. In spite of these shortcomings, covariance and correlation coefficients are routinely calculated and analyzed. They provide cogent numerical summaries of association when the data do not exhibit obvious nonlinear patterns of association and when wild observations are not present.

Suspect observations must be accounted for by correcting obvious recording mistakes and by taking actions consistent with the identified causes. The values of s_{ik} and r_{ik} should be quoted both with and without these observations.

The sum of squares of the deviations from the mean and the sum of cross-product deviations are often of interest themselves. These quantities are

w_{kk} = \sum_{j=1}^{n}(x_{jk} - \bar{x}_k)^2, \quad k = 1, 2, \ldots, p    (1-6)

and

w_{ik} = \sum_{j=1}^{n}(x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k), \quad i = 1, 2, \ldots, p, \quad k = 1, 2, \ldots, p    (1-7)

The descriptive statistics computed from n measurements on p variables can also be organized into arrays.

Arrays of Basic Descriptive Statistics

Sample means:
\bar{x} = \begin{bmatrix} \bar{x}_1 \\ \bar{x}_2 \\ \vdots \\ \bar{x}_p \end{bmatrix}

Sample variances and covariances:
S_n = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{21} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{bmatrix}

Sample correlations:
R = \begin{bmatrix} 1 & r_{12} & \cdots & r_{1p} \\ r_{21} & 1 & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & 1 \end{bmatrix}    (1-8)
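Property 3 of the list above, invariance of r under changes y_{ji} = a x_{ji} + b and y_{jk} = c x_{jk} + d with a and c of the same sign, is easy to check numerically. A small sketch using the seven pairs of measurements plotted later in Figure 1.1 (the slope and intercept values 2, 1, 10, -3 are arbitrary illustrations):

```python
import math

def corr(x, y):
    """Sample correlation r of (1-5); the common divisor cancels,
    so none is needed."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

x = [3, 4, 2, 6, 8, 2, 5]
y = [5, 5.5, 4, 7, 10, 5, 7.5]
r = corr(x, y)

# Rescale both variables with positive slopes: r is unchanged.
x_new = [2 * v + 1 for v in x]
y_new = [10 * v - 3 for v in y]
print(abs(corr(x_new, y_new) - r) < 1e-12)  # True
```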

The sample mean array is denoted by \bar{x}, the sample variance and covariance array by the capital letter S_n, and the sample correlation array by R. The subscript n on the array S_n is a mnemonic device used to remind you that n is employed as a divisor for the elements s_{ik}. The size of all of the arrays is determined by the number of variables, p.

The arrays S_n and R consist of p rows and p columns. The array \bar{x} is a single column with p rows. The first subscript on an entry in arrays S_n and R indicates the row; the second subscript indicates the column. Since s_{ik} = s_{ki} and r_{ik} = r_{ki} for all i and k, the entries in symmetric positions about the main northwest-southeast diagonals in arrays S_n and R are the same, and the arrays are said to be symmetric.

Example 1.2 (The arrays \bar{x}, S_n, and R for bivariate data) Consider the data introduced in Example 1.1. Each receipt yields a pair of measurements: total dollar sales and number of books sold. Find the arrays \bar{x}, S_n, and R.

Since there are four receipts, we have a total of four measurements (observations) on each variable. The sample means are

\bar{x}_1 = \frac{1}{4}\sum_{j=1}^{4} x_{j1} = \frac{1}{4}(42 + 52 + 48 + 58) = 50

\bar{x}_2 = \frac{1}{4}\sum_{j=1}^{4} x_{j2} = \frac{1}{4}(4 + 5 + 4 + 3) = 4

The sample variances and covariances are

s_{11} = \frac{1}{4}((42 - 50)^2 + (52 - 50)^2 + (48 - 50)^2 + (58 - 50)^2) = 34

s_{22} = \frac{1}{4}((4 - 4)^2 + (5 - 4)^2 + (4 - 4)^2 + (3 - 4)^2) = .5

s_{12} = \frac{1}{4}((42 - 50)(4 - 4) + (52 - 50)(5 - 4) + (48 - 50)(4 - 4) + (58 - 50)(3 - 4)) = -1.5

s_{21} = s_{12}

so

\bar{x} = \begin{bmatrix} 50 \\ 4 \end{bmatrix}, \quad S_n = \begin{bmatrix} 34 & -1.5 \\ -1.5 & .5 \end{bmatrix}

The sample correlation is

r_{12} = \frac{s_{12}}{\sqrt{s_{11}}\sqrt{s_{22}}} = \frac{-1.5}{\sqrt{34}\sqrt{.5}} = -.36, \quad r_{21} = r_{12}

so

R = \begin{bmatrix} 1 & -.36 \\ -.36 & 1 \end{bmatrix}

Graphical Techniques

Plots are important, but frequently neglected, aids in data analysis. Although it is impossible to simultaneously plot all the measurements made on several variables and study the configurations, plots of individual variables and plots of pairs of variables can still be very informative. Sophisticated computer programs and display equipment allow one the luxury of visually examining data in one, two, or three dimensions with relative ease. On the other hand, many valuable insights can be obtained from the data by constructing plots with paper and pencil. Simple, yet elegant and effective, methods for displaying data are available in [29]. It is good statistical practice to plot pairs of variables and visually inspect the pattern of association. Consider, then, the following seven pairs of measurements on two variables:

Variable 1 (x_1):  3   4    2   6   8    2   5
Variable 2 (x_2):  5   5.5  4   7   10   5   7.5

These data are plotted as seven points in two dimensions (each axis representing a variable) in Figure 1.1. The coordinates of the points are determined by the paired measurements: (3, 5), (4, 5.5), ..., (5, 7.5). The resulting two-dimensional plot is known as a scatter diagram or scatter plot.

[Figure 1.1 A scatter plot and marginal dot diagrams.]
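The arithmetic of Example 1.2 can be reproduced directly (divisor n = 4 throughout):

```python
import math

sales = [42, 52, 48, 58]   # total dollar sales per receipt
books = [4, 5, 4, 3]       # number of books sold per receipt
n = len(sales)

xbar1 = sum(sales) / n                                   # 50.0
xbar2 = sum(books) / n                                   # 4.0
s11 = sum((x - xbar1) ** 2 for x in sales) / n           # 34.0
s22 = sum((x - xbar2) ** 2 for x in books) / n           # 0.5
s12 = sum((x - xbar1) * (y - xbar2)
          for x, y in zip(sales, books)) / n             # -1.5
r12 = s12 / (math.sqrt(s11) * math.sqrt(s22))
print(round(r12, 2))                                     # -0.36
```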


Also shown in Figure 1.1 are separate plots of the observed values of variable 1 and the observed values of variable 2, respectively. These plots are called (marginal) dot diagrams. They can be obtained from the original observations or by projecting the points in the scatter diagram onto each coordinate axis.

The information contained in the single-variable dot diagrams can be used to calculate the sample means \bar{x}_1 and \bar{x}_2 and the sample variances s_{11} and s_{22}. (See Exercise 1.1.) The scatter diagram indicates the orientation of the points, and their coordinates can be used to calculate the sample covariance s_{12}. In the scatter diagram of Figure 1.1, large values of x_1 occur with large values of x_2 and small values of x_1 with small values of x_2. Hence, s_{12} will be positive.

Dot diagrams and scatter plots contain different kinds of information. The information in the marginal dot diagrams is not sufficient for constructing the scatter plot. As an illustration, suppose the data preceding Figure 1.1 had been paired differently, so that the measurements on the variables x_1 and x_2 were as follows:
Variable 1 (x_1):  5   4    6   2   2    8   3
Variable 2 (x_2):  5   5.5  4   7   10   5   7.5

(We have simply rearranged the values of variable 1.) The scatter and dot diagrams for the "new" data are shown in Figure 1.2. Comparing Figures 1.1 and 1.2, we find that the marginal dot diagrams are the same, but that the scatter diagrams are decidedly different. In Figure 1.2, large values of x_1 are paired with small values of x_2 and small values of x_1 with large values of x_2. Consequently, the descriptive statistics for the individual variables \bar{x}_1, \bar{x}_2, s_{11}, and s_{22} remain unchanged, but the sample covariance s_{12}, which measures the association between pairs of variables, will now be negative.

The different orientations of the data in Figures 1.1 and 1.2 are not discernible from the marginal dot diagrams alone. At the same time, the fact that the marginal dot diagrams are the same in the two cases is not immediately apparent from the scatter plots. The two types of graphical procedures complement one another; they are not competitors.

[Figure 1.2 Scatter plot and dot diagrams for rearranged data.]

The next two examples further illustrate the information that can be conveyed by a graphic display.

Example 1.3 (The effect of unusual observations on sample correlations) Some financial data representing jobs and productivity for the 16 largest publishing firms appeared in an article in Forbes magazine on April 30, 1990. The data for the pair of variables x_1 = employees (jobs) and x_2 = profits per employee (productivity) are graphed in Figure 1.3. We have labeled two "unusual" observations. Dun & Bradstreet is the largest firm in terms of number of employees, but is "typical" in terms of profits per employee. Time Warner has a "typical" number of employees, but comparatively small (negative) profits per employee.

[Figure 1.3 Profits per employee and number of employees for 16 publishing firms.]

The sample correlation coefficient computed from the values of x_1 and x_2 is

r_{12} = -.39  for all 16 firms
       = -.56  for all firms but Dun & Bradstreet
       = -.39  for all firms but Time Warner
       = -.50  for all firms but Dun & Bradstreet and Time Warner

It is clear that atypical observations can have a considerable effect on the sample correlation coefficient.
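The point of Example 1.3, that a single atypical observation can move r substantially, can be sketched numerically. The Forbes values themselves are not reproduced in the text, so the data below are hypothetical, chosen only to mimic a mildly associated cloud plus one extreme firm:

```python
import math

def corr(x, y):
    """Sample correlation coefficient, equation (1-5)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

# Hypothetical data: negative association plus one outlying firm (80, 25).
x = [10, 12, 15, 18, 20, 22, 25, 80]   # employees (thousands); 80 is the outlier
y = [30, 28, 26, 27, 24, 25, 22, 25]   # profits per employee

print(round(corr(x, y), 2))            # r with the outlier included
print(round(corr(x[:-1], y[:-1]), 2))  # r with the outlier removed
```

Dropping the single extreme point changes r dramatically here, which is the same qualitative effect the example reports for Dun & Bradstreet and Time Warner.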

Example 1.4 (A scatter plot for baseball data) In a July 17, 1978, article on money in sports, Sports Illustrated magazine provided data on x_1 = player payroll for National League East baseball teams. We have added data on x_2 = won-lost percentage for 1977. The results are given in Table 1.1.
The scatter plot in Figure 1.4 supports the claim that a championship team can
be bought. Of course, this cause-effect relationship cannot be substantiated, because the experiment did not include a random assignment of payrolls. Thus, statistics cannot answer the question: Could the Mets have won with $4 million to spend
on player salaries?

Table 1.1 1977 Salary and Final Record for the National League East

Team                    x_1 = player payroll   x_2 = won-lost percentage
Philadelphia Phillies   3,497,900              .623
Pittsburgh Pirates      2,485,475              .593
St. Louis Cardinals     1,782,875              .512
Chicago Cubs            1,725,450              .500
Montreal Expos          1,645,575              .463
New York Mets           1,469,800              .395

[Figure 1.4 Salaries and won-lost percentage from Table 1.1.]

To construct the scatter plot in Figure 1.4, we have regarded the six paired observations in Table 1.1 as the coordinates of six points in two-dimensional space. The figure allows us to examine visually the grouping of teams with respect to the variables total payroll and won-lost percentage.

Example 1.5 (Multiple scatter plots for paper strength measurements) Paper is manufactured in continuous sheets several feet wide. Because of the orientation of fibers within the paper, it has a different strength when measured in the direction produced by the machine than when measured across, or at right angles to, the machine direction. Table 1.2 shows the measured values of

x_1 = density (grams/cubic centimeter)
x_2 = strength (pounds) in the machine direction
x_3 = strength (pounds) in the cross direction

Table 1.2 Paper-Quality Measurements

                               Strength
Specimen   Density   Machine direction   Cross direction
1          .801      121.41              70.42
2          .824      127.70              72.47
3          .841      129.20              78.20
4          .816      131.80              74.89
5          .840      135.10              71.21
6          .842      131.50              78.39
7          .820      126.70              69.02
8          .802      115.10              73.10
9          .828      130.80              79.28
10         .819      124.60              76.48
11         .826      118.31              70.25
12         .802      114.20              72.88
13         .810      120.30              68.23
14         .802      115.70              68.12
15         .832      117.51              71.62
16         .796      109.81              53.10
17         .759      109.10              50.85
18         .770      115.10              51.68
19         .759      118.31              50.60
20         .772      112.60              53.51
21         .806      116.20              56.53
22         .803      118.00              70.70
23         .845      131.00              74.35
24         .822      125.70              68.29
25         .971      126.10              72.10
26         .816      125.80              70.64
27         .836      125.50              76.33
28         .815      127.80              76.75
29         .822      130.50              80.33
30         .822      127.90              75.68
31         .843      123.90              78.54
32         .824      124.10              71.91
33         .788      120.80              68.22
34         .782      107.40              54.42
35         .795      120.70              70.41
36         .805      121.91              73.68
37         .836      122.31              74.93
38         .788      110.60              53.52
39         .772      103.51              48.93
40         .776      110.71              53.67
41         .758      113.80              52.42

Source: Data courtesy of SONOCO Products Company.

A novel graphic presentation of these data appears in Figure 1.5. The scatter plots are arranged as the off-diagonal elements of a covariance array and box plots as the diagonal elements. The latter are on a different scale with this software, so we use only the overall shape to provide information on symmetry and possible outliers for each individual characteristic. The scatter plots can be inspected for patterns and unusual observations. In Figure 1.5, there is one unusual observation: the density of specimen 25. Some of the scatter plots have patterns suggesting that there are two separate clumps of observations.

[Figure 1.5 Scatter plots and boxplots of paper-quality data from Table 1.2.]

These scatter plot arrays are further pursued in our discussion of new software graphics in the next section.

In the general multiresponse situation, p variables are simultaneously recorded on n items. Scatter plots should be made for pairs of important variables and, if the task is not too great to warrant the effort, for all pairs.

Limited as we are to a three-dimensional world, we cannot always picture an entire set of data. However, two further geometric representations of the data provide an important conceptual framework for viewing multivariable statistical methods. In cases where it is possible to capture the essence of the data in three dimensions, these representations can actually be graphed.

n Points in p Dimensions (p-Dimensional Scatter Plot). Consider the natural extension of the scatter plot to p dimensions, where the p measurements on the jth item represent the coordinates of a point in p-dimensional space. The coordinate axes are taken to correspond to the variables, so that the jth point is x_{j1} units along the first axis, x_{j2} units along the second, ..., x_{jp} units along the pth axis. The resulting plot with n points not only will exhibit the overall pattern of variability, but also will show similarities (and differences) among the n items. Groupings of items will manifest themselves in this representation.

The next example illustrates a three-dimensional scatter plot.

Example 1.6 (Looking for lower-dimensional structure) A zoologist obtained measurements on n = 25 lizards known scientifically as Cophosaurus texanus. The weight, or mass, is given in grams while the snout-vent length (SVL) and hind limb span (HLS) are given in millimeters. The data are displayed in Table 1.3.

Although there are three size measurements, we can ask whether or not most of the variation is primarily restricted to two dimensions or even to one dimension.

To help answer questions regarding reduced dimensionality, we construct the three-dimensional scatter plot in Figure 1.6. Clearly most of the variation is scatter about a one-dimensional straight line. Knowing the position on a line along the major axes of the cloud of points would be almost as good as knowing the three measurements Mass, SVL, and HLS.

However, this kind of analysis can be misleading if one variable has a much larger variance than the others. Consequently, we first calculate the standardized values z_{jk} = (x_{jk} - \bar{x}_k)/\sqrt{s_{kk}}, so the variables contribute equally to the variation.
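The standardization step z_{jk} = (x_{jk} - \bar{x}_k)/\sqrt{s_{kk}} used at the end of Example 1.6 can be sketched as follows (n-divisor variance, so each standardized column has mean 0 and variance 1):

```python
import math

def standardize(col):
    """Return z-scores z_jk = (x_jk - xbar_k) / sqrt(s_kk), using the
    n-divisor sample variance."""
    n = len(col)
    m = sum(col) / n
    s = math.sqrt(sum((v - m) ** 2 for v in col) / n)
    return [(v - m) / s for v in col]

svl = [59.0, 75.0, 69.0, 67.5, 62.0]   # first five SVL values from Table 1.3
z = standardize(svl)
print(abs(sum(z)) < 1e-9)                             # True: mean is 0
print(abs(sum(v * v for v in z) / len(z) - 1) < 1e-9) # True: variance is 1
```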

Table 1.3 Lizard Size Data

Lizard   Mass     SVL    HLS        Lizard   Mass     SVL    HLS
1        5.526    59.0   113.5      14       10.067   73.0   136.5
2        10.401   75.0   142.0      15       10.091   73.0   135.5
3        9.213    69.0   124.0      16       10.888   77.0   139.0
4        8.953    67.5   125.0      17       7.610    61.5   118.0
5        7.063    62.0   129.5      18       7.733    66.5   133.5
6        6.610    62.0   123.0      19       12.015   79.5   150.0
7        11.273   74.0   140.0      20       10.049   74.0   137.0
8        2.447    47.0   97.0       21       5.149    59.5   116.0
9        15.493   86.5   162.0      22       9.158    68.0   123.0
10       9.004    69.0   126.5      23       12.132   75.0   141.0
11       8.199    70.5   136.0      24       6.978    66.5   117.0
12       6.601    64.5   116.0      25       6.890    63.0   117.0
13       7.622    67.5   135.0

Source: Data courtesy of Kevin E. Bonine.

[Figure 1.6 3D scatter plot of lizard data from Table 1.3.]

Data Displays and Pictorial Representations

Figure 1.8 repeats the scatter plot for the original variables but with males marked by solid circles and females by open circles. Clearly, males are typically larger than females.

All points that have coordinates (x_1, x_2) and are a constant squared distance c^2 from the origin must satisfy

\frac{x_1^2}{s_{11}} + \frac{x_2^2}{s_{22}} = c^2    (1-14)

Equation (1-14) is the equation of an ellipse centered at the origin whose major and minor axes coincide with the coordinate axes. That is, the statistical distance in (1-13) has an ellipse as the locus of all points a constant distance from the origin. This general case is shown in Figure 1.21.

[Figure 1.21 The ellipse of constant statistical distance d^2(O, P) = x_1^2/s_{11} + x_2^2/s_{22} = c^2.]

Example 1.14 (Calculating a statistical distance) A set of paired measurements (x_1, x_2) on two variables yields \bar{x}_1 = \bar{x}_2 = 0, s_{11} = 4, and s_{22} = 1. Suppose the x_1 measurements are unrelated to the x_2 measurements; that is, measurements within a pair vary independently of one another. Since the sample variances are unequal, we measure the square of the distance of an arbitrary point P = (x_1, x_2) to the origin O = (0, 0) by

d^2(O, P) = \frac{x_1^2}{4} + \frac{x_2^2}{1}

All points (x_1, x_2) that are a constant distance 1 from the origin satisfy the equation

\frac{x_1^2}{4} + \frac{x_2^2}{1} = 1

The coordinates of some points a unit distance from the origin are presented in the following table:

Coordinates (x_1, x_2)   Distance: x_1^2/4 + x_2^2/1 = 1
(0, 1)                   0^2/4 + 1^2/1 = 1
(0, -1)                  0^2/4 + (-1)^2/1 = 1
(2, 0)                   2^2/4 + 0^2/1 = 1
(1, \sqrt{3}/2)          1^2/4 + (\sqrt{3}/2)^2/1 = 1

A plot of the equation x_1^2/4 + x_2^2/1 = 1 is an ellipse centered at (0, 0) whose major axis lies along the x_1 coordinate axis and whose minor axis lies along the x_2 coordinate axis. The half-lengths of these major and minor axes are \sqrt{4} = 2 and \sqrt{1} = 1, respectively. The ellipse of unit distance is plotted in Figure 1.22. All points on the ellipse are regarded as being the same statistical distance from the origin; in this case, a distance of 1.

[Figure 1.22 Ellipse of unit distance, x_1^2/4 + x_2^2/1 = 1.]

The expression in (1-13) can be generalized to accommodate the calculation of statistical distance from an arbitrary point P = (x_1, x_2) to any fixed point Q = (y_1, y_2). If we assume that the coordinate variables vary independently of one another, the distance from P to Q is given by

d(P, Q) = \sqrt{\frac{(x_1 - y_1)^2}{s_{11}} + \frac{(x_2 - y_2)^2}{s_{22}}}    (1-15)

The extension of this statistical distance to more than two dimensions is straightforward. Let the points P and Q have p coordinates such that P = (x_1, x_2, \ldots, x_p) and Q = (y_1, y_2, \ldots, y_p). Suppose Q is a fixed point [it may be the origin O = (0, 0, \ldots, 0)] and the coordinate variables vary independently of one another. Let s_{11}, s_{22}, \ldots, s_{pp} be sample variances constructed from n measurements on x_1, x_2, \ldots, x_p, respectively. Then the statistical distance from P to Q is

d(P, Q) = \sqrt{\frac{(x_1 - y_1)^2}{s_{11}} + \frac{(x_2 - y_2)^2}{s_{22}} + \cdots + \frac{(x_p - y_p)^2}{s_{pp}}}    (1-16)

All points P that are a constant squared distance from Q lie on a hyperellipsoid centered at Q whose major and minor axes are parallel to the coordinate axes. We note the following:

1. The distance of P to the origin O is obtained by setting y_1 = y_2 = \cdots = y_p = 0 in (1-16).
2. If s_{11} = s_{22} = \cdots = s_{pp}, the Euclidean distance formula in (1-12) is appropriate.

The distance in (1-16) still does not include most of the important cases we shall encounter, because of the assumption of independent coordinates. The scatter plot in Figure 1.23 depicts a two-dimensional situation in which the x_1 measurements do not vary independently of the x_2 measurements. In fact, the coordinates of the pairs (x_1, x_2) exhibit a tendency to be large or small together, and the sample correlation coefficient is positive. Moreover, the variability in the x_2 direction is larger than the variability in the x_1 direction.

What is a meaningful measure of distance when the variability in the x_1 direction is different from the variability in the x_2 direction and the variables x_1 and x_2 are correlated? Actually, we can use what we have already introduced, provided that we look at things in the right way. From Figure 1.23, we see that if we rotate the original coordinate system through an angle \theta while keeping the scatter fixed and label the rotated axes \tilde{x}_1 and \tilde{x}_2, the scatter in terms of the new axes looks very much like that in Figure 1.20. (You may wish to turn the book to place the \tilde{x}_1 and \tilde{x}_2 axes in their customary positions.) This suggests that we calculate the sample variances using the \tilde{x}_1 and \tilde{x}_2 coordinates and measure distance as in Equation (1-13). That is, with reference to the \tilde{x}_1 and \tilde{x}_2 axes, we define the distance from the point P = (\tilde{x}_1, \tilde{x}_2) to the origin O = (0, 0) as

d(O, P) = \sqrt{\frac{\tilde{x}_1^2}{\tilde{s}_{11}} + \frac{\tilde{x}_2^2}{\tilde{s}_{22}}}    (1-17)

where \tilde{s}_{11} and \tilde{s}_{22} denote the sample variances computed with the \tilde{x}_1 and \tilde{x}_2 measurements.

[Figure 1.23 A scatter plot for positively correlated measurements and a rotated coordinate system.]

The relation between the original coordinates (x_1, x_2) and the rotated coordinates (\tilde{x}_1, \tilde{x}_2) is provided by

\tilde{x}_1 = x_1 \cos(\theta) + x_2 \sin(\theta)
\tilde{x}_2 = -x_1 \sin(\theta) + x_2 \cos(\theta)    (1-18)

Given the relations in (1-18), we can formally substitute for \tilde{x}_1 and \tilde{x}_2 in (1-17) and express the distance in terms of the original coordinates. After some straightforward algebraic manipulations, the distance from P = (x_1, x_2) to the origin O = (0, 0) can be written in terms of the original coordinates x_1 and x_2 of P as

d(O, P) = \sqrt{a_{11} x_1^2 + 2 a_{12} x_1 x_2 + a_{22} x_2^2}    (1-19)

where the a's are numbers such that the distance is nonnegative for all possible values of x_1 and x_2. Here a_{11}, a_{12}, and a_{22} are determined by the angle \theta, and s_{11}, s_{12}, and s_{22} calculated from the original data.^2 The particular forms for a_{11}, a_{12}, and a_{22} are not important at this point. What is important is the appearance of the cross-product term 2 a_{12} x_1 x_2 necessitated by the nonzero correlation r_{12}.

Equation (1-19) can be compared with (1-13). The expression in (1-13) can be regarded as a special case of (1-19) with a_{11} = 1/s_{11}, a_{22} = 1/s_{22}, and a_{12} = 0.

In general, the statistical distance of the point P = (x_1, x_2) from the fixed point Q = (y_1, y_2) for situations in which the variables are correlated has the general form

d(P, Q) = \sqrt{a_{11}(x_1 - y_1)^2 + 2 a_{12}(x_1 - y_1)(x_2 - y_2) + a_{22}(x_2 - y_2)^2}    (1-20)

and can always be computed once a_{11}, a_{12}, and a_{22} are known. In addition, the coordinates of all points P = (x_1, x_2) that are a constant squared distance c^2 from Q satisfy

a_{11}(x_1 - y_1)^2 + 2 a_{12}(x_1 - y_1)(x_2 - y_2) + a_{22}(x_2 - y_2)^2 = c^2    (1-21)

By definition, this is the equation of an ellipse centered at Q. The graph of such an equation is displayed in Figure 1.24. The major (long) and minor (short) axes are indicated. They are parallel to the \tilde{x}_1 and \tilde{x}_2 axes. For the choice of a_{11}, a_{12}, and a_{22} in footnote 2, the \tilde{x}_1 and \tilde{x}_2 axes are at an angle \theta with respect to the x_1 and x_2 axes.

[Figure 1.24 Ellipse of points a constant distance from the point Q.]

The generalization of the distance formulas of (1-19) and (1-20) to p dimensions is straightforward. Let P = (x_1, x_2, \ldots, x_p) be a point whose coordinates represent variables that are correlated and subject to inherent variability. Let O = (0, 0, \ldots, 0) denote the origin, and let Q = (y_1, y_2, \ldots, y_p) be a specified fixed point. Then the distances from P to O and from P to Q have the general forms

d(O, P) = \sqrt{a_{11} x_1^2 + a_{22} x_2^2 + \cdots + a_{pp} x_p^2 + 2 a_{12} x_1 x_2 + 2 a_{13} x_1 x_3 + \cdots + 2 a_{p-1,p} x_{p-1} x_p}    (1-22)

and

d(P, Q) = [a_{11}(x_1 - y_1)^2 + a_{22}(x_2 - y_2)^2 + \cdots + a_{pp}(x_p - y_p)^2 + 2 a_{12}(x_1 - y_1)(x_2 - y_2) + 2 a_{13}(x_1 - y_1)(x_3 - y_3) + \cdots + 2 a_{p-1,p}(x_{p-1} - y_{p-1})(x_p - y_p)]^{1/2}    (1-23)

where the a's are numbers such that the distances are always nonnegative.

We note that the distances in (1-22) and (1-23) are completely determined by the coefficients (weights) a_{ik}, i = 1, 2, \ldots, p, k = 1, 2, \ldots, p. These coefficients can be set out in the rectangular array

\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{12} & a_{22} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots \\ a_{1p} & a_{2p} & \cdots & a_{pp} \end{bmatrix}

Any valid measure of distance d(P, Q) must satisfy the following properties, where R is any intermediate point:

d(P, Q) = d(Q, P)
d(P, Q) > 0 if P \ne Q
d(P, Q) = 0 if P = Q
d(P, Q) \le d(P, R) + d(R, Q)    (triangle inequality)

We have attempted to motivate the study of multivariate analysis and to provide you with some rudimentary, but important, methods for organizing, summarizing, and displaying data. In addition, a general concept of distance has been introduced that will be used repeatedly in later chapters.

^2 Specifically,

a_{11} = \frac{\cos^2(\theta)}{\cos^2(\theta)s_{11} + 2\sin(\theta)\cos(\theta)s_{12} + \sin^2(\theta)s_{22}} + \frac{\sin^2(\theta)}{\cos^2(\theta)s_{22} - 2\sin(\theta)\cos(\theta)s_{12} + \sin^2(\theta)s_{11}}

a_{22} = \frac{\sin^2(\theta)}{\cos^2(\theta)s_{11} + 2\sin(\theta)\cos(\theta)s_{12} + \sin^2(\theta)s_{22}} + \frac{\cos^2(\theta)}{\cos^2(\theta)s_{22} - 2\sin(\theta)\cos(\theta)s_{12} + \sin^2(\theta)s_{11}}

a_{12} = \frac{\cos(\theta)\sin(\theta)}{\cos^2(\theta)s_{11} + 2\sin(\theta)\cos(\theta)s_{12} + \sin^2(\theta)s_{22}} - \frac{\sin(\theta)\cos(\theta)}{\cos^2(\theta)s_{22} - 2\sin(\theta)\cos(\theta)s_{12} + \sin^2(\theta)s_{11}}
*

lJbe 81 ebraic expressions for the squares of the distances in ,- Compute the X, Sn, and R arrays. Notice the magnitudes of the correlation
coefficients as you go from the shorter (lOO-meter) to the longer (marathon) ruHning
distances. Interpret ihese pairwise correlations.
1.18. Convert the national track records for women in Table 1.9 to speeds measured in meters per second. For example, the record speed for the 100-m dash for Argentinian women is 100 m/11.57 sec = 8.643 m/sec. Notice that the records for the 800-m, 1500-m, 3000-m and marathon runs are measured in minutes. The marathon is 26.2 miles, or 42,195 meters, long. Compute the x̄, Sn, and R arrays. Notice the magnitudes of the correlation coefficients as you go from the shorter (100 m) to the longer (marathon) running distances. Interpret these pairwise correlations. Compare your results with the results you obtained in Exercise 1.17.
1.19. Create the scatter plot and boxplot displays of Figure 1.5 for (a) the mineral-content data in Table 1.8 and (b) the national-track-records data in Table 1.9.
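Exercise 1.18's unit conversion can be checked numerically. The following NumPy sketch (not part of the text) converts one country's records from Table 1.9 (Argentina) to speeds in meters per second; the 800-m through marathon times must first be converted from minutes to seconds:

```python
import numpy as np

# Distances in meters; times in seconds (100/200/400 m) or minutes (the rest),
# taken from the Argentina row of Table 1.9.
distances = np.array([100, 200, 400, 800, 1500, 3000, 42195])
times_sec = np.array([11.57, 22.94, 52.50,
                      2.05 * 60, 4.25 * 60, 9.19 * 60, 150.32 * 60])

speeds = distances / times_sec  # meters per second for each event
print(np.round(speeds, 3))      # first entry: 100 / 11.57 = 8.643 m/s
```

Repeating this for every row gives the speed matrix from which x̄, Sn, and R are computed.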

Table 1.9 National Track Records for Women

Country               100 m (s)  200 m (s)  400 m (s)  800 m (min)  1500 m (min)  3000 m (min)  Marathon (min)
Argentina             11.57      22.94      52.50      2.05         4.25          9.19          150.32
Australia             11.12      22.23      48.63      1.98         4.02          8.63          143.51
Austria               11.15      22.70      50.62      1.94         4.05          8.78          154.35
Belgium               11.14      22.48      51.45      1.97         4.08          8.82          143.05
Bermuda               11.46      23.05      53.30      2.07         4.29          9.81          174.18
Brazil                11.17      22.60      50.62      1.97         4.17          9.04          147.41
Canada                10.98      22.62      49.91      1.97         4.00          8.54          148.36
Chile                 11.65      23.84      53.68      2.00         4.22          9.26          152.23
China                 10.79      22.01      49.81      1.93         3.84          8.10          139.39
Columbia              11.31      22.92      49.64      2.04         4.34          9.37          155.19
Cook Islands          12.52      25.91      61.65      2.28         4.82          11.10         212.33
Costa Rica            11.72      23.92      52.57      2.10         4.52          9.84          164.33
Czech Republic        11.09      21.97      47.99      1.89         4.03          8.87          145.19
Denmark               11.42      23.36      52.92      2.02         4.12          8.71          149.34
Dominican Republic    11.63      23.91      53.02      2.09         4.54          9.89          166.46
Finland               11.13      22.39      50.14      2.01         4.10          8.69          148.00
France                10.73      21.99      48.25      1.94         4.03          8.64          148.27
Germany               10.81      21.71      47.60      1.92         3.96          8.51          141.45
Great Britain         11.10      22.10      49.43      1.94         3.97          8.37          135.25
Greece                10.83      22.67      50.56      2.00         4.09          8.96          153.40
Guatemala             11.92      24.50      55.64      2.15         4.48          9.71          171.33
Hungary               11.41      23.06      51.50      1.99         4.02          8.55          148.50
India                 11.56      23.86      55.08      2.10         4.36          9.50          154.29
Indonesia             11.38      22.82      51.05      2.00         4.10          9.11          158.10
Ireland               11.43      23.02      51.07      2.01         3.98          8.36          142.23
Israel                11.45      23.15      52.06      2.07         4.24          9.33          156.36
Italy                 11.14      22.60      51.31      1.96         3.98          8.59          143.47
Japan                 11.36      23.33      51.93      2.01         4.16          8.74          139.41
Kenya                 11.62      23.37      51.56      1.97         3.96          8.39          138.47
Korea, South          11.49      23.80      53.67      2.09         4.24          9.01          146.12
Korea, North          11.80      25.10      56.23      1.97         4.25          8.96          145.31
Luxembourg            11.76      23.96      56.07      2.07         4.35          9.21          149.23
Malaysia              11.50      23.37      52.56      2.12         4.39          9.31          169.28
Mauritius             11.72      23.83      54.62      2.06         4.33          9.24          167.09
Mexico                11.09      23.13      48.89      2.02         4.19          8.89          144.06
Myanmar (Burma)       11.66      23.69      52.96      2.03         4.20          9.08          158.42
Netherlands           11.08      22.81      51.35      1.93         4.06          8.57          143.43
New Zealand           11.32      23.13      51.60      1.97         4.10          8.76          146.46
Norway                11.41      23.31      52.45      2.03         4.01          8.53          141.06
Papua New Guinea      11.96      24.68      55.18      2.24         4.62          10.21         221.14
Philippines           11.28      23.35      54.75      2.12         4.41          9.81          165.48
Poland                10.93      22.13      49.28      1.95         3.99          8.53          144.18
Portugal              11.30      22.88      51.92      1.98         3.96          8.50          143.29
Romania               11.30      22.35      49.88      1.92         3.90          8.36          142.50
Russia                10.77      21.87      49.11      1.91         3.87          8.38          141.31
Samoa                 12.38      25.45      56.32      2.29         5.42          13.12         191.58
Singapore             12.13      24.54      55.08      2.12         4.52          9.94          154.41
Spain                 11.06      22.38      49.67      1.96         4.01          8.48          146.51
Sweden                11.16      22.82      51.69      1.99         4.09          8.81          150.39
Switzerland           11.34      22.88      51.32      1.98         3.97          8.60          145.51
Taiwan                11.22      22.56      52.74      2.08         4.38          9.63          159.53
Thailand              11.33      23.30      52.60      2.06         4.38          10.07         162.39
Turkey                11.25      22.71      53.15      2.01         3.92          8.53          151.43
U.S.A.                10.49      21.34      48.83      1.94         3.95          8.43          141.16

Source: IAAF/ATFS Track and Field Handbook for Helsinki 2005 (courtesy of Ottavio Castellini).

1.20. Refer to the bankruptcy data in Table 11.4, page 657, and on the following website www.prenhall.com/statistics. Using appropriate computer software,
(a) View the entire data set in x1, x2, x3 space. Rotate the coordinate axes in various directions. Check for unusual observations.
(b) Highlight the set of points corresponding to the bankrupt firms. Examine various three-dimensional perspectives. Are there some orientations of three-dimensional space for which the bankrupt firms can be distinguished from the nonbankrupt firms? Are there observations in each of the two groups that are likely to have a significant impact on any rule developed to classify firms based on the sample means, variances, and covariances calculated from these data? (See Exercise 11.24.)
1.21. Refer to the milk transportation-cost data in Table 6.10, page 345, and on the web at www.prenhall.com/statistics. Using appropriate computer software,
(a) View the entire data set in three dimensions. Rotate the coordinate axes in various directions. Check for unusual observations.
(b) Highlight the set of points corresponding to gasoline trucks. Do any of the gasoline-truck points appear to be multivariate outliers? (See Exercise 6.17.) Are there some orientations of x1, x2, x3 space for which the set of points representing gasoline trucks can be readily distinguished from the set of points representing diesel trucks?
1.22. Refer to the oxygen-consumption data in Table 6.12, page 348, and on the web at www.prenhall.com/statistics. Using appropriate computer software,
(a) View the entire data set in three dimensions employing various combinations of three variables to represent the coordinate axes. Begin with the x1, x2, x3 space.
(b) Check this data set for outliers.
1.23. Using the data in Table 11.9, page 666, and on the web at www.prenhall.com/statistics, represent the cereals in each of the following ways.
(a) Stars.
(b) Chernoff faces. (Experiment with the assignment of variables to facial characteristics.)
1.24. Using the utility data in Table 12.4, page 688, and on the web at www.prenhall.com/statistics, represent the public utility companies as Chernoff faces with assignments of variables to facial characteristics different from those considered in Example 1.12. Compare your faces with the faces in Figure 1.17. Are different groupings indicated?

1.25. Using the data in Table 12.4 and on the web at www.prenhall.com/statistics, represent the 22 public utility companies as stars. Visually group the companies into four or five clusters.
1.26. The data in Table 1.10 (see the bull data on the web at www.prenhall.com/statistics) are the measured characteristics of 76 young (less than two years old) bulls sold at auction. Also included in the table are the selling prices (SalePr) of these bulls. The column headings (variables) are defined as follows:

    Breed    = 1 Angus, 5 Hereford, 8 Simental
    YrHgt    = Yearling height at shoulder (inches)
    FtFrBody = Fat free body (pounds)
    PrctFFB  = Percent fat-free body
    Frame    = Scale from 1 (small) to 8 (large)
    BkFat    = Back fat (inches)
    SaleHt   = Sale height at shoulder (inches)
    SaleWt   = Sale weight (pounds)

Table 1.10 Data on Bulls

Breed   SalePr   YrHgt   FtFrBody   PrctFFB   Frame   BkFat   SaleHt   SaleWt
1       2200     51.0    1128       70.9      7       .25     54.8     1720
1       2250     51.9    1108       72.1      7       .25     55.3     1575
1       1625     49.9    1011       71.6      6       .15     53.1     1410
1       4600     53.1    993        68.9      8       .35     56.4     1595
1       2150     51.2    996        68.6      7       .25     55.0     1488
:       :        :       :          :         :       :       :        :
8       1450     51.4    997        73.4      7       .10     55.2     1454
8       1200     49.8    991        70.8      6       .10     54.6     1475
8       1425     50.0    928        70.8      6       .15     53.9     1375
8       1250     50.1    990        71.0      6       .10     54.9     1564
8       1500     51.7    992        70.6      7       .15     55.1     1458

Source: Data courtesy of Mark Ellersieck.

(a) Compute the x̄, Sn, and R arrays. Interpret the pairwise correlations. Do some of these variables appear to distinguish one breed from another?
(b) View the data in three dimensions using the variables Breed, Frame, and BkFat. Rotate the coordinate axes in various directions. Check for outliers. Are the breeds well separated in this coordinate system?
(c) Repeat part b using Breed, FtFrBody, and SaleHt. Which three-dimensional display appears to result in the best separation of the three breeds of bulls?
1.27. Table 1.11 presents the 2005 attendance (millions) at the fifteen most visited national parks and their size (acres).
(a) Create a scatter plot and calculate the correlation coefficient.
(b) Identify the park that is unusual. Drop this point and recalculate the correlation coefficient. Comment on the effect of this one point on correlation.
(c) Would the correlation in Part b change if you measure size in square miles instead of acres? Explain.

Table 1.11 Attendance and Size of National Parks

National Park      Size (acres)   Visitors (millions)
Arcadia            47.4           2.05
Bruce Canyon       35.8           1.02
Cuyahoga Valley    32.9           2.53
Everglades         1508.5         1.23
Grand Canyon       1217.4         4.40
Grand Teton        310.0          2.46
Great Smoky        521.8          9.19
Hot Springs        5.6            1.34
Olympic            922.7          3.14
Mount Rainier      235.6          1.17
Rocky Mountain     265.8          2.80
Shenandoah         199.0          1.09
Yellowstone        2219.8         2.84
Yosemite           761.3          3.30
Zion               146.6          2.59

References
1. Becker, R. A., W. S. Cleveland, and A. R. Wilks. "Dynamic Graphics for Data Analysis." Statistical Science, 2, no. 4 (1987), 355-395.
2. Benjamin, Y., and M. Igbaria. "Clustering Categories for Better Prediction of Computer Resources Utilization." Applied Statistics, 40, no. 2 (1991), 295-307.
3. Capon, N., J. Farley, D. Lehman, and J. Hulbert. "Profiles of Product Innovators among Large U.S. Manufacturers." Management Science, 38, no. 2 (1992), 157-169.
4. Chernoff, H. "Using Faces to Represent Points in K-Dimensional Space Graphically." Journal of the American Statistical Association, 68, no. 342 (1973), 361-368.
5. Cochran, W. G. Sampling Techniques (3rd ed.). New York: John Wiley, 1977.
6. Cochran, W. G., and G. M. Cox. Experimental Designs (2nd ed., paperback). New York: John Wiley, 1992.
7. Davis, J. C. "Information Contained in Sediment Size Analysis." Mathematical Geology, 2, no. 2 (1970), 105-112.
8. Dawkins, B. "Multivariate Analysis of National Track Records." The American Statistician, 43, no. 2 (1989), 110-115.
9. Dudoit, S., J. Fridlyand, and T. P. Speed. "Comparison of Discrimination Methods for the Classification of Tumors Using Gene Expression Data." Journal of the American Statistical Association, 97, no. 457 (2002), 77-87.
10. Dunham, R. B., and D. J. Kravetz. "Canonical Correlation Analysis in a Predictive System." Journal of Experimental Education, 43, no. 4 (1975), 35-42.
11. Everitt, B. Graphical Techniques for Multivariate Data. New York: North-Holland, 1978.
12. Gable, G. G. "A Multidimensional Model of Client Success when Engaging External Consultants." Management Science, 42, no. 8 (1996), 1175-1198.
13. Halinar, J. C. "Principal Component Analysis in Plant Breeding." Unpublished report based on data collected by Dr. F. A. Bliss, University of Wisconsin, 1979.
14. Johnson, R. A., and G. K. Bhattacharyya. Statistics: Principles and Methods (5th ed.). New York: John Wiley, 2005.
15. Kim, L., and Y. Kim. "Innovation in a Newly Industrializing Country: A Multiple Discriminant Analysis." Management Science, 31, no. 3 (1985), 312-322.
16. Klatzky, S. R., and R. W. Hodge. "A Canonical Correlation Analysis of Occupational Mobility." Journal of the American Statistical Association, 66, no. 333 (1971), 16-22.
17. Lee, J. "Relationships Between Properties of Pulp-Fibre and Paper." Unpublished doctoral thesis, University of Toronto, Faculty of Forestry (1992).
18. MacCrimmon, K., and D. Wehrung. "Characteristics of Risk Taking Executives." Management Science, 36, no. 4 (1990), 422-435.
19. Marriott, F. H. C. The Interpretation of Multiple Observations. London: Academic Press, 1974.
20. Mather, P. M. "Study of Factors Influencing Variation in Size Characteristics in Fluvioglacial Sediments." Mathematical Geology, 4, no. 3 (1972), 219-234.
21. McLaughlin, M., et al. "Professional Mediators' Judgments of Mediation Tactics: Multidimensional Scaling and Cluster Analysis." Journal of Applied Psychology, 76, no. 3 (1991), 465-473.
22. Naik, D. N., and R. Khattree. "Revisiting Olympic Track Records: Some Practical Considerations in the Principal Component Analysis." The American Statistician, 50, no. 2 (1996), 140-144.
23. Nason, G. "Three-dimensional Projection Pursuit." Applied Statistics, 44, no. 4 (1995), 411-430.
24. Smith, M., and R. Taffler. "Improving the Communication Function of Published Accounting Statements." Accounting and Business Research, 14, no. 54 (1984), 139-146.
25. Spenner, K. I. "From Generation to Generation: The Transmission of Occupation." Ph.D. dissertation, University of Wisconsin, 1977.
26. Tabakoff, B., et al. "Differences in Platelet Enzyme Activity between Alcoholics and Nonalcoholics." New England Journal of Medicine, 318, no. 3 (1988), 134-139.
27. Timm, N. H. Multivariate Analysis with Applications in Education and Psychology. Monterey, CA: Brooks/Cole, 1975.
28. Trieschmann, J. S., and G. E. Pinches. "A Multivariate Model for Predicting Financially Distressed P-L Insurers." Journal of Risk and Insurance, 40, no. 3 (1973), 327-338.
29. Tukey, J. W. Exploratory Data Analysis. Reading, MA: Addison-Wesley, 1977.
30. Wainer, H., and D. Thissen. "Graphical Data Analysis." Annual Review of Psychology, 32 (1981), 191-241.
31. Wartzman, R. "Don't Wave a Red Flag at the IRS." The Wall Street Journal (February 24, 1993), C1, C15.
32. Weihs, C., and H. Schmidli. "OMEGA (On Line Multivariate Exploratory Graphical Analysis): Routine Searching for Structure." Statistical Science, 5, no. 2 (1990), 175-226.

MATRIX ALGEBRA
AND RANDOM VECTORS
2.1 Introduction
We saw in Chapter 1 that multivariate data can be conveniently displayed as an
array of numbers. In general, a rectangular array of numbers with, for instance, n
rows and p columns is called a matrix of dimension n X p. The study of multivariate
methods is greatly facilitated by the use of matrix algebra.
The matrix algebra results presented in this chapter will enable us to concisely
state statistical models. Moreover, the formal relations expressed in matrix terms
are easily programmed on computers to allow the routine calculation of important
statistical quantities.
We begin by introducing some very basic concepts that are essential to both our
geometrical interpretations and algebraic explanations of subsequent statistical
techniques. If you have not been previously exposed to the rudiments of matrix algebra, you may prefer to follow the brief refresher in the next section by the more
detailed review provided in Supplement 2A.

2.2 Some Basics of Matrix and Vector Algebra
Vectors
An array x of n real numbers x1, x2, ..., xn is called a vector, and it is written as

    x = [ x1 ]          or          x' = [x1, x2, ..., xn]
        [ x2 ]
        [ .. ]
        [ xn ]

where the prime denotes the operation of transposing a column to a row.

Two vectors may be added. Addition of x and y is defined as

    x + y = [ x1 ]   [ y1 ]   [ x1 + y1 ]
            [ x2 ] + [ y2 ] = [ x2 + y2 ]
            [ .. ]   [ .. ]   [   ..    ]
            [ xn ]   [ yn ]   [ xn + yn ]

so that x + y is the vector with ith element xi + yi.

A vector x can be represented geometrically as a directed line in n dimensions with component x1 along the first axis, x2 along the second axis, ..., and xn along the nth axis. This is illustrated in Figure 2.1 for n = 3. [Figure 2.1: The vector x' = [1, 3, 2].]

A vector can be expanded or contracted by multiplying it by a constant c. In particular, we define the vector cx as

    cx = [ cx1 ]
         [ cx2 ]
         [ ...  ]
         [ cxn ]

That is, cx is the vector obtained by multiplying each element of x by c. [See Figure 2.2(a).]

The sum of two vectors emanating from the origin is the diagonal of the parallelogram formed with the two original vectors as adjacent sides. This geometrical interpretation is illustrated in Figure 2.2(b). [Figure 2.2: Scalar multiplication and vector addition.]

A vector has both direction and length. In n = 2 dimensions, we consider the vector

    x = [ x1 ]
        [ x2 ]

The length of x, written Lx, is defined to be

    Lx = sqrt(x1^2 + x2^2)

Geometrically, the length of a vector in two dimensions can be viewed as the hypotenuse of a right triangle. This is demonstrated schematically in Figure 2.3. [Figure 2.3: Length of x = sqrt(x1^2 + x2^2).]

The length of a vector x' = [x1, x2, ..., xn], with n components, is defined by

    Lx = sqrt(x1^2 + x2^2 + ... + xn^2)                                        (2-1)

Multiplication of a vector x by a scalar c changes the length. From Equation (2-1),

    Lcx = sqrt(c^2 x1^2 + c^2 x2^2 + ... + c^2 xn^2)
        = |c| sqrt(x1^2 + x2^2 + ... + xn^2) = |c| Lx                          (2-2)

Multiplication by c does not change the direction of the vector x if c > 0. However, a negative value of c creates a vector with a direction opposite that of x. From (2-2) it is clear that x is expanded if |c| > 1 and contracted if 0 < |c| < 1. [Recall Figure 2.2(a).] Choosing c = 1/Lx, we obtain the unit vector (1/Lx)x, which has length 1 and lies in the direction of x.

A second geometrical concept is angle. Consider two vectors in a plane and the angle θ between them, as in Figure 2.4. From the figure, θ can be represented as the difference between the angles θ1 and θ2 formed by the two vectors and the first coordinate axis. Since, by definition,

    cos(θ1) = x1/Lx     sin(θ1) = x2/Lx
    cos(θ2) = y1/Ly     sin(θ2) = y2/Ly

and

    cos(θ) = cos(θ2 - θ1) = cos(θ2) cos(θ1) + sin(θ2) sin(θ1)

the angle θ between the two vectors x' = [x1, x2] and y' = [y1, y2] is specified by

    cos(θ) = (y1/Ly)(x1/Lx) + (y2/Ly)(x2/Lx) = (x1 y1 + x2 y2)/(Lx Ly)         (2-3)

We find it convenient to introduce the inner product of two vectors. For n = 2 dimensions, the inner product of x and y is

    x'y = x1 y1 + x2 y2

With this definition and Equation (2-3),

    Lx = sqrt(x'x)          cos(θ) = x'y/(Lx Ly) = x'y/(sqrt(x'x) sqrt(y'y))

Since cos(90°) = cos(270°) = 0 and cos(θ) = 0 only if x'y = 0, x and y are perpendicular when x'y = 0.
For an arbitrary number of dimensions n, we define the inner product of x and y as

    x'y = x1 y1 + x2 y2 + ... + xn yn                                          (2-4)

The inner product is denoted by either x'y or y'x.
Using the inner product, we have the natural extension of length and angle to vectors of n components:

    Lx = length of x = sqrt(x'x)                                               (2-5)

    cos(θ) = x'y/(Lx Ly) = x'y/(sqrt(x'x) sqrt(y'y))                           (2-6)

Since, again, cos(θ) = 0 only if x'y = 0, we say that x and y are perpendicular when x'y = 0.

[Figure 2.4: The angle θ between x' = [x1, x2] and y' = [y1, y2].]

Example 2.1 (Calculating lengths of vectors and the angle between them) Given the vectors x' = [1, 3, 2] and y' = [-2, 1, -1], find 3x and x + y. Next, determine the length of x, the length of y, and the angle between x and y. Also, check that the length of 3x is three times the length of x.
First,

    3x = [3, 9, 6]'     and     x + y = [1 - 2, 3 + 1, 2 - 1]' = [-1, 4, 1]'

Next, x'x = 1^2 + 3^2 + 2^2 = 14, y'y = (-2)^2 + 1^2 + (-1)^2 = 6, and x'y = 1(-2) + 3(1) + 2(-1) = -1. Therefore,

    Lx = sqrt(x'x) = sqrt(14) = 3.742     Ly = sqrt(y'y) = sqrt(6) = 2.449

and

    cos(θ) = x'y/(Lx Ly) = -1/(3.742 × 2.449) = -.109

so θ = 96.3°. Finally,

    L3x = sqrt(3^2 + 9^2 + 6^2) = sqrt(126)     and     3Lx = 3 sqrt(14) = sqrt(126)

showing L3x = 3Lx.

A pair of vectors x and y of the same dimension is said to be linearly dependent if there exist constants c1 and c2, both not zero, such that

    c1 x + c2 y = 0

A set of vectors x1, x2, ..., xk is said to be linearly dependent if there exist constants c1, c2, ..., ck, not all zero, such that

    c1 x1 + c2 x2 + ... + ck xk = 0                                            (2-7)

Linear dependence implies that at least one vector in the set can be written as a linear combination of the other vectors. Vectors of the same dimension that are not linearly dependent are said to be linearly independent.
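The arithmetic of Example 2.1 is easy to verify numerically. A short NumPy check (not part of the original text; the vectors are those of the example):

```python
import numpy as np

x = np.array([1.0, 3.0, 2.0])
y = np.array([-2.0, 1.0, -1.0])

L_x = np.sqrt(x @ x)                  # sqrt(14), about 3.742
L_y = np.sqrt(y @ y)                  # sqrt(6), about 2.449
cos_theta = (x @ y) / (L_x * L_y)     # about -0.109
theta = np.degrees(np.arccos(cos_theta))

print(round(theta, 1))                             # about 96.3 degrees
print(np.isclose(np.linalg.norm(3 * x), 3 * L_x))  # L_{3x} = 3 L_x
```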

Example 2.2 (Identifying linearly independent vectors) Consider the set of vectors

    x1 = [ 1 ]     x2 = [  1 ]     x3 = [  1 ]
         [ 2 ]          [  0 ]          [ -2 ]
         [ 1 ]          [ -1 ]          [  1 ]

Setting

    c1 x1 + c2 x2 + c3 x3 = 0

implies that

    c1 + c2 + c3 = 0
    2c1     - 2c3 = 0
    c1 - c2 + c3 = 0

with the unique solution c1 = c2 = c3 = 0. As we cannot find three constants c1, c2, and c3, not all zero, such that c1 x1 + c2 x2 + c3 x3 = 0, the vectors x1, x2, and x3 are linearly independent.

The projection (or shadow) of a vector x on a vector y is

    Projection of x on y = (x'y / y'y) y = (x'y / Ly) (1/Ly) y                 (2-8)

where the vector (1/Ly)y has unit length. The length of the projection is

    Length of projection = |x'y| / Ly = Lx |x'y| / (Lx Ly) = Lx |cos(θ)|       (2-9)

where θ is the angle between x and y. (See Figure 2.5.) [Figure 2.5: The projection of x on y.]

Matrices

A matrix is any rectangular array of real numbers. We denote an arbitrary array of n rows and p columns by

    A     = [ a11  a12  ...  a1p ]
    (n×p)   [ a21  a22  ...  a2p ]
            [ ...  ...       ... ]
            [ an1  an2  ...  anp ]

Many of the vector concepts just introduced have direct generalizations to matrices.
The transpose operation A' of a matrix changes the columns into rows, so that the first column of A becomes the first row of A', the second column becomes the second row, and so forth.

Example 2.3 (The transpose of a matrix) If

    A     = [ 3  -1  2 ]
    (2×3)   [ 1   5  4 ]

then

    A'    = [  3  1 ]
    (3×2)   [ -1  5 ]
            [  2  4 ]

A matrix may also be multiplied by a constant c. The product cA is the matrix that results from multiplying each element of A by c. Thus

    cA    = [ ca11  ca12  ...  ca1p ]
    (n×p)   [ ca21  ca22  ...  ca2p ]
            [ ...   ...        ...  ]
            [ can1  can2  ...  canp ]

Two matrices A and B of the same dimensions can be added. The sum A + B has (i, j)th entry aij + bij.

Example 2.4 (The sum of two matrices and multiplication of a matrix by a constant) If

    A     = [ 0   3  1 ]     and     B     = [ 1  -2  -3 ]
    (2×3)   [ 1  -1  1 ]             (2×3)   [ 2   5   1 ]

then

    4A = [ 0  12  4 ]     and     A + B = [ 0 + 1   3 - 2   1 - 3 ]   [ 1  1  -2 ]
         [ 4  -4  4 ]                     [ 1 + 2  -1 + 5   1 + 1 ] = [ 3  4   2 ]

It is also possible to define the multiplication of two matrices if the dimensions of the matrices conform in the following manner: When A is (n × k) and B is (k × p), so that the number of elements in a row of A is the same as the number of elements in a column of B, we can form the matrix product AB. An element of the new matrix AB is formed by taking the inner product of each row of A with each column of B.
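The projection formulas (2-8) and (2-9) can be sketched in NumPy; the vectors below are made up for illustration (not from the text):

```python
import numpy as np

x = np.array([2.0, 3.0])
y = np.array([4.0, 0.0])

# Projection of x on y, Eq. (2-8): (x'y / y'y) y
proj = (x @ y) / (y @ y) * y

# Length of the projection, Eq. (2-9): |x'y| / L_y
length = abs(x @ y) / np.linalg.norm(y)

print(proj)     # component of x along y
print(length)   # equals np.linalg.norm(proj)
```

Since y here lies along the first axis, the projection simply picks off the first coordinate of x.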

The matrix product AB is

    A    B    = the (n × p) matrix whose entry in the ith row and jth column
  (n×k)(k×p)    is the inner product of the ith row of A and the jth column of B

or

    (i, j) entry of AB = ai1 b1j + ai2 b2j + ... + aik bkj = Σ(ℓ=1 to k) aiℓ bℓj     (2-10)

When k = 4, we have four products to add for each entry in the matrix AB. Thus, row i of A, [ai1 ai2 ai3 ai4], paired with column j of B gives the (i, j) entry ai1 b1j + ai2 b2j + ai3 b3j + ai4 b4j.

Example 2.5 (Matrix multiplication) If

    A = [ 3  -1  2 ]
        [ 1   5  4 ]

then

     A    B   = [ 3  -1  2 ] [ -2 ]   [ 3(-2) + (-1)(7) + 2(9) ]   [  5 ]
   (2×3)(3×1)   [ 1   5  4 ] [  7 ] = [ 1(-2) +   5(7)  + 4(9) ] = [ 69 ]
                             [  9 ]

and

     C    A   = [ 2   0 ] [ 3  -1  2 ]   [ 2(3) + 0(1)  2(-1) + 0(5)  2(2) + 0(4) ]
   (2×2)(2×3)   [ 1  -1 ] [ 1   5  4 ] = [ 1(3) - 1(1)  1(-1) - 1(5)  1(2) - 1(4) ]

              = [ 6  -2   4 ]
                [ 2  -6  -2 ]

When a matrix B consists of a single column, it is customary to use the lowercase b vector notation.

Example 2.6 (Some typical products and their dimensions) Let

    A = [ 1  -2   3 ]     b = [  7 ]     c = [  5 ]     d = [ 2 ]
        [ 2   4  -1 ]         [ -3 ]         [  8 ]         [ 9 ]
                              [  6 ]         [ -4 ]

Then Ab, bc', b'c, and d'Ab are typical products.
The product Ab is a vector with dimension equal to the number of rows of A:

    Ab = [ 1(7) + (-2)(-3) +   3(6) ]   [ 31 ]
         [ 2(7) +   4(-3) + (-1)(6) ] = [ -4 ]

The product b'c is a 1 × 1 vector or a single number:

    b'c = [7  -3  6] [  5 ] = 7(5) + (-3)(8) + 6(-4) = -13
                     [  8 ]
                     [ -4 ]

The product bc' is a matrix whose row dimension equals the dimension of b and whose column dimension equals that of c. This product is unlike b'c, which is a single number:

    bc' = [  7 ] [5  8  -4] = [  35   56  -28 ]
          [ -3 ]              [ -15  -24   12 ]
          [  6 ]              [  30   48  -24 ]

The product d'Ab is a 1 × 1 vector or a single number, here 26.

Square matrices will be of special importance in our development of statistical methods. A square matrix is said to be symmetric if A = A' or aij = aji for all i and j.
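The products of Example 2.6 can be reproduced with NumPy (a check, not part of the text). Writing the column vectors as 2-D arrays makes the dimension bookkeeping explicit, so that b'c and bc' come out with different shapes:

```python
import numpy as np

A = np.array([[1, -2, 3], [2, 4, -1]])
b = np.array([[7], [-3], [6]])   # 3 x 1 column
c = np.array([[5], [8], [-4]])   # 3 x 1 column
d = np.array([[2], [9]])         # 2 x 1 column

print((A @ b).ravel())        # Ab: a vector with as many entries as A has rows
print((b.T @ c).item())       # b'c: a single number, -13
print(b @ c.T)                # bc': a 3 x 3 matrix
print((d.T @ A @ b).item())   # d'Ab: a single number, 26
```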

Example 2.7 (A symmetric matrix) The matrix ... is symmetric; the matrix ... is not symmetric.

When two square matrices A and B are of the same dimension, both products AB and BA are defined, although they need not be equal. (See Supplement 2A.) If we let I denote the square matrix with ones on the diagonal and zeros elsewhere, it follows from the definition of matrix multiplication that the (i, j)th entry of AI is ai1 × 0 + ... + ai,j-1 × 0 + aij × 1 + ai,j+1 × 0 + ... + aik × 0 = aij, so AI = A. Similarly, IA = A, so

     I    A   =   A    I   =   A        for any A (k × k)                      (2-11)
   (k×k)(k×k)   (k×k)(k×k)   (k×k)

The matrix I acts like 1 in ordinary multiplication (1 · a = a · 1 = a), so it is called the identity matrix.
The fundamental scalar relation about the existence of an inverse number a⁻¹ such that a⁻¹a = aa⁻¹ = 1 if a ≠ 0 has the following matrix algebra extension: If there exists a matrix B such that

     B    A   =   A    B   =   I
   (k×k)(k×k)   (k×k)(k×k)   (k×k)

then B is called the inverse of A and is denoted by A⁻¹.
The technical condition that an inverse exists is that the k columns a1, a2, ..., ak of A are linearly independent. That is, the existence of A⁻¹ is equivalent to

    c1 a1 + c2 a2 + ... + ck ak = 0     only if c1 = ... = ck = 0              (2-12)

(See Result 2A.9 in Supplement 2A.)

Example 2.8 (The existence of a matrix inverse) For

    A = [ 3  2 ]
        [ 4  1 ]

you may verify that

    [ -.2   .4 ] [ 3  2 ]   [ (-.2)3 + (.4)4   (-.2)2 + (.4)1  ]   [ 1  0 ]
    [  .8  -.6 ] [ 4  1 ] = [  (.8)3 + (-.6)4   (.8)2 + (-.6)1 ] = [ 0  1 ]

so

    [ -.2   .4 ]
    [  .8  -.6 ]

is A⁻¹. We note that

    c1 [ 3 ] + c2 [ 2 ] = [ 0 ]
       [ 4 ]      [ 1 ]   [ 0 ]

implies that c1 = c2 = 0, so the columns of A are linearly independent. This confirms the condition stated in (2-12).

A method for computing an inverse, when one exists, is given in Supplement 2A. The routine, but lengthy, calculations are usually relegated to a computer, especially when the dimension is greater than three. Even so, you must be forewarned that if the column sum in (2-12) is nearly 0 for some constants c1, ..., ck, then the computer may produce incorrect inverses due to extreme errors in rounding. It is always good to check the products AA⁻¹ and A⁻¹A for equality with I when A⁻¹ is produced by a computer package. (See Exercise 2.10.)
Diagonal matrices have inverses that are easy to compute. For example,

    [ a11   0    0    0    0  ]                 [ 1/a11    0      0      0      0   ]
    [  0   a22   0    0    0  ]                 [   0    1/a22    0      0      0   ]
    [  0    0   a33   0    0  ]   has inverse   [   0      0    1/a33    0      0   ]
    [  0    0    0   a44   0  ]                 [   0      0      0    1/a44    0   ]
    [  0    0    0    0   a55 ]                 [   0      0      0      0    1/a55 ]

if all the aii ≠ 0.
Another special class of square matrices with which we shall become familiar are the orthogonal matrices, characterized by

    QQ' = Q'Q = I     or     Q' = Q⁻¹                                          (2-13)

The name derives from the property that if Q has ith row qi', then QQ' = I implies that qi'qi = 1 and qi'qj = 0 for i ≠ j, so the rows have unit length and are mutually perpendicular (orthogonal). According to the condition Q'Q = I, the columns have the same property.
We conclude our brief introduction to the elements of matrix algebra by introducing a concept fundamental to multivariate statistical analysis. A square matrix A is said to have an eigenvalue λ, with corresponding eigenvector x ≠ 0, if

    Ax = λx                                                                    (2-14)
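The advice above about checking computed inverses against I can be illustrated with the matrix of Example 2.8; a minimal NumPy sketch (not part of the text):

```python
import numpy as np

A = np.array([[3.0, 2.0], [4.0, 1.0]])
A_inv = np.linalg.inv(A)

print(np.round(A_inv, 1))                 # the inverse worked out in Example 2.8
print(np.allclose(A @ A_inv, np.eye(2)))  # A A^{-1} = I
print(np.allclose(A_inv @ A, np.eye(2)))  # A^{-1} A = I
```

When the columns of A are nearly linearly dependent, these two checks can fail to hold to working precision, which is exactly the rounding hazard the text warns about.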

Ordinarily, we normalize x so that it has length unity; that is, 1 = x'x. It is convenient to denote normalized eigenvectors by e, and we do so in what follows. Sparing you the details of the derivation (see [1]), we state the following basic result: Let A be a k × k square symmetric matrix. Then A has k pairs of eigenvalues and eigenvectors, namely,

    λ1, e1     λ2, e2     ...     λk, ek                                       (2-15)

The eigenvectors can be chosen to satisfy 1 = e1'e1 = ... = ek'ek and be mutually perpendicular. The eigenvectors are unique unless two or more eigenvalues are equal.

Example 2.9 (Verifying eigenvalues and eigenvectors) Let

    A = [  1  -5 ]
        [ -5   1 ]

Then, since

    [  1  -5 ] [  1/sqrt(2) ]     [  1/sqrt(2) ]
    [ -5   1 ] [ -1/sqrt(2) ] = 6 [ -1/sqrt(2) ]

λ1 = 6 is an eigenvalue, and

    e1 = [  1/sqrt(2) ]
         [ -1/sqrt(2) ]

is its corresponding normalized eigenvector. You may wish to show that a second eigenvalue-eigenvector pair is λ2 = -4, e2' = [1/sqrt(2), 1/sqrt(2)].

A method for calculating the λ's and e's is described in Supplement 2A. It is instructive to do a few sample calculations to understand the technique. We usually rely on a computer when the dimension of the square matrix is greater than two or three.

2.3 Positive Definite Matrices

The study of the variation and interrelationships in multivariate data is often based upon distances and the assumption that the data are multivariate normally distributed. Squared distances (see Chapter 1) and the multivariate normal density can be expressed in terms of matrix products called quadratic forms (see Chapter 4). Consequently, it should not be surprising that quadratic forms play a central role in multivariate analysis. In this section, we consider quadratic forms that are always nonnegative and the associated positive definite matrices.
Results involving quadratic forms and symmetric matrices are, in many cases, a direct consequence of an expansion for symmetric matrices known as the spectral decomposition. The spectral decomposition of a k × k symmetric matrix A is given by¹

    A = λ1 e1 e1' + λ2 e2 e2' + ... + λk ek ek'                                (2-16)

where λ1, λ2, ..., λk are the eigenvalues of A and e1, e2, ..., ek are the associated normalized eigenvectors. (See also Result 2A.14 in Supplement 2A.) Thus, ei'ei = 1 for i = 1, 2, ..., k, and ei'ej = 0 for i ≠ j.

Example 2.10 (The spectral decomposition of a matrix) Consider the symmetric matrix

    A = [ 13  -4   2 ]
        [ -4  13  -2 ]
        [  2  -2  10 ]

The eigenvalues obtained from the characteristic equation |A - λI| = 0 are λ1 = 9, λ2 = 9, and λ3 = 18 (Definition 2A.30). The corresponding eigenvectors e1, e2, and e3 are the (normalized) solutions of the equations A ei = λi ei for i = 1, 2, 3. Thus, A e1 = λ e1 gives

    13 e11 -  4 e21 +  2 e31 = 9 e11
    -4 e11 + 13 e21 -  2 e31 = 9 e21
     2 e11 -  2 e21 + 10 e31 = 9 e31

Moving the terms on the right of the equals sign to the left yields three homogeneous equations in three unknowns, but two of the equations are redundant. Selecting one of the equations and arbitrarily setting e11 = 1 and e21 = 1, we find that e31 = 0. Consequently, the normalized eigenvector is e1' = [1/sqrt(1^2 + 1^2 + 0^2), 1/sqrt(1^2 + 1^2 + 0^2), 0/sqrt(1^2 + 1^2 + 0^2)] = [1/sqrt(2), 1/sqrt(2), 0], since the sum of the squares of its elements is unity. You may verify that e2' = [1/sqrt(18), -1/sqrt(18), -4/sqrt(18)] is also an eigenvector for 9 = λ2, and e3' = [2/3, -2/3, 1/3] is the normalized eigenvector corresponding to the eigenvalue λ3 = 18. Moreover, ei'ej = 0 for i ≠ j.

¹A proof of Equation (2-16) is beyond the scope of this book. The interested reader will find a proof in [6], Chapter 8.
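The eigenvalues and the spectral decomposition (2-16) for the matrix of Example 2.10 can be confirmed numerically; a NumPy sketch (not part of the text):

```python
import numpy as np

A = np.array([[13.0, -4.0,  2.0],
              [-4.0, 13.0, -2.0],
              [ 2.0, -2.0, 10.0]])

# eigh is the routine for symmetric matrices; it returns eigenvalues in
# ascending order with orthonormal eigenvectors as columns.
eigvals, eigvecs = np.linalg.eigh(A)
print(np.round(eigvals))   # 9, 9, and 18, as in the example

# Rebuild A as the sum lambda_1 e1 e1' + lambda_2 e2 e2' + lambda_3 e3 e3'
A_rebuilt = sum(lam * np.outer(e, e) for lam, e in zip(eigvals, eigvecs.T))
print(np.allclose(A, A_rebuilt))   # the spectral decomposition reproduces A
```

Note that because 9 is a repeated eigenvalue, the two eigenvectors the routine returns for it may differ from those chosen in the example; any orthonormal basis of that eigenspace gives the same decomposition.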

62

Positive Definite Matrices 63

Chapter 2 Matrix Algebra and Random Vectors

The spectral decomposition of A is then

[

A = Alelel

or

[

13 -4
-4
13
2 -2

2
-2
10

= 9

J

[~l
_1_

Vi

Example 2.11 (A positive definite matrix and quadratic form) Show that the matrix

+ Azezez + A3 e 3e 3

for the following quadratic form is positive definite:
3xI

1
Vi

(XI

o
2
3
2
3
1
3

1

VIS
+9

-1

VIS

[~

-1

-4 ]
VIS vT8 + 18

-4

VIS
1
18
1
18
4
18

1
18
-1
18
4
18

~

[~

A

O.

= Aiel ej

(ZXZ)

+

(2XIJ(IXZ)

= 4el e;

= x/Ax

Azez

ei

(ZXIJ(JXZ)

+ e2 ei

(ZXI)(IX2)

(ZXIJ(IXZ)

where el and e2 are the normalized and orthogonal eigenvectors associated with the
eigenvalues Al = 4 and Az = 1, respectively. Because 4 and 1 are scalars, premuItiplication and postmultiplication of A by x/ and x, respectively, where x/ = (XI' xz] is
any non zero vector, give

18
4
18
16
18

x/

A

x

=

4x'

= 4YI

4
9
4
18 -9
2
9

4
-9
4
9
2
9

2
9
2
9
1
9

el

ej

x

+

(I XZ)(ZXI)(I X2)(ZX 1)

(I XZ)(2xZ)(ZXI)

·x/

ez

ei

x

(IXZ)(2XI)(1 X2)(ZXI)

+ y~;:,: 0

with
YI

= x/el

= ejx

and Yz

= x/ez

= eix

We now show that YI and Yz are not both zero and, consequently, that
x/ Ax = 4YI + y~ > 0, or A is positive definite.
From the definitions of Y1 and Yz, we have



for all x/ = (XI' Xz, ... , xd, both the matrix A and the quadratic form are said to be
nonnegative definite. If equality holds in (2-17) only for the vector x/ = (0,0, ... ,0],
then A or the quadratic form is said to be positive definite. In other words, A is
positive definite if
(2-18)
0< x/Ax
~

-vJ -V;] [;J

By Definition 2A.30, the eigenvalues of A are the solutions of the equation
- AI I = 0, or (3 - A)(2 - A) - 2 = O. The solutions are Al = 4 and Az = l.
Using the spectral decomposition in (2-16), we can write

The spectral decomposition is an important analytical tool. With it, we are very
easily able to demonstrate certain statistical results. The first of these is a matrix
explanation of distance, which we now develop.
Because x/ Ax has only squared terms xt and product terms XiXb it is caIled a
quadratic form. When a k X k symmetric matrix A is such that
(2-17)
Os x/A x

for all vectors x

XZ{

IA

4
--

+

as you may readily verify.

+ 2x~ - 2Vi XlxZ

To illustrate the general approach, we first write the quadratic form in matrix
notation as

or
y
(ZXI)

=

E X
(ZX2)(ZXI)

Now E is an orthogonal matrix and hence has inverse E/. Thus, x = E/y. But x is a
nonzero vector, and 0 ~ x = E/y implies that y ~ O.

Using the spectral decomposition, we can easily show that a k X k symmetric
matrix A is a positive definite matrix if and only if every eigenvalue of A is positive.
(See Exercise 2.17.) A is a nonnegative definite matrix if and only if all of its eigenvalues are greater than or equal to zero.
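The eigenvalue test for positive definiteness in Example 2.11 can be checked numerically. The following NumPy sketch (not part of the original text; it assumes NumPy is available) rebuilds the matrix of the quadratic form 3x1² + 2x2² − 2√2 x1x2 from its spectral decomposition and confirms that both eigenvalues are positive:

```python
import numpy as np

# Matrix of the quadratic form 3*x1^2 + 2*x2^2 - 2*sqrt(2)*x1*x2 (Example 2.11)
A = np.array([[3.0, -np.sqrt(2.0)],
              [-np.sqrt(2.0), 2.0]])

# eigh returns eigenvalues in ascending order with orthonormal eigenvectors
lam, P = np.linalg.eigh(A)

# Spectral decomposition: A = sum_i lambda_i * e_i e_i'
A_rebuilt = sum(lam[i] * np.outer(P[:, i], P[:, i]) for i in range(2))

# A is positive definite if and only if every eigenvalue is positive
is_pd = np.all(lam > 0)
```

The eigenvalues come out as 1 and 4, matching the solutions of (3 − λ)(2 − λ) − 2 = 0.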
Assume for the moment that the p elements XI, Xz, ... , Xp of a vector x are
realizations of p random variables XI, Xz, ... , Xp. As we pointed out in Chapter 1,

we can regard these elements as the coordinates of a point in p-dimensional space, and the "distance" of the point [x1, x2, ..., xp]' to the origin can, and in this case should, be interpreted in terms of standard deviation units. In this way, we can account for the inherent uncertainty (variability) in the observations. Points with the same associated "uncertainty" are regarded as being at the same distance from the origin.

If we use the distance formula introduced in Chapter 1 [see Equation (1-22)], the distance from the origin satisfies the general formula

(distance)² = a11x1² + a22x2² + ··· + appxp² + 2(a12x1x2 + a13x1x3 + ··· + a(p−1,p) x(p−1) xp)

provided that (distance)² > 0 for all [x1, x2, ..., xp] ≠ [0, 0, ..., 0]. Setting aij = aji, i ≠ j, i = 1, 2, ..., p, j = 1, 2, ..., p, we have

(distance)² = [x1, x2, ..., xp] [a11 a12 ··· a1p; a12 a22 ··· a2p; ⋮; a1p a2p ··· app] [x1; x2; ⋮; xp]

or

0 < (distance)² = x'Ax   for x ≠ 0                (2-19)

Figure 2.6 Points a constant distance c from the origin (p = 2, 1 ≤ λ1 < λ2).

From (2-19), we see that the p × p symmetric matrix A is positive definite. In sum, distance is determined from a positive definite quadratic form x'Ax. Conversely, a positive definite quadratic form can be interpreted as a squared distance.
Comment. Let the square of the distance from the point x' = [x1, x2, ..., xp] to the origin be given by x'Ax, where A is a p × p symmetric positive definite matrix. Then the square of the distance from x to an arbitrary fixed point μ' = [μ1, μ2, ..., μp] is given by the general expression (x − μ)'A(x − μ).

Expressing distance as the square root of a positive definite quadratic form allows us to give a geometrical interpretation based on the eigenvalues and eigenvectors of the matrix A. For example, suppose p = 2. Then the points x' = [x1, x2] of constant distance c from the origin satisfy

x'Ax = a11x1² + a22x2² + 2a12x1x2 = c²

By the spectral decomposition, as in Example 2.11,

A = λ1 e1 e1' + λ2 e2 e2'   so   x'Ax = λ1 (x'e1)² + λ2 (x'e2)²

Now, c² = λ1y1² + λ2y2² is an ellipse in y1 = x'e1 and y2 = x'e2 because λ1, λ2 > 0 when A is positive definite. (See Exercise 2.17.) We easily verify that x = cλ1^(−1/2) e1 satisfies x'Ax = λ1 (cλ1^(−1/2) e1'e1)² = c². Similarly, x = cλ2^(−1/2) e2 gives the appropriate distance in the e2 direction. Thus, the points at distance c lie on an ellipse whose axes are given by the eigenvectors of A with lengths proportional to the reciprocals of the square roots of the eigenvalues. The constant of proportionality is c. The situation is illustrated in Figure 2.6.

If p > 2, the points x' = [x1, x2, ..., xp] a constant distance c = √(x'Ax) from the origin lie on hyperellipsoids c² = λ1(x'e1)² + ··· + λp(x'ep)², whose axes are given by the eigenvectors of A. The half-length in the direction ei is equal to c/√λi, i = 1, 2, ..., p, where λ1, λ2, ..., λp are the eigenvalues of A.

2.4 A Square-Root Matrix

The spectral decomposition allows us to express the inverse of a square matrix in terms of its eigenvalues and eigenvectors, and this leads to a useful square-root matrix.

Let A be a k × k positive definite matrix with the spectral decomposition A = Σ(i=1 to k) λi ei ei'. Let the normalized eigenvectors be the columns of another matrix P = [e1, e2, ..., ek]. Then

A = Σ(i=1 to k) λi ei ei' = P Λ P'                (2-20)

where PP' = P'P = I and Λ is the diagonal matrix

Λ = [λ1 0 ··· 0; 0 λ2 ··· 0; ⋮; 0 0 ··· λk]   with λi > 0
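The geometry of the constant-distance ellipse can be verified numerically. This NumPy sketch (an illustration, not part of the original text) uses the matrix from Example 2.11 and checks that the axis points x = (c/√λi) ei satisfy x'Ax = c²:

```python
import numpy as np

# Matrix of Example 2.11; for positive definite A, points with x'Ax = c^2
# lie on an ellipse with half-length c/sqrt(lambda_i) along eigenvector e_i.
A = np.array([[3.0, -np.sqrt(2.0)],
              [-np.sqrt(2.0), 2.0]])
lam, P = np.linalg.eigh(A)          # ascending: lam = [1, 4]

c = 1.0
half_lengths = c / np.sqrt(lam)     # [1.0, 0.5]

# Axis endpoints of the ellipse x'Ax = c^2
x_a = half_lengths[0] * P[:, 0]
x_b = half_lengths[1] * P[:, 1]
```

Both endpoints return x'Ax = 1, and the half-lengths 1 and 1/2 are the reciprocals of √1 and √4 scaled by c = 1.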

Thus,

A⁻¹ = P Λ⁻¹ P' = Σ(i=1 to k) (1/λi) ei ei'                (2-21)

since (PΛ⁻¹P')PΛP' = PΛP'(PΛ⁻¹P') = PP' = I.

Next, let Λ^(1/2) denote the diagonal matrix with √λi as the ith diagonal element. The matrix Σ(i=1 to k) √λi ei ei' = P Λ^(1/2) P' is called the square root of A and is denoted by A^(1/2).

The square-root matrix, of a positive definite matrix A,

A^(1/2) = Σ(i=1 to k) √λi ei ei' = P Λ^(1/2) P'                (2-22)

has the following properties:

1. (A^(1/2))' = A^(1/2) (that is, A^(1/2) is symmetric).
2. A^(1/2) A^(1/2) = A.
3. (A^(1/2))⁻¹ = Σ(i=1 to k) (1/√λi) ei ei' = P Λ^(−1/2) P', where Λ^(−1/2) is a diagonal matrix with 1/√λi as the ith diagonal element.
4. A^(1/2) A^(−1/2) = A^(−1/2) A^(1/2) = I, and A^(−1/2) A^(−1/2) = A⁻¹, where A^(−1/2) = (A^(1/2))⁻¹.

2.5 Random Vectors and Matrices

A random vector is a vector whose elements are random variables. Similarly, a random matrix is a matrix whose elements are random variables. The expected value of a random matrix (or vector) is the matrix (vector) consisting of the expected values of each of its elements. Specifically, let X = {Xij} be an n × p random matrix. Then the expected value of X, denoted by E(X), is the n × p matrix of numbers (if they exist)

E(X) = [E(X11) E(X12) ··· E(X1p); E(X21) E(X22) ··· E(X2p); ⋮; E(Xn1) E(Xn2) ··· E(Xnp)]                (2-23)

where, for each element of the matrix,²

E(Xij) = ∫ xij fij(xij) dxij   if Xij is a continuous random variable with probability density function fij(xij)
E(Xij) = Σ(all xij) xij pij(xij)   if Xij is a discrete random variable with probability function pij(xij)

Example 2.12 (Computing expected values for discrete random variables) Suppose p = 2 and n = 1, and consider the random vector X' = [X1, X2]. Let the discrete random variable X1 have the following probability function:

x1:       −1    0    1
p1(x1):   .3   .3   .4

Then E(X1) = Σ(all x1) x1 p1(x1) = (−1)(.3) + (0)(.3) + (1)(.4) = .1.

Similarly, let the discrete random variable X2 have the probability function

x2:       0    1
p2(x2):  .8   .2

Then E(X2) = Σ(all x2) x2 p2(x2) = (0)(.8) + (1)(.2) = .2.

Thus,

E(X) = [E(X1); E(X2)] = [.1; .2]  ∎

Two results involving the expectation of sums and products of matrices follow directly from the definition of the expected value of a random matrix and the univariate properties of expectation, E(X1 + Y1) = E(X1) + E(Y1) and E(cX1) = cE(X1). Let X and Y be random matrices of the same dimension, and let A and B be conformable matrices of constants. Then (see Exercise 2.40)

E(X + Y) = E(X) + E(Y)
E(AXB) = AE(X)B                (2-24)

²If you are unfamiliar with calculus, you should concentrate on the interpretation of the expected value and, eventually, variance. Our development is based primarily on the properties of expectation rather than its particular evaluation for continuous or discrete random variables.
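The square-root matrix of (2-22) and its properties can be illustrated with a short NumPy sketch (an illustration added here, not part of the original text; any symmetric positive definite matrix works):

```python
import numpy as np

# A^{1/2} = P diag(sqrt(lambda)) P', built from the spectral decomposition.
A = np.array([[4.0, 1.0, 2.0],
              [1.0, 9.0, -3.0],
              [2.0, -3.0, 25.0]])   # symmetric positive definite
lam, P = np.linalg.eigh(A)

A_half = P @ np.diag(np.sqrt(lam)) @ P.T          # square root of A
A_neg_half = P @ np.diag(1.0 / np.sqrt(lam)) @ P.T  # its inverse
```

Properties (1), (2), and (4) correspond to symmetry of `A_half`, `A_half @ A_half == A`, and `A_half @ A_neg_half == I`.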

2.6 Mean Vectors and Covariance Matrices

Suppose X' = [X1, X2, ..., Xp] is a p × 1 random vector. Then each element of X is a random variable with its own marginal probability distribution. (See Example 2.12.) The marginal means μi and variances σi² are defined as μi = E(Xi) and σi² = E(Xi − μi)², i = 1, 2, ..., p, respectively. Specifically,

μi = ∫ xi fi(xi) dxi   if Xi is a continuous random variable with probability density function fi(xi)
μi = Σ(all xi) xi pi(xi)   if Xi is a discrete random variable with probability function pi(xi)

σi² = ∫ (xi − μi)² fi(xi) dxi   if Xi is a continuous random variable with probability density function fi(xi)                (2-25)
σi² = Σ(all xi) (xi − μi)² pi(xi)   if Xi is a discrete random variable with probability function pi(xi)

It will be convenient in later sections to denote the marginal variances by σii rather than the more traditional σi², and consequently, we shall adopt this notation.

The behavior of any pair of random variables, such as Xi and Xk, is described by their joint probability function, and a measure of the linear association between them is provided by the covariance

σik = E(Xi − μi)(Xk − μk)
    = ∫∫ (xi − μi)(xk − μk) fik(xi, xk) dxi dxk   if Xi, Xk are continuous random variables with the joint density function fik(xi, xk)
    = Σ(all xi) Σ(all xk) (xi − μi)(xk − μk) pik(xi, xk)   if Xi, Xk are discrete random variables with joint probability function pik(xi, xk)                (2-26)

and μi and μk, i, k = 1, 2, ..., p, are the marginal means. When i = k, the covariance becomes the marginal variance.

More generally, the collective behavior of the p random variables X1, X2, ..., Xp or, equivalently, the random vector X' = [X1, X2, ..., Xp], is described by a joint probability density function f(x1, x2, ..., xp) = f(x). As we have already noted in this book, f(x) will often be the multivariate normal density function. (See Chapter 4.)

If the joint probability P[Xi ≤ xi and Xk ≤ xk] can be written as the product of the corresponding marginal probabilities, so that

P[Xi ≤ xi and Xk ≤ xk] = P[Xi ≤ xi] P[Xk ≤ xk]                (2-27)

for all pairs of values xi, xk, then Xi and Xk are said to be statistically independent. When Xi and Xk are continuous random variables with joint density fik(xi, xk) and marginal densities fi(xi) and fk(xk), the independence condition becomes

fik(xi, xk) = fi(xi) fk(xk)   for all pairs (xi, xk)

The p continuous random variables X1, X2, ..., Xp are mutually statistically independent if their joint density can be factored as

f(x1, x2, ..., xp) = f1(x1) f2(x2) ··· fp(xp)                (2-28)

for all p-tuples (x1, x2, ..., xp).

Statistical independence has an important implication for covariance. The factorization in (2-28) implies that Cov(Xi, Xk) = 0. Thus,

Cov(Xi, Xk) = 0   if Xi and Xk are independent                (2-29)

The converse of (2-29) is not true in general; there are situations where Cov(Xi, Xk) = 0, but Xi and Xk are not independent. (See [5].)

The means and covariances of the p × 1 random vector X can be set out as matrices. The expected value of each element is contained in the vector of means μ = E(X), and the p variances σii and the p(p − 1)/2 distinct covariances σik (i < k) are contained in the symmetric variance-covariance matrix Σ = E(X − μ)(X − μ)'. Specifically,

E(X) = [E(X1); E(X2); ⋮; E(Xp)] = [μ1; μ2; ⋮; μp] = μ                (2-30)

and

Σ = E(X − μ)(X − μ)'

  = E [ (X1 − μ1)²          (X1 − μ1)(X2 − μ2)  ···  (X1 − μ1)(Xp − μp)
        (X2 − μ2)(X1 − μ1)  (X2 − μ2)²          ···  (X2 − μ2)(Xp − μp)
        ⋮
        (Xp − μp)(X1 − μ1)  (Xp − μp)(X2 − μ2)  ···  (Xp − μp)² ]

  = [ E(X1 − μ1)²           E(X1 − μ1)(X2 − μ2)  ···  E(X1 − μ1)(Xp − μp)
      E(X2 − μ2)(X1 − μ1)   E(X2 − μ2)²          ···  E(X2 − μ2)(Xp − μp)
      ⋮
      E(Xp − μp)(X1 − μ1)   E(Xp − μp)(X2 − μ2)  ···  E(Xp − μp)² ]

or

Σ = Cov(X) = [σ11 σ12 ··· σ1p; σ21 σ22 ··· σ2p; ⋮; σp1 σp2 ··· σpp]                (2-31)

Example 2.13 (Computing the covariance matrix) Find the covariance matrix for the two random variables X1 and X2 introduced in Example 2.12 when their joint probability function p12(x1, x2) is represented by the entries in the body of the following table:

                x2 = 0   x2 = 1   p1(x1)
x1 = −1          .24      .06      .3
x1 =  0          .16      .14      .3
x1 =  1          .40      .00      .4
p2(x2)           .8       .2       1

We have already shown that μ1 = E(X1) = .1 and μ2 = E(X2) = .2. (See Example 2.12.) In addition,

σ11 = E(X1 − μ1)² = Σ(all x1) (x1 − .1)² p1(x1)
    = (−1 − .1)²(.3) + (0 − .1)²(.3) + (1 − .1)²(.4) = .69

σ22 = E(X2 − μ2)² = Σ(all x2) (x2 − .2)² p2(x2)
    = (0 − .2)²(.8) + (1 − .2)²(.2) = .16

σ12 = E(X1 − μ1)(X2 − μ2) = Σ(all pairs (x1, x2)) (x1 − .1)(x2 − .2) p12(x1, x2)
    = (−1 − .1)(0 − .2)(.24) + (−1 − .1)(1 − .2)(.06) + ··· + (1 − .1)(1 − .2)(.00) = −.08

σ21 = E(X2 − μ2)(X1 − μ1) = E(X1 − μ1)(X2 − μ2) = σ12 = −.08

Consequently, with X' = [X1, X2],

μ = E(X) = [E(X1); E(X2)] = [μ1; μ2] = [.1; .2]

and

Σ = E(X − μ)(X − μ)'
  = E [ (X1 − μ1)²          (X1 − μ1)(X2 − μ2)
        (X2 − μ2)(X1 − μ1)  (X2 − μ2)² ]
  = [ σ11 σ12; σ21 σ22 ] = [ .69 −.08; −.08 .16 ]  ∎

We note that the computation of means, variances, and covariances for discrete random variables involves summation (as in Examples 2.12 and 2.13), while analogous computations for continuous random variables involve integration.

Because σik = E(Xi − μi)(Xk − μk) = σki, it is convenient to write the matrix appearing in (2-31) as

Σ = E(X − μ)(X − μ)' = [σ11 σ12 ··· σ1p; σ12 σ22 ··· σ2p; ⋮; σ1p σ2p ··· σpp]                (2-32)

We shall refer to μ and Σ as the population mean (vector) and population variance-covariance (matrix), respectively.

The multivariate normal distribution is completely specified once the mean vector μ and variance-covariance matrix Σ are given (see Chapter 4), so it is not surprising that these quantities play an important role in many multivariate procedures.

It is frequently informative to separate the information contained in variances σii from that contained in measures of association and, in particular, the measure of association known as the population correlation coefficient ρik. The correlation coefficient ρik is defined in terms of the covariance σik and variances σii and σkk as

ρik = σik / (√σii √σkk)                (2-33)

The correlation coefficient measures the amount of linear association between the random variables Xi and Xk. (See, for example, [5].)
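The summations of Example 2.13 can be reproduced in a few lines of NumPy (a numerical check added here, not part of the original text):

```python
import numpy as np

# Joint probability table of Example 2.13: rows x1 = -1, 0, 1; cols x2 = 0, 1
x1_vals = np.array([-1.0, 0.0, 1.0])
x2_vals = np.array([0.0, 1.0])
p12 = np.array([[0.24, 0.06],
                [0.16, 0.14],
                [0.40, 0.00]])

mu1 = np.sum(x1_vals * p12.sum(axis=1))   # marginal mean of X1
mu2 = np.sum(x2_vals * p12.sum(axis=0))   # marginal mean of X2
s11 = np.sum((x1_vals - mu1) ** 2 * p12.sum(axis=1))
s22 = np.sum((x2_vals - mu2) ** 2 * p12.sum(axis=0))
s12 = np.sum(np.outer(x1_vals - mu1, x2_vals - mu2) * p12)
Sigma = np.array([[s11, s12], [s12, s22]])
```

This recovers μ = [.1, .2] and Σ = [.69 −.08; −.08 .16], the values computed term by term in the example.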

Let the population correlation matrix be the p × p symmetric matrix

ρ = [ σ11/(√σ11√σ11)  σ12/(√σ11√σ22)  ···  σ1p/(√σ11√σpp)
      σ12/(√σ11√σ22)  σ22/(√σ22√σ22)  ···  σ2p/(√σ22√σpp)
      ⋮
      σ1p/(√σ11√σpp)  σ2p/(√σ22√σpp)  ···  σpp/(√σpp√σpp) ]

  = [ 1 ρ12 ··· ρ1p; ρ12 1 ··· ρ2p; ⋮; ρ1p ρ2p ··· 1 ]                (2-34)

and let the p × p standard deviation matrix be

V^(1/2) = [√σ11 0 ··· 0; 0 √σ22 ··· 0; ⋮; 0 0 ··· √σpp]                (2-35)

Then it is easily verified (see Exercise 2.23) that

V^(1/2) ρ V^(1/2) = Σ                (2-36)

and

ρ = (V^(1/2))⁻¹ Σ (V^(1/2))⁻¹                (2-37)

That is, Σ can be obtained from V^(1/2) and ρ, whereas ρ can be obtained from Σ. Moreover, the expression of these relationships in terms of matrix operations allows the calculations to be conveniently implemented on a computer.

Example 2.14 (Computing the correlation matrix from the covariance matrix) Suppose

Σ = [4 1 2; 1 9 −3; 2 −3 25] = [σ11 σ12 σ13; σ12 σ22 σ23; σ13 σ23 σ33]

Obtain V^(1/2) and ρ.

Here

V^(1/2) = [√σ11 0 0; 0 √σ22 0; 0 0 √σ33] = [2 0 0; 0 3 0; 0 0 5]

and

(V^(1/2))⁻¹ = [1/2 0 0; 0 1/3 0; 0 0 1/5]

Consequently, from (2-37), the correlation matrix ρ is given by

ρ = (V^(1/2))⁻¹ Σ (V^(1/2))⁻¹
  = [1/2 0 0; 0 1/3 0; 0 0 1/5] [4 1 2; 1 9 −3; 2 −3 25] [1/2 0 0; 0 1/3 0; 0 0 1/5]
  = [1 1/6 1/5; 1/6 1 −1/5; 1/5 −1/5 1]  ∎

Partitioning the Covariance Matrix

Often, the characteristics measured on individual trials will fall naturally into two or more groups. As examples, consider measurements of variables representing consumption and income or variables representing personality traits and physical characteristics. One approach to handling these situations is to let the characteristics defining the distinct groups be subsets of the total collection of characteristics. If the total collection is represented by a p × 1 random vector X, the subsets can be regarded as components of X and can be sorted by partitioning X.
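The matrix route from Σ to ρ in Example 2.14 is exactly the kind of computation that, as the text notes, is conveniently implemented on a computer. A NumPy sketch (added illustration, not part of the original text):

```python
import numpy as np

# Example 2.14: rho = V^{-1/2} Sigma V^{-1/2}
Sigma = np.array([[4.0, 1.0, 2.0],
                  [1.0, 9.0, -3.0],
                  [2.0, -3.0, 25.0]])
V_half = np.diag(np.sqrt(np.diag(Sigma)))     # diag(2, 3, 5)
V_half_inv = np.linalg.inv(V_half)
rho = V_half_inv @ Sigma @ V_half_inv

recovered = V_half @ rho @ V_half             # relation (2-36): equals Sigma
```

The off-diagonal entries come out as 1/6, 1/5, and −1/5, agreeing with the example.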
In general, we can partition the p characteristics contained in the p × 1 random vector X into, for instance, two groups of size q and p − q, respectively. For example, we can write

X = [X1; ⋮; Xq; Xq+1; ⋮; Xp] = [X(1); X(2)]   and   μ = E(X) = [μ(1); μ(2)]                (2-38)

where X(1) is q × 1 and X(2) is (p − q) × 1. From the definitions of the transpose and matrix multiplication,

(X(1) − μ(1))(X(2) − μ(2))'
= [X1 − μ1; X2 − μ2; ⋮; Xq − μq] [Xq+1 − μq+1, Xq+2 − μq+2, ..., Xp − μp]

= [ (X1 − μ1)(Xq+1 − μq+1)  (X1 − μ1)(Xq+2 − μq+2)  ···  (X1 − μ1)(Xp − μp)
    (X2 − μ2)(Xq+1 − μq+1)  (X2 − μ2)(Xq+2 − μq+2)  ···  (X2 − μ2)(Xp − μp)
    ⋮
    (Xq − μq)(Xq+1 − μq+1)  (Xq − μq)(Xq+2 − μq+2)  ···  (Xq − μq)(Xp − μp) ]

Upon taking the expectation of the matrix (X(1) − μ(1))(X(2) − μ(2))', we get

E(X(1) − μ(1))(X(2) − μ(2))' = [σ1,q+1 σ1,q+2 ··· σ1p; σ2,q+1 σ2,q+2 ··· σ2p; ⋮; σq,q+1 σq,q+2 ··· σqp] = Σ12                (2-39)

which gives all the covariances, σij, i = 1, 2, ..., q, j = q + 1, q + 2, ..., p, between a component of X(1) and a component of X(2). Note that the matrix Σ12 is not necessarily symmetric or even square.

Making use of the partitioning in Equation (2-38), we can easily demonstrate that

(X − μ)(X − μ)' = [ (X(1) − μ(1))(X(1) − μ(1))'   (X(1) − μ(1))(X(2) − μ(2))'
                    (X(2) − μ(2))(X(1) − μ(1))'   (X(2) − μ(2))(X(2) − μ(2))' ]

and consequently,

Σ = E(X − μ)(X − μ)' = [Σ11 Σ12; Σ21 Σ22]

  = [ σ11 ··· σ1q | σ1,q+1 ··· σ1p
      ⋮            ⋮
      σq1 ··· σqq | σq,q+1 ··· σqp
      σq+1,1 ··· σq+1,q | σq+1,q+1 ··· σq+1,p
      ⋮            ⋮
      σp1 ··· σpq | σp,q+1 ··· σpp ]                (2-40)

Note that Σ12 = Σ21'. The covariance matrix of X(1) is Σ11, that of X(2) is Σ22, and that of elements from X(1) and X(2) is Σ12 (or Σ21). It is sometimes convenient to use the Cov(X(1), X(2)) notation, where

Cov(X(1), X(2)) = Σ12

is a matrix containing all of the covariances between a component of X(1) and a component of X(2).

The Mean Vector and Covariance Matrix for Linear Combinations of Random Variables

Recall that if a single random variable, such as X1, is multiplied by a constant c, then

E(cX1) = cE(X1) = cμ1

and

Var(cX1) = E(cX1 − cμ1)² = c² Var(X1) = c² σ11

If X2 is a second random variable and a and b are constants, then, using additional properties of expectation, we get

Cov(aX1, bX2) = E(aX1 − aμ1)(bX2 − bμ2)
             = ab E(X1 − μ1)(X2 − μ2)
             = ab Cov(X1, X2) = ab σ12

Finally, for the linear combination aX1 + bX2, we have

E(aX1 + bX2) = aE(X1) + bE(X2) = aμ1 + bμ2

Var(aX1 + bX2) = E[(aX1 + bX2) − (aμ1 + bμ2)]²
              = E[a(X1 − μ1) + b(X2 − μ2)]²
              = E[a²(X1 − μ1)² + b²(X2 − μ2)² + 2ab(X1 − μ1)(X2 − μ2)]
              = a² Var(X1) + b² Var(X2) + 2ab Cov(X1, X2)
              = a² σ11 + b² σ22 + 2ab σ12

With c' = [a, b], aX1 + bX2 can be written as

[a  b] [X1; X2] = c'X

Similarly, E(aX1 + bX2) = aμ1 + bμ2 can be expressed as

[a  b] [μ1; μ2] = c'μ                (2-41)

If we let
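The variance formula for a linear combination and the block structure of a partitioned covariance matrix can be checked numerically. A NumPy sketch (an added illustration; the particular 4 × 4 matrix is an arbitrary positive definite example, not one from the text):

```python
import numpy as np

# Build an arbitrary symmetric positive definite Sigma = M M' and partition
# it with q = 2: blocks S11 (q x q), S12, S21, S22.
M = np.array([[2.0, 0.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.5, -1.0, 3.0, 0.0],
              [1.0, 0.0, -1.0, 2.0]])
Sigma = M @ M.T
q = 2
S11, S12 = Sigma[:q, :q], Sigma[:q, q:]
S21, S22 = Sigma[q:, :q], Sigma[q:, q:]

# Var(a X1 + b X2) = a^2 s11 + b^2 s22 + 2 a b s12, i.e. c' Sigma c
a, b = 2.0, -1.0
c = np.array([a, b, 0.0, 0.0])
var_direct = a**2 * Sigma[0, 0] + b**2 * Sigma[1, 1] + 2 * a * b * Sigma[0, 1]
var_quad = c @ Sigma @ c
```

The test that `S12` equals the transpose of `S21` is exactly the relation Σ12 = Σ21' noted above.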

76 "" x q]' and x(Z) = [Xq+b"" .xp]', re~pective!y; SII is the sample c~vari­
ance matrix computed from observatIOns x( ); SZ2 IS the sample covanance
matrix computed from observations X(2); and S12 = S:n is the sample covariance
matrix for elements of x(I) and elements of x(Z).

2.7 Matrix Inequalities and Maximization

Maximization principles play an important role in several multivariate techniques. Linear discriminant analysis, for example, is concerned with allocating observations to predetermined groups. The allocation rule is often a linear function of measurements that maximizes the separation between groups relative to their within-group variability. As another example, principal components are linear combinations of measurements with maximum variability.

The matrix inequalities presented in this section will easily allow us to derive certain maximization results, which will be referenced in later chapters.

Cauchy-Schwarz Inequality. Let b and d be any two p × 1 vectors. Then

(b'd)² ≤ (b'b)(d'd)                (2-48)

with equality if and only if b = cd (or d = cb) for some constant c.

A simple, but important, extension of the Cauchy-Schwarz inequality follows directly.

Extended Cauchy-Schwarz Inequality. Let b (p × 1) and d (p × 1) be any two vectors, and let B (p × p) be a positive definite matrix. Then

(b'd)² ≤ (b'Bb)(d'B⁻¹d)                (2-49)

with equality if and only if b = cB⁻¹d (or d = cBb) for some constant c.

Proof. The inequality is obvious when b = 0 or d = 0. For cases other than these, consider the square-root matrix B^(1/2) defined in terms of its eigenvalues λi and the normalized eigenvectors ei as B^(1/2) = Σ(i=1 to p) √λi ei ei'. If we set [see also (2-22)]

B^(−1/2) = Σ(i=1 to p) (1/√λi) ei ei'

it follows that

b'd = b'Id = b'B^(1/2)B^(−1/2)d = (B^(1/2)b)'(B^(−1/2)d)

and the proof is completed by applying the Cauchy-Schwarz inequality to the vectors (B^(1/2)b) and (B^(−1/2)d). ∎

The extended Cauchy-Schwarz inequality gives rise to the following maximization result.

Maximization Lemma. Let B (p × p) be positive definite and d (p × 1) be a given vector. Then, for an arbitrary nonzero vector x (p × 1),

max over x ≠ 0 of (x'd)² / (x'Bx) = d'B⁻¹d                (2-50)

with the maximum attained when x = cB⁻¹d for any constant c ≠ 0.

Proof. By the extended Cauchy-Schwarz inequality, (x'd)² ≤ (x'Bx)(d'B⁻¹d). Because x ≠ 0 and B is positive definite, x'Bx > 0. Dividing both sides of the inequality by the positive scalar x'Bx yields the upper bound

(x'd)² / (x'Bx) ≤ d'B⁻¹d

Taking the maximum over x gives Equation (2-50) because the bound is attained for x = cB⁻¹d. ∎

A final maximization result will provide us with an interpretation of eigenvalues.

Maximization of Quadratic Forms for Points on the Unit Sphere. Let B (p × p) be a positive definite matrix with eigenvalues λ1 ≥ λ2 ≥ ··· ≥ λp ≥ 0 and associated normalized eigenvectors e1, e2, ..., ep. Then

max over x ≠ 0 of x'Bx / x'x = λ1   (attained when x = e1)
min over x ≠ 0 of x'Bx / x'x = λp   (attained when x = ep)                (2-51)

Setting x = e1 gives x'Bx / x'x = λ1, since ek'e1 = 1 for k = 1 and ek'e1 = 0 for k ≠ 1.
u2, ..., ur], Vr = [v1, v2, ..., vr], and Λr is an r × r diagonal matrix with diagonal entries λi.

For example, let

A = [2.2 0.4; 0.4 2.8]

Then A has eigenvalues λ1 = 3 and λ2 = 2. The corresponding eigenvectors are e1' = [1/√5, 2/√5] and e2' = [2/√5, −1/√5], respectively. Consequently,

A = [2.2 0.4; 0.4 2.8]
  = 3 [1/√5; 2/√5][1/√5  2/√5] + 2 [2/√5; −1/√5][2/√5  −1/√5]
  = [0.6 1.2; 1.2 2.4] + [1.6 −0.8; −0.8 0.4]

The singular-value decomposition can also be written as the matrix product

A (m×k) = U (m×m) Λ (m×k) V' (k×k)

where U has m orthogonal eigenvectors of AA' as its columns, V has k orthogonal eigenvectors of A'A as its columns, and Λ is specified in Result 2A.15.

For example, let

A = [3 1 1; −1 3 1]

You may verify that the eigenvalues γ = λ² of AA' satisfy the equation γ² − 22γ + 120 = (γ − 12)(γ − 10) = 0, and consequently, the eigenvalues are γ1 = λ1² = 12 and γ2 = λ2² = 10. The corresponding eigenvectors are u1' = [1/√2, 1/√2] and u2' = [1/√2, −1/√2], respectively. Also, the eigenvectors of A'A corresponding to γ1 = 12 and γ2 = 10 are v1' = [1/√6, 2/√6, 1/√6] and v2' = [2/√5, −1/√5, 0], respectively.

Taking λ1 = √12 and λ2 = √10, we find that the singular-value decomposition of A is

A = [3 1 1; −1 3 1]
  = √12 [1/√2; 1/√2][1/√6  2/√6  1/√6] + √10 [1/√2; −1/√2][2/√5  −1/√5  0]

The equality may be checked by carrying out the operations on the right-hand side.

The singular-value decomposition is closely connected to a result concerning the approximation of a rectangular matrix by a lower-dimensional matrix, due to Eckart and Young ([2]). If an m × k matrix A is approximated by B, having the same dimension but lower rank, the sum of squared differences

Σ(i=1 to m) Σ(j=1 to k) (aij − bij)² = tr[(A − B)(A − B)']

Result 2A.16. Let A be an m × k matrix of real numbers with m ≥ k and singular value decomposition UΛV'. Let s < k = rank(A). Then

B = Σ(i=1 to s) λi ui vi'

is the rank-s least squares approximation to A. It minimizes tr[(A − B)(A − B)'] over all m × k matrices B having rank no greater than s. The minimum value, or error of approximation, is Σ(i=s+1 to k) λi². ∎

To establish this result, we use UU' = Im and VV' = Ik to write the sum of squares as

tr[(A − B)(A − B)'] = tr[UU'(A − B)VV'(A − B)']
                    = tr[U'(A − B)VV'(A − B)'U]
                    = tr[(Λ − C)(Λ − C)'] = Σ(i=1 to m) Σ(j=1 to k) (λij − cij)²
                    = Σ(i) (λi − cii)² + ΣΣ(i≠j) cij²

where C = U'BV. Clearly, the minimum occurs when cij = 0 for i ≠ j and cii = λi for the s largest singular values. The other cii = 0. That is, U'BV = Λs or B = Σ(i=1 to s) λi ui vi'.

Exercises

2.1. Let x' = [5, 1, 3] and y' = [−1, 3, 1].
(a) Graph the two vectors.
(b) Find (i) the length of x, (ii) the angle between x and y, and (iii) the projection of y on x.
(c) Since x̄ = 3 and ȳ = 1, graph [5 − 3, 1 − 3, 3 − 3] = [2, −2, 0] and [−1 − 1, 3 − 1, 1 − 1] = [−2, 2, 0].

2.2. Given the matrices

A = […], B = […], and C = […]

perform the indicated multiplications.
(a) 5A
(b) BA
(c) A'B'
(d) C'B
(e) Is AB defined?

2.3. Verify the following properties of the transpose when

A = […], B = […], and C = […]

(a) (A')' = A
(b) (C')⁻¹ = (C⁻¹)'
(c) (AB)' = B'A'
(d) For general A (m×k) and B (k×ℓ), (AB)' = B'A'.

2.4. When A⁻¹ and B⁻¹ exist, prove each of the following.
(a) (A')⁻¹ = (A⁻¹)'
(b) (AB)⁻¹ = B⁻¹A⁻¹
Hint: Part a can be proved by noting that AA⁻¹ = I, I = I', and (AA⁻¹)' = (A⁻¹)'A'. Part b follows from (B⁻¹A⁻¹)AB = B⁻¹(A⁻¹A)B = B⁻¹B = I.

2.5. Check that

Q = [5/13 12/13; −12/13 5/13]

is an orthogonal matrix.

2.6. Let

A = […]

(a) Is A symmetric?
(b) Show that A is positive definite.
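Result 2A.16 can be illustrated with the text's 2 × 3 example matrix. The NumPy sketch below (an added check, not part of the original text) forms the rank-1 truncated SVD and confirms that the squared error equals the discarded squared singular value:

```python
import numpy as np

# The example matrix with singular values sqrt(12) and sqrt(10)
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is descending

# Rank-1 least squares approximation (Eckart-Young, Result 2A.16)
B1 = s[0] * np.outer(U[:, 0], Vt[0, :])

# Error of approximation: tr[(A - B1)(A - B1)'] = sum of discarded s_i^2
err = np.trace((A - B1) @ (A - B1).T)
```

Here the error is s2² = 10, the smaller of the two squared singular values.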

2.7. Let A be as given in Exercise 2.6.
(a) Determine the eigenvalues and eigenvectors of A.
(b) Write the spectral decomposition of A.
(c) Find A⁻¹.
(d) Find the eigenvalues and eigenvectors of A⁻¹.

2.8. Given the matrix

A = […]

find the eigenvalues λ1 and λ2 and the associated normalized eigenvectors e1 and e2. Determine the spectral decomposition (2-16) of A.

2.9. Let A be as in Exercise 2.8.
(a) Find A⁻¹.
(b) Compute the eigenvalues and eigenvectors of A⁻¹.
(c) Write the spectral decomposition of A⁻¹, and compare it with that of A from Exercise 2.8.

2.10. Consider the matrices

A = [4 4.001; 4.001 4.002]   and   B = [4 4.001; 4.001 4.002001]

These matrices are identical except for a small difference in the (2,2) position. Moreover, the columns of A (and B) are nearly linearly dependent. Show that A⁻¹ ≈ (−3)B⁻¹. Consequently, small changes, perhaps caused by rounding, can give substantially different inverses.

2.11. Show that the determinant of the p × p diagonal matrix A = {aij} with aij = 0, i ≠ j, is given by the product of the diagonal elements; thus, |A| = a11 a22 ··· app.
Hint: By Definition 2A.24, |A| = a11 A11 + 0 + ··· + 0. Repeat for the submatrix A11 obtained by deleting the first row and first column of A.

2.12. Show that the determinant of a square symmetric p × p matrix A can be expressed as the product of its eigenvalues λ1, λ2, ..., λp; that is, |A| = Π(i=1 to p) λi.
Hint: From (2-16) and (2-20), A = PΛP' with P'P = I. From Result 2A.11(e), |A| = |PΛP'| = |P||ΛP'| = |P||Λ||P'| = |Λ||I|, since |I| = |P'P| = |P'||P|. Apply Exercise 2.11.

2.13. Show that |Q| = +1 or −1 if Q is a p × p orthogonal matrix.
Hint: |QQ'| = |I|. Also, from Result 2A.11, |Q||Q'| = |Q|². Thus, |Q|² = |I|. Now use Exercise 2.11.

2.14. Show that Q'AQ and A have the same eigenvalues if Q is orthogonal.
Hint: Let λ be an eigenvalue of A. Then 0 = |A − λI|. By Exercise 2.13 and Result 2A.11(e), we can write 0 = |Q'||A − λI||Q| = |Q'AQ − λI|, since Q'Q = I.

2.15. A quadratic form x'Ax is said to be positive definite if the matrix A is positive definite. Is the quadratic form 3x1² + 3x2² − 2x1x2 positive definite?

2.16. Consider an arbitrary n × p matrix A. Then A'A is a symmetric p × p matrix. Show that A'A is necessarily nonnegative definite.
Hint: Set y = Ax so that y'y = x'A'Ax.

2.17. Prove that every eigenvalue of a k × k positive definite matrix A is positive.
Hint: Consider the definition of an eigenvalue, where Ae = λe. Multiply on the left by e' so that e'Ae = λe'e.

2.18. Consider the sets of points (x1, x2) whose "distances" from the origin are given by

c² = 4x1² + 3x2² − 2√2 x1x2

for c² = 1 and for c² = 4. Determine the major and minor axes of the ellipses of constant distances and their associated lengths. Sketch the ellipses of constant distances and comment on their positions. What will happen as c² increases?

2.19. Let A^(1/2) (m×m) = Σ(i=1 to m) √λi ei ei' = PΛ^(1/2)P', where PP' = P'P = I. (The λi's and the ei's are the eigenvalues and associated normalized eigenvectors of the matrix A.) Show Properties (1)-(4) of the square-root matrix in (2-22).

2.20. Determine the square-root matrix A^(1/2), using the matrix A in Exercise 2.3. Also, determine A^(−1/2), and show that A^(1/2)A^(−1/2) = A^(−1/2)A^(1/2) = I.

2.21. (See Result 2A.15.) Using the matrix

A = […]

(a) Calculate A'A and obtain its eigenvalues and eigenvectors.
(b) Calculate AA' and obtain its eigenvalues and eigenvectors. Check that the nonzero eigenvalues are the same as those in part a.
(c) Obtain the singular-value decomposition of A.

2.22. (See Result 2A.15.) Using the matrix

A = […]

(a) Calculate AA' and obtain its eigenvalues and eigenvectors.
(b) Calculate A'A and obtain its eigenvalues and eigenvectors. Check that the nonzero eigenvalues are the same as those in part a.
(c) Obtain the singular-value decomposition of A.

2.23. Verify the relationships V^(1/2)ρV^(1/2) = Σ and ρ = (V^(1/2))⁻¹Σ(V^(1/2))⁻¹, where Σ is the p × p population covariance matrix [Equation (2-32)], ρ is the p × p population correlation matrix [Equation (2-34)], and V^(1/2) is the population standard deviation matrix [Equation (2-35)].

2.24. Let X have covariance matrix

Σ = […]

Find
(a) Σ⁻¹
(b) The eigenvalues and eigenvectors of Σ.
(c) The eigenvalues and eigenvectors of Σ⁻¹.
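The sensitivity claimed in Exercise 2.10 is easy to see numerically. This NumPy sketch (an added check, not part of the original text) inverts both matrices and confirms that the inverses differ by roughly a factor of −3:

```python
import numpy as np

# Exercise 2.10: nearly dependent columns make the inverse extremely
# sensitive to a perturbation of 1e-6 in a single entry.
A = np.array([[4.0, 4.001], [4.001, 4.002]])
B = np.array([[4.0, 4.001], [4.001, 4.002001]])

A_inv = np.linalg.inv(A)
B_inv = np.linalg.inv(B)
ratio = A_inv / B_inv       # elementwise; approximately -3 throughout
```

The determinants are about −1e-6 and +3e-6, so the adjugates nearly coincide while the inverses flip sign and triple in size.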

2.25. Let X have covariance matrix

Σ = [25 −2 4; −2 4 1; 4 1 9]

(a) Determine ρ and V^(1/2).
(b) Multiply your matrices to check the relation V^(1/2)ρV^(1/2) = Σ.

2.26. Use Σ as given in Exercise 2.25.
(a) Find ρ13.
(b) Find the correlation between X1 and (1/2)X2 + (1/2)X3.

2.27. Derive expressions for the mean and variances of the following linear combinations in terms of the means and covariances of the random variables X1, X2, and X3.
(a) X1 − 2X2
(b) −X1 + 3X2
(c) X1 + X2 + X3
(e) X1 + 2X2 − X3
(f) 3X1 − 4X2 if X1 and X2 are independent random variables.

2.28. Show that

Cov(c1'X, c2'X) = c1'Σc2

where c1' = [c11, c12, ..., c1p] and c2' = [c21, c22, ..., c2p]. This verifies the off-diagonal elements of CΣC' in (2-45) or diagonal elements if c1 = c2.
Hint: By (2-43), Z1 − E(Z1) = c11(X1 − μ1) + ··· + c1p(Xp − μp) and Z2 − E(Z2) = c21(X1 − μ1) + ··· + c2p(Xp − μp), so Cov(Z1, Z2) = E[(Z1 − E(Z1))(Z2 − E(Z2))] = E[(c11(X1 − μ1) + ··· + c1p(Xp − μp))(c21(X1 − μ1) + c22(X2 − μ2) + ··· + c2p(Xp − μp))]. The product

(c11(X1 − μ1) + ··· + c1p(Xp − μp))(c21(X1 − μ1) + c22(X2 − μ2) + ··· + c2p(Xp − μp))
= (Σ(ℓ=1 to p) c1ℓ(Xℓ − μℓ))(Σ(m=1 to p) c2m(Xm − μm))
= Σ(ℓ=1 to p) Σ(m=1 to p) c1ℓ c2m (Xℓ − μℓ)(Xm − μm)

has expected value Σ(ℓ) Σ(m) c1ℓ c2m σℓm = c1'Σc2. Verify the last step by the definition of matrix multiplication. The same steps hold for all elements.

2.29. Consider the arbitrary random vector X' = [X1, X2, X3, X4, X5] with mean vector μ' = [μ1, μ2, μ3, μ4, μ5]. Partition X into

X = [X(1); X(2)]   where   X(1) = [X1; X2] and X(2) = [X3; X4; X5]

Let Σ be the covariance matrix of X with general element σik. Partition Σ into the covariance matrices of X(1) and X(2) and the covariance matrix of an element of X(1) and an element of X(2).

2.30. You are given the random vector X' = [X1, X2, X3, X4] with mean vector μX' = [4, 3, 2, 1] and variance-covariance matrix

ΣX = [3 0 2 2; 0 1 1 0; 2 1 9 −2; 2 0 −2 4]

Partition X as

X = [X1; X2; X3; X4] = [X(1); X(2)]   with X(1) = [X1; X2] and X(2) = [X3; X4]

Let

A = [1  2]   and   B = […]

and consider the linear combinations AX(1) and BX(2). Find
(a) E(X(1))
(b) E(AX(1))
(c) Cov(X(1))
(d) Cov(AX(1))
(e) E(X(2))
(f) E(BX(2))
(g) Cov(X(2))
(h) Cov(BX(2))
(i) Cov(X(1), X(2))
(j) Cov(AX(1), BX(2))

2.31. Repeat Exercise 2.30, but with A and B replaced by

A = [1  −1]   and   B = […]
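Exercise 2.25 can be worked through mechanically with the relations (2-36) and (2-37). A NumPy sketch (an added illustration, not part of the original text):

```python
import numpy as np

# Exercise 2.25: V^{1/2} and rho for the given Sigma, then check (2-36).
Sigma = np.array([[25.0, -2.0, 4.0],
                  [-2.0, 4.0, 1.0],
                  [4.0, 1.0, 9.0]])
V_half = np.diag(np.sqrt(np.diag(Sigma)))        # diag(5, 2, 3)
V_half_inv = np.linalg.inv(V_half)
rho = V_half_inv @ Sigma @ V_half_inv

recovered = V_half @ rho @ V_half                # should reproduce Sigma
```

For instance, ρ12 = −2/(5·2) = −0.2, and multiplying back recovers Σ exactly.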

108

Exercises

Chapter 2 Matrix Algebra and Random Vectors
2.32. You are given the random vector X' = [X_1, X_2, ..., X_5] with mean vector μ'_X = [2, 4, -1, 3, 0] and variance-covariance matrix

\[
\Sigma_X = \begin{bmatrix}
4 & -1 & \tfrac{1}{2} & -\tfrac{1}{2} & 0 \\
-1 & 3 & 1 & -1 & 0 \\
\tfrac{1}{2} & 1 & 6 & 1 & -1 \\
-\tfrac{1}{2} & -1 & 1 & 4 & 0 \\
0 & 0 & -1 & 0 & 2
\end{bmatrix}
\]

Partition X as

\[
\mathbf{X} = \begin{bmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \\ X_5 \end{bmatrix}
= \begin{bmatrix} \mathbf{X}^{(1)} \\ \mathbf{X}^{(2)} \end{bmatrix},
\qquad
\mathbf{X}^{(1)} = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix},
\quad
\mathbf{X}^{(2)} = \begin{bmatrix} X_3 \\ X_4 \\ X_5 \end{bmatrix}
\]

Let

\[
A = \begin{bmatrix} 1 & -1 \end{bmatrix}
\qquad\text{and}\qquad
B = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & -2 \end{bmatrix}
\]

and consider the linear combinations AX^(1) and BX^(2). Find
(a) E(X^(1))
(b) E(AX^(1))
(c) Cov(X^(1))
(d) Cov(AX^(1))
(e) E(X^(2))
(f) E(BX^(2))
(g) Cov(X^(2))
(h) Cov(BX^(2))
(i) Cov(X^(1), X^(2))
(j) Cov(AX^(1), BX^(2))

2.33. Repeat Exercise 2.32, but with X partitioned as X^(1) = [X_1, X_2, X_3]' and X^(2) = [X_4, X_5]', and with A and B replaced by

\[
A = \begin{bmatrix} 1 & -1 & 0 \end{bmatrix}
\qquad\text{and}\qquad
B = \begin{bmatrix} 1 & 1 \\ -1 & 2 \end{bmatrix}
\]

2.34. Consider the vectors b' = [2, -1, 4, 0] and d' = [-1, 3, -2, 1]. Verify the Cauchy-Schwarz inequality (b'd)² ≤ (b'b)(d'd).

2.35. Using the vectors b' = [-4, 3] and d' = [1, 1], verify the extended Cauchy-Schwarz inequality (b'd)² ≤ (b'Bb)(d'B⁻¹d) if

\[
B = \begin{bmatrix} 2 & -2 \\ -2 & 5 \end{bmatrix}
\]

2.36. Find the maximum and minimum values of the quadratic form 4x_1² + 4x_2² + 6x_1x_2 for all points x' = [x_1, x_2] such that x'x = 1.

2.37. With A as given in Exercise 2.6, find the maximum value of x'Ax for x'x = 1.

2.38. Find the maximum and minimum values of the ratio x'Ax/x'x for any nonzero vectors x' = [x_1, x_2, x_3] if

\[
A = \begin{bmatrix} 13 & -4 & 2 \\ -4 & 13 & -2 \\ 2 & -2 & 10 \end{bmatrix}
\]

2.39. Show that the product

\[
\underset{(r\times s)}{A}\;\underset{(s\times t)}{B}\;\underset{(t\times v)}{C}
\quad\text{has } (i,j)\text{th entry}\quad
\sum_{\ell=1}^{s}\sum_{k=1}^{t} a_{i\ell}\,b_{\ell k}\,c_{kj}
\]

Hint: BC has (ℓ, j)th entry \(\sum_{k=1}^{t} b_{\ell k}c_{kj} = d_{\ell j}\). So A(BC) has (i, j)th element

\[
\sum_{\ell=1}^{s} a_{i\ell}\,d_{\ell j} = \sum_{\ell=1}^{s}\sum_{k=1}^{t} a_{i\ell}\,b_{\ell k}\,c_{kj}
\]

2.40. Verify (2-24): E(X + Y) = E(X) + E(Y) and E(AXB) = AE(X)B.
Hint: X + Y has X_{ij} + Y_{ij} as its (i, j)th element. Now, E(X_{ij} + Y_{ij}) = E(X_{ij}) + E(Y_{ij}) by a univariate property of expectation, and this last quantity is the (i, j)th element of E(X) + E(Y). Next (see Exercise 2.39), AXB has (i, j)th entry \(\sum_{\ell}\sum_{k} a_{i\ell}X_{\ell k}b_{kj}\), and by the additive property of expectation,

\[
E\Bigl(\sum_{\ell}\sum_{k} a_{i\ell}X_{\ell k}b_{kj}\Bigr) = \sum_{\ell}\sum_{k} a_{i\ell}\,E(X_{\ell k})\,b_{kj}
\]

which is the (i, j)th element of AE(X)B.

2.41. You are given the random vector X' = [X_1, X_2, X_3, X_4] with mean vector μ'_X = [3, 2, -2, 0] and variance-covariance matrix

\[
\Sigma_X = \begin{bmatrix}
3 & 0 & 0 & 0 \\
0 & 3 & 0 & 0 \\
0 & 0 & 3 & 0 \\
0 & 0 & 0 & 3
\end{bmatrix}
\]

Let

\[
A = \begin{bmatrix}
1 & -1 & 0 & 0 \\
1 & 1 & -2 & 0 \\
1 & 1 & 1 & -3
\end{bmatrix}
\]

(a) Find E(AX), the mean of AX.
(b) Find Cov(AX), the variances and covariances of AX.
(c) Which pairs of linear combinations have zero covariances?
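The two Cauchy-Schwarz exercises above (2.34 and 2.35) can be verified numerically with the vectors given there; a small NumPy sketch:

```python
import numpy as np

# Exercise 2.34: ordinary Cauchy-Schwarz, (b'd)^2 <= (b'b)(d'd).
b = np.array([2.0, -1.0, 4.0, 0.0])
d = np.array([-1.0, 3.0, -2.0, 1.0])
lhs = (b @ d) ** 2                  # (-13)^2 = 169
rhs = (b @ b) * (d @ d)             # 21 * 15 = 315
assert lhs <= rhs

# Exercise 2.35: extended Cauchy-Schwarz, (b'd)^2 <= (b'Bb)(d'B^{-1}d),
# for a positive definite B.
b2 = np.array([-4.0, 3.0])
d2 = np.array([1.0, 1.0])
B = np.array([[2.0, -2.0],
              [-2.0, 5.0]])
lhs2 = (b2 @ d2) ** 2
rhs2 = (b2 @ B @ b2) * (d2 @ np.linalg.solve(B, d2))
assert lhs2 <= rhs2
print(lhs, rhs, lhs2, rhs2)
```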

2.42. Repeat Exercise 2.41, but with

References
1. Bellman, R. Introduction to Matrix Analysis (2nd ed.). Philadelphia: Society for Industrial & Applied Mathematics (SIAM), 1997.
2. Eckart, C., and G. Young. "The Approximation of One Matrix by Another of Lower Rank." Psychometrika, 1 (1936), 211-218.
3. Graybill, F. A. Introduction to Matrices with Applications in Statistics. Belmont, CA: Wadsworth, 1969.
4. Halmos, P. R. Finite-Dimensional Vector Spaces. New York: Springer-Verlag, 1993.
5. Johnson, R. A., and G. K. Bhattacharyya. Statistics: Principles and Methods (5th ed.). New York: John Wiley, 2005.
6. Noble, B., and J. W. Daniel. Applied Linear Algebra (3rd ed.). Englewood Cliffs, NJ: Prentice Hall, 1988.

SAMPLE GEOMETRY
AND RANDOM SAMPLING
3.1 Introduction
With the vector concepts introduced in the previous chapter, we can now delve deeper into the geometrical interpretations of the descriptive statistics x̄, Sn, and R; we do so in Section 3.2. Many of our explanations use the representation of the columns of X as p vectors in n dimensions. In Section 3.3 we introduce the assumption that the observations constitute a random sample. Simply stated, random sampling implies that (1) measurements taken on different items (or trials) are unrelated to one another and (2) the joint distribution of all p variables remains the same for all items. Ultimately, it is this structure of the random sample that justifies a particular choice of distance and dictates the geometry for the n-dimensional representation of the data. Furthermore, when data can be treated as a random sample, statistical inferences are based on a solid foundation.
Returning to geometric interpretations in Section 3.4, we introduce a single number, called generalized variance, to describe variability. This generalization of variance is an integral part of the comparison of multivariate means. In later sections we use matrix algebra to provide concise expressions for the matrix products and sums that allow us to calculate x̄ and Sn directly from the data matrix X. The connection between x̄, Sn, and the means and covariances for linear combinations of variables is also clearly delineated, using the notion of matrix products.

3.2 The Geometry of the Sample
A single multivariate observation is the collection of measurements on p different
variables taken on the same item or trial. As in Chapter 1, if n observations have
been obtained, the entire data set can be placed in an n × p array (matrix):

\[
\underset{(n\times p)}{X} =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
\]

Chapter 3 Sample Geometry and Random Sampling


Each row of X represents a multivariate observation. Since the entire set of
measurements is often one particular realization of what might have been
observed, we say that the data are a sample of size n from a
"population." The sample then consists of n measurements, each of which has p
components.
As we have seen, the data can be plotted in two different ways. For the
p-dimensional scatter plot, the rows of X represent n points in p-dimensional
space. We can write

\[
\underset{(n\times p)}{X} =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{x}_1' \\ \mathbf{x}_2' \\ \vdots \\ \mathbf{x}_n'
\end{bmatrix}
\begin{matrix}
\leftarrow \text{1st (multivariate) observation} \\
\leftarrow \text{2nd (multivariate) observation} \\
\; \\
\leftarrow n\text{th (multivariate) observation}
\end{matrix}
\]

[Figure 3.1 A plot of the data matrix X as n = 3 points in p = 2 space.]

The row vector x'_j, representing the jth observation, contains the coordinates of a point.
The scatter plot of n points in p-dimensional space provides information on the locations and variability of the points. If the points are regarded as solid spheres, the sample mean vector x̄, given by (1-8), is the center of balance. Variability occurs in more than one direction, and it is quantified by the sample variance-covariance matrix Sn. A single numerical measure of variability is provided by the determinant of the sample variance-covariance matrix. When p is greater than 3, this scatter plot representation cannot actually be graphed. Yet the consideration of the data as n points in p dimensions provides insights that are not readily available from algebraic expressions. Moreover, the concepts illustrated for p = 2 or p = 3 remain valid for the other cases.
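The center-of-balance role of the sample mean can be checked directly; a minimal NumPy sketch using the small 3 × 2 data matrix that appears in this section's examples:

```python
import numpy as np

# The sample mean vector as the "center of balance" of the scatter,
# using the 3 x 2 data matrix from this section's examples.
X = np.array([[4.0, 1.0],
              [-1.0, 3.0],
              [3.0, 5.0]])

x_bar = X.mean(axis=0)               # column means: [2., 3.]
print(x_bar)

# The deviations about the mean sum to zero in every coordinate,
# which is exactly the balance-point property.
deviations = X - x_bar
assert np.allclose(deviations.sum(axis=0), 0.0)
```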

Example 3.1 (Computing the mean vector) Compute the mean vector x̄ from the data matrix

\[
X = \begin{bmatrix} 4 & 1 \\ -1 & 3 \\ 3 & 5 \end{bmatrix}
\]

Plot the n = 3 data points in p = 2 space, and locate x̄ on the resulting diagram.
The first point, x_1, has coordinates x'_1 = [4, 1]. Similarly, the remaining two points are x'_2 = [-1, 3] and x'_3 = [3, 5]. Finally,

\[
\bar{\mathbf{x}} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}
\]

Figure 3.1 shows that x̄ is the balance point (center of gravity) of the scatter. ■

The alternative geometrical representation is constructed by considering the data as p vectors in n-dimensional space. Here we take the elements of the columns of the data matrix to be the coordinates of the vectors. Let

\[
\underset{(n\times p)}{X} =
\begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{bmatrix}
= \begin{bmatrix} \mathbf{y}_1 & \vdots & \mathbf{y}_2 & \vdots & \cdots & \vdots & \mathbf{y}_p \end{bmatrix}
\qquad (3\text{-}2)
\]

Then the coordinates of the first point y'_1 = [x_{11}, x_{21}, ..., x_{n1}] are the n measurements on the first variable. In general, the ith point y'_i = [x_{1i}, x_{2i}, ..., x_{ni}] is determined by the n-tuple of all measurements on the ith variable. In this geometrical representation, we depict y_1, ..., y_p as vectors rather than points, as in the p-dimensional scatter plot. We shall be manipulating these quantities shortly using the algebra of vectors discussed in Chapter 2.

Example 3.2 (Data as p vectors in n dimensions) Plot the following data as p = 2 vectors in n = 3 space:

\[
X = \begin{bmatrix} 4 & 1 \\ -1 & 3 \\ 3 & 5 \end{bmatrix}
\]


Here y'_1 = [4, -1, 3] and y'_2 = [1, 3, 5]. These vectors are shown in Figure 3.2. ■

[Figure 3.2 A plot of the data matrix X as p = 2 vectors in 3-space.]

Many of the algebraic expressions we shall encounter in multivariate analysis can be related to the geometrical notions of length, angle, and volume. This is important because geometrical representations ordinarily facilitate understanding and lead to further insights.
Unfortunately, we are limited to visualizing objects in three dimensions, and consequently, the n-dimensional representation of the data matrix X may not seem like a particularly useful device for n > 3. It turns out, however, that geometrical relationships and the associated statistical concepts depicted for any three vectors remain valid regardless of their dimension. This follows because three vectors, even if n dimensional, can span no more than a three-dimensional space, just as two vectors with any number of components must lie in a plane. By selecting an appropriate three-dimensional perspective, that is, a portion of the n-dimensional space containing the three vectors of interest, a view is obtained that preserves both lengths and angles. Thus, it is possible, with the right choice of axes, to illustrate certain algebraic statistical concepts in terms of only two or three vectors of any dimension n. Since the specific choice of axes is not relevant to the geometry, we shall always label the coordinate axes 1, 2, and 3.
It is possible to give a geometrical interpretation of the process of finding a sample mean. We start by defining the n × 1 vector 1'_n = [1, 1, ..., 1]. (To simplify the notation, the subscript n will be dropped when the dimension of the vector 1_n is clear from the context.) The vector 1 forms equal angles with each of the n coordinate axes, so the vector (1/√n)1 has unit length in the equal-angle direction. Consider the vector y'_i = [x_{1i}, x_{2i}, ..., x_{ni}]. The projection of y_i on the unit vector (1/√n)1 is, by (2-8),

\[
\mathbf{y}_i'\Bigl(\frac{1}{\sqrt{n}}\mathbf{1}\Bigr)\frac{1}{\sqrt{n}}\mathbf{1}
= \frac{x_{1i} + x_{2i} + \cdots + x_{ni}}{n}\,\mathbf{1}
= \bar{x}_i\,\mathbf{1}
\qquad (3\text{-}3)
\]

That is, the sample mean x̄_i = (x_{1i} + x_{2i} + ... + x_{ni})/n = y'_i 1/n corresponds to the multiple of 1 required to give the projection of y_i onto the line determined by 1.
Further, for each y_i, we have the decomposition

\[
\mathbf{y}_i = \bar{x}_i\mathbf{1} + (\mathbf{y}_i - \bar{x}_i\mathbf{1})
\]

where x̄_i1 is perpendicular to y_i - x̄_i1. The deviation, or mean corrected, vector is

\[
\mathbf{d}_i = \mathbf{y}_i - \bar{x}_i\mathbf{1} =
\begin{bmatrix}
x_{1i} - \bar{x}_i \\ x_{2i} - \bar{x}_i \\ \vdots \\ x_{ni} - \bar{x}_i
\end{bmatrix}
\qquad (3\text{-}4)
\]

The elements of d_i are the deviations of the measurements on the ith variable from their sample mean. Decomposition of the y_i vectors into mean components and deviation from the mean components is shown in Figure 3.3 for p = 3 and n = 3.

[Figure 3.3 The decomposition of y_i into a mean component x̄_i1 and a deviation component d_i = y_i - x̄_i1, i = 1, 2, 3.]

Example 3.3 (Decomposing a vector into its mean and deviation components) Let us carry out the decomposition of y_i into x̄_i1 and d_i = y_i - x̄_i1, i = 1, 2, for the data given in Example 3.2:
Here, x̄_1 = (4 - 1 + 3)/3 = 2 and x̄_2 = (1 + 3 + 5)/3 = 3, so

Consequently,

\[
\bar{x}_1\mathbf{1} = 2\begin{bmatrix}1\\1\\1\end{bmatrix} = \begin{bmatrix}2\\2\\2\end{bmatrix}
\qquad\text{and}\qquad
\mathbf{d}_1 = \mathbf{y}_1 - \bar{x}_1\mathbf{1} = \begin{bmatrix}4\\-1\\3\end{bmatrix} - \begin{bmatrix}2\\2\\2\end{bmatrix} = \begin{bmatrix}2\\-3\\1\end{bmatrix}
\]

We note that x̄_11 and d_1 = y_1 - x̄_11 are perpendicular, because

\[
(\bar{x}_1\mathbf{1})'\mathbf{d}_1 = \begin{bmatrix}2 & 2 & 2\end{bmatrix}\begin{bmatrix}2\\-3\\1\end{bmatrix} = 4 - 6 + 2 = 0
\]

A similar result holds for x̄_21 and d_2 = y_2 - x̄_21. The decomposition is

\[
\mathbf{y}_1 = \begin{bmatrix}4\\-1\\3\end{bmatrix} = \begin{bmatrix}2\\2\\2\end{bmatrix} + \begin{bmatrix}2\\-3\\1\end{bmatrix}
\qquad
\mathbf{y}_2 = \begin{bmatrix}1\\3\\5\end{bmatrix} = \begin{bmatrix}3\\3\\3\end{bmatrix} + \begin{bmatrix}-2\\0\\2\end{bmatrix}
\]
■

For the time being, we are interested in the deviation (or residual) vectors d_i = y_i - x̄_i1. A plot of the deviation vectors of Figure 3.3 is given in Figure 3.4.

[Figure 3.4 The deviation vectors d_i from Figure 3.3.]

We have translated the deviation vectors to the origin without changing their lengths or orientations.
Now consider the squared lengths of the deviation vectors. Using (2-5) and (3-4), we obtain

\[
L_{d_i}^2 = \mathbf{d}_i'\mathbf{d}_i = \sum_{j=1}^{n}(x_{ji} - \bar{x}_i)^2
\qquad (3\text{-}5)
\]

(Length of deviation vector)² = sum of squared deviations
From (1-3), we see that the squared length is proportional to the variance of the measurements on the ith variable. Equivalently, the length is proportional to the standard deviation. Longer vectors represent more variability than shorter vectors.
For any two deviation vectors d_i and d_k,

\[
\mathbf{d}_i'\mathbf{d}_k = \sum_{j=1}^{n}(x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)
\qquad (3\text{-}6)
\]

Let θ_ik denote the angle formed by the vectors d_i and d_k. From (2-6), we get

\[
\mathbf{d}_i'\mathbf{d}_k = L_{d_i}L_{d_k}\cos(\theta_{ik})
\]

or, using (3-5) and (3-6), we obtain

\[
\sum_{j=1}^{n}(x_{ji} - \bar{x}_i)(x_{jk} - \bar{x}_k)
= \sqrt{\sum_{j=1}^{n}(x_{ji} - \bar{x}_i)^2}\;\sqrt{\sum_{j=1}^{n}(x_{jk} - \bar{x}_k)^2}\;\cos(\theta_{ik})
\]

so that [see (1-5)]

\[
\cos(\theta_{ik}) = \frac{s_{ik}}{\sqrt{s_{ii}}\sqrt{s_{kk}}} = r_{ik}
\qquad (3\text{-}7)
\]

The cosine of the angle is the sample correlation coefficient. Thus, if the two deviation vectors have nearly the same orientation, the sample correlation will be close to 1. If the two vectors are nearly perpendicular, the sample correlation will be approximately zero. If the two vectors are oriented in nearly opposite directions, the sample correlation will be close to -1.

Example 3.4 (Calculating Sn and R from deviation vectors) Given the deviation vectors in Example 3.3, let us compute the sample variance-covariance matrix Sn and the sample correlation matrix R using the geometrical concepts just introduced.
From Example 3.3,

\[
\mathbf{d}_1 = \begin{bmatrix}2\\-3\\1\end{bmatrix}
\qquad\text{and}\qquad
\mathbf{d}_2 = \begin{bmatrix}-2\\0\\2\end{bmatrix}
\]

[Figure 3.5 The deviation vectors d_1 and d_2.]

These vectors, translated to the origin, are shown in Figure 3.5. Now,

\[
\mathbf{d}_1'\mathbf{d}_1 = 4 + 9 + 1 = 14 = 3s_{11}, \quad\text{or } s_{11} = \tfrac{14}{3}
\]

Also,

\[
\mathbf{d}_2'\mathbf{d}_2 = 4 + 0 + 4 = 8 = 3s_{22}, \quad\text{or } s_{22} = \tfrac{8}{3}
\]

Finally,

\[
\mathbf{d}_1'\mathbf{d}_2 = -4 + 0 + 2 = -2 = 3s_{12}, \quad\text{or } s_{12} = -\tfrac{2}{3}
\]

Consequently,

\[
r_{12} = \frac{s_{12}}{\sqrt{s_{11}}\sqrt{s_{22}}} = \frac{-\tfrac{2}{3}}{\sqrt{\tfrac{14}{3}}\sqrt{\tfrac{8}{3}}} = -.189
\]

and

\[
S_n = \begin{bmatrix}\tfrac{14}{3} & -\tfrac{2}{3} \\ -\tfrac{2}{3} & \tfrac{8}{3}\end{bmatrix}
\qquad
R = \begin{bmatrix}1 & -.189 \\ -.189 & 1\end{bmatrix}
\]
■

The concepts of length, angle, and projection have provided us with a geometrical interpretation of the sample. We summarize as follows:

Geometrical Interpretation of the Sample
1. The projection of a column y_i of the data matrix X onto the equal angular vector 1 is the vector x̄_i1. The vector x̄_i1 has length √n |x̄_i|. Therefore, the ith sample mean, x̄_i, is related to the length of the projection of y_i on 1.
2. The information comprising Sn is obtained from the deviation vectors d_i = y_i - x̄_i1 = [x_{1i} - x̄_i, x_{2i} - x̄_i, ..., x_{ni} - x̄_i]'. The square of the length of d_i is ns_{ii}, and the (inner) product between d_i and d_k is ns_{ik}.¹
3. The sample correlation r_ik is the cosine of the angle between d_i and d_k.

¹ The square of the length and the inner product are (n - 1)s_ii and (n - 1)s_ik, respectively, when the divisor n - 1 is used in the definitions of the sample variance and covariance.

3.3 Random Samples and the Expected Values of the Sample Mean and Covariance Matrix

In order to study the sampling variability of statistics such as x̄ and Sn with the ultimate aim of making inferences, we need to make assumptions about the variables whose observed values constitute the data set X.
Suppose, then, that the data have not yet been observed, but we intend to collect n sets of measurements on p variables. Before the measurements are made, their values cannot, in general, be predicted exactly. Consequently, we treat them as random variables. In this context, let the (j, k)th entry in the data matrix be the random variable X_{jk}. Each set of measurements X_j on p variables is a random vector, and we have the random matrix

\[
\underset{(n\times p)}{X} =
\begin{bmatrix}
X_{11} & X_{12} & \cdots & X_{1p} \\
X_{21} & X_{22} & \cdots & X_{2p} \\
\vdots & \vdots & & \vdots \\
X_{n1} & X_{n2} & \cdots & X_{np}
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{X}_1' \\ \mathbf{X}_2' \\ \vdots \\ \mathbf{X}_n'
\end{bmatrix}
\qquad (3\text{-}8)
\]

A random sample can now be defined.
If the row vectors X'_1, X'_2, ..., X'_n in (3-8) represent independent observations from a common joint distribution with density function f(x) = f(x_1, x_2, ..., x_p), then X_1, X_2, ..., X_n are said to form a random sample from f(x). Mathematically, X_1, X_2, ..., X_n form a random sample if their joint density function is given by the product f(x_1)f(x_2)···f(x_n), where f(x_j) = f(x_{j1}, x_{j2}, ..., x_{jp}) is the density function for the jth row vector.
Two points connected with the definition of random sample merit special attention:
1. The measurements of the p variables in a single trial, such as X'_j = [X_{j1}, X_{j2}, ..., X_{jp}], will usually be correlated. Indeed, we expect this to be the case. The measurements from different trials must, however, be independent.
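The geometric computations of Examples 3.3 and 3.4 can be reproduced numerically; a short NumPy sketch (using the divisor n, matching Sn):

```python
import numpy as np

# Decompose each column y_i of the data matrix into a mean component
# x_bar_i * 1 and a deviation vector d_i, then recover S_n and R from
# inner products of the deviation vectors (Examples 3.3 and 3.4).
X = np.array([[4.0, 1.0],
              [-1.0, 3.0],
              [3.0, 5.0]])
n, p = X.shape
ones = np.ones(n)

x_bar = X.mean(axis=0)                  # [2., 3.]
D = X - ones[:, None] * x_bar           # columns are d_1, d_2

# Each d_i is perpendicular to its mean component x_bar_i * 1.
for i in range(p):
    assert abs((x_bar[i] * ones) @ D[:, i]) < 1e-12

Sn = (D.T @ D) / n                      # divisor n, as in S_n
stddev = np.sqrt(np.diag(Sn))
R = Sn / np.outer(stddev, stddev)
print(Sn)                               # [[14/3, -2/3], [-2/3, 8/3]]
print(np.round(R, 3))
```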

2. The independence of measurements from trial to trial may not hold when the variables are likely to drift over time, as with sets of p stock prices or p economic indicators. Violations of the tentative assumption of independence can have a serious impact on the quality of statistical inferences.
The following examples illustrate these remarks.
Example 3.5 (Selecting a random sample) As a preliminary step in designing a permit system for utilizing a wilderness canoe area without overcrowding, a natural-resource manager took a survey of users. The total wilderness area was divided into subregions, and respondents were asked to give information on the regions visited, lengths of stay, and other variables.
The method followed was to select persons randomly (perhaps using a random number table) from all those who entered the wilderness area during a particular week. All persons were equally likely to be in the sample, so the more popular entrances were represented by larger proportions of canoeists.
Here one would expect the sample observations to conform closely to the criterion for a random sample from the population of users or potential users. On the other hand, if one of the samplers had waited at a campsite far in the interior of the area and interviewed only canoeists who reached that spot, successive measurements would not be independent. For instance, lengths of stay in the wilderness area for different canoeists from this group would all tend to be large. ■
Example 3.6 (A nonrandom sample) Because of concerns with future solid-waste disposal, an ongoing study concerns the gross weight of municipal solid waste generated per year in the United States (Environmental Protection Agency). Estimated amounts attributed to X_1 = paper and paperboard waste and X_2 = plastic waste, in millions of tons, are given for selected years in Table 3.1. Should these measurements on X' = [X_1, X_2] be treated as a random sample of size n = 7? No! In fact, except for a slight but fortunate downturn in paper and paperboard waste in 2003, both variables are increasing over time.

Table 3.1 Solid Waste
Year            1960   1970   1980   1990   1995   2000   2003
x1 (paper)      29.2   44.3   55.2   72.7   81.7   87.7   83.1
x2 (plastics)     .4    2.9    6.8   17.1   18.9   24.7   26.7
■

As we have argued heuristically in Chapter 1, the notion of statistical independence has important implications for measuring distance. Euclidean distance appears appropriate if the components of a vector are independent and have the same variances. Suppose we consider the location of the kth column y'_k = [X_{1k}, X_{2k}, ..., X_{nk}] of X, regarded as a point in n dimensions. The location of this point is determined by the joint probability distribution f(y_k) = f(x_{1k}, x_{2k}, ..., x_{nk}). When the measurements X_{1k}, X_{2k}, ..., X_{nk} are a random sample, f(y_k) = f(x_{1k}, x_{2k}, ..., x_{nk}) = f_k(x_{1k})f_k(x_{2k})···f_k(x_{nk}) and, consequently, each coordinate x_{jk} contributes equally to the location through the identical marginal distributions f_k(x_{jk}).
If the n components are not independent or the marginal distributions are not identical, the influence of individual measurements (coordinates) on location is asymmetrical. We would then be led to consider a distance function in which the coordinates were weighted unequally, as in the "statistical" distances or quadratic forms introduced in Chapters 1 and 2.
Certain conclusions can be reached concerning the sampling distributions of X̄ and Sn without making further assumptions regarding the form of the underlying joint distribution of the variables. In particular, we can see how X̄ and Sn fare as point estimators of the corresponding population mean vector μ and covariance matrix Σ.

Result 3.1. Let X_1, X_2, ..., X_n be a random sample from a joint distribution that has mean vector μ and covariance matrix Σ. Then X̄ is an unbiased estimator of μ, and its covariance matrix is (1/n)Σ. That is,

\[
E(\bar{\mathbf{X}}) = \boldsymbol{\mu} \quad \text{(population mean vector)}
\]
\[
\mathrm{Cov}(\bar{\mathbf{X}}) = \frac{1}{n}\Sigma \quad
\begin{pmatrix}\text{population variance-covariance matrix}\\ \text{divided by sample size}\end{pmatrix}
\qquad (3\text{-}9)
\]

For the covariance matrix Sn,

\[
E(S_n) = \frac{n-1}{n}\Sigma = \Sigma - \frac{1}{n}\Sigma
\qquad (3\text{-}10)
\]

Thus,

\[
E\Bigl(\frac{n}{n-1}\,S_n\Bigr) = \Sigma
\]

so [n/(n - 1)]Sn is an unbiased estimator of Σ, while Sn is a biased estimator with (bias) = E(Sn) - Σ = -(1/n)Σ.

Proof. Now, X̄ = (X_1 + X_2 + ... + X_n)/n. The repeated use of the properties of expectation in (2-24) for two vectors gives

\[
E(\bar{\mathbf{X}}) = E\Bigl(\frac{1}{n}\mathbf{X}_1 + \frac{1}{n}\mathbf{X}_2 + \cdots + \frac{1}{n}\mathbf{X}_n\Bigr)
= E\Bigl(\frac{1}{n}\mathbf{X}_1\Bigr) + E\Bigl(\frac{1}{n}\mathbf{X}_2\Bigr) + \cdots + E\Bigl(\frac{1}{n}\mathbf{X}_n\Bigr)
\]
\[
= \frac{1}{n}E(\mathbf{X}_1) + \frac{1}{n}E(\mathbf{X}_2) + \cdots + \frac{1}{n}E(\mathbf{X}_n)
= \frac{1}{n}\boldsymbol{\mu} + \frac{1}{n}\boldsymbol{\mu} + \cdots + \frac{1}{n}\boldsymbol{\mu}
= \boldsymbol{\mu}
\]

Next,

\[
(\bar{\mathbf{X}} - \boldsymbol{\mu})(\bar{\mathbf{X}} - \boldsymbol{\mu})'
= \Bigl(\frac{1}{n}\sum_{j=1}^{n}(\mathbf{X}_j - \boldsymbol{\mu})\Bigr)\Bigl(\frac{1}{n}\sum_{\ell=1}^{n}(\mathbf{X}_\ell - \boldsymbol{\mu})\Bigr)'
= \frac{1}{n^2}\sum_{j=1}^{n}\sum_{\ell=1}^{n}(\mathbf{X}_j - \boldsymbol{\mu})(\mathbf{X}_\ell - \boldsymbol{\mu})'
\]


so

\[
E(\bar{\mathbf{X}} - \boldsymbol{\mu})(\bar{\mathbf{X}} - \boldsymbol{\mu})'
= \frac{1}{n^2}\Bigl(\sum_{j=1}^{n}\sum_{\ell=1}^{n}E(\mathbf{X}_j - \boldsymbol{\mu})(\mathbf{X}_\ell - \boldsymbol{\mu})'\Bigr)
\]

For j ≠ ℓ, each entry in E(X_j - μ)(X_ℓ - μ)' is zero because the entry is the covariance between a component of X_j and a component of X_ℓ, and these are independent. [See Exercise 3.17 and (2-29).]
Therefore,

\[
\mathrm{Cov}(\bar{\mathbf{X}}) = \frac{1}{n^2}\Bigl(\sum_{j=1}^{n}E(\mathbf{X}_j - \boldsymbol{\mu})(\mathbf{X}_j - \boldsymbol{\mu})'\Bigr)
\]

Since Σ = E(X_j - μ)(X_j - μ)' is the common population covariance matrix for each X_j, we have

\[
\mathrm{Cov}(\bar{\mathbf{X}})
= \frac{1}{n^2}\,(\underbrace{\Sigma + \Sigma + \cdots + \Sigma}_{n\ \text{terms}})
= \frac{1}{n^2}(n\Sigma) = \frac{1}{n}\Sigma
\]

To obtain the expected value of Sn, we first note that (X_{ji} - X̄_i)(X_{jk} - X̄_k) is the (i, k)th element of (X_j - X̄)(X_j - X̄)'. The matrix representing sums of squares and cross products can then be written as

\[
\sum_{j=1}^{n}(\mathbf{X}_j - \bar{\mathbf{X}})(\mathbf{X}_j - \bar{\mathbf{X}})'
= \sum_{j=1}^{n}\mathbf{X}_j\mathbf{X}_j' - n\bar{\mathbf{X}}\bar{\mathbf{X}}'
\]

since \(\sum_{j=1}^{n}(\mathbf{X}_j - \bar{\mathbf{X}}) = \mathbf{0}\) and \(n\bar{\mathbf{X}}' = \sum_{j=1}^{n}\mathbf{X}_j'\). Therefore, its expected value is

\[
E\Bigl(\sum_{j=1}^{n}\mathbf{X}_j\mathbf{X}_j' - n\bar{\mathbf{X}}\bar{\mathbf{X}}'\Bigr)
= \sum_{j=1}^{n}E(\mathbf{X}_j\mathbf{X}_j') - nE(\bar{\mathbf{X}}\bar{\mathbf{X}}')
\]

For any random vector V with E(V) = μ_V and Cov(V) = Σ_V, we have E(VV') = Σ_V + μ_V μ'_V. (See Exercise 3.16.) Consequently,

\[
E(\mathbf{X}_j\mathbf{X}_j') = \Sigma + \boldsymbol{\mu}\boldsymbol{\mu}'
\qquad\text{and}\qquad
E(\bar{\mathbf{X}}\bar{\mathbf{X}}') = \frac{1}{n}\Sigma + \boldsymbol{\mu}\boldsymbol{\mu}'
\]

Using these results, we obtain

\[
\sum_{j=1}^{n}E(\mathbf{X}_j\mathbf{X}_j') - nE(\bar{\mathbf{X}}\bar{\mathbf{X}}')
= n\Sigma + n\boldsymbol{\mu}\boldsymbol{\mu}' - n\Bigl(\frac{1}{n}\Sigma + \boldsymbol{\mu}\boldsymbol{\mu}'\Bigr)
= (n-1)\Sigma
\]

and thus, since \(S_n = \frac{1}{n}\bigl(\sum_{j=1}^{n}\mathbf{X}_j\mathbf{X}_j' - n\bar{\mathbf{X}}\bar{\mathbf{X}}'\bigr)\), it follows immediately that

\[
E(S_n) = \frac{n-1}{n}\,\Sigma
\]
■

Result 3.1 shows that the (i, k)th entry, \((n-1)^{-1}\sum_{j=1}^{n}(X_{ji}-\bar{X}_i)(X_{jk}-\bar{X}_k)\), of [n/(n - 1)]Sn is an unbiased estimator of σ_{ik}. However, the individual sample standard deviations √s_ii, calculated with either n or n - 1 as a divisor, are not unbiased estimators of the corresponding population quantities √σ_ii. Moreover, the correlation coefficients r_ik are not unbiased estimators of the population quantities ρ_ik. However, the bias E(√s_ii) - √σ_ii, or E(r_ik) - ρ_ik, can usually be ignored if the sample size n is moderately large.
Consideration of bias motivates a slightly modified definition of the sample variance-covariance matrix. Result 3.1 provides us with an unbiased estimator S of Σ:

(Unbiased) Sample Variance-Covariance Matrix

\[
S = \Bigl(\frac{n}{n-1}\Bigr)S_n = \frac{1}{n-1}\sum_{j=1}^{n}(\mathbf{X}_j - \bar{\mathbf{X}})(\mathbf{X}_j - \bar{\mathbf{X}})'
\qquad (3\text{-}11)
\]

Here S, without a subscript, has (i, k)th entry \((n-1)^{-1}\sum_{j=1}^{n}(X_{ji}-\bar{X}_i)(X_{jk}-\bar{X}_k)\). This definition of sample covariance is commonly used in many multivariate test statistics. Therefore, it will replace Sn as the sample covariance matrix in most of the material throughout the rest of this book.

3.4 Generalized Variance

With a single variable, the sample variance is often used to describe the amount of variation in the measurements on that variable. When p variables are observed on each unit, the variation is described by the sample variance-covariance matrix

\[
S = \begin{bmatrix}
s_{11} & s_{12} & \cdots & s_{1p} \\
s_{12} & s_{22} & \cdots & s_{2p} \\
\vdots & \vdots & & \vdots \\
s_{1p} & s_{2p} & \cdots & s_{pp}
\end{bmatrix}
\]

The sample covariance matrix contains p variances and p(p - 1)/2 potentially different covariances. Sometimes it is desirable to assign a single numerical value for the variation expressed by S. One choice for a value is the determinant of S, which reduces to the usual sample variance of a single characteristic when p = 1. This determinant² is called the generalized sample variance:

\[
\text{Generalized sample variance} = |S|
\qquad (3\text{-}12)
\]

² Definition 2A.24 defines "determinant" and indicates one method for calculating the value of a determinant.
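The bias statement in Result 3.1, E(Sn) = ((n - 1)/n)Σ, can be checked by simulation; a NumPy sketch, where the 2 × 2 Σ is an arbitrary illustrative choice:

```python
import numpy as np

# With divisor n the sample covariance S_n is biased, E(S_n) = ((n-1)/n) Sigma,
# while S with divisor n-1 is unbiased.  Checked by averaging over many
# simulated samples; Sigma is an illustrative choice.
rng = np.random.default_rng(1)
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
n, reps = 5, 40_000

# Draw all samples at once: shape (reps, n, 2).
x = rng.multivariate_normal([0.0, 0.0], sigma, size=(reps, n))
d = x - x.mean(axis=1, keepdims=True)           # deviations within each sample
sn_avg = np.einsum('rni,rnj->ij', d, d) / (n * reps)   # average S_n over reps

# E(S_n) ~ ((n-1)/n) Sigma, so rescaling by n/(n-1) recovers Sigma.
print(sn_avg)
print(sn_avg * n / (n - 1))
```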


,~\
,I,

Example 3.7 (Calculating a generalized variance) Employees (Xl) and profits per

employee (X2) for the 16 largest publishing firms in the United States are shown in
Figure 1.3. The sample covariance matrix, obtained from the data in the April 30,
1990,
magazine article, is

,I , 3

,I

3

Forbes

1\'

\

I(

-68.43J
123.67

\
\

,1\

\

I ,

2 \',

d"

Evaluate the generalized variance.
In this case, we compute
/S/

\

" ,
d

S = [252.04
-68.43

,

I', \

I"



= (252.04)(123.67) - (-68.43)(-68.43) = 26,487

The generalized sample variance provides one way of writing the information
on all variances and covariances as a single number. Of course, when p > 1, some
information about the sample is lost in the process. A geometrical interpretation of
/ S / will help us appreciate its strengths and weaknesses as a descriptive summary.
Consider the area generated within the plane by two deviation vectors
d l = YI - XII and d 2 = Yz - x21. Let Ldl be the length of d l and Ldz the length of
d z . By elementary geometry, we have the diagram

'---_2

~------------~2

(b)

(a)

Figure 3.6 (a) "Large" generalized sample variance for p = 3.

(b) "Small" generalized sample variance for p

= 3.

---------~-------------;.-

dl

If we compare (3-14) with (3-13), we see that

Height=Ldl sin «(I)

/S/ = (areafj(n - I)Z

and the area of the trapezoid is / Ld J sin ((1) /L d2 . Since cosz( (1)
express this area as

2

+ sin ( (1)

= 1, we can

Assuming now that / S / = (n - l)-(p-l) (volume )2 holds for the volume generated in n space by the p - 1 deviation vectors d l , d z, ... , d p - l , we can establish the
following general result for p deviation vectors by induction (see [1],p. 266):
GeneraIized sample variance = /S/ = (n -1)-P(volume)Z

From (3-5) and (3-7),
LdJ

=

±

VI

(xj1 - Xl)Z = V(n -

I)Sl1

j=l

and
cos«(1) =

r12

Therefore,
Area

= (n

Also,

/S/

=

=

- 1)~Vs;Vl - riz

= (n -l)"Vsl1 szz (1

I[;~: ;::J I I[~~r12
=

Sl1 S2Z

- sll s2z r iz =

Sl1 S 22(1

- rI2)

- r12)

~s:Ur12J I

(3-15)

Equation (3-15) says that the generalized sample variance, for a fixed set of data, is
3
proportional to the square of the volume generated by the p deviation vectors
d l = YI - XII, d 2 = Yz - x21, ... ,dp = Yp - xpl. Figures 3.6(a) and (b) show
trapezoidal regions, generated by p = 3 residual vectors, corresponding to "large"
and "small" generalized variances.
.
For a fixed sample size, it is clear from the geometry that volume, or / S /, will
increase when the length of any d i = Yi - XiI (or ~) is increased. In addition,
volume will increase if the residual vectors of fixed length are moved until they are
at right angles to one another, as in Figure 3.6(a). On the other hand, the volume,
or / S /, will be small if just one of the Sii is small or one of the deviation vectors lies
nearly in the (hyper) plane formed by the others, or both. In the second case, the
trapezoid has very little height above the plane. This is the situation in Figure 3.6(b),
where d 3 1ies nearly in me plane formed by d 1 and d 2 .
3 If generalized variance is defmed in terms of the samplecovariance matrix S. = [en - l)/njS, then,
using Result 2A.11,ISnl = I[(n - 1)/n]IpSI = I[(n -l)/njIpIlSI = [en - l)/nJPISI. Consequently,
using (3-15), we can also write the following: Generalized sample variance = IS.I = n volume? .

-pr

$
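The determinant computation in Example 3.7 takes one line in NumPy:

```python
import numpy as np

# The generalized sample variance of Example 3.7 is just a determinant.
S = np.array([[252.04, -68.43],
              [-68.43, 123.67]])

gen_var = np.linalg.det(S)
print(round(gen_var))      # ~ 26487, matching the hand computation
```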
Generalized variance also has interpretations in the p-space scatter plot representation of the data. The most intuitive interpretation concerns the spread of the scatter about the sample mean point x̄' = [x̄_1, x̄_2, ..., x̄_p]. Consider the measure of distance given in the comment below (2-19), with x̄ playing the role of the fixed point μ and S⁻¹ playing the role of A. With these choices, the coordinates x' = [x_1, x_2, ..., x_p] of the points a constant distance c from x̄ satisfy

\[
(\mathbf{x} - \bar{\mathbf{x}})'S^{-1}(\mathbf{x} - \bar{\mathbf{x}}) = c^2
\qquad (3\text{-}16)
\]

[When p = 1, (x - x̄)'S⁻¹(x - x̄) = (x_1 - x̄_1)²/s_11 is the squared distance from x_1 to x̄_1 in standard deviation units.]
Equation (3-16) defines a hyperellipsoid (an ellipse if p = 2) centered at x̄. It can be shown using integral calculus that the volume of this hyperellipsoid is related to |S|. In particular,

\[
\text{Volume of } \{\mathbf{x}: (\mathbf{x} - \bar{\mathbf{x}})'S^{-1}(\mathbf{x} - \bar{\mathbf{x}}) \le c^2\} = k_p|S|^{1/2}c^p
\qquad (3\text{-}17)
\]

or

(Volume of ellipsoid)² = (constant)(generalized sample variance)

where the constant k_p is rather formidable. A large volume corresponds to a large generalized variance.
Although the generalized variance has some intuitively pleasing geometrical interpretations, it suffers from a basic weakness as a descriptive summary of the sample covariance matrix S, as the following example shows.
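Formula (3-17) can be sanity-checked for p = 2, where k_2 = π; the sketch below compares the formula with a Monte Carlo estimate of the ellipse's area (the matrix S and the bounding box are illustrative choices):

```python
import numpy as np
from math import gamma, pi, sqrt

# (3-17): the region (x - x_bar)' S^{-1} (x - x_bar) <= c^2 has volume
# k_p |S|^{1/2} c^p, with k_p = 2 pi^{p/2} / (p * Gamma(p/2)).
def k(p):
    return 2 * pi ** (p / 2) / (p * gamma(p / 2))

assert abs(k(2) - pi) < 1e-12          # in the plane, k_2 = pi

# Monte Carlo check for p = 2 with an illustrative S and c = 1.
rng = np.random.default_rng(2)
S = np.array([[5.0, 4.0],
              [4.0, 5.0]])
S_inv = np.linalg.inv(S)

# The ellipse x' S^{-1} x <= 1 has max |x_i| = sqrt(S_ii) <= sqrt(5),
# so the box [-3, 3]^2 (area 36) contains it.
pts = rng.uniform(-3.0, 3.0, size=(400_000, 2))
inside = np.einsum('ni,ij,nj->n', pts, S_inv, pts) <= 1.0
area_mc = inside.mean() * 36.0

area_formula = k(2) * sqrt(np.linalg.det(S))   # pi * sqrt(9) = 3 pi
print(area_mc, area_formula)
```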

Example 3.8 (Interpreting the generalized variance) Figure 3.7 gives three scatter plots with very different patterns of correlation.

[Figure 3.7 Scatter plots with three different orientations.]

All three data sets have x̄' = [2, 1], and the covariance matrices are

\[
S = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix},\; r = .8
\qquad
S = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix},\; r = 0
\qquad
S = \begin{bmatrix} 5 & -4 \\ -4 & 5 \end{bmatrix},\; r = -.8
\]

Each covariance matrix S contains the information on the variability of the component variables and also the information required to calculate the correlation coefficient. In this sense, S captures the orientation and size of the pattern of scatter.
The eigenvalues and eigenvectors extracted from S further describe the pattern in the scatter plot. For

\[
S = \begin{bmatrix} 5 & 4 \\ 4 & 5 \end{bmatrix}
\]

the eigenvalues satisfy

\[
0 = (\lambda - 5)^2 - 4^2 = (\lambda - 9)(\lambda - 1)
\]

and we determine the eigenvalue-eigenvector pairs λ_1 = 9, e'_1 = [1/√2, 1/√2] and λ_2 = 1, e'_2 = [1/√2, -1/√2].
The mean-centered ellipse, with center x̄' = [2, 1] for all three cases, is

\[
(\mathbf{x} - \bar{\mathbf{x}})'S^{-1}(\mathbf{x} - \bar{\mathbf{x}}) \le c^2
\]

To describe this ellipse, as in Section 2.3, with A = S⁻¹, we notice that if (λ, e) is an eigenvalue-eigenvector pair for S, then (λ⁻¹, e) is an eigenvalue-eigenvector pair for S⁻¹. That is, if Se = λe, then multiplying on the left by S⁻¹ gives S⁻¹Se = λS⁻¹e, or S⁻¹e = λ⁻¹e. Therefore, using the eigenvalues from S, we know that the ellipse extends c√λ_i in the direction of e_i from x̄.

⁴ For those who are curious, k_p = 2π^{p/2}/(pΓ(p/2)), where Γ(z) denotes the gamma function evaluated at z.
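The eigenvalue-eigenvector pairs quoted above can be confirmed numerically; a minimal NumPy sketch:

```python
import numpy as np

# Eigenvalue-eigenvector pairs for S = [[5, 4], [4, 5]] from Example 3.8.
S = np.array([[5.0, 4.0],
              [4.0, 5.0]])
vals, vecs = np.linalg.eigh(S)   # eigh returns eigenvalues in ascending order

print(vals)                      # close to [1., 9.]

# The eigenvector for lambda = 9 is proportional to [1/sqrt(2), 1/sqrt(2)]
# (the sign of an eigenvector is arbitrary).
e_9 = vecs[:, 1]
print(e_9)
```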


In p = 2 dimensions, the choice C Z = 5.99 will produce an ellipse that contains
approximately 95% of the observations. The vectors 3v'5.99 el and V5.99 ez are
drawn in Figure 3.8(a). Notice how the directions are the natural axes for the ellipse,
and observe that the lengths of these scaled eigenvectors are comparable to the size
of the pattern in each direction.
Next,for

s=[~ ~J.

0= (A - 3)z

the eigenvalues satisfy

and we arbitrarily choose the eigerivectors so that Al = 3, ei = [I, 0] and A2 = 3,
ei ,: [0, 1]. The vectors v'3 v'5]9 el and v'3 v'5:99 ez are drawn in Figure 3.8(b).

"2
7

7



,•

,•

• •




• •
• •
• • ••
• •
• ••



• • ••• •
• ••

•••


.

• •
7

XI

• •




(b)

(a)

129

Finally, for

S=

[ 5 -4J
-4

5'

the eigenval1les satisfy

o=
=

(A - 5)Z - (-4)Z
(A - 9) (A - 1)

and we determine theeigenvalue-eigenvectorpairs Al = 9, el = [1/V2, -1/V2J and
A2 = 1, ei = [1/V2, 1/V2J. The scaled eigenvectors 3V5.99 el and V5.99 e2 are
drawn in Figure 3.8(c).
In two dimensions, we can often sketch the axes of the mean-centered ellipse by
eye. However, the eigenvector approach also works for high dimensions where the
data cannot be examined visually.
Note: Here the generalized variance 1SI gives the same value, 1S I = 9, for all
three patterns. But generalized variance does not contain any information on the
orientation of the patterns. Generalized variance is easier to interpret when the two
or more samples (patterns) being compared have nearly the same orientations.
Notice that our three patterns of scatter appear to cover approximately the
same area. The ellipses that summarize the variability
(x - i)'S-I(X - i) :5 c2
do have exactly the same area [see (3-17)], since all have IS I = 9.



As Example 3.8 demonstrates, different correlation structures are not detected
by |S|. The situation for p > 2 can be even more obscure.
Consequently, it is often desirable to provide more than the single number |S|
as a summary of S. From Exercise 2.12, |S| can be expressed as the product
λ₁λ₂⋯λp of the eigenvalues of S. Moreover, the mean-centered ellipsoid based on
S⁻¹ [see (3-16)] has axes whose lengths are proportional to the square roots of the
λᵢ's (see Section 2.3). These eigenvalues then provide information on the variability
in all directions in the p-space representation of the data. It is useful, therefore, to
report their individual values, as well as their product. We shall pursue this topic
later when we discuss principal components.
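Since |S| = λ₁λ₂⋯λp, the determinant alone cannot distinguish different eigenvalue profiles; a minimal numpy sketch contrasting two of the patterns above:

```python
import numpy as np

# Two covariance matrices with the same generalized variance |S| = 9
# but very different eigenvalue profiles (and hence scatter shapes).
S_round = np.array([[3.0, 0.0], [0.0, 3.0]])     # eigenvalues (3, 3)
S_tilted = np.array([[5.0, -4.0], [-4.0, 5.0]])  # eigenvalues (1, 9)

for S in (S_round, S_tilted):
    lams = np.linalg.eigvalsh(S)
    # the product of the eigenvalues equals the determinant in each case
    print(lams, float(np.prod(lams)), float(np.linalg.det(S)))
```

Reporting the individual λᵢ's, (3, 3) versus (1, 9), reveals exactly the difference in orientation and elongation that the single number |S| = 9 hides.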

Figure 3.8 Axes of the mean-centered 95% ellipses for the scatter plots in
Figure 3.7.

Situations in which the Generalized Sample Variance Is Zero

The generalized sample variance will be zero in certain situations. A generalized
variance of zero is indicative of extreme degeneracy, in the sense that at least one
column of the matrix of deviations,

X − 1x̄′ = [x₁′ − x̄′]   [x₁₁ − x̄₁   x₁₂ − x̄₂   ⋯   x₁p − x̄p]
           [x₂′ − x̄′] = [x₂₁ − x̄₁   x₂₂ − x̄₂   ⋯   x₂p − x̄p]
           [    ⋮    ]   [    ⋮           ⋮              ⋮    ]
           [xₙ′ − x̄′]   [xₙ₁ − x̄₁   xₙ₂ − x̄₂   ⋯   xₙp − x̄p]

         =    X    −    1    x̄′                                      (3-18)
            (n×p)    (n×1) (1×p)

can be expressed as a linear combination of the other columns. As we have shown
geometrically, this is a case where one of the deviation vectors, for instance
dᵢ′ = [x₁ᵢ − x̄ᵢ, …, xₙᵢ − x̄ᵢ], lies in the (hyper)plane generated by d₁, …, dᵢ₋₁,
dᵢ₊₁, …, dp.


Generalized Variance

Chapter 3 Sample Geometry and Random Sampling


Result 3.2. The generalized variance is zero when, and only when, at least one deviation vector lies in the (hyper)plane formed by all linear combinations of the
others, that is, when the columns of the matrix of deviations in (3-18) are linearly
dependent.
Proof. If the columns of the deviation matrix (X − 1x̄′) are linearly dependent,
there is a linear combination of the columns such that

0 = a₁ col₁(X − 1x̄′) + ⋯ + ap colp(X − 1x̄′)
  = (X − 1x̄′)a    for some a ≠ 0

But then, as you may verify, (n − 1)S = (X − 1x̄′)′(X − 1x̄′) and

(n − 1)Sa = (X − 1x̄′)′(X − 1x̄′)a = 0

so the same a corresponds to a linear dependency, a₁ col₁(S) + ⋯ + ap colp(S) =
Sa = 0, in the columns of S. So, by Result 2A.9, |S| = 0.
In the other direction, if |S| = 0, then there is some linear combination Sa of the
columns of S such that Sa = 0. That is, 0 = (n − 1)Sa = (X − 1x̄′)′(X − 1x̄′)a.
Premultiplying by a′ yields

0 = a′(X − 1x̄′)′(X − 1x̄′)a = L²₍X−1x̄′₎ₐ

the squared length of the vector (X − 1x̄′)a, and, for the length to equal zero, we
must have (X − 1x̄′)a = 0. Thus, the columns of (X − 1x̄′) are linearly dependent. ∎

Example 3.9 (A case where the generalized variance is zero) Show that |S| = 0 for

X = [1 2 5
     4 1 6     (3×3)
     4 0 4]

and determine the degeneracy.
Here x̄′ = [3, 1, 5], so

X − 1x̄′ = [1 − 3  2 − 1  5 − 5]   [−2  1  0]
           [4 − 3  1 − 1  6 − 5] = [ 1  0  1]
           [4 − 3  0 − 1  4 − 5]   [ 1 −1 −1]

The deviation (column) vectors are d₁′ = [−2, 1, 1], d₂′ = [1, 0, −1], and
d₃′ = [0, 1, −1]. Since d₃ = d₁ + 2d₂, there is column degeneracy. (Note that there
is row degeneracy also.) This means that one of the deviation vectors, for example
d₃, lies in the plane generated by the other two residual vectors. Consequently, the
three-dimensional volume is zero. This case is illustrated in Figure 3.9 and may be
verified algebraically by showing that |S| = 0. We have

S = [  3   −3/2    0 ]
    [−3/2    1    1/2]     (3×3)
    [  0    1/2    1 ]

and from Definition 2A.24,

|S| = 3·|1 1/2; 1/2 1|·(−1)² + (−3/2)·|−3/2 1/2; 0 1|·(−1)³ + 0·|−3/2 1; 0 1/2|·(−1)⁴
    = 3(1 − 1/4) + (3/2)(−3/2 − 0) + 0 = 9/4 − 9/4 = 0

Figure 3.9 A case where the three-dimensional volume is zero (|S| = 0).
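The degeneracy in Example 3.9 can be confirmed numerically; a minimal numpy check of the quantities derived above:

```python
import numpy as np

# Example 3.9 checked numerically: d3 = d1 + 2*d2, so the deviation
# columns are linearly dependent and the generalized variance vanishes.
X = np.array([[1.0, 2.0, 5.0],
              [4.0, 1.0, 6.0],
              [4.0, 0.0, 4.0]])
xbar = X.mean(axis=0)                    # [3, 1, 5]
D = X - xbar                             # broadcasting gives X - 1 xbar'
d1, d2, d3 = D[:, 0], D[:, 1], D[:, 2]

print(np.allclose(d3, d1 + 2 * d2))      # True: the stated column degeneracy
S = D.T @ D / (X.shape[0] - 1)
print(abs(np.linalg.det(S)) < 1e-10)     # True: |S| = 0

# Equivalently, a = [1, 2, -1] gives (X - 1 xbar')a = 0 and Sa = 0.
a = np.array([1.0, 2.0, -1.0])
print(np.allclose(D @ a, 0), np.allclose(S @ a, 0))
```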

When large data sets are sent and received electronically, investigators are
sometimes unpleasantly surprised to find a case of zero generalized variance, so that
S does not have an inverse. We have encountered several such cases, with their associated difficulties, before the situation was unmasked. A singular covariance matrix
occurs when, for instance, the data are test scores and the investigator has included
variables that are sums of the others. For example, an algebra score and a geometry
score could be combined to give a total math score, or class midterm and final exam
scores summed to give total points. Once, the total weight of a number of chemicals
was included along with that of each component.
This common practice of creating new variables that are sums of the original
variables and then including them in the data set has caused enough lost time that
we emphasize the necessity of being alert to avoid these consequences.
Example 3.10 (Creating new variables that lead to a zero generalized variance)
Consider the data matrix

X = [1  9 10
     4 12 16
     2 10 12
     5  8 13
     3 11 14]

where the third column is the sum of the first two columns. These data could be the number of successful phone solicitations per day by a part-time and a full-time employee,
respectively, so the third column is the total number of successful solicitations per day.
Show that the generalized variance |S| = 0, and determine the nature of the
dependency in the data.


We find that the mean corrected data matrix, with entries xⱼₖ − x̄ₖ, is

X − 1x̄′ = [−2 −1 −3
            1  2  3
           −1  0 −1
            2 −2  0
            0  1  1]

and the sample covariance matrix is

S = [2.5  0   2.5
     0    2.5 2.5
     2.5  2.5 5.0]

We verify that, in this case, the generalized variance

|S| = (2.5)²(5.0) + 0 + 0 − (2.5)³ − (2.5)³ − 0 = 0

In general, if the three columns of the data matrix X satisfy a linear constraint
a₁xⱼ₁ + a₂xⱼ₂ + a₃xⱼ₃ = c, a constant for all j, then a₁x̄₁ + a₂x̄₂ + a₃x̄₃ = c, so that

a₁(xⱼ₁ − x̄₁) + a₂(xⱼ₂ − x̄₂) + a₃(xⱼ₃ − x̄₃) = 0

for all j. That is,

(X − 1x̄′)a = 0

and the columns of the mean corrected data matrix are linearly dependent. Thus, the
inclusion of the third variable, which is linearly related to the first two, has led to the
case of a zero generalized variance.
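A minimal numpy check of Example 3.10, using the null vector a = [1, 1, −1] implied by the constraint x₃ = x₁ + x₂:

```python
import numpy as np

# Example 3.10 checked numerically: the sum column makes the
# mean-corrected columns dependent, so S is singular.
X = np.array([[1.0,  9.0, 10.0],
              [4.0, 12.0, 16.0],
              [2.0, 10.0, 12.0],
              [5.0,  8.0, 13.0],
              [3.0, 11.0, 14.0]])
D = X - X.mean(axis=0)                   # mean corrected data matrix
S = D.T @ D / (X.shape[0] - 1)

print(np.round(S, 2))                    # matches the S computed in the text
a = np.array([1.0, 1.0, -1.0])           # from the constraint x3 = x1 + x2
print(np.allclose(D @ a, 0))             # True: (X - 1 xbar')a = 0
print(np.allclose(S @ a, 0))             # True: Sa = 0
print(abs(np.linalg.det(S)) < 1e-10)     # True: |S| = 0, so S has no inverse
```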
Whenever the columns of the mean corrected data matrix are linearly dependent,

(n − 1)Sa = (X − 1x̄′)′(X − 1x̄′)a = (X − 1x̄′)0 = 0

and Sa = 0 establishes the linear dependency of the columns of S. Hence, |S| = 0.
Since Sa = 0 = 0a, we see th...
