A First Course in Linear Algebra (Version 3.50)

Authors Robert A. Beezer

License GFDL-1.2-no-invariants-or-later

A First Course in Linear Algebra

           Robert A. Beezer

       University of Puget Sound
             Version 3.50




            Congruent Press
Robert A. Beezer is a Professor of Mathematics at the University of Puget Sound,
where he has been on the faculty since 1984. He received a B.S. in Mathematics
(with an Emphasis in Computer Science) from the University of Santa Clara in 1978,
a M.S. in Statistics from the University of Illinois at Urbana-Champaign in 1982
and a Ph.D. in Mathematics from the University of Illinois at Urbana-Champaign
in 1984.
In addition to his teaching at the University of Puget Sound, he has made sabbatical
visits to the University of the West Indies (Trinidad campus) and the University
of Western Australia. He has also given several courses in the Master’s program at
the African Institute for Mathematical Sciences, South Africa. He has been a Sage
developer since 2008.
He teaches calculus, linear algebra and abstract algebra regularly, while his research
interests include the applications of linear algebra to graph theory. His professional
website is at http://buzzard.ups.edu.




Edition
Version 3.50
ISBN: 978-0-9844175-5-1


Cover Design
Aidan Meacham


Publisher
Robert A. Beezer
Congruent Press
Gig Harbor, Washington, USA


© 2004–2015 Robert A. Beezer

Permission is granted to copy, distribute and/or modify this document under the
terms of the GNU Free Documentation License, Version 1.2 or any later version
published by the Free Software Foundation; with no Invariant Sections, no
Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the
appendix entitled “GNU Free Documentation License”.
The most recent version can always be found at http://linear.pugetsound.edu.

To my wife, Pat.

Contents

Preface

Acknowledgements

Systems of Linear Equations
   What is Linear Algebra?
   Solving Systems of Linear Equations
   Reduced Row-Echelon Form
   Types of Solution Sets
   Homogeneous Systems of Equations
   Nonsingular Matrices

Vectors
   Vector Operations
   Linear Combinations
   Spanning Sets
   Linear Independence
   Linear Dependence and Spans
   Orthogonality

Matrices
   Matrix Operations
   Matrix Multiplication
   Matrix Inverses and Systems of Linear Equations
   Matrix Inverses and Nonsingular Matrices
   Column and Row Spaces
   Four Subsets

Vector Spaces
   Vector Spaces
   Subspaces
   Linear Independence and Spanning Sets
   Bases
   Dimension
   Properties of Dimension

Determinants
   Determinant of a Matrix
   Properties of Determinants of Matrices

Eigenvalues
   Eigenvalues and Eigenvectors
   Properties of Eigenvalues and Eigenvectors
   Similarity and Diagonalization

Linear Transformations
   Linear Transformations
   Injective Linear Transformations
   Surjective Linear Transformations
   Invertible Linear Transformations

Representations
   Vector Representations
   Matrix Representations
   Change of Basis
   Orthonormal Diagonalization

Preliminaries
   Complex Number Operations
   Sets

Reference
   Proof Techniques
   Archetypes
   Definitions
   Theorems
   Notation
   GNU Free Documentation License

Preface

This text is designed to teach the concepts and techniques of basic linear algebra
as a rigorous mathematical subject. Besides computational proficiency, there is an
emphasis on understanding definitions and theorems, as well as reading,
understanding and creating proofs. A strictly logical organization, complete and exceedingly
detailed proofs of every theorem, advice on techniques for reading and writing proofs,
and a selection of challenging theoretical exercises will slowly provide the novice
with the tools and confidence to be able to study other mathematical topics in a
rigorous fashion.
    Most students taking a course in linear algebra will have completed courses in
differential and integral calculus, and maybe also multivariate calculus, and will
typically be second-year students in university. This level of mathematical maturity is
expected; however, there is little or no requirement to know calculus itself to use this
book successfully. With complete details for every proof, for nearly every example,
and for solutions to a majority of the exercises, the book is ideal for self-study, for
those of any age.
    While there is an abundance of guidance in the use of the software system, Sage,
there is no attempt to address the problems of numerical linear algebra, which are
arguably continuous in nature. Similarly, there is little emphasis on a geometric
approach to problems of linear algebra. While this may contradict the experience of
many experienced mathematicians, the approach here is consciously algebraic. As a
result, the student should be well-prepared to encounter groups, rings and fields in
future courses in algebra, or other areas of discrete mathematics.

How to Use This Book
While the book is divided into chapters, the main organizational unit is the thirty-
seven sections. Each contains a selection of definitions, theorems, and examples
interspersed with commentary. If you are enrolled in a course, read the section before
class and then answer the section’s reading questions as preparation for class.
    The version available for viewing in a web browser is the most complete,
integrating all of the components of the book. Consider acquainting yourself with this version.
Knowls are indicated by dashed underlines and will allow you to seamlessly remind
yourself of the content of definitions, theorems, examples, exercises, subsections and
more. Use them liberally.
    Historically, mathematics texts have numbered definitions and theorems. We
have instead adopted a strategy more appropriate to the heavy cross-referencing,
linking and knowling afforded by modern media. Mimicking an approach taken by
Donald Knuth, we have given items short titles and associated acronyms. You will
become comfortable with this scheme after a short time, and might even come to
appreciate its inherent advantages. In the web version, each chapter has a list of ten
or so important items from that chapter, and you will find yourself recognizing some
of these acronyms with no extra effort beyond the normal amount of study. Bruno
Mello suggests that some say an acronym should be pronounceable as a word (such
as “radar”), and otherwise is an abbreviation. We will not be so strict in our use of
the term.
    Exercises come in three flavors, indicated by the first letter of their label. “C”
indicates a problem that is essentially computational. “T” represents a problem
that is more theoretical, usually requiring a solution that is as rigorous as a proof.
“M” stands for problems that are “medium”, “moderate”, “midway”, “mediate” or
“median”, but never “mediocre.” Their statements could feel computational, but their
 solutions require a more thorough understanding of the concepts or theory, while
 perhaps not being as rigorous as a proof. Of course, such a tripartite division will
 be subject to interpretation. Otherwise, larger numerical values indicate greater
 perceived difficulty, with gaps allowing for the contribution of new problems from
 readers. Many, but not all, exercises have complete solutions. These are indicated
 by daggers in the PDF and print versions, with solutions available in an online
 supplement, while in the web version a solution is indicated by a knowl right after the
 problem statement. Resist the urge to peek early. Working the exercises diligently is
 the best way to master the material.
     The Archetypes are a collection of twenty-four archetypical examples. The open
 source lexical database, WordNet, defines an archetype as “something that serves as
 a model or a basis for making copies.” We employ the word in the first sense here.
 By carefully choosing the examples we hope to provide at least one example that
 is interesting and appropriate for many of the theorems and definitions, and also
 provide counterexamples to conjectures (and especially counterexamples to converses
 of theorems). Each archetype has numerous computational results which you could
 strive to duplicate as you encounter new definitions and theorems. There are some
 exercises which will help guide you in this quest.

Supplements
Print versions of the book (either a physical copy or a PDF version) have significant
material available as supplements. Solutions are contained in the Exercise Manual.
Advice on the use of the open source mathematical software system, Sage, is contained
in another supplement. (Look for a linear algebra “Quick Reference” sheet at the
Sage website.) The Archetypes are available in a PDF form which could be used
as a workbook. Flashcards, with the statement of every definition and theorem, in
order of appearance, are also available.

Freedom
This book is copyrighted by its author. Some would say it is his “intellectual property,”
a distasteful phrase if there ever was one. Rather than exercise all the restrictions
provided by the government-granted monopoly that is copyright, the author has
granted you a license, the GNU Free Documentation License (GFDL). In summary
it says you may receive an electronic copy at no cost via electronic networks and you
may make copies forever. So your copy of the book never has to go “out-of-print.”
You may redistribute copies and you may make changes to your copy for your own
use. However, you have one major responsibility in accepting this license. If you
make changes and distribute the changed version, then you must offer the same
license for the new version, you must acknowledge the original author’s work, and
you must indicate where you have made changes.
    In practice, if you see a change that needs to be made (like correcting an error,
or adding a particularly nice theoretical exercise), you may just wish to donate the
change to the author rather than create and maintain a new version. Such donations
are highly encouraged and gratefully accepted. You may notice the large number of
small mistakes that have been corrected by readers that have come before you. Pay
it forward.
    So, in one word, the book really is “free” (as in “no cost”). But the open license
employed is vastly different than “free to download, all rights reserved.” Most
importantly, you know that this book, and its ideas, are not the property of anyone.
Or they are the property of everyone. Either way, this book has its own inherent
“freedom,” separate from those who contribute to it. Much of this philosophy is
embodied in the following quote:

      If nature has made any one thing less susceptible than all others of
      exclusive property, it is the action of the thinking power called an idea,
     which an individual may exclusively possess as long as he keeps it to
     himself; but the moment it is divulged, it forces itself into the possession
     of every one, and the receiver cannot dispossess himself of it. Its peculiar
     character, too, is that no one possesses the less, because every other
     possesses the whole of it. He who receives an idea from me, receives
     instruction himself without lessening mine; as he who lights his taper
     at mine, receives light without darkening me. That ideas should freely
     spread from one to another over the globe, for the moral and mutual
     instruction of man, and improvement of his condition, seems to have
     been peculiarly and benevolently designed by nature, when she made
     them, like fire, expansible over all space, without lessening their density
     in any point, and like the air in which we breathe, move, and have our
     physical being, incapable of confinement or exclusive appropriation.

                                                             Thomas Jefferson
                                                    Letter to Isaac McPherson
                                                               August 13, 1813

To the Instructor
The first half of this text (through Chapter M) is a course in matrix algebra, though
the foundation of some more advanced ideas is also being formed in these early
sections (such as Theorem NMUS, which presages invertible linear transformations).
Vectors are presented exclusively as column vectors (not transposes of row vectors),
and linear combinations are presented very early. Spans, null spaces, column spaces
and row spaces are also presented early, simply as sets, saving most of their vector
space properties for later, so they are familiar objects before being scrutinized
carefully.
    You cannot do everything early, so in particular matrix multiplication comes later
than usual. However, with a definition built on linear combinations of column vectors,
it should seem more natural than the more frequent definition using dot products
of rows with columns. And this delay emphasizes that linear algebra is built upon
vector addition and scalar multiplication. Of course, matrix inverses must wait for
matrix multiplication, but this does not prevent nonsingular matrices from occurring
sooner. Vector space properties are hinted at when vector and matrix operations
are first defined, but the notion of a vector space is saved for a more axiomatic
treatment later (Chapter VS). Once bases and dimension have been explored in
the context of vector spaces, linear transformations and their matrix representation
follow. The predominant purpose of the book is the four sections of Chapter R, which
introduces the student to representations of vectors and matrices, change-of-basis,
and orthonormal diagonalization (the spectral theorem). This final chapter pulls
together all the important ideas of the previous chapters.
    Our vector spaces use the complex numbers as the field of scalars. This avoids
the fiction of complex eigenvalues being used to form scalar multiples of eigenvectors.
The presence of the complex numbers in the earliest sections should not frighten
students who need a review, since they will not be used heavily until much later,
and Section CNO provides a quick review.
    Linear algebra is an ideal subject for the novice mathematics student to learn how
to develop a subject precisely, with all the rigor mathematics requires. Unfortunately,
much of this rigor seems to have escaped the standard calculus curriculum, so
for many university students this is their first exposure to careful definitions and
theorems, and the expectation that they fully understand them, to say nothing of the
expectation that they become proficient in formulating their own proofs. We have
tried to make this text as helpful as possible with this transition. Every definition
is stated carefully, set apart from the text. Likewise, every theorem is carefully
stated, and almost every one has a complete proof. Theorems usually have just one
conclusion, so they can be referenced precisely later. Definitions and theorems are
cataloged in order of their appearance (Definitions and Theorems in the Reference
chapter at the end of the book). Along the way, there are discussions of some more
important ideas relating to formulating proofs (Proof Techniques), which is partly
advice and partly a primer on logic.
    Collecting responses to the Reading Questions prior to covering material in class
will require students to learn how to read the material. Sections are designed to be
covered in a fifty-minute lecture. Later sections are longer, but as students become
more proficient at reading the text, it is possible to survey these longer sections at
the same pace. With solutions to many of the exercises, students may be given the
freedom to work homework at their own pace and style (individually, in groups, with
an instructor’s help, etc.). To compensate and keep students from falling behind, I
give an examination on each chapter.
    Sage is a powerful open source program for advanced mathematics. It is especially
robust for linear algebra. We have included an abundance of material which will help
the student (and instructor) learn how to use Sage for the study of linear algebra
and how to understand linear algebra better with Sage. This material is tightly
integrated with the web version of the book and will become even easier to use
since the technology for interfaces to Sage continues to rapidly evolve. Sage is highly
capable for mathematical research as well, and so should be a tool that students can
use in subsequent courses and careers.

Conclusion
Linear algebra is a beautiful subject. I have enjoyed preparing this exposition and
making it widely available. Much of my motivation for writing this book is captured
by the sentiments expressed by H.M. Cundy and A.P. Rollett in their Preface to the
First Edition of Mathematical Models (1952), especially the final sentence,
      This book was born in the classroom, and arose from the spontaneous
      interest of a Mathematical Sixth in the construction of simple models. A
      desire to show that even in mathematics one could have fun led to an
      exhibition of the results and attracted considerable attention throughout
      the school. Since then the Sherborne collection has grown, ideas have come
      from many sources, and widespread interest has been shown. It seems
      therefore desirable to give permanent form to the lessons of experience so
      that others can benefit by them and be encouraged to undertake similar
      work.

   Foremost, I hope that students find their time spent with this book profitable. I
hope that instructors find it flexible enough to fit the needs of their course. You can
always find the latest version, and keep current with any changes, at the book’s website
(http://linear.pugetsound.edu). I appreciate receiving suggestions, corrections,
and other comments, so please do contact me.

                                                                    Robert A. Beezer
                                                                 Tacoma, Washington
                                                                      December 2012




Acknowledgements

Many people have helped to make this book, and its freedoms, possible.
    First, the time to create, edit and distribute the book has been provided implicitly
and explicitly by the University of Puget Sound. A sabbatical leave in Spring 2004, a
course release in Spring 2007 and a Lantz Senior Fellowship for the 2010-11 academic
year are three obvious examples of explicit support. The course release was provided
by support from the Lind-VanEnkevort Fund. The university has also provided
clerical support, computer hardware, network servers and bandwidth. Thanks to
Dean Kris Bartanen and the chairs of the Mathematics and Computer Science
Department, Professors Martin Jackson, Sigrun Bodine and Bryan Smith, for their
support, encouragement and flexibility.
    My colleagues in the Mathematics and Computer Science Department have
graciously taught our introductory linear algebra course using earlier versions and
have provided valuable suggestions that have improved the book immeasurably.
Thanks to Professor Martin Jackson (v0.30), Professor David Scott (v0.70), Professor
Bryan Smith (v0.70, v0.80, v1.00, v2.00, v2.20), Professor Manley Perkel (v2.10), and
Professor Cynthia Gibson (v2.20).
    University of Puget Sound librarians Lori Ricigliano, Elizabeth Knight and Jeanne
Kimura provided valuable advice on production, and interesting conversations about
copyrights.
    Many aspects of the book have been influenced by insightful questions and creative
suggestions from the students who have labored through the book in our courses. For
example, the flashcards with theorems and definitions are a direct result of a student
suggestion. I will single out a handful of students at the University of Puget Sound
who have been especially adept at finding and reporting mathematically significant
typographical errors: Jake Linenthal, Christie Su, Kim Le, Sarah McQuate, Andy
Zimmer, Travis Osborne, Andrew Tapay, Mark Shoemaker, Tasha Underhill, Tim
Zitzer, Elizabeth Million, Steve Canfield, Jinshil Yi, Cliff Berger, Preston Van Buren,
Duncan Bennett, Dan Messenger, Caden Robinson, Glenna Toomey, Tyler Ueltschi,
Kyle Whitcomb, Anna Dovzhik, Chris Spalding and Jenna Fontaine. All the students
of the Fall 2012 Math 290 sections were very helpful and patient through the major
changes required in making Version 3.00.
    I have tried to be as original as possible in the organization and presentation of
this beautiful subject. However, I have been influenced by many years of teaching
from another excellent textbook, Introduction to Linear Algebra by L.W. Johnson,
R.D. Reiss and J.T. Arnold. When I have needed inspiration for the correct approach
to particularly important proofs, I have learned to eventually consult two other
textbooks. Sheldon Axler’s Linear Algebra Done Right is a highly original exposition,
while Ben Noble’s Applied Linear Algebra frequently strikes just the right note
between rigor and intuition. Noble’s excellent book is highly recommended, even
though its publication dates to 1969.
    Conversion to various electronic formats has greatly depended on assistance from:
Eitan Gurari, author of the powerful LaTeX translator, tex4ht; Davide Cervone,
author of jsMath and MathJax; and Carl Witty, who advised and tested the Sony
Reader format. Thanks to these individuals for their critical assistance.
    Incorporation of Sage code is made possible by the entire community of Sage
developers and users, who create and refine the mathematical routines, the user
interfaces and applications in educational settings. Technical and logistical aspects of
incorporating Sage code in open textbooks were supported by a grant from the United
States National Science Foundation (DUE-1022574), which has been administered by
the American Institute of Mathematics, and in particular, David Farmer. The support
and assistance of my fellow Principal Investigators, Jason Grout, Tom Judson, Kiran
Kedlaya, Sandra Laursen, Susan Lynds, and William Stein is especially appreciated.
    David Farmer and Sally Koutsoliotas are responsible for the vision and initial
experiments which led to the knowl-enabled web version, as part of the Version 3
project.
    General support and encouragement of free and affordable textbooks, in addition
to specific promotion of this text, was provided by Nicole Allen, Textbook Advocate
at Student Public Interest Research Groups. Nicole was an early consumer of this
material, back when it looked more like lecture notes than a textbook.
    Finally, in every respect, the production and distribution of this book has been
accomplished with open source software. The range of individuals and projects is far
too great to pretend to list them all. This project is an attempt to pay it forward.

Contributors
 Name                  Location               Contact
 Beezer, David         Santa Clara U.
 Beezer, Robert        U. of Puget Sound      buzzard.ups.edu/
 Black, Chris
 Braithwaite, David    Chicago, Illinois
 Bucht, Sara           U. of Puget Sound
 Canfield, Steve       U. of Puget Sound
 Hubert, Dupont        Creteil, France
 Fellez, Sarah         U. of Puget Sound
 Fickenscher, Eric     U. of Puget Sound
 Jackson, Martin       U. of Puget Sound      www.math.ups.edu/~martinj
 Johnson, Chili        U. of Puget Sound
 Kessler, Ivan         U. of Puget Sound
 Kreher, Don           Michigan Tech. U.      www.math.mtu.edu/~kreher
 Hamrick, Mark         St. Louis U.
 Linenthal, Jacob      U. of Puget Sound
 Million, Elizabeth    U. of Puget Sound
 Osborne, Travis       U. of Puget Sound
 Riegsecker, Joe       Middlebury, Indiana    joepye(at)pobox(dot)com
 Perkel, Manley        U. of Puget Sound
 Phelps, Douglas       U. of Puget Sound
 Shoemaker, Mark       U. of Puget Sound
 Toth, Zoltan                                 zoli.web.elte.hu
 Ueltschi, Tyler       U. of Puget Sound
 Zimmer, Andy          U. of Puget Sound




Chapter SLE
Systems of Linear Equations

We will motivate our study of linear algebra by studying solutions to systems of
linear equations. While the focus of this chapter is on the practical matter of how
to find, and describe, these solutions, we will also be setting ourselves up for more
theoretical ideas that will appear later.


Section WILA
What is Linear Algebra?
We begin our study of linear algebra with an introduction and a motivational example.


Subsection LA
Linear + Algebra
The subject of linear algebra can be partially explained by the meaning of the
 two terms comprising the title. “Linear” is a term you will appreciate better at
 the end of this course, and indeed, attaining this appreciation could be taken as
 one of the primary goals of this course. However, for now, you can understand it
 to mean anything that is “straight” or “flat.” For example, in the xy-plane you
 might be accustomed to describing straight lines (is there any other kind?) as
 the set of solutions to an equation of the form y = mx + b, where the slope m
 and the y-intercept b are constants that together describe the line. If you have
 studied multivariate calculus, then you will have encountered planes. Living in three
 dimensions, with coordinates described by triples (x, y, z), they can be described
 as the set of solutions to equations of the form ax + by + cz = d, where a, b, c, d
 are constants that together determine the plane. While we might describe planes as
“flat,” lines in three dimensions might be described as “straight.” From a multivariate
 calculus course you will recall that lines are sets of points described by equations
 such as x = 3t − 4, y = −7t + 2, z = 9t, where t is a parameter that can take on any
value.
     Another view of this notion of “flatness” is to recognize that the sets of points
 just described are solutions to equations of a relatively simple form. These equations
 involve addition and multiplication only. We will have a need for subtraction, and
 occasionally we will divide, but mostly you can describe “linear” equations as
 involving only addition and multiplication. Here are some examples of typical
 equations we will see in the next few sections:
   2x + 3y − 4z = 13     4x1 + 5x2 − x3 + x4 + x5 = 0      9a − 2b + 7c + 2d = −7
    What we will not see are equations like:
     xy + 5yz = 13      x_1 + x_2^3/x_4 − x_3 x_4 x_5^2 = 0      tan(ab) + log(c − d) = −7
The exception will be that we will on occasion need to take a square root.
    You have probably heard the word “algebra” frequently in your mathematical
preparation for this course. Most likely, you have spent a good ten to fifteen years
learning the algebra of the real numbers, along with some introduction to the very
 similar algebra of complex numbers (see Section CNO). However, there are many
 new algebras to learn and use, and likely linear algebra will be your second algebra.
 Like learning a second language, the necessary adjustments can be challenging at
 times, but the rewards are many. And it will make learning your third and fourth
 algebras even easier. Perhaps you have heard of “groups” and “rings” (or maybe
you have studied them already), which are excellent examples of other algebras with
very interesting properties and applications. In any event, prepare yourself to learn
 a new algebra and realize that some of the old rules you used for the real numbers
 may no longer apply to this new algebra you will be learning!
     The brief discussion above about lines and planes suggests that linear algebra
 has an inherently geometric nature, and this is true. Examples in two and three
 dimensions can be used to provide valuable insight into important concepts of this
 course. However, much of the power of linear algebra will be the ability to work with
“flat” or “straight” objects in higher dimensions, without concerning ourselves with
visualizing the situation. While much of our intuition will come from examples in
 two and three dimensions, we will maintain an algebraic approach to the subject,
with the geometry being secondary. Others may wish to switch this emphasis around,
 and that can lead to a very fruitful and beneficial course, but here and now we are
 laying our bias bare.


Subsection AA
An Application
We conclude this section with a rather involved example that will highlight some of
the power and techniques of linear algebra. Work through all of the details with pencil
and paper, until you believe all the assertions made. However, in this introductory
example, do not concern yourself with how some of the results are obtained or how
you might be expected to solve a similar problem. We will come back to this example
later and expose some of the techniques used and properties exploited. For now,
use your background in mathematics to convince yourself that everything said here
really is correct.
Example TMP Trail Mix Packaging
Suppose you are the production manager at a food-packaging plant and one of your
product lines is trail mix, a healthy snack popular with hikers and backpackers,
containing raisins, peanuts and hard-shelled chocolate pieces. By adjusting the mix
of these three ingredients, you are able to sell three varieties of this item. The fancy
version is sold in half-kilogram packages at outdoor supply stores and has more
chocolate and fewer raisins, thus commanding a higher price. The standard version
is sold in one kilogram packages in grocery stores and gas station mini-markets.
Since the standard version has roughly equal amounts of each ingredient, it is not as
expensive as the fancy version. Finally, a bulk version is sold in bins at grocery stores
for consumers to load into plastic bags in amounts of their choosing. To appeal to
the shoppers that like bulk items for their economy and healthfulness, this mix has
many more raisins (at the expense of chocolate) and therefore sells for less.
    Your production facilities have limited storage space and early each morning
you are able to receive and store 380 kilograms of raisins, 500 kilograms of peanuts
and 620 kilograms of chocolate pieces. As production manager, one of your most
important duties is to decide how much of each version of trail mix to make every
day. Clearly, you can have up to 1500 kilograms of raw ingredients available each
day, so to be the most productive you will likely produce 1500 kilograms of trail
mix each day. Also, you would prefer not to have any ingredients left over each day,
so that your final product is as fresh as possible and so that you can receive the
maximum delivery the next morning. But how should these ingredients be allocated
to the mixing of the bulk, standard and fancy versions? First, we need a little more
information about the mixes. Workers mix the ingredients in 15 kilogram batches,
and each row of the table below gives a recipe for a 15 kilogram batch. There is some
additional information on the costs of the ingredients and the price the manufacturer
can charge for the different versions of the trail mix.

                      Raisins       Peanuts      Chocolate       Cost     Sale Price
                    (kg/batch)     (kg/batch)    (kg/batch)     ($/kg)     ($/kg)
    Bulk                 7             6             2           3.69        4.99
    Standard             6             4             5           3.86        5.50
    Fancy                2             5             8           4.45        6.50
    Storage (kg)       380            500           620
    Cost ($/kg)        2.55           4.65          4.80

    As production manager, it is important to realize that you only have three
decisions to make — the amount of bulk mix to make, the amount of standard mix to
make and the amount of fancy mix to make. Everything else is beyond your control
or is handled by another department within the company. Principally, you are also
limited by the amount of raw ingredients you can store each day. Let us denote the
amount of each mix to produce each day, measured in kilograms, by the variable
quantities b, s and f . Your production schedule can be described as values of b, s
and f that do several things. First, we cannot make negative quantities of each mix,
so
                   b≥0                   s≥0                     f ≥0
    Second, if we want to consume all of our ingredients each day, the storage capacities
lead to three (linear) equations, one for each ingredient,
                  (7/15)b + (6/15)s + (2/15)f = 380                    (raisins)
                  (6/15)b + (4/15)s + (5/15)f = 500                    (peanuts)
                  (2/15)b + (5/15)s + (8/15)f = 620                    (chocolate)
    It happens that this system of three equations has just one solution. In other
words, as production manager, your job is easy, since there is but one way to use up
all of your raw ingredients making trail mix. This single solution is
              b = 300 kg              s = 300 kg               f = 900 kg.
    We do not yet have the tools to explain why this solution is the only one, but it
should be simple for you to verify that this is indeed a solution. (Go ahead, we will
wait.) Determining solutions such as this, and establishing that they are unique, will
be the main motivation for our initial study of linear algebra.
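If pencil and paper are not at hand, the check can also be carried out mechanically. The following is only a sketch: the names RECIPES, STORAGE and ingredients_used are our own inventions for illustration, and exact fractions are used so that no rounding clouds the comparison.

```python
from fractions import Fraction

# Recipe rows from the first table: kg of (raisins, peanuts, chocolate) per 15 kg batch.
RECIPES = {
    "bulk": (7, 6, 2),
    "standard": (6, 4, 5),
    "fancy": (2, 5, 8),
}
STORAGE = (380, 500, 620)  # kg of raisins, peanuts, chocolate stored each morning


def ingredients_used(b, s, f):
    """Return kg of (raisins, peanuts, chocolate) consumed by b, s, f kg of each mix."""
    amounts = {"bulk": b, "standard": s, "fancy": f}
    return tuple(
        sum(Fraction(RECIPES[mix][i], 15) * amounts[mix] for mix in RECIPES)
        for i in range(3)
    )


# The production schedule claimed in the text.
print(ingredients_used(300, 300, 900) == STORAGE)  # True
```

This confirms only that the schedule is a solution. It says nothing about uniqueness, which is the harder claim and the one that needs the theory to come.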
    So we have solved the problem of making sure that we make the best use of
our limited storage space, and each day use up all of the raw ingredients that are
shipped to us. Additionally, as production manager, you must report weekly to the
CEO of the company, and you know he will be more interested in the profit derived
from your decisions than in the actual production levels. So you compute,
           300(4.99 − 3.69) + 300(5.50 − 3.86) + 900(6.50 − 4.45) = 2727.00
for a daily profit of $2,727 from this production schedule. The computation of the
daily profit is also beyond our control, though it is definitely of interest, and it too
looks like a “linear” computation.
    As often happens, things do not stay the same for long, and now the marketing
department has suggested that your company’s trail mix products standardize on
every mix being one-third peanuts. Adjusting the peanut portion of each recipe by
also adjusting the chocolate portion leads to revised recipes, and slightly different
costs for the bulk and standard mixes, as given in the following table.

                      Raisins       Peanuts      Chocolate       Cost     Sale Price
                    (kg/batch)     (kg/batch)    (kg/batch)     ($/kg)     ($/kg)
    Bulk                 7             5             3           3.70        4.99
    Standard             6             5             4           3.85        5.50
    Fancy                2             5             8           4.45        6.50
    Storage (kg)       380            500           620
    Cost ($/kg)        2.55           4.65          4.80

      In a similar fashion as before, we desire values of b, s and f so that
                   b≥0                     s≥0                   f ≥0
and
                  (7/15)b + (6/15)s + (2/15)f = 380                    (raisins)
                  (5/15)b + (5/15)s + (5/15)f = 500                    (peanuts)
                  (3/15)b + (4/15)s + (8/15)f = 620                    (chocolate)
    It now happens that this system of equations has infinitely many solutions,
as we will now demonstrate. Let f remain a variable quantity. Then suppose we
make f kilograms of the fancy mix, b = 4f − 3300 kilograms of the bulk mix, and
s = −5f + 4800 kilograms of the standard mix. We now show that these choices,
for any value of f , will yield a production schedule that exhausts all of the day’s
supply of raw ingredients. (We will very soon learn how to solve systems of equations
with infinitely many solutions and then determine expressions like these for b and
s). Grab your pencil and paper and play along by substituting these choices for the
production schedule into the storage limits for each raw ingredient and simplifying
the algebra.
     (7/15)(4f − 3300) + (6/15)(−5f + 4800) + (2/15)f = 0f + 5700/15 = 380
     (5/15)(4f − 3300) + (5/15)(−5f + 4800) + (5/15)f = 0f + 7500/15 = 500
     (3/15)(4f − 3300) + (4/15)(−5f + 4800) + (8/15)f = 0f + 9300/15 = 620
    Convince yourself that these expressions for b and s allow us to vary f and obtain
an infinite number of possibilities for solutions to the three equations that describe
our storage capacities. As a practical matter, there really are not an infinite number
of solutions, since we are unlikely to want to end the day with a fractional number
of bags of fancy mix, so our allowable values of f should probably be integers. More
importantly, we need to remember that we cannot make negative amounts of each
mix! Where does this lead us? Nonnegative quantities of the bulk mix require that
                      b≥0      ⇒    4f − 3300 ≥ 0       ⇒    f ≥ 825
      Similarly for the standard mix,
                     s≥0      ⇒    −5f + 4800 ≥ 0        ⇒   f ≤ 960
      So, as production manager, you really have to choose a value of f from the finite
set
                                   {825, 826, . . . , 960}
leaving you with 136 choices, each of which will exhaust the day’s supply of raw
ingredients. Pause now and think about which you would choose.
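The count of 136 can be double-checked by brute force. This is a sketch under our own assumptions: the helper names are hypothetical, and the search window of 0 to 1500 kilograms is chosen only because 1500 kilograms of ingredients arrive each day.

```python
from fractions import Fraction

# Revised recipes (every mix one-third peanuts), kg per 15 kg batch.
REVISED = {"bulk": (7, 5, 3), "standard": (6, 5, 4), "fancy": (2, 5, 8)}
STORAGE = (380, 500, 620)


def schedule(f):
    """Production amounts (b, s, f) using the text's expressions for b and s."""
    return 4 * f - 3300, -5 * f + 4800, f


def exhausts_storage(b, s, f):
    """True when the schedule uses exactly the stored amount of every ingredient."""
    amounts = {"bulk": b, "standard": s, "fancy": f}
    return all(
        sum(Fraction(REVISED[mix][i], 15) * amounts[mix] for mix in REVISED)
        == STORAGE[i]
        for i in range(3)
    )


# Integer values of f for which no mix is made in a negative amount.
feasible = [f for f in range(0, 1501) if min(schedule(f)) >= 0]

print(feasible[0], feasible[-1], len(feasible))  # 825 960 136
print(all(exhausts_storage(*schedule(f)) for f in feasible))  # True
```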
    Recalling your weekly meeting with the CEO suggests that you might want to
choose a production schedule that yields the biggest possible profit for the company.
So you compute an expression for the profit based on your as yet undetermined
decision for the value of f ,
       (4f − 3300)(4.99 − 3.70) + (−5f + 4800)(5.50 − 3.85) + (f )(6.50 − 4.45)
             = −1.04f + 3663
    Since f has a negative coefficient it would appear that mixing fancy mix is
detrimental to your profit and should be avoided. So you will make the decision
to set daily fancy mix production at f = 825. This has the effect of setting b =
4(825) − 3300 = 0 and we stop producing bulk mix entirely. So the remainder of your
daily production is standard mix at the level of s = −5(825) + 4800 = 675 kilograms
and the resulting daily profit is (−1.04)(825) + 3663 = 2805. It is a pleasant surprise
that daily profit has risen to $2,805, but this is not the most important part of the
story. What is important here is that there are a large number of ways to produce
trail mix that use all of the day’s worth of raw ingredients and you were able to
easily choose the one that netted the largest profit. Notice too how all of the above
computations look “linear.”
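The choice of f = 825 can be confirmed by evaluating the profit at every allowable value of f. The sketch below is illustrative only: MARGINS and daily_profit are names we made up, and decimal arithmetic is used so that dollars and cents are represented exactly.

```python
from decimal import Decimal

# Per-kilogram (cost, sale price) from the revised table.
MARGINS = {
    "bulk": (Decimal("3.70"), Decimal("4.99")),
    "standard": (Decimal("3.85"), Decimal("5.50")),
    "fancy": (Decimal("4.45"), Decimal("6.50")),
}


def daily_profit(f, standard_price=Decimal("5.50")):
    """Daily profit in dollars when f kg of fancy mix are made.

    The amounts b and s follow the text's expressions in terms of f.
    """
    b, s = 4 * f - 3300, -5 * f + 4800
    bulk_cost, bulk_price = MARGINS["bulk"]
    std_cost, _ = MARGINS["standard"]
    fancy_cost, fancy_price = MARGINS["fancy"]
    return (
        b * (bulk_price - bulk_cost)
        + s * (standard_price - std_cost)
        + f * (fancy_price - fancy_cost)
    )


# Search the finite set {825, 826, ..., 960} for the most profitable schedule.
best_f = max(range(825, 961), key=daily_profit)
print(best_f, daily_profit(best_f))  # 825 2805.00
```

The standard_price argument is a design convenience of this sketch: the same function can be reused if the sale price of the standard mix ever changes.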
    In the food industry, things do not stay the same for long, and now the sales
department says that increased competition has led to the decision to stay competitive
and charge just $5.25 for a kilogram of the standard mix, rather than the previous
$5.50 per kilogram. This decision has no effect on the possibilities for the production
schedule, but will affect the decision based on profit considerations. So you revisit
just the profit computation, suitably adjusted for the new selling price of standard
mix,
      (4f − 3300)(4.99 − 3.70) + (−5f + 4800)(5.25 − 3.85) + (f )(6.50 − 4.45)
           = 0.21f + 2463
    Now it would appear that fancy mix is beneficial to the company’s profit since
the value of f has a positive coefficient. So you take the decision to make as much
fancy mix as possible, setting f = 960. This leads to s = −5(960) + 4800 = 0 and the
increased competition has driven you out of the standard mix market altogether. The
remainder of production is therefore bulk mix at a daily level of b = 4(960) − 3300 =
540 kilograms and the resulting daily profit is 0.21(960) + 2463 = 2664.60. A daily
profit of $2,664.60 is less than it used to be, but as production manager, you have
made the best of a difficult situation and shown the sales department that the best
course is to pull out of the highly competitive standard mix market completely. △
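For readers who want to see where the coefficient 0.21 comes from, the revised profit expression can be expanded step by step. This is just the arithmetic of the example written out, with each per-kilogram margin computed first.

```latex
\begin{align*}
&(4f - 3300)(4.99 - 3.70) + (-5f + 4800)(5.25 - 3.85) + f(6.50 - 4.45) \\
&\qquad = 1.29(4f - 3300) + 1.40(-5f + 4800) + 2.05f \\
&\qquad = (5.16 - 7.00 + 2.05)f + (-4257 + 6720) \\
&\qquad = 0.21f + 2463
\end{align*}
```

The sign of the coefficient of f depends only on the three margins, which is why a change in a single sale price was enough to reverse the decision.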
    This example is taken from a field of mathematics variously known by names such
as operations research, systems science, or management science. More specifically,
this is a prototypical example of problems that are solved by the techniques of “linear
programming.”
    There is a lot going on under the hood in this example. The heart of the matter is
the solution to systems of linear equations, which is the topic of the next few sections,
and a recurrent theme throughout this course. We will return to this example on
several occasions to reveal some of the reasons for its behavior.

Reading Questions

1. Is the equation x^2 + xy + tan(y^3) = 0 linear or not? Why or why not?
2. Find all solutions to the system of two linear equations 2x + 3y = −8, x − y = 6.
3. Describe how the production manager might explain the importance of the procedures
   described in the trail mix application (Subsection WILA.AA).

Exercises
C10 In Example TMP the first table lists the cost (per kilogram) to manufacture each of
the three varieties of trail mix (bulk, standard, fancy). For example, it costs $3.69 to make
one kilogram of the bulk variety. Re-compute each of these three costs and notice that the
computations are linear in character.
M70† In Example TMP two different prices were considered for marketing standard mix
with the revised recipes (one-third peanuts in each recipe). Selling standard mix at $5.50
resulted in selling the minimum amount of the fancy mix and no bulk mix. At $5.25 it
was best for profits to sell the maximum amount of fancy mix and then sell no standard
mix. Determine a selling price for standard mix that allows for maximum profits while still
selling some of each type of mix.
Section SSLE
Solving Systems of Linear Equations
We will motivate our study of linear algebra by considering the problem of solving
several linear equations simultaneously. The word “solve” tends to get abused
somewhat, as in “solve this problem.” When talking about equations we understand
a more precise meaning: find all of the values of some variable quantities that make
an equation, or several equations, simultaneously true.

Subsection SLE
Systems of Linear Equations
Our first example is of a type we will not pursue further. While it has two equations,
the first is not linear. So this is a good example to come back to later, especially
after you have seen Theorem PSSLS.
Example STNE Solving two (nonlinear) equations
Suppose we desire the simultaneous solutions of the two equations,
                                       x^2 + y^2 = 1
                                     −x + √3 y = 0

    You can easily check by substitution that x = √3/2, y = 1/2 and x = −√3/2, y = −1/2
are both solutions. We need to also convince ourselves that these are the only
solutions. To see this, plot each equation on the xy-plane, which means to plot (x, y)
pairs that make an individual equation true. In this case we get a circle centered at
the origin with radius 1 and a straight line through the origin with slope 1/√3. The
intersections of these two curves are our desired simultaneous solutions, and so we
believe from our plot that the two solutions we know already are indeed the only
ones. We like to write solutions as sets, so in this case we write the set of solutions as
                             S = {(√3/2, 1/2), (−√3/2, −1/2)}

                                                                                       △
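The substitution check for this example is easy to mimic numerically. The sketch below uses floating-point arithmetic, so equality is tested only up to a small tolerance; the function name satisfies_both and the tolerance value are our own choices.

```python
import math


def satisfies_both(x, y, tol=1e-12):
    """Check x^2 + y^2 = 1 and -x + sqrt(3)*y = 0 up to a floating-point tolerance."""
    on_circle = math.isclose(x * x + y * y, 1.0, abs_tol=tol)
    on_line = math.isclose(-x + math.sqrt(3) * y, 0.0, abs_tol=tol)
    return on_circle and on_line


candidates = [(math.sqrt(3) / 2, 0.5), (-math.sqrt(3) / 2, -0.5)]
print([satisfies_both(x, y) for x, y in candidates])  # [True, True]
```

A check like this confirms that the two points are solutions. It does not show that there are no others; for that we still rely on the plot.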
    In order to discuss systems of linear equations carefully, we need a precise
definition. And before we do that, we will introduce our periodic discussions about
“Proof Techniques.” Linear algebra is an excellent setting for learning how to read,
understand and formulate proofs. But this is a difficult step in your development as
a mathematician, so we have included a series of short essays containing advice and
explanations to help you along. These will be referenced in the text as needed, and
are also collected as a list you can consult when you want to return to re-read them.
(Which is strongly encouraged!)
    With a definition next, now is the time for the first of our proof techniques. So
study Proof Technique D. We’ll be right here when you get back. See you in a bit.
Definition SLE System of Linear Equations
A system of linear equations is a collection of m equations in the variable
quantities x1 , x2 , x3 , . . . , xn of the form,
                       a11 x1 + a12 x2 + a13 x3 + · · · + a1n xn = b1
                       a21 x1 + a22 x2 + a23 x3 + · · · + a2n xn = b2
                       a31 x1 + a32 x2 + a33 x3 + · · · + a3n xn = b3
                                                ⋮
                       am1 x1 + am2 x2 + am3 x3 + · · · + amn xn = bm
where the values of aij , bi and xj , 1 ≤ i ≤ m, 1 ≤ j ≤ n, are from the set of complex
numbers, C. □
    Do not let the mention of the complex numbers, C, rattle you. We will stick with
real numbers exclusively for many more sections, and it will sometimes seem like
we only work with integers! However, we want to leave the possibility of complex
numbers open, and there will be occasions in subsequent sections where they are
necessary. You can review the basic properties of complex numbers in Section CNO,
but these facts will not be critical until we reach Section O.
   Now we make the notion of a solution to a linear system precise.
Definition SSLE Solution of a System of Linear Equations
A solution of a system of linear equations in n variables, x1 , x2 , x3 , . . . , xn (such
as the system given in Definition SLE), is an ordered list of n complex numbers,
s1 , s2 , s3 , . . . , sn such that if we substitute s1 for x1 , s2 for x2 , s3 for x3 , . . . , sn
for xn , then for every equation of the system the left side will equal the right side,
i.e. each equation is true simultaneously. □
    More typically, we will write a solution in a form like x1 = 12, x2 = −7, x3 = 2 to
mean that s1 = 12, s2 = −7, s3 = 2 in the notation of Definition SSLE. To discuss
all of the possible solutions to a system of linear equations, we now define the set
of all solutions. (So Section SET is now applicable, and you may want to go and
familiarize yourself with what is there.)
Definition SSSLE Solution Set of a System of Linear Equations
The solution set of a linear system of equations is the set which contains every
solution to the system, and nothing more. □
    Be aware that a solution set can be infinite, or there can be no solutions, in
which case we write the solution set as the empty set, ∅ = {} (Definition ES). Here
is an example to illustrate using the notation introduced in Definition SLE and the
notion of a solution (Definition SSLE).
Example NSE Notation for a system of equations
Given the system of linear equations,
                                           x1 + 2x2 + x4 = 7
                                      x1 + x2 + x3 − x4 = 3
                                  3x1 + x2 + 5x3 − 7x4 = 1
we have n = 4 variables and m = 3 equations. Also,
         a11 = 1          a12 = 2           a13 = 0           a14 = 1             b1 = 7
         a21 = 1          a22 = 1           a23 = 1           a24 = −1            b2 = 3
         a31 = 3          a32 = 1           a33 = 5           a34 = −7            b3 = 1
    Additionally, convince yourself that x1 = −2, x2 = 4, x3 = 2, x4 = 1 is one
solution (Definition SSLE), but it is not the only one! For example, another solution
is x1 = −12, x2 = 11, x3 = 1, x4 = −3, and there are more to be found. So the
solution set contains at least two elements. △
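Definition SSLE translates directly into a mechanical test: substitute, and compare the two sides of every equation. The following sketch stores each equation of this example as a row of coefficients followed by its constant; the names SYSTEM and is_solution are hypothetical, chosen only for this illustration.

```python
# Each row holds the coefficients a_i1, ..., a_i4 followed by the constant b_i.
SYSTEM = [
    (1, 2, 0, 1, 7),
    (1, 1, 1, -1, 3),
    (3, 1, 5, -7, 1),
]


def is_solution(system, values):
    """True when substituting values for x1, ..., xn makes every equation true."""
    return all(
        sum(a * x for a, x in zip(row[:-1], values)) == row[-1]
        for row in system
    )


print(is_solution(SYSTEM, (-2, 4, 2, 1)))     # True
print(is_solution(SYSTEM, (-12, 11, 1, -3)))  # True
print(is_solution(SYSTEM, (0, 0, 0, 0)))      # False
```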
    We will often shorten the term “system of linear equations” to “system of
equations” leaving the linear aspect implied. After all, this is a book about linear
algebra.

Subsection PSS
Possibilities for Solution Sets
The next example illustrates the possibilities for the solution set of a system of linear
equations. We will not be too formal here, and the necessary theorems to back up
our claims will come in subsequent sections. So read for feeling and come back later
to revisit this example.
Example TTS Three typical systems
Consider the system of two equations with two variables,
                                         2x1 + 3x2 = 3
                                           x1 − x2 = 4

    If we plot the solutions to each of these equations separately on the x1 x2 -plane,
we get two lines, one with negative slope, the other with positive slope. They have
exactly one point in common, (x1 , x2 ) = (3, −1), which is the solution x1 = 3,
x2 = −1. From the geometry, we believe that this is the only solution to the system
of equations, and so we say it is unique.
    Now adjust the system with a different second equation,
                                     2x1 + 3x2 = 3
                                     4x1 + 6x2 = 6
   A plot of the solutions to these equations individually results in two lines, one on
top of the other! There are infinitely many points (x1 , x2 ) that make both equations
true. We will learn shortly how to describe this infinite solution set precisely (see
Example SAA, Theorem VFSLS). Notice now how the second equation is just a
multiple of the first.
   One more minor adjustment provides a third system of linear equations,
                                    2x1 + 3x2 = 3
                                    4x1 + 6x2 = 10
   A plot now reveals two lines with identical slopes, i.e. parallel lines. They have
no points in common, and so the system has a solution set that is empty, S = ∅. △
    This example exhibits all of the typical behaviors of a system of equations. A
subsequent theorem will tell us that every system of linear equations has a solution
set that is empty, contains a single solution or contains infinitely many solutions
(Theorem PSSLS). Example STNE yielded exactly two solutions, but this does not
contradict the forthcoming theorem. The equations in Example STNE are not linear
because they do not match the form of Definition SLE, and so we cannot apply
Theorem PSSLS in this case.
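The three behaviors in Example TTS can be told apart by a short computation. Be warned that the sketch below leans on a determinant-style test (comparing slopes via a·d − b·c), which is not a tool this text has introduced yet; it is borrowed here purely as an illustration, it handles only two equations in two variables with integer coefficients, and the name classify_2x2 is our own.

```python
from fractions import Fraction


def classify_2x2(a, b, e, c, d, f):
    """Classify the system  a*x1 + b*x2 = e,  c*x1 + d*x2 = f.

    Assumes integer inputs and that each equation has at least one
    nonzero coefficient, so that each equation really describes a line.
    """
    det = a * d - b * c
    if det != 0:
        # Lines with different slopes meet in exactly one point.
        x1 = Fraction(e * d - b * f, det)
        x2 = Fraction(a * f - e * c, det)
        return "unique", (x1, x2)
    # Equal slopes: the lines coincide exactly when one whole
    # equation is a multiple of the other.
    if a * f - e * c == 0 and b * f - e * d == 0:
        return "infinite", None
    return "empty", None


print(classify_2x2(2, 3, 3, 1, -1, 4)[0])   # unique
print(classify_2x2(2, 3, 3, 4, 6, 6)[0])    # infinite
print(classify_2x2(2, 3, 3, 4, 6, 10)[0])   # empty
```

For the first system the function also returns the point (3, −1), matching the intersection found from the plot.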

Subsection ESEO
Equivalent Systems and Equation Operations
With all this talk about finding solution sets for systems of linear equations, you
might be ready to begin learning how to find these solution sets yourself. We begin
with our first definition that takes a common word and gives it a very precise meaning
in the context of systems of linear equations.
Definition ESYS Equivalent Systems
Two systems of linear equations are equivalent if their solution sets are equal. □
     Notice here that the two systems of equations could look very different (i.e. not
 be equal), but still have equal solution sets, and we would then call the systems
 equivalent. Two linear equations in two variables might be plotted as two lines
 that intersect in a single point. A different system, with three equations in two
variables might have a plot that is three lines, all intersecting at a common point,
with this common point identical to the intersection point for the first system. By our
 definition, we could then say these two very different looking systems of equations
 are equivalent, since they have identical solution sets. It is really like a weaker form
 of equality, where we allow the systems to be different in some respects, but we use
 the term equivalent to highlight the situation when their solution sets are equal.
     With this definition, we can begin to describe our strategy for solving linear
 systems. Given a system of linear equations that looks difficult to solve, we would
 like to have an equivalent system that is easy to solve. Since the systems will have
 equal solution sets, we can solve the “easy” system and get the solution set to the
“difficult” system. Here come the tools for making this strategy viable.
Definition EO Equation Operations
Given a system of linear equations, the following three operations will transform the
system into a different one, and each operation is known as an equation operation.

   1. Swap the locations of two equations in the list of equations.
     2. Multiply each term of an equation by a nonzero quantity.
     3. Multiply each term of one equation by some quantity, and add these terms to
        a second equation, on both sides of the equality. Leave the first equation the
        same after this operation, but replace the second equation by the new one. □
    These descriptions might seem a bit vague, but the proof or the examples that
follow should make it clear what is meant by each. We will shortly prove a key
theorem about equation operations and solutions to linear systems of equations.
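To make the three operations concrete, here is one way they might be written as small functions. This is a sketch under our own assumptions, not a prescribed implementation: a system is stored as a list of rows, each row listing the coefficients of one equation followed by its constant term, equations are indexed from 0, and all function names are hypothetical. The demonstration uses the first system of Example TTS and its solution x1 = 3, x2 = −1.

```python
from fractions import Fraction


def swap(system, i, j):
    """Operation 1: swap the locations of equations i and j."""
    result = [row[:] for row in system]
    result[i], result[j] = result[j], result[i]
    return result


def scale(system, i, alpha):
    """Operation 2: multiply each term of equation i by a nonzero alpha."""
    if alpha == 0:
        raise ValueError("alpha must be nonzero")
    result = [row[:] for row in system]
    result[i] = [alpha * term for term in result[i]]
    return result


def add_multiple(system, i, j, alpha):
    """Operation 3: add alpha times equation i to equation j, leaving equation i alone."""
    if i == j:
        raise ValueError("the two equations must be different")
    result = [row[:] for row in system]
    result[j] = [alpha * s + t for s, t in zip(result[i], result[j])]
    return result


def is_solution(system, values):
    """True when substituting values makes every equation of the system true."""
    return all(
        sum(a * x for a, x in zip(row[:-1], values)) == row[-1]
        for row in system
    )


original = [[2, 3, 3], [1, -1, 4]]  # 2x1 + 3x2 = 3,  x1 - x2 = 4
step1 = swap(original, 0, 1)
step2 = scale(step1, 0, Fraction(1, 2))
step3 = add_multiple(step2, 0, 1, -4)

print(is_solution(original, (3, -1)), is_solution(step3, (3, -1)))  # True True
```

Each function returns a new system and leaves its input untouched, which makes it easy to compare the systems before and after an operation.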
    We are about to give a rather involved proof, so a discussion about just what a
theorem really is would be timely. Stop and read Proof Technique T first.
    In the theorem we are about to prove, the conclusion is that two systems are
equivalent. By Definition ESYS this translates to requiring that solution sets be
equal for the two systems. So we are being asked to show that two sets are equal.
How do we do this? Well, there is a very standard technique, and we will use it
repeatedly through the course. If you have not done so already, head to Section SET
and familiarize yourself with sets, their operations, and especially the notion of set
equality, Definition SE, and the nearby discussion about its use.
    The following theorem has a rather long proof. This chapter contains a few very
necessary theorems like this, with proofs that you can safely skip on a first reading.
You might come back to them later, when you are more comfortable with reading
and studying proofs.
Theorem EOPSS Equation Operations Preserve Solution Sets
If we apply one of the three equation operations of Definition EO to a system of
linear equations (Definition SLE), then the original system and the transformed
system are equivalent.
Proof. We take each equation operation in turn and show that the solution sets of
the two systems are equal, using the definition of set equality (Definition SE).
     1. It will not be our habit in proofs to resort to saying statements are “obvious,”
        but in this case, it should be. There is nothing about the order in which we
        write linear equations that affects their solutions, so the solution set will be
        equal if the systems only differ by a rearrangement of the order of the equations.
     2. Suppose α ≠ 0 is a number. Let us choose to multiply the terms of equation i
        by α to build the new system of equations,
                               a11 x1 + a12 x2 + a13 x3 + · · · + a1n xn = b1
                               a21 x1 + a22 x2 + a23 x3 + · · · + a2n xn = b2
                               a31 x1 + a32 x2 + a33 x3 + · · · + a3n xn = b3
                                                                        ..
                                                                         .
                         αai1 x1 + αai2 x2 + αai3 x3 + · · · + αain xn = αbi
                                                                       ⋮
                           am1 x1 + am2 x2 + am3 x3 + · · · + amn xn = bm
        Let S denote the solutions to the system in the statement of the theorem, and
        let T denote the solutions to the transformed system.

         (a) Show S ⊆ T . Suppose (x1 , x2 , x3 , . . . , xn ) = (β1 , β2 , β3 , . . . , βn ) ∈ S is
             a solution to the original system. Ignoring the i-th equation for a moment,
             we know it makes all the other equations of the transformed system true.
             We also know that
                                   ai1 β1 + ai2 β2 + ai3 β3 + · · · + ain βn = bi
             which we can multiply by α to get
                            αai1 β1 + αai2 β2 + αai3 β3 + · · · + αain βn = αbi
           This says that the i-th equation of the transformed system is also true, so
           we have established that (β1 , β2 , β3 , . . . , βn ) ∈ T , and therefore S ⊆ T .
      (b) Now show T ⊆ S. Suppose (x1 , x2 , x3 , . . . , xn ) = (β1 , β2 , β3 , . . . , βn ) ∈
          T is a solution to the transformed system. Ignoring the i-th equation
          for a moment, we know it makes all the other equations of the original
          system true. We also know that
                         αai1 β1 + αai2 β2 + αai3 β3 + · · · + αain βn = αbi
           which we can multiply by 1/α, since α ≠ 0, to get
                                ai1 β1 + ai2 β2 + ai3 β3 + · · · + ain βn = bi


           This says that the i-th equation of the original system is also true, so
           we have established that (β1 , β2 , β3 , . . . , βn ) ∈ S, and therefore T ⊆ S.
           Locate the key point where we required that α ≠ 0, and consider what
           would happen if α = 0.

  3. Suppose α is a number. Let us choose to multiply the terms of equation i by
     α and add them to equation j in order to build the new system of equations,
                                            a11 x1 + a12 x2 + · · · + a1n xn = b1
                                            a21 x1 + a22 x2 + · · · + a2n xn = b2
                                            a31 x1 + a32 x2 + · · · + a3n xn = b3
                                                                             ⋮
           (αai1 + aj1 )x1 + (αai2 + aj2 )x2 + · · · + (αain + ajn )xn = αbi + bj
                                                                       ⋮
                                        am1 x1 + am2 x2 + · · · + amn xn = bm
     Let S denote the solutions to the system in the statement of the theorem, and
     let T denote the solutions to the transformed system.

      (a) Show S ⊆ T . Suppose (x1 , x2 , x3 , . . . , xn ) = (β1 , β2 , β3 , . . . , βn ) ∈ S is
          a solution to the original system. Ignoring the j-th equation for a moment,
          we know this solution makes all the other equations of the transformed
          system true. Using the fact that the solution makes the i-th and j-th
          equations of the original system true, we find
           (αai1 + aj1 )β1 + (αai2 + aj2 )β2 + · · · + (αain + ajn )βn
              = (αai1 β1 + αai2 β2 + · · · + αain βn ) + (aj1 β1 + aj2 β2 + · · · + ajn βn )
              = α(ai1 β1 + ai2 β2 + · · · + ain βn ) + (aj1 β1 + aj2 β2 + · · · + ajn βn )
              = αbi + bj .
           This says that the j-th equation of the transformed system is also true, so
           we have established that (β1 , β2 , β3 , . . . , βn ) ∈ T , and therefore S ⊆ T .
      (b) Now show T ⊆ S. Suppose (x1 , x2 , x3 , . . . , xn ) = (β1 , β2 , β3 , . . . , βn ) ∈
          T is a solution to the transformed system. Ignoring the j-th equation
          for a moment, we know it makes all the other equations of the original
          system true. We then find
                  aj1 β1 + · · · + ajn βn
                        = aj1 β1 + · · · + ajn βn + αbi − αbi
                        = aj1 β1 + · · · + ajn βn + (αai1 β1 + · · · + αain βn ) − αbi
                        = aj1 β1 + αai1 β1 + · · · + ajn βn + αain βn − αbi
                        = (αai1 + aj1 )β1 + · · · + (αain + ajn )βn − αbi
                        = αbi + bj − αbi
                        = bj
           This says that the j-th equation of the original system is also true, so we
           have established that (β1 , β2 , β3 , . . . , βn ) ∈ S, and therefore T ⊆ S.

      Why did we not need to require that α ≠ 0 for this equation operation? In other
      words, how does the third statement of the theorem read when α = 0? Does
      our proof require some extra care when α = 0? Compare your answers with
      the similar situation for the second equation operation. (See Exercise SSLE.T20.)

                                                                                     ∎

     Theorem EOPSS is the necessary tool to complete our strategy for solving systems
of equations. We will use equation operations to move from one system to another,
all the while keeping the solution set the same. With the right sequence of operations,
we will arrive at a simpler system to solve. The next two examples illustrate this
idea, while saving some of the details for later.
Example US Three equations, one solution
We solve the following system by a sequence of equation operations.
                                 x1 + 2x2 + 2x3 = 4
                                 x1 + 3x2 + 3x3 = 5
                               2x1 + 6x2 + 5x3 = 6
α = −1 times equation 1, add to equation 2:
                                 x1 + 2x2 + 2x3 = 4
                               0x1 + 1x2 + 1x3 = 1
                               2x1 + 6x2 + 5x3 = 6
α = −2 times equation 1, add to equation 3:
                                 x1 + 2x2 + 2x3 = 4
                               0x1 + 1x2 + 1x3 = 1
                               0x1 + 2x2 + 1x3 = −2
α = −2 times equation 2, add to equation 3:
                                 x1 + 2x2 + 2x3 = 4
                               0x1 + 1x2 + 1x3 = 1
                               0x1 + 0x2 − 1x3 = −4
α = −1 times equation 3:
                                 x1 + 2x2 + 2x3 = 4
                               0x1 + 1x2 + 1x3 = 1
                               0x1 + 0x2 + 1x3 = 4
which can be written more clearly as
                                 x1 + 2x2 + 2x3 = 4
                                        x2 + x3 = 1
                                              x3 = 4
     This is now a very easy system of equations to solve. The third equation requires
that x3 = 4 to be true. Making this substitution into equation 2 we arrive at x2 = −3,
and finally, substituting these values of x2 and x3 into the first equation, we find
that x1 = 2. Note too that this is the only solution to this final system of equations,
since we were forced to choose these values to make the equations true. Since we
performed equation operations on each system to obtain the next one in the list, all
of the systems listed here are equivalent to each other by Theorem EOPSS. Thus
(x1 , x2 , x3 ) = (2, −3, 4) is the unique solution to the original system of equations
(and all of the other intermediate systems of equations listed as we transformed one
into another). △
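The sequence of equation operations in Example US can be replayed mechanically. The following Python sketch is an illustration only and is not part of the original example. It assumes each equation is stored as a list of coefficients followed by the constant term, and the helper names `scale` and `add_multiple` are our own inventions. Exact arithmetic with `Fraction` avoids any rounding.

```python
from fractions import Fraction

def scale(system, i, alpha):
    """Multiply equation i (1-based) by the nonzero number alpha."""
    if alpha == 0:
        raise ValueError("alpha must be nonzero")
    system[i - 1] = [alpha * c for c in system[i - 1]]

def add_multiple(system, alpha, i, j):
    """Add alpha times equation i to equation j (1-based indices)."""
    system[j - 1] = [alpha * ci + cj
                     for ci, cj in zip(system[i - 1], system[j - 1])]

# Each row holds the coefficients of x1, x2, x3 followed by the constant.
system = [[Fraction(v) for v in row] for row in
          [[1, 2, 2, 4],
           [1, 3, 3, 5],
           [2, 6, 5, 6]]]

add_multiple(system, -1, 1, 2)   # alpha = -1 times equation 1, add to equation 2
add_multiple(system, -2, 1, 3)   # alpha = -2 times equation 1, add to equation 3
add_multiple(system, -2, 2, 3)   # alpha = -2 times equation 2, add to equation 3
scale(system, 3, -1)             # alpha = -1 times equation 3

# Back-substitute through the triangular system, last equation first.
x3 = system[2][3] / system[2][2]
x2 = (system[1][3] - system[1][2] * x3) / system[1][1]
x1 = (system[0][3] - system[0][1] * x2 - system[0][2] * x3) / system[0][0]
solution = (x1, x2, x3)
print(solution)
```

The tuple `solution` compares equal to (2, −3, 4), in agreement with the hand computation. The back-substitution step relies on the final system being triangular with nonzero leading coefficients, which is true here but is not guaranteed for every system.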
Example IS Three equations, infinitely many solutions
The following system of equations made an appearance earlier in this section (Ex-
ample NSE), where we listed one of its solutions. Now, we will try to find all of the
solutions to this system. Do not concern yourself too much about why we choose
this particular sequence of equation operations, just believe that the work we do is
all correct.
                              x1 + 2x2 + 0x3 + x4 = 7
                                 x1 + x2 + x3 − x4 = 3
                             3x1 + x2 + 5x3 − 7x4 = 1
α = −1 times equation 1, add to equation 2:
                              x1 + 2x2 + 0x3 + x4 = 7
                              0x1 − x2 + x3 − 2x4 = −4
                             3x1 + x2 + 5x3 − 7x4 = 1
α = −3 times equation 1, add to equation 3:
                              x1 + 2x2 + 0x3 + x4 = 7
                              0x1 − x2 + x3 − 2x4 = −4
                           0x1 − 5x2 + 5x3 − 10x4 = −20
α = −5 times equation 2, add to equation 3:
                              x1 + 2x2 + 0x3 + x4 = 7
                              0x1 − x2 + x3 − 2x4 = −4
                            0x1 + 0x2 + 0x3 + 0x4 = 0
α = −1 times equation 2:
                              x1 + 2x2 + 0x3 + x4 = 7
                              0x1 + x2 − x3 + 2x4 = 4
                            0x1 + 0x2 + 0x3 + 0x4 = 0
α = −2 times equation 2, add to equation 1:
                             x1 + 0x2 + 2x3 − 3x4 = −1
                              0x1 + x2 − x3 + 2x4 = 4
                            0x1 + 0x2 + 0x3 + 0x4 = 0
which can be written more clearly as
                                    x1 + 2x3 − 3x4 = −1
                                     x2 − x3 + 2x4 = 4
                                                  0=0
     What does the equation 0 = 0 mean? We can choose any values for x1 , x2 ,
x3 , x4 and this equation will be true, so we only need to consider further the first
two equations, since the third is true no matter what. We can analyze the second
equation without consideration of the variable x1 . It would appear that there is
considerable latitude in how we can choose x2 , x3 , x4 and make this equation true.
Let us choose x3 and x4 to be anything we please, say x3 = a and x4 = b.
     Now we can take these arbitrary values for x3 and x4 , substitute them in equation
1, to obtain
                            x1 + 2a − 3b = −1
                                       x1 = −1 − 2a + 3b
Similarly, equation 2 becomes
                                x2 − a + 2b = 4
                                          x2 = 4 + a − 2b
   So our arbitrary choices of values for x3 and x4 (a and b) translate into specific
values of x1 and x2 . The lone solution given in Example NSE was obtained by
choosing a = 2 and b = 1. Now we can easily and quickly find many more (infinitely
more). Suppose we choose a = 5 and b = −2, then we compute
                             x1 = −1 − 2(5) + 3(−2) = −17
                             x2 = 4 + 5 − 2(−2) = 13
and you can verify that (x1 , x2 , x3 , x4 ) = (−17, 13, 5, −2) makes all three equations
true. The entire solution set is written as
                 S = { (−1 − 2a + 3b, 4 + a − 2b, a, b)| a ∈ C, b ∈ C}
   It would be instructive to finish off your study of this example by taking the
general form of the solutions given in this set, substituting them into each of the
three equations, and verifying that they are true in each case (Exercise SSLE.M40). △
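A quick numerical spot-check of the solution set of Example IS is easy to automate. The Python sketch below is an illustration under our own assumptions: the names `solution_for` and `satisfies_all` are hypothetical, and only a handful of rational sample values for a and b are tried. A check like this can catch arithmetic slips, but it is not a substitute for the symbolic verification requested in Exercise SSLE.M40.

```python
from fractions import Fraction
from itertools import product

# Coefficients of x1..x4 and the constant for the original system of Example IS.
equations = [
    ([1, 2, 0, 1], 7),
    ([1, 1, 1, -1], 3),
    ([3, 1, 5, -7], 1),
]

def solution_for(a, b):
    """Return the solution with x3 = a and x4 = b."""
    return (-1 - 2 * a + 3 * b, 4 + a - 2 * b, a, b)

def satisfies_all(x):
    """True when the tuple x makes every equation of the system true."""
    return all(sum(c * xi for c, xi in zip(coeffs, x)) == rhs
               for coeffs, rhs in equations)

# A small grid of sample values, including a non-integer choice.
samples = [-3, 0, 2, 5, Fraction(1, 2)]
all_ok = all(satisfies_all(solution_for(a, b))
             for a, b in product(samples, repeat=2))
print(all_ok)                  # True
print(solution_for(5, -2))     # (-17, 13, 5, -2)
```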
    In the next section we will describe how to use equation operations to systemati-
cally solve any system of linear equations. But first, read one of our more important
pieces of advice about speaking and writing mathematics. See Proof Technique L.
    Before attacking the exercises in this section, it will be helpful to read some
advice on getting started on the construction of a proof. See Proof Technique GS.

Reading Questions

1. How many solutions does the system of equations 3x + 2y = 4, 6x + 4y = 8 have? Explain
   your answer.
2. How many solutions does the system of equations 3x + 2y = 4, 6x + 4y = −2 have?
   Explain your answer.
3. What do we mean when we say mathematics is a language?


Exercises
C10 Find a solution to the system in Example IS where x3 = 6 and x4 = 2. Find two
other solutions to the system. Find a solution where x1 = −17 and x2 = 14. How many
possible answers are there to each of these questions?
C20 Each archetype (Archetypes) that is a system of equations begins by listing some
specific solutions. Verify the specific solutions listed in the following archetypes by evaluat-
ing the system of equations with the solutions listed.

Archetype A, Archetype B, Archetype C, Archetype D, Archetype E, Archetype F, Archetype
G, Archetype H, Archetype I, Archetype J
C30†    Find all solutions to the linear system:
                                           x+y =5
                                          2x − y = 3

C31    Find all solutions to the linear system:
                                         3x + 2y = 1
                                           x−y =2
                                         4x + 2y = 2

C32    Find all solutions to the linear system:
                                          x + 2y = 8
                                           x−y =2
                                           x+y =4
C33    Find all solutions to the linear system:
                                       x + y − z = −1
                                       x − y − z = −1
                                                z=2

C34    Find all solutions to the linear system:
                                       x + y − z = −5
                                       x − y − z = −3
                                       x+y−z =0

C50† A three-digit number has two properties. The tens-digit and the ones-digit add up
to 5. If the number is written with the digits in the reverse order, and then subtracted
from the original number, the result is 792. Use a system of equations to find all of the
three-digit numbers with these properties.
C51† Find all of the six-digit numbers in which the first digit is one less than the second,
the third digit is half the second, the fourth digit is three times the third and the last two
digits form a number that equals the sum of the fourth and fifth. The sum of all the digits
is 24. (From The MENSA Puzzle Calendar for January 9, 2006.)
C52† Driving along, Terry notices that the last four digits on his car’s odometer are
palindromic. A mile later, the last five digits are palindromic. After driving another mile, the
middle four digits are palindromic. One more mile, and all six are palindromic. What was
the odometer reading when Terry first looked at it? Form a linear system of equations that
expresses the requirements of this puzzle. (Car Talk Puzzler, National Public Radio, Week
of January 21, 2008) (A car odometer displays six digits and a sequence is a palindrome
if it reads the same left-to-right as right-to-left.)
C53† An article in The Economist (“Free Exchange”, December 6, 2014) quotes the
following problem as an illustration that some of the “underlying assumptions of classical
economics” about people’s behavior are incorrect and “the mind plays tricks.” A bat and
ball cost $1.10 between them. How much does each cost? Answer this quickly with no
writing, then construct a system of linear equations and solve the problem carefully.
M10† Each sentence below has at least two meanings. Identify the source of the double
meaning, and rewrite the sentence (at least twice) to clearly convey each meaning.

   1. They are baking potatoes.

   2. He bought many ripe pears and apricots.

   3. She likes his sculpture.

   4. I decided on the bus.

M11† Discuss the difference in meaning of each of the following three almost identical
sentences, which all have the same grammatical structure. (These are due to Keith Devlin.)

   1. She saw him in the park with a dog.

   2. She saw him in the park with a fountain.

   3. She saw him in the park with a telescope.

M12† The following sentence, due to Noam Chomsky, has a correct grammatical structure,
but is meaningless. Critique its faults. “Colorless green ideas sleep furiously.” (Chomsky,
Noam. Syntactic Structures, The Hague/Paris: Mouton, 1957. p. 15.)
M13† Read the following sentence and form a mental picture of the situation.
  The baby cried and the mother picked it up.
  What assumptions did you make about the situation?
M14 Discuss the difference in meaning of the following two almost identical sentences,
which have nearly identical grammatical structure. (This antanaclasis is often attributed to
the comedian Groucho Marx, but has earlier roots.)

   1. Time flies like an arrow.

   2. Fruit flies like a banana.
M30† This problem appears in a middle-school mathematics textbook: Together Dan
and Diane have $20. Together Diane and Donna have $15. How much do the three of them
have in total? (Transition Mathematics, Second Edition, Scott Foresman Addison Wesley,
1998. Problem 5–1.19.)
M40    Solutions to the system in Example IS are given as
                    (x1 , x2 , x3 , x4 ) = (−1 − 2a + 3b, 4 + a − 2b, a, b)
Evaluate the three equations of the original system with these expressions in a and b and
verify that each equation is true, no matter what values are chosen for a and b.
M70† We have seen in this section that systems of linear equations have limited possi-
bilities for solution sets, and we will shortly prove Theorem PSSLS that describes these
possibilities exactly. This exercise will show that if we relax the requirement that our equa-
tions be linear, then the possibilities expand greatly. Consider a system of two equations in
the two variables x and y, where the departure from linearity involves simply squaring the
variables.
                                         x² − y² = 1
                                         x² + y² = 4
After solving this system of nonlinear equations, replace the second equation in turn by
x² + 2x + y² = 3, x² + y² = 1, x² − 4x + y² = −3, −x² + y² = 1 and solve each resulting
system of two equations in two variables. (This exercise includes suggestions from Don
Kreher.)
T10† Proof Technique D asks you to formulate a definition of what it means for a whole
number to be odd. What is your definition? (Do not say “the opposite of even.”) Is 6 odd?
Is 11 odd? Justify your answers by using your definition.
T20† Explain why the second equation operation in Definition EO requires that the
scalar be nonzero, while in the third equation operation this restriction on the scalar is not
present.
Section RREF
Reduced Row-Echelon Form
After solving a few systems of equations, you will recognize that it does not matter so
much what we call our variables, as opposed to what numbers act as their coefficients.
A system in the variables x1 , x2 , x3 would behave the same if we changed the names
of the variables to a, b, c and kept all the constants the same and in the same places.
In this section, we will isolate the key bits of information about a system of equations
into something called a matrix, and then use this matrix to systematically solve
the equations. Along the way we will obtain one of our most important and useful
computational tools.


Subsection MVNSE
Matrix and Vector Notation for Systems of Equations
Definition M Matrix
An m × n matrix is a rectangular layout of numbers from C having m rows and
n columns. We will use upper-case Latin letters from the start of the alphabet
(A, B, C, . . . ) to denote matrices and squared-off brackets to delimit the layout.
Many use large parentheses instead of brackets — the distinction is not important.
Rows of a matrix will be referenced starting at the top and working down (i.e. row 1
is at the top) and columns will be referenced starting from the left (i.e. column 1 is
at the left). For a matrix A, the notation [A]ij will refer to the complex number in
row i and column j of A. □
    Be careful with this notation for individual entries, since it is easy to think that
[A]ij refers to the whole matrix. It does not. It is just a number, but is a convenient
way to talk about the individual entries simultaneously. This notation will get a
heavy workout once we get to Chapter M.
Example AM A matrix
\[
B = \begin{bmatrix}
-1 & 2 & 5 & 3 \\
1 & 0 & -6 & 1 \\
-4 & 2 & 2 & -2
\end{bmatrix}
\]
is a matrix with m = 3 rows and n = 4 columns. We can say that [B]2,3 = −6 while
[B]3,4 = −2. △
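When a matrix is stored in a program, the entry notation needs a little care, since most programming languages number positions from 0 while Definition M numbers rows and columns from 1. The following Python sketch is only an illustration: storing the matrix as a list of rows and the helper name `entry` are our own assumptions.

```python
# The matrix B of Example AM, stored as a list of rows.
B = [
    [-1, 2, 5, 3],
    [1, 0, -6, 1],
    [-4, 2, 2, -2],
]

def entry(matrix, i, j):
    """Return [matrix]ij using the 1-based row and column numbers of Definition M."""
    return matrix[i - 1][j - 1]

m = len(B)       # number of rows
n = len(B[0])    # number of columns
print(m, n)               # 3 4
print(entry(B, 2, 3))     # -6
print(entry(B, 3, 4))     # -2
```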
    When we do equation operations on a system of equations, the names of the
variables really are not very important. Use x1 , x2 , x3 , or a, b, c, or x, y, z, it really
does not matter. In this subsection we will describe some notation that will make it
easier to describe linear systems, solve the systems and describe the solution sets.
Here is a list of definitions, laden with notation.
Definition CV Column Vector
A column vector of size m is an ordered list of m numbers, which is written in
order vertically, starting at the top and proceeding to the bottom. At times, we will
refer to a column vector as simply a vector. Column vectors will be written in bold,
usually with lower-case Latin letters from the end of the alphabet such as u, v, w,
x, y, z. Some books like to write vectors with arrows, such as \vec{u}. Writing by hand,
some like to put arrows on top of the symbol, or a tilde underneath the symbol, as
in \underset{\sim}{u}. To refer to the entry or component of vector v in location i of the
list, we write [v]i. □
    Be careful with this notation. While the symbols [v]i might look somewhat
substantial, as an object this represents just one entry of a vector, which is just a
single complex number.
Definition ZCV Zero Column Vector
The zero vector of size m is the column vector of size m where each entry is the
number zero,
\[
\mathbf{0} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
\]
or defined much more compactly, [0]i = 0 for 1 ≤ i ≤ m. □


Definition CM Coefficient Matrix
For a system of linear equations,
                     a11 x1 + a12 x2 + a13 x3 + · · · + a1n xn = b1
                     a21 x1 + a22 x2 + a23 x3 + · · · + a2n xn = b2
                     a31 x1 + a32 x2 + a33 x3 + · · · + a3n xn = b3
                                                               ⋮
                  am1 x1 + am2 x2 + am3 x3 + · · · + amn xn = bm
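the coefficient matrix is the m × n matrix
\[
A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & \dots & a_{1n} \\
a_{21} & a_{22} & a_{23} & \dots & a_{2n} \\
a_{31} & a_{32} & a_{33} & \dots & a_{3n} \\
\vdots &        &        &       &        \\
a_{m1} & a_{m2} & a_{m3} & \dots & a_{mn}
\end{bmatrix}
\]
                                                                             □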


Definition VOC Vector of Constants
For a system of linear equations,
                     a11 x1 + a12 x2 + a13 x3 + · · · + a1n xn = b1
                     a21 x1 + a22 x2 + a23 x3 + · · · + a2n xn = b2
                     a31 x1 + a32 x2 + a33 x3 + · · · + a3n xn = b3
                                                               ⋮
                  am1 x1 + am2 x2 + am3 x3 + · · · + amn xn = bm
the vector of constants is the column vector of size m
\[
\mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_m \end{bmatrix}
\]
                                                                             □


Definition SOLV Solution Vector
For a system of linear equations,
                     a11 x1 + a12 x2 + a13 x3 + · · · + a1n xn = b1
                     a21 x1 + a22 x2 + a23 x3 + · · · + a2n xn = b2
                     a31 x1 + a32 x2 + a33 x3 + · · · + a3n xn = b3
                                                               ⋮
                  am1 x1 + am2 x2 + am3 x3 + · · · + amn xn = bm
the solution vector is the column vector of size n
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix}
\]
                                                                                       □

    The solution vector may do double-duty on occasion. It might refer to a list of
variable quantities at one point, and subsequently refer to values of those variables
that actually form a particular solution to that system.

Definition MRLS Matrix Representation of a Linear System
If A is the coefficient matrix of a system of linear equations and b is the vector of
constants, then we will write LS(A, b) as a shorthand expression for the system of
linear equations, which we will refer to as the matrix representation of the linear
system. □

Example NSLE Notation for systems of linear equations
The system of linear equations
                           2x1 + 4x2 − 3x3 + 5x4 + x5 = 9
                              3x1 + x2 +            x4 − 3x5 = 0
                         −2x1 + 7x2 − 5x3 + 2x4 + 2x5 = −3
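has coefficient matrix
\[
A = \begin{bmatrix}
2 & 4 & -3 & 5 & 1 \\
3 & 1 & 0 & 1 & -3 \\
-2 & 7 & -5 & 2 & 2
\end{bmatrix}
\]
and vector of constants
\[
\mathbf{b} = \begin{bmatrix} 9 \\ 0 \\ -3 \end{bmatrix}
\]
and so will be referenced as LS(A, b). △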

Definition AM Augmented Matrix
Suppose we have a system of m equations in n variables, with coefficient matrix A
and vector of constants b. Then the augmented matrix of the system of equations
is the m × (n + 1) matrix whose first n columns are the columns of A and whose last
column (n + 1) is the column vector b. This matrix will be written as [ A | b]. □

    The augmented matrix represents all the important information in the system of
equations, since the names of the variables have been ignored, and the only connection
with the variables is the location of their coefficients in the matrix. It is important
to realize that the augmented matrix is just that, a matrix, and not a system of
equations. In particular, the augmented matrix does not have any “solutions,” though
it will be useful for finding solutions to the system of equations that it is associated
with. (Think about your objects, and review Proof Technique L.) However, notice
that an augmented matrix always belongs to some system of equations, and vice
versa, so it is tempting to try and blur the distinction between the two. Here is a
quick example.

Example AMAA Augmented matrix for Archetype A
Archetype A is the following system of 3 equations in 3 variables.
                                  x1 − x2 + 2x3 = 1
                                  2x1 + x2 + x3 = 8
                                         x1 + x2 = 5
     Here is its augmented matrix.
\[
\left[\begin{array}{rrr|r}
1 & -1 & 2 & 1 \\
2 & 1 & 1 & 8 \\
1 & 1 & 0 & 5
\end{array}\right]
\]
                                                                                      △
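To make the distinction between a matrix and a system concrete, here is a small Python sketch, offered only as an illustration. The function names `augment` and `is_solution` are hypothetical, and the test vectors are values we chose for the demonstration. Notice that the augmented matrix is just a list of rows, while the question "is this a solution?" is asked of the system LS(A, b), not of the matrix.

```python
def augment(A, b):
    """Return the augmented matrix [A | b] as a new list of rows."""
    if len(A) != len(b):
        raise ValueError("A and b must have the same number of rows")
    return [row + [b_i] for row, b_i in zip(A, b)]

def is_solution(A, b, x):
    """True when the vector x satisfies every equation of LS(A, b)."""
    return all(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) == b_i
               for row, b_i in zip(A, b))

# Coefficient matrix and vector of constants for Archetype A.
A = [[1, -1, 2],
     [2, 1, 1],
     [1, 1, 0]]
b = [1, 8, 5]

M = augment(A, b)
print(M)                             # [[1, -1, 2, 1], [2, 1, 1, 8], [1, 1, 0, 5]]
print(is_solution(A, b, [2, 3, 1]))  # True
print(is_solution(A, b, [0, 0, 0]))  # False
```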

Subsection RO
Row Operations
An augmented matrix for a system of equations will save us the tedium of continually
writing down the names of the variables as we solve the system. It will also release
us from any dependence on the actual names of the variables. We have seen how
certain operations we can perform on equations (Definition EO) will preserve their
solutions (Theorem EOPSS). The next two definitions and the following theorem
carry over these ideas to augmented matrices.
Definition RO Row Operations
The following three operations will transform an m × n matrix into a different matrix
of the same size, and each is known as a row operation.

     1. Swap the locations of two rows.
     2. Multiply each entry of a single row by a nonzero quantity.
     3. Multiply each entry of one row by some quantity, and add these values to the
        entries in the same columns of a second row. Leave the first row the same after
        this operation, but replace the second row by the new values.

     We will use a symbolic shorthand to describe these row operations:

     1. Ri ↔ Rj : Swap the location of rows i and j.
     2. αRi : Multiply row i by the nonzero scalar α.
     3. αRi + Rj : Multiply row i by the scalar α and add to row j.

                                                                                      □
Definition REM Row-Equivalent Matrices
Two matrices, A and B, are row-equivalent if one can be obtained from the other
by a sequence of row operations. □
Example TREM Two row-equivalent matrices
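The matrices
\[
A = \begin{bmatrix} 2 & -1 & 3 & 4 \\ 5 & 2 & -2 & 3 \\ 1 & 1 & 0 & 6 \end{bmatrix}
\qquad
B = \begin{bmatrix} 1 & 1 & 0 & 6 \\ 3 & 0 & -2 & -9 \\ 2 & -1 & 3 & 4 \end{bmatrix}
\]
are row-equivalent as can be seen from
\[
\begin{bmatrix} 2 & -1 & 3 & 4 \\ 5 & 2 & -2 & 3 \\ 1 & 1 & 0 & 6 \end{bmatrix}
\xrightarrow{R_1 \leftrightarrow R_3}
\begin{bmatrix} 1 & 1 & 0 & 6 \\ 5 & 2 & -2 & 3 \\ 2 & -1 & 3 & 4 \end{bmatrix}
\xrightarrow{-2R_1 + R_2}
\begin{bmatrix} 1 & 1 & 0 & 6 \\ 3 & 0 & -2 & -9 \\ 2 & -1 & 3 & 4 \end{bmatrix}
\]
We can also say that any pair of these three matrices are row-equivalent. △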
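The three row operations of Definition RO translate directly into short functions. The Python sketch below is one possible rendering, not a definitive implementation: the function names are our own, rows are numbered from 1 to match the notation of the definition, and each function returns a new matrix rather than changing its argument. We use it to replay the two operations of Example TREM.

```python
def swap(M, i, j):
    """R_i <-> R_j: return a copy of M with rows i and j swapped."""
    N = [row[:] for row in M]
    N[i - 1], N[j - 1] = N[j - 1], N[i - 1]
    return N

def scale(M, alpha, i):
    """alpha R_i: return a copy of M with row i multiplied by nonzero alpha."""
    if alpha == 0:
        raise ValueError("the scalar must be nonzero")
    N = [row[:] for row in M]
    N[i - 1] = [alpha * entry for entry in N[i - 1]]
    return N

def add_multiple(M, alpha, i, j):
    """alpha R_i + R_j: return a copy of M with alpha times row i added to row j."""
    N = [row[:] for row in M]
    N[j - 1] = [alpha * a + b for a, b in zip(M[i - 1], M[j - 1])]
    return N

A = [[2, -1, 3, 4],
     [5, 2, -2, 3],
     [1, 1, 0, 6]]
B = [[1, 1, 0, 6],
     [3, 0, -2, -9],
     [2, -1, 3, 4]]

step1 = swap(A, 1, 3)                   # R1 <-> R3
step2 = add_multiple(step1, -2, 1, 2)   # -2 R1 + R2
print(step2 == B)                       # True
```

Returning copies keeps every intermediate matrix available, which mirrors the way the example displays all three matrices side by side.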
    Notice that each of the three row operations is reversible (Exercise RREF.T10),
so we do not have to be careful about the distinction between “A is row-equivalent
to B” and “B is row-equivalent to A.” (Exercise RREF.T11)
    The preceding definitions are designed to make the following theorem possible.
It says that row-equivalent matrices represent systems of linear equations that have
identical solution sets.
Theorem REMES Row-Equivalent Matrices represent Equivalent Systems
Suppose that A and B are row-equivalent augmented matrices. Then the systems of
linear equations that they represent are equivalent systems.

Proof. If we perform a single row operation on an augmented matrix, it will have the
same effect as if we did the analogous equation operation on the system of equations
the matrix represents. By exactly the same methods as we used in the proof of
Theorem EOPSS we can see that each of these row operations will preserve the set
of solutions for the system of equations the matrix represents.                   
     So at this point, our strategy is to begin with a system of equations, represent
 the system by an augmented matrix, perform row operations (which will preserve
 solutions for the system) to get a “simpler” augmented matrix, convert back to a
“simpler” system of equations and then solve that system, knowing that its solutions
 are those of the original system. Here is a rehash of Example US as an exercise in
 using our new tools.
Example USR Three equations, one solution, reprised
We solve the following system using augmented matrices and row operations. This
is the same system of equations solved in Example US using equation operations.
                                 x1 + 2x2 + 2x3 = 4
                                 x1 + 3x2 + 3x3 = 5
                                2x1 + 6x2 + 5x3 = 6
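Form the augmented matrix,
\[
A = \begin{bmatrix} 1 & 2 & 2 & 4 \\ 1 & 3 & 3 & 5 \\ 2 & 6 & 5 & 6 \end{bmatrix}
\]
and apply row operations,
\begin{align*}
&\xrightarrow{-1R_1 + R_2}
\begin{bmatrix} 1 & 2 & 2 & 4 \\ 0 & 1 & 1 & 1 \\ 2 & 6 & 5 & 6 \end{bmatrix}
\xrightarrow{-2R_1 + R_3}
\begin{bmatrix} 1 & 2 & 2 & 4 \\ 0 & 1 & 1 & 1 \\ 0 & 2 & 1 & -2 \end{bmatrix} \\
&\xrightarrow{-2R_2 + R_3}
\begin{bmatrix} 1 & 2 & 2 & 4 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & -1 & -4 \end{bmatrix}
\xrightarrow{-1R_3}
\begin{bmatrix} 1 & 2 & 2 & 4 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 4 \end{bmatrix}
\end{align*}
    So the matrix
\[
B = \begin{bmatrix} 1 & 2 & 2 & 4 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 4 \end{bmatrix}
\]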
is row equivalent to A and by Theorem REMES the system of equations below has
the same solution set as the original system of equations.
                                 x1 + 2x2 + 2x3 = 4
                                          x2 + x3 = 1
                                                x3 = 4
   Solving this “simpler” system is straightforward and is identical to the process
in Example US.
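The last step of Example USR, solving the "simpler" system from the bottom equation upward, might look as follows in Python. This is a sketch under assumptions: the helper name `back_substitute` is hypothetical, and it only handles a square system whose augmented matrix is already upper triangular with nonzero diagonal entries, as B is here.

```python
from fractions import Fraction

def back_substitute(M):
    """Solve an upper-triangular augmented matrix, last equation first."""
    n = len(M)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        # Contribution of the variables that have already been determined.
        known = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = Fraction(M[i][n] - known, M[i][i])
    return x

# Augmented matrix B from Example USR and the original augmented matrix A.
B = [[1, 2, 2, 4], [0, 1, 1, 1], [0, 0, 1, 4]]
A = [[1, 2, 2, 4], [1, 3, 3, 5], [2, 6, 5, 6]]

x = back_substitute(B)
print([str(v) for v in x])

# Theorem REMES says the same values must satisfy the original system.
for row in A:
    print(sum(a * v for a, v in zip(row[:3], x)) == row[3])
```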

Subsection RREF
Reduced Row-Echelon Form
The preceding example amply illustrates the definitions and theorems we have seen
so far. But it still leaves two questions unanswered. Exactly what is this “simpler”
form for a matrix, and just how do we get it? Here is the answer to the first question,
a definition of reduced row-echelon form.
Definition RREF Reduced Row-Echelon Form
A matrix is in reduced row-echelon form if it meets all of the following conditions:

   1. If there is a row where every entry is zero, then this row lies below any other
      row that contains a nonzero entry.
   2. The leftmost nonzero entry of a row is equal to 1.

     3. The leftmost nonzero entry of a row is the only nonzero entry in its column.

     4. Consider any two different leftmost nonzero entries, one located in row i,
        column j and the other located in row s, column t. If s > i, then t > j.

    A row of only zero entries is called a zero row and the leftmost nonzero entry
of a nonzero row is a leading 1. A column containing a leading 1 will be called a
pivot column. The number of nonzero rows will be denoted by r, which is also
equal to the number of leading 1’s and the number of pivot columns.
    The set of column indices for the pivot columns will be denoted by D =
{d1 , d2 , d3 , . . . , dr } where d1 < d2 < d3 < · · · < dr , while the columns that
are not pivot columns will be denoted as F = {f1 , f2 , f3 , . . . , fn−r } where f1 <
f2 < f3 < · · · < fn−r .
                                                                                     
    The principal feature of reduced row-echelon form is the pattern of leading 1’s
guaranteed by conditions (2) and (4), reminiscent of a flight of geese, or steps in a
staircase, or water cascading down a mountain stream.
    There are a number of new terms and notation introduced in this definition,
which should make you suspect that this is an important definition. Given all there
is to digest here, we will mostly save the use of D and F until Section TSS. However,
one important point to make here is that all of these terms and notation apply to a
matrix. Sometimes we will employ these terms and sets for an augmented matrix,
and other times it might be a coefficient matrix. So always give some thought to
exactly which type of matrix you are analyzing.
Example RREF A matrix in reduced row-echelon form
The matrix C is in reduced row-echelon form.
\[
C = \begin{bmatrix}
1 & -3 & 0 & 6 & 0 & 0 & -5 & 9 \\
0 & 0 & 0 & 0 & 1 & 0 & 3 & -7 \\
0 & 0 & 0 & 0 & 0 & 1 & 7 & 3 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]
This matrix has two zero rows and three pivot columns. So r = 3. Columns 1, 5,
and 6 are the three pivot columns, so D = {1, 5, 6} and then F = {2, 3, 4, 7, 8}.
Example NRREF A matrix not in reduced row-echelon form
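The matrix E is not in reduced row-echelon form, as it fails each of the four
requirements once.
\[
E = \begin{bmatrix}
1 & 0 & -3 & 0 & 6 & 0 & 7 & -5 & 9 \\
0 & 0 & 0 & 5 & 0 & 1 & 0 & 3 & -7 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & -4 & 2 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 7 & 3 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]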
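The four conditions of Definition RREF can be checked mechanically. The Python sketch below is one possible way to do so; the names `leading_positions`, `is_rref` and `pivot_data` are hypothetical, and columns are numbered from 1 so that the sets D and F agree with the text. It is tried on the matrices C and E of the two preceding examples.

```python
def leading_positions(M):
    """Column (1-based) of the leftmost nonzero entry of each row, None for a zero row."""
    lead = []
    for row in M:
        nonzero = [j for j, x in enumerate(row, start=1) if x != 0]
        lead.append(nonzero[0] if nonzero else None)
    return lead

def is_rref(M):
    lead = leading_positions(M)
    # Condition 1: no nonzero row may appear below a zero row.
    seen_zero_row = False
    for p in lead:
        if p is None:
            seen_zero_row = True
        elif seen_zero_row:
            return False
    for i, p in enumerate(lead):
        if p is None:
            continue
        # Condition 2: the leftmost nonzero entry equals 1.
        if M[i][p - 1] != 1:
            return False
        # Condition 3: it is the only nonzero entry in its column.
        if any(M[k][p - 1] != 0 for k in range(len(M)) if k != i):
            return False
    # Condition 4: leading entries move strictly to the right going down.
    pivots = [p for p in lead if p is not None]
    return all(a < b for a, b in zip(pivots, pivots[1:]))

def pivot_data(M):
    """For a matrix in reduced row-echelon form, return (r, D, F)."""
    D = [p for p in leading_positions(M) if p is not None]
    F = [j for j in range(1, len(M[0]) + 1) if j not in D]
    return len(D), D, F

C = [[1, -3, 0, 6, 0, 0, -5, 9],
     [0, 0, 0, 0, 1, 0, 3, -7],
     [0, 0, 0, 0, 0, 1, 7, 3],
     [0, 0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0, 0, 0]]
E = [[1, 0, -3, 0, 6, 0, 7, -5, 9],
     [0, 0, 0, 5, 0, 1, 0, 3, -7],
     [0, 0, 0, 0, 0, 0, 0, 0, 0],
     [0, 1, 0, 0, 0, 0, 0, -4, 2],
     [0, 0, 0, 0, 0, 0, 1, 7, 3],
     [0, 0, 0, 0, 0, 0, 0, 0, 0]]

print(is_rref(C), pivot_data(C))
print(is_rref(E))
```

The function stops at the first failed condition, so for E it reports only that the matrix is not in reduced row-echelon form, not which of the four requirements fail.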
   Our next theorem has a “constructive” proof. Learn about the meaning of this
term in Proof Technique C.
Theorem REMEF Row-Equivalent Matrix in Echelon Form
Suppose A is a matrix. Then there is a matrix B so that

     1. A and B are row-equivalent.

     2. B is in reduced row-echelon form.

Proof. Suppose that A has m rows and n columns. We will describe a process for
converting A into B via row operations. This procedure is known as Gauss-Jordan
elimination. Tracing through this procedure will be easier if you recognize that i
refers to a row that is being converted, j refers to a column that is being converted,
and r keeps track of the number of nonzero rows. Here we go.

     1. Set j = 0 and r = 0.

   2. Increase j by 1. If j now equals n + 1, then stop.

   3. Examine the entries of A in column j located in rows r + 1 through m. If all
      of these entries are zero, then go to Step 2.

   4. Choose a row from rows r + 1 through m with a nonzero entry in column j.
      Let i denote the index for this row.

   5. Increase r by 1.

   6. Use the first row operation to swap rows i and r.

   7. Use the second row operation to convert the entry in row r and column j to a
      1.

   8. Use the third row operation with row r to convert every other entry of column
      j to zero.

   9. Go to Step 2.

    The result of this procedure is that the matrix A is converted to a matrix in
reduced row-echelon form, which we will refer to as B. We need to now prove this
claim by showing that the converted matrix has the requisite properties of Definition
RREF. First, the matrix is only converted through row operations (Steps 6, 7, 8), so
A and B are row-equivalent (Definition REM).
    It is a bit more work to be certain that B is in reduced row-echelon form. We
claim that as we begin Step 2, the first j columns of the matrix are in reduced
row-echelon form with r nonzero rows. Certainly this is true at the start when j = 0,
since the matrix has no columns and so vacuously meets the conditions of Definition
RREF with r = 0 nonzero rows.
    In Step 2 we increase j by 1 and begin to work with the next column. There
are two possible outcomes for Step 3. Suppose that every entry of column j in rows
r + 1 through m is zero. Then with no changes we recognize that the first j columns
of the matrix have their first r rows still in reduced row-echelon form, with the final
m − r rows still all zero.
    Suppose instead that the entry in row i of column j is nonzero. Notice that since
r + 1 ≤ i ≤ m, we know the first j − 1 entries of this row are all zero. Now, in Step
5 we increase r by 1, and then embark on building a new nonzero row. In Step 6 we
swap row r and row i. In the first j columns, the first r − 1 rows remain in reduced
row-echelon form after the swap. In Step 7 we multiply row r by a nonzero scalar,
creating a 1 in the entry in column j of row r, and not changing any other rows.
This new leading 1 is the first nonzero entry in its row, and is located to the right of
all the leading 1’s in the preceding r − 1 rows. With Step 8 we ensure that every
other entry in the column with this new leading 1 is now zero, as required for reduced
row-echelon form. Also, rows r + 1 through m are now all zeros in the first j columns,
so we now only have one new nonzero row, consistent with our increase of r by one.
Furthermore, since the first j − 1 entries of row r are zero, the employment of the
third row operation does not destroy any of the necessary features of rows 1 through
r − 1 and rows r + 1 through m, in columns 1 through j − 1.
    So at this stage, the first j columns of the matrix are in reduced row-echelon
form. When Step 2 finally increases j to n + 1, then the procedure is completed and
the full n columns of the matrix are in reduced row-echelon form, with the value of
r correctly recording the number of nonzero rows.                                     

    The procedure given in the proof of Theorem REMEF can be more precisely
described using a pseudo-code version of a computer program. Single-letter variables,
like m, n, i, j, r have the same meanings as above. := is assignment of the value
on the right to the variable on the left, A[i,j] is the equivalent of the matrix entry
[A]ij , while == is an equality test and <> is a “not equals” test.

input m, n and A
r := 0
for j := 1 to n
   i := r+1
   while i <= m and A[i,j] == 0
       i := i+1
   if i < m+1
       r := r+1
       swap rows i and r of A (row op 1)
       scale A[r,j] to a leading 1 (row op 2)
       for k := 1 to m, k <> r
            make A[k,j] zero (row op 3, employing row r)
output r and A

   Notice that as a practical matter the “and” used in the conditional statement of
the while statement should be of the “short-circuit” variety so that the array access
that follows is not out-of-bounds.
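For readers who want to run the procedure, here is one possible Python rendering of the pseudo-code. It is a sketch rather than a definitive implementation: the function name `gauss_jordan` is hypothetical, indices start at 0 instead of 1 (so r is increased after the new row is built rather than before), and exact rational arithmetic from the standard `fractions` module is assumed in order to avoid rounding.

```python
from fractions import Fraction

def gauss_jordan(A):
    """Return (r, B) where B is a reduced row-echelon form of A."""
    B = [[Fraction(x) for x in row] for row in A]
    m = len(B)
    n = len(B[0]) if m else 0
    r = 0                                   # number of nonzero rows built so far
    for j in range(n):
        i = r
        # Python's `and` short-circuits, so B[i][j] is never read when i == m.
        while i < m and B[i][j] == 0:
            i += 1
        if i < m:
            B[i], B[r] = B[r], B[i]         # row op 1: swap rows i and r
            pivot = B[r][j]
            B[r] = [x / pivot for x in B[r]]    # row op 2: create the leading 1
            for k in range(m):
                if k != r and B[k][j] != 0:
                    factor = -B[k][j]
                    # row op 3: add a multiple of row r to row k
                    B[k] = [factor * p + x for p, x in zip(B[r], B[k])]
            r += 1
    return r, B

# Augmented matrix of the system in Example SAB.
archetype_b = [[-7, -6, -12, -33], [5, 5, 7, 24], [1, 0, 4, 5]]
r, B = gauss_jordan(archetype_b)
print(r)
print([[str(x) for x in row] for row in B])
```

Because the sketch always picks the first available nonzero entry in a column, its intermediate matrices can differ from a hand computation that chooses rows for convenience, though the final matrix should agree.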
   So now we can put it all together. Begin with a system of linear equations
(Definition SLE), and represent the system by its augmented matrix (Definition AM).
Use row operations (Definition RO) to convert this matrix into reduced row-echelon
form (Definition RREF), using the procedure outlined in the proof of Theorem
REMEF. Theorem REMEF also tells us we can always accomplish this, and that the
result is row-equivalent (Definition REM) to the original augmented matrix. Since
the matrix in reduced row-echelon form has the same solution set, we can analyze
the row-reduced version instead of the original matrix, viewing it as the augmented
matrix of a different system of equations. The beauty of augmented matrices in
reduced row-echelon form is that the solution sets to the systems they represent can
be easily determined, as we will see in the next few examples and in the next section.
   We will see through the course that almost every interesting property of a matrix
can be discerned by looking at a row-equivalent matrix in reduced row-echelon form.
For this reason it is important to know that the matrix B that is guaranteed to exist by
Theorem REMEF is also unique.
   Two proof techniques are applicable to the proof. First, head out and read two
proof techniques: Proof Technique CD and Proof Technique U.
Theorem RREFU Reduced Row-Echelon Form is Unique
Suppose that A is an m × n matrix and that B and C are m × n matrices that are
row-equivalent to A and in reduced row-echelon form. Then B = C.

Proof. We need to begin with no assumptions about any relationships between B
and C, other than they are both in reduced row-echelon form, and they are both
row-equivalent to A.
    If B and C are both row-equivalent to A, then they are row-equivalent to each
other. Repeated row operations on a matrix combine the rows with each other using
operations that are linear, and are identical in each column. A key observation for
this proof is that each individual row of B is linearly related to the rows of C. This
relationship is different for each row of B, but once we fix a row, the relationship is
the same across columns. More precisely, there are scalars δik , 1 ≤ i, k ≤ m such
that for any 1 ≤ i ≤ m, 1 ≤ j ≤ n,
\[
[B]_{ij} = \sum_{k=1}^{m} \delta_{ik} [C]_{kj}
\]

    You should read this as saying that an entry of row i of B (in column j) is a
linear function of the entries of all the rows of C that are also in column j, and the
scalars (δik ) depend on which row of B we are considering (the i subscript on δik ),
but are the same for every column (no dependence on j in δik ). This idea may be
complicated now, but will feel more familiar once we discuss “linear combinations”
(Definition LCCV) and moreso when we discuss “row spaces” (Definition RSM).
For now, spend some time carefully working Exercise RREF.M40, which is designed
to illustrate the origins of this expression. This completes our exploitation of the
row-equivalence of B and C.
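A small numerical experiment may make the scalars in this expression less mysterious. The Python sketch below is an illustration only, using the matrices A and B of Example USR in place of C and B; the helper names are hypothetical. It applies the same four row operations to an identity matrix, on the assumption (checked numerically here, not proved) that the resulting entries record how each final row is assembled from the original rows.

```python
def add_multiple(M, alpha, i, j):
    """alpha R_i + R_j on a copy of M (rows numbered from 1)."""
    N = [row[:] for row in M]
    N[j - 1] = [alpha * x + y for x, y in zip(N[i - 1], N[j - 1])]
    return N

def scale(M, alpha, i):
    """alpha R_i on a copy of M (rows numbered from 1)."""
    N = [row[:] for row in M]
    N[i - 1] = [alpha * x for x in N[i - 1]]
    return N

def usr_operations(M):
    """The four row operations of Example USR, in order."""
    M = add_multiple(M, -1, 1, 2)
    M = add_multiple(M, -2, 1, 3)
    M = add_multiple(M, -2, 2, 3)
    return scale(M, -1, 3)

A = [[1, 2, 2, 4], [1, 3, 3, 5], [2, 6, 5, 6]]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

B = usr_operations(A)
delta = usr_operations(identity)    # candidate values for the scalars

# Rebuild each entry of B as a sum over k of delta[i][k] * A[k][j].
m, n = len(A), len(A[0])
combo = [[sum(delta[i][k] * A[k][j] for k in range(m)) for j in range(n)]
         for i in range(m)]
print(delta)
print(combo == B)
```

Notice that the scalars in row i of `delta` are used for every column j, which is the independence from j described above.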
    We now repeatedly exploit the fact that B and C are in reduced row-echelon
form. Recall that a pivot column is all zeros, except a single one. More carefully, if
R is a matrix in reduced row-echelon form, and $d_\ell$ is the index of a pivot column,
then $[R]_{k d_\ell} = 1$ precisely when $k = \ell$ and is otherwise zero. Notice also that any
entry of R that is both below the entry in row $\ell$ and to the left of column $d_\ell$ is also
zero (with below and left understood to include equality). In other words, look at
examples of matrices in reduced row-echelon form and choose a leading 1 (with a
box around it). The rest of the column is also zeros, and the lower left “quadrant”
of the matrix that begins here is totally zeros.
    Assuming no relationship about the form of B and C, let B have r nonzero rows
and denote the pivot columns as $D = \{d_1, d_2, d_3, \ldots, d_r\}$. For C let $r'$ denote the
number of nonzero rows and denote the pivot columns as
$D' = \{d'_1, d'_2, d'_3, \ldots, d'_{r'}\}$ (Definition RREF). There are four steps in the
proof, and the first three are about showing that B and C have the same number
of pivot columns, in the same places. In other words, the “primed” symbols are a
necessary fiction.
    First Step. Suppose that $d_1 < d'_1$. Then
\begin{align*}
1 &= [B]_{1 d_1} && \text{Definition RREF} \\
  &= \sum_{k=1}^{m} \delta_{1k} [C]_{k d_1} \\
  &= \sum_{k=1}^{m} \delta_{1k} (0) && d_1 < d'_1 \\
  &= 0
\end{align*}
The entries of C are all zero since they are left and below of the leading 1 in row 1
and column $d'_1$ of C. This is a contradiction, so we know that $d_1 \geq d'_1$. By an entirely
similar argument, reversing the roles of B and C, we could conclude that $d_1 \leq d'_1$.
Together this means that $d_1 = d'_1$.
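    Second Step. Suppose that we have determined that $d_1 = d'_1$, $d_2 = d'_2$, $d_3 = d'_3$,
. . . , $d_p = d'_p$. Let us now show that $d_{p+1} = d'_{p+1}$. Working towards a contradiction,
suppose that $d_{p+1} < d'_{p+1}$. For $1 \leq \ell \leq p$,
\begin{align*}
0 &= [B]_{p+1, d_\ell} && \text{Definition RREF} \\
  &= \sum_{k=1}^{m} \delta_{p+1,k} [C]_{k d_\ell} \\
  &= \sum_{k=1}^{m} \delta_{p+1,k} [C]_{k d'_\ell} \\
  &= \delta_{p+1,\ell} [C]_{\ell d'_\ell} + \sum_{\substack{k=1 \\ k \neq \ell}}^{m} \delta_{p+1,k} [C]_{k d'_\ell} && \text{Property CACN} \\
  &= \delta_{p+1,\ell} (1) + \sum_{\substack{k=1 \\ k \neq \ell}}^{m} \delta_{p+1,k} (0) && \text{Definition RREF} \\
  &= \delta_{p+1,\ell}
\end{align*}
Now,
\begin{align*}
1 &= [B]_{p+1, d_{p+1}} && \text{Definition RREF} \\
  &= \sum_{k=1}^{m} \delta_{p+1,k} [C]_{k d_{p+1}} \\
  &= \sum_{k=1}^{p} \delta_{p+1,k} [C]_{k d_{p+1}} + \sum_{k=p+1}^{m} \delta_{p+1,k} [C]_{k d_{p+1}} && \text{Property AACN} \\
  &= \sum_{k=1}^{p} (0) [C]_{k d_{p+1}} + \sum_{k=p+1}^{m} \delta_{p+1,k} [C]_{k d_{p+1}} \\
  &= \sum_{k=p+1}^{m} \delta_{p+1,k} [C]_{k d_{p+1}} \\
  &= \sum_{k=p+1}^{m} \delta_{p+1,k} (0) && d_{p+1} < d'_{p+1} \\
  &= 0
\end{align*}
This contradiction shows that $d_{p+1} \geq d'_{p+1}$. By an entirely similar argument, we
could conclude that $d_{p+1} \leq d'_{p+1}$, and therefore $d_{p+1} = d'_{p+1}$.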
    Third Step. Now we establish that $r = r'$. Suppose that $r' < r$. By the arguments
above, we know that $d_1 = d'_1$, $d_2 = d'_2$, $d_3 = d'_3$, . . . , $d_{r'} = d'_{r'}$. For $1 \leq \ell \leq r' < r$,
\begin{align*}
0 &= [B]_{r d_\ell} && \text{Definition RREF} \\
  &= \sum_{k=1}^{m} \delta_{rk} [C]_{k d_\ell} \\
  &= \sum_{k=1}^{r'} \delta_{rk} [C]_{k d_\ell} + \sum_{k=r'+1}^{m} \delta_{rk} [C]_{k d_\ell} && \text{Property AACN} \\
  &= \sum_{k=1}^{r'} \delta_{rk} [C]_{k d_\ell} + \sum_{k=r'+1}^{m} \delta_{rk} (0) && \text{Property AACN} \\
  &= \sum_{k=1}^{r'} \delta_{rk} [C]_{k d_\ell} \\
  &= \sum_{k=1}^{r'} \delta_{rk} [C]_{k d'_\ell} \\
  &= \delta_{r\ell} [C]_{\ell d'_\ell} + \sum_{\substack{k=1 \\ k \neq \ell}}^{r'} \delta_{rk} [C]_{k d'_\ell} && \text{Property CACN} \\
  &= \delta_{r\ell} (1) + \sum_{\substack{k=1 \\ k \neq \ell}}^{r'} \delta_{rk} (0) && \text{Definition RREF} \\
  &= \delta_{r\ell}
\end{align*}
Now examine the entries of row r of B,
\begin{align*}
[B]_{rj} &= \sum_{k=1}^{m} \delta_{rk} [C]_{kj} \\
  &= \sum_{k=1}^{r'} \delta_{rk} [C]_{kj} + \sum_{k=r'+1}^{m} \delta_{rk} [C]_{kj} && \text{Property CACN} \\
  &= \sum_{k=1}^{r'} \delta_{rk} [C]_{kj} + \sum_{k=r'+1}^{m} \delta_{rk} (0) && \text{Definition RREF} \\
  &= \sum_{k=1}^{r'} \delta_{rk} [C]_{kj} \\
  &= \sum_{k=1}^{r'} (0) [C]_{kj} \\
  &= 0
\end{align*}
So row r is a totally zero row, contradicting that this should be the bottommost
nonzero row of B. So $r' \geq r$. By an entirely similar argument, reversing the roles of
B and C, we would conclude that $r' \leq r$ and therefore $r = r'$. Thus, combining the
first three steps we can say that $D = D'$. In other words, B and C have the same
pivot columns, in the same locations.
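    Fourth Step. In this final step, we will not argue by contradiction. Our intent
is to determine the values of the $\delta_{ij}$. Notice that we can use the values of the $d_i$
interchangeably for B and C. Here we go,
\begin{align*}
1 &= [B]_{i d_i} && \text{Definition RREF} \\
  &= \sum_{k=1}^{m} \delta_{ik} [C]_{k d_i} \\
  &= \delta_{ii} [C]_{i d_i} + \sum_{\substack{k=1 \\ k \neq i}}^{m} \delta_{ik} [C]_{k d_i} && \text{Property CACN} \\
  &= \delta_{ii} (1) + \sum_{\substack{k=1 \\ k \neq i}}^{m} \delta_{ik} (0) && \text{Definition RREF} \\
  &= \delta_{ii}
\end{align*}
and for $\ell \neq i$
\begin{align*}
0 &= [B]_{i d_\ell} && \text{Definition RREF} \\
  &= \sum_{k=1}^{m} \delta_{ik} [C]_{k d_\ell} \\
  &= \delta_{i\ell} [C]_{\ell d_\ell} + \sum_{\substack{k=1 \\ k \neq \ell}}^{m} \delta_{ik} [C]_{k d_\ell} && \text{Property CACN} \\
  &= \delta_{i\ell} (1) + \sum_{\substack{k=1 \\ k \neq \ell}}^{m} \delta_{ik} (0) && \text{Definition RREF} \\
  &= \delta_{i\ell}
\end{align*}
Finally, having determined the values of the $\delta_{ij}$, we can show that B = C. For
1 ≤ i ≤ m, 1 ≤ j ≤ n,
\begin{align*}
[B]_{ij} &= \sum_{k=1}^{m} \delta_{ik} [C]_{kj} \\
  &= \delta_{ii} [C]_{ij} + \sum_{\substack{k=1 \\ k \neq i}}^{m} \delta_{ik} [C]_{kj} && \text{Property CACN} \\
  &= (1) [C]_{ij} + \sum_{\substack{k=1 \\ k \neq i}}^{m} (0) [C]_{kj} \\
  &= [C]_{ij}
\end{align*}
So B and C have equal values in every entry, and so are the same matrix.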

    We will now run through some examples of using these definitions and theorems
to solve some systems of equations. From now on, when we have a matrix in reduced
row-echelon form, we will mark the leading 1’s with a small box. This will help you
count, and identify, the pivot columns. In your work, you can box ’em, circle ’em or
write ’em in a different color — just identify ’em somehow. This device will prove
very useful later and is a very good habit to start developing right now.
Example SAB Solutions for Archetype B
Let us find the solutions to the following system of equations,
                                       −7x1 − 6x2 − 12x3 = −33
                                     5x1 + 5x2 + 7x3 = 24
                                              x1 + 4x3 = 5

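     First, form the augmented matrix,
\[
\begin{bmatrix} -7 & -6 & -12 & -33 \\ 5 & 5 & 7 & 24 \\ 1 & 0 & 4 & 5 \end{bmatrix}
\]
and work to reduced row-echelon form, first with j = 1,
\begin{align*}
&\xrightarrow{R_1 \leftrightarrow R_3}
\begin{bmatrix} 1 & 0 & 4 & 5 \\ 5 & 5 & 7 & 24 \\ -7 & -6 & -12 & -33 \end{bmatrix}
\xrightarrow{-5R_1 + R_2}
\begin{bmatrix} 1 & 0 & 4 & 5 \\ 0 & 5 & -13 & -1 \\ -7 & -6 & -12 & -33 \end{bmatrix} \\
&\xrightarrow{7R_1 + R_3}
\begin{bmatrix} 1 & 0 & 4 & 5 \\ 0 & 5 & -13 & -1 \\ 0 & -6 & 16 & 2 \end{bmatrix}
\end{align*}
Now, with j = 2,
\begin{align*}
&\xrightarrow{\frac{1}{5}R_2}
\begin{bmatrix} 1 & 0 & 4 & 5 \\ 0 & 1 & -\frac{13}{5} & -\frac{1}{5} \\ 0 & -6 & 16 & 2 \end{bmatrix}
\xrightarrow{6R_2 + R_3}
\begin{bmatrix} 1 & 0 & 4 & 5 \\ 0 & 1 & -\frac{13}{5} & -\frac{1}{5} \\ 0 & 0 & \frac{2}{5} & \frac{4}{5} \end{bmatrix}
\end{align*}
And finally, with j = 3,
\begin{align*}
&\xrightarrow{\frac{5}{2}R_3}
\begin{bmatrix} 1 & 0 & 4 & 5 \\ 0 & 1 & -\frac{13}{5} & -\frac{1}{5} \\ 0 & 0 & 1 & 2 \end{bmatrix}
\xrightarrow{\frac{13}{5}R_3 + R_2}
\begin{bmatrix} 1 & 0 & 4 & 5 \\ 0 & 1 & 0 & 5 \\ 0 & 0 & 1 & 2 \end{bmatrix} \\
&\xrightarrow{-4R_3 + R_1}
\begin{bmatrix} \boxed{1} & 0 & 0 & -3 \\ 0 & \boxed{1} & 0 & 5 \\ 0 & 0 & \boxed{1} & 2 \end{bmatrix}
\end{align*}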

    This is now the augmented matrix of a very simple system of equations, namely
x1 = −3, x2 = 5, x3 = 2, which has an obvious solution. Furthermore, we can
see that this is the only solution to this system, so we have determined the entire
solution set,
\[
S = \left\{ \begin{bmatrix} -3 \\ 5 \\ 2 \end{bmatrix} \right\}
\]

     You might compare this example with the procedure we used in Example US.

    Archetypes A and B are meant to contrast each other in many respects. So let
us solve Archetype A now.

Example SAA Solutions for Archetype A
Let us find the solutions to the following system of equations,
                                       x1 − x2 + 2x3 = 1
                                       2x1 + x2 + x3 = 8
                                              x1 + x2 = 5

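     First, form the augmented matrix,
\[
\begin{bmatrix} 1 & -1 & 2 & 1 \\ 2 & 1 & 1 & 8 \\ 1 & 1 & 0 & 5 \end{bmatrix}
\]
and work to reduced row-echelon form, first with j = 1,
\begin{align*}
&\xrightarrow{-2R_1 + R_2}
\begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 3 & -3 & 6 \\ 1 & 1 & 0 & 5 \end{bmatrix}
\xrightarrow{-1R_1 + R_3}
\begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 3 & -3 & 6 \\ 0 & 2 & -2 & 4 \end{bmatrix}
\end{align*}
Now, with j = 2,
\begin{align*}
&\xrightarrow{\frac{1}{3}R_2}
\begin{bmatrix} 1 & -1 & 2 & 1 \\ 0 & 1 & -1 & 2 \\ 0 & 2 & -2 & 4 \end{bmatrix}
\xrightarrow{1R_2 + R_1}
\begin{bmatrix} 1 & 0 & 1 & 3 \\ 0 & 1 & -1 & 2 \\ 0 & 2 & -2 & 4 \end{bmatrix} \\
&\xrightarrow{-2R_2 + R_3}
\begin{bmatrix} \boxed{1} & 0 & 1 & 3 \\ 0 & \boxed{1} & -1 & 2 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\end{align*}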

   The system of equations represented by this augmented matrix needs to be
considered a bit differently than that for Archetype B. First, the last row of the
matrix is the equation 0 = 0, which is always true, so it imposes no restrictions on
our possible solutions and therefore we can safely ignore it as we analyze the other
two equations. These equations are,
                                       x1 + x3 = 3
                                       x2 − x3 = 2.

    While this system is fairly easy to solve, it also appears to have a multitude of
solutions. For example, choose x3 = 1 and see that then x1 = 2 and x2 = 3 will
together form a solution. Or choose x3 = 0, and then discover that x1 = 3 and
x2 = 2 lead to a solution. Try it yourself: pick any value of x3 you please, and figure
out what x1 and x2 should be to make the first and second equations (respectively)
true. We’ll wait while you do that. Because of this behavior, we say that x3 is a “free”
or “independent” variable. But why do we vary x3 and not some other variable? For
now, notice that the third column of the augmented matrix is not a pivot column.
With this idea, we can rearrange the two equations, solving each for the variable
whose index is the same as the column index of a pivot column.
                                        x1 = 3 − x3
                                        x2 = 2 + x3

   To write the set of solution vectors in set notation, we have
\[
S = \left\{ \begin{bmatrix} 3 - x_3 \\ 2 + x_3 \\ x_3 \end{bmatrix} \,\middle|\, x_3 \in \mathbb{C} \right\}
\]

    We will learn more in the next section about systems with infinitely many
solutions and how to express their solution sets. Right now, you might look back at
Example IS.
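
The "pick any value of x3" experiment from Example SAA can be carried out mechanically. What follows is a minimal sketch in Python, not part of the original example. It assumes exact rational arithmetic from the standard `fractions` module (the text allows any complex value for x3; rationals are used here only so that equality tests are exact). The names `ROWS`, `solution_for` and `satisfies` are our own illustrative choices. The rows are those of the matrix obtained just before the final row operation, which is row-equivalent to the augmented matrix of Archetype A.

```python
from fractions import Fraction

# Matrix from the text just before the row operation -2R2 + R3.
# Each row [a1, a2, a3, b] encodes a1*x1 + a2*x2 + a3*x3 = b.
ROWS = [
    [1, -1, 2, 1],
    [0, 1, -1, 2],
    [0, 2, -2, 4],
]

def solution_for(x3):
    """Build (x1, x2, x3) from a chosen x3 using x1 = 3 - x3, x2 = 2 + x3."""
    x3 = Fraction(x3)
    return (3 - x3, 2 + x3, x3)

def satisfies(rows, x):
    """Return True when x makes every equation encoded by the rows true."""
    return all(
        sum(a * v for a, v in zip(row[:-1], x)) == row[-1]
        for row in rows
    )

# Try a handful of values for the free variable x3.
for choice in [0, 1, -4, Fraction(7, 2)]:
    x = solution_for(choice)
    print(choice, x, satisfies(ROWS, x))
```

Every choice of x3 passes the check, which is the behavior the example describes: x3 is free, while x1 and x2 are determined by it.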

Example SAE Solutions for Archetype E
Let us find the solutions to the following system of equations,
                                2x1 + x2 + 7x3 − 7x4 = 2
                              −3x1 + 4x2 − 5x3 − 6x4 = 3
                                    x1 + x2 + 4x3 − 5x4 = 2

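   First, form the augmented matrix,
\[
\begin{bmatrix} 2 & 1 & 7 & -7 & 2 \\ -3 & 4 & -5 & -6 & 3 \\ 1 & 1 & 4 & -5 & 2 \end{bmatrix}
\]
and work to reduced row-echelon form, first with j = 1,
\begin{align*}
&\xrightarrow{R_1 \leftrightarrow R_3}
\begin{bmatrix} 1 & 1 & 4 & -5 & 2 \\ -3 & 4 & -5 & -6 & 3 \\ 2 & 1 & 7 & -7 & 2 \end{bmatrix}
\xrightarrow{3R_1 + R_2}
\begin{bmatrix} 1 & 1 & 4 & -5 & 2 \\ 0 & 7 & 7 & -21 & 9 \\ 2 & 1 & 7 & -7 & 2 \end{bmatrix} \\
&\xrightarrow{-2R_1 + R_3}
\begin{bmatrix} 1 & 1 & 4 & -5 & 2 \\ 0 & 7 & 7 & -21 & 9 \\ 0 & -1 & -1 & 3 & -2 \end{bmatrix}
\end{align*}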

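Now, with j = 2,
\begin{align*}
&\xrightarrow{R_2 \leftrightarrow R_3}
\begin{bmatrix} 1 & 1 & 4 & -5 & 2 \\ 0 & -1 & -1 & 3 & -2 \\ 0 & 7 & 7 & -21 & 9 \end{bmatrix}
\xrightarrow{-1R_2}
\begin{bmatrix} 1 & 1 & 4 & -5 & 2 \\ 0 & 1 & 1 & -3 & 2 \\ 0 & 7 & 7 & -21 & 9 \end{bmatrix} \\
&\xrightarrow{-1R_2 + R_1}
\begin{bmatrix} 1 & 0 & 3 & -2 & 0 \\ 0 & 1 & 1 & -3 & 2 \\ 0 & 7 & 7 & -21 & 9 \end{bmatrix}
\xrightarrow{-7R_2 + R_3}
\begin{bmatrix} 1 & 0 & 3 & -2 & 0 \\ 0 & 1 & 1 & -3 & 2 \\ 0 & 0 & 0 & 0 & -5 \end{bmatrix}
\end{align*}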
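And finally, with j = 5 (columns 3 and 4 have only zeros below row 2, so the next pivot is in column 5),
\begin{align*}
&\xrightarrow{-\frac{1}{5}R_3}
\begin{bmatrix} 1 & 0 & 3 & -2 & 0 \\ 0 & 1 & 1 & -3 & 2 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\xrightarrow{-2R_3 + R_2}
\begin{bmatrix} 1 & 0 & 3 & -2 & 0 \\ 0 & 1 & 1 & -3 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\end{align*}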
    Let us analyze the equations in the system represented by this augmented matrix.
The third equation will read 0 = 1. This is patently false, all the time. No choice
of values for our variables will ever make it true. We are done. Since we cannot
even make the last equation true, we have no hope of making all of the equations
simultaneously true. So this system has no solutions, and its solution set is the empty
set, ∅ = { } (Definition ES).
    Notice that we could have reached this conclusion sooner. After performing the
row operation −7R2 + R3 , we can see that the third equation reads 0 = −5, a false
statement. Since the system represented by this matrix has no solutions, none of the
row-equivalent systems in this example, including the original, has any solutions.
However, for this example, we have chosen to
bring the matrix all the way to reduced row-echelon form as practice.
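
The hand computation in Example SAE can be double-checked by machine. The following is a minimal sketch in Python, using only the standard `fractions` module for exact arithmetic. The function name `rref` and its return convention (matrix plus 1-based pivot column indices) are our own choices. The sketch picks its row operations in a different order than the example does, so the intermediate matrices differ, but Theorem RREFU says the final reduced row-echelon form must agree.

```python
from fractions import Fraction

def rref(rows):
    """Return (B, pivots): a reduced row-echelon form row-equivalent to
    `rows`, and the list of its pivot column indices (1-based)."""
    B = [[Fraction(x) for x in row] for row in rows]
    m, n = len(B), len(B[0])
    r = 0                 # number of pivot rows placed so far
    pivots = []
    for j in range(n):    # j is a 0-based column index here
        # Find a row at or below row r with a nonzero entry in column j.
        i = next((i for i in range(r, m) if B[i][j] != 0), None)
        if i is None:
            continue                      # no pivot in this column
        B[r], B[i] = B[i], B[r]           # swap rows
        lead = B[r][j]
        B[r] = [x / lead for x in B[r]]   # scale to create a leading 1
        for k in range(m):                # clear the rest of column j
            if k != r and B[k][j] != 0:
                c = B[k][j]
                B[k] = [a - c * p for a, p in zip(B[k], B[r])]
        r += 1
        pivots.append(j + 1)
    return B, pivots

# Augmented matrix of the system in Example SAE (Archetype E).
E = [
    [2, 1, 7, -7, 2],
    [-3, 4, -5, -6, 3],
    [1, 1, 4, -5, 2],
]
B_E, pivots_E = rref(E)
for row in B_E:
    print([str(x) for x in row])
print("pivot columns:", pivots_E)
```

The last row of the result encodes the equation 0 = 1, and the final column appears among the pivot columns, matching the conclusion that the system has no solutions.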
    These three examples (Example SAB, Example SAA, Example SAE) illustrate
the full range of possibilities for a system of linear equations — no solutions, one
solution, or infinitely many solutions. In the next section we will examine these three
scenarios more closely.
    We (and everybody else) will often speak of “row-reducing” a matrix. This is an
informal way of saying we begin with a matrix A and then analyze the matrix B that
is row-equivalent to A and in reduced row-echelon form. So the term row-reduce is
used as a verb, but describes something a bit more complicated, since we do not really
change A. Theorem REMEF tells us that this process will always be successful and
Theorem RREFU tells us that B will be unambiguous. Typically, an investigation
of A will proceed by analyzing B and applying theorems whose hypotheses include
the row-equivalence of A and B, and usually the hypothesis that B is in reduced
row-echelon form.


Reading Questions

1. Is the matrix below in reduced row-echelon form? Why or why not?
\[
\begin{bmatrix} 1 & 5 & 0 & 6 & 8 \\ 0 & 0 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\]

2. Use row operations to convert the matrix below to reduced row-echelon form and report
   the final matrix.
\[
\begin{bmatrix} 2 & 1 & 8 \\ -1 & 1 & -1 \\ -2 & 5 & 4 \end{bmatrix}
\]

3. Find all the solutions to the system below by using an augmented matrix and row
   operations. Report your final matrix in reduced row-echelon form and the set of solutions.
                                     2x1 + 3x2 − x3 = 0
                                       x1 + 2x2 + x3 = 3
                                     x1 + 3x2 + 3x3 = 7

Exercises
C05 Each archetype below is a system of equations. Form the augmented matrix of the
system of equations, convert the matrix to reduced row-echelon form by using row
operations and then describe the solution set of the original system of equations.

Archetype A, Archetype B, Archetype C, Archetype D, Archetype E, Archetype F, Archetype
G, Archetype H, Archetype I, Archetype J


For problems C10–C19, find all solutions to the system of linear equations. Use your favorite
computing device to row-reduce the augmented matrices for the systems, and write the
solutions as a set, using correct set notation.
   C10†
                                   2x1 − 3x2 + x3 + 7x4 = 14
                                 2x1 + 8x2 − 4x3 + 5x4 = −1
                                         x1 + 3x2 − 3x3 = 4
                                −5x1 + 2x2 + 3x3 + 4x4 = −19

   C11†
                                   3x1 + 4x2 − x3 + 2x4 = 6
                                    x1 − 2x2 + 3x3 + x4 = 2
                                       10x2 − 10x3 − x4 = 1

   C12†
                                 2x1 + 4x2 + 5x3 + 7x4 = −26
                                     x1 + 2x2 + x3 − x4 = −4
                                −2x1 − 4x2 + x3 + 11x4 = −10

   C13†
                                  x1 + 2x2 + 8x3 − 7x4 = −2
                                3x1 + 2x2 + 12x3 − 5x4 = 6
                                   −x1 + x2 + x3 − 5x4 = −10

   C14†
                                  2x1 + x2 + 7x3 − 2x4 = 4
                                      3x1 − 2x2 + 11x4 = 13
                                   x1 + x2 + 5x3 − 3x4 = 1

   C15†
                                  2x1 + 3x2 − x3 − 9x4 = −16
                                          x1 + 2x2 + x3 = 0
                                −x1 + 2x2 + 3x3 + 4x4 = 8

   C16†
                                 2x1 + 3x2 + 19x3 − 4x4 = 2
                                   x1 + 2x2 + 12x3 − 3x4 = 1
                                  −x1 + 2x2 + 8x3 − 5x4 = 1

   C17†
                                              −x1 + 5x2 = −8
                                −2x1 + 5x2 + 5x3 + 2x4 = 9
                                   −3x1 − x2 + 3x3 + x4 = 3
                                   7x1 + 6x2 + 5x3 + x4 = 30

   C18†
                                   x1 + 2x2 − 4x3 − x4 = 32
                                   x1 + 3x2 − 7x3 − x5 = 33
                                  x1 + 2x3 − 2x4 + 3x5 = 22


     C19†
                                              2x1 + x2 = 6
                                          −x1 − x2 = −2
                                          3x1 + 4x2 = 4
                                          3x1 + 5x2 = 2


For problems C30–C33, row-reduce the matrix without the aid of a calculator, indicating
the row operations you are using at each step using the notation of Definition RO.
   C30†
\[
\begin{bmatrix} 2 & 1 & 5 & 10 \\ 1 & -3 & -1 & -2 \\ 4 & -2 & 6 & 12 \end{bmatrix}
\]

   C31†
\[
\begin{bmatrix} 1 & 2 & -4 \\ -3 & -1 & -3 \\ -2 & 1 & -7 \end{bmatrix}
\]

   C32†
\[
\begin{bmatrix} 1 & 1 & 1 \\ -4 & -3 & -2 \\ 3 & 2 & 1 \end{bmatrix}
\]

   C33†
\[
\begin{bmatrix} 1 & 2 & -1 & -1 \\ 2 & 4 & -1 & 4 \\ -1 & -2 & 3 & 5 \end{bmatrix}
\]


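M40† Consider the two 3 × 4 matrices below
\[
B = \begin{bmatrix} 1 & 3 & -2 & 2 \\ -1 & -2 & -1 & -1 \\ -1 & -5 & 8 & -3 \end{bmatrix}
\qquad
C = \begin{bmatrix} 1 & 2 & 1 & 2 \\ 1 & 1 & 4 & 0 \\ -1 & -1 & -4 & 1 \end{bmatrix}
\]
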
     1. Row-reduce each matrix and determine that the reduced row-echelon forms of B and
        C are identical. From this argue that B and C are row-equivalent.

     2. In the proof of Theorem RREFU, we begin by arguing that entries of row-equivalent
        matrices are related by way of certain scalars and sums. In this example, we would
        write that entries of B from row i that are in column j are linearly related to the
        entries of C in column j from all three rows
\[
[B]_{ij} = \delta_{i1}[C]_{1j} + \delta_{i2}[C]_{2j} + \delta_{i3}[C]_{3j} \qquad 1 \le j \le 4
\]
        For each 1 ≤ i ≤ 3 find the corresponding three scalars in this relationship. So your
        answer will be nine scalars, determined three at a time.

M45† You keep a number of lizards, mice and peacocks as pets. There are a total of 108
legs and 30 tails in your menagerie. You have twice as many mice as lizards. How many of
each creature do you have?
M50† A parking lot has 66 vehicles (cars, trucks, motorcycles and bicycles) in it. There
are four times as many cars as trucks. The total number of tires (4 per car or truck, 2 per
motorcycle or bicycle) is 252. How many cars are there? How many bicycles?
T10† Prove that each of the three row operations (Definition RO) is reversible. More
precisely, if the matrix B is obtained from A by application of a single row operation, show
that there is a single row operation that will transform B back into A.
T11 Suppose that A, B and C are m × n matrices. Use the definition of row-equivalence
(Definition REM) to prove the following three facts.

   1. A is row-equivalent to A.

   2. If A is row-equivalent to B, then B is row-equivalent to A.

   3. If A is row-equivalent to B, and B is row-equivalent to C, then A is row-equivalent
      to C.

    A relationship that satisfies these three properties is known as an equivalence relation,
an important idea in the study of various algebras. This is a formal way of saying that
a relationship behaves like equality, without requiring the relationship to be as strict as
equality itself. We will see it again in Theorem SER.
T12 Suppose that B is an m × n matrix in reduced row-echelon form. Build a new, likely
smaller, k × ℓ matrix C as follows. Keep any collection of k adjacent rows, k ≤ m. From
these rows, keep columns 1 through ℓ, ℓ ≤ n. Prove that C is in reduced row-echelon form.
T13 Generalize Exercise RREF.T12 by just keeping any k rows, and not requiring the
rows to be adjacent. Prove that any such matrix C is in reduced row-echelon form.
Section TSS
Types of Solution Sets
We will now be more careful about analyzing the reduced row-echelon form derived
from the augmented matrix of a system of linear equations. In particular, we will see
how to systematically handle the situation when we have infinitely many solutions
to a system, and we will prove that every system of linear equations has either zero,
one or infinitely many solutions. With these tools, we will be able to routinely solve
any linear system.


Subsection CS
Consistent Systems
The computer scientist Donald Knuth said, “Science is what we understand well
enough to explain to a computer. Art is everything else.” In this section we will
move the solving of systems of equations out of the realm of art and into the realm
of science. We begin with a definition.
Definition CS Consistent System
A system of linear equations is consistent if it has at least one solution. Otherwise,
the system is called inconsistent.
    We will want to first recognize when a system is inconsistent or consistent, and in
the case of consistent systems we will be able to further refine the types of solutions
possible. We will do this by analyzing the reduced row-echelon form of a matrix,
using the value of r, and the sets of column indices, D and F , first defined back in
Definition RREF.
    Use of the notation for the elements of D and F can be a bit confusing, since
we have subscripted variables that are in turn equal to integers used to index the
matrix. However, many questions about matrices and systems of equations can be
answered once we know r, D and F. The choice of the letters D and F refers to our
upcoming definition of dependent and free variables (Definition IDV). An example
will help us begin to get comfortable with this aspect of reduced row-echelon form.
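Example RREFN Reduced row-echelon form notation
For the 5 × 9 matrix
\[
B = \begin{bmatrix}
1 & 5 & 0 & 0 & 2 & 8 & 0 & 5 & -1 \\
0 & 0 & 1 & 0 & 4 & 7 & 0 & 2 & 0 \\
0 & 0 & 0 & 1 & 3 & 9 & 0 & 3 & -6 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 4 & 2 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]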
in reduced row-echelon form we have
          r=4
        d1 = 1            d2 = 3            d3 = 4            d4 = 7
         f1 = 2           f2 = 5            f3 = 6            f4 = 8         f5 = 9
   Notice that the sets
  D = {d1 , d2 , d3 , d4 } = {1, 3, 4, 7}    F = {f1 , f2 , f3 , f4 , f5 } = {2, 5, 6, 8, 9}
have nothing in common and together account for all of the columns of B (we say
they form a partition of the set of column indices).
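
The bookkeeping in Example RREFN is mechanical enough to automate. Here is a minimal sketch in Python that reads r, D and F off a matrix that is already in reduced row-echelon form. The function name `rref_notation` is our own, and the sketch assumes (without checking) that its input really is in reduced row-echelon form.

```python
def rref_notation(B):
    """Given a matrix already in reduced row-echelon form, return (r, D, F)
    with 1-based column indices, following the notation of Definition RREF."""
    n_cols = len(B[0])
    D = []
    for row in B:
        # Column of the leading 1: the first nonzero entry of a nonzero row.
        lead = next((j + 1 for j, x in enumerate(row) if x != 0), None)
        if lead is not None:
            D.append(lead)
    # F collects every column index that is not a pivot column.
    F = [j for j in range(1, n_cols + 1) if j not in D]
    return len(D), D, F

# The 5 x 9 matrix B of Example RREFN.
B = [
    [1, 5, 0, 0, 2, 8, 0, 5, -1],
    [0, 0, 1, 0, 4, 7, 0, 2, 0],
    [0, 0, 0, 1, 3, 9, 0, 3, -6],
    [0, 0, 0, 0, 0, 0, 1, 4, 2],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
r, D, F = rref_notation(B)
print("r =", r)
print("D =", D)
print("F =", F)
```

The printed values can be compared with those listed in the example, and the two lists visibly partition the column indices 1 through 9.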
    The number r is the single most important piece of information we can get from
the reduced row-echelon form of a matrix. It is defined as the number of nonzero
rows, but since each nonzero row has a leading 1, it is also the number of leading
1’s present. For each leading 1, we have a pivot column, so r is also the number of
pivot columns. Repeating ourselves, r is the number of nonzero rows, the number
of leading 1’s and the number of pivot columns. Across different situations, each
of these interpretations of the meaning of r will be useful, though it may be most
helpful to think in terms of pivot columns.

    Before proving some theorems about the possibilities for solution sets to systems
of equations, let us analyze one particular system with an infinite solution set very
carefully as an example. We will use this technique frequently, and shortly we will
refine it slightly.

   Archetypes I and J are both fairly large for doing computations by hand (though
not impossibly large). Their properties are very similar, so we will frequently analyze
the situation in Archetype I, and leave you the joy of analyzing Archetype J yourself.
So work through Archetype I with the text, by hand and/or with a computer, and
then tackle Archetype J yourself (and check your results with those listed). Notice
too that the archetypes describing systems of equations each list the values of r, D
and F. Here we go...

Example ISSI Describing infinite solution sets, Archetype I
Archetype I is the system of m = 4 equations in n = 7 variables.
                                        x1 + 4x2 − x4 + 7x6 − 9x7 = 3
                         2x1 + 8x2 − x3 + 3x4 + 9x5 − 13x6 + 7x7 = 9
                                   2x3 − 3x4 − 4x5 + 12x6 − 8x7 = 1
                   −x1 − 4x2 + 2x3 + 4x4 + 8x5 − 31x6 + 37x7 = 4

    This system has a 4 × 8 augmented matrix that is row-equivalent to the following
matrix (check this!), and which is in reduced row-echelon form (the existence of
this matrix is guaranteed by Theorem REMEF and its uniqueness is guaranteed by
Theorem RREFU),
\[
\begin{bmatrix}
1 & 4 & 0 & 0 & 2 & 1 & -3 & 4 \\
0 & 0 & 1 & 0 & 1 & -3 & 5 & 2 \\
0 & 0 & 0 & 1 & 2 & -6 & 6 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

     So we find that r = 3 and
      D = {d1 , d2 , d3 } = {1, 3, 4}     F = {f1 , f2 , f3 , f4 , f5 } = {2, 5, 6, 7, 8}
Let i denote any one of the r = 3 nonzero rows. Then the index d_i is a pivot
column. It will be easy in this case to use the equation represented by row i to
write an expression for the variable x_{d_i}. It will be a linear function of the variables
x_{f_1}, x_{f_2}, x_{f_3}, x_{f_4} (notice that f5 = 8 does not reference a variable, but does tell us
the final column is not a pivot column). We will now construct these three expressions.
Notice that using subscripts upon subscripts takes some getting used to.
               (i = 1)               x_{d_1} = x1 = 4 − 4x2 − 2x5 − x6 + 3x7
               (i = 2)               x_{d_2} = x3 = 2 − x5 + 3x6 − 5x7
               (i = 3)               x_{d_3} = x4 = 1 − 2x5 + 6x6 − 6x7

    Each element of the set F = {f1 , f2 , f3 , f4 , f5 } = {2, 5, 6, 7, 8} is the index
of a variable, except for f5 = 8. We refer to x_{f_1} = x2, x_{f_2} = x5, x_{f_3} = x6 and
x_{f_4} = x7 as “free” (or “independent”) variables since they are allowed to assume
any possible combination of values that we can imagine and we can continue on to
build a solution to the system by solving individual equations for the values of the
other (“dependent”) variables.

    Each element of the set D = {d1 , d2 , d3 } = {1, 3, 4} is the index of a variable.
We refer to the variables x_{d_1} = x1, x_{d_2} = x3 and x_{d_3} = x4 as “dependent” variables
since they depend on the independent variables. More precisely, for each possible
choice of values for the independent variables we get exactly one set of values for
the dependent variables that combine to form a solution of the system.

   To express the solutions as a set, we write
\[
\left\{
\begin{bmatrix}
4 - 4x_2 - 2x_5 - x_6 + 3x_7 \\
x_2 \\
2 - x_5 + 3x_6 - 5x_7 \\
1 - 2x_5 + 6x_6 - 6x_7 \\
x_5 \\
x_6 \\
x_7
\end{bmatrix}
\,\middle|\, x_2,\, x_5,\, x_6,\, x_7 \in \mathbb{C}
\right\}
\]
     The condition that x2 , x5 , x6 , x7 ∈ C is how we specify that the variables
x2 , x5 , x6 , x7 are “free” to assume any possible values.
     This systematic approach to solving a system of equations will allow us to create
a precise description of the solution set for any consistent system once we have found
the reduced row-echelon form of the augmented matrix. It will work just as well
when the set of free variables is empty and we get just a single solution. And we
could program a computer to do it! Now have a whack at Archetype J (Exercise
TSS.C10), mimicking the discussion in this example. We’ll still be here when you
get back.
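
Example ISSI remarks that a computer could be programmed to carry out this approach. As a small step in that direction, here is a minimal sketch in Python that builds a solution of Archetype I from any choice of the four free variables and then checks it against the original four equations. The names `A`, `b`, `solution` and `is_solution` are our own. Integer inputs are used so that the equality tests are exact, although the text allows any complex values.

```python
# Archetype I as stated in Example ISSI: coefficients of x1..x7 in each
# equation, and the right-hand sides.
A = [
    [1, 4, 0, -1, 0, 7, -9],
    [2, 8, -1, 3, 9, -13, 7],
    [0, 0, 2, -3, -4, 12, -8],
    [-1, -4, 2, 4, 8, -31, 37],
]
b = [3, 9, 1, 4]

def solution(x2, x5, x6, x7):
    """Fill in the dependent variables x1, x3, x4 from the free variables,
    using the three expressions derived from the nonzero rows."""
    x1 = 4 - 4 * x2 - 2 * x5 - x6 + 3 * x7
    x3 = 2 - x5 + 3 * x6 - 5 * x7
    x4 = 1 - 2 * x5 + 6 * x6 - 6 * x7
    return [x1, x2, x3, x4, x5, x6, x7]

def is_solution(x):
    """Check x against every equation of the original system."""
    return all(
        sum(a * v for a, v in zip(row, x)) == rhs
        for row, rhs in zip(A, b)
    )

# Setting every free variable to zero gives one particular solution.
print(solution(0, 0, 0, 0), is_solution(solution(0, 0, 0, 0)))
# Any other choice works equally well.
print(solution(1, -2, 3, 5), is_solution(solution(1, -2, 3, 5)))
```

Checking against the original equations, rather than the row-reduced ones, also serves as a test that the reduced row-echelon form was computed correctly.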
    Using the reduced row-echelon form of the augmented matrix of a system of
equations to determine the nature of the solution set of the system is a key
idea. So let us look at one more example like the last one. But first a definition, and
then the example. We mix our metaphors a bit when we call variables free versus
dependent. Maybe we should call dependent variables “enslaved”?
Definition IDV Independent and Dependent Variables
Suppose A is the augmented matrix of a consistent system of linear equations and
B is a row-equivalent matrix in reduced row-echelon form. Suppose j is the index
of a pivot column of B. Then the variable xj is dependent. A variable that is not
dependent is called independent or free.
   If you studied this definition carefully, you might wonder what to do if the system
has n variables and column n + 1 is a pivot column? We will see shortly, by Theorem
RCLS, that this never happens for a consistent system.
Example FDV Free and dependent variables
Consider the system of five equations in five variables,
                            x1 − x2 − 2x3 + x4 + 11x5 = 13
                                 x1 − x2 + x3 + x4 + 5x5 = 16
                                  2x1 − 2x2 + x4 + 10x5 = 21
                         2x1 − 2x2 − x3 + 3x4 + 20x5 = 38
                            2x1 − 2x2 + x3 + x4 + 8x5 = 22
whose augmented matrix row-reduces to
\[
\begin{bmatrix}
1 & -1 & 0 & 0 & 3 & 6 \\
0 & 0 & 1 & 0 & -2 & 1 \\
0 & 0 & 0 & 1 & 4 & 9 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]
    Columns 1, 3 and 4 are pivot columns, so D = {1, 3, 4}. From this we know
that the variables x1 , x3 and x4 will be dependent variables, and each of the r = 3
nonzero rows of the row-reduced matrix will yield an expression for one of these
three variables. The set F is all the remaining column indices, F = {2, 5, 6}. The
column index 6 in F means that the final column is not a pivot column, and thus
the system is consistent (Theorem RCLS). The remaining indices in F indicate free
variables, so x2 and x5 (the remaining variables) are our free variables. The resulting
three equations that describe our solution set are then,
                   (x_{d_1} = x1)                     x1 = 6 + x2 − 3x5
                   (x_{d_2} = x3)                     x3 = 1 + 2x5
                   (x_{d_3} = x4)                     x4 = 9 − 4x5
    Make sure you understand where these three equations came from, and notice
how the location of the pivot columns determined the variables on the left-hand side
of each equation. We can compactly describe the solution set as,
\[
S = \left\{
\begin{bmatrix} 6 + x_2 - 3x_5 \\ x_2 \\ 1 + 2x_5 \\ 9 - 4x_5 \\ x_5 \end{bmatrix}
\,\middle|\, x_2,\, x_5 \in \mathbb{C}
\right\}
\]
     Notice how we express the freedom for x2 and x5: x2, x5 ∈ C.
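
The step from the row-reduced matrix to the three equations of Example FDV follows a fixed recipe: in each nonzero row, keep the pivot variable on the left and move every other term to the right, flipping its sign. Here is a minimal sketch of that recipe in Python. The function name `dependent_equations` and the plain-text output format are our own choices, and the sketch assumes a consistent system with integer entries, already in reduced row-echelon form.

```python
def dependent_equations(B, n):
    """For the reduced row-echelon augmented matrix B of a consistent
    system in n variables, return one string per nonzero row expressing
    the dependent variable in terms of the free variables."""
    equations = []
    for row in B:
        # 0-based position of the leading 1 among the n variable columns.
        lead = next((j for j in range(n) if row[j] != 0), None)
        if lead is None:
            continue  # zero row: the equation 0 = 0
        terms = [str(row[n])]  # the constant from the final column
        for j in range(lead + 1, n):
            if row[j] != 0:
                coef = -row[j]  # moving a term across "=" flips its sign
                sign = "+" if coef > 0 else "-"
                size = "" if abs(coef) == 1 else str(abs(coef))
                terms.append(f"{sign} {size}x{j + 1}")
        equations.append(f"x{lead + 1} = " + " ".join(terms))
    return equations

# Row-reduced augmented matrix from Example FDV.
B = [
    [1, -1, 0, 0, 3, 6],
    [0, 0, 1, 0, -2, 1],
    [0, 0, 0, 1, 4, 9],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
for line in dependent_equations(B, 5):
    print(line)
```

Because other pivot columns hold zeros in a pivot row, the only variables that can appear on the right-hand side are free variables.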
   Sets are an important part of algebra, and we have seen a few already. Being
comfortable with sets is important for understanding and writing proofs. If you have
not already, pay a visit now to Section SET.
   We can now use the values of m, n, r, and the independent and dependent
variables to categorize the solution sets for linear systems through a sequence of
theorems.
   Through the following sequence of proofs, you will want to consult three proof
techniques. See Proof Technique E, Proof Technique N, Proof Technique CP.
   First we have an important theorem that explores the distinction between
consistent and inconsistent linear systems.
Theorem RCLS Recognizing Consistency of a Linear System
Suppose A is the augmented matrix of a system of linear equations with n variables.
Suppose also that B is a row-equivalent matrix in reduced row-echelon form with r
nonzero rows. Then the system of equations is inconsistent if and only if column
n + 1 of B is a pivot column.

Proof. (⇐) The first half of the proof begins with the assumption that column n + 1
of B is a pivot column. Then the leading 1 of row r is located in column n + 1 of
B and so row r of B begins with n consecutive zeros, finishing with the leading 1.
This is a representation of the equation 0 = 1, which is false. Since this equation
is false for any collection of values we might choose for the variables, there are no
solutions for the system of equations, and the system is inconsistent.
    (⇒) For the second half of the proof, we wish to show that if we assume the system
is inconsistent, then column n + 1 of B is a pivot column. But instead of proving this
directly, we will form the logically equivalent statement that is the contrapositive,
and prove that instead (see Proof Technique CP). Turning the implication around,
and negating each portion, we arrive at the logically equivalent statement: if column
n + 1 of B is not a pivot column, then the system of equations is consistent.
    If column n + 1 of B is not a pivot column, the leading 1 for row r is located
somewhere in columns 1 through n. Then every preceding row’s leading 1 is also
located in columns 1 through n. In other words, since the last leading 1 is not in
the last column, no leading 1 for any row is in the last column, due to the echelon
layout of the leading 1’s (Definition RREF). We will now construct a solution to the
system by setting each dependent variable to the entry of the final column in the
row with the corresponding leading 1, and setting each free variable to zero. That
sentence is pretty vague, so let us be more precise. Using our notation for the sets
D and F from the reduced row-echelon form (Definition RREF):
\[
x_{d_i} = [B]_{i,n+1}, \quad 1 \le i \le r \qquad\qquad x_{f_i} = 0, \quad 1 \le i \le n - r
\]
    These values for the variables make the equations represented by the first r rows
of B all true (convince yourself of this). Rows numbered greater than r (if any) are
all zero rows, hence represent the equation 0 = 0 and are also all true. We have now
identified one solution to the system represented by B, and hence a solution to the
system represented by A (Theorem REMES). So we can say the system is consistent
(Definition CS).

    The beauty of this theorem being an equivalence is that we can unequivocally
test to see if a system is consistent or inconsistent by looking at just a single entry
of the reduced row-echelon form matrix. We could program a computer to do it!
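
Taking that remark at face value, here is a minimal sketch in Python of the test in Theorem RCLS, together with the particular solution constructed in its proof (free variables set to zero, each dependent variable set to the final-column entry of its row). The function names are our own, and the inputs are assumed to be augmented matrices already in reduced row-echelon form. The three matrices are the row-reduced forms obtained in Example SAA, Example SAE and Example ISSI.

```python
def pivot_columns(B):
    """1-based pivot column indices of a matrix in reduced row-echelon form."""
    return [
        next(j + 1 for j, x in enumerate(row) if x != 0)
        for row in B
        if any(x != 0 for x in row)
    ]

def is_consistent(B):
    """Theorem RCLS: the system is inconsistent exactly when the final
    column (column n + 1) is a pivot column."""
    return len(B[0]) not in pivot_columns(B)

def particular_solution(B):
    """The solution built in the proof of Theorem RCLS, or None when the
    system is inconsistent."""
    if not is_consistent(B):
        return None
    n = len(B[0]) - 1
    x = [0] * n                        # free variables stay at zero
    for i, d in enumerate(pivot_columns(B)):
        x[d - 1] = B[i][n]             # x_{d_i} = [B]_{i,n+1}
    return x

A_RREF = [[1, 0, 1, 3], [0, 1, -1, 2], [0, 0, 0, 0]]
E_RREF = [[1, 0, 3, -2, 0], [0, 1, 1, -3, 0], [0, 0, 0, 0, 1]]
I_RREF = [
    [1, 4, 0, 0, 2, 1, -3, 4],
    [0, 0, 1, 0, 1, -3, 5, 2],
    [0, 0, 0, 1, 2, -6, 6, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
for name, B in [("A", A_RREF), ("E", E_RREF), ("I", I_RREF)]:
    print(name, is_consistent(B), particular_solution(B))
```

The sketch relies on the fact that in reduced row-echelon form the nonzero rows come first, so the i-th pivot column belongs to row i.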
    Notice that for a consistent system the row-reduced augmented matrix has
n + 1 ∈ F , so the largest element of F does not refer to a variable. Also, for an
inconsistent system, n + 1 ∈ D, and it then does not make much sense to discuss
whether or not variables are free or dependent since there is no solution. Take a look
back at Definition IDV and see why we did not need to consider the possibility of
referencing x_{n+1} as a dependent variable.
    With the characterization of Theorem RCLS, we can explore the relationships
between r and n for a consistent system. We can distinguish between the case of a
unique solution and infinitely many solutions, and furthermore, we recognize that
these are the only two possibilities.
Theorem CSRN Consistent Systems, r and n
Suppose A is the augmented matrix of a consistent system of linear equations with
n variables. Suppose also that B is a row-equivalent matrix in reduced row-echelon
form with r pivot columns. Then r ≤ n. If r = n, then the system has a unique
solution, and if r < n, then the system has infinitely many solutions.

Proof. This theorem contains three implications that we must establish. Notice first
that B has n + 1 columns, so there can be at most n + 1 pivot columns, i.e. r ≤ n + 1.
If r = n + 1, then every column of B is a pivot column, and in particular, the last
column is a pivot column. So Theorem RCLS tells us that the system is inconsistent,
contrary to our hypothesis. We are left with r ≤ n.
    When r = n, we find n − r = 0 free variables (i.e. F = {n + 1}) and the only
solution is given by setting the n variables to the first n entries of column n + 1
of B.
    When r < n, we have n − r > 0 free variables. Choose one free variable and set
all the other free variables to zero. Now, set the chosen free variable to any fixed
value. It is possible to then determine the values of the dependent variables to create
a solution to the system. By setting the chosen free variable to different values, in
this manner we can create infinitely many solutions.


Subsection FV
Free Variables
The next theorem simply states a conclusion from the final paragraph of the previous
proof, allowing us to state explicitly the number of free variables for a consistent
system.
Theorem FVCS Free Variables for Consistent Systems
Suppose A is the augmented matrix of a consistent system of linear equations with
n variables. Suppose also that B is a row-equivalent matrix in reduced row-echelon
form with r rows that are not completely zeros. Then the solution set can be described
with n − r free variables.

Proof. See the proof of Theorem CSRN. ■

Example CFV Counting free variables
For each archetype that is a system of equations, the values of n and r are listed.
Many also contain a few sample solutions. We can use this information profitably,
as illustrated by four examples.

   1. Archetype A has n = 3 and r = 2. It can be seen to be consistent by the
      sample solutions given. Its solution set then has n − r = 1 free variables, and
      therefore will be infinite.

   2. Archetype B has n = 3 and r = 3. It can be seen to be consistent by the single
      sample solution given. Its solution set can then be described with n − r = 0
      free variables, and therefore will have just the single solution.

   3. Archetype H has n = 2 and r = 3. In this case, column 3 must be a pivot
      column, so by Theorem RCLS, the system is inconsistent. We should not try
      to apply Theorem FVCS to count free variables, since the theorem only applies
      to consistent systems. (What would happen if you did try to incorrectly apply
      Theorem FVCS?)

   4. Archetype E has n = 4 and r = 3. However, by looking at the reduced
      row-echelon form of the augmented matrix, we find that column 5 is a pivot
      column. By Theorem RCLS we recognize the system as inconsistent. △

    We have accomplished a lot so far, but our main goal has been the following
theorem, which is now very simple to prove. The proof is so simple that we ought to
call it a corollary, but the result is important enough that it deserves to be called a
theorem. (See Proof Technique LC.) Notice that this theorem was presaged first by
Example TTS and further foreshadowed by other examples.
Theorem PSSLS Possible Solution Sets for Linear Systems
A system of linear equations has no solutions, a unique solution or infinitely many
solutions.
Proof. By its definition, a system is either inconsistent or consistent (Definition CS).
The first case describes systems with no solutions. For consistent systems, we have
the remaining two possibilities as guaranteed by, and described in, Theorem CSRN.

   Here is a diagram that consolidates several of our theorems from this section, and
which is of practical use when you analyze systems of equations. Note this presumes
we have the reduced row-echelon form of the augmented matrix of the system to
analyze.

                              Theorem RCLS
                             /            \
        no pivot column in                  pivot column in
        column n + 1                        column n + 1
                |                                  |
           Consistent                        Inconsistent
                |
          Theorem CSRN
           /         \
        r < n       r = n
          |           |
  Infinite Solutions  Unique Solution

             Diagram DTSLS: Decision Tree for Solving Linear Systems

    We have one more theorem to round out our set of tools for determining solution
sets to systems of linear equations.
Theorem CMVEI Consistent, More Variables than Equations, Infinite solutions
Suppose a consistent system of linear equations has m equations in n variables. If
n > m, then the system has infinitely many solutions.
Proof. Suppose that the augmented matrix of the system of equations is row-
equivalent to B, a matrix in reduced row-echelon form with r nonzero rows. Because
B has m rows in total, the number of nonzero rows is less than or equal to m. In
other words, r ≤ m. Follow this with the hypothesis that n > m and we find that
the system has a solution set described by at least one free variable because
                                   n − r ≥ n − m > 0.
    A consistent system with free variables will have an infinite number of solutions,
as given by Theorem CSRN. ■


   Notice that to use this theorem we need only know that the system is consistent,
together with the values of m and n. We do not necessarily have to compute a
row-equivalent reduced row-echelon form matrix, even though we discussed such a
matrix in the proof. This is the substance of the following example.
Example OSGMD One solution gives many, Archetype D
Archetype D is the system of m = 3 equations in n = 4 variables,
                               2x1 + x2 + 7x3 − 7x4 = 8
                            −3x1 + 4x2 − 5x3 − 6x4 = −12
                                x1 + x2 + 4x3 − 5x4 = 4
   and the solution x1 = 0, x2 = 1, x3 = 2, x4 = 1 can be checked easily by
substitution. Having been handed this solution, we know the system is consistent.
This, together with n > m, allows us to apply Theorem CMVEI and conclude that
the system has infinitely many solutions. △
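
The check by substitution, followed by the appeal to Theorem CMVEI, can be mirrored in a few lines of Python. This is only an illustrative sketch: the helper name `satisfies` and the variable names are our own choices, not notation from the text.

```python
# Archetype D, copied from the example: coefficients, constants, and the
# solution we were handed.
A = [[2, 1, 7, -7],
     [-3, 4, -5, -6],
     [1, 1, 4, -5]]
b = [8, -12, 4]
x = [0, 1, 2, 1]

def satisfies(A, b, x):
    """True when x makes every equation of the system true."""
    return all(sum(a * xi for a, xi in zip(row, x)) == bi
               for row, bi in zip(A, b))

m, n = len(A), len(A[0])      # m equations, n variables
consistent = satisfies(A, b, x)

# Theorem CMVEI: consistent and n > m implies infinitely many solutions.
infinitely_many = consistent and n > m
print(consistent, infinitely_many)
```

Note that no row operations are performed anywhere in this check, which is the point of the example.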
    These theorems give us the procedures and implications that allow us to com-
pletely solve any system of linear equations. The main computational tool is using
row operations to convert an augmented matrix into reduced row-echelon form. Here
is a broad outline of how we would instruct a computer to solve a system of linear
equations.

  1. Represent a system of linear equations in n variables by an augmented matrix
     (an array is the appropriate data structure in most computer languages).

  2. Convert the matrix to a row-equivalent matrix in reduced row-echelon form
     using the procedure from the proof of Theorem REMEF. Identify the location
     of the pivot columns, and their number r.

  3. If column n + 1 is a pivot column, output the statement that the system is
     inconsistent and halt.

  4. If column n + 1 is not a pivot column, there are two possibilities:

         (a) r = n and the solution is unique. It can be read off directly from the
             entries in rows 1 through n of column n + 1.
         (b) r < n and there are infinitely many solutions. If only a single solution
             is needed, set all the free variables to zero and read off the dependent
             variable values from column n + 1, as in the second half of the proof of
             Theorem RCLS. If the entire solution set is required, figure out some nice
             compact way to describe it, since your finite computer is not big enough
             to hold all the solutions (we will have such a way soon).
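
As one possible rendering of this outline, here is a minimal Python sketch. Several things in it are our own assumptions rather than anything prescribed by the text: the function names `rref` and `analyze`, the use of exact rational arithmetic from the standard `fractions` module, and 0-based indexing (so "column n + 1" is index n). The elimination loop is an ordinary Gauss-Jordan pass in the spirit of, but not copied from, the procedure in the proof of Theorem REMEF.

```python
from fractions import Fraction

def rref(matrix):
    """Return (B, pivot_cols): a reduced row-echelon form of `matrix`
    and the 0-based indices of its pivot columns."""
    B = [[Fraction(x) for x in row] for row in matrix]
    m = len(B)
    ncols = len(B[0]) if m else 0
    pivot_cols = []
    r = 0  # number of pivot rows found so far
    for j in range(ncols):
        # Find a row at or below row r with a nonzero entry in column j.
        i = next((i for i in range(r, m) if B[i][j] != 0), None)
        if i is None:
            continue
        B[r], B[i] = B[i], B[r]              # row swap
        lead = B[r][j]
        B[r] = [x / lead for x in B[r]]      # scale to get a leading 1
        for k in range(m):                   # clear the rest of column j
            if k != r and B[k][j] != 0:
                factor = B[k][j]
                B[k] = [a - factor * c for a, c in zip(B[k], B[r])]
        pivot_cols.append(j)
        r += 1
        if r == m:
            break
    return B, pivot_cols

def analyze(augmented):
    """Classify a system from its augmented matrix (steps 2 to 4 above).
    Returns (kind, solution); solution is None for inconsistent systems."""
    n = len(augmented[0]) - 1
    B, pivots = rref(augmented)
    r = len(pivots)
    if n in pivots:                 # column n + 1 is a pivot column
        return "inconsistent", None
    # Free variables are set to zero; dependent variables are read off
    # from the final column.
    solution = [Fraction(0)] * n
    for row, col in enumerate(pivots):
        solution[col] = B[row][n]
    return ("unique" if r == n else "infinite"), solution

# Archetype D: classified as "infinite", with sample solution (4, 0, 0, 0).
kind, solution = analyze([[2, 1, 7, -7, 8],
                          [-3, 4, -5, -6, -12],
                          [1, 1, 4, -5, 4]])
print(kind, [int(v) for v in solution])
```

Exact fractions are used so that entries which should be zero really are zero; a floating-point version would need a tolerance when testing for nonzero entries.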

    The above makes it all sound a bit simpler than it really is. In practice, row
operations employ division (usually to get a leading entry of a row to convert to
a leading 1) and that will introduce round-off errors. Entries that should be zero
sometimes end up being very, very small nonzero entries, or small entries lead to
overflow errors when used as divisors. A variety of strategies can be employed to
minimize these sorts of errors, and this is one of the main topics in the important
subject known as numerical linear algebra.
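
A tiny Python experiment illustrates the kind of round-off described above. It is only a sketch: the tolerance `1e-12` and the helper name `is_zero` are arbitrary illustrative choices, not recommendations from the text.

```python
from fractions import Fraction

# In floating point, a quantity that "should" be zero often is not.
float_entry = 0.1 + 0.2 - 0.3

# The same computation in exact rational arithmetic is exactly zero.
exact_entry = Fraction(1, 10) + Fraction(2, 10) - Fraction(3, 10)

def is_zero(x, tol=1e-12):
    """Treat anything smaller than tol in absolute value as zero."""
    return abs(x) < tol

print(float_entry == 0)      # False: a tiny nonzero residue remains
print(exact_entry == 0)      # True
print(is_zero(float_entry))  # True once a tolerance is applied
```

Comparing against a tolerance is one simple strategy; exact arithmetic is another, at the cost of speed and memory for large systems.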
    In this section we have gained a foolproof procedure for solving any system of
linear equations, no matter how many equations or variables. We also have a handful
of theorems that allow us to determine partial information about a solution set
without actually constructing the whole set itself. Donald Knuth would be proud.

Reading Questions

1. How can we easily recognize when a system of linear equations is inconsistent or not?
2. Suppose we have converted the augmented matrix of a system of equations into reduced
   row-echelon form. How do we then identify the dependent and independent (free)
   variables?
3. What are the possible solution sets for a system of linear equations?


Exercises
C10    In the spirit of Example ISSI, describe the infinite solution set for Archetype J.

For Exercises C21–C28, find the solution set of the system of linear equations. Give the
values of n and r, and interpret your answers in light of the theorems of this section.
   C21†
                                   x1 + 4x2 + 3x3 − x4 = 5
                                     x1 − x2 + x3 + 2x4 = 6
                                  4x1 + x2 + 6x3 + 5x4 = 9

     C22†
                                     x1 − 2x2 + x3 − x4 = 3
                                   2x1 − 4x2 + x3 + x4 = 2
                                  x1 − 2x2 − 2x3 + 3x4 = 1

     C23†
                                   x1 − 2x2 + x3 − x4 = 3
                                     x1 + x2 + x3 − x4 = 1
                                        x1    + x3 − x4 = 2

     C24†
                                   x1 − 2x2 + x3 − x4 = 2
                                     x1 + x2 + x3 − x4 = 2
                                        x1    + x3 − x4 = 2

     C25†
                                     x1 + 2x2 + 3x3 = 1
                                       2x1 − x2 + x3 = 2
                                       3x1 + x2 + x3 = 4
                                              x2 + 2x3 = 6

     C26†
                                     x1 + 2x2 + 3x3 = 1
                                       2x1 − x2 + x3 = 2
                                       3x1 + x2 + x3 = 4
                                             5x2 + 2x3 = 1

     C27†
                                     x1 + 2x2 + 3x3 = 0
                                       2x1 − x2 + x3 = 2
                                     x1 − 8x2 − 7x3 = 1
                                               x2 + x3 = 0

     C28†
                                     x1 + 2x2 + 3x3 = 1
                                       2x1 − x2 + x3 = 2
                                             x1 − 8x2 − 7x3 = 1
                                                         x2 + x3 = 0


M45† The details for Archetype J include several sample solutions. Verify that one of
these solutions is correct (any one, but just one). Based only on this evidence, and especially
without doing any row operations, explain how you know this system of linear equations
has infinitely many solutions.
M46 Consider Archetype J, and specifically the row-reduced version of the augmented
matrix of the system of equations, denoted as B here, and the values of r, D and F
immediately following. Determine the values of the entries
$[B]_{1,d_1}$   $[B]_{3,d_3}$   $[B]_{1,d_3}$   $[B]_{3,d_1}$   $[B]_{d_1,1}$   $[B]_{d_3,3}$   $[B]_{d_1,3}$   $[B]_{d_3,1}$   $[B]_{1,f_1}$   $[B]_{3,f_1}$
(See Exercise TSS.M70 for a generalization.)

For Exercises M51–M57 say as much as possible about each system’s solution set. Be
sure to make it clear which theorems you are using to reach your conclusions.
   M51† A consistent system of 8 equations in 6 variables.
   M52† A consistent system of 6 equations in 8 variables.
   M53† A system of 5 equations in 9 variables.
   M54† A system with 12 equations in 35 variables.
   M56† A system with 6 equations in 12 variables.
   M57† A system with 8 equations and 6 variables. The reduced row-echelon form of
   the augmented matrix of the system has 7 pivot columns.

M60 Without doing any computations, and without examining any solutions, say as much
as possible about the form of the solution set for each archetype that is a system of equations.

Archetype A, Archetype B, Archetype C, Archetype D, Archetype E, Archetype F, Archetype
G, Archetype H, Archetype I, Archetype J
M70 Suppose that B is a matrix in reduced row-echelon form that is equivalent to the
augmented matrix of a system of equations with m equations in n variables. Let r, D and F
be as defined in Definition RREF. What can you conclude, in general, about the following
entries?
$[B]_{1,d_1}$   $[B]_{3,d_3}$   $[B]_{1,d_3}$   $[B]_{3,d_1}$   $[B]_{d_1,1}$   $[B]_{d_3,3}$   $[B]_{d_1,3}$   $[B]_{d_3,1}$   $[B]_{1,f_1}$   $[B]_{3,f_1}$
If you cannot conclude anything about an entry, then say so. (See Exercise TSS.M46.)
T10† An inconsistent system may have r > n. If we try (incorrectly!) to apply Theorem
FVCS to such a system, how many free variables would we discover?
T11† Suppose A is the augmented matrix of a system of linear equations in n variables,
and that B is a row-equivalent matrix in reduced row-echelon form with r pivot columns.
If r = n + 1, prove that the system of equations is inconsistent.
T20 Suppose that B is a matrix in reduced row-echelon form that is equivalent to the
augmented matrix of a system of equations with m equations in n variables. Let r, D and
F be as defined in Definition RREF. Prove that dk ≥ k for all 1 ≤ k ≤ r. Then suppose
that r ≥ 2 and 1 ≤ k < ℓ ≤ r and determine what you can conclude, in general, about the
following entries.
    $[B]_{k,d_k}$   $[B]_{k,d_\ell}$   $[B]_{\ell,d_k}$   $[B]_{d_k,k}$   $[B]_{d_k,\ell}$   $[B]_{d_\ell,k}$   $[B]_{d_k,f_\ell}$   $[B]_{d_\ell,f_k}$
If you cannot conclude anything about an entry, then say so. (See Exercise TSS.M46 and
Exercise TSS.M70.)
T40† Suppose that the coefficient matrix of a consistent system of linear equations has
two columns that are identical. Prove that the system has infinitely many solutions.
T41† Consider the system of linear equations LS(A, b), and suppose that every element
of the vector of constants b is a common multiple of the corresponding element of a certain
column of A. More precisely, there is a complex number α, and a column index j, such
that [b]i = α [A]ij for all i. Prove that the system is consistent.
Section HSE
Homogeneous Systems of Equations
In this section we specialize to systems of linear equations where every equation
has a zero as its constant term. Along the way, we will begin to express more and
more ideas in the language of matrices and begin a move away from writing out
whole systems of equations. The ideas initiated in this section will carry through
the remainder of the course.


Subsection SHS
Solutions of Homogeneous Systems
As usual, we begin with a definition.
Definition HS Homogeneous System
A system of linear equations, LS(A, b), is homogeneous if the vector of constants
is the zero vector, in other words, if b = 0. □
Example AHSAC Archetype C as a homogeneous system
For each archetype that is a system of equations, we have formulated a similar, yet
different, homogeneous system of equations by replacing each equation’s constant
term with a zero. To wit, for Archetype C, we can convert the original system of
equations into the homogeneous system,
                             2x1 − 3x2 + x3 − 6x4 = 0
                             4x1 + x2 + 2x3 + 9x4 = 0
                              3x1 + x2 + x3 + 8x4 = 0
   Can you quickly find a solution to this system without row-reducing the
augmented matrix? △
    As you might have discovered by studying Example AHSAC, setting each variable
to zero will always be a solution of a homogeneous system. This is the substance of
the following theorem.
Theorem HSC Homogeneous Systems are Consistent
Suppose that a system of linear equations is homogeneous. Then the system is
consistent and one solution is found by setting each variable to zero.

Proof. Set each variable of the system to zero. When substituting these values into
each equation, the left-hand side evaluates to zero, no matter what the coefficients
are. Since a homogeneous system has zero on the right-hand side of each equation
as the constant term, each equation is true. With one demonstrated solution, we
can call the system consistent. ■

   Since this solution is so obvious, we now define it as the trivial solution.
Definition TSHSE Trivial Solution to Homogeneous Systems of Equations
Suppose a homogeneous system of linear equations has n variables. The solution
x1 = 0, x2 = 0, . . . , xn = 0 (i.e. x = 0) is called the trivial solution. □
   Here are three typical examples, which we will reference throughout this section.
Work through the row operations as we bring each to reduced row-echelon form.
Also notice what is similar in each example, and what differs.
Example HUSAB Homogeneous, unique solution, Archetype B
Archetype B can be converted to the homogeneous system,
                              −7x1 − 6x2 − 12x3 = 0
                                 5x1 + 5x2 + 7x3 = 0
                                         x1 + 4x3 = 0

whose augmented matrix row-reduces to
\[
\left[\begin{array}{ccc|c}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0
\end{array}\right]
\]
    By Theorem HSC, the system is consistent, and so the computation n − r =
3 − 3 = 0 means the solution set contains just a single solution. Then, this lone
solution must be the trivial solution. △
Example HISAA Homogeneous, infinite solutions, Archetype A
Archetype A can be converted to the homogeneous system,
                                 x1 − x2 + 2x3 = 0
                                 2x1 + x2 + x3 = 0
                                   x1 + x2      =0
whose augmented matrix row-reduces to
\[
\left[\begin{array}{ccc|c}
1 & 0 & 1 & 0 \\
0 & 1 & -1 & 0 \\
0 & 0 & 0 & 0
\end{array}\right]
\]
    By Theorem HSC, the system is consistent, and so the computation n − r =
3 − 2 = 1 means the solution set contains one free variable by Theorem FVCS, and
hence has infinitely many solutions. We can describe this solution set using the free
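variable x3,
\[
S = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \,\middle|\, x_1 = -x_3,\ x_2 = x_3 \right\}
  = \left\{ \begin{bmatrix} -x_3 \\ x_3 \\ x_3 \end{bmatrix} \,\middle|\, x_3 \in \mathbb{C} \right\}
\]
    Geometrically, these are points in three dimensions that lie on a line through the
origin. △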
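
The description of S can be spot-checked numerically. The following Python sketch (with helper names `solution_for` and `is_solution` that are our own, hypothetical choices) substitutes several values of the free variable, including a non-real one, into the homogeneous system from Archetype A.

```python
from fractions import Fraction

# Coefficient matrix of the homogeneous system built from Archetype A.
A = [[1, -1, 2],
     [2, 1, 1],
     [1, 1, 0]]

def solution_for(x3):
    """The vector (x1, x2, x3) with x1 = -x3 and x2 = x3, as in the set S."""
    return [-x3, x3, x3]

def is_solution(A, x):
    """True when every equation of LS(A, 0) evaluates to zero at x."""
    return all(sum(a * xi for a, xi in zip(row, x)) == 0 for row in A)

# Sample values of the free variable: integers, a fraction, a complex number.
samples = [0, 1, -4, Fraction(7, 3), 2 + 5j]
checks = [is_solution(A, solution_for(t)) for t in samples]
print(checks)

# A vector that is not of the required form fails the test.
print(is_solution(A, [1, 1, 1]))
```

A finite number of checks is of course not a proof; the row-reduced matrix is what guarantees that every solution has this form.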
Example HISAD Homogeneous, infinite solutions, Archetype D
Archetype D (and identically, Archetype E) can be converted to the homogeneous
system,
                               2x1 + x2 + 7x3 − 7x4 = 0
                            −3x1 + 4x2 − 5x3 − 6x4 = 0
                                x1 + x2 + 4x3 − 5x4 = 0
whose augmented matrix row-reduces to
\[
\left[\begin{array}{cccc|c}
1 & 0 & 3 & -2 & 0 \\
0 & 1 & 1 & -3 & 0 \\
0 & 0 & 0 & 0 & 0
\end{array}\right]
\]
    By Theorem HSC, the system is consistent, and so the computation n − r =
4 − 2 = 2 means the solution set contains two free variables by Theorem FVCS, and
hence has infinitely many solutions. We can describe this solution set using the free
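variables x3 and x4,
\[
S = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} \,\middle|\, x_1 = -3x_3 + 2x_4,\ x_2 = -x_3 + 3x_4 \right\}
  = \left\{ \begin{bmatrix} -3x_3 + 2x_4 \\ -x_3 + 3x_4 \\ x_3 \\ x_4 \end{bmatrix} \,\middle|\, x_3,\, x_4 \in \mathbb{C} \right\}
\]
△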
    After working through these examples, you might perform the same computations
for the slightly larger example, Archetype J.
   Notice that when we do row operations on the augmented matrix of a homogeneous
system of linear equations the last column of the matrix is all zeros. Any one of
the three allowable row operations will convert zeros to zeros and thus, the final
column of the matrix in reduced row-echelon form will also be all zeros. So in this
case, we may be as likely to reference only the coefficient matrix and presume that
we remember that the final column begins with zeros, and after any number of row
operations is still zero.
   Example HISAD suggests the following theorem.
Theorem HMVEI Homogeneous, More Variables than Equations, Infinite solutions
Suppose that a homogeneous system of linear equations has m equations and n
variables with n > m. Then the system has infinitely many solutions.

Proof. We are assuming the system is homogeneous, so Theorem HSC says it is
consistent. Then the hypothesis that n > m, together with Theorem CMVEI, gives
infinitely many solutions. ■

    Example HUSAB and Example HISAA are concerned with homogeneous systems
where n = m and expose a fundamental distinction between the two examples. One
has a unique solution, while the other has infinitely many. These are the only
two possibilities for a homogeneous system, and the examples illustrate that each
is possible (unlike the case when n > m, where Theorem HMVEI tells us that there is
only one possibility for a homogeneous system).

Subsection NSM
Null Space of a Matrix
The set of solutions to a homogeneous system (which by Theorem HSC is never
empty) is of enough interest to warrant its own name. However, we define it as a
property of the coefficient matrix, not as a property of some system of equations.
Definition NSM Null Space of a Matrix
The null space of a matrix A, denoted N (A), is the set of all the vectors that are
solutions to the homogeneous system LS(A, 0). □
    In the Archetypes, each example that is a system of equations also
has a corresponding homogeneous system of equations listed, and several sample
solutions are given. These solutions will be elements of the null space of the coefficient
matrix. We will look at one example.
Example NSEAI Null space elements of Archetype I
The write-up for Archetype I lists several solutions of the corresponding homogeneous
system. Here are two, written as solution vectors. We can say that they are in the
null space of the coefficient matrix for the system of equations in Archetype I.
\[
x = \begin{bmatrix} 3 \\ 0 \\ -5 \\ -6 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\qquad
y = \begin{bmatrix} -4 \\ 1 \\ -3 \\ -2 \\ 1 \\ 1 \\ 1 \end{bmatrix}
\]
   However, the vector
\[
z = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 2 \end{bmatrix}
\]
is not in the null space, since it is not a solution to the homogeneous system. For
example, it fails to even make the first equation true. △
  Here are two (prototypical) examples of the computation of the null space of a
matrix.
Example CNS1 Computing a null space, no. 1
Let us compute the null space of
\[
A = \begin{bmatrix}
2 & -1 & 7 & -3 & -8 \\
1 & 0 & 2 & 4 & 9 \\
2 & 2 & -2 & -1 & 8
\end{bmatrix}
\]
which we write as N (A). Translating Definition NSM, we simply desire to solve the
homogeneous system LS(A, 0). So we row-reduce the augmented matrix to obtain
\[
\left[\begin{array}{ccccc|c}
1 & 0 & 2 & 0 & 1 & 0 \\
0 & 1 & -3 & 0 & 4 & 0 \\
0 & 0 & 0 & 1 & 2 & 0
\end{array}\right]
\]
    The variables (of the homogeneous system) x3 and x5 are free (since columns 1,
2 and 4 are pivot columns), so we arrange the equations represented by the matrix
in reduced row-echelon form to
                                     x1 = −2x3 − x5
                                     x2 = 3x3 − 4x5
                                     x4 = −2x5


     So we can write the infinite solution set as sets using column vectors,
\[
N(A) = \left\{ \begin{bmatrix} -2x_3 - x_5 \\ 3x_3 - 4x_5 \\ x_3 \\ -2x_5 \\ x_5 \end{bmatrix} \,\middle|\, x_3,\, x_5 \in \mathbb{C} \right\}
\]
△
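
As a sanity check on this description of N (A), the short Python sketch below builds vectors from the two free variables and confirms that the matrix A of this example sends each of them to the zero vector. The function names `null_vector` and `in_null_space` are hypothetical helpers introduced only for this illustration, and the grid of integer parameter values is an arbitrary choice.

```python
# The matrix A from Example CNS1.
A = [[2, -1, 7, -3, -8],
     [1, 0, 2, 4, 9],
     [2, 2, -2, -1, 8]]

def null_vector(x3, x5):
    """Vector determined by the free variables x3 and x5."""
    return [-2 * x3 - x5, 3 * x3 - 4 * x5, x3, -2 * x5, x5]

def in_null_space(A, v):
    """True when v is a solution of the homogeneous system LS(A, 0)."""
    return all(sum(a * vi for a, vi in zip(row, v)) == 0 for row in A)

# Try every pair of free-variable values in a small integer grid.
grid = [(x3, x5) for x3 in range(-3, 4) for x5 in range(-3, 4)]
all_in = all(in_null_space(A, null_vector(x3, x5)) for x3, x5 in grid)
print(all_in)

# By contrast, a vector chosen at random is typically not in the null space.
print(in_null_space(A, [1, 0, 0, 0, 0]))
```

The membership test is just Definition NSM: substitute the vector into LS(A, 0) and see whether every equation is true.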
Example CNS2 Computing a null space, no. 2
Let us compute the null space of
\[
C = \begin{bmatrix}
-4 & 6 & 1 \\
-1 & 4 & 1 \\
5 & 6 & 7 \\
4 & 7 & 1
\end{bmatrix}
\]
which we write as N (C). Translating Definition NSM, we simply desire to solve the
homogeneous system LS(C, 0). So we row-reduce the augmented matrix to obtain
\[
\left[\begin{array}{ccc|c}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0
\end{array}\right]
\]
   There are no free variables in the homogeneous system represented by the row-
reduced matrix, so there is only the trivial solution, the zero vector, 0. So we can
write the (trivial) solution set as
\[
N(C) = \{0\} = \left\{ \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \right\}
\]

Reading Questions

1. What is always true of the solution set for a homogeneous system of equations?
2. Suppose a homogeneous system of equations has 13 variables and 8 equations. How
   many solutions will it have? Why?
3. Describe, using only words, the null space of a matrix. (So in particular, do not use any
   symbols.)

Exercises
C10 Each Archetype that is a system of equations has a corresponding
homogeneous system with the same coefficient matrix. Compute the set of solutions for
each. Notice that these solution sets are the null spaces of the coefficient matrices.

Archetype A, Archetype B, Archetype C, Archetype D/Archetype E, Archetype F, Archetype
G/Archetype H, Archetype I, Archetype J
C20 Archetype K and Archetype L are simply 5 × 5 matrices (i.e. they are not systems
of equations). Compute the null space of each matrix.


For Exercises C21-C23, solve the given homogeneous linear system. Compare your results
to the results of the corresponding exercise in Section TSS.
   C21†
                                  x1 + 4x2 + 3x3 − x4 = 0
                                   x1 − x2 + x3 + 2x4 = 0
                                 4x1 + x2 + 6x3 + 5x4 = 0

   C22†
                                   x1 − 2x2 + x3 − x4 = 0
                                  2x1 − 4x2 + x3 + x4 = 0
                                 x1 − 2x2 − 2x3 + 3x4 = 0

   C23†
                                  x1 − 2x2 + x3 − x4 = 0
                                   x1 + x2 + x3 − x4 = 0
                                      x1    + x3 − x4 = 0




For Exercises C25-C27, solve the given homogeneous linear system. Compare your results
to the results of the corresponding exercise in Section TSS.
   C25†
                                    x1 + 2x2 + 3x3 = 0
                                     2x1 − x2 + x3 = 0
                                     3x1 + x2 + x3 = 0
                                            x2 + 2x3 = 0

   C26†
                                    x1 + 2x2 + 3x3 = 0
                                     2x1 − x2 + x3 = 0
                                     3x1 + x2 + x3 = 0
                                           5x2 + 2x3 = 0

   C27†
                                    x1 + 2x2 + 3x3 = 0
                                     2x1 − x2 + x3 = 0
                                    x1 − 8x2 − 7x3 = 0
                                             x2 + x3 = 0


C30†     Compute the null space of the matrix A, N (A).
\[
A = \begin{bmatrix}
2 & 4 & 1 & 3 & 8 \\
-1 & -2 & -1 & -1 & 1 \\
2 & 4 & 0 & -3 & 4 \\
2 & 4 & -1 & -7 & 4
\end{bmatrix}
\]

C31†    Find the null space of the matrix B, N (B).
\[
B = \begin{bmatrix}
-6 & 4 & -36 & 6 \\
2 & -1 & 10 & -1 \\
-3 & 2 & -18 & 3
\end{bmatrix}
\]

M45 Without doing any computations, and without examining any solutions, say as
much as possible about the form of the solution set for the corresponding homogeneous
system of equations of each archetype that is a system of equations.

Archetype A, Archetype B, Archetype C, Archetype D/Archetype E, Archetype F, Archetype
G/Archetype H, Archetype I, Archetype J

For Exercises M50–M52 say as much as possible about each system’s solution set. Be
sure to make it clear which theorems you are using to reach your conclusions.
   M50† A homogeneous system of 8 equations in 8 variables.
     M51†   A homogeneous system of 8 equations in 9 variables.
     M52†   A homogeneous system of 8 equations in 7 variables.

T10† Prove or disprove: A system of linear equations is homogeneous if and only if the
system has the zero vector as a solution.
T11† Suppose that two systems of linear equations are equivalent. Prove that if the first
system is homogeneous, then the second system is homogeneous. Notice that this will
allow us to conclude that two equivalent systems are either both homogeneous or both not
homogeneous.
T12    Give an alternate proof of Theorem HSC that uses Theorem RCLS.
T20† Consider the homogeneous system of linear equations LS(A, 0), and suppose that
\[
\mathbf{u} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_n \end{bmatrix}
\]
is one solution to the system of equations. Prove that
\[
\mathbf{v} = \begin{bmatrix} 4u_1 \\ 4u_2 \\ 4u_3 \\ \vdots \\ 4u_n \end{bmatrix}
\]
is also a solution to LS(A, 0).
Section NM
Nonsingular Matrices
In this section we specialize further and consider matrices with equal numbers of
rows and columns, which when considered as coefficient matrices lead to systems
with equal numbers of equations and variables. We will see in the second half of
the course (Chapter D, Chapter E, Chapter LT, Chapter R) that these matrices are
especially important.



Subsection NM
Nonsingular Matrices
Our theorems will now establish connections between systems of equations (homo-
geneous or otherwise), augmented matrices representing those systems, coefficient
matrices, constant vectors, the reduced row-echelon form of matrices (augmented and
coefficient) and solution sets. Be very careful in your reading, writing and speaking
about systems of equations, matrices and sets of vectors. A system of equations is
not a matrix, a matrix is not a solution set, and a solution set is not a system of
equations. Now would be a great time to review the discussion about speaking and
writing mathematics in Proof Technique L.
Definition SQM Square Matrix
A matrix with m rows and n columns is square if m = n. In this case, we say the
matrix has size n. To emphasize the situation when a matrix is not square, we will
call it rectangular.                                                           
   We can now present one of the central definitions of linear algebra.
Definition NM Nonsingular Matrix
Suppose A is a square matrix. Suppose further that the solution set to the homoge-
neous linear system of equations LS(A, 0) is {0}, in other words, the system has
only the trivial solution. Then we say that A is a nonsingular matrix. Otherwise
we say A is a singular matrix.                                                  
    We can investigate whether any square matrix is nonsingular or not, no matter if
the matrix is derived somehow from a system of equations or if it is simply a matrix.
The definition says that to perform this investigation we must construct a very
specific system of equations (homogeneous, with the matrix as the coefficient matrix)
and look at its solution set. We will have theorems in this section that connect
nonsingular matrices with systems of equations, creating more opportunities for
confusion. Convince yourself now of two observations, (1) we can decide nonsingularity
for any square matrix, and (2) the determination of nonsingularity involves the
solution set for a certain homogeneous system of equations.
    Notice that it makes no sense to call a system of equations nonsingular (the term
does not apply to a system of equations), nor does it make any sense to call a 5 × 7
matrix singular (the matrix is not square).
Example S A singular matrix, Archetype A
Example HISAA shows that the coefficient matrix derived from Archetype A,
specifically the 3 × 3 matrix,
\[
A = \begin{bmatrix}
1 & -1 & 2 \\
2 &  1 & 1 \\
1 &  1 & 0
\end{bmatrix}
\]
is a singular matrix since there are nontrivial solutions to the homogeneous system
LS(A, 0).
Example NM A nonsingular matrix, Archetype B
Example HUSAB shows that the coefficient matrix derived from Archetype B,
specifically the 3 × 3 matrix,
\[
B = \begin{bmatrix}
-7 & -6 & -12 \\
 5 &  5 &   7 \\
 1 &  0 &   4
\end{bmatrix}
\]
is a nonsingular matrix since the homogeneous system, LS(B, 0), has only the trivial
solution.
   Notice that we will not discuss Example HISAD as being a singular or nonsingular
coefficient matrix since the matrix is not square.
   The next theorem combines with our main computational technique (row reducing
a matrix) to make it easy to recognize a nonsingular matrix. But first a definition.
Definition IM Identity Matrix
The m × m identity matrix, I_m, is defined by
\[
[I_m]_{ij} =
\begin{cases}
1 & i = j \\
0 & i \neq j
\end{cases}
\qquad 1 \le i, j \le m
\]
Example IM An identity matrix
The 4 × 4 identity matrix is
\[
I_4 = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}.
\]
   Notice that an identity matrix is square, and in reduced row-echelon form. Also,
every column is a pivot column, and every possible pivot column appears once.
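   As a small illustration of Definition IM, here is a sketch in Python that builds
I_m entry by entry from the rule that the entry in row i and column j is 1 when i = j
and 0 otherwise. The function name identity_matrix and the list-of-rows representation
are choices made for this illustration only, not notation from the text.

```python
def identity_matrix(m):
    """Build the m x m identity matrix as a list of rows, following Definition IM."""
    # Indices i and j run from 1 to m, exactly as in the definition.
    return [[1 if i == j else 0 for j in range(1, m + 1)]
            for i in range(1, m + 1)]


# Print the 4 x 4 identity matrix of Example IM, one row per line.
for row in identity_matrix(4):
    print(row)
```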
Theorem NMRRI Nonsingular Matrices Row Reduce to the Identity matrix
Suppose that A is a square matrix and B is a row-equivalent matrix in reduced
row-echelon form. Then A is nonsingular if and only if B is the identity matrix.

Proof. (⇐) Suppose B is the identity matrix. When the augmented matrix [ A | 0]
is row-reduced, the result is [ B | 0] = [ I_n | 0]. The number of nonzero rows is equal
to the number of variables in the linear system of equations LS(A, 0), so n = r
and Theorem FVCS gives n − r = 0 free variables. Thus, the homogeneous system
LS(A, 0) has just one solution, which must be the trivial solution. This is exactly
the definition of a nonsingular matrix (Definition NM).
    (⇒) If A is nonsingular, then the homogeneous system LS(A, 0) has a unique
solution, and has no free variables in the description of the solution set. The
homogeneous system is consistent (Theorem HSC) so Theorem FVCS applies and tells
us there are n − r free variables. Thus, n − r = 0, and so n = r. So B has n pivot
columns among its total of n columns. This is enough to force B to be the n × n
identity matrix I_n (see Exercise NM.T12).

     Notice that since this theorem is an equivalence it will always allow us to determine
if a matrix is either nonsingular or singular. Here are two examples of this, continuing
our study of Archetype A and Archetype B.
Example SRR Singular matrix, row-reduced
We have the coefficient matrix for Archetype A and a row-equivalent matrix B in
reduced row-echelon form,
\[
A = \begin{bmatrix}
1 & -1 & 2 \\
2 &  1 & 1 \\
1 &  1 & 0
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 &  1 \\
0 & 1 & -1 \\
0 & 0 &  0
\end{bmatrix} = B
\]
    Since B is not the 3 × 3 identity matrix, Theorem NMRRI tells us that A is a
singular matrix.

Example NSR Nonsingular matrix, row-reduced
We have the coefficient matrix for Archetype B and a row-equivalent matrix B in
reduced row-echelon form,
\[
A = \begin{bmatrix}
-7 & -6 & -12 \\
 5 &  5 &   7 \\
 1 &  0 &   4
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix} = B
\]
   Since B is the 3 × 3 identity matrix, Theorem NMRRI tells us that A is a
nonsingular matrix.
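
   The test in Theorem NMRRI is mechanical enough to sketch in code. The following
Python is a minimal illustration, not the text's own software: the function names
rref and is_nonsingular are invented here, matrices are stored as lists of rows, and
exact rational arithmetic (the standard fractions module) is assumed so that no
rounding can disturb the comparison with the identity matrix.

```python
from fractions import Fraction


def rref(matrix):
    """Return the reduced row-echelon form of a matrix given as a list of rows."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    m, n = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n):
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        candidates = [r for r in range(pivot_row, m) if rows[r][col] != 0]
        if not candidates:
            continue  # no pivot in this column
        r = candidates[0]
        rows[pivot_row], rows[r] = rows[r], rows[pivot_row]  # swap rows
        lead = rows[pivot_row][col]
        rows[pivot_row] = [x / lead for x in rows[pivot_row]]  # leading entry becomes 1
        for other in range(m):
            # Clear every other entry of the pivot column.
            if other != pivot_row and rows[other][col] != 0:
                factor = rows[other][col]
                rows[other] = [a - factor * b
                               for a, b in zip(rows[other], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break
    return rows


def is_nonsingular(matrix):
    """Apply Theorem NMRRI: a square matrix is nonsingular iff its RREF is I_n."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        raise ValueError("nonsingularity is only defined for square matrices")
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    return rref(matrix) == identity


archetype_a = [[1, -1, 2], [2, 1, 1], [1, 1, 0]]
archetype_b = [[-7, -6, -12], [5, 5, 7], [1, 0, 4]]
print(is_nonsingular(archetype_a))  # False, as in Example SRR
print(is_nonsingular(archetype_b))  # True, as in Example NSR
```

Raising an error for a rectangular matrix mirrors the remark that it makes no sense
to call such a matrix singular or nonsingular.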

Subsection NSNM
Null Space of a Nonsingular Matrix
Nonsingular matrices and their null spaces are intimately related, as the next two
examples illustrate.
Example NSS Null space of a singular matrix
Given the singular coefficient matrix from Archetype A, the null space is the set of
solutions to the homogeneous system of equations LS(A, 0), which has a solution
set and null space constructed in Example HISAA as an infinite set of vectors.
\[
A = \begin{bmatrix}
1 & -1 & 2 \\
2 &  1 & 1 \\
1 &  1 & 0
\end{bmatrix}
\qquad
\mathcal{N}(A) = \left\{
\begin{bmatrix} -x_3 \\ x_3 \\ x_3 \end{bmatrix}
\,\middle|\, x_3 \in \mathbb{C}
\right\}
\]
Example NSNM Null space of a nonsingular matrix
Given the nonsingular coefficient matrix from Archetype B, the solution set to the
homogeneous system LS(A, 0) is constructed in Example HUSAB and contains only
the trivial solution, so the null space of A has only a single element,
\[
A = \begin{bmatrix}
-7 & -6 & -12 \\
 5 &  5 &   7 \\
 1 &  0 &   4
\end{bmatrix}
\qquad
\mathcal{N}(A) = \left\{
\begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\right\}
\]
   These two examples illustrate the next theorem, which is another equivalence.
Theorem NMTNS Nonsingular Matrices have Trivial Null Spaces
Suppose that A is a square matrix. Then A is nonsingular if and only if the null
space of A is the set containing only the zero vector, i.e. N (A) = {0}.

Proof. The null space of a square matrix, A, is equal to the set of solutions to the
homogeneous system, LS(A, 0). A matrix is nonsingular if and only if the set of
solutions to the homogeneous system, LS(A, 0), has only a trivial solution. These
two observations may be chained together to construct the two proofs necessary for
each half of this theorem.                                                        
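
   Membership in a null space can be spot-checked by substituting a candidate vector
into the left-hand sides of the homogeneous system. The Python below is a sketch
under that reading; the helper name evaluate_system is hypothetical, and the sample
values of x3 (including one complex value) are arbitrary. A check of this kind can
confirm that particular vectors lie in N(A), but it cannot by itself prove that
N(B) = {0}; that conclusion still comes from row-reducing.

```python
def evaluate_system(coefficients, x):
    """Evaluate the left-hand side of each equation of LS(A, 0) at the vector x."""
    return [sum(a * x_j for a, x_j in zip(row, x)) for row in coefficients]


archetype_a = [[1, -1, 2], [2, 1, 1], [1, 1, 0]]
archetype_b = [[-7, -6, -12], [5, 5, 7], [1, 0, 4]]
zero = [0, 0, 0]

# Every vector of the form (-x3, x3, x3) from Example NSS solves LS(A, 0).
for x3 in [0, 1, -4, 2 + 3j]:
    candidate = [-x3, x3, x3]
    print(candidate, evaluate_system(archetype_a, candidate) == zero)

# For Archetype B the zero vector is a solution, while these sample nonzero
# vectors are not (a spot check only, not a proof that N(B) = {0}).
print(evaluate_system(archetype_b, zero) == zero)
for candidate in [[1, 0, 0], [0, 1, 0], [0, 0, 1], [-1, 1, 1]]:
    print(candidate, evaluate_system(archetype_b, candidate) == zero)
```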

   The next theorem pulls a lot of big ideas together. Theorem NMUS tells us that
we can learn much about solutions to a system of linear equations with a square
coefficient matrix by just examining a similar homogeneous system.
Theorem NMUS Nonsingular Matrices and Unique Solutions
Suppose that A is a square matrix. A is a nonsingular matrix if and only if the
system LS(A, b) has a unique solution for every choice of the constant vector b.

Proof. (⇐) The hypothesis for this half of the proof is that the system LS(A, b)
has a unique solution for every choice of the constant vector b. We will make a very
specific choice for b: b = 0. Then we know that the system LS(A, 0) has a unique
solution. But this is precisely the definition of what it means for A to be nonsingular
(Definition NM). That almost seems too easy! Notice that we have not used the full
power of our hypothesis, but there is nothing that says we must use a hypothesis to
its fullest.
    (⇒) We assume that A is nonsingular of size n × n, so we know there is a
sequence of row operations that will convert A into the identity matrix I_n (Theorem
NMRRI). Form the augmented matrix A′ = [ A | b] and apply this same sequence
of row operations to A′. The result will be the matrix B′ = [ I_n | c], which is in
reduced row-echelon form with r = n. Then the augmented matrix B′ represents the
(extremely simple) system of equations x_i = [c]_i, 1 ≤ i ≤ n. The vector c is clearly a
solution, so the system is consistent (Definition CS). With a consistent system, we
use Theorem FVCS to count free variables. We find that there are n − r = n − n = 0
free variables, and therefore the solution is unique. (This half of
the proof was suggested by Asa Scherer.)
    This theorem helps to explain part of our interest in nonsingular matrices. If a
matrix is nonsingular, then no matter what vector of constants we pair it with, using
the matrix as the coefficient matrix will always yield a linear system of equations
with a solution, and the solution is unique. To determine if a matrix has this property
(nonsingularity) it is enough to just solve one linear system, the homogeneous system
with the matrix as coefficient matrix and the zero vector as the vector of constants
(or any other vector of constants, see Exercise MM.T10).
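
    The second half of the proof of Theorem NMUS is constructive: row-reduce
[ A | b] and read the solution from the final column. The Python below sketches that
procedure. The names row_reduce and solve_square_system are invented for this
illustration, exact arithmetic via the standard fractions module is assumed, and
returning None for a singular coefficient matrix is a convention of the sketch.

```python
from fractions import Fraction


def row_reduce(matrix):
    """Return the reduced row-echelon form of a matrix given as a list of rows."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    m, n = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n):
        candidates = [r for r in range(pivot_row, m) if rows[r][col] != 0]
        if not candidates:
            continue
        r = candidates[0]
        rows[pivot_row], rows[r] = rows[r], rows[pivot_row]
        lead = rows[pivot_row][col]
        rows[pivot_row] = [x / lead for x in rows[pivot_row]]
        for other in range(m):
            if other != pivot_row and rows[other][col] != 0:
                factor = rows[other][col]
                rows[other] = [a - factor * b
                               for a, b in zip(rows[other], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break
    return rows


def solve_square_system(A, b):
    """Solve LS(A, b) for square A; return None if A is singular."""
    n = len(A)
    augmented = [list(row) + [b_i] for row, b_i in zip(A, b)]
    reduced = row_reduce(augmented)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    if [row[:n] for row in reduced] != identity:
        return None  # A does not row-reduce to I_n, so it is singular
    return [row[n] for row in reduced]  # the vector c from [ I_n | c ]


archetype_b = [[-7, -6, -12], [5, 5, 7], [1, 0, 4]]
print(solve_square_system(archetype_b, [-33, 24, 5]))  # entries equal -3, 5, 2
print(solve_square_system(archetype_b, [0, 0, 0]))     # the trivial solution
print(solve_square_system([[1, -1, 2], [2, 1, 1], [1, 1, 0]], [1, 8, 5]))  # None
```

Whatever vector of constants is supplied alongside the coefficient matrix of
Archetype B, the function returns exactly one vector, which is the content of the
theorem for this matrix.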
    Formulating the negation of the second part of this theorem is a good exercise.
A singular matrix has the property that for some value of the vector b, the system
LS(A, b) does not have a unique solution (which means that it has no solution or
infinitely many solutions). We will be able to say more about this case later (see the
discussion following Theorem PSPHS).
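
    Both possibilities can be seen with the singular coefficient matrix A of Example S.
The two vectors of constants below are chosen only for illustration. With constants
1, 8, 5 the augmented matrix row-reduces to
\[
\left[\begin{array}{ccc|c}
1 & -1 & 2 & 1 \\
2 &  1 & 1 & 8 \\
1 &  1 & 0 & 5
\end{array}\right]
\xrightarrow{\text{RREF}}
\left[\begin{array}{ccc|c}
1 & 0 &  1 & 3 \\
0 & 1 & -1 & 2 \\
0 & 0 &  0 & 0
\end{array}\right]
\]
so the system is consistent with one free variable, x_3, and has infinitely many
solutions (x_1 = 3 − x_3, x_2 = 2 + x_3). Changing only the last constant to 6 gives
\[
\left[\begin{array}{ccc|c}
1 & -1 & 2 & 1 \\
2 &  1 & 1 & 8 \\
1 &  1 & 0 & 6
\end{array}\right]
\xrightarrow{\text{RREF}}
\left[\begin{array}{ccc|c}
1 & 0 &  1 & 0 \\
0 & 1 & -1 & 0 \\
0 & 0 &  0 & 1
\end{array}\right]
\]
where the final column is a pivot column, so the system is inconsistent and has no
solutions. In neither case is there a unique solution.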
    Square matrices that are nonsingular have a long list of interesting properties,
which we will start to catalog in the following, recurring, theorem. Of course, singular
matrices will then have all of the opposite properties. The following theorem is a list
of equivalences.
    We want to understand just what is involved with understanding and proving
a theorem that says several conditions are equivalent. So have a look at Proof
Technique ME before studying the first in this series of theorems.
Theorem NME1 Nonsingular Matrix Equivalences, Round 1
Suppose that A is a square matrix. The following are equivalent.

     1. A is nonsingular.
     2. A row-reduces to the identity matrix.
     3. The null space of A contains only the zero vector, N (A) = {0}.
     4. The linear system LS(A, b) has a unique solution for every possible choice of
        b.

Proof. The statement that A is nonsingular is equivalent to each of the subsequent
statements by, in turn, Theorem NMRRI, Theorem NMTNS and Theorem NMUS.
So the statement of this theorem is just a convenient way to organize all these results.

    Finally, you may have wondered why we refer to a matrix as nonsingular when
it creates systems of equations with single solutions (Theorem NMUS)! I have
wondered the same thing. We will have an opportunity to address this when we get
to Theorem SMZD. Can you wait that long?

Reading Questions

1. In your own words state the definition of a nonsingular matrix.
2. What is the easiest way to recognize if a square matrix is nonsingular or not?
3. Suppose we have a system of equations and its coefficient matrix is nonsingular. What
   can you say about the solution set for this system?

Exercises

In Exercises C30–C33 determine if the matrix is nonsingular or singular. Give reasons for
your answer.
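   C30†
\[
\begin{bmatrix}
-3 &  1 & 2 &  8 \\
 2 &  0 & 3 &  4 \\
 1 &  2 & 7 & -4 \\
 5 & -1 & 2 &  0
\end{bmatrix}
\]

   C31†
\[
\begin{bmatrix}
 2 & 3 & 1 & 4 \\
 1 & 1 & 1 & 0 \\
-1 & 2 & 3 & 5 \\
 1 & 2 & 1 & 3
\end{bmatrix}
\]

   C32†
\[
\begin{bmatrix}
9 &  3 & 2 &  4 \\
5 & -6 & 1 &  3 \\
4 &  1 & 3 & -5
\end{bmatrix}
\]

   C33†
\[
\begin{bmatrix}
-1 &  2 &  0 & 3 \\
 1 & -3 & -2 & 4 \\
-2 &  0 &  4 & 3 \\
-3 &  1 & -2 & 3
\end{bmatrix}
\]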

C40 Each of the archetypes below is a system of equations with a square coefficient
matrix, or is itself a square matrix. Determine if these matrices are nonsingular, or singular.
Comment on the null space of each matrix.

Archetype A, Archetype B, Archetype F, Archetype K, Archetype L
C50†    Find the null space of the matrix E below.
\[
E = \begin{bmatrix}
 2 & 1 &  -1 & -9 \\
 2 & 2 &  -6 & -6 \\
 1 & 2 &  -8 &  0 \\
-1 & 2 & -12 & 12
\end{bmatrix}
\]

M30† Let A be the coefficient matrix of the system of equations below. Is A nonsingular
or singular? Explain what you could infer about the solution set for the system based only
on what you have learned about A being singular or nonsingular.
                                                −x1 + 5x2 = −8
                               −2x1 + 5x2 + 5x3 + 2x4 = 9
                                 −3x1 − x2 + 3x3 + x4 = 3
                                  7x1 + 6x2 + 5x3 + x4 = 30


For Exercises M51–M52 say as much as possible about each system’s solution set. Be
sure to make it clear which theorems you are using to reach your conclusions.
   M51† 6 equations in 6 variables, singular coefficient matrix.
   M52†     A system with a nonsingular coefficient matrix, not homogeneous.

T10† Suppose that A is a square matrix, and B is a matrix in reduced row-echelon form
that is row-equivalent to A. Prove that if A is singular, then the last row of B is a zero row.
T12 Using (Definition RREF) and (Definition IM) carefully, give a proof of the following
equivalence: A is a square matrix in reduced row-echelon form where every column is a
pivot column if and only if A is the identity matrix.
T30† Suppose that A is a nonsingular matrix and A is row-equivalent to the matrix B.
Prove that B is nonsingular.
T31† Suppose that A is a square matrix of size n × n and that we know there is a single
vector b ∈ C^n such that the system LS(A, b) has a unique solution. Prove that A is a
nonsingular matrix. (Notice that this is very similar to Theorem NMUS, but is not exactly
the same.)
T90† Provide an alternative for the second half of the proof of Theorem NMUS, without
appealing to properties of the reduced row-echelon form of the coefficient matrix. In other
words, prove that if A is nonsingular, then LS(A, b) has a unique solution for every choice
of the constant vector b. Construct this proof without using Theorem REMEF or Theorem
RREFU.
Chapter V
Vectors

We have worked extensively in the last chapter with matrices, and some with vectors.
In this chapter we will develop the properties of vectors, while preparing to study
vector spaces (Chapter VS). Initially we will depart from our study of systems
of linear equations, but in Section LC we will forge a connection between linear
combinations and systems of linear equations in Theorem SLSLC. This connection
will allow us to understand systems of linear equations at a higher level, while
consequently discussing them less frequently.


Section VO
Vector Operations
In this section we define some new operations involving vectors, and collect some
basic properties of these operations. Begin by recalling our definition of a column
vector as an ordered list of complex numbers, written vertically (Definition CV).
The collection of all possible vectors of a fixed size is a commonly used set, so we
start with its definition.

Subsection CV
Column Vectors
Definition VSCV Vector Space of Column Vectors
The vector space C^m is the set of all column vectors (Definition CV) of size m with
entries from the set of complex numbers, C.
    When a set similar to this is defined using only column vectors where all the
entries are from the real numbers, it is written as R^m and is known as Euclidean
m-space.
    The term vector is used in a variety of different ways. We have defined it as
an ordered list written vertically. It could simply be an ordered list of numbers,
and perhaps written as ⟨2, 3, −1, 6⟩. Or it could be interpreted as a point in m
dimensions, such as (3, 4, −2) representing a point in three dimensions relative to x,
y and z axes. With an interpretation as a point, we can construct an arrow from the
origin to the point which is consistent with the notion that a vector has direction
and magnitude.
    All of these ideas can be shown to be related and equivalent, so keep that in mind
as you connect the ideas of this course with ideas from other disciplines. For now,
we will stick with the idea that a vector is just a list of numbers, in some particular
order.

Subsection VEASM
Vector Equality, Addition, Scalar Multiplication
We start our study of this set by first defining what it means for two vectors to be
the same.

Definition CVE Column Vector Equality
Suppose that u, v ∈ C^m. Then u and v are equal, written u = v, if
\[
[\mathbf{u}]_i = [\mathbf{v}]_i \qquad 1 \le i \le m
\]
    Now this may seem like a silly (or even stupid) thing to say so carefully. Of
course two vectors are equal if they are equal for each corresponding entry! Well,
this is not as silly as it appears. We will see a few occasions later where the obvious
definition is not the right one. And besides, in doing mathematics we need to be very
careful about making all the necessary definitions and making them unambiguous.
And we have done that here.
    Notice that the symbol “=” is now doing triple-duty. We know from our
earlier education what it means for two numbers (real or complex) to be equal, and
we take this for granted. In Definition SE we defined what it meant for two sets
to be equal. Now we have defined what it means for two vectors to be equal, and
that definition builds on our definition for when two numbers are equal when we
use the condition u_i = v_i for all 1 ≤ i ≤ m. So think carefully about your objects
when you see an equal sign and think about just which notion of equality you have
encountered. This will be especially important when you are asked to construct
proofs whose conclusion states that two objects are equal. If you have an electronic
copy of the book, such as the PDF version, searching on “Definition CVE” can be
an instructive exercise. See how often, and where, the definition is employed.
    OK, let us do an example of vector equality that begins to hint at the utility of
this definition.
Example VESE Vector equality for a system of equations
Consider the system of linear equations in Archetype B,
                             −7x1 − 6x2 − 12x3 = −33
                                   5x1 + 5x2 + 7x3 = 24
                                           x1 + 4x3 = 5
    Note the use of three equals signs — each indicates an equality of numbers (the
linear expressions are numbers when we evaluate them with fixed values of the
variable quantities). Now write the vector equality,
\[
\begin{bmatrix}
-7x_1 - 6x_2 - 12x_3 \\
5x_1 + 5x_2 + 7x_3 \\
x_1 + 4x_3
\end{bmatrix}
=
\begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}.
\]
By Definition CVE, this single equality (of two column vectors) translates into three
simultaneous equalities of numbers that form the system of equations. So with this
new notion of vector equality we can become less reliant on referring to systems of
simultaneous equations. There is more to vector equality than just this, but this is a
good example for starters and we will develop it further.
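   Definition CVE can be mirrored directly in code, entry by entry. In this Python
sketch the names vectors_equal and left_hand_side are hypothetical, vectors are plain
lists, and the size check is an added safeguard (the definition itself already assumes
both vectors come from the same C^m). The values substituted for the variables are
illustrative choices.

```python
def vectors_equal(u, v):
    """Definition CVE: two column vectors are equal when every entry matches."""
    return len(u) == len(v) and all(u_i == v_i for u_i, v_i in zip(u, v))


def left_hand_side(x1, x2, x3):
    """The column vector of linear expressions from Example VESE."""
    return [-7 * x1 - 6 * x2 - 12 * x3,
            5 * x1 + 5 * x2 + 7 * x3,
            x1 + 4 * x3]


constants = [-33, 24, 5]
print(vectors_equal(left_hand_side(-3, 5, 2), constants))  # True
print(vectors_equal(left_hand_side(1, 1, 1), constants))   # False
```

One vector equality being true is the same as all three numerical equalities of the
system holding at once.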
   We will now define two operations on the set C^m. By this we mean well-defined
procedures that somehow convert vectors into other vectors. Here are two of the
most basic definitions of the entire course.
Definition CVA Column Vector Addition
Suppose that u, v ∈ C^m. The sum of u and v is the vector u + v defined by
\[
[\mathbf{u} + \mathbf{v}]_i = [\mathbf{u}]_i + [\mathbf{v}]_i \qquad 1 \le i \le m
\]
    So vector addition takes two vectors of the same size and combines them (in a
natural way!) to create a new vector of the same size. Notice that this definition
is required, even if we agree that this is the obvious, right, natural or correct way
to do it. Notice too that the symbol ‘+’ is being recycled. We all know how to add
numbers, but now we have the same symbol extended to double-duty and we use
it to indicate how to add two new objects, vectors. And this definition of our new
meaning is built on our previous meaning of addition via the expressions u_i + v_i.
Think about your objects, especially when doing proofs. Vector addition is easy; here
is an example from C^4.
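Example VA Addition of two vectors in C^4
If
\[
\mathbf{u} = \begin{bmatrix} 2 \\ -3 \\ 4 \\ 2 \end{bmatrix}
\qquad
\mathbf{v} = \begin{bmatrix} -1 \\ 5 \\ 2 \\ -7 \end{bmatrix}
\]
then
\[
\mathbf{u} + \mathbf{v}
= \begin{bmatrix} 2 \\ -3 \\ 4 \\ 2 \end{bmatrix}
+ \begin{bmatrix} -1 \\ 5 \\ 2 \\ -7 \end{bmatrix}
= \begin{bmatrix} 2 + (-1) \\ -3 + 5 \\ 4 + 2 \\ 2 + (-7) \end{bmatrix}
= \begin{bmatrix} 1 \\ 2 \\ 6 \\ -5 \end{bmatrix}
\]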
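   Definition CVA translates almost word for word into code. The Python below is a
minimal sketch with vectors stored as lists; the name vector_add is chosen for this
illustration, and the explicit size check reflects the requirement that both vectors
come from the same C^m.

```python
def vector_add(u, v):
    """Definition CVA: add two column vectors of the same size entry by entry."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same size")
    return [u_i + v_i for u_i, v_i in zip(u, v)]


u = [2, -3, 4, 2]
v = [-1, 5, 2, -7]
print(vector_add(u, v))  # [1, 2, 6, -5], matching Example VA
```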
   Our second operation takes two objects of different types, specifically a number
and a vector, and combines them to create another vector. In this context we call a
number a scalar in order to emphasize that it is not a vector.
Definition CVSM Column Vector Scalar Multiplication
Suppose u ∈ C^m and α ∈ C, then the scalar multiple of u by α is the vector αu
defined by
\[
[\alpha\mathbf{u}]_i = \alpha\,[\mathbf{u}]_i \qquad 1 \le i \le m
\]
   Notice that we are doing a kind of multiplication here, but we are defining a new
type, perhaps in what appears to be a natural way. We use juxtaposition (smashing
two symbols together side-by-side) to denote this operation rather than using a
symbol like we did with vector addition. So this can be another source of confusion.
When two symbols are next to each other, are we doing regular old multiplication,
the kind we have done for years, or are we doing scalar vector multiplication, the
operation we just defined? Think about your objects — if the first object is a scalar,
and the second is a vector, then it must be that we are doing our new operation,
and the result of this operation will be another vector.
   Notice how consistency in notation can be an aid here. If we write scalars as
lower case Greek letters from the start of the alphabet (such as α, β, . . . ) and write
vectors in bold Latin letters from the end of the alphabet (u, v, . . . ), then we have
some hints about what type of objects we are working with. This can be a blessing
and a curse, since when we go read another book about linear algebra, or read an
application in another discipline (physics, economics, . . . ) the types of notation
employed may be very different and hence unfamiliar.
   Again, computationally, vector scalar multiplication is very easy.
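Example CVSM Scalar multiplication in C^5
If
\[
\mathbf{u} = \begin{bmatrix} 3 \\ 1 \\ -2 \\ 4 \\ -1 \end{bmatrix}
\]
and α = 6, then
\[
\alpha\mathbf{u}
= 6 \begin{bmatrix} 3 \\ 1 \\ -2 \\ 4 \\ -1 \end{bmatrix}
= \begin{bmatrix} 6(3) \\ 6(1) \\ 6(-2) \\ 6(4) \\ 6(-1) \end{bmatrix}
= \begin{bmatrix} 18 \\ 6 \\ -12 \\ 24 \\ -6 \end{bmatrix}.
\]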
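   Definition CVSM is equally direct to express in code. In this Python sketch the
name scalar_multiply is an illustrative choice and vectors are lists. Because scalars
come from C, the second call uses Python's built-in complex numbers (1j denotes the
imaginary unit).

```python
def scalar_multiply(alpha, u):
    """Definition CVSM: multiply every entry of the vector u by the scalar alpha."""
    return [alpha * u_i for u_i in u]


u = [3, 1, -2, 4, -1]
print(scalar_multiply(6, u))  # [18, 6, -12, 24, -6], matching Example CVSM

# A complex scalar acting on a vector with complex entries.
print(scalar_multiply(1j, [1, 1j]))
```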

Subsection VSP
Vector Space Properties
With definitions of vector addition and scalar multiplication we can state, and prove,
several properties of each operation, and some properties that involve their interplay.
We now collect ten of them here for later reference.
Theorem VSPCV Vector Space Properties of Column Vectors
Suppose that C^m is the set of column vectors of size m (Definition VSCV) with
addition and scalar multiplication as defined in Definition CVA and Definition CVSM.
Then

     • ACC Additive Closure, Column Vectors
       If u, v ∈ C^m, then u + v ∈ C^m.

     • SCC Scalar Closure, Column Vectors
       If α ∈ C and u ∈ C^m, then αu ∈ C^m.

     • CC Commutativity, Column Vectors
       If u, v ∈ C^m, then u + v = v + u.

     • AAC Additive Associativity, Column Vectors
       If u, v, w ∈ C^m, then u + (v + w) = (u + v) + w.

     • ZC Zero Vector, Column Vectors
       There is a vector, 0, called the zero vector, such that u + 0 = u for all u ∈ C^m.

     • AIC Additive Inverses, Column Vectors
       If u ∈ C^m, then there exists a vector −u ∈ C^m so that u + (−u) = 0.

     • SMAC Scalar Multiplication Associativity, Column Vectors
       If α, β ∈ C and u ∈ C^m, then α(βu) = (αβ)u.

     • DVAC Distributivity across Vector Addition, Column Vectors
       If α ∈ C and u, v ∈ C^m, then α(u + v) = αu + αv.

     • DSAC Distributivity across Scalar Addition, Column Vectors
       If α, β ∈ C and u ∈ C^m, then (α + β)u = αu + βu.

     • OC One, Column Vectors
       If u ∈ C^m, then 1u = u.

Proof. While some of these properties seem very obvious, they all require proof.
However, the proofs are not very interesting, and border on tedious. We will prove
one version of distributivity very carefully, and you can test your proof-building
skills on some of the others. We need to establish an equality, so we begin with
one side of the equality and apply various definitions and theorems (listed to the
right of each step) to massage the expression on the left into the expression on
the right. Here we go with a proof of Property DSAC.
    For 1 ≤ i ≤ m,
              [(α + β)u]i = (α + β) [u]i               Definition CVSM
                         = α [u]i + β [u]i             Property DCN
                         = [αu]i + [βu]i               Definition CVSM
                         = [αu + βu]i                  Definition CVA


    Since the individual components of the vectors (α + β)u and αu + βu are equal
for all i, 1 ≤ i ≤ m, Definition CVE tells us the vectors are equal.           ■
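
The entry-by-entry argument above can also be spot-checked numerically. What follows is a minimal Python sketch, not part of the original text: column vectors are modeled as lists of complex numbers, and the helper names `add` and `scale` are our own stand-ins for Definition CVA and Definition CVSM. A check on one example is, of course, not a proof.

```python
# Column vectors in C^m modeled as Python lists of complex numbers.
# The names add and scale are illustrative, not notation from the text.

def add(u, v):
    """Entry-by-entry sum of two vectors of the same size (Definition CVA)."""
    assert len(u) == len(v)
    return [ui + vi for ui, vi in zip(u, v)]

def scale(alpha, u):
    """Multiply every entry of u by the scalar alpha (Definition CVSM)."""
    return [alpha * ui for ui in u]

alpha, beta = 2 + 1j, -1 + 3j
u = [1 + 2j, -3, 4j]

lhs = scale(alpha + beta, u)                 # (alpha + beta)u
rhs = add(scale(alpha, u), scale(beta, u))   # alpha u + beta u
print(lhs == rhs)  # True
```

The same two helpers are enough to try out any of the other nine properties on examples of your own before attempting their proofs.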

    Many of the conclusions of our theorems can be characterized as “identities,”
especially when we are establishing basic properties of operations such as those in
this section. Most of the properties listed in Theorem VSPCV are examples. So
some advice about the style we use for proving identities is appropriate right now.
Have a look at Proof Technique PI.
    Be careful with the notion of the vector −u. This is a vector that we add to u
so that the result is the particular vector 0. This is basically a property of vector
addition. It happens that we can compute −u using the other operation, scalar
multiplication. We can prove this directly by writing that
                        [−u]i = − [u]i = (−1) [u]i = [(−1)u]i
We will see later how to derive this property as a consequence of several of the ten
 properties listed in Theorem VSPCV.
     Similarly, we will often write something you would immediately recognize as
“vector subtraction.” This could be placed on a firm theoretical foundation — as you
 can do yourself with Exercise VO.T30.
     A final note. Property AAC implies that we do not have to be careful about how
we “parenthesize” the addition of vectors. In other words, there is nothing to be
 gained by writing (u + v) + (w + (x + y)) rather than u + v + w + x + y, since we
 get the same result no matter which order we choose to perform the four additions.
 So we will not be careful about using parentheses this way.


Reading Questions

1. Where have you seen vectors used before in other courses? How were they different?
2. In words only, when are two vectors equal?
3. Perform the following computation with vector operations
\[
2\begin{bmatrix} 1 \\ 5 \\ 0 \end{bmatrix} + (-3)\begin{bmatrix} 7 \\ 6 \\ 5 \end{bmatrix}
\]


Exercises
C10†   Compute
\[
4\begin{bmatrix} 2 \\ -3 \\ 4 \\ 1 \\ 0 \end{bmatrix}
+ (-2)\begin{bmatrix} 1 \\ 2 \\ -5 \\ 2 \\ 4 \end{bmatrix}
+ \begin{bmatrix} -1 \\ 3 \\ 0 \\ 1 \\ 2 \end{bmatrix}
\]

C11†   Solve the given vector equation for x, or explain why no solution exists:
\[
3\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}
+ 4\begin{bmatrix} 2 \\ 0 \\ x \end{bmatrix}
= \begin{bmatrix} 11 \\ 6 \\ 17 \end{bmatrix}
\]

C12†   Solve the given vector equation for α, or explain why no solution exists:
\[
\alpha\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix}
+ 4\begin{bmatrix} 3 \\ 4 \\ 2 \end{bmatrix}
= \begin{bmatrix} -1 \\ 0 \\ 4 \end{bmatrix}
\]

C13†   Solve the given vector equation for α, or explain why no solution exists:
\[
\alpha\begin{bmatrix} 3 \\ 2 \\ -2 \end{bmatrix}
+ \begin{bmatrix} 6 \\ 1 \\ 2 \end{bmatrix}
= \begin{bmatrix} 0 \\ -3 \\ 6 \end{bmatrix}
\]

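C14†   Find α and β that solve the vector equation.
\[
\alpha\begin{bmatrix} 1 \\ 0 \end{bmatrix}
+ \beta\begin{bmatrix} 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} 3 \\ 2 \end{bmatrix}
\]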

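C15†   Find α and β that solve the vector equation.
\[
\alpha\begin{bmatrix} 2 \\ 1 \end{bmatrix}
+ \beta\begin{bmatrix} 1 \\ 3 \end{bmatrix}
= \begin{bmatrix} 5 \\ 0 \end{bmatrix}
\]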

T05† Provide reasons (mostly vector space properties) as justification for each of the
seven steps of the following proof.

Theorem For any vectors u, v, w ∈ Cm , if u + v = u + w, then v = w.

Proof : Let u, v, w ∈ Cm , and suppose u + v = u + w.
                                    v =0+v
                                        = (−u + u) + v
                                        = −u + (u + v)
                                        = −u + (u + w)
                                        = (−u + u) + w
                                        =0+w
                                        =w

T06† Provide reasons (mostly vector space properties) as justification for each of the six
steps of the following proof.

Theorem For any vector u ∈ Cm , 0u = 0.

Proof : Let u ∈ Cm .



                                 0 = 0u + (−0u)
                                   = (0 + 0)u + (−0u)
                                   = (0u + 0u) + (−0u)
                                   = 0u + (0u + (−0u))
                                   = 0u + 0
                                   = 0u

T07† Provide reasons (mostly vector space properties) as justification for each of the six
steps of the following proof.

Theorem For any scalar c, c 0 = 0.

Proof : Let c be an arbitrary scalar.



                                  0 = c0 + (−c0)
                                    = c(0 + 0) + (−c0)
                                    = (c0 + c0) + (−c0)
                                    = c0 + (c0 + (−c0))
                                    = c0 + 0
                                    = c0

T13† Prove Property CC of Theorem VSPCV. Write your proof in the style of the proof
of Property DSAC given in this section.
T17 Prove Property SMAC of Theorem VSPCV. Write your proof in the style of the
proof of Property DSAC given in this section.
T18 Prove Property DVAC of Theorem VSPCV. Write your proof in the style of the
proof of Property DSAC given in this section.


Exercises T30, T31 and T32 are about making a careful definition of “vector subtraction”.

 T30 Suppose u and v are two vectors in Cm . Define a new operation, called “sub-
 traction,” as the new vector denoted u − v and defined by
                   [u − v]i = [u]i − [v]i                  1≤i≤m
 Prove that we can express the subtraction of two vectors in terms of our two basic
 operations. More precisely, prove that u − v = u + (−1)v. So in a sense, subtraction is
 not something new and different, but is just a convenience. Mimic the style of similar
 proofs in this section.
 T31 Prove, by giving counterexamples, that vector subtraction is not commutative
 and not associative.
 T32 Prove that vector subtraction obeys a distributive property. Specifically, prove
 that α (u − v) = αu − αv.
 Can you give two different proofs? Distinguish your two proofs by using the alternate
 descriptions of vector subtraction provided by Exercise VO.T30.
Section LC
Linear Combinations
In Section VO we defined vector addition and scalar multiplication. These two
operations combine nicely to give us a construction known as a linear combination,
a construct that we will work with throughout this course.


Subsection LC
Linear Combinations
Definition LCCV Linear Combination of Column Vectors
Given n vectors u1 , u2 , u3 , . . . , un from Cm and n scalars α1 , α2 , α3 , . . . , αn , their
linear combination is the vector
                            α1 u1 + α2 u2 + α3 u3 + · · · + αn un
                                                                                              □
    So this definition takes an equal number of scalars and vectors, combines them
using our two new operations (scalar multiplication and vector addition) and creates
a single brand-new vector, of the same size as the original vectors. When a definition
or theorem employs a linear combination, think about the nature of the objects that
go into its creation (lists of scalars and vectors), and the type of object that results
(a single vector). Computationally, a linear combination is pretty easy.
Example TLC Two linear combinations in C6
Suppose that
            α1 = 1              α2 = −4               α3 = 2              α4 = −1
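and
\[
\mathbf{u}_1 = \begin{bmatrix} 2 \\ 4 \\ -3 \\ 1 \\ 2 \\ 9 \end{bmatrix} \qquad
\mathbf{u}_2 = \begin{bmatrix} 6 \\ 3 \\ 0 \\ -2 \\ 1 \\ 4 \end{bmatrix} \qquad
\mathbf{u}_3 = \begin{bmatrix} -5 \\ 2 \\ 1 \\ 1 \\ -3 \\ 0 \end{bmatrix} \qquad
\mathbf{u}_4 = \begin{bmatrix} 3 \\ 2 \\ -5 \\ 7 \\ 1 \\ 3 \end{bmatrix}
\]
then their linear combination is
\begin{align*}
\alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \alpha_4\mathbf{u}_4
&= (1)\begin{bmatrix} 2 \\ 4 \\ -3 \\ 1 \\ 2 \\ 9 \end{bmatrix}
 + (-4)\begin{bmatrix} 6 \\ 3 \\ 0 \\ -2 \\ 1 \\ 4 \end{bmatrix}
 + (2)\begin{bmatrix} -5 \\ 2 \\ 1 \\ 1 \\ -3 \\ 0 \end{bmatrix}
 + (-1)\begin{bmatrix} 3 \\ 2 \\ -5 \\ 7 \\ 1 \\ 3 \end{bmatrix} \\
&= \begin{bmatrix} 2 \\ 4 \\ -3 \\ 1 \\ 2 \\ 9 \end{bmatrix}
 + \begin{bmatrix} -24 \\ -12 \\ 0 \\ 8 \\ -4 \\ -16 \end{bmatrix}
 + \begin{bmatrix} -10 \\ 4 \\ 2 \\ 2 \\ -6 \\ 0 \end{bmatrix}
 + \begin{bmatrix} -3 \\ -2 \\ 5 \\ -7 \\ -1 \\ -3 \end{bmatrix}
 = \begin{bmatrix} -35 \\ -6 \\ 4 \\ 4 \\ -9 \\ -10 \end{bmatrix}
\end{align*}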
    A different linear combination, of the same set of vectors, can be formed with
different scalars. Take
            β1 = 3               β2 = 0              β3 = 5               β4 = −1
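and form the linear combination
\begin{align*}
\beta_1\mathbf{u}_1 + \beta_2\mathbf{u}_2 + \beta_3\mathbf{u}_3 + \beta_4\mathbf{u}_4
&= (3)\begin{bmatrix} 2 \\ 4 \\ -3 \\ 1 \\ 2 \\ 9 \end{bmatrix}
 + (0)\begin{bmatrix} 6 \\ 3 \\ 0 \\ -2 \\ 1 \\ 4 \end{bmatrix}
 + (5)\begin{bmatrix} -5 \\ 2 \\ 1 \\ 1 \\ -3 \\ 0 \end{bmatrix}
 + (-1)\begin{bmatrix} 3 \\ 2 \\ -5 \\ 7 \\ 1 \\ 3 \end{bmatrix} \\
&= \begin{bmatrix} 6 \\ 12 \\ -9 \\ 3 \\ 6 \\ 27 \end{bmatrix}
 + \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
 + \begin{bmatrix} -25 \\ 10 \\ 5 \\ 5 \\ -15 \\ 0 \end{bmatrix}
 + \begin{bmatrix} -3 \\ -2 \\ 5 \\ -7 \\ -1 \\ -3 \end{bmatrix}
 = \begin{bmatrix} -22 \\ 20 \\ 1 \\ 1 \\ -10 \\ 24 \end{bmatrix}
\end{align*}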
    Notice how we could keep our set of vectors fixed, and use different sets of scalars
to construct different vectors. You might build a few new linear combinations of
u1 , u2 , u3 , u4 right now. We will be right here when you get back. What vectors
were you able to create? Do you think you could create the vector w with a “suitable”
choice of four scalars?
\[
\mathbf{w} = \begin{bmatrix} 13 \\ 15 \\ 5 \\ -17 \\ 2 \\ 25 \end{bmatrix}
\]
Do you think you could create any possible vector from C6 by choosing the proper
scalars? These last two questions are very fundamental, and time spent considering
them now will prove beneficial later.                                           △
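
The two computations in Example TLC are easy to reproduce by machine. The following Python sketch is an illustration only: vectors are plain lists, and the function name `linear_combination` is our own choice, not notation from the text. It follows Definition LCCV directly, taking one scalar per vector and returning a single vector of the same size.

```python
def linear_combination(scalars, vectors):
    """Return a1*u1 + a2*u2 + ... + an*un for equal-size lists of numbers."""
    if len(scalars) != len(vectors):
        raise ValueError("need exactly one scalar per vector")
    size = len(vectors[0])
    result = [0] * size
    for a, u in zip(scalars, vectors):
        if len(u) != size:
            raise ValueError("all vectors must have the same size")
        # Scale u by a (scalar multiplication), then add entry by entry.
        result = [r + a * ui for r, ui in zip(result, u)]
    return result

# The four vectors of Example TLC.
u1 = [2, 4, -3, 1, 2, 9]
u2 = [6, 3, 0, -2, 1, 4]
u3 = [-5, 2, 1, 1, -3, 0]
u4 = [3, 2, -5, 7, 1, 3]
vectors = [u1, u2, u3, u4]

first = linear_combination([1, -4, 2, -1], vectors)
second = linear_combination([3, 0, 5, -1], vectors)
print(first)   # [-35, -6, 4, 4, -9, -10]
print(second)  # [-22, 20, 1, 1, -10, 24]
```

Changing the list of scalars while keeping `vectors` fixed is a quick way to experiment with the questions posed at the end of the example.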
   Our next two examples are key ones, and a discussion about decompositions is
timely. Have a look at Proof Technique DC before studying the next two examples.
Example ABLC Archetype B as a linear combination
In this example we will rewrite Archetype B in the language of vectors, vector
equality and linear combinations. In Example VESE we wrote the system of m = 3
equations as the vector equality
\[
\begin{bmatrix} -7x_1 - 6x_2 - 12x_3 \\ 5x_1 + 5x_2 + 7x_3 \\ x_1 + 4x_3 \end{bmatrix}
= \begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}
\]
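     Now we will bust up the linear expressions on the left, first using vector addition,
\[
\begin{bmatrix} -7x_1 \\ 5x_1 \\ x_1 \end{bmatrix}
+ \begin{bmatrix} -6x_2 \\ 5x_2 \\ 0x_2 \end{bmatrix}
+ \begin{bmatrix} -12x_3 \\ 7x_3 \\ 4x_3 \end{bmatrix}
= \begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}
\]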
    Now we can rewrite each of these vectors as a scalar multiple of a fixed vector,
where the scalar is one of the unknown variables, converting the left-hand side into
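a linear combination
\[
x_1\begin{bmatrix} -7 \\ 5 \\ 1 \end{bmatrix}
+ x_2\begin{bmatrix} -6 \\ 5 \\ 0 \end{bmatrix}
+ x_3\begin{bmatrix} -12 \\ 7 \\ 4 \end{bmatrix}
= \begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}
\]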
    We can now interpret the problem of solving the system of equations as determin-
ing values for the scalar multiples that make the vector equation true. In the analysis
of Archetype B, we were able to determine that it had only one solution. A quick
way to see this is to row-reduce the coefficient matrix to the 3 × 3 identity matrix
and apply Theorem NMRRI to determine that the coefficient matrix is nonsingular.
Then Theorem NMUS tells us that the system of equations has a unique solution.
This solution is
                 x1 = −3                  x2 = 5                 x3 = 2
   So, in the context of this example, we can express the fact that these values of
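the variables are a solution by writing the linear combination,
\[
(-3)\begin{bmatrix} -7 \\ 5 \\ 1 \end{bmatrix}
+ (5)\begin{bmatrix} -6 \\ 5 \\ 0 \end{bmatrix}
+ (2)\begin{bmatrix} -12 \\ 7 \\ 4 \end{bmatrix}
= \begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}
\]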
    Furthermore, these are the only three scalars that will accomplish this equality,
since they come from a unique solution.

   Notice how the three vectors in this example are the columns of the coefficient
matrix of the system of equations. This is our first hint of the important interplay
between the vectors that form the columns of a matrix, and the matrix itself. △
   With any discussion of Archetype A or Archetype B we should be sure to contrast
with the other.
Example AALC Archetype A as a linear combination
As a vector equality, Archetype A can be written as
\[
\begin{bmatrix} x_1 - x_2 + 2x_3 \\ 2x_1 + x_2 + x_3 \\ x_1 + x_2 \end{bmatrix}
= \begin{bmatrix} 1 \\ 8 \\ 5 \end{bmatrix}
\]
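   Now bust up the linear expressions on the left, first using vector addition,
\[
\begin{bmatrix} x_1 \\ 2x_1 \\ x_1 \end{bmatrix}
+ \begin{bmatrix} -x_2 \\ x_2 \\ x_2 \end{bmatrix}
+ \begin{bmatrix} 2x_3 \\ x_3 \\ 0x_3 \end{bmatrix}
= \begin{bmatrix} 1 \\ 8 \\ 5 \end{bmatrix}
\]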
   Rewrite each of these vectors as a scalar multiple of a fixed vector, where the
scalar is one of the unknown variables, converting the left-hand side into a linear
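combination
\[
x_1\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
+ x_2\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}
+ x_3\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 1 \\ 8 \\ 5 \end{bmatrix}
\]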
   Row-reducing the augmented matrix for Archetype A leads to the conclusion
that the system is consistent and has free variables, hence infinitely many solutions.
So for example, the two solutions
                x1 = 2                 x2 = 3                 x3 = 1
                x1 = 3                 x2 = 2                 x3 = 0
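can be used together to say that,
\[
(2)\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
+ (3)\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}
+ (1)\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 1 \\ 8 \\ 5 \end{bmatrix}
= (3)\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
+ (2)\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}
+ (0)\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}
\]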
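Ignore the middle of this equation, and move all the terms to the left-hand side,
\[
(2)\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
+ (3)\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}
+ (1)\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}
+ (-3)\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
+ (-2)\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}
+ (-0)\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]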
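Regrouping gives
\[
(-1)\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}
+ (1)\begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}
+ (1)\begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
\]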
    Notice that these three vectors are the columns of the coefficient matrix for the
system of equations in Archetype A. This equality says there is a linear combination
of those columns that equals the vector of all zeros. Give it some thought, but this
says that
               x1 = −1                  x2 = 1                x3 = 1
is a nontrivial solution to the homogeneous system of equations with the coefficient
matrix for the original system in Archetype A. In particular, this demonstrates that
this coefficient matrix is singular.                                              △
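
The regrouping step in Example AALC can be replayed in a few lines. This Python sketch is an illustration under our own naming (`combine`, `columns`, `difference` are not from the text): it subtracts one solution from the other, entry by entry, and checks that the resulting scalars combine the columns of the coefficient matrix into the zero vector.

```python
def combine(scalars, vectors):
    """Linear combination of equal-size vectors, computed entry by entry."""
    size = len(vectors[0])
    return [sum(a * v[i] for a, v in zip(scalars, vectors)) for i in range(size)]

# Columns of the coefficient matrix of Archetype A, and the vector of constants.
columns = [[1, 2, 1], [-1, 1, 1], [2, 1, 0]]
b = [1, 8, 5]

s1 = [2, 3, 1]   # first solution from the example
s2 = [3, 2, 0]   # second solution from the example
print(combine(s1, columns) == b, combine(s2, columns) == b)  # True True

# Subtracting the scalars of one solution from the other mirrors "regrouping".
difference = [p - q for p, q in zip(s1, s2)]
print(difference)                    # [-1, 1, 1]
print(combine(difference, columns))  # [0, 0, 0]
```

Any two distinct solutions would serve equally well here; their difference always produces a nontrivial combination of the columns equal to the zero vector.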
   There is a lot going on in the last two examples. Come back to them in a while and
make some connections with the intervening material. For now, we will summarize
and explain some of this behavior with a theorem.
Theorem SLSLC Solutions to Linear Systems are Linear Combinations
Denote the columns of the m × n matrix A as the vectors A1 , A2 , A3 , . . . , An .
Then x ∈ Cn is a solution to the linear system of equations LS(A, b) if and only if
b equals the linear combination of the columns of A formed with the entries of x,
                      [x]1 A1 + [x]2 A2 + [x]3 A3 + · · · + [x]n An = b




Proof. The proof of this theorem is as much about a change in notation as it is
about making logical deductions. Write the system of equations LS(A, b) as
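\begin{align*}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2n}x_n &= b_2 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \cdots + a_{3n}x_n &= b_3 \\
&\ \,\vdots \\
a_{m1}x_1 + a_{m2}x_2 + a_{m3}x_3 + \cdots + a_{mn}x_n &= b_m
\end{align*}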

     Notice then that the entry of the coefficient matrix A in row i and column j has
 two names: aij as the coefficient of xj in equation i of the system and [Aj ]i as the
 i-th entry of the column vector in column j of the coefficient matrix A. Likewise,
 entry i of b has two names: bi from the linear system and [b]i as an entry of a
vector. Our theorem is an equivalence (Proof Technique E) so we need to prove both
“directions.”
    (⇐) Suppose we have the vector equality between b and the linear combination
of the columns of A. Then for 1 ≤ i ≤ m,
     bi = [b]i                                                             Definition CV
        = [[x]1 A1 + [x]2 A2 + [x]3 A3 + · · · + [x]n An ]i                Hypothesis
        = [[x]1 A1 ]i + [[x]2 A2 ]i + [[x]3 A3 ]i + · · · + [[x]n An ]i    Definition CVA
        = [x]1 [A1 ]i + [x]2 [A2 ]i + [x]3 [A3 ]i + · · · + [x]n [An ]i    Definition CVSM
        = [x]1 ai1 + [x]2 ai2 + [x]3 ai3 + · · · + [x]n ain                Definition CV
        = ai1 [x]1 + ai2 [x]2 + ai3 [x]3 + · · · + ain [x]n                Property CMCN

   This says that the entries of x form a solution to equation i of LS(A, b) for all
1 ≤ i ≤ m, in other words, x is a solution to LS(A, b).
    (⇒) Suppose now that x is a solution to the linear system LS(A, b). Then for
all 1 ≤ i ≤ m,
     [b]i = bi                                                             Definition CV
         = ai1 [x]1 + ai2 [x]2 + ai3 [x]3 + · · · + ain [x]n               Hypothesis
         = [x]1 ai1 + [x]2 ai2 + [x]3 ai3 + · · · + [x]n ain               Property CMCN
         = [x]1 [A1 ]i + [x]2 [A2 ]i + [x]3 [A3 ]i + · · · + [x]n [An ]i   Definition CV
         = [[x]1 A1 ]i + [[x]2 A2 ]i + [[x]3 A3 ]i + · · · + [[x]n An ]i   Definition CVSM
         = [[x]1 A1 + [x]2 A2 + [x]3 A3 + · · · + [x]n An ]i               Definition CVA
So the entries of the vector b, and the entries of the vector that is the linear
combination of the columns of A, agree for all 1 ≤ i ≤ m. By Definition CVE we see
that the two vectors are equal, as desired.                                     ■




    In other words, this theorem tells us that solutions to systems of equations are
linear combinations of the n column vectors of the coefficient matrix (Aj ) which
yield the constant vector b. Or said another way, a solution to a system of equations
LS(A, b) is an answer to the question “How can I form the vector b as a linear
combination of the columns of A?” Look through the archetypes that are systems
of equations and examine a few of the advertised solutions. In each case use the
solution to form a linear combination of the columns of the coefficient matrix and
verify that the result equals the constant vector (see Exercise LC.C21).
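
Theorem SLSLC says two tests must always agree: checking the equations one row at a time, and checking whether the columns of A combine to give b. The Python sketch below illustrates this with Archetype B, using the coefficients from Example ABLC. The function names `is_solution_by_rows` and `column_combination` are hypothetical helpers of our own, and matrices are represented simply as lists of rows.

```python
def is_solution_by_rows(A, x, b):
    """Check each equation a_i1 x_1 + ... + a_in x_n = b_i in turn."""
    return all(sum(a * xj for a, xj in zip(row, x)) == bi
               for row, bi in zip(A, b))

def column_combination(A, x):
    """Form [x]_1 A_1 + [x]_2 A_2 + ... + [x]_n A_n from the columns of A."""
    m, n = len(A), len(A[0])
    result = [0] * m
    for j in range(n):
        column_j = [A[i][j] for i in range(m)]
        result = [r + x[j] * c for r, c in zip(result, column_j)]
    return result

# Coefficient matrix and vector of constants for Archetype B.
A = [[-7, -6, -12],
     [5, 5, 7],
     [1, 0, 4]]
b = [-33, 24, 5]

for x in ([-3, 5, 2], [1, 0, 0]):
    print(x, is_solution_by_rows(A, x, b), column_combination(A, x) == b)
# [-3, 5, 2] True True
# [1, 0, 0] False False
```

The two boolean columns match for every choice of x, which is the content of the theorem; the code only rearranges the same multiplications and additions, just as the proof does.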

Subsection VFSS
Vector Form of Solution Sets

We have written solutions to systems of equations as column vectors. For example
Archetype B has the solution x1 = −3, x2 = 5, x3 = 2 which we write as
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= \begin{bmatrix} -3 \\ 5 \\ 2 \end{bmatrix}
\]

    Now, we will use column vectors and linear combinations to express all of the
solutions to a linear system of equations in a compact and understandable way.
First, here are two examples that will motivate our next theorem. This is a valuable
technique, almost the equal of row-reducing a matrix, so be sure you get comfortable
with it over the course of this section.

Example VFSAD Vector form of solutions for Archetype D
Archetype D is a linear system of 3 equations in 4 variables. Row-reducing the
augmented matrix yields
\[
\begin{bmatrix}
1 & 0 & 3 & -2 & 4 \\
0 & 1 & 1 & -3 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]
and we see r = 2 pivot columns. Also, D = {1, 2} so the dependent variables are
then x1 and x2 . F = {3, 4, 5} so the two free variables are x3 and x4 . We will
express a generic solution for the system by two slightly different methods, though
both arrive at the same conclusion.
    First, we will decompose (Proof Technique DC) a solution vector. Rearranging
each equation represented in the row-reduced form of the augmented matrix by
solving for the dependent variable in each row yields the vector equality,
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 4 - 3x_3 + 2x_4 \\ -x_3 + 3x_4 \\ x_3 \\ x_4 \end{bmatrix}
\]
Now we will use the definitions of column vector addition and scalar multiplication
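to express this vector as a linear combination,
\begin{align*}
&= \begin{bmatrix} 4 \\ 0 \\ 0 \\ 0 \end{bmatrix}
 + \begin{bmatrix} -3x_3 \\ -x_3 \\ x_3 \\ 0 \end{bmatrix}
 + \begin{bmatrix} 2x_4 \\ 3x_4 \\ 0 \\ x_4 \end{bmatrix}
 && \text{Definition CVA} \\
&= \begin{bmatrix} 4 \\ 0 \\ 0 \\ 0 \end{bmatrix}
 + x_3\begin{bmatrix} -3 \\ -1 \\ 1 \\ 0 \end{bmatrix}
 + x_4\begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix}
 && \text{Definition CVSM}
\end{align*}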


   We will develop the same linear combination a bit quicker, using three steps.
While the method above is instructive, the method below will be our preferred
approach.
    Step 1. Write the vector of variables as a fixed vector, plus a linear combination
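of n − r vectors, using the free variables as the scalars. The entries of the
vectors are left blank for now.
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \end{bmatrix}
+ x_3\begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \end{bmatrix}
+ x_4\begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \end{bmatrix}
\]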

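   Step 2. Use 0’s and 1’s to ensure equality for the entries of the vectors with
indices in F (corresponding to the free variables).
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \end{bmatrix}
+ x_3\begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 1 \\ 0 \end{bmatrix}
+ x_4\begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 1 \end{bmatrix}
\]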

    Step 3. For each dependent variable, use the augmented matrix to formulate an
equation expressing the dependent variable as a constant plus multiples of the free
variables. Convert this equation into entries of the vectors that ensure equality for
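each dependent variable, one at a time.
\begin{align*}
x_1 = 4 - 3x_3 + 2x_4 \quad &\Rightarrow \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 4 \\ \phantom{0} \\ 0 \\ 0 \end{bmatrix}
+ x_3\begin{bmatrix} -3 \\ \phantom{0} \\ 1 \\ 0 \end{bmatrix}
+ x_4\begin{bmatrix} 2 \\ \phantom{0} \\ 0 \\ 1 \end{bmatrix} \\
x_2 = 0 - 1x_3 + 3x_4 \quad &\Rightarrow \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 4 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_3\begin{bmatrix} -3 \\ -1 \\ 1 \\ 0 \end{bmatrix}
+ x_4\begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix}
\end{align*}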

   This final form of a typical solution is especially pleasing and useful. For example,
we can build solutions quickly by choosing values for our free variables, and then
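compute a linear combination. Such as
\[
x_3 = 2,\ x_4 = -5 \quad \Rightarrow \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 4 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ (2)\begin{bmatrix} -3 \\ -1 \\ 1 \\ 0 \end{bmatrix}
+ (-5)\begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} -12 \\ -17 \\ 2 \\ -5 \end{bmatrix}
\]
or,
\[
x_3 = 1,\ x_4 = 3 \quad \Rightarrow \quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix}
= \begin{bmatrix} 4 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ (1)\begin{bmatrix} -3 \\ -1 \\ 1 \\ 0 \end{bmatrix}
+ (3)\begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} 7 \\ 8 \\ 1 \\ 3 \end{bmatrix}
\]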

   You will find the second solution listed in the write-up for Archetype D, and you
might check the first solution by substituting it back into the original equations.

    While this form is useful for quickly creating solutions, it is even better because
it tells us exactly what every solution looks like. We know the solution set is infinite,
which is pretty big, but now we can say that a solution is some multiple of
\(\begin{bmatrix} -3 \\ -1 \\ 1 \\ 0 \end{bmatrix}\) plus a multiple of
\(\begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix}\) plus the fixed vector
\(\begin{bmatrix} 4 \\ 0 \\ 0 \\ 0 \end{bmatrix}\). Period. So it only takes us three
vectors to describe the entire infinite solution set, provided we also agree on how to
combine the three vectors into a linear combination.                                  △
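
To see the vector form of Example VFSAD at work, here is a short Python sketch. It is an illustration only: the names `c`, `v3`, `v4`, `solution` and `satisfies` are ours, and solutions are checked against the two nonzero rows of the row-reduced augmented matrix shown in the example rather than against the original equations of Archetype D.

```python
# Vector form from Example VFSAD:  x = c + x3 * v3 + x4 * v4
c = [4, 0, 0, 0]      # the fixed vector
v3 = [-3, -1, 1, 0]   # vector scaled by the free variable x3
v4 = [2, 3, 0, 1]     # vector scaled by the free variable x4

def solution(x3, x4):
    """Build the solution determined by a choice of the two free variables."""
    return [ci + x3 * a + x4 * b for ci, a, b in zip(c, v3, v4)]

# Nonzero rows of the row-reduced augmented matrix; the last entry is the constant.
reduced = [[1, 0, 3, -2, 4],
           [0, 1, 1, -3, 0]]

def satisfies(x):
    """True when x solves every equation encoded by the rows of `reduced`."""
    return all(sum(a * xi for a, xi in zip(row[:-1], x)) == row[-1]
               for row in reduced)

print(solution(2, -5))  # [-12, -17, 2, -5]
print(solution(1, 3))   # [7, 8, 1, 3]

# Every choice of free variables on a small grid gives a solution.
print(all(satisfies(solution(s, t))
          for s in range(-3, 4) for t in range(-3, 4)))  # True
```

Note that the third and fourth entries of `solution(x3, x4)` are always `x3` and `x4` themselves, which is exactly what the 0's and 1's of Step 2 arrange.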

      This is such an important and fundamental technique, we will do another example.

Example VFS Vector form of solutions
Consider a linear system of m = 5 equations in n = 7 variables, having the augmented
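matrix A.
\[
A = \begin{bmatrix}
2 & 1 & -1 & -2 & 2 & 1 & 5 & 21 \\
1 & 1 & -3 & 1 & 1 & 1 & 2 & -5 \\
1 & 2 & -8 & 5 & 1 & 1 & -6 & -15 \\
3 & 3 & -9 & 3 & 6 & 5 & 2 & -24 \\
-2 & -1 & 1 & 2 & 1 & 1 & -9 & -30
\end{bmatrix}
\]
   Row-reducing we obtain the matrix
\[
B = \begin{bmatrix}
1 & 0 & 2 & -3 & 0 & 0 & 9 & 15 \\
0 & 1 & -5 & 4 & 0 & 0 & -8 & -10 \\
0 & 0 & 0 & 0 & 1 & 0 & -6 & 11 \\
0 & 0 & 0 & 0 & 0 & 1 & 7 & -21 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]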
and we see r = 4 pivot columns. Also, D = {1, 2, 5, 6} so the dependent variables
are then x1 , x2 , x5 , and x6 . F = {3, 4, 7, 8} so the n − r = 3 free variables are
x3 , x4 and x7 . We will express a generic solution for the system by two different
methods: both a decomposition and a construction.
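The row reduction above can be checked by machine. What follows is a minimal sketch in Python, using exact rational arithmetic from the standard `fractions` module. The function names `rref` and `pivot_columns` are our own illustrative choices, not anything defined in this text. The sketch reproduces B from A and recovers the index sets D and F, reported 1-based to match the notation here.

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row-echelon form of a matrix (a list of rows)."""
    M = [[Fraction(x) for x in row] for row in rows]
    m, n = len(M), len(M[0])
    r = 0  # row where the next pivot will be placed
    for j in range(n):
        if r == m:
            break
        # Look for a nonzero entry in column j, at or below row r.
        p = next((i for i in range(r, m) if M[i][j] != 0), None)
        if p is None:
            continue  # no pivot in this column
        M[r], M[p] = M[p], M[r]            # swap the pivot row into place
        pivot = M[r][j]
        M[r] = [x / pivot for x in M[r]]   # scale to get a leading 1
        for i in range(m):                 # clear the rest of column j
            if i != r and M[i][j] != 0:
                factor = M[i][j]
                M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return M

def pivot_columns(R):
    """1-based column indices of the leading 1's of a matrix in RREF."""
    cols = []
    for row in R:
        lead = next((j for j, x in enumerate(row) if x != 0), None)
        if lead is not None:
            cols.append(lead + 1)
    return cols

# Augmented matrix A of Example VFS.
A = [[ 2,  1, -1, -2, 2, 1,  5,  21],
     [ 1,  1, -3,  1, 1, 1,  2,  -5],
     [ 1,  2, -8,  5, 1, 1, -6, -15],
     [ 3,  3, -9,  3, 6, 5,  2, -24],
     [-2, -1,  1,  2, 1, 1, -9, -30]]

B = rref(A)
D = pivot_columns(B)
F = [j for j in range(1, len(A[0]) + 1) if j not in D]
print(D, F)
```

Exact fractions are used instead of floating point so that the test for a zero entry is reliable.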
     First, we will decompose (Proof Technique DC) a solution vector. Rearranging
each equation represented in the row-reduced form of the augmented matrix by
solving for the dependent variable in each row yields the vector equality,
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
=
\begin{bmatrix}
15 - 2x_3 + 3x_4 - 9x_7 \\
-10 + 5x_3 - 4x_4 + 8x_7 \\
x_3 \\
x_4 \\
11 + 6x_7 \\
-21 - 7x_7 \\
x_7
\end{bmatrix}
\]
Now we will use the definitions of column vector addition and scalar multiplication
to decompose this generic solution vector as a linear combination,
\begin{align*}
&= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ -21 \\ 0 \end{bmatrix}
 + \begin{bmatrix} -2x_3 \\ 5x_3 \\ x_3 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
 + \begin{bmatrix} 3x_4 \\ -4x_4 \\ 0 \\ x_4 \\ 0 \\ 0 \\ 0 \end{bmatrix}
 + \begin{bmatrix} -9x_7 \\ 8x_7 \\ 0 \\ 0 \\ 6x_7 \\ -7x_7 \\ x_7 \end{bmatrix}
 && \text{Definition CVA} \\
&= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ -21 \\ 0 \end{bmatrix}
 + x_3 \begin{bmatrix} -2 \\ 5 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
 + x_4 \begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
 + x_7 \begin{bmatrix} -9 \\ 8 \\ 0 \\ 0 \\ 6 \\ -7 \\ 1 \end{bmatrix}
 && \text{Definition CVSM}
\end{align*}
We will now develop the same linear combination a bit quicker, using three steps.
While the method above is instructive, the method below will be our preferred
approach.
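    Step 1. Write the vector of variables as a fixed vector, plus a linear combination
of n − r vectors, using the free variables as the scalars.
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \end{bmatrix}
+ x_3 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \end{bmatrix}
+ x_4 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \end{bmatrix}
+ x_7 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \end{bmatrix}
\]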
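   Step 2. Use 0’s and 1’s to ensure equality for the entries of the vectors with
indices in F (corresponding to the free variables).
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ \phantom{0} \\ \phantom{0} \\ 0 \end{bmatrix}
+ x_3 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 1 \\ 0 \\ \phantom{0} \\ \phantom{0} \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 1 \\ \phantom{0} \\ \phantom{0} \\ 0 \end{bmatrix}
+ x_7 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ \phantom{0} \\ \phantom{0} \\ 1 \end{bmatrix}
\]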
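    Step 3. For each dependent variable, use the augmented matrix to formulate an
equation expressing the dependent variable as a constant plus multiples of the free
variables. Convert this equation into entries of the vectors that ensure equality for
each dependent variable, one at a time.
\[ x_1 = 15 - 2x_3 + 3x_4 - 9x_7 \quad\Rightarrow \]
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 15 \\ \phantom{0} \\ 0 \\ 0 \\ \phantom{0} \\ \phantom{0} \\ 0 \end{bmatrix}
+ x_3 \begin{bmatrix} -2 \\ \phantom{0} \\ 1 \\ 0 \\ \phantom{0} \\ \phantom{0} \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} 3 \\ \phantom{0} \\ 0 \\ 1 \\ \phantom{0} \\ \phantom{0} \\ 0 \end{bmatrix}
+ x_7 \begin{bmatrix} -9 \\ \phantom{0} \\ 0 \\ 0 \\ \phantom{0} \\ \phantom{0} \\ 1 \end{bmatrix}
\]
\[ x_2 = -10 + 5x_3 - 4x_4 + 8x_7 \quad\Rightarrow \]
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ \phantom{0} \\ \phantom{0} \\ 0 \end{bmatrix}
+ x_3 \begin{bmatrix} -2 \\ 5 \\ 1 \\ 0 \\ \phantom{0} \\ \phantom{0} \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \\ \phantom{0} \\ \phantom{0} \\ 0 \end{bmatrix}
+ x_7 \begin{bmatrix} -9 \\ 8 \\ 0 \\ 0 \\ \phantom{0} \\ \phantom{0} \\ 1 \end{bmatrix}
\]
\[ x_5 = 11 + 6x_7 \quad\Rightarrow \]
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ \phantom{0} \\ 0 \end{bmatrix}
+ x_3 \begin{bmatrix} -2 \\ 5 \\ 1 \\ 0 \\ 0 \\ \phantom{0} \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \\ 0 \\ \phantom{0} \\ 0 \end{bmatrix}
+ x_7 \begin{bmatrix} -9 \\ 8 \\ 0 \\ 0 \\ 6 \\ \phantom{0} \\ 1 \end{bmatrix}
\]
\[ x_6 = -21 - 7x_7 \quad\Rightarrow \]
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ -21 \\ 0 \end{bmatrix}
+ x_3 \begin{bmatrix} -2 \\ 5 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_7 \begin{bmatrix} -9 \\ 8 \\ 0 \\ 0 \\ 6 \\ -7 \\ 1 \end{bmatrix}
\]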

   This final form of a typical solution is especially pleasing and useful. We can
build solutions quickly by choosing values for our free variables and then
computing a linear combination. For example,
\[ x_3 = 2,\ x_4 = -4,\ x_7 = 3 \quad\Rightarrow \]
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ -21 \\ 0 \end{bmatrix}
+ (2) \begin{bmatrix} -2 \\ 5 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ (-4) \begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ (3) \begin{bmatrix} -9 \\ 8 \\ 0 \\ 0 \\ 6 \\ -7 \\ 1 \end{bmatrix}
= \begin{bmatrix} -28 \\ 40 \\ 2 \\ -4 \\ 29 \\ -42 \\ 3 \end{bmatrix}
\]
or perhaps,
\[ x_3 = 5,\ x_4 = 2,\ x_7 = 1 \quad\Rightarrow \]
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ -21 \\ 0 \end{bmatrix}
+ (5) \begin{bmatrix} -2 \\ 5 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ (2) \begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ (1) \begin{bmatrix} -9 \\ 8 \\ 0 \\ 0 \\ 6 \\ -7 \\ 1 \end{bmatrix}
= \begin{bmatrix} 2 \\ 15 \\ 5 \\ 2 \\ 17 \\ -28 \\ 1 \end{bmatrix}
\]
or even,
\[ x_3 = 0,\ x_4 = 0,\ x_7 = 0 \quad\Rightarrow \]
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ -21 \\ 0 \end{bmatrix}
+ (0) \begin{bmatrix} -2 \\ 5 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ (0) \begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ (0) \begin{bmatrix} -9 \\ 8 \\ 0 \\ 0 \\ 6 \\ -7 \\ 1 \end{bmatrix}
= \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ -21 \\ 0 \end{bmatrix}
\]
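These computations are mechanical, so they are easy to hand to a computer. Here is a small Python sketch that forms the fixed vector plus the three scaled vectors and then confirms that the result satisfies all five equations of the original system. The names `c`, `u1`, `u2`, `u3`, `solution` and `satisfies_system` are our own labels for this illustration.

```python
# Augmented matrix of the original system (last column holds the constants).
A = [[ 2,  1, -1, -2, 2, 1,  5,  21],
     [ 1,  1, -3,  1, 1, 1,  2,  -5],
     [ 1,  2, -8,  5, 1, 1, -6, -15],
     [ 3,  3, -9,  3, 6, 5,  2, -24],
     [-2, -1,  1,  2, 1, 1, -9, -30]]

c  = [15, -10, 0, 0, 11, -21, 0]   # fixed vector
u1 = [-2,   5, 1, 0,  0,   0, 0]   # scaled by x3
u2 = [ 3,  -4, 0, 1,  0,   0, 0]   # scaled by x4
u3 = [-9,   8, 0, 0,  6,  -7, 1]   # scaled by x7

def solution(x3, x4, x7):
    """Entry by entry, compute c + x3*u1 + x4*u2 + x7*u3."""
    return [ci + x3 * a + x4 * b + x7 * d
            for ci, a, b, d in zip(c, u1, u2, u3)]

def satisfies_system(x):
    """Check every equation: coefficients dotted with x equal the constant."""
    return all(sum(a * xi for a, xi in zip(row[:-1], x)) == row[-1]
               for row in A)

for free_values in [(2, -4, 3), (5, 2, 1), (0, 0, 0)]:
    x = solution(*free_values)
    print(free_values, x, satisfies_system(x))
```

Any other choice of the three free values can be passed to `solution` in the same way.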
    So we can compactly express all of the solutions to this linear system with just 4
fixed vectors, provided we agree how to combine them in a linear combination to
create solution vectors.
    Suppose you were told that the vector w below was a solution to this system of
equations. Could you turn the problem around and write w as a linear combination
of the four vectors c, u1 , u2 , u3 ? (See Exercise LC.M11.)
\[
\mathbf{w} = \begin{bmatrix} 100 \\ -75 \\ 7 \\ 9 \\ -37 \\ 35 \\ -8 \end{bmatrix}
\quad
\mathbf{c} = \begin{bmatrix} 15 \\ -10 \\ 0 \\ 0 \\ 11 \\ -21 \\ 0 \end{bmatrix}
\quad
\mathbf{u}_1 = \begin{bmatrix} -2 \\ 5 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\quad
\mathbf{u}_2 = \begin{bmatrix} 3 \\ -4 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\quad
\mathbf{u}_3 = \begin{bmatrix} -9 \\ 8 \\ 0 \\ 0 \\ 6 \\ -7 \\ 1 \end{bmatrix}
\]
    Did you think a few weeks ago that you could so quickly and easily list all the
solutions to a linear system of 5 equations in 7 variables?
    We will now formalize the last two (important) examples as a theorem. The
statement of this theorem is a bit scary, and the proof is scarier. For now, be sure to
convince yourself, by working through the examples and exercises, that the statement
just describes the procedure of the two immediately previous examples.
Theorem VFSLS Vector Form of Solutions to Linear Systems
Suppose that [ A | b ] is the augmented matrix for a consistent linear system LS(A, b)
of m equations in n variables. Let B be a row-equivalent m × (n + 1) matrix
in reduced row-echelon form. Suppose that B has r pivot columns, with indices
$D = \{d_1, d_2, d_3, \ldots, d_r\}$, while the n − r non-pivot columns have indices in
$F = \{f_1, f_2, f_3, \ldots, f_{n-r}, n+1\}$. Define vectors $\mathbf{c}$, $\mathbf{u}_j$, $1 \le j \le n-r$ of size n by
\[
[\mathbf{c}]_i =
\begin{cases}
0 & \text{if } i \in F \\
[B]_{k,n+1} & \text{if } i \in D,\ i = d_k
\end{cases}
\]
\[
[\mathbf{u}_j]_i =
\begin{cases}
1 & \text{if } i \in F,\ i = f_j \\
0 & \text{if } i \in F,\ i \ne f_j \\
-[B]_{k,f_j} & \text{if } i \in D,\ i = d_k
\end{cases}
\]
   Then the set of solutions to the system of equations LS(A, b) is
\[
S = \left\{ \mathbf{c} + \alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_{n-r}\mathbf{u}_{n-r} \;\middle|\; \alpha_1, \alpha_2, \alpha_3, \ldots, \alpha_{n-r} \in \mathbb{C} \right\}
\]

Proof. First, LS(A, b) is equivalent to the linear system of equations that has the
matrix B as its augmented matrix (Theorem REMES), so we need only show that S
is the solution set for the system with B as its augmented matrix. The conclusion of
this theorem is that the solution set is equal to the set S, so we will apply Definition
SE.
    We begin by showing that every element of S is indeed a solution to the system.
Let α1 , α2 , α3 , . . . , αn−r be one choice of the scalars used to describe elements of
S. So an arbitrary element of S, which we will consider as a proposed solution is
                   x = c + α1 u1 + α2 u2 + α3 u3 + · · · + αn−r un−r
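   When $r + 1 \le \ell \le m$, row $\ell$ of the matrix B is a zero row, so the equation
represented by that row is always true, no matter which solution vector we propose.
So concentrate on rows representing equations $1 \le \ell \le r$. We evaluate equation $\ell$ of
the system represented by B with the proposed solution vector x and refer to the
value of the left-hand side of the equation as $\beta_\ell$,
\[
\beta_\ell = [B]_{\ell 1}[\mathbf{x}]_1 + [B]_{\ell 2}[\mathbf{x}]_2 + [B]_{\ell 3}[\mathbf{x}]_3 + \cdots + [B]_{\ell n}[\mathbf{x}]_n
\]
     Since $[B]_{\ell d_i} = 0$ for all $1 \le i \le r$, except that $[B]_{\ell d_\ell} = 1$, we see that $\beta_\ell$ simplifies
to
\[
\beta_\ell = [\mathbf{x}]_{d_\ell} + [B]_{\ell f_1}[\mathbf{x}]_{f_1} + [B]_{\ell f_2}[\mathbf{x}]_{f_2} + [B]_{\ell f_3}[\mathbf{x}]_{f_3} + \cdots + [B]_{\ell f_{n-r}}[\mathbf{x}]_{f_{n-r}}
\]
     Notice that for $1 \le i \le n - r$
\begin{align*}
[\mathbf{x}]_{f_i} &= [\mathbf{c}]_{f_i} + \alpha_1[\mathbf{u}_1]_{f_i} + \alpha_2[\mathbf{u}_2]_{f_i} + \cdots + \alpha_i[\mathbf{u}_i]_{f_i} + \cdots + \alpha_{n-r}[\mathbf{u}_{n-r}]_{f_i} \\
&= 0 + \alpha_1(0) + \alpha_2(0) + \cdots + \alpha_i(1) + \cdots + \alpha_{n-r}(0) \\
&= \alpha_i
\end{align*}
     So $\beta_\ell$ simplifies further, and we expand the first term
\begin{align*}
\beta_\ell &= [\mathbf{x}]_{d_\ell} + [B]_{\ell f_1}\alpha_1 + [B]_{\ell f_2}\alpha_2 + [B]_{\ell f_3}\alpha_3 + \cdots + [B]_{\ell f_{n-r}}\alpha_{n-r} \\
&= [\mathbf{c} + \alpha_1\mathbf{u}_1 + \alpha_2\mathbf{u}_2 + \alpha_3\mathbf{u}_3 + \cdots + \alpha_{n-r}\mathbf{u}_{n-r}]_{d_\ell} + {} \\
&\qquad [B]_{\ell f_1}\alpha_1 + [B]_{\ell f_2}\alpha_2 + [B]_{\ell f_3}\alpha_3 + \cdots + [B]_{\ell f_{n-r}}\alpha_{n-r} \\
&= [\mathbf{c}]_{d_\ell} + \alpha_1[\mathbf{u}_1]_{d_\ell} + \alpha_2[\mathbf{u}_2]_{d_\ell} + \alpha_3[\mathbf{u}_3]_{d_\ell} + \cdots + \alpha_{n-r}[\mathbf{u}_{n-r}]_{d_\ell} + {} \\
&\qquad [B]_{\ell f_1}\alpha_1 + [B]_{\ell f_2}\alpha_2 + [B]_{\ell f_3}\alpha_3 + \cdots + [B]_{\ell f_{n-r}}\alpha_{n-r} \\
&= [B]_{\ell,n+1} + {} \\
&\qquad \alpha_1(-[B]_{\ell f_1}) + \alpha_2(-[B]_{\ell f_2}) + \alpha_3(-[B]_{\ell f_3}) + \cdots + \alpha_{n-r}(-[B]_{\ell f_{n-r}}) + {} \\
&\qquad [B]_{\ell f_1}\alpha_1 + [B]_{\ell f_2}\alpha_2 + [B]_{\ell f_3}\alpha_3 + \cdots + [B]_{\ell f_{n-r}}\alpha_{n-r} \\
&= [B]_{\ell,n+1}
\end{align*}
   So $\beta_\ell$ began as the left-hand side of equation $\ell$ of the system represented by B
and we now know it equals $[B]_{\ell,n+1}$, the constant term for equation $\ell$ of this system.
So the arbitrarily chosen vector from S makes every equation of the system true,
and therefore is a solution to the system. So all the elements of S are solutions to
the system.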
   For the second half of the proof, assume that x is a solution vector for the system
having B as its augmented matrix. For convenience and clarity, denote the entries of
x by $x_i$, in other words, $x_i = [\mathbf{x}]_i$. We desire to show that this solution vector is also
an element of the set S. Begin with the observation that a solution vector’s entries
make equation $\ell$ of the system true for all $1 \le \ell \le m$,
\[
[B]_{\ell,1}x_1 + [B]_{\ell,2}x_2 + [B]_{\ell,3}x_3 + \cdots + [B]_{\ell,n}x_n = [B]_{\ell,n+1}
\]
    When $\ell \le r$, the pivot columns of B have zero entries in row $\ell$ with the exception
of column $d_\ell$, which will contain a 1. So for $1 \le \ell \le r$, equation $\ell$ simplifies to
\[
1x_{d_\ell} + [B]_{\ell,f_1}x_{f_1} + [B]_{\ell,f_2}x_{f_2} + [B]_{\ell,f_3}x_{f_3} + \cdots + [B]_{\ell,f_{n-r}}x_{f_{n-r}} = [B]_{\ell,n+1}
\]
     This allows us to write,
\begin{align*}
[\mathbf{x}]_{d_\ell} &= x_{d_\ell} \\
&= [B]_{\ell,n+1} - [B]_{\ell,f_1}x_{f_1} - [B]_{\ell,f_2}x_{f_2} - [B]_{\ell,f_3}x_{f_3} - \cdots - [B]_{\ell,f_{n-r}}x_{f_{n-r}} \\
&= [\mathbf{c}]_{d_\ell} + x_{f_1}[\mathbf{u}_1]_{d_\ell} + x_{f_2}[\mathbf{u}_2]_{d_\ell} + x_{f_3}[\mathbf{u}_3]_{d_\ell} + \cdots + x_{f_{n-r}}[\mathbf{u}_{n-r}]_{d_\ell} \\
&= \left[\mathbf{c} + x_{f_1}\mathbf{u}_1 + x_{f_2}\mathbf{u}_2 + x_{f_3}\mathbf{u}_3 + \cdots + x_{f_{n-r}}\mathbf{u}_{n-r}\right]_{d_\ell}
\end{align*}

    This tells us that the entries of the solution vector x corresponding to dependent
variables (indices in D), are equal to those of a vector in the set S. We still need to
check the other entries of the solution vector x corresponding to the free variables
(indices in F ) to see if they are equal to the entries of the same vector in the set S.
To this end, suppose $i \in F$ and $i = f_j$. Then
\begin{align*}
[\mathbf{x}]_i &= x_i = x_{f_j} \\
&= 0 + 0x_{f_1} + 0x_{f_2} + 0x_{f_3} + \cdots + 0x_{f_{j-1}} + 1x_{f_j} + 0x_{f_{j+1}} + \cdots + 0x_{f_{n-r}} \\
&= [\mathbf{c}]_i + x_{f_1}[\mathbf{u}_1]_i + x_{f_2}[\mathbf{u}_2]_i + x_{f_3}[\mathbf{u}_3]_i + \cdots + x_{f_j}[\mathbf{u}_j]_i + \cdots + x_{f_{n-r}}[\mathbf{u}_{n-r}]_i \\
&= \left[\mathbf{c} + x_{f_1}\mathbf{u}_1 + x_{f_2}\mathbf{u}_2 + \cdots + x_{f_{n-r}}\mathbf{u}_{n-r}\right]_i
\end{align*}
    So entries of $\mathbf{x}$ and $\mathbf{c} + x_{f_1}\mathbf{u}_1 + x_{f_2}\mathbf{u}_2 + \cdots + x_{f_{n-r}}\mathbf{u}_{n-r}$ are equal and therefore
by Definition CVE they are equal vectors. Since $x_{f_1}, x_{f_2}, x_{f_3}, \ldots, x_{f_{n-r}}$ are scalars,
this shows us that x qualifies for membership in S. So the set S contains all of the
solutions to the system.


    Note that both halves of the proof of Theorem VFSLS indicate that αi = [x]fi .
In other words, the arbitrary scalars, αi , in the description of the set S actually have
more meaning — they are the values of the free variables [x]fi , 1 ≤ i ≤ n − r. So we
will often exploit this observation in our descriptions of solution sets.
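This observation gives a quick way to write a known solution as an element of S: read the scalars straight from its free-variable entries. A small Python sketch, using the vectors of Example VFS and the solution w proposed there, is given below. Positions are 0-based in the code and the variable names are ours.

```python
# Vectors from Example VFS.
c  = [15, -10, 0, 0, 11, -21, 0]
us = [[-2,  5, 1, 0, 0,  0, 0],    # u1, paired with free variable x3
      [ 3, -4, 0, 1, 0,  0, 0],    # u2, paired with free variable x4
      [-9,  8, 0, 0, 6, -7, 1]]    # u3, paired with free variable x7
free = [2, 3, 6]                   # 0-based positions of x3, x4, x7

w = [100, -75, 7, 9, -37, 35, -8]  # the proposed solution

# The scalars are just the entries of w in the free positions.
alphas = [w[i] for i in free]

# Rebuild c + alpha1*u1 + alpha2*u2 + alpha3*u3 and compare with w.
rebuilt = [ci + sum(a * u[i] for a, u in zip(alphas, us))
           for i, ci in enumerate(c)]

print(alphas)          # [7, 9, -8]
print(rebuilt == w)    # True
```

If the comparison had failed, w could not have been a solution, since every solution must arise this way.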
    Theorem VFSLS formalizes what happened in the three steps of Example VFSAD.
The theorem will be useful in proving other theorems, and it is useful since it tells
us an exact procedure for simply describing an infinite solution set. We could program
a computer to implement it, once we have the augmented matrix row-reduced and
have checked that the system is consistent. By Knuth’s definition, this completes
our conversion of linear equation solving from art into science. Notice that it even
applies (but is overkill) in the case of a unique solution. However, as a practical
matter, I prefer the three-step process of Example VFSAD when I need to describe
an infinite solution set. So let us practice some more, but with a bigger example.
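As a hedged illustration of the remark about programming a computer, here is a minimal Python sketch of the construction in Theorem VFSLS. It assumes that its input is already in reduced row-echelon form and is given as a list of rows. The function name `vector_form` and its return convention are our own inventions for this sketch. Columns are 0-based inside the function, and the free-variable indices are returned 1-based to match the text.

```python
def vector_form(B):
    """From an augmented matrix B in reduced row-echelon form, build the
    vectors c and u_1, ..., u_{n-r} of Theorem VFSLS.

    Returns (c, us, free) where free lists the 1-based free-variable indices.
    """
    n = len(B[0]) - 1                  # number of variables
    D = []                             # 0-based pivot columns, one per nonzero row
    for row in B:
        lead = next((j for j, entry in enumerate(row) if entry != 0), None)
        if lead is not None:
            D.append(lead)
    if n in D:
        raise ValueError("pivot in the last column: the system is inconsistent")
    free = [j for j in range(n) if j not in D]

    c = [0] * n                        # zero in every free position
    for k, d in enumerate(D):
        c[d] = B[k][n]                 # constant from row k, column n + 1

    us = []
    for f in free:
        u = [0] * n
        u[f] = 1                       # the 0/1 pattern in the free positions
        for k, d in enumerate(D):
            u[d] = -B[k][f]            # negated entry of row k, column f
        us.append(u)
    return c, us, [f + 1 for f in free]

# Row-reduced augmented matrix B from Example VFS.
B = [[1, 0,  2, -3, 0, 0,  9,  15],
     [0, 1, -5,  4, 0, 0, -8, -10],
     [0, 0,  0,  0, 1, 0, -6,  11],
     [0, 0,  0,  0, 0, 1,  7, -21],
     [0, 0,  0,  0, 0, 0,  0,   0]]

c, us, free = vector_form(B)
print(free)
print(c)
for u in us:
    print(u)
```

The sketch relies on the zero rows of a matrix in reduced row-echelon form sitting at the bottom, so the k-th pivot column found belongs to row k.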
Example VFSAI Vector form of solutions for Archetype I
Archetype I is a linear system of m = 4 equations in n = 7 variables. Row-reducing
the augmented matrix yields
\[
\begin{bmatrix}
1 & 4 & 0 & 0 & 2 &  1 & -3 & 4 \\
0 & 0 & 1 & 0 & 1 & -3 &  5 & 2 \\
0 & 0 & 0 & 1 & 2 & -6 &  6 & 1 \\
0 & 0 & 0 & 0 & 0 &  0 &  0 & 0
\end{bmatrix}
\]
and we see r = 3 pivot columns, with indices D = {1, 3, 4}. So the r = 3 dependent
variables are x1 , x3 , x4 . The non-pivot columns have indices in F = {2, 5, 6, 7, 8},
so the n − r = 4 free variables are x2 , x5 , x6 , x7 .
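    Step 1. Write the vector of variables (x) as a fixed vector (c), plus a linear
combination of n − r = 4 vectors (u1 , u2 , u3 , u4 ), using the free variables as the
scalars.
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \end{bmatrix}
+ x_2 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \end{bmatrix}
+ x_5 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \end{bmatrix}
+ x_6 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \end{bmatrix}
+ x_7 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \end{bmatrix}
\]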
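    Step 2. For each free variable, use 0’s and 1’s to ensure equality for the
corresponding entry of the vectors. Take note of the pattern of 0’s and 1’s at this stage,
because this is the best look you will have at it. We will state an important theorem
in the next section and the proof will essentially rely on this observation.
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} \phantom{0} \\ 0 \\ \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_2 \begin{bmatrix} \phantom{0} \\ 1 \\ \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_5 \begin{bmatrix} \phantom{0} \\ 0 \\ \phantom{0} \\ \phantom{0} \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ x_6 \begin{bmatrix} \phantom{0} \\ 0 \\ \phantom{0} \\ \phantom{0} \\ 0 \\ 1 \\ 0 \end{bmatrix}
+ x_7 \begin{bmatrix} \phantom{0} \\ 0 \\ \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ 1 \end{bmatrix}
\]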
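    Step 3. For each dependent variable, use the augmented matrix to formulate an
equation expressing the dependent variable as a constant plus multiples of the free
variables. Convert this equation into entries of the vectors that ensure equality for
each dependent variable, one at a time.
\[ x_1 = 4 - 4x_2 - 2x_5 - 1x_6 + 3x_7 \quad\Rightarrow \]
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 4 \\ 0 \\ \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_2 \begin{bmatrix} -4 \\ 1 \\ \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_5 \begin{bmatrix} -2 \\ 0 \\ \phantom{0} \\ \phantom{0} \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ x_6 \begin{bmatrix} -1 \\ 0 \\ \phantom{0} \\ \phantom{0} \\ 0 \\ 1 \\ 0 \end{bmatrix}
+ x_7 \begin{bmatrix} 3 \\ 0 \\ \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ 1 \end{bmatrix}
\]
\[ x_3 = 2 + 0x_2 - x_5 + 3x_6 - 5x_7 \quad\Rightarrow \]
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 4 \\ 0 \\ 2 \\ \phantom{0} \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_2 \begin{bmatrix} -4 \\ 1 \\ 0 \\ \phantom{0} \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_5 \begin{bmatrix} -2 \\ 0 \\ -1 \\ \phantom{0} \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ x_6 \begin{bmatrix} -1 \\ 0 \\ 3 \\ \phantom{0} \\ 0 \\ 1 \\ 0 \end{bmatrix}
+ x_7 \begin{bmatrix} 3 \\ 0 \\ -5 \\ \phantom{0} \\ 0 \\ 0 \\ 1 \end{bmatrix}
\]
\[ x_4 = 1 + 0x_2 - 2x_5 + 6x_6 - 6x_7 \quad\Rightarrow \]
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \\ x_7 \end{bmatrix}
= \begin{bmatrix} 4 \\ 0 \\ 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_2 \begin{bmatrix} -4 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_5 \begin{bmatrix} -2 \\ 0 \\ -1 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ x_6 \begin{bmatrix} -1 \\ 0 \\ 3 \\ 6 \\ 0 \\ 1 \\ 0 \end{bmatrix}
+ x_7 \begin{bmatrix} 3 \\ 0 \\ -5 \\ -6 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\]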


    We can now use this final expression to quickly build solutions to the system.
You might try to recreate each of the solutions listed in the write-up for Archetype
I. (Hint: look at the values of the free variables in each solution, and notice that the
vector c has 0’s in these locations.)
    Even better, we have a description of the infinite solution set, based on just 5
vectors, which we combine in linear combinations to produce solutions.
    Whenever we discuss Archetype I you know that is your cue to go work through
Archetype J by yourself. Remember to take note of the 0/1 pattern at the conclusion
of Step 2. Have fun; we won’t go anywhere while you’re away.
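The 0/1 pattern from Step 2 can also be inspected mechanically. In the short Python sketch below, we restrict the vectors of Example VFSAI to the free-variable positions. The vectors u1 through u4 then become the rows of the 4 × 4 identity matrix, and c becomes the zero vector. The variable names are ours and positions are 0-based.

```python
# Final vectors from Example VFSAI.
c  = [4, 0, 2, 1, 0, 0, 0]
us = [[-4, 1,  0,  0, 0, 0, 0],    # u1, paired with x2
      [-2, 0, -1, -2, 1, 0, 0],    # u2, paired with x5
      [-1, 0,  3,  6, 0, 1, 0],    # u3, paired with x6
      [ 3, 0, -5, -6, 0, 0, 1]]    # u4, paired with x7
free = [1, 4, 5, 6]                # 0-based positions of x2, x5, x6, x7

# Keep only the entries in the free positions.
pattern = [[u[i] for i in free] for u in us]
c_free = [c[i] for i in free]

identity = [[1 if i == j else 0 for j in range(len(free))]
            for i in range(len(free))]

for row in pattern:
    print(row)
print(pattern == identity, c_free)
```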
   This technique is so important, that we will do one more example. However, an
important distinction will be that this system is homogeneous.
Example VFSAL Vector form of solutions for Archetype L
Archetype L is presented simply as the 5 × 5 matrix
\[
L = \begin{bmatrix}
-2 & -1 & -2 & -4 &   4 \\
-6 & -5 & -4 & -4 &   6 \\
10 &  7 &  7 & 10 & -13 \\
-7 & -5 & -6 & -9 &  10 \\
-4 & -3 & -4 & -6 &   6
\end{bmatrix}
\]
   We will employ this matrix here as the coefficient matrix of a homogeneous
system and reference this matrix as L. So we are solving the homogeneous system
LS(L, 0) having m = 5 equations in n = 5 variables. If we built the augmented
matrix, we would add a sixth column to L containing all zeros. As we did row
operations, this sixth column would remain all zeros. So instead we will row-reduce
the coefficient matrix, and mentally remember the missing sixth column of zeros.
This row-reduced matrix is
\[
\begin{bmatrix}
1 & 0 & 0 &  1 & -2 \\
0 & 1 & 0 & -2 &  2 \\
0 & 0 & 1 &  2 & -1 \\
0 & 0 & 0 &  0 &  0 \\
0 & 0 & 0 &  0 &  0
\end{bmatrix}
\]
and we see r = 3 pivot columns, with indices D = {1, 2, 3}. So the r = 3 dependent
variables are x1 , x2 , x3 . The non-pivot columns have indices F = {4, 5}, so the
n − r = 2 free variables are x4 , x5 . Notice that if we had included the all-zero vector
of constants to form the augmented matrix for the system, then the index 6 would
have appeared in the set F , and subsequently would have been ignored when listing
the free variables. So nothing is lost by not creating an augmented matrix (in the
case of a homogeneous system). And maybe it is an improvement, since now every
index in F can be used to reference a variable of the linear system.

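   Step 1. Write the vector of variables (x) as a fixed vector (c), plus a linear
combination of n − r = 2 vectors (u1 , u2 ), using the free variables as the scalars.
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \end{bmatrix}
+ x_4 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \end{bmatrix}
+ x_5 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ \phantom{0} \end{bmatrix}
\]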

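   Step 2. For each free variable, use 0’s and 1’s to ensure equality for the
corresponding entry of the vectors. Take note of the pattern of 0’s and 1’s at this stage,
even if it is not as illuminating as in other examples.
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ 1 \\ 0 \end{bmatrix}
+ x_5 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \phantom{0} \\ 0 \\ 1 \end{bmatrix}
\]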

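    Step 3. For each dependent variable, use the augmented matrix to formulate an
equation expressing the dependent variable as a constant plus multiples of the free
variables. Do not forget about the “missing” sixth column being full of zeros. Convert
this equation into entries of the vectors that ensure equality for each dependent
variable, one at a time.
\[
x_1 = 0 - 1x_4 + 2x_5 \quad\Rightarrow\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= \begin{bmatrix} 0 \\ \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} -1 \\ \phantom{0} \\ \phantom{0} \\ 1 \\ 0 \end{bmatrix}
+ x_5 \begin{bmatrix} 2 \\ \phantom{0} \\ \phantom{0} \\ 0 \\ 1 \end{bmatrix}
\]
\[
x_2 = 0 + 2x_4 - 2x_5 \quad\Rightarrow\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ \phantom{0} \\ 0 \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} -1 \\ 2 \\ \phantom{0} \\ 1 \\ 0 \end{bmatrix}
+ x_5 \begin{bmatrix} 2 \\ -2 \\ \phantom{0} \\ 0 \\ 1 \end{bmatrix}
\]
\[
x_3 = 0 - 2x_4 + 1x_5 \quad\Rightarrow\quad
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} -1 \\ 2 \\ -2 \\ 1 \\ 0 \end{bmatrix}
+ x_5 \begin{bmatrix} 2 \\ -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}
\]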
The vector c will always have 0’s in the entries corresponding to free variables.
However, since we are solving a homogeneous system, the row-reduced augmented
matrix has zeros in column n + 1 = 6, and hence all the entries of c are zero. So we
can write
\[
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= \mathbf{0}
+ x_4 \begin{bmatrix} -1 \\ 2 \\ -2 \\ 1 \\ 0 \end{bmatrix}
+ x_5 \begin{bmatrix} 2 \\ -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}
= x_4 \begin{bmatrix} -1 \\ 2 \\ -2 \\ 1 \\ 0 \end{bmatrix}
+ x_5 \begin{bmatrix} 2 \\ -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}
\]

    It will always happen that the solutions to a homogeneous system have c = 0
(even in the case of a unique solution?). So our expression for the solutions is a
bit more pleasing. In this example it says that the solutions are all possible linear
combinations of the two vectors
$\mathbf{u}_1 = \begin{bmatrix} -1 \\ 2 \\ -2 \\ 1 \\ 0 \end{bmatrix}$ and
$\mathbf{u}_2 = \begin{bmatrix} 2 \\ -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}$, with no mention of
any fixed vector entering into the linear combination.
    This observation will motivate our next section and the main definition of that
section, and after that we will conclude the section by formalizing this situation.

Subsection PSHS
Particular Solutions, Homogeneous Solutions
The next theorem tells us that in order to find all of the solutions to a linear system
of equations, it is sufficient to find just one solution, and then find all of the solutions
to the corresponding homogeneous system. This explains part of our interest in the
null space, the set of all solutions to a homogeneous system.
Theorem PSPHS Particular Solution Plus Homogeneous Solutions
Suppose that w is one solution to the linear system of equations LS(A, b). Then y
is a solution to LS(A, b) if and only if y = w + z for some vector z ∈ N (A).

Proof. Let A1 , A2 , A3 , . . . , An be the columns of the coefficient matrix A.
   (⇐) Suppose y = w + z and z ∈ N (A). Then
     b = [w]1 A1 + [w]2 A2 + [w]3 A3 + · · · + [w]n An              Theorem SLSLC
       = [w]1 A1 + [w]2 A2 + [w]3 A3 + · · · + [w]n An + 0          Property ZC
       = [w]1 A1 + [w]2 A2 + [w]3 A3 + · · · + [w]n An              Theorem SLSLC
           + [z]1 A1 + [z]2 A2 + [z]3 A3 + · · · + [z]n An
       = ([w]1 + [z]1 ) A1 + ([w]2 + [z]2 ) A2                      Theorem VSPCV
           + ([w]3 + [z]3 ) A3 + · · · + ([w]n + [z]n ) An
       = [w + z]1 A1 + [w + z]2 A2 + · · · + [w + z]n An            Definition CVA
       = [y]1 A1 + [y]2 A2 + [y]3 A3 + · · · + [y]n An              Definition of y
Applying Theorem SLSLC we see that the vector y is a solution to LS(A, b).
  (⇒) Suppose y is a solution to LS(A, b). Then
     0=b−b
      = [y]1 A1 + [y]2 A2 + [y]3 A3 + · · · + [y]n An               Theorem SLSLC
          − ([w]1 A1 + [w]2 A2 + [w]3 A3 + · · · + [w]n An )
      = ([y]1 − [w]1 ) A1 + ([y]2 − [w]2 ) A2                       Theorem VSPCV
          + ([y]3 − [w]3 ) A3 + · · · + ([y]n − [w]n ) An
      = [y − w]1 A1 + [y − w]2 A2                                   Definition CVA
          + [y − w]3 A3 + · · · + [y − w]n An
By Theorem SLSLC we see that the vector y − w is a solution to the homogeneous
system LS(A, 0) and by Definition NSM, y − w ∈ N (A). In other words, y − w = z
for some vector z ∈ N (A). Rewritten, this is y = w + z, as desired.         

    After proving Theorem NMUS we commented (insufficiently) on the negation of
one half of the theorem. Nonsingular coefficient matrices lead to unique solutions for
every choice of the vector of constants. What does this say about singular matrices?
A singular matrix A has a nontrivial null space (Theorem NMTNS). For a given
vector of constants, b, the system LS(A, b) could be inconsistent, meaning there
are no solutions. But if there is at least one solution (w), then Theorem PSPHS tells
us there will be infinitely many solutions because of the role of the infinite null space
for a singular matrix. So a system of equations with a singular coefficient matrix
never has a unique solution. Notice that this is the contrapositive of the statement
in Exercise NM.T31. With a singular coefficient matrix, either there are no solutions,
or infinitely many solutions, depending on the choice of the vector of constants (b).

Example PSHS Particular solutions, homogeneous solutions, Archetype D
Archetype D is a consistent system of equations with a nontrivial null space. Let
A denote the coefficient matrix of this system. The write-up for this system begins
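with three solutions,
\[
\mathbf{y}_1 = \begin{bmatrix} 0\\ 1\\ 2\\ 1 \end{bmatrix}
\qquad
\mathbf{y}_2 = \begin{bmatrix} 4\\ 0\\ 0\\ 0 \end{bmatrix}
\qquad
\mathbf{y}_3 = \begin{bmatrix} 7\\ 8\\ 1\\ 3 \end{bmatrix}
\]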

    We will choose to have y1 play the role of w in the statement of Theorem PSPHS;
any one of the three vectors listed here (or others) could have been chosen. To
illustrate the theorem, we should be able to write each of these three solutions as
the vector w plus a solution to the corresponding homogeneous system of equations.
Since 0 is always a solution to a homogeneous system we can easily write
                                    y1 = w = w + 0.

    The vectors y2 and y3 will require a bit more effort. Solutions to the homogeneous
system LS(A, 0) are exactly the elements of the null space of the coefficient matrix,
which by an application of Theorem VFSLS is
\[
\mathcal{N}(A) = \left\{\, x_3 \begin{bmatrix} -3\\ -1\\ 1\\ 0 \end{bmatrix}
+ x_4 \begin{bmatrix} 2\\ 3\\ 0\\ 1 \end{bmatrix}
\;\middle|\; x_3,\, x_4 \in \mathbb{C} \,\right\}
\]

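   Then
\[
\mathbf{y}_2 = \begin{bmatrix} 4\\ 0\\ 0\\ 0 \end{bmatrix}
= \begin{bmatrix} 0\\ 1\\ 2\\ 1 \end{bmatrix} + \begin{bmatrix} 4\\ -1\\ -2\\ -1 \end{bmatrix}
= \begin{bmatrix} 0\\ 1\\ 2\\ 1 \end{bmatrix}
+ (-2) \begin{bmatrix} -3\\ -1\\ 1\\ 0 \end{bmatrix}
+ (-1) \begin{bmatrix} 2\\ 3\\ 0\\ 1 \end{bmatrix}
= \mathbf{w} + \mathbf{z}_2
\]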
where
\[
\mathbf{z}_2 = \begin{bmatrix} 4\\ -1\\ -2\\ -1 \end{bmatrix}
= (-2) \begin{bmatrix} -3\\ -1\\ 1\\ 0 \end{bmatrix}
+ (-1) \begin{bmatrix} 2\\ 3\\ 0\\ 1 \end{bmatrix}
\]
is obviously a solution of the homogeneous system since it is written as a linear
combination of the vectors describing the null space of the coefficient matrix (or as
a check, you could just evaluate the equations in the homogeneous system with z2 ).
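   Again
\[
\mathbf{y}_3 = \begin{bmatrix} 7\\ 8\\ 1\\ 3 \end{bmatrix}
= \begin{bmatrix} 0\\ 1\\ 2\\ 1 \end{bmatrix} + \begin{bmatrix} 7\\ 7\\ -1\\ 2 \end{bmatrix}
= \begin{bmatrix} 0\\ 1\\ 2\\ 1 \end{bmatrix}
+ (-1) \begin{bmatrix} -3\\ -1\\ 1\\ 0 \end{bmatrix}
+ 2 \begin{bmatrix} 2\\ 3\\ 0\\ 1 \end{bmatrix}
= \mathbf{w} + \mathbf{z}_3
\]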
where
\[
\mathbf{z}_3 = \begin{bmatrix} 7\\ 7\\ -1\\ 2 \end{bmatrix}
= (-1) \begin{bmatrix} -3\\ -1\\ 1\\ 0 \end{bmatrix}
+ 2 \begin{bmatrix} 2\\ 3\\ 0\\ 1 \end{bmatrix}
\]
is obviously a solution of the homogeneous system since it is written as a linear
combination of the vectors describing the null space of the coefficient matrix (or as
a check, you could just evaluate the equations in the homogeneous system with z3 ).
   Here is another view of this theorem, in the context of this example. Grab two
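new solutions of the original system of equations, say
\[
\mathbf{y}_4 = \begin{bmatrix} 11\\ 0\\ -3\\ -1 \end{bmatrix}
\qquad
\mathbf{y}_5 = \begin{bmatrix} -4\\ 2\\ 4\\ 2 \end{bmatrix}
\]
and form their difference,
\[
\mathbf{u} = \begin{bmatrix} 11\\ 0\\ -3\\ -1 \end{bmatrix}
- \begin{bmatrix} -4\\ 2\\ 4\\ 2 \end{bmatrix}
= \begin{bmatrix} 15\\ -2\\ -7\\ -3 \end{bmatrix}.
\]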
    It is no accident that u is a solution to the homogeneous system (check this!). In
other words, the difference between any two solutions to a linear system of equations
is an element of the null space of the coefficient matrix. This is an equivalent way to
state Theorem PSPHS. (See Exercise MM.T50.) △
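The arithmetic in Example PSHS can be checked mechanically. The sketch below is only an illustration: the coefficient matrix `A` and the vector of constants `b` are taken from the write-up of Archetype D (they are not reproduced in this section, so treat them as an assumption), and the helper names `apply` and `sub` are made up for this check. It confirms that each listed vector solves LS(A, b) and that subtracting w = y1 always leaves an element of the null space.

```python
# Coefficient matrix and vector of constants of Archetype D.
# Assumption: copied from the archetype write-up, not from this section.
A = [[2, 1, 7, -7],
     [-3, 4, -5, -6],
     [1, 1, 4, -5]]
b = [8, -12, 4]

def apply(A, x):
    """Linear combination of the columns of A with the entries of x as scalars."""
    return [sum(row[j] * x[j] for j in range(len(x))) for row in A]

def sub(p, q):
    """Entry-by-entry difference p - q of two vectors of equal size."""
    return [pi - qi for pi, qi in zip(p, q)]

solutions = {
    "y1": [0, 1, 2, 1],
    "y2": [4, 0, 0, 0],
    "y3": [7, 8, 1, 3],
    "y4": [11, 0, -3, -1],
    "y5": [-4, 2, 4, 2],
}
w = solutions["y1"]

for name, y in solutions.items():
    assert apply(A, y) == b            # y is a solution to LS(A, b)
    z = sub(y, w)
    assert apply(A, z) == [0, 0, 0]    # y - w solves the homogeneous system
    print(name, "= w +", z)

# The difference of any two solutions is also in the null space.
u = sub(solutions["y4"], solutions["y5"])
print("y4 - y5 =", u, "maps to", apply(A, u))
```

Any of the five vectors could replace `w` here; the assertions would still hold, which is exactly the content of the theorem.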
   The ideas of this subsection will appear again in Chapter LT when we discuss
pre-images of linear transformations (Definition PI).

Reading Questions

1. Earlier, a reading question asked you to solve the system of equations
                                        2x1 + 3x2 − x3 = 0
                                         x1 + 2x2 + x3 = 3
                                        x1 + 3x2 + 3x3 = 7
     Use a linear combination to rewrite this system of equations as a vector equality.
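2. Find a linear combination of the vectors
\[
S = \left\{ \begin{bmatrix} 1\\ 3\\ -1 \end{bmatrix},\;
\begin{bmatrix} 2\\ 0\\ 4 \end{bmatrix},\;
\begin{bmatrix} -1\\ 3\\ -5 \end{bmatrix} \right\}
\]
   that equals the vector
\[
\begin{bmatrix} 1\\ -9\\ 11 \end{bmatrix}.
\]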
3. The matrix below is the augmented matrix of a system of equations, row-reduced to
   reduced row-echelon form. Write the vector form of the solutions to the system.
\[
\begin{bmatrix}
1 & 3 & 0 & 6 & 0 & 9\\
0 & 0 & 1 & -2 & 0 & -8\\
0 & 0 & 0 & 0 & 1 & 3
\end{bmatrix}
\]


Exercises
C21† Consider each archetype that is a system of equations. For individual solutions
listed (both for the original system and the corresponding homogeneous system) express
the vector of constants as a linear combination of the columns of the coefficient matrix, as
guaranteed by Theorem SLSLC. Verify this equality by computing the linear combination.
For systems with no solutions, recognize that it is then impossible to write the vector of
constants as a linear combination of the columns of the coefficient matrix. Note too, for
homogeneous systems, that the solutions give rise to linear combinations that equal the
zero vector.

Archetype A, Archetype B, Archetype C, Archetype D, Archetype E, Archetype F, Archetype
G, Archetype H, Archetype I, Archetype J
C22† Consider each archetype that is a system of equations. Write elements of the solution
set in vector form, as guaranteed by Theorem VFSLS.

Archetype A, Archetype B, Archetype C, Archetype D, Archetype E, Archetype F, Archetype
G, Archetype H, Archetype I, Archetype J
C40†     Find the vector form of the solutions to the system of equations below.
                                     2x1 − 4x2 + 3x3 + x5 = 6
                             x1 − 2x2 − 2x3 + 14x4 − 4x5 = 15
                                 x1 − 2x2 + x3 + 2x4 + x5 = −1
                                  −2x1 + 4x2 − 12x4 + x5 = −7

C41†    Find the vector form of the solutions to the system of equations below.
             −2x1 − 1x2 − 8x3 + 8x4 + 4x5 − 9x6 − 1x7 − 1x8 − 18x9 = 3
               3x1 − 2x2 + 5x3 + 2x4 − 2x5 − 5x6 + 1x7 + 2x8 + 15x9 = 10
                            4x1 − 2x2 + 8x3 + 2x5 − 14x6 − 2x8 + 2x9 = 36
                           −1x1 + 2x2 + 1x3 − 6x4 + 7x6 − 1x7 − 3x9 = −8
                   3x1 + 2x2 + 13x3 − 14x4 − 1x5 + 5x6 − 1x8 + 12x9 = 15
             −2x1 + 2x2 − 2x3 − 4x4 + 1x5 + 6x6 − 2x7 − 2x8 − 15x9 = −7

M10†    Example TLC asks if the vector
\[
\mathbf{w} = \begin{bmatrix} 13\\ 15\\ 5\\ -17\\ 2\\ 25 \end{bmatrix}
\]
can be written as a linear combination of the four vectors
\[
\mathbf{u}_1 = \begin{bmatrix} 2\\ 4\\ -3\\ 1\\ 2\\ 9 \end{bmatrix}
\qquad
\mathbf{u}_2 = \begin{bmatrix} 6\\ 3\\ 0\\ -2\\ 1\\ 4 \end{bmatrix}
\qquad
\mathbf{u}_3 = \begin{bmatrix} -5\\ 2\\ 1\\ 1\\ -3\\ 0 \end{bmatrix}
\qquad
\mathbf{u}_4 = \begin{bmatrix} 3\\ 2\\ -5\\ 7\\ 1\\ 3 \end{bmatrix}
\]
Can it? Can any vector in C^6 be written as a linear combination of the four vectors
u1 , u2 , u3 , u4 ?
M11† At the end of Example VFS, the vector w is claimed to be a solution to the linear
system under discussion. Verify that w really is a solution. Then determine the four scalars
that express w as a linear combination of c, u1 , u2 , u3 .
Section SS
Spanning Sets
In this section we will provide an extremely compact way to describe an infinite set
of vectors, making use of linear combinations. This will give us a convenient way to
describe the solution set of a linear system, the null space of a matrix, and many
other sets of vectors.



Subsection SSV
Span of a Set of Vectors
In Example VFSAL we saw the solution set of a homogeneous system described
as all possible linear combinations of two particular vectors. This is a useful way
to construct or describe infinite sets of vectors, so we encapsulate the idea in a
definition.
Definition SSCV Span of a Set of Column Vectors
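Given a set of vectors S = {u1 , u2 , u3 , . . . , up }, their span, ⟨S⟩, is the set of all
possible linear combinations of u1 , u2 , u3 , . . . , up . Symbolically,
\[
\langle S \rangle
= \{\, \alpha_1 \mathbf{u}_1 + \alpha_2 \mathbf{u}_2 + \alpha_3 \mathbf{u}_3 + \cdots + \alpha_p \mathbf{u}_p \mid \alpha_i \in \mathbb{C},\ 1 \le i \le p \,\}
= \left\{\, \sum_{i=1}^{p} \alpha_i \mathbf{u}_i \;\middle|\; \alpha_i \in \mathbb{C},\ 1 \le i \le p \,\right\}
\]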

                                                                                         
    The span is just a set of vectors, though in all but one situation it is an infinite
set. (Just when is it not infinite?) So we start with a finite collection of vectors S (p
of them to be precise), and use this finite set to describe an infinite set of vectors,
⟨S⟩. Confusing the finite set S with the infinite set ⟨S⟩ is one of the most persistent
problems in understanding introductory linear algebra. We will see this construction
repeatedly, so let us work through some examples to get comfortable with it. The
most obvious question about a set is if a particular item of the correct type is in the
set, or not in the set.
Example ABS A basic span
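Consider the set of 5 vectors, S, from C^4
\[
S = \left\{
\begin{bmatrix} 1\\ 1\\ 3\\ 1 \end{bmatrix},\;
\begin{bmatrix} 2\\ 1\\ 2\\ -1 \end{bmatrix},\;
\begin{bmatrix} 7\\ 3\\ 5\\ -5 \end{bmatrix},\;
\begin{bmatrix} 1\\ 1\\ -1\\ 2 \end{bmatrix},\;
\begin{bmatrix} -1\\ 0\\ 9\\ 0 \end{bmatrix}
\right\}
\]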
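and consider the infinite set of vectors ⟨S⟩ formed from all possible linear combinations
of the elements of S. Here are four vectors we definitely know are elements of ⟨S⟩,
since we will construct them in accordance with Definition SSCV,
\[
\begin{aligned}
\mathbf{w} &= (2) \begin{bmatrix} 1\\ 1\\ 3\\ 1 \end{bmatrix}
+ (1) \begin{bmatrix} 2\\ 1\\ 2\\ -1 \end{bmatrix}
+ (-1) \begin{bmatrix} 7\\ 3\\ 5\\ -5 \end{bmatrix}
+ (2) \begin{bmatrix} 1\\ 1\\ -1\\ 2 \end{bmatrix}
+ (3) \begin{bmatrix} -1\\ 0\\ 9\\ 0 \end{bmatrix}
= \begin{bmatrix} -4\\ 2\\ 28\\ 10 \end{bmatrix} \\
\mathbf{x} &= (5) \begin{bmatrix} 1\\ 1\\ 3\\ 1 \end{bmatrix}
+ (-6) \begin{bmatrix} 2\\ 1\\ 2\\ -1 \end{bmatrix}
+ (-3) \begin{bmatrix} 7\\ 3\\ 5\\ -5 \end{bmatrix}
+ (4) \begin{bmatrix} 1\\ 1\\ -1\\ 2 \end{bmatrix}
+ (2) \begin{bmatrix} -1\\ 0\\ 9\\ 0 \end{bmatrix}
= \begin{bmatrix} -26\\ -6\\ 2\\ 34 \end{bmatrix} \\
\mathbf{y} &= (1) \begin{bmatrix} 1\\ 1\\ 3\\ 1 \end{bmatrix}
+ (0) \begin{bmatrix} 2\\ 1\\ 2\\ -1 \end{bmatrix}
+ (1) \begin{bmatrix} 7\\ 3\\ 5\\ -5 \end{bmatrix}
+ (0) \begin{bmatrix} 1\\ 1\\ -1\\ 2 \end{bmatrix}
+ (1) \begin{bmatrix} -1\\ 0\\ 9\\ 0 \end{bmatrix}
= \begin{bmatrix} 7\\ 4\\ 17\\ -4 \end{bmatrix} \\
\mathbf{z} &= (0) \begin{bmatrix} 1\\ 1\\ 3\\ 1 \end{bmatrix}
+ (0) \begin{bmatrix} 2\\ 1\\ 2\\ -1 \end{bmatrix}
+ (0) \begin{bmatrix} 7\\ 3\\ 5\\ -5 \end{bmatrix}
+ (0) \begin{bmatrix} 1\\ 1\\ -1\\ 2 \end{bmatrix}
+ (0) \begin{bmatrix} -1\\ 0\\ 9\\ 0 \end{bmatrix}
= \begin{bmatrix} 0\\ 0\\ 0\\ 0 \end{bmatrix}
\end{aligned}
\]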

    The purpose of a set is to collect objects with some common property, and to
exclude objects without that property. So the most fundamental question about a
set is if a given object is an element of the set or not. Let us learn more about ⟨S⟩
by investigating which vectors are elements of the set, and which are not.
    First, is
\[
\mathbf{u} = \begin{bmatrix} -15\\ -6\\ 19\\ 5 \end{bmatrix}
\]
an element of ⟨S⟩? We are asking if there are scalars α1 , α2 , α3 , α4 , α5 such that
\[
\alpha_1 \begin{bmatrix} 1\\ 1\\ 3\\ 1 \end{bmatrix}
+ \alpha_2 \begin{bmatrix} 2\\ 1\\ 2\\ -1 \end{bmatrix}
+ \alpha_3 \begin{bmatrix} 7\\ 3\\ 5\\ -5 \end{bmatrix}
+ \alpha_4 \begin{bmatrix} 1\\ 1\\ -1\\ 2 \end{bmatrix}
+ \alpha_5 \begin{bmatrix} -1\\ 0\\ 9\\ 0 \end{bmatrix}
= \mathbf{u} = \begin{bmatrix} -15\\ -6\\ 19\\ 5 \end{bmatrix}
\]

    Applying Theorem SLSLC we recognize the search for these scalars as a solution
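to a linear system of equations with augmented matrix
\[
\begin{bmatrix}
1 & 2 & 7 & 1 & -1 & -15\\
1 & 1 & 3 & 1 & 0 & -6\\
3 & 2 & 5 & -1 & 9 & 19\\
1 & -1 & -5 & 2 & 0 & 5
\end{bmatrix}
\]
which row-reduces to
\[
\begin{bmatrix}
1 & 0 & -1 & 0 & 3 & 10\\
0 & 1 & 4 & 0 & -1 & -9\\
0 & 0 & 0 & 1 & -2 & -7\\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]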

    At this point, we see that the system is consistent (Theorem RCLS), so we know
there is a solution for the five scalars α1 , α2 , α3 , α4 , α5 . This is enough evidence for
us to say that u ∈ ⟨S⟩. If we wished further evidence, we could compute an actual
solution, say
        α1 = 2          α2 = 1           α3 = −2          α4 = −3           α5 = 2

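     This particular solution allows us to write
\[
(2) \begin{bmatrix} 1\\ 1\\ 3\\ 1 \end{bmatrix}
+ (1) \begin{bmatrix} 2\\ 1\\ 2\\ -1 \end{bmatrix}
+ (-2) \begin{bmatrix} 7\\ 3\\ 5\\ -5 \end{bmatrix}
+ (-3) \begin{bmatrix} 1\\ 1\\ -1\\ 2 \end{bmatrix}
+ (2) \begin{bmatrix} -1\\ 0\\ 9\\ 0 \end{bmatrix}
= \mathbf{u} = \begin{bmatrix} -15\\ -6\\ 19\\ 5 \end{bmatrix}
\]
making it even more obvious that u ∈ ⟨S⟩.
    Let us do it again. Is
\[
\mathbf{v} = \begin{bmatrix} 3\\ 1\\ 2\\ -1 \end{bmatrix}
\]
an element of ⟨S⟩? We are asking if there are scalars α1 , α2 , α3 , α4 , α5 such that
\[
\alpha_1 \begin{bmatrix} 1\\ 1\\ 3\\ 1 \end{bmatrix}
+ \alpha_2 \begin{bmatrix} 2\\ 1\\ 2\\ -1 \end{bmatrix}
+ \alpha_3 \begin{bmatrix} 7\\ 3\\ 5\\ -5 \end{bmatrix}
+ \alpha_4 \begin{bmatrix} 1\\ 1\\ -1\\ 2 \end{bmatrix}
+ \alpha_5 \begin{bmatrix} -1\\ 0\\ 9\\ 0 \end{bmatrix}
= \mathbf{v} = \begin{bmatrix} 3\\ 1\\ 2\\ -1 \end{bmatrix}
\]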

    Applying Theorem SLSLC we recognize the search for these scalars as a solution
to a linear system of equations with augmented matrix
\[
\begin{bmatrix}
1 & 2 & 7 & 1 & -1 & 3\\
1 & 1 & 3 & 1 & 0 & 1\\
3 & 2 & 5 & -1 & 9 & 2\\
1 & -1 & -5 & 2 & 0 & -1
\end{bmatrix}
\]
which row-reduces to
\[
\begin{bmatrix}
1 & 0 & -1 & 0 & 3 & 0\\
0 & 1 & 4 & 0 & -1 & 0\\
0 & 0 & 0 & 1 & -2 & 0\\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]
   At this point, we see that the system is inconsistent by Theorem RCLS, so we
know there is not a solution for the five scalars α1 , α2 , α3 , α4 , α5 . This is enough
evidence for us to say that v ∉ ⟨S⟩. End of story. △
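The two membership questions in Example ABS follow one recipe: build the augmented matrix, row-reduce, and check whether the final column is a pivot column. Here is a minimal Python sketch of that recipe, assuming exact rational arithmetic via the standard `fractions` module. The function names `rref` and `in_span` are invented for this illustration and are not part of the text.

```python
from fractions import Fraction

def rref(M):
    """Return the reduced row-echelon form of M and the list of pivot columns."""
    M = [[Fraction(e) for e in row] for row in M]
    rows, cols = len(M), len(M[0])
    pivots = []
    r = 0
    for c in range(cols):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        lead = M[r][c]
        M[r] = [e / lead for e in M[r]]          # scale to get a leading 1
        for i in range(rows):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return M, pivots

def in_span(vectors, target):
    """Decide whether target is a linear combination of the given vectors."""
    n = len(vectors)
    # Each vector becomes a column; target is appended as the last column.
    augmented = [[v[i] for v in vectors] + [target[i]] for i in range(len(target))]
    _, pivots = rref(augmented)
    # Consistent exactly when the last column is not a pivot column.
    return n not in pivots

S = [[1, 1, 3, 1], [2, 1, 2, -1], [7, 3, 5, -5], [1, 1, -1, 2], [-1, 0, 9, 0]]
print(in_span(S, [-15, -6, 19, 5]))   # the vector u of the example
print(in_span(S, [3, 1, 2, -1]))      # the vector v of the example
```

Using `Fraction` instead of floating point keeps the zero tests exact, which matters because the whole decision rests on whether an entry is precisely zero.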
Example SCAA Span of the columns of Archetype A
Begin with the finite set of three vectors of size 3
\[
S = \{\mathbf{u}_1,\, \mathbf{u}_2,\, \mathbf{u}_3\}
= \left\{ \begin{bmatrix} 1\\ 2\\ 1 \end{bmatrix},\;
\begin{bmatrix} -1\\ 1\\ 1 \end{bmatrix},\;
\begin{bmatrix} 2\\ 1\\ 0 \end{bmatrix} \right\}
\]
and consider the infinite set ⟨S⟩. The vectors of S could have been chosen to be
anything, but for reasons that will become clear later, we have chosen the three
columns of the coefficient matrix in Archetype A.
    First, as an example, note that
\[
\mathbf{v} = (5) \begin{bmatrix} 1\\ 2\\ 1 \end{bmatrix}
+ (-3) \begin{bmatrix} -1\\ 1\\ 1 \end{bmatrix}
+ (7) \begin{bmatrix} 2\\ 1\\ 0 \end{bmatrix}
= \begin{bmatrix} 22\\ 14\\ 2 \end{bmatrix}
\]
is in ⟨S⟩, since it is a linear combination of u1 , u2 , u3 . We write this succinctly as
v ∈ ⟨S⟩. There is nothing magical about the scalars α1 = 5, α2 = −3, α3 = 7; they
could have been chosen to be anything. So repeat this part of the example yourself,
using different values of α1 , α2 , α3 . What happens if you choose all three scalars to
be zero?
     So we know how to quickly construct sample elements of the set ⟨S⟩. A slightly
different question arises when you are handed a vector of the correct size and asked
if it is an element of ⟨S⟩. For example, is
\[
\mathbf{w} = \begin{bmatrix} 1\\ 8\\ 5 \end{bmatrix}
\]
in ⟨S⟩? More succinctly, w ∈ ⟨S⟩?
     To answer this question, we will look for scalars α1 , α2 , α3 so that
                               α1 u1 + α2 u2 + α3 u3 = w
By Theorem SLSLC solutions to this vector equation are solutions to the system of
equations
                                       α1 − α2 + 2α3 = 1
                                       2α1 + α2 + α3 = 8
                                              α1 + α2 = 5
Building the augmented matrix for this linear system, and row-reducing, gives
\[
\begin{bmatrix}
1 & 0 & 1 & 3\\
0 & 1 & -1 & 2\\
0 & 0 & 0 & 0
\end{bmatrix}
\]
   This system has infinitely many solutions (α3 is a free variable), but all
we need is one solution vector. The solution,
                α1 = 2                   α2 = 3                   α3 = 1
tells us that
                               (2)u1 + (3)u2 + (1)u3 = w
so we are convinced that w really is in ⟨S⟩. Notice that there are an infinite number
of ways to answer this question affirmatively. A different solution, this time
choosing the free variable to be zero,
                α1 = 3                   α2 = 2                   α3 = 0
shows us that
                              (3)u1 + (2)u2 + (0)u3 = w
    Verifying the arithmetic in this second solution will make it obvious that w is in
this span. And of course, we now realize that there are an infinite number of ways
to realize w as an element of ⟨S⟩.
    Let us ask the same type of question again, but this time with
\[
\mathbf{y} = \begin{bmatrix} 2\\ 4\\ 3 \end{bmatrix},
\]
i.e. is y ∈ ⟨S⟩?
    So we will look for scalars α1 , α2 , α3 so that
                               α1 u1 + α2 u2 + α3 u3 = y
By Theorem SLSLC solutions to this vector equation are the solutions to the system
of equations
                                      α1 − α2 + 2α3 = 2
                                      2α1 + α2 + α3 = 4
                                             α1 + α2 = 3
Building the augmented matrix for this linear system, and row-reducing, gives
\[
\begin{bmatrix}
1 & 0 & 1 & 0\\
0 & 1 & -1 & 0\\
0 & 0 & 0 & 1
\end{bmatrix}
\]
    This system is inconsistent (there is a pivot column in the last column, Theorem
RCLS), so there are no scalars α1 , α2 , α3 that will create a linear combination of
u1 , u2 , u3 that equals y. More precisely, y ∉ ⟨S⟩.
    There are three things to observe in this example. (1) It is easy to construct
vectors in ⟨S⟩. (2) It is possible that some vectors are in ⟨S⟩ (e.g. w), while others
are not (e.g. y). (3) Deciding if a given vector is in ⟨S⟩ leads to solving a linear
system of equations and asking if the system is consistent.
    With a computer program in hand to solve systems of linear equations, could
you create a program to decide if a vector was, or was not, in the span of a given set
of vectors? Is this art or science?
    This example was built on vectors from the columns of the coefficient matrix of
Archetype A. Study the determination that v ∈ ⟨S⟩ and see if you can connect it
with some of the other properties of Archetype A. △
   Having analyzed Archetype A in Example SCAA, we will of course subject
Archetype B to a similar investigation.
Example SCAB Span of the columns of Archetype B
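Begin with the finite set of three vectors of size 3 that are the columns of the
coefficient matrix in Archetype B,
\[
R = \{\mathbf{v}_1,\, \mathbf{v}_2,\, \mathbf{v}_3\}
= \left\{ \begin{bmatrix} -7\\ 5\\ 1 \end{bmatrix},\;
\begin{bmatrix} -6\\ 5\\ 0 \end{bmatrix},\;
\begin{bmatrix} -12\\ 7\\ 4 \end{bmatrix} \right\}
\]
and consider the infinite set ⟨R⟩.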
   First, as an example, note that
\[
\mathbf{x} = (2) \begin{bmatrix} -7\\ 5\\ 1 \end{bmatrix}
+ (4) \begin{bmatrix} -6\\ 5\\ 0 \end{bmatrix}
+ (-3) \begin{bmatrix} -12\\ 7\\ 4 \end{bmatrix}
= \begin{bmatrix} -2\\ 9\\ -10 \end{bmatrix}
\]
is in ⟨R⟩, since it is a linear combination of v1 , v2 , v3 . In other words, x ∈ ⟨R⟩. Try
some different values of α1 , α2 , α3 yourself, and see what vectors you can create as
elements of ⟨R⟩.
    Now ask if a given vector is an element of ⟨R⟩. For example, is
\[
\mathbf{z} = \begin{bmatrix} -33\\ 24\\ 5 \end{bmatrix}
\]
in ⟨R⟩? Is z ∈ ⟨R⟩?
   To answer this question, we will look for scalars α1 , α2 , α3 so that
                             α1 v1 + α2 v2 + α3 v3 = z
By Theorem SLSLC solutions to this vector equation are the solutions to the system
of equations
                              −7α1 − 6α2 − 12α3 = −33
                                 5α1 + 5α2 + 7α3 = 24
                                         α1 + 4α3 = 5
Building the augmented matrix for this linear system, and row-reducing, gives
\[
\begin{bmatrix}
1 & 0 & 0 & -3\\
0 & 1 & 0 & 5\\
0 & 0 & 1 & 2
\end{bmatrix}
\]
   This system has a unique solution,
                  α1 = −3               α2 = 5                 α3 = 2
telling us that
                            (−3)v1 + (5)v2 + (2)v3 = z
so we are convinced that z really is in ⟨R⟩. Notice that in this case we have only
one way to answer the question affirmatively since the solution is unique.
   Let us ask about another vector, say is
\[
\mathbf{x} = \begin{bmatrix} -7\\ 8\\ -3 \end{bmatrix}
\]
in ⟨R⟩? Is x ∈ ⟨R⟩?
   We desire scalars α1 , α2 , α3 so that
                             α1 v1 + α2 v2 + α3 v3 = x
By Theorem SLSLC solutions to this vector equation are the solutions to the system
of equations
                              −7α1 − 6α2 − 12α3 = −7
                                 5α1 + 5α2 + 7α3 = 8
                                         α1 + 4α3 = −3
Building the augmented matrix for this linear system, and row-reducing, gives
\[
\begin{bmatrix}
1 & 0 & 0 & 1\\
0 & 1 & 0 & 2\\
0 & 0 & 1 & -1
\end{bmatrix}
\]
   This system has a unique solution,
                  α1 = 1              α2 = 2                 α3 = −1
telling us that
                            (1)v1 + (2)v2 + (−1)v3 = x
so we are convinced that x really is in ⟨R⟩. Notice that in this case we again have
only one way to answer the question affirmatively since the solution is again unique.
   We could continue to test other vectors for membership in ⟨R⟩, but there is no
point. A question about membership in ⟨R⟩ inevitably leads to a system of three
equations in the three variables α1 , α2 , α3 with a coefficient matrix whose columns
are the vectors v1 , v2 , v3 . This particular coefficient matrix is nonsingular, so by
Theorem NMUS, the system is guaranteed to have a solution. (This solution is
unique, but that is not critical here.) So no matter which vector we might have
chosen for z, we would have been certain to discover that it was an element of ⟨R⟩.
Stated differently, every vector of size 3 is in ⟨R⟩, or ⟨R⟩ = C^3 .
   Compare this example with Example SCAA, and see if you can connect z with
some aspects of the write-up for Archetype B.
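
   The membership test used in this example can be mechanized. Here is a minimal
Python sketch, not part of the original text, that row-reduces the augmented matrix
for the vector x in exact rational arithmetic. The function name rref and the variable
names are our own choices for illustration, and the columns v1 , v2 , v3 are read off
from the coefficients of the system of equations above.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form of a matrix (list of rows), in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        if pivot_row == len(m):
            break
        # Look for a nonzero entry in this column, at or below pivot_row.
        pick = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pick is None:
            continue  # no pivot in this column
        m[pivot_row], m[pick] = m[pick], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        # Clear every other entry of the pivot column.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m

# Columns of the coefficient matrix (assumed from the system above), then x.
v1, v2, v3 = [-7, 5, 1], [-6, 5, 0], [-12, 7, 4]
x = [-7, 8, -3]
augmented = [[v1[i], v2[i], v3[i], x[i]] for i in range(3)]

reduced = rref(augmented)

# x is in the span exactly when no row reads [0 0 0 | nonzero].
consistent = all(any(row[:-1]) or row[-1] == 0 for row in reduced)

# Here the coefficient part reduces to the identity, so the final
# column holds the scalars alpha1, alpha2, alpha3 directly.
alphas = [row[-1] for row in reduced]

# Rebuild x from the scalars as a check on the arithmetic.
rebuilt = [alphas[0] * v1[i] + alphas[1] * v2[i] + alphas[2] * v3[i] for i in range(3)]
```

If the sketch is correct, consistent is true, alphas reproduces the scalars 1, 2, −1
found above, and rebuilt equals x.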

Subsection SSNS
Spanning Sets of Null Spaces
We saw in Example VFSAL that when a system of equations is homogeneous the
solution set can be expressed in the form described by Theorem VFSLS where the
vector c is the zero vector. We can essentially ignore this vector, so that the remainder
of the typical expression for a solution looks like an arbitrary linear combination,
where the scalars are the free variables and the vectors are u1 , u2 , u3 , . . . , un−r .
Which sounds a lot like a span. This is the substance of the next theorem.
Theorem SSNS Spanning Sets for Null Spaces
Suppose that A is an m × n matrix, and B is a row-equivalent matrix in reduced
row-echelon form. Suppose that B has r pivot columns, with indices given
by D = {d1 , d2 , d3 , . . . , dr }, while the n − r non-pivot columns have indices F =
{f1 , f2 , f3 , . . . , fn−r }. Construct the n − r vectors zj , 1 ≤ j ≤ n − r of size n,
\[
[z_j]_i =
\begin{cases}
1 & \text{if } i \in F,\ i = f_j \\
0 & \text{if } i \in F,\ i \neq f_j \\
-[B]_{k,f_j} & \text{if } i \in D,\ i = d_k
\end{cases}
\]
     Then the null space of A is given by
\[
\mathcal{N}(A) = \langle \{ z_1, z_2, z_3, \ldots, z_{n-r} \} \rangle
\]

Proof. Consider the homogeneous system with A as a coefficient matrix, LS(A, 0).
Its set of solutions, S, is by Definition NSM, the null space of A, N (A). Let B′
denote the result of row-reducing the augmented matrix of this homogeneous system.
Since the system is homogeneous, the final column of the augmented matrix will be
all zeros, and after any number of row operations (Definition RO), the column will
still be all zeros. So B′ has a final column that is totally zeros.
     Now apply Theorem VFSLS to B′ , after noting that our homogeneous system
must be consistent (Theorem HSC). The vector c has zeros for each entry that has
an index in F . For entries with their index in D, the value is − [B′]_{k,n+1} , but for
B′ any entry in the final column (index n + 1) is zero. So c = 0. The vectors zj ,
1 ≤ j ≤ n − r are identical to the vectors uj , 1 ≤ j ≤ n − r described in Theorem
VFSLS. Putting it all together and applying Definition SSCV in the final step,
 N (A) = S
        = { c + α1 u1 + α2 u2 + α3 u3 + · · · + αn−r un−r | α1 , α2 , α3 , . . . , αn−r ∈ C}
        = { α1 u1 + α2 u2 + α3 u3 + · · · + αn−r un−r | α1 , α2 , α3 , . . . , αn−r ∈ C}
        = ⟨{z1 , z2 , z3 , . . . , zn−r }⟩

    Notice that the hypotheses of Theorem VFSLS and Theorem SSNS are slightly
different. In the former, B is the row-reduced version of an augmented matrix of
a linear system, while in the latter, B is the row-reduced version of an arbitrary
matrix. Understanding this subtlety now will avoid confusion later.
Example SSNS Spanning set of a null space
Find a set of vectors, S, so that the null space of the matrix A below is the span of
S, that is, ⟨S⟩ = N (A).
\[
A = \begin{bmatrix}
1 & 3 & 3 & -1 & -5 \\
2 & 5 & 7 & 1 & 1 \\
1 & 1 & 5 & 1 & 5 \\
-1 & -4 & -2 & 0 & 4
\end{bmatrix}
\]
    The null space of A is the set of all solutions to the homogeneous system LS(A, 0).
If we find the vector form of the solutions to this homogeneous system (Theorem
VFSLS) then the vectors uj , 1 ≤ j ≤ n − r in the linear combination are exactly the
vectors zj , 1 ≤ j ≤ n − r described in Theorem SSNS. So we can mimic Example
VFSAL to arrive at these vectors (rather than being a slave to the formulas in the
statement of the theorem).
   Begin by row-reducing A. The result is
\[
\begin{bmatrix}
1 & 0 & 6 & 0 & 4 \\
0 & 1 & -1 & 0 & -2 \\
0 & 0 & 0 & 1 & 3 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

     With D = {1, 2, 4} and F = {3, 5} we recognize that x3 and x5 are free variables
and we can interpret each nonzero row as an expression for the dependent variables
x1 , x2 , x4 (respectively) in the free variables x3 and x5 . With this we can write the
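vector form of a solution vector as
\[
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
=
\begin{bmatrix} -6x_3 - 4x_5 \\ x_3 + 2x_5 \\ x_3 \\ -3x_5 \\ x_5 \end{bmatrix}
=
x_3 \begin{bmatrix} -6 \\ 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}
+ x_5 \begin{bmatrix} -4 \\ 2 \\ 0 \\ -3 \\ 1 \end{bmatrix}
\]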

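   Then in the notation of Theorem SSNS,
\[
z_1 = \begin{bmatrix} -6 \\ 1 \\ 1 \\ 0 \\ 0 \end{bmatrix}
\qquad
z_2 = \begin{bmatrix} -4 \\ 2 \\ 0 \\ -3 \\ 1 \end{bmatrix}
\]
and
\[
\mathcal{N}(A) = \langle \{ z_1, z_2 \} \rangle =
\left\langle \left\{
\begin{bmatrix} -6 \\ 1 \\ 1 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} -4 \\ 2 \\ 0 \\ -3 \\ 1 \end{bmatrix}
\right\} \right\rangle
\]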
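
   A claim like this is easy to check by direct multiplication. Here is a short
Python sketch, not part of the original text, confirming that z1 and z2, and a few
arbitrary linear combinations of them, are sent to the zero vector by A. The helper
names mat_vec and combo, and the sample scalars, are our own choices. A finite check
like this only illustrates the theorem; it does not replace the proof.

```python
def mat_vec(matrix, vector):
    """Matrix-vector product, with the matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

A = [
    [ 1,  3,  3, -1, -5],
    [ 2,  5,  7,  1,  1],
    [ 1,  1,  5,  1,  5],
    [-1, -4, -2,  0,  4],
]
z1 = [-6, 1, 1, 0, 0]
z2 = [-4, 2, 0, -3, 1]

def combo(x3, x5):
    """The solution x3*z1 + x5*z2, with the free variables as scalars."""
    return [x3 * a + x5 * b for a, b in zip(z1, z2)]

# Sample values of the free variables (arbitrary choices for illustration).
samples = [(1, 0), (0, 1), (2, -5), (-3, 7)]
results = [mat_vec(A, combo(x3, x5)) for x3, x5 in samples]
all_zero = all(r == [0, 0, 0, 0] for r in results)
```

Notice that the third and fifth entries of combo(x3, x5) are just x3 and x5, which
is the pattern of zeros and ones from the first two cases of Theorem SSNS at work.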

Example NSDS Null space directly as a span
Let us express the null space of A as the span of a set of vectors, applying Theorem
SSNS as economically as possible, without reference to the underlying homogeneous
system of equations (in contrast to Example SSNS).
\[
A = \begin{bmatrix}
2 & 1 & 5 & 1 & 5 & 1 \\
1 & 1 & 3 & 1 & 6 & -1 \\
-1 & 1 & -1 & 0 & 4 & -3 \\
-3 & 2 & -4 & -4 & -7 & 0 \\
3 & -1 & 5 & 2 & 2 & 3
\end{bmatrix}
\]

   Theorem SSNS creates vectors for the span by first row-reducing the matrix in
question. The row-reduced version of A is
\[
B = \begin{bmatrix}
1 & 0 & 2 & 0 & -1 & 2 \\
0 & 1 & 1 & 0 & 3 & -1 \\
0 & 0 & 0 & 1 & 4 & -2 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

   We will mechanically follow the prescription of Theorem SSNS. Here we go, in
two big steps.
    First, the non-pivot columns have indices F = {3, 5, 6}, so we will construct the
n − r = 6 − 3 = 3 vectors with a pattern of zeros and ones dictated by the indices
in F . This is the realization of the first two lines of the three-case definition of the
vectors zj , 1 ≤ j ≤ n − r. Here □ marks a slot that is still empty.
\[
z_1 = \begin{bmatrix} \square \\ \square \\ 1 \\ \square \\ 0 \\ 0 \end{bmatrix}
\qquad
z_2 = \begin{bmatrix} \square \\ \square \\ 0 \\ \square \\ 1 \\ 0 \end{bmatrix}
\qquad
z_3 = \begin{bmatrix} \square \\ \square \\ 0 \\ \square \\ 0 \\ 1 \end{bmatrix}
\]
    Each of these vectors arises due to the presence of a column that is not a pivot
column. The remaining entries of each vector are the entries of the non-pivot column,
negated, and distributed into the empty slots in order (these slots have indices in
the set D, so also refer to pivot columns). This is the realization of the third line of
the three-case definition of the vectors zj , 1 ≤ j ≤ n − r.
\[
z_1 = \begin{bmatrix} -2 \\ -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\qquad
z_2 = \begin{bmatrix} 1 \\ -3 \\ 0 \\ -4 \\ 1 \\ 0 \end{bmatrix}
\qquad
z_3 = \begin{bmatrix} -2 \\ 1 \\ 0 \\ 2 \\ 0 \\ 1 \end{bmatrix}
\]
     So, by Theorem SSNS, we have
\[
\mathcal{N}(A) = \langle \{ z_1, z_2, z_3 \} \rangle =
\left\langle \left\{
\begin{bmatrix} -2 \\ -1 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix},
\begin{bmatrix} 1 \\ -3 \\ 0 \\ -4 \\ 1 \\ 0 \end{bmatrix},
\begin{bmatrix} -2 \\ 1 \\ 0 \\ 2 \\ 0 \\ 1 \end{bmatrix}
\right\} \right\rangle
\]
    We know that the null space of A is the solution set of the homogeneous system
LS(A, 0), but nowhere in this application of Theorem SSNS have we found occasion
to reference the variables or equations of this system. These details are all buried in
the proof of Theorem SSNS.
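
   The mechanical nature of this prescription makes it a natural candidate for a
short program. Here is a minimal Python sketch, not part of the original text, that
takes the row-reduced matrix B exactly as given above and builds the vectors zj from
the three-case definition in Theorem SSNS. The function name null_space_spanning_set
is our own, and Python's indices start at 0, so each index is one less than the
corresponding index in D or F.

```python
def null_space_spanning_set(B):
    """Build z_1, ..., z_{n-r} from a matrix B already in reduced row-echelon form."""
    n = len(B[0])
    # D: column of the leading entry in each nonzero row (0-based).
    D = []
    for row in B:
        lead = next((j for j, entry in enumerate(row) if entry != 0), None)
        if lead is not None:
            D.append(lead)
    # F: the remaining, non-pivot, columns.
    F = [j for j in range(n) if j not in D]
    vectors = []
    for f in F:
        z = [0] * n                 # case i in F, i != f_j
        z[f] = 1                    # case i in F, i = f_j
        for k, d in enumerate(D):   # case i in D, i = d_k
            z[d] = -B[k][f]
        vectors.append(z)
    return D, F, vectors

B = [
    [1, 0, 2, 0, -1,  2],
    [0, 1, 1, 0,  3, -1],
    [0, 0, 0, 1,  4, -2],
    [0, 0, 0, 0,  0,  0],
    [0, 0, 0, 0,  0,  0],
]
D, F, zs = null_space_spanning_set(B)

# Independent check against the original matrix A: each z_j should satisfy A z_j = 0.
A = [
    [ 2,  1,  5,  1,  5,  1],
    [ 1,  1,  3,  1,  6, -1],
    [-1,  1, -1,  0,  4, -3],
    [-3,  2, -4, -4, -7,  0],
    [ 3, -1,  5,  2,  2,  3],
]
products = [[sum(a * x for a, x in zip(row, z)) for row in A] for z in zs]
```

Adding 1 to each entry of D and F should recover D = {1, 2, 4} and F = {3, 5, 6}, and
the list zs should match z1 , z2 , z3 above. The sketch assumes its input really is in
reduced row-echelon form; it makes no attempt to verify that.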
   Here is an example that will simultaneously exercise the span construction and
Theorem SSNS, while also pointing the way to the next section.
Example SCAD Span of the columns of Archetype D
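Begin with the set of four vectors of size 3
\[
T = \{ w_1, w_2, w_3, w_4 \} =
\left\{
\begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix},
\begin{bmatrix} 1 \\ 4 \\ 1 \end{bmatrix},
\begin{bmatrix} 7 \\ -5 \\ 4 \end{bmatrix},
\begin{bmatrix} -7 \\ -6 \\ -5 \end{bmatrix}
\right\}
\]
and consider the infinite set W = ⟨T⟩. The vectors of T have been chosen as the
four columns of the coefficient matrix in Archetype D. Check that the vector
\[
z_2 = \begin{bmatrix} 2 \\ 3 \\ 0 \\ 1 \end{bmatrix}
\]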
is a solution to the homogeneous system LS(D, 0) (it is the vector z2 provided by
the description of the null space of the coefficient matrix D from Theorem SSNS).
    Applying Theorem SLSLC, we can write the linear combination,
                              2w1 + 3w2 + 0w3 + 1w4 = 0
which we can solve for w4 ,
                               w4 = (−2)w1 + (−3)w2 .
    This equation says that whenever we encounter the vector w4 , we can replace it
with a specific linear combination of the vectors w1 and w2 . So using w4 in the set
T , along with w1 and w2 , is excessive. An example of what we mean here can be
illustrated by the computation,
              5w1 + (−4)w2 + 6w3 + (−3)w4
                   = 5w1 + (−4)w2 + 6w3 + (−3) ((−2)w1 + (−3)w2 )
                   = 5w1 + (−4)w2 + 6w3 + (6w1 + 9w2 )
                   = 11w1 + 5w2 + 6w3
    So what began as a linear combination of the vectors w1 , w2 , w3 , w4 has been
reduced to a linear combination of the vectors w1 , w2 , w3 . A careful proof using
our definition of set equality (Definition SE) would now allow us to conclude that
this reduction is possible for any vector in W , so
                                 W = h{w1 , w2 , w3 }i
So the span of our set of vectors, W , has not changed, but we have described it by
the span of a set of three vectors, rather than four. Furthermore, we can achieve yet
another, similar, reduction.
   Check that the vector
\[
z_1 = \begin{bmatrix} -3 \\ -1 \\ 1 \\ 0 \end{bmatrix}
\]
is a solution to the homogeneous system LS(D, 0) (it is the vector z1 provided by
the description of the null space of the coefficient matrix D from Theorem SSNS).
Applying Theorem SLSLC, we can write the linear combination,
                             (−3)w1 + (−1)w2 + 1w3 = 0
which we can solve for w3 ,
                                   w3 = 3w1 + 1w2
   This equation says that whenever we encounter the vector w3 , we can replace it
with a specific linear combination of the vectors w1 and w2 . So, as before, the vector
w3 is not needed in the description of W , provided we have w1 and w2 available. In
particular, a careful proof (such as is done in Example RSC5) would show that
                                   W = ⟨{w1 , w2 }⟩
    So W began life as the span of a set of four vectors, and we have now shown
(utilizing solutions to a homogeneous system) that W can also be described as the
span of a set of just two vectors. Convince yourself that we cannot go any further.
In other words, it is not possible to dismiss either w1 or w2 in a similar fashion and
winnow the set down to just one vector.
    What was it about the original set of four vectors that allowed us to declare
certain vectors as surplus? And just which vectors were we able to dismiss? And
why did we have to stop once we had two vectors remaining? The answers to these
questions motivate “linear independence,” our next section and next definition, and
so are worth considering carefully now.
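
   The arithmetic in this example can be confirmed with a few lines of code. Here is
a minimal Python sketch, not part of the original text, that checks the two relations
coming from z2 and z1, and then follows the sample linear combination as w4 and then
w3 are eliminated. The helper name lin_comb is our own. The final pair of scalars,
29 and 11, does not appear in the text; it comes from substituting w3 = 3w1 + 1w2
into 11w1 + 5w2 + 6w3.

```python
def lin_comb(scalars, vectors):
    """Linear combination of equal-size vectors with the given scalars."""
    size = len(vectors[0])
    return [sum(a * v[i] for a, v in zip(scalars, vectors)) for i in range(size)]

w1, w2, w3, w4 = [2, -3, 1], [1, 4, 1], [7, -5, 4], [-7, -6, -5]
T = [w1, w2, w3, w4]

# The entries of z2 and z1 give linear combinations equal to the zero vector.
from_z2 = lin_comb([2, 3, 0, 1], T)
from_z1 = lin_comb([-3, -1, 1, 0], T)

# Solving those relations for w4 and w3.
w4_rebuilt = lin_comb([-2, -3], [w1, w2])
w3_rebuilt = lin_comb([3, 1], [w1, w2])

# The same vector of W, written with four, three, and two vectors.
with_four = lin_comb([5, -4, 6, -3], T)
with_three = lin_comb([11, 5, 6], [w1, w2, w3])
with_two = lin_comb([29, 11], [w1, w2])
```

All three descriptions should produce the same vector, which is the point of the
example: dropping w4 and then w3 changes the expression, but not the element of W.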


Reading Questions

1. Let S be the set of three vectors below.
\[
S = \left\{
\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix},
\begin{bmatrix} 3 \\ -4 \\ 2 \end{bmatrix},
\begin{bmatrix} 4 \\ -2 \\ 1 \end{bmatrix}
\right\}
\]
   Let W = ⟨S⟩ be the span of S. Is the vector
\[
\begin{bmatrix} -1 \\ 8 \\ -4 \end{bmatrix}
\]
   in W ? Give an explanation of the reason for your answer.

2. Use S and W from the previous question. Is the vector
\[
\begin{bmatrix} 6 \\ 5 \\ -1 \end{bmatrix}
\]
   in W ? Give an explanation of the reason for your answer.

3. For the matrix A below, find a set S so that ⟨S⟩ = N (A), where N (A) is the null space
   of A. (See Theorem SSNS.)
\[
A = \begin{bmatrix}
1 & 3 & 1 & 9 \\
2 & 1 & -3 & 8 \\
1 & 1 & -1 & 5
\end{bmatrix}
\]

Exercises
C22† For each archetype that is a system of equations, consider the corresponding
homogeneous system of equations. Write elements of the solution set to these homogeneous
systems in vector form, as guaranteed by Theorem VFSLS. Then write the null space of the
coefficient matrix of each system as the span of a set of vectors, as described in Theorem SSNS.

Archetype A, Archetype B, Archetype C, Archetype D/Archetype E, Archetype F, Archetype
G/Archetype H, Archetype I, Archetype J
C23† Archetype K and Archetype L are defined as matrices. Use Theorem SSNS directly
to find a set S so that ⟨S⟩ is the null space of the matrix. Do not make any reference to
the associated homogeneous system of equations in your solution.
C40† Suppose that
\[
S = \left\{
\begin{bmatrix} 2 \\ -1 \\ 3 \\ 4 \end{bmatrix},
\begin{bmatrix} 3 \\ 2 \\ -2 \\ 1 \end{bmatrix}
\right\}
\]
Let W = ⟨S⟩ and let
\[
x = \begin{bmatrix} 5 \\ 8 \\ -12 \\ -5 \end{bmatrix}
\]
Is x ∈ W ? If so, provide an explicit linear combination that demonstrates this.
C41† Suppose that
\[
S = \left\{
\begin{bmatrix} 2 \\ -1 \\ 3 \\ 4 \end{bmatrix},
\begin{bmatrix} 3 \\ 2 \\ -2 \\ 1 \end{bmatrix}
\right\}
\]
Let W = ⟨S⟩ and let
\[
y = \begin{bmatrix} 5 \\ 1 \\ 3 \\ 5 \end{bmatrix}
\]
Is y ∈ W ? If so, provide an explicit linear combination that demonstrates this.
C42† Suppose
\[
R = \left\{
\begin{bmatrix} 2 \\ -1 \\ 3 \\ 4 \\ 0 \end{bmatrix},
\begin{bmatrix} 1 \\ 1 \\ 2 \\ 2 \\ -1 \end{bmatrix},
\begin{bmatrix} 3 \\ -1 \\ 0 \\ 3 \\ -2 \end{bmatrix}
\right\}
\]
Is
\[
y = \begin{bmatrix} 1 \\ -1 \\ -8 \\ -4 \\ -3 \end{bmatrix}
\]
in ⟨R⟩?
C43† Suppose
\[
R = \left\{
\begin{bmatrix} 2 \\ -1 \\ 3 \\ 4 \\ 0 \end{bmatrix},
\begin{bmatrix} 1 \\ 1 \\ 2 \\ 2 \\ -1 \end{bmatrix},
\begin{bmatrix} 3 \\ -1 \\ 0 \\ 3 \\ -2 \end{bmatrix}
\right\}
\]
Is
\[
z = \begin{bmatrix} 1 \\ 1 \\ 5 \\ 3 \\ 1 \end{bmatrix}
\]
in ⟨R⟩?
C44† Suppose that
\[
S = \left\{
\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix},
\begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix},
\begin{bmatrix} 1 \\ 5 \\ 4 \end{bmatrix},
\begin{bmatrix} -6 \\ 5 \\ 1 \end{bmatrix}
\right\}
\]
Let W = ⟨S⟩ and let
\[
y = \begin{bmatrix} -5 \\ 3 \\ 0 \end{bmatrix}
\]
Is y ∈ W ? If so, provide an explicit linear combination that demonstrates this.
C45† Suppose that
\[
S = \left\{
\begin{bmatrix} -1 \\ 2 \\ 1 \end{bmatrix},
\begin{bmatrix} 3 \\ 1 \\ 2 \end{bmatrix},
\begin{bmatrix} 1 \\ 5 \\ 4 \end{bmatrix},
\begin{bmatrix} -6 \\ 5 \\ 1 \end{bmatrix}
\right\}
\]
Let W = ⟨S⟩ and let
\[
w = \begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}
\]
Is w ∈ W ? If so, provide an explicit linear combination that demonstrates this.
C50† Let A be the matrix below.
\[
A = \begin{bmatrix}
2 & 3 & 1 & 4 \\
1 & 2 & 1 & 3 \\
-1 & 0 & 1 & 1
\end{bmatrix}
\]
     1. Find a set S so that N (A) = ⟨S⟩.
     2. If
\[
z = \begin{bmatrix} 3 \\ -5 \\ 1 \\ 2 \end{bmatrix}
\]
        then show directly that z ∈ N (A).
     3. Write z as a linear combination of the vectors in S.

C60† For the matrix A below, find a set of vectors S so that the span of S equals the
null space of A, ⟨S⟩ = N (A).
\[
A = \begin{bmatrix}
1 & 1 & 6 & -8 \\
1 & -2 & 0 & 1 \\
-2 & 1 & -6 & 7
\end{bmatrix}
\]

M10† Consider the set of all size 2 vectors in the Cartesian plane R^2 .

   1. Give a geometric description of the span of a single vector.

   2. How can you tell if two vectors span the entire plane, without doing any row reduction
      or calculation?

M11† Consider the set of all size 3 vectors in Cartesian 3-space R^3 .

   1. Give a geometric description of the span of a single vector.

   2. Describe the possibilities for the span of two vectors.

   3. Describe the possibilities for the span of three vectors.

M12† Let
\[
u = \begin{bmatrix} 1 \\ 3 \\ -2 \end{bmatrix}
\quad \text{and} \quad
v = \begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix}
\]

   1. Find a vector w1 , different from u and v, so that ⟨{u, v, w1 }⟩ = ⟨{u, v}⟩.

   2. Find a vector w2 so that ⟨{u, v, w2 }⟩ ≠ ⟨{u, v}⟩.

M20 In Example SCAD we began with the four columns of the coefficient matrix of
Archetype D, and used these columns in a span construction. Then we methodically argued
that we could remove the last column, then the third column, and create the same set by
just doing a span construction with the first two columns. We claimed we could not go any
further, and had removed as many vectors as possible. Provide a convincing argument for
why a third vector cannot be removed.
M21† In the spirit of Example SCAD, begin with the four columns of the coefficient
matrix of Archetype C, and use these columns in a span construction to build the set S.
Argue that S can be expressed as the span of just three of the columns of the coefficient
matrix (saying exactly which three) and in the spirit of Exercise SS.M20 argue that no one
of these three vectors can be removed and still have a span construction create S.
T10† Suppose that v1 , v2 ∈ C^m . Prove that
                               ⟨{v1 , v2 }⟩ = ⟨{v1 , v2 , 5v1 + 3v2 }⟩
T20† Suppose that S is a set of vectors from C^m . Prove that the zero vector, 0, is an
element of ⟨S⟩.
T21 Suppose that S is a set of vectors from C^m and x, y ∈ ⟨S⟩. Prove that x + y ∈ ⟨S⟩.
T22 Suppose that S is a set of vectors from C^m , α ∈ C, and x ∈ ⟨S⟩. Prove that αx ∈ ⟨S⟩.
Section LI
Linear Independence
“Linear independence” is one of the most fundamental conceptual ideas in linear
algebra, along with the notion of a span. So this section, and the subsequent Section
LDS, will explore this new idea.



Subsection LISV
Linearly Independent Sets of Vectors

Theorem SLSLC tells us that a solution to a homogeneous system of equations is
a linear combination of the columns of the coefficient matrix that equals the zero
vector. We used just this situation to our advantage (twice!) in Example SCAD
where we reduced the set of vectors used in a span construction from four down to
two, by declaring certain vectors as surplus. The next two definitions will allow us
to formalize this situation.

Definition RLDCV Relation of Linear Dependence for Column Vectors
Given a set of vectors S = {u1 , u2 , u3 , . . . , un }, a true statement of the form
                       α1 u1 + α2 u2 + α3 u3 + · · · + αn un = 0
is a relation of linear dependence on S. If this statement is formed in a trivial
fashion, i.e. αi = 0, 1 ≤ i ≤ n, then we say it is the trivial relation of linear
dependence on S.                                                               

Definition LICV Linear Independence of Column Vectors
The set of vectors S = {u1 , u2 , u3 , . . . , un } is linearly dependent if there is
a relation of linear dependence on S that is not trivial. In the case where the
only relation of linear dependence on S is the trivial one, then S is a linearly
independent set of vectors.                                                        

    Notice that a relation of linear dependence is an equation. Though most of it is a
linear combination, it is not a linear combination (that would be a vector). Linear
independence is a property of a set of vectors. It is easy to take a set of vectors,
and an equal number of scalars, all zero, and form a linear combination that equals
the zero vector. When the easy way is the only way, then we say the set is linearly
independent. Here are a couple of examples.

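Example LDS Linearly dependent set in C^5
Consider the set of n = 4 vectors from C^5 ,
\[
S = \left\{
\begin{bmatrix} 2 \\ -1 \\ 3 \\ 1 \\ 2 \end{bmatrix},
\begin{bmatrix} 1 \\ 2 \\ -1 \\ 5 \\ 2 \end{bmatrix},
\begin{bmatrix} 2 \\ 1 \\ -3 \\ 6 \\ 1 \end{bmatrix},
\begin{bmatrix} -6 \\ 7 \\ -1 \\ 0 \\ 1 \end{bmatrix}
\right\}
\]
    To determine linear independence we first form a relation of linear dependence,
\[
\alpha_1 \begin{bmatrix} 2 \\ -1 \\ 3 \\ 1 \\ 2 \end{bmatrix}
+ \alpha_2 \begin{bmatrix} 1 \\ 2 \\ -1 \\ 5 \\ 2 \end{bmatrix}
+ \alpha_3 \begin{bmatrix} 2 \\ 1 \\ -3 \\ 6 \\ 1 \end{bmatrix}
+ \alpha_4 \begin{bmatrix} -6 \\ 7 \\ -1 \\ 0 \\ 1 \end{bmatrix}
= 0
\]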
    We know that α1 = α2 = α3 = α4 = 0 is a solution to this equation, but
that is of no interest whatsoever. That is always the case, no matter what four
vectors we might have chosen. We are curious to know if there are other, nontrivial,
solutions. Theorem SLSLC tells us that we can find such solutions as solutions to the
homogeneous system LS(A, 0) where the coefficient matrix has these four vectors
as columns, which we then row-reduce
\[
A = \begin{bmatrix}
2 & 1 & 2 & -6 \\
-1 & 2 & 1 & 7 \\
3 & -1 & -3 & -1 \\
1 & 5 & 6 & 0 \\
2 & 2 & 1 & 1
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & -2 \\
0 & 1 & 0 & 4 \\
0 & 0 & 1 & -3 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]
   We could solve this homogeneous system completely, but for this example all we
need is one nontrivial solution. Setting the lone free variable to any nonzero value,
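such as x4 = 1, yields the nontrivial solution
\[
x = \begin{bmatrix} 2 \\ -4 \\ 3 \\ 1 \end{bmatrix}
\]
Completing our application of Theorem SLSLC, we have
\[
2 \begin{bmatrix} 2 \\ -1 \\ 3 \\ 1 \\ 2 \end{bmatrix}
+ (-4) \begin{bmatrix} 1 \\ 2 \\ -1 \\ 5 \\ 2 \end{bmatrix}
+ 3 \begin{bmatrix} 2 \\ 1 \\ -3 \\ 6 \\ 1 \end{bmatrix}
+ 1 \begin{bmatrix} -6 \\ 7 \\ -1 \\ 0 \\ 1 \end{bmatrix}
= 0
\]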
   This is a relation of linear dependence on S that is not trivial, so we conclude
that S is linearly dependent.
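
   The procedure of this example (place the vectors in the columns of a matrix,
row-reduce, and look for a free variable) can be sketched in code. The following
Python is a minimal illustration, not part of the original text. It uses exact
rational arithmetic, and the function name rref and all variable names are our own
choices.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form of a matrix (list of rows), in exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        if pivot_row == len(m):
            break
        # Look for a nonzero entry in this column, at or below pivot_row.
        pick = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pick is None:
            continue  # no pivot in this column
        m[pivot_row], m[pick] = m[pick], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        # Clear every other entry of the pivot column.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m

S = [
    [2, -1, 3, 1, 2],
    [1, 2, -1, 5, 2],
    [2, 1, -3, 6, 1],
    [-6, 7, -1, 0, 1],
]
A = [list(row) for row in zip(*S)]   # the vectors of S become the columns of A
R = rref(A)

# Pivot columns (0-based): position of the leading entry in each nonzero row.
pivots = []
for row in R:
    lead = next((j for j, entry in enumerate(row) if entry != 0), None)
    if lead is not None:
        pivots.append(lead)
free = [j for j in range(len(S)) if j not in pivots]

# A free variable signals a nontrivial relation: set it to 1 and read the
# dependent variables off the corresponding column of R, negated.
relation = None
if free:
    f = free[0]
    relation = [Fraction(0)] * len(S)
    relation[f] = Fraction(1)
    for k, d in enumerate(pivots):
        relation[d] = -R[k][f]
    check = [sum(a * v[i] for a, v in zip(relation, S)) for i in range(len(S[0]))]
```

For this set S the list free should contain the single index 3 (the fourth variable),
and relation should reproduce the scalars 2, −4, 3, 1 found above. An empty list
free would instead mean that only the trivial relation exists.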
Example LIS Linearly independent set in C^5
Consider the set of n = 4 vectors from C^5,
\[
T = \left\{
\begin{bmatrix} 2 \\ -1 \\ 3 \\ 1 \\ 2 \end{bmatrix},\
\begin{bmatrix} 1 \\ 2 \\ -1 \\ 5 \\ 2 \end{bmatrix},\
\begin{bmatrix} 2 \\ 1 \\ -3 \\ 6 \\ 1 \end{bmatrix},\
\begin{bmatrix} -6 \\ 7 \\ -1 \\ 1 \\ 1 \end{bmatrix}
\right\}
\]
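     To determine linear independence we first form a relation of linear dependence,
\[
\alpha_1 \begin{bmatrix} 2 \\ -1 \\ 3 \\ 1 \\ 2 \end{bmatrix}
+ \alpha_2 \begin{bmatrix} 1 \\ 2 \\ -1 \\ 5 \\ 2 \end{bmatrix}
+ \alpha_3 \begin{bmatrix} 2 \\ 1 \\ -3 \\ 6 \\ 1 \end{bmatrix}
+ \alpha_4 \begin{bmatrix} -6 \\ 7 \\ -1 \\ 1 \\ 1 \end{bmatrix}
= \mathbf{0}
\]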
    We know that α1 = α2 = α3 = α4 = 0 is a solution to this equation, but
that is of no interest whatsoever. That is always the case, no matter what four
vectors we might have chosen. We are curious to know if there are other, nontrivial,
solutions. Theorem SLSLC tells us that we can find such solutions as solutions to the
homogeneous system LS(B, 0) where the coefficient matrix has these four vectors
as columns. Row-reducing this coefficient matrix yields,
\[
B = \begin{bmatrix}
2 & 1 & 2 & -6 \\
-1 & 2 & 1 & 7 \\
3 & -1 & -3 & -1 \\
1 & 5 & 6 & 1 \\
2 & 2 & 1 & 1
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]
    From the form of this matrix, we see that there are no free variables, so the
solution is unique, and because the system is homogeneous, this unique solution is
the trivial solution. So we now know that there is but one way to combine the four
vectors of T into a relation of linear dependence, and that one way is the easy and
obvious way. In this situation we say that the set, T, is linearly independent. △
   Example LDS and Example LIS relied on solving a homogeneous system of
equations to determine linear independence. We can codify this process in a time-
saving theorem.
Theorem LIVHS Linearly Independent Vectors and Homogeneous Systems
Suppose that S = {v_1, v_2, v_3, ..., v_n} ⊆ C^m is a set of vectors and A is the m × n
matrix whose columns are the vectors in S. Then S is a linearly independent set if
and only if the homogeneous system LS(A, 0) has a unique solution.
Proof. (⇐) Suppose that LS(A, 0) has a unique solution. Since it is a homogeneous
system, this solution must be the trivial solution x = 0. By Theorem SLSLC, this
means that the only relation of linear dependence on S is the trivial one. So S is
linearly independent.
    (⇒) We will prove the contrapositive. Suppose that LS(A, 0) does not have a
unique solution. Since it is a homogeneous system, it is consistent (Theorem HSC),
and so must have infinitely many solutions (Theorem PSSLS). One of these infinitely
many solutions must be nontrivial (in fact, almost all of them are), so choose one.
By Theorem SLSLC this nontrivial solution will give a nontrivial relation of linear
dependence on S, so we can conclude that S is a linearly dependent set.          
   Since Theorem LIVHS is an equivalence, we can use it to determine the linear
independence or dependence of any set of column vectors, just by creating a matrix
and analyzing the row-reduced form. Let us illustrate this with two more examples.
Example LIHS Linearly independent, homogeneous system
Is the set of vectors
\[
S = \left\{
\begin{bmatrix} 2 \\ -1 \\ 3 \\ 4 \\ 2 \end{bmatrix},\
\begin{bmatrix} 6 \\ 2 \\ -1 \\ 3 \\ 4 \end{bmatrix},\
\begin{bmatrix} 4 \\ 3 \\ -4 \\ 5 \\ 1 \end{bmatrix}
\right\}
\]
linearly independent or linearly dependent?
    Theorem LIVHS suggests we study the matrix, A, whose columns are the vectors
in S. Specifically, we are interested in the size of the solution set for the homogeneous
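system LS(A, 0), so we row-reduce A.
\[
A = \begin{bmatrix}
2 & 6 & 4 \\
-1 & 2 & 3 \\
3 & -1 & -4 \\
4 & 3 & 5 \\
2 & 4 & 1
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{bmatrix}
\]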
    Now, r = 3, so there are n − r = 3 − 3 = 0 free variables and we see that LS(A, 0)
has a unique solution (Theorem HSC, Theorem FVCS). By Theorem LIVHS, the
set S is linearly independent. △
Example LDHS Linearly dependent, homogeneous system
Is the set of vectors
\[
S = \left\{
\begin{bmatrix} 2 \\ -1 \\ 3 \\ 4 \\ 2 \end{bmatrix},\
\begin{bmatrix} 6 \\ 2 \\ -1 \\ 3 \\ 4 \end{bmatrix},\
\begin{bmatrix} 4 \\ 3 \\ -4 \\ -1 \\ 2 \end{bmatrix}
\right\}
\]
linearly independent or linearly dependent?
    Theorem LIVHS suggests we study the matrix, A, whose columns are the vectors
in S. Specifically, we are interested in the size of the solution set for the homogeneous
system LS(A, 0), so we row-reduce A.
\[
A = \begin{bmatrix}
2 & 6 & 4 \\
-1 & 2 & 3 \\
3 & -1 & -4 \\
4 & 3 & -1 \\
2 & 4 & 2
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & -1 \\
0 & 1 & 1 \\
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{bmatrix}
\]
   Now, r = 2, so there is n − r = 3 − 2 = 1 free variable and we see that LS(A, 0)
has infinitely many solutions (Theorem HSC, Theorem FVCS). By Theorem LIVHS,
the set S is linearly dependent. △
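The row-reduced matrix in Example LDHS also yields an explicit relation of linear dependence, in the same way as Example LDS. As a sketch: x_3 is the free variable, and the two nonzero rows read x_1 − x_3 = 0 and x_2 + x_3 = 0. Choosing x_3 = 1 (any nonzero value would do) gives x_1 = 1 and x_2 = −1, so by Theorem SLSLC
\[
1 \begin{bmatrix} 2 \\ -1 \\ 3 \\ 4 \\ 2 \end{bmatrix}
+ (-1) \begin{bmatrix} 6 \\ 2 \\ -1 \\ 3 \\ 4 \end{bmatrix}
+ 1 \begin{bmatrix} 4 \\ 3 \\ -4 \\ -1 \\ 2 \end{bmatrix}
= \begin{bmatrix} 2 - 6 + 4 \\ -1 - 2 + 3 \\ 3 + 1 - 4 \\ 4 - 3 - 1 \\ 2 - 4 + 2 \end{bmatrix}
= \mathbf{0}
\]
which is a nontrivial relation of linear dependence on S.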
    As an equivalence, Theorem LIVHS gives us a straightforward way to determine
if a set of vectors is linearly independent or dependent.
    Review Example LIHS and Example LDHS. They are very similar, differing only
in the last two slots of the third vector. This resulted in slightly different matrices
when row-reduced, and slightly different values of r, the number of nonzero rows.
Notice, too, that we are less interested in the actual solution set, and more interested
in its form or size. These observations allow us to make a slight improvement in
Theorem LIVHS.
Theorem LIVRN Linearly Independent Vectors, r and n
Suppose that S = {v_1, v_2, v_3, ..., v_n} ⊆ C^m is a set of vectors and A is the m × n
matrix whose columns are the vectors in S. Let B be a matrix in reduced row-echelon
form that is row-equivalent to A and let r denote the number of pivot columns in B.
Then S is linearly independent if and only if n = r.

Proof. Theorem LIVHS says the linear independence of S is equivalent to the
homogeneous linear system LS(A, 0) having a unique solution. Since LS(A, 0) is
consistent (Theorem HSC) we can apply Theorem CSRN to see that the solution is
unique exactly when n = r.                                                  
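The test in Theorem LIVRN is mechanical enough to sketch in code. The Python below is one possible illustration, not the text's own procedure: it row-reduces with exact fractions, counts pivot columns, and compares that count r with the number of vectors n. The function names `rref` and `is_linearly_independent` are assumptions made for this sketch.

```python
from fractions import Fraction


def rref(rows):
    """Row-reduce a matrix (list of rows) exactly; return (reduced, pivot_columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c, at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]  # scale so the leading entry is 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def is_linearly_independent(vectors):
    """Theorem LIVRN: the set is independent exactly when n = r."""
    n = len(vectors)
    a = [list(row) for row in zip(*vectors)]  # the vectors become the columns of A
    _, pivots = rref(a)
    return len(pivots) == n


lihs = [[2, -1, 3, 4, 2], [6, 2, -1, 3, 4], [4, 3, -4, 5, 1]]   # Example LIHS
ldhs = [[2, -1, 3, 4, 2], [6, 2, -1, 3, 4], [4, 3, -4, -1, 2]]  # Example LDHS
print(is_linearly_independent(lihs))  # True
print(is_linearly_independent(ldhs))  # False
```

Exact `Fraction` arithmetic is used so that the pivot count is not disturbed by floating-point round-off.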

    So now here is an example of the most straightforward way to determine if a set
of column vectors is linearly independent or linearly dependent. While this method
can be quick and easy, do not forget the logical progression from the definition
of linear independence through homogeneous systems of equations which makes it
possible.
Example LDRN Linearly dependent, r and n
Is the set of vectors
\[
S = \left\{
\begin{bmatrix} 2 \\ -1 \\ 3 \\ 1 \\ 0 \\ 3 \end{bmatrix},\
\begin{bmatrix} 9 \\ -6 \\ -2 \\ 3 \\ 2 \\ 1 \end{bmatrix},\
\begin{bmatrix} 1 \\ 1 \\ 1 \\ 0 \\ 0 \\ 1 \end{bmatrix},\
\begin{bmatrix} -3 \\ 1 \\ 4 \\ 2 \\ 1 \\ 2 \end{bmatrix},\
\begin{bmatrix} 6 \\ -2 \\ 1 \\ 4 \\ 3 \\ 2 \end{bmatrix}
\right\}
\]
linearly independent or linearly dependent?
    Theorem LIVRN suggests we place these vectors into a matrix as columns and
analyze the row-reduced version of the matrix,
\[
\begin{bmatrix}
2 & 9 & 1 & -3 & 6 \\
-1 & -6 & 1 & 1 & -2 \\
3 & -2 & 1 & 4 & 1 \\
1 & 3 & 0 & 2 & 4 \\
0 & 2 & 0 & 1 & 3 \\
3 & 1 & 1 & 2 & 2
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & -1 \\
0 & 1 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 2 \\
0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]
   Now we need only compute that r = 4 < 5 = n to recognize, via Theorem LIVRN,
that S is a linearly dependent set. Boom! △
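Example LLDS Large linearly dependent set in C^4
Consider the set of n = 9 vectors from C^4,
\[
R = \left\{
\begin{bmatrix} -1 \\ 3 \\ 1 \\ 2 \end{bmatrix},\
\begin{bmatrix} 7 \\ 1 \\ -3 \\ 6 \end{bmatrix},\
\begin{bmatrix} 1 \\ 2 \\ -1 \\ -2 \end{bmatrix},\
\begin{bmatrix} 0 \\ 4 \\ 2 \\ 9 \end{bmatrix},\
\begin{bmatrix} 5 \\ -2 \\ 4 \\ 3 \end{bmatrix},\
\begin{bmatrix} 2 \\ 1 \\ -6 \\ 4 \end{bmatrix},\
\begin{bmatrix} 3 \\ 0 \\ -3 \\ 1 \end{bmatrix},\
\begin{bmatrix} 1 \\ 1 \\ 5 \\ 3 \end{bmatrix},\
\begin{bmatrix} -6 \\ -1 \\ 1 \\ 1 \end{bmatrix}
\right\}.
\]
   To employ Theorem LIVHS, we form a 4 × 9 matrix, C, whose columns are the
vectors in R
\[
C = \begin{bmatrix}
-1 & 7 & 1 & 0 & 5 & 2 & 3 & 1 & -6 \\
3 & 1 & 2 & 4 & -2 & 1 & 0 & 1 & -1 \\
1 & -3 & -1 & 2 & 4 & -6 & -3 & 5 & 1 \\
2 & 6 & -2 & 9 & 3 & 4 & 1 & 3 & 1
\end{bmatrix}.
\]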
    To determine if the homogeneous system LS(C, 0) has a unique solution or not,
we would normally row-reduce this matrix. But in this particular example, we can
do better. Theorem HMVEI tells us that since the system is homogeneous with
n = 9 variables in m = 4 equations, and n > m, there must be infinitely many
solutions. Since there is not a unique solution, Theorem LIVHS says the set is linearly
dependent. △
   The situation in Example LLDS is slick enough to warrant formulating as a
theorem.
Theorem MVSLD More Vectors than Size implies Linear Dependence
Suppose that S = {u_1, u_2, u_3, ..., u_n} ⊆ C^m and n > m. Then S is a linearly
dependent set.

Proof. Form the m × n matrix A whose columns are ui , 1 ≤ i ≤ n. Consider the
homogeneous system LS(A, 0). By Theorem HMVEI this system has infinitely many
solutions. Since the system does not have a unique solution, Theorem LIVHS says
the columns of A form a linearly dependent set, as desired.                  
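Theorem MVSLD needs no row-reduction at all, only a comparison of two counts. A minimal Python sketch, with the hypothetical helper name `dependent_by_size`, applied to the set R of Example LLDS:

```python
def dependent_by_size(vectors):
    """Return True when Theorem MVSLD applies: more vectors (n) than entries per vector (m).

    A False result says nothing either way; the set must then be tested
    by other means, such as row-reducing as in Theorem LIVRN.
    """
    n = len(vectors)
    m = len(vectors[0])
    return n > m


# The nine vectors of R from Example LLDS, each of size 4.
R = [
    [-1, 3, 1, 2], [7, 1, -3, 6], [1, 2, -1, -2],
    [0, 4, 2, 9], [5, -2, 4, 3], [2, 1, -6, 4],
    [3, 0, -3, 1], [1, 1, 5, 3], [-6, -1, 1, 1],
]
print(dependent_by_size(R))  # True, since n = 9 > 4 = m
```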

Subsection LINM
Linear Independence and Nonsingular Matrices
We will now specialize to sets of n vectors from C^n. This will put Theorem MVSLD
off-limits, while Theorem LIVHS will involve square matrices. Let us begin by
contrasting Archetype A and Archetype B.
Example LDCAA Linearly dependent columns in Archetype A
Archetype A is a system of linear equations with coefficient matrix,
\[
A = \begin{bmatrix}
1 & -1 & 2 \\
2 & 1 & 1 \\
1 & 1 & 0
\end{bmatrix}
\]
    Do the columns of this matrix form a linearly independent or dependent set? By
Example S we know that A is singular. According to the definition of nonsingular
matrices, Definition NM, the homogeneous system LS(A, 0) has infinitely many
solutions. So by Theorem LIVHS, the columns of A form a linearly dependent set. △
Example LICAB Linearly independent columns in Archetype B
Archetype B is a system of linear equations with coefficient matrix,
\[
B = \begin{bmatrix}
-7 & -6 & -12 \\
5 & 5 & 7 \\
1 & 0 & 4
\end{bmatrix}
\]
    Do the columns of this matrix form a linearly independent or dependent set?
By Example NM we know that B is nonsingular. According to the definition of
nonsingular matrices, Definition NM, the homogeneous system LS(B, 0) has a unique
solution. So by Theorem LIVHS, the columns of B form a linearly independent set. △
    That Archetype A and Archetype B have opposite properties for the columns
of their coefficient matrices is no accident. Here is the theorem, and then we will
update our equivalences for nonsingular matrices, Theorem NME1.
Theorem NMLIC Nonsingular Matrices have Linearly Independent Columns
Suppose that A is a square matrix. Then A is nonsingular if and only if the columns
of A form a linearly independent set.

Proof. This is a proof where we can chain together equivalences, rather than proving
the two halves separately.

  A nonsingular ⇐⇒ LS(A, 0) has a unique solution                 Definition NM
                  ⇐⇒ columns of A are linearly independent        Theorem LIVHS
                                                                                      

      Here is the update to Theorem NME1.
Theorem NME2 Nonsingular Matrix Equivalences, Round 2
Suppose that A is a square matrix. The following are equivalent.

  1. A is nonsingular.

  2. A row-reduces to the identity matrix.

  3. The null space of A contains only the zero vector, N (A) = {0}.

  4. The linear system LS(A, b) has a unique solution for every possible choice of
     b.

  5. The columns of A form a linearly independent set.

Proof. Theorem NMLIC is yet another equivalence for a nonsingular matrix, so we
can add it to the list in Theorem NME1.                                       
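Two items of Theorem NME2 can be compared computationally on the coefficient matrices of Archetype A and Archetype B. The Python sketch below is an illustration under stated assumptions: the helper names are invented here, and exact fractions are used for the row-reduction. It checks item 2 (row-reduces to the identity) and item 5 (columns linearly independent, tested via Theorem LIVRN as n = r) and shows that they agree.

```python
from fractions import Fraction


def rref(rows):
    """Row-reduce a matrix (list of rows) exactly; return (reduced, pivot_columns)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the next pivot row
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def reduces_to_identity(matrix):
    """Item 2 of Theorem NME2, for a square matrix."""
    n = len(matrix)
    identity = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    reduced, _ = rref(matrix)
    return reduced == identity


def columns_independent(matrix):
    """Item 5 of Theorem NME2, tested as n = r (Theorem LIVRN)."""
    _, pivots = rref(matrix)
    return len(pivots) == len(matrix[0])


archetype_a = [[1, -1, 2], [2, 1, 1], [1, 1, 0]]
archetype_b = [[-7, -6, -12], [5, 5, 7], [1, 0, 4]]
print(reduces_to_identity(archetype_a), columns_independent(archetype_a))  # False False
print(reduces_to_identity(archetype_b), columns_independent(archetype_b))  # True True
```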

Subsection NSSLI
Null Spaces, Spans, Linear Independence
In Subsection SS.SSNS we proved Theorem SSNS which provided n − r vectors
that could be used with the span construction to build the entire null space of a
matrix. As we have hinted in Example SCAD, and as we will see again going forward,
linearly dependent sets carry redundant vectors with them when used in building
a set as a span. Our aim now is to show that the vectors provided by Theorem
SSNS form a linearly independent set, so in one sense they are as efficient as possible
a way to describe the null space. Notice that the vectors zj , 1 ≤ j ≤ n − r first
appear in the vector form of solutions to arbitrary linear systems (Theorem VFSLS).
The exact same vectors appear again in the span construction in the conclusion of
Theorem SSNS. Since this second theorem specializes to homogeneous systems the
only real difference is that the vector c in Theorem VFSLS is the zero vector for a
homogeneous system. Finally, Theorem BNS will now show that these same vectors
are a linearly independent set. We will set the stage for the proof of this theorem
with a moderately large example. Study the example carefully, as it will make it
easier to understand the proof.
Example LINSB Linear independence of null space basis
Suppose that we are interested in the null space of a 3 × 7 matrix, A, which row-reduces
to
\[
B = \begin{bmatrix}
1 & 0 & -2 & 4 & 0 & 3 & 9 \\
0 & 1 & 5 & 6 & 0 & 7 & 1 \\
0 & 0 & 0 & 0 & 1 & 8 & -5
\end{bmatrix}
\]
    The set F = {3, 4, 6, 7} is the set of indices for our four free variables that would
be used in a description of the solution set for the homogeneous system LS(A, 0).
Applying Theorem SSNS we can begin to construct a set of four vectors whose span
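is the null space of A, a set of vectors we will reference as T.
\[
N(A) = \langle T \rangle = \langle \{ \mathbf{z}_1, \mathbf{z}_2, \mathbf{z}_3, \mathbf{z}_4 \} \rangle
= \left\langle \left\{
\begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 1 \\ 0 \\ \phantom{0} \\ 0 \\ 0 \end{bmatrix},\
\begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 1 \\ \phantom{0} \\ 0 \\ 0 \end{bmatrix},\
\begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ \phantom{0} \\ 1 \\ 0 \end{bmatrix},\
\begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ \phantom{0} \\ 0 \\ 1 \end{bmatrix}
\right\} \right\rangle
\]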
    So far, we have constructed as much of these individual vectors as we can, based
just on the knowledge of the contents of the set F . This has allowed us to determine
the entries in slots 3, 4, 6 and 7, while we have left slots 1, 2 and 5 blank. Without
doing any more, let us ask if T is linearly independent? Begin with a relation of
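linear dependence on T, and see what we can learn about the scalars,
\begin{align*}
\mathbf{0} &= \alpha_1 \mathbf{z}_1 + \alpha_2 \mathbf{z}_2 + \alpha_3 \mathbf{z}_3 + \alpha_4 \mathbf{z}_4 \\
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
&= \alpha_1 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 1 \\ 0 \\ \phantom{0} \\ 0 \\ 0 \end{bmatrix}
+ \alpha_2 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 1 \\ \phantom{0} \\ 0 \\ 0 \end{bmatrix}
+ \alpha_3 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ \phantom{0} \\ 1 \\ 0 \end{bmatrix}
+ \alpha_4 \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ \phantom{0} \\ 0 \\ 1 \end{bmatrix} \\
&= \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \alpha_1 \\ 0 \\ \phantom{0} \\ 0 \\ 0 \end{bmatrix}
+ \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ \alpha_2 \\ \phantom{0} \\ 0 \\ 0 \end{bmatrix}
+ \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ \phantom{0} \\ \alpha_3 \\ 0 \end{bmatrix}
+ \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ 0 \\ 0 \\ \phantom{0} \\ 0 \\ \alpha_4 \end{bmatrix}
= \begin{bmatrix} \phantom{0} \\ \phantom{0} \\ \alpha_1 \\ \alpha_2 \\ \phantom{0} \\ \alpha_3 \\ \alpha_4 \end{bmatrix}
\end{align*}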
    Applying Definition CVE to the two ends of this chain of equalities, we see that
α1 = α2 = α3 = α4 = 0. So the only relation of linear dependence on the set T is a
trivial one. By Definition LICV the set T is linearly independent. The important
feature of this example is how the “pattern of zeros and ones” in the four vectors
led to the conclusion of linear independence. △
    The proof of Theorem BNS is really quite straightforward, and relies on the
“pattern of zeros and ones” that arises in the vectors z_i, 1 ≤ i ≤ n − r, in the entries
that correspond to the locations of the non-pivot columns. Play along with Example
 LINSB as you study the proof. Also, take a look at Example VFSAD, Example
VFSAI and Example VFSAL, especially at the conclusion of Step 2 (temporarily
 ignore the construction of the constant vector, c). This proof is also a good first
 example of how to prove a conclusion that states a set is linearly independent.
Theorem BNS Basis for Null Spaces
Suppose that A is an m × n matrix, and B is a row-equivalent matrix in reduced
row-echelon form with r pivot columns. Let D = {d1 , d2 , d3 , . . . , dr } and F =
{f1 , f2 , f3 , . . . , fn−r } be the sets of column indices where B does and does not
(respectively) have pivot columns. Construct the n − r vectors z_j, 1 ≤ j ≤ n − r, of
size n as
\[
[\mathbf{z}_j]_i =
\begin{cases}
1 & \text{if } i \in F,\ i = f_j \\
0 & \text{if } i \in F,\ i \neq f_j \\
-[B]_{k,f_j} & \text{if } i \in D,\ i = d_k
\end{cases}
\]
    Define the set S = {z_1, z_2, z_3, ..., z_{n−r}}. Then

   1. N(A) = ⟨S⟩.

   2. S is a linearly independent set.

Proof. Notice first that the vectors zj , 1 ≤ j ≤ n − r are exactly the same as the
n − r vectors defined in Theorem SSNS. Also, the hypotheses of Theorem SSNS are
the same as the hypotheses of the theorem we are currently proving. So it is then
simply the conclusion of Theorem SSNS that tells us that N(A) = ⟨S⟩. That was
the easy half, but the second part is not much harder. What is new here is the claim
that S is a linearly independent set.
     To prove the linear independence of a set, we need to start with a relation of
linear dependence and somehow conclude that the scalars involved must all be zero,
i.e. that the relation of linear dependence only happens in the trivial fashion. So to
establish the linear independence of S, we start with
                      α1 z1 + α2 z2 + α3 z3 + · · · + αn−r zn−r = 0.
   For each j, 1 ≤ j ≤ n − r, consider the equality of the individual entries of the
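vectors on both sides of this equality in position f_j,
\begin{align*}
0 &= [\mathbf{0}]_{f_j} \\
&= [\alpha_1 \mathbf{z}_1 + \alpha_2 \mathbf{z}_2 + \alpha_3 \mathbf{z}_3 + \cdots + \alpha_{n-r} \mathbf{z}_{n-r}]_{f_j} && \text{Definition CVE} \\
&= [\alpha_1 \mathbf{z}_1]_{f_j} + [\alpha_2 \mathbf{z}_2]_{f_j} + [\alpha_3 \mathbf{z}_3]_{f_j} + \cdots + [\alpha_{n-r} \mathbf{z}_{n-r}]_{f_j} && \text{Definition CVA} \\
&= \alpha_1 [\mathbf{z}_1]_{f_j} + \alpha_2 [\mathbf{z}_2]_{f_j} + \alpha_3 [\mathbf{z}_3]_{f_j} + \cdots + \\
&\qquad \alpha_{j-1} [\mathbf{z}_{j-1}]_{f_j} + \alpha_j [\mathbf{z}_j]_{f_j} + \alpha_{j+1} [\mathbf{z}_{j+1}]_{f_j} + \cdots + \\
&\qquad \alpha_{n-r} [\mathbf{z}_{n-r}]_{f_j} && \text{Definition CVSM} \\
&= \alpha_1 (0) + \alpha_2 (0) + \alpha_3 (0) + \cdots + \\
&\qquad \alpha_{j-1} (0) + \alpha_j (1) + \alpha_{j+1} (0) + \cdots + \alpha_{n-r} (0) && \text{Definition of } \mathbf{z}_j \\
&= \alpha_j
\end{align*}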
    So for all j, 1 ≤ j ≤ n − r, we have αj = 0, which is the conclusion that tells
us that the only relation of linear dependence on S = {z1 , z2 , z3 , . . . , zn−r } is the
trivial one. Hence, by Definition LICV the set is linearly independent, as desired.
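The construction in Theorem BNS translates almost line by line into code. The Python sketch below assumes its input is already in reduced row-echelon form and uses 0-based indices internally. The function name `null_space_basis` is an illustrative choice, not something from the text. It is run on the matrix B of Example LINSB, filling in the slots that the example left blank.

```python
def null_space_basis(B):
    """Build the vectors z_j of Theorem BNS from a matrix B assumed to be in RREF."""
    n = len(B[0])
    # D[k] is the pivot column of row k; in RREF the nonzero rows come first.
    D = []
    for row in B:
        lead = next((c for c, entry in enumerate(row) if entry != 0), None)
        if lead is not None:
            D.append(lead)
    F = [c for c in range(n) if c not in D]  # non-pivot (free) columns
    basis = []
    for f in F:
        z = [0] * n          # entries in the other free slots stay 0
        z[f] = 1             # entry in its own free slot is 1
        for k, d in enumerate(D):
            z[d] = -B[k][f]  # pivot slots: negated entries of column f
        basis.append(z)
    return basis


B = [
    [1, 0, -2, 4, 0, 3, 9],
    [0, 1, 5, 6, 0, 7, 1],
    [0, 0, 0, 0, 1, 8, -5],
]
for z in null_space_basis(B):
    print(z)
# [2, -5, 1, 0, 0, 0, 0]
# [-4, -6, 0, 1, 0, 0, 0]
# [-3, -7, 0, 0, -8, 1, 0]
# [-9, -1, 0, 0, 5, 0, 1]
```

Reading slots 3, 4, 6 and 7 of the four printed vectors (counting from 1) recovers the pattern of zeros and ones on which the proof of linear independence depends.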
Example NSLIL Null space spanned by linearly independent set, Archetype L
In Example VFSAL we previewed Theorem SSNS by finding a set of two vectors
such that their span was the null space for the matrix in Archetype L. Writing the
matrix as L, we have
\[
N(L) = \left\langle \left\{
\begin{bmatrix} -1 \\ 2 \\ -2 \\ 1 \\ 0 \end{bmatrix},\
\begin{bmatrix} 2 \\ -2 \\ 1 \\ 0 \\ 1 \end{bmatrix}
\right\} \right\rangle
\]
Solving the homogeneous system LS(L, 0) resulted in recognizing x4 and x5 as the
free variables. So look in entries 4 and 5 of the two vectors above and notice the
pattern of zeros and ones that provides the linear independence of the set. △

Reading Questions
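1. Let S be the set of three vectors below.
\[
S = \left\{
\begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix},\
\begin{bmatrix} 3 \\ -4 \\ 2 \end{bmatrix},\
\begin{bmatrix} 4 \\ -2 \\ 1 \end{bmatrix}
\right\}
\]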
   Is S linearly independent or linearly dependent? Explain why.
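2. Let S be the set of three vectors below.
\[
S = \left\{
\begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix},\
\begin{bmatrix} 3 \\ 2 \\ 2 \end{bmatrix},\
\begin{bmatrix} 4 \\ 3 \\ -4 \end{bmatrix}
\right\}
\]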
   Is S linearly independent or linearly dependent? Explain why.
3. Is the matrix below singular or nonsingular? Explain your answer using only the final
   conclusion you reached in the previous question, along with one new theorem.
\[
\begin{bmatrix}
1 & 3 & 4 \\
-1 & 2 & 3 \\
0 & 2 & -4
\end{bmatrix}
\]

Exercises

Determine if the sets of vectors in Exercises C20–C25 are linearly independent or linearly
dependent. When the set is linearly dependent, exhibit a nontrivial relation of linear
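dependence.

C20†
\[
\left\{
\begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix},\
\begin{bmatrix} 2 \\ -1 \\ 3 \end{bmatrix},\
\begin{bmatrix} 1 \\ 5 \\ 0 \end{bmatrix}
\right\}
\]

C21†
\[
\left\{
\begin{bmatrix} -1 \\ 2 \\ 4 \\ 2 \end{bmatrix},\
\begin{bmatrix} 3 \\ 3 \\ -1 \\ 3 \end{bmatrix},\
\begin{bmatrix} 7 \\ 3 \\ -6 \\ 4 \end{bmatrix}
\right\}
\]

C22†
\[
\left\{
\begin{bmatrix} -2 \\ -1 \\ -1 \end{bmatrix},\
\begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix},\
\begin{bmatrix} 3 \\ 3 \\ 6 \end{bmatrix},\
\begin{bmatrix} -5 \\ -4 \\ -6 \end{bmatrix},\
\begin{bmatrix} 4 \\ 4 \\ 7 \end{bmatrix}
\right\}
\]

C23†
\[
\left\{
\begin{bmatrix} 1 \\ -2 \\ 2 \\ 5 \\ 3 \end{bmatrix},\
\begin{bmatrix} 3 \\ 3 \\ 1 \\ 2 \\ -4 \end{bmatrix},\
\begin{bmatrix} 2 \\ 1 \\ 2 \\ -1 \\ 1 \end{bmatrix},\
\begin{bmatrix} 1 \\ 0 \\ 1 \\ 2 \\ 2 \end{bmatrix}
\right\}
\]

C24†
\[
\left\{
\begin{bmatrix} 1 \\ 2 \\ -1 \\ 0 \\ 1 \end{bmatrix},\
\begin{bmatrix} 3 \\ 2 \\ -1 \\ 2 \\ 2 \end{bmatrix},\
\begin{bmatrix} 4 \\ 4 \\ -2 \\ 2 \\ 3 \end{bmatrix},\
\begin{bmatrix} -1 \\ 2 \\ -1 \\ -2 \\ 0 \end{bmatrix}
\right\}
\]

C25†
\[
\left\{
\begin{bmatrix} 2 \\ 1 \\ 3 \\ -1 \\ 2 \end{bmatrix},\
\begin{bmatrix} 4 \\ -2 \\ 1 \\ 3 \\ 2 \end{bmatrix},\
\begin{bmatrix} 10 \\ -7 \\ 0 \\ 10 \\ 4 \end{bmatrix}
\right\}
\]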

C30† For the matrix B below, find a set S that is linearly independent and spans the
null space of B, that is, N(B) = ⟨S⟩.
\[
B = \begin{bmatrix}
-3 & 1 & -2 & 7 \\
-1 & 2 & 1 & 4 \\
1 & 1 & 2 & -1
\end{bmatrix}
\]

C31† For the matrix A below, find a linearly independent set S so that the null space of
A is spanned by S, that is, N(A) = ⟨S⟩.
\[
A = \begin{bmatrix}
-1 & -2 & 2 & 1 & 5 \\
1 & 2 & 1 & 1 & 5 \\
3 & 6 & 1 & 2 & 7 \\
2 & 4 & 0 & 1 & 2
\end{bmatrix}
\]

C32† Find a set of column vectors, T, such that (1) the span of T is the null space of B,
⟨T⟩ = N(B) and (2) T is a linearly independent set.
\[
B = \begin{bmatrix}
2 & 1 & 1 & 1 \\
-4 & -3 & 1 & -7 \\
1 & 1 & -1 & 3
\end{bmatrix}
\]

C33† Find a set S so that S is linearly independent and N (A) = ⟨S⟩, where N (A) is the
null space of the matrix A below.
\[
A = \begin{bmatrix}
2 & 3 & 3 & 1 & 4 \\
1 & 1 & -1 & -1 & -3 \\
3 & 2 & -8 & -1 & 1
\end{bmatrix}
\]

C50 Consider each archetype that is a system of equations and consider the solutions
listed for the homogeneous version of the archetype. (If only the trivial solution is listed,
then assume this is the only solution to the system.) From the solution set, determine if
the columns of the coefficient matrix form a linearly independent or linearly dependent
set. In the case of a linearly dependent set, use one of the sample solutions to provide
a nontrivial relation of linear dependence on the set of columns of the coefficient matrix
(Definition RLD). Indicate when Theorem MVSLD applies and connect this with the
number of variables and equations in the system of equations.

Archetype A, Archetype B, Archetype C, Archetype D/Archetype E, Archetype F, Archetype
G/Archetype H, Archetype I, Archetype J
C51 For each archetype that is a system of equations consider the homogeneous version.
Write elements of the solution set in vector form (Theorem VFSLS) and from this extract
the vectors zj described in Theorem BNS. These vectors are used in a span construction
to describe the null space of the coefficient matrix for each archetype. What does it mean
when we write a null space as ⟨{ }⟩?

Archetype A, Archetype B, Archetype C, Archetype D/Archetype E, Archetype F, Archetype
G/Archetype H, Archetype I, Archetype J
C52 For each archetype that is a system of equations consider the homogeneous version.
Sample solutions are given and a linearly independent spanning set is given for the null
space of the coefficient matrix. Write each of the sample solutions individually as a lin-
ear combination of the vectors in the spanning set for the null space of the coefficient matrix.

Archetype A, Archetype B, Archetype C, Archetype D/Archetype E, Archetype F, Archetype
G/Archetype H, Archetype I, Archetype J
C60† For the matrix A below, find a set of vectors S so that (1) S is linearly independent,
and (2) the span of S equals the null space of A, ⟨S⟩ = N (A). (See Exercise SS.C60.)
\[
A = \begin{bmatrix}
1 & 1 & 6 & -8 \\
1 & -2 & 0 & 1 \\
-2 & 1 & -6 & 7
\end{bmatrix}
\]

M20†    Suppose that S = {v1 , v2 , v3 } is a set of three vectors from C^873 . Prove that the
set
                   T = {2v1 + 3v2 + v3 , v1 − v2 − 2v3 , 2v1 + v2 − v3 }
is linearly dependent.
M21† Suppose that S = {v1 , v2 , v3 } is a linearly independent set of three vectors from
C^873 . Prove that the set
                   T = {2v1 + 3v2 + v3 , v1 − v2 + 2v3 , 2v1 + v2 − v3 }
is linearly independent.
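M50† Consider the set of vectors from C^3 , W , given below. Find a set T that contains
three vectors from W and such that W = ⟨T⟩.
\[
W = \left\langle \{v_1, v_2, v_3, v_4, v_5\} \right\rangle
= \left\langle \left\{
\begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix},\
\begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix},\
\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix},\
\begin{bmatrix} 3 \\ 1 \\ 3 \end{bmatrix},\
\begin{bmatrix} 0 \\ 1 \\ -3 \end{bmatrix}
\right\} \right\rangle
\]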

M51† Consider the subspace W = ⟨{v1 , v2 , v3 , v4 }⟩. Find a set S so that (1) S is a
subset of W , (2) S is linearly independent, and (3) W = ⟨S⟩. Write each vector not included
in S as a linear combination of the vectors that are in S.
\[
v_1 = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix} \qquad
v_2 = \begin{bmatrix} 4 \\ -4 \\ 8 \end{bmatrix} \qquad
v_3 = \begin{bmatrix} -3 \\ 2 \\ -7 \end{bmatrix} \qquad
v_4 = \begin{bmatrix} 2 \\ 1 \\ 7 \end{bmatrix}
\]

T10 Prove that if a set of vectors contains the zero vector, then the set is linearly
dependent. (Ed. “The zero vector is death to linearly independent sets.”)
T12 Suppose that S is a linearly independent set of vectors, and T is a subset of S, T ⊆ S
(Definition SSET). Prove that T is linearly independent.
T13 Suppose that T is a linearly dependent set of vectors, and T is a subset of S, T ⊆ S
(Definition SSET). Prove that S is linearly dependent.
T15†    Suppose that {v1 , v2 , v3 , . . . , vn } is a set of vectors. Prove that
                           {v1 − v2 , v2 − v3 , v3 − v4 , . . . , vn − v1 }
is a linearly dependent set.
T20†    Suppose that {v1 , v2 , v3 , v4 } is a linearly independent set in C^35 . Prove that
                      {v1 , v1 + v2 , v1 + v2 + v3 , v1 + v2 + v3 + v4 }
is a linearly independent set.
T50† Suppose that A is an m × n matrix with linearly independent columns and the
linear system LS(A, b) is consistent. Show that this system has a unique solution. (Notice
that we are not requiring A to be square.)
Section LDS
Linear Dependence and Spans
In any linearly dependent set there is always one vector that can be written as a
linear combination of the others. This is the substance of the upcoming Theorem
DLDS. Perhaps this will explain the use of the word “dependent.” In a linearly
dependent set, at least one vector “depends” on the others (via a linear combination).
    Indeed, because Theorem DLDS is an equivalence (Proof Technique E) some
authors use this condition as a definition (Proof Technique D) of linear dependence.
Then linear independence is defined as the logical opposite of linear dependence. Of
course, we have chosen to take Definition LICV as our definition, and then follow
with Theorem DLDS as a theorem.

Subsection LDSS
Linearly Dependent Sets and Spans
If we use a linearly dependent set to construct a span, then we can always create
the same infinite set with a starting set that is one vector smaller in size. We will
illustrate this behavior in Example RSC5. However, this will not be possible if we
build a span from a linearly independent set. So in a certain sense, using a linearly
independent set to formulate a span is the best possible way — there are not any
extra vectors being used to build up all the necessary linear combinations. OK, here
is the theorem, and then the example.
Theorem DLDS Dependency in Linearly Dependent Sets
Suppose that S = {u1 , u2 , u3 , . . . , un } is a set of vectors. Then S is a linearly
dependent set if and only if there is an index t, 1 ≤ t ≤ n such that ut is a linear
combination of the vectors u1 , u2 , u3 , . . . , ut−1 , ut+1 , . . . , un .

Proof. (⇒) Suppose that S is linearly dependent, so there exists a nontrivial relation
of linear dependence by Definition LICV. That is, there are scalars, αi , 1 ≤ i ≤ n,
which are not all zero, such that
                         α1 u1 + α2 u2 + α3 u3 + · · · + αn un = 0.
Since the αi cannot all be zero, choose one, say αt , that is nonzero. Then,
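\begin{align*}
u_t &= \frac{-1}{\alpha_t}\left(-\alpha_t u_t\right) && \text{Property MICN} \\
&= \frac{-1}{\alpha_t}\left(\alpha_1 u_1 + \cdots + \alpha_{t-1} u_{t-1} + \alpha_{t+1} u_{t+1} + \cdots + \alpha_n u_n\right) && \text{Theorem VSPCV} \\
&= \frac{-\alpha_1}{\alpha_t} u_1 + \cdots + \frac{-\alpha_{t-1}}{\alpha_t} u_{t-1} + \frac{-\alpha_{t+1}}{\alpha_t} u_{t+1} + \cdots + \frac{-\alpha_n}{\alpha_t} u_n && \text{Theorem VSPCV}
\end{align*}
   Since the values of αi /αt are again scalars, we have expressed ut as a linear combi-
nation of the other elements of S.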
   (⇐) Assume that the vector ut is a linear combination of the other vectors in
S. Write this linear combination, denoting the relevant scalars as β1 , β2 , . . . , βt−1 ,
βt+1 , . . . , βn , as
            ut = β1 u1 + β2 u2 + · · · + βt−1 ut−1 + βt+1 ut+1 + · · · + βn un
   Then we have
   β1 u1 + · · · + βt−1 ut−1 + (−1)ut + βt+1 ut+1 + · · · + βn un
         = ut + (−1)ut                                                  Theorem VSPCV
         = (1 + (−1)) ut                                                Property DSAC
         = 0ut                                                          Property AICN
         =0                                                             Definition CVSM
    So the scalars β1 , β2 , β3 , . . . , βt−1 , βt = −1, βt+1 , . . . , βn provide a nontrivial
relation of linear dependence on the vectors in S (nontrivial because βt = −1 is nonzero),
thus establishing that S is a linearly dependent set (Definition LICV). ∎
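
The forward direction of the proof is really a small computation: divide through by a nonzero scalar and move everything else to the other side. Here is a minimal Python sketch of that step. The function name `solve_for`, the zero-based index `t`, and the three sample vectors are our own illustrative assumptions, not anything defined in the text.

```python
from fractions import Fraction

def solve_for(alphas, t):
    """Given the scalars of a relation sum(alpha_i * u_i) = 0 with
    alphas[t] nonzero, return the coefficients -alpha_i / alpha_t that
    express u_t in terms of the other vectors (indices are zero-based)."""
    if alphas[t] == 0:
        raise ValueError("cannot solve for a vector with a zero coefficient")
    return {i: Fraction(-a, alphas[t]) for i, a in enumerate(alphas) if i != t}

# Hypothetical data: 2*u1 + 3*u2 + (-1)*u3 = 0 for three vectors with two entries.
u = [(1, 0), (0, 1), (2, 3)]
alphas = [2, 3, -1]

coeffs = solve_for(alphas, 2)  # express the third vector via the first two
rebuilt = tuple(sum(c * u[i][k] for i, c in coeffs.items()) for k in range(2))
print(coeffs, rebuilt == u[2])
```

Asking `solve_for` to solve for a vector whose scalar is zero raises an error, which mirrors the requirement in the proof that αt be nonzero.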


    This theorem can be used, sometimes repeatedly, to whittle down the size of a
set of vectors used in a span construction. We have seen some of this already in
Example SCAD, but in the next example we will detail some of the subtleties.
Example RSC5 Reducing a span in C5
Consider the set of n = 4 vectors from C^5 ,
\[
R = \{v_1, v_2, v_3, v_4\} = \left\{
\begin{bmatrix} 1 \\ 2 \\ -1 \\ 3 \\ 2 \end{bmatrix},\
\begin{bmatrix} 2 \\ 1 \\ 3 \\ 1 \\ 2 \end{bmatrix},\
\begin{bmatrix} 0 \\ -7 \\ 6 \\ -11 \\ -2 \end{bmatrix},\
\begin{bmatrix} 4 \\ 1 \\ 2 \\ 1 \\ 6 \end{bmatrix}
\right\}
\]
and define V = ⟨R⟩.
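   To employ Theorem LIVHS, we form a 5 × 4 matrix, D, and row-reduce to
understand solutions to the homogeneous system LS(D, 0),
\[
D = \begin{bmatrix}
1 & 2 & 0 & 4 \\
2 & 1 & -7 & 1 \\
-1 & 3 & 6 & 2 \\
3 & 1 & -11 & 1 \\
2 & 2 & -2 & 6
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 4 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]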
   We can find infinitely many solutions to this system, most of them nontrivial,
and we choose any one we like to build a relation of linear dependence on R. Let us
begin with x4 = 1, to find the solution
\[
\begin{bmatrix} -4 \\ 0 \\ -1 \\ 1 \end{bmatrix}
\]
      So we can write the relation of linear dependence,
                          (−4)v1 + 0v2 + (−1)v3 + 1v4 = 0
    Theorem DLDS guarantees that we can solve this relation of linear dependence
for some vector in R, but the choice of which one is up to us. Notice however that
v2 has a zero coefficient. In this case, we cannot choose to solve for v2 . Maybe some
other relation of linear dependence would produce a nonzero coefficient for v2 if we
just had to solve for this vector. Unfortunately, this example has been engineered to
always produce a zero coefficient here, as you can see from solving the homogeneous
system. Every solution has x2 = 0!
    OK, if we are convinced that we cannot solve for v2 , let us instead solve for v3 ,
                      v3 = (−4)v1 + 0v2 + 1v4 = (−4)v1 + 1v4
      We now claim that this particular equation will allow us to write
                    V = ⟨R⟩ = ⟨{v1 , v2 , v3 , v4 }⟩ = ⟨{v1 , v2 , v4 }⟩
in essence declaring v3 as surplus for the task of building V as a span. This claim
is an equality of two sets, so we will use Definition SE to establish it carefully. Let
R′ = {v1 , v2 , v4 } and V′ = ⟨R′⟩. We want to show that V = V′.
    First show that V′ ⊆ V . Since every vector of R′ is in R, any vector we can
construct in V′ as a linear combination of vectors from R′ can also be constructed
as a vector in V by the same linear combination of the same vectors in R. That was
easy, now turn it around.
    Next show that V ⊆ V′. Choose any v from V . So there are scalars α1 , α2 , α3 , α4
such that
                    v = α1 v1 + α2 v2 + α3 v3 + α4 v4
                      = α1 v1 + α2 v2 + α3 ((−4)v1 + 1v4 ) + α4 v4
                      = α1 v1 + α2 v2 + ((−4α3 )v1 + α3 v4 ) + α4 v4
                      = (α1 − 4α3 ) v1 + α2 v2 + (α3 + α4 ) v4 .
      This equation says that v can then be written as a linear combination of the
vectors in R′ and hence qualifies for membership in V′. So V ⊆ V′ and we have
established that V = V′.
    If R′ were also linearly dependent (it is not), we could reduce the set even further.
Notice that we could have chosen to eliminate any one of v1 , v3 or v4 , but somehow v2
is essential to the creation of V since it cannot be replaced by any linear combination
of v1 , v3 or v4 .
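
The claims in Example RSC5 can be spot-checked by machine. The sketch below is only a numerical check under our own assumptions, not a replacement for the set-equality argument: `rref` and `in_span` are hypothetical helper names, exact arithmetic comes from Python's `fractions` module, and membership in a span is decided by testing whether the final column of an augmented matrix is a pivot column.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, zero-based pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def in_span(w, vectors):
    """True when w is a linear combination of `vectors`: the augmented
    system is consistent exactly when its last column is not a pivot column."""
    aug = [list(row) for row in zip(*vectors, w)]
    _, pivots = rref(aug)
    return len(vectors) not in pivots

v1 = (1, 2, -1, 3, 2)
v2 = (2, 1, 3, 1, 2)
v3 = (0, -7, 6, -11, -2)
v4 = (4, 1, 2, 1, 6)
R = [v1, v2, v3, v4]
R_prime = [v1, v2, v4]

# The relation solved for v3: v3 = (-4) v1 + 1 v4.
v3_rebuilt = tuple(-4 * a + b for a, b in zip(v1, v4))

# Every vector of R lies in <R'>, which is the heart of V being a subset of V'.
r_in_r_prime = all(in_span(v, R_prime) for v in R)

# v2 is not a combination of v1, v3, v4, so it cannot be the vector cast out.
v2_replaceable = in_span(v2, [v1, v3, v4])

print(v3_rebuilt == v3, r_in_r_prime, v2_replaceable)
```

The last value printed is the interesting one: it records that v2 cannot be rebuilt from the other three vectors, in line with every solution of the homogeneous system having x2 = 0.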



Subsection COV
Casting Out Vectors
In Example RSC5 we used four vectors to create a span. With a relation of linear
dependence in hand, we were able to “toss out” one of these four vectors and create
the same span from a subset of just three vectors from the original set of four. We did
have to take some care as to just which vector we tossed out. In the next example,
we will be more methodical about just how we choose to eliminate vectors from a
linearly dependent set while preserving a span.
Example COV Casting out vectors
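We begin with a set S containing seven vectors from C^4 ,
\[
S = \left\{
\begin{bmatrix} 1 \\ 2 \\ 0 \\ -1 \end{bmatrix},\
\begin{bmatrix} 4 \\ 8 \\ 0 \\ -4 \end{bmatrix},\
\begin{bmatrix} 0 \\ -1 \\ 2 \\ 2 \end{bmatrix},\
\begin{bmatrix} -1 \\ 3 \\ -3 \\ 4 \end{bmatrix},\
\begin{bmatrix} 0 \\ 9 \\ -4 \\ 8 \end{bmatrix},\
\begin{bmatrix} 7 \\ -13 \\ 12 \\ -31 \end{bmatrix},\
\begin{bmatrix} -9 \\ 7 \\ -8 \\ 37 \end{bmatrix}
\right\}
\]
and define W = ⟨S⟩.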
    The set S is obviously linearly dependent by Theorem MVSLD, since we have
n = 7 vectors from C^4 . So we can slim down S some, and still create W as the span
of a smaller set of vectors.
    As a device for identifying relations of linear dependence among the vectors of S,
we place the seven column vectors of S into a matrix as columns,
\[
A = [A_1 | A_2 | A_3 | \ldots | A_7] = \begin{bmatrix}
1 & 4 & 0 & -1 & 0 & 7 & -9 \\
2 & 8 & -1 & 3 & 9 & -13 & 7 \\
0 & 0 & 2 & -3 & -4 & 12 & -8 \\
-1 & -4 & 2 & 4 & 8 & -31 & 37
\end{bmatrix}
\]
    By Theorem SLSLC a nontrivial solution to LS(A, 0) will give us a nontrivial
relation of linear dependence (Definition RLDCV) on the columns of A (which are
the elements of the set S). The row-reduced form for A is the matrix
\[
B = \begin{bmatrix}
1 & 4 & 0 & 0 & 2 & 1 & -3 \\
0 & 0 & 1 & 0 & 1 & -3 & 5 \\
0 & 0 & 0 & 1 & 2 & -6 & 6 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]
so we can easily create solutions to the homogeneous system LS(A, 0) using the
free variables x2 , x5 , x6 , x7 . Any such solution will provide a relation of linear
dependence on the columns of A. These solutions will allow us to solve for one
column vector as a linear combination of some others, in the spirit of Theorem
DLDS, and remove that vector from the set. We will set about forming these linear
combinations methodically.
    Set the free variable x2 = 1, and set the other free variables to zero. Then a
solution to LS(A, 0) is
\[
x = \begin{bmatrix} -4 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
\]
which can be used to create the linear combination
                (−4)A1 + 1A2 + 0A3 + 0A4 + 0A5 + 0A6 + 0A7 = 0

    This can then be arranged and solved for A2 , resulting in A2 expressed as a
linear combination of {A1 , A3 , A4 },
                               A2 = 4A1 + 0A3 + 0A4

    This means that A2 is surplus, and we can create W just as well with a smaller
set with this vector removed,
                          W = h{A1 , A3 , A4 , A5 , A6 , A7 }i

  Technically, this set equality for W requires a proof, in the spirit of Example
RSC5, but we will bypass this requirement here, and in the next few paragraphs.

    Now, set the free variable x5 = 1, and set the other free variables to zero. Then
a solution to LS(B, 0) is
\[
x = \begin{bmatrix} -2 \\ 0 \\ -1 \\ -2 \\ 1 \\ 0 \\ 0 \end{bmatrix}
\]
which can be used to create the linear combination
             (−2)A1 + 0A2 + (−1)A3 + (−2)A4 + 1A5 + 0A6 + 0A7 = 0

    This can then be arranged and solved for A5 , resulting in A5 expressed as a
linear combination of {A1 , A3 , A4 },
                               A5 = 2A1 + 1A3 + 2A4

    This means that A5 is surplus, and we can create W just as well with a smaller
set with this vector removed,
                            W = h{A1 , A3 , A4 , A6 , A7 }i

   Do it again, set the free variable x6 = 1, and set the other free variables to zero.
Then a solution to LS(B, 0) is
\[
x = \begin{bmatrix} -1 \\ 0 \\ 3 \\ 6 \\ 0 \\ 1 \\ 0 \end{bmatrix}
\]
which can be used to create the linear combination
                (−1)A1 + 0A2 + 3A3 + 6A4 + 0A5 + 1A6 + 0A7 = 0

    This can then be arranged and solved for A6 , resulting in A6 expressed as a
linear combination of {A1 , A3 , A4 },
                           A6 = 1A1 + (−3)A3 + (−6)A4
This means that A6 is surplus, and we can create W just as well with a smaller set
with this vector removed,
                              W = h{A1 , A3 , A4 , A7 }i

      Set the free variable x7 = 1, and set the other free variables to zero. Then a
solution to LS(B, 0) is
\[
x = \begin{bmatrix} 3 \\ 0 \\ -5 \\ -6 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\]
which can be used to create the linear combination
              3A1 + 0A2 + (−5)A3 + (−6)A4 + 0A5 + 0A6 + 1A7 = 0
    This can then be arranged and solved for A7 , resulting in A7 expressed as a
linear combination of {A1 , A3 , A4 },
                               A7 = (−3)A1 + 5A3 + 6A4
    This means that A7 is surplus, and we can create W just as well with a smaller
set with this vector removed,
                                  W = h{A1 , A3 , A4 }i
    You might think we could keep this up, but we have run out of free variables.
And not coincidentally, the set {A1 , A3 , A4 } is linearly independent (check this!).
It should be clear how each free variable was used to eliminate a column from
the set used to span the column space, as this will be the essence of the proof of the
next theorem. The column vectors in S were not chosen entirely at random; they are
the columns of Archetype I. See if you can mimic this example using the columns of
Archetype J. Go ahead, we’ll go grab a cup of coffee and be back before you finish
up.
    For extra credit, notice that the vector
\[
b = \begin{bmatrix} 3 \\ 9 \\ 1 \\ 4 \end{bmatrix}
\]
is the vector of constants in the definition of Archetype I. Since the system LS(A, b)
is consistent, we know by Theorem SLSLC that b is a linear combination of the
columns of A, or stated equivalently, b ∈ W . This means that b must also be a
linear combination of just the three columns A1 , A3 , A4 . Can you find such a linear
combination? Did you notice that there is just a single (unique) answer? Hmmmm.
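
The casting-out steps of Example COV follow one pattern, so they are easy to automate. The sketch below is one possible way to do it, under our own assumptions: `rref` is a hypothetical helper written for this illustration, indices in the code are zero-based (so column index 1 is the text's A2), and the coefficients for each cast-out column are read straight from the corresponding column of the row-reduced matrix.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, zero-based pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

# The seven vectors of S, in order, as the columns A1, ..., A7.
S = [(1, 2, 0, -1), (4, 8, 0, -4), (0, -1, 2, 2), (-1, 3, -3, 4),
     (0, 9, -4, 8), (7, -13, 12, -31), (-9, 7, -8, 37)]
A = [list(row) for row in zip(*S)]
B, pivots = rref(A)
free = [j for j in range(len(S)) if j not in pivots]

relations = {}
for f in free:
    # Column f of B holds the scalars that rebuild column f of A
    # from the pivot columns of A.
    coeffs = {d: B[i][f] for i, d in enumerate(pivots)}
    rebuilt = tuple(sum(c * S[d][k] for d, c in coeffs.items()) for k in range(4))
    assert rebuilt == S[f]  # each cast-out column really is such a combination
    relations[f] = coeffs

print(pivots, free)
for f, coeffs in relations.items():
    print(f, coeffs)
```

Swapping in the columns of another matrix for `S` (and adjusting the vector length in `range(4)`) lets you rehearse the same procedure on other archetypes.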
  Example COV deserves your careful attention, since this important example
motivates the following very fundamental theorem.
Theorem BS Basis of a Span
Suppose that S = {v1 , v2 , v3 , . . . , vn } is a set of column vectors. Define W = ⟨S⟩
and let A be the matrix whose columns are the vectors from S. Let B be the reduced
row-echelon form of A, with D = {d1 , d2 , d3 , . . . , dr } the set of indices for the pivot
columns of B. Then

   1. T = {v_{d_1} , v_{d_2} , v_{d_3} , . . . , v_{d_r} } is a linearly independent set.

   2. W = ⟨T⟩.

Proof. To prove that T is linearly independent, begin with a relation of linear
dependence on T ,
\[
0 = \alpha_1 v_{d_1} + \alpha_2 v_{d_2} + \alpha_3 v_{d_3} + \cdots + \alpha_r v_{d_r}
\]
and we will try to conclude that the only possibility for the scalars αi is that they are
all zero. Denote the indices of the non-pivot columns of B by F = {f1 , f2 , f3 , . . . , fn−r }. Then
we can preserve the equality by adding a big fat zero to the linear combination,
\[
0 = \alpha_1 v_{d_1} + \alpha_2 v_{d_2} + \alpha_3 v_{d_3} + \cdots + \alpha_r v_{d_r} + 0 v_{f_1} + 0 v_{f_2} + 0 v_{f_3} + \cdots + 0 v_{f_{n-r}}
\]

    By Theorem SLSLC, the scalars in this linear combination (suitably reordered)
are a solution to the homogeneous system LS(A, 0). But notice that this is the
solution obtained by setting each free variable to zero. If we consider the description of
a solution vector in the conclusion of Theorem VFSLS, in the case of a homogeneous
system, then we see that if all the free variables are set to zero the resulting solution
vector is trivial (all zeros). So it must be that αi = 0, 1 ≤ i ≤ r. This implies by
Definition LICV that T is a linearly independent set.
    The second conclusion of this theorem is an equality of sets (Definition SE).
Since T is a subset of S, any linear combination of elements of the set T can also
be viewed as a linear combination of elements of the set S. So ⟨T⟩ ⊆ ⟨S⟩ = W . It
remains to prove that W = ⟨S⟩ ⊆ ⟨T⟩.
    For each k, 1 ≤ k ≤ n − r, form a solution x to LS(A, 0) by setting the free
variables as follows:
\[
x_{f_1} = 0 \qquad x_{f_2} = 0 \qquad x_{f_3} = 0 \qquad \ldots \qquad x_{f_k} = 1 \qquad \ldots \qquad x_{f_{n-r}} = 0
\]
      By Theorem VFSLS, the remainder of this solution vector is given by,
\[
x_{d_1} = -[B]_{1,f_k} \qquad x_{d_2} = -[B]_{2,f_k} \qquad x_{d_3} = -[B]_{3,f_k} \qquad \ldots \qquad x_{d_r} = -[B]_{r,f_k}
\]
      From this solution, we obtain a relation of linear dependence on the columns of
A,
\[
-[B]_{1,f_k} v_{d_1} - [B]_{2,f_k} v_{d_2} - [B]_{3,f_k} v_{d_3} - \cdots - [B]_{r,f_k} v_{d_r} + 1 v_{f_k} = 0
\]
which can be arranged as the equality
\[
v_{f_k} = [B]_{1,f_k} v_{d_1} + [B]_{2,f_k} v_{d_2} + [B]_{3,f_k} v_{d_3} + \cdots + [B]_{r,f_k} v_{d_r}
\]
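    Now, suppose we take an arbitrary element, w, of W = ⟨S⟩ and write it as a
linear combination of the elements of S, but with the terms organized according to
the indices in D and F ,
\[
w = \alpha_1 v_{d_1} + \alpha_2 v_{d_2} + \cdots + \alpha_r v_{d_r} + \beta_1 v_{f_1} + \beta_2 v_{f_2} + \cdots + \beta_{n-r} v_{f_{n-r}}
\]
      From the above, we can replace each v_{f_j} by a linear combination of the v_{d_i} ,
\begin{align*}
w = {} & \alpha_1 v_{d_1} + \alpha_2 v_{d_2} + \cdots + \alpha_r v_{d_r} + {} \\
& \beta_1 \left( [B]_{1,f_1} v_{d_1} + [B]_{2,f_1} v_{d_2} + [B]_{3,f_1} v_{d_3} + \cdots + [B]_{r,f_1} v_{d_r} \right) + {} \\
& \beta_2 \left( [B]_{1,f_2} v_{d_1} + [B]_{2,f_2} v_{d_2} + [B]_{3,f_2} v_{d_3} + \cdots + [B]_{r,f_2} v_{d_r} \right) + {} \\
& \qquad \vdots \\
& \beta_{n-r} \left( [B]_{1,f_{n-r}} v_{d_1} + [B]_{2,f_{n-r}} v_{d_2} + [B]_{3,f_{n-r}} v_{d_3} + \cdots + [B]_{r,f_{n-r}} v_{d_r} \right)
\end{align*}

With repeated applications of several of the properties of Theorem VSPCV we can
rearrange this expression as,
\begin{align*}
w = {} & \left( \alpha_1 + \beta_1 [B]_{1,f_1} + \beta_2 [B]_{1,f_2} + \beta_3 [B]_{1,f_3} + \cdots + \beta_{n-r} [B]_{1,f_{n-r}} \right) v_{d_1} + {} \\
& \left( \alpha_2 + \beta_1 [B]_{2,f_1} + \beta_2 [B]_{2,f_2} + \beta_3 [B]_{2,f_3} + \cdots + \beta_{n-r} [B]_{2,f_{n-r}} \right) v_{d_2} + {} \\
& \qquad \vdots \\
& \left( \alpha_r + \beta_1 [B]_{r,f_1} + \beta_2 [B]_{r,f_2} + \beta_3 [B]_{r,f_3} + \cdots + \beta_{n-r} [B]_{r,f_{n-r}} \right) v_{d_r}
\end{align*}
This mess expresses the vector w as a linear combination of the vectors in
\[
T = \{ v_{d_1}, v_{d_2}, v_{d_3}, \ldots, v_{d_r} \}
\]
thus saying that w ∈ ⟨T⟩. Therefore, W = ⟨S⟩ ⊆ ⟨T⟩. ∎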
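
The recipe in Theorem BS translates almost line for line into a short procedure: make the vectors the columns of a matrix, row-reduce, and keep the vectors sitting at the pivot column indices. The following is a minimal sketch under our own assumptions. The names `rref` and `basis_of_span` and the four sample vectors are hypothetical, chosen only for illustration, and the code uses zero-based indices where the theorem counts from 1.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, zero-based pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def basis_of_span(vectors):
    """Keep the vectors whose positions are pivot columns of the matrix
    having `vectors` as its columns (the set T of Theorem BS)."""
    A = [list(row) for row in zip(*vectors)]
    _, pivots = rref(A)
    return [vectors[d] for d in pivots]

# Hypothetical data: the second vector is twice the first, and the fourth
# is the sum of the first and third, so two vectors should survive.
sample = [(1, 0, 1), (2, 0, 2), (0, 1, 1), (1, 1, 2)]
T = basis_of_span(sample)
print(T)
```

Because the reduced row-echelon form is computed with exact fractions, the pivot columns are not subject to rounding, which matters when a decision hinges on whether an entry is exactly zero.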

    In Example COV, we tossed out vectors one at a time. But in each instance,
we rewrote the offending vector as a linear combination of those vectors with the
column indices of the pivot columns of the reduced row-echelon form of the matrix
of columns. In the proof of Theorem BS, we accomplish this reduction in one big
step. In Example COV we arrived at a linearly independent set at exactly the same
moment that we ran out of free variables to exploit. This was not a coincidence; it
is the substance of our conclusion of linear independence in Theorem BS.
    Here is a straightforward application of Theorem BS.
Example RSC4 Reducing a span in C4
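Begin with a set of five vectors from C^4 ,
\[
S = \left\{
\begin{bmatrix} 1 \\ 1 \\ 2 \\ 1 \end{bmatrix},\
\begin{bmatrix} 2 \\ 2 \\ 4 \\ 2 \end{bmatrix},\
\begin{bmatrix} 2 \\ 0 \\ -1 \\ 1 \end{bmatrix},\
\begin{bmatrix} 7 \\ 1 \\ -1 \\ 4 \end{bmatrix},\
\begin{bmatrix} 0 \\ 2 \\ 5 \\ 1 \end{bmatrix}
\right\}
\]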
and let W = ⟨S⟩. To arrive at a (smaller) linearly independent set, follow the
procedure described in Theorem BS. Place the vectors from S into a matrix as
columns, and row-reduce,
\[
\begin{bmatrix}
1 & 2 & 2 & 7 & 0 \\
1 & 2 & 0 & 1 & 2 \\
2 & 4 & -1 & -1 & 5 \\
1 & 2 & 1 & 4 & 1
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 2 & 0 & 1 & 2 \\
0 & 0 & 1 & 3 & -1 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]
      Columns 1 and 3 are the pivot columns (D = {1, 3}) so the set
\[
T = \left\{
\begin{bmatrix} 1 \\ 1 \\ 2 \\ 1 \end{bmatrix},\
\begin{bmatrix} 2 \\ 0 \\ -1 \\ 1 \end{bmatrix}
\right\}
\]
is linearly independent and ⟨T⟩ = ⟨S⟩ = W . Boom!
     Since the reduced row-echelon form of a matrix is unique (Theorem RREFU),
the procedure of Theorem BS leads us to a unique set T . However, there is a wide
variety of possibilities for sets T that are linearly independent and which can be
employed in a span to create W . Without proof, we list two other possibilities:
\[
T' = \left\{
\begin{bmatrix} 2 \\ 2 \\ 4 \\ 2 \end{bmatrix},\
\begin{bmatrix} 2 \\ 0 \\ -1 \\ 1 \end{bmatrix}
\right\}
\]
\[
T^{*} = \left\{
\begin{bmatrix} 3 \\ 1 \\ 1 \\ 2 \end{bmatrix},\
\begin{bmatrix} -1 \\ 1 \\ 3 \\ 0 \end{bmatrix}
\right\}
\]
   Can you prove that T′ and T∗ are linearly independent sets and W = ⟨S⟩ =
⟨T′⟩ = ⟨T∗⟩?
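
Before attempting a proof, it can be reassuring to check the claims about T′ and T∗ numerically. The sketch below does this under our own assumptions and is not a proof: `rref`, `in_span`, `is_independent` and `same_span` are hypothetical helper names, independence is tested by asking that every column be a pivot column, and equality of spans is tested by checking that each vector of one set lies in the span of the other.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, zero-based pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def in_span(w, vectors):
    """True when the system with `vectors` as columns and w as constants is consistent."""
    _, pivots = rref([list(row) for row in zip(*vectors, w)])
    return len(vectors) not in pivots

def is_independent(vectors):
    """True when every column of the matrix of `vectors` is a pivot column."""
    _, pivots = rref([list(row) for row in zip(*vectors)])
    return len(pivots) == len(vectors)

def same_span(X, Y):
    """True when each vector of X is in <Y> and each vector of Y is in <X>."""
    return all(in_span(x, Y) for x in X) and all(in_span(y, X) for y in Y)

S = [(1, 1, 2, 1), (2, 2, 4, 2), (2, 0, -1, 1), (7, 1, -1, 4), (0, 2, 5, 1)]
T_prime = [(2, 2, 4, 2), (2, 0, -1, 1)]
T_star = [(3, 1, 1, 2), (-1, 1, 3, 0)]

for name, T in (("T'", T_prime), ("T*", T_star)):
    print(name, is_independent(T), same_span(S, T))
```

A check like this only confirms the arithmetic for these particular vectors; the proof asked for in the example still has to explain why the spans agree.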
Example RES Reworking elements of a span
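Begin with a set of five vectors from C^4 ,
\[
R = \left\{
\begin{bmatrix} 2 \\ 1 \\ 3 \\ 2 \end{bmatrix},\
\begin{bmatrix} -1 \\ 1 \\ 0 \\ 1 \end{bmatrix},\
\begin{bmatrix} -8 \\ -1 \\ -9 \\ -4 \end{bmatrix},\
\begin{bmatrix} 3 \\ 1 \\ -1 \\ -2 \end{bmatrix},\
\begin{bmatrix} -10 \\ -1 \\ -1 \\ 4 \end{bmatrix}
\right\}
\]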
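      It is easy to create elements of X = ⟨R⟩; we will create one at random,
\[
y = 6 \begin{bmatrix} 2 \\ 1 \\ 3 \\ 2 \end{bmatrix}
+ (-7) \begin{bmatrix} -1 \\ 1 \\ 0 \\ 1 \end{bmatrix}
+ 1 \begin{bmatrix} -8 \\ -1 \\ -9 \\ -4 \end{bmatrix}
+ 6 \begin{bmatrix} 3 \\ 1 \\ -1 \\ -2 \end{bmatrix}
+ 2 \begin{bmatrix} -10 \\ -1 \\ -1 \\ 4 \end{bmatrix}
= \begin{bmatrix} 9 \\ 2 \\ 1 \\ -3 \end{bmatrix}
\]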
   We know we can replace R by a smaller set (since it is obviously linearly dependent
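by Theorem MVSLD) that will create the same span. Here goes,
$$\begin{bmatrix} 2 & -1 & -8 & 3 & -10 \\ 1 & 1 & -1 & 1 & -1 \\ 3 & 0 & -9 & -1 & -1 \\ 2 & 1 & -4 & -2 & 4 \end{bmatrix} \xrightarrow{\text{RREF}} \begin{bmatrix} 1 & 0 & -3 & 0 & -1 \\ 0 & 1 & 2 & 0 & 2 \\ 0 & 0 & 0 & 1 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$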

      So, if we collect the first, second and fourth vectors from $R$,
$$P = \left\{ \begin{bmatrix} 2 \\ 1 \\ 3 \\ 2 \end{bmatrix},\ \begin{bmatrix} -1 \\ 1 \\ 0 \\ 1 \end{bmatrix},\ \begin{bmatrix} 3 \\ 1 \\ -1 \\ -2 \end{bmatrix} \right\}$$
then $P$ is linearly independent and $\langle P \rangle = \langle R \rangle = X$ by Theorem BS. Since we built
$\mathbf{y}$ as an element of $\langle R \rangle$ it must also be an element of $\langle P \rangle$. Can we write $\mathbf{y}$ as a linear
combination of just the three vectors in $P$? The answer is, of course, yes. But let us
compute an explicit linear combination just for fun. By Theorem SLSLC we can get
such a linear combination by solving a system of equations with the column vectors
of $P$ as the columns of a coefficient matrix, and $\mathbf{y}$ as the vector of constants.
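   Employing an augmented matrix to solve this system,
$$\left[\begin{array}{ccc|c} 2 & -1 & 3 & 9 \\ 1 & 1 & 1 & 2 \\ 3 & 0 & -1 & 1 \\ 2 & 1 & -2 & -3 \end{array}\right] \xrightarrow{\text{RREF}} \left[\begin{array}{ccc|c} 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & -1 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 \end{array}\right]$$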
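      So we see, as expected, that
$$1 \begin{bmatrix} 2 \\ 1 \\ 3 \\ 2 \end{bmatrix} + (-1) \begin{bmatrix} -1 \\ 1 \\ 0 \\ 1 \end{bmatrix} + 2 \begin{bmatrix} 3 \\ 1 \\ -1 \\ -2 \end{bmatrix} = \begin{bmatrix} 9 \\ 2 \\ 1 \\ -3 \end{bmatrix} = \mathbf{y}$$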
    A key feature of this example is that the linear combination that expresses $\mathbf{y}$ as a
linear combination of the vectors in $P$ is unique. This is a consequence of the linear
independence of $P$. The linearly independent set $P$ is smaller than $R$, but still just
(barely) big enough to create elements of the set $X = \langle R \rangle$. There are many, many
ways to write $\mathbf{y}$ as a linear combination of the five vectors in $R$ (the appropriate
system of equations to verify this claim yields two free variables in the description
of the solution set), yet there is precisely one way to write $\mathbf{y}$ as a linear combination
of the three vectors in $P$.                                                          △
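
The computations of Example RES can be checked mechanically. What follows is only a minimal sketch in Python, not a method taken from the text: the helper name `rref` is an assumption introduced here, and it relies on exact `Fraction` arithmetic from the standard library. Column indices are zero-based, so pivot columns 0, 1 and 3 correspond to the first, second and fourth vectors of R.

```python
from fractions import Fraction


def rref(rows):
    """Row-reduce a matrix (list of rows) exactly.

    Returns the reduced matrix and the list of pivot column indices.
    """
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0  # index of the row that will receive the next pivot
    for c in range(len(m[0])):
        # Look for a nonzero entry in column c, at or below row r.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        # Clear every other entry of column c.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


# The columns of this matrix are the five vectors of R in Example RES.
R = [[2, -1, -8, 3, -10],
     [1, 1, -1, 1, -1],
     [3, 0, -9, -1, -1],
     [2, 1, -4, -2, 4]]
y = [9, 2, 1, -3]

_, pivots = rref(R)

# Keep only the pivot columns (the set P) and augment with y.
augmented = [[row[c] for c in pivots] + [yi] for row, yi in zip(R, y)]
reduced, _ = rref(augmented)
coefficients = [reduced[k][-1] for k in range(len(pivots))]

print("pivot columns:", pivots)
print("coefficients of y in terms of P:", coefficients)
```

Exact rational arithmetic is used here so that the pivot columns are not disturbed by floating-point round-off.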


Reading Questions

1. Let $S$ be the linearly dependent set of three vectors below.
$$S = \left\{ \begin{bmatrix} 1 \\ 10 \\ 100 \\ 1000 \end{bmatrix},\ \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix},\ \begin{bmatrix} 5 \\ 23 \\ 203 \\ 2003 \end{bmatrix} \right\}$$
   Write one vector from S as a linear combination of the other two and include this
   vector equality in your response. (You should be able to do this on sight, rather than
   doing some computations.) Convert this expression into a nontrivial relation of linear
   dependence on S.
2. Explain why the word “dependent” is used in the definition of linear dependence.
3. Suppose that $Y = \langle P \rangle = \langle Q \rangle$, where $P$ is a linearly dependent set and $Q$ is linearly
   independent. Would you rather use $P$ or $Q$ to describe $Y$? Why?


Exercises
C20† Let $T$ be the set of columns of the matrix $B$ below. Define $W = \langle T \rangle$. Find a set $R$
so that (1) $R$ has 3 vectors, (2) $R$ is a subset of $T$, and (3) $W = \langle R \rangle$.
$$B = \begin{bmatrix} -3 & 1 & -2 & 7 \\ -1 & 2 & 1 & 4 \\ 1 & 1 & 2 & -1 \end{bmatrix}$$

C40 Verify that the set $R' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_4\}$ at the end of Example RSC5 is linearly
independent.

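C50† Consider the set of vectors from $\mathbb{C}^3$, $W$, given below. Find a linearly independent
set $T$ that contains three vectors from $W$ and such that $\langle W \rangle = \langle T \rangle$.
$$W = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4, \mathbf{v}_5\} = \left\{ \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix},\ \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix},\ \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix},\ \begin{bmatrix} 3 \\ 1 \\ 3 \end{bmatrix},\ \begin{bmatrix} 0 \\ 1 \\ -3 \end{bmatrix} \right\}$$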

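C51† Given the set $S$ below, find a linearly independent set $T$ so that $\langle T \rangle = \langle S \rangle$.
$$S = \left\{ \begin{bmatrix} 2 \\ -1 \\ 2 \end{bmatrix},\ \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix},\ \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix},\ \begin{bmatrix} 5 \\ -1 \\ 3 \end{bmatrix} \right\}$$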

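C52† Let $W$ be the span of the set of vectors $S$ below, $W = \langle S \rangle$. Find a set $T$ so that (1)
the span of $T$ is $W$, $\langle T \rangle = W$, (2) $T$ is a linearly independent set, and (3) $T$ is a subset of
$S$.
$$S = \left\{ \begin{bmatrix} 1 \\ 2 \\ -1 \end{bmatrix},\ \begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix},\ \begin{bmatrix} 4 \\ 1 \\ -1 \end{bmatrix},\ \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix},\ \begin{bmatrix} 3 \\ -1 \\ 0 \end{bmatrix} \right\}$$
C55† Let $T$ be the set of vectors
$$T = \left\{ \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix},\ \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix},\ \begin{bmatrix} 4 \\ 2 \\ 3 \end{bmatrix},\ \begin{bmatrix} 3 \\ 0 \\ 6 \end{bmatrix} \right\}.$$
Find two different subsets of $T$, named $R$ and $S$, so that $R$ and $S$ each contain three vectors, and so that
$\langle R \rangle = \langle T \rangle$ and $\langle S \rangle = \langle T \rangle$. Prove that both $R$ and $S$ are linearly independent.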
C70 Reprise Example RES by creating a new version of the vector y. In other words,
form a new, different linear combination of the vectors in R to create a new vector y (but
do not simplify the problem too much by choosing any of the five new scalars to be zero).
Then express this new y as a combination of the vectors in P .
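M10 At the conclusion of Example RSC4 two alternative solutions, sets $T'$ and $T^{*}$, are
proposed. Verify these claims by proving that $\langle T \rangle = \langle T' \rangle$ and $\langle T \rangle = \langle T^{*} \rangle$.
T40† Suppose that $\mathbf{v}_1$ and $\mathbf{v}_2$ are any two vectors from $\mathbb{C}^m$. Prove the following set
equality.
$$\langle \{\mathbf{v}_1, \mathbf{v}_2\} \rangle = \langle \{\mathbf{v}_1 + \mathbf{v}_2,\ \mathbf{v}_1 - \mathbf{v}_2\} \rangle$$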
Section O
Orthogonality
In this section we define a couple more operations with vectors, and prove a few
theorems. At first blush these definitions and results will not appear central to what
follows, but we will make use of them at key points in the remainder of the course
(such as Section MINM, Section OD). Because we have chosen to use C as our set of
scalars, this subsection is a bit more, uh, . . . complex than it would be for the real
numbers. We will explain as we go along how things get easier for the real numbers
R. If you have not already, now would be a good time to review some of the basic
properties of arithmetic with complex numbers described in Section CNO. With
that done, we can extend the basics of complex number arithmetic to our study of
vectors in $\mathbb{C}^m$.

Subsection CAV
Complex Arithmetic and Vectors
We know how the addition and multiplication of complex numbers are employed in
defining the operations for vectors in $\mathbb{C}^m$ (Definition CVA and Definition CVSM).
We can also extend the idea of the conjugate to vectors.
Definition CCCV Complex Conjugate of a Column Vector
Suppose that $\mathbf{u}$ is a vector from $\mathbb{C}^m$. Then the conjugate of the vector, $\overline{\mathbf{u}}$, is defined
by
$$[\overline{\mathbf{u}}]_i = \overline{[\mathbf{u}]_i} \qquad 1 \le i \le m$$
                                                                                     
   With this definition we can show that the conjugate of a column vector behaves
as we would expect with regard to vector addition and scalar multiplication.
Theorem CRVA Conjugation Respects Vector Addition
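Suppose $\mathbf{x}$ and $\mathbf{y}$ are two vectors from $\mathbb{C}^m$. Then
$$\overline{\mathbf{x} + \mathbf{y}} = \overline{\mathbf{x}} + \overline{\mathbf{y}}$$

Proof. For each $1 \le i \le m$,
$$\begin{aligned}
[\overline{\mathbf{x} + \mathbf{y}}]_i &= \overline{[\mathbf{x} + \mathbf{y}]_i} && \text{Definition CCCV} \\
&= \overline{[\mathbf{x}]_i + [\mathbf{y}]_i} && \text{Definition CVA} \\
&= \overline{[\mathbf{x}]_i} + \overline{[\mathbf{y}]_i} && \text{Theorem CCRA} \\
&= [\overline{\mathbf{x}}]_i + [\overline{\mathbf{y}}]_i && \text{Definition CCCV} \\
&= [\overline{\mathbf{x}} + \overline{\mathbf{y}}]_i && \text{Definition CVA}
\end{aligned}$$
   Then by Definition CVE we have $\overline{\mathbf{x} + \mathbf{y}} = \overline{\mathbf{x}} + \overline{\mathbf{y}}$.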

Theorem CRSM Conjugation Respects Vector Scalar Multiplication
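Suppose $\mathbf{x}$ is a vector from $\mathbb{C}^m$, and $\alpha \in \mathbb{C}$ is a scalar. Then
$$\overline{\alpha \mathbf{x}} = \overline{\alpha}\, \overline{\mathbf{x}}$$

Proof. For $1 \le i \le m$,
$$\begin{aligned}
[\overline{\alpha \mathbf{x}}]_i &= \overline{[\alpha \mathbf{x}]_i} && \text{Definition CCCV} \\
&= \overline{\alpha [\mathbf{x}]_i} && \text{Definition CVSM} \\
&= \overline{\alpha}\, \overline{[\mathbf{x}]_i} && \text{Theorem CCRM} \\
&= \overline{\alpha}\, [\overline{\mathbf{x}}]_i && \text{Definition CCCV} \\
&= [\overline{\alpha}\, \overline{\mathbf{x}}]_i && \text{Definition CVSM}
\end{aligned}$$
   Then by Definition CVE we have $\overline{\alpha \mathbf{x}} = \overline{\alpha}\, \overline{\mathbf{x}}$.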


   These two theorems together tell us how we can “push” complex conjugation
through linear combinations.
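
As a numerical illustration of Theorem CRVA and Theorem CRSM acting together, here is a small Python sketch. The helper names `conj_vec` and `lin_comb` and the sample vectors and scalars are hypothetical choices made for this illustration; they use small integer parts so that the floating-point arithmetic is exact.

```python
def conj_vec(x):
    """Entrywise conjugate of a vector, as in Definition CCCV."""
    return [z.conjugate() for z in x]


def lin_comb(scalars, vectors):
    """Return the linear combination sum of scalars[k] * vectors[k]."""
    size = len(vectors[0])
    return [sum(a * v[i] for a, v in zip(scalars, vectors)) for i in range(size)]


# Hypothetical sample data (Gaussian integers, so no round-off occurs).
x = [1 + 2j, 3 - 1j]
y = [-2 + 1j, 4j]
alpha, beta = 2 - 1j, 1 + 3j

# Conjugate of the linear combination ...
left = conj_vec(lin_comb([alpha, beta], [x, y]))
# ... versus the linear combination of conjugates with conjugated scalars.
right = lin_comb([alpha.conjugate(), beta.conjugate()],
                 [conj_vec(x), conj_vec(y)])

print(left == right)
```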

Subsection IP
Inner products
Definition IP Inner Product
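Given the vectors $\mathbf{u}, \mathbf{v} \in \mathbb{C}^m$ the inner product of $\mathbf{u}$ and $\mathbf{v}$ is the scalar quantity
in $\mathbb{C}$,
$$\langle \mathbf{u}, \mathbf{v} \rangle = \overline{[\mathbf{u}]_1}[\mathbf{v}]_1 + \overline{[\mathbf{u}]_2}[\mathbf{v}]_2 + \overline{[\mathbf{u}]_3}[\mathbf{v}]_3 + \cdots + \overline{[\mathbf{u}]_m}[\mathbf{v}]_m = \sum_{i=1}^{m} \overline{[\mathbf{u}]_i}[\mathbf{v}]_i$$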
                                                                                    
    This operation is a bit different in that we begin with two vectors but produce a
scalar. Computing one is straightforward.
Example CSIP Computing some inner products
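The inner product of
$$\mathbf{u} = \begin{bmatrix} 2 + 3i \\ 5 + 2i \\ -3 + i \end{bmatrix} \qquad \text{and} \qquad \mathbf{v} = \begin{bmatrix} 1 + 2i \\ -4 + 5i \\ 0 + 5i \end{bmatrix}$$
is
$$\begin{aligned}
\langle \mathbf{u}, \mathbf{v} \rangle &= \overline{(2 + 3i)}(1 + 2i) + \overline{(5 + 2i)}(-4 + 5i) + \overline{(-3 + i)}(0 + 5i) \\
&= (2 - 3i)(1 + 2i) + (5 - 2i)(-4 + 5i) + (-3 - i)(0 + 5i) \\
&= (8 + i) + (-10 + 33i) + (5 - 15i) \\
&= 3 + 19i
\end{aligned}$$
      The inner product of
$$\mathbf{w} = \begin{bmatrix} 2 \\ 4 \\ -3 \\ 2 \\ 8 \end{bmatrix} \qquad \text{and} \qquad \mathbf{x} = \begin{bmatrix} 3 \\ 1 \\ 0 \\ -1 \\ -2 \end{bmatrix}$$
is
$$\begin{aligned}
\langle \mathbf{w}, \mathbf{x} \rangle &= \overline{(2)}3 + \overline{(4)}1 + \overline{(-3)}0 + \overline{(2)}(-1) + \overline{(8)}(-2) \\
&= 2(3) + 4(1) + (-3)0 + 2(-1) + 8(-2) = -8.
\end{aligned}$$
                                                                                    △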
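
Definition IP translates almost directly into code. The sketch below is one possible Python rendering, assuming vectors are stored as plain lists; the function name `inner_product` is a name chosen here, not one from the text. It conjugates the entries of the first vector, matching the convention of Definition IP, and is applied to the vectors of Example CSIP.

```python
def inner_product(u, v):
    """Inner product per Definition IP: conjugate entries of the FIRST vector."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same size")
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))


# First pair of vectors from Example CSIP.
u = [2 + 3j, 5 + 2j, -3 + 1j]
v = [1 + 2j, -4 + 5j, 0 + 5j]
print(inner_product(u, v))

# Second pair: real entries, so this is the familiar dot product.
w = [2, 4, -3, 2, 8]
x = [3, 1, 0, -1, -2]
print(inner_product(w, x))
```

The printed values should agree with the two results computed by hand in Example CSIP.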
    In the case where the entries of our vectors are all real numbers (as in the second
part of Example CSIP), the computation of the inner product may look familiar and
be known to you as a dot product or scalar product. So you can view the inner
product as a generalization of the scalar product to vectors from $\mathbb{C}^m$ (rather than
$\mathbb{R}^m$).
    Note that we have chosen to conjugate the entries of the first vector listed in the
inner product, while it is almost equally feasible to conjugate entries from the second
vector instead. In particular, prior to Version 2.90, we did use the latter definition,
and this has now changed to the former, with resulting adjustments propagated
up through Section CB (only). However, conjugating the first vector leads to much
nicer formulas for certain matrix decompositions and also shortens some proofs.
    There are several quick theorems we can now prove, and they will each be useful
later.
Theorem IPVA Inner Product and Vector Addition
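Suppose $\mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathbb{C}^m$. Then

     1. $\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle$

     2. $\langle \mathbf{u}, \mathbf{v} + \mathbf{w} \rangle = \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{w} \rangle$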

Proof. The proofs of the two parts are very similar, with the first one requiring
just a bit more effort due to the conjugation that occurs. We will prove part 1 and
you can prove part 2 (Exercise O.T10).
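$$\begin{aligned}
\langle \mathbf{u} + \mathbf{v}, \mathbf{w} \rangle &= \sum_{i=1}^{m} \overline{[\mathbf{u} + \mathbf{v}]_i}\,[\mathbf{w}]_i && \text{Definition IP} \\
&= \sum_{i=1}^{m} \left( \overline{[\mathbf{u}]_i + [\mathbf{v}]_i} \right) [\mathbf{w}]_i && \text{Definition CVA} \\
&= \sum_{i=1}^{m} \left( \overline{[\mathbf{u}]_i} + \overline{[\mathbf{v}]_i} \right) [\mathbf{w}]_i && \text{Theorem CCRA} \\
&= \sum_{i=1}^{m} \left( \overline{[\mathbf{u}]_i}[\mathbf{w}]_i + \overline{[\mathbf{v}]_i}[\mathbf{w}]_i \right) && \text{Property DCN} \\
&= \sum_{i=1}^{m} \overline{[\mathbf{u}]_i}[\mathbf{w}]_i + \sum_{i=1}^{m} \overline{[\mathbf{v}]_i}[\mathbf{w}]_i && \text{Property CACN} \\
&= \langle \mathbf{u}, \mathbf{w} \rangle + \langle \mathbf{v}, \mathbf{w} \rangle && \text{Definition IP}
\end{aligned}$$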
                                                                                         
Theorem IPSM Inner Product and Scalar Multiplication
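Suppose $\mathbf{u}, \mathbf{v} \in \mathbb{C}^m$ and $\alpha \in \mathbb{C}$. Then

  1. $\langle \alpha \mathbf{u}, \mathbf{v} \rangle = \overline{\alpha} \langle \mathbf{u}, \mathbf{v} \rangle$
  2. $\langle \mathbf{u}, \alpha \mathbf{v} \rangle = \alpha \langle \mathbf{u}, \mathbf{v} \rangle$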

Proof. The proofs of the two parts are very similar, with the first one requiring
just a bit more effort due to the conjugation that occurs. We will prove part 1 and
you can prove part 2 (Exercise O.T11).
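$$\begin{aligned}
\langle \alpha \mathbf{u}, \mathbf{v} \rangle &= \sum_{i=1}^{m} \overline{[\alpha \mathbf{u}]_i}\,[\mathbf{v}]_i && \text{Definition IP} \\
&= \sum_{i=1}^{m} \overline{\alpha [\mathbf{u}]_i}\,[\mathbf{v}]_i && \text{Definition CVSM} \\
&= \sum_{i=1}^{m} \overline{\alpha}\, \overline{[\mathbf{u}]_i}\,[\mathbf{v}]_i && \text{Theorem CCRM} \\
&= \overline{\alpha} \sum_{i=1}^{m} \overline{[\mathbf{u}]_i}\,[\mathbf{v}]_i && \text{Property DCN} \\
&= \overline{\alpha} \langle \mathbf{u}, \mathbf{v} \rangle && \text{Definition IP}
\end{aligned}$$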
                                                                                         
Theorem IPAC Inner Product is Anti-Commutative
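Suppose that $\mathbf{u}$ and $\mathbf{v}$ are vectors in $\mathbb{C}^m$. Then $\langle \mathbf{u}, \mathbf{v} \rangle = \overline{\langle \mathbf{v}, \mathbf{u} \rangle}$.
Proof.
$$\begin{aligned}
\langle \mathbf{u}, \mathbf{v} \rangle &= \sum_{i=1}^{m} \overline{[\mathbf{u}]_i}\,[\mathbf{v}]_i && \text{Definition IP} \\
&= \sum_{i=1}^{m} \overline{[\mathbf{u}]_i}\, \overline{\overline{[\mathbf{v}]_i}} && \text{Theorem CCT} \\
&= \sum_{i=1}^{m} \overline{[\mathbf{u}]_i\, \overline{[\mathbf{v}]_i}} && \text{Theorem CCRM} \\
&= \overline{\left( \sum_{i=1}^{m} [\mathbf{u}]_i\, \overline{[\mathbf{v}]_i} \right)} && \text{Theorem CCRA} \\
&= \overline{\left( \sum_{i=1}^{m} \overline{[\mathbf{v}]_i}\,[\mathbf{u}]_i \right)} && \text{Property CMCN} \\
&= \overline{\langle \mathbf{v}, \mathbf{u} \rangle} && \text{Definition IP}
\end{aligned}$$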
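
The identities of Theorem IPVA, Theorem IPSM and Theorem IPAC can be spot-checked numerically. The following Python sketch is only an illustration under stated assumptions, not a proof: the helper names, the third vector `w` and the scalar `alpha` are hypothetical choices, and all entries are Gaussian integers so that exact equality tests are safe.

```python
def inner_product(u, v):
    """Inner product per Definition IP (conjugate the first vector)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def add(x, y):
    """Vector addition, entry by entry."""
    return [a + b for a, b in zip(x, y)]


def scale(alpha, x):
    """Scalar multiple of a vector."""
    return [alpha * a for a in x]


u = [2 + 3j, 5 + 2j, -3 + 1j]
v = [1 + 2j, -4 + 5j, 0 + 5j]
w = [1 - 1j, 2 + 0j, 3j]      # hypothetical third vector
alpha = 2 - 3j                # hypothetical scalar

checks = {
    "IPVA part 1": inner_product(add(u, v), w)
                   == inner_product(u, w) + inner_product(v, w),
    "IPVA part 2": inner_product(u, add(v, w))
                   == inner_product(u, v) + inner_product(u, w),
    "IPSM part 1": inner_product(scale(alpha, u), v)
                   == alpha.conjugate() * inner_product(u, v),
    "IPSM part 2": inner_product(u, scale(alpha, v))
                   == alpha * inner_product(u, v),
    "IPAC": inner_product(u, v) == inner_product(v, u).conjugate(),
}
for name, holds in checks.items():
    print(name, holds)
```

With entries that are not exactly representable, a tolerance-based comparison would be the safer choice.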


                                                                                           


Subsection N
Norm
If treating linear algebra in a more geometric fashion, the length of a vector occurs
naturally, and is what you would expect from its name. With complex numbers, we
will define a similar function. Recall that if c is a complex number, then |c| denotes
its modulus (Definition MCN).
Definition NV Norm of a Vector
The norm of the vector $\mathbf{u}$ is the scalar quantity in $\mathbb{C}$
$$\|\mathbf{u}\| = \sqrt{|[\mathbf{u}]_1|^2 + |[\mathbf{u}]_2|^2 + |[\mathbf{u}]_3|^2 + \cdots + |[\mathbf{u}]_m|^2} = \sqrt{\sum_{i=1}^{m} |[\mathbf{u}]_i|^2}$$

                                                                                           
      Computing a norm is also easy to do.
Example CNSV Computing the norm of some vectors
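The norm of
$$\mathbf{u} = \begin{bmatrix} 3 + 2i \\ 1 - 6i \\ 2 + 4i \\ 2 + i \end{bmatrix}$$
is
$$\begin{aligned}
\|\mathbf{u}\| &= \sqrt{|3 + 2i|^2 + |1 - 6i|^2 + |2 + 4i|^2 + |2 + i|^2} \\
&= \sqrt{13 + 37 + 20 + 5} = \sqrt{75} = 5\sqrt{3}
\end{aligned}$$
      The norm of
$$\mathbf{v} = \begin{bmatrix} 3 \\ -1 \\ 2 \\ 4 \\ -3 \end{bmatrix}$$
is
$$\|\mathbf{v}\| = \sqrt{|3|^2 + |-1|^2 + |2|^2 + |4|^2 + |-3|^2} = \sqrt{3^2 + 1^2 + 2^2 + 4^2 + 3^2} = \sqrt{39}.$$
                                                                                           △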
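
A norm computation in the style of Example CNSV can be sketched in Python as follows. The function names `norm_squared` and `norm` are assumptions made for this illustration. The squared modulus is computed from real and imaginary parts rather than via `abs`, which avoids an unnecessary square root and keeps the intermediate sum exact for integer entries.

```python
import math


def norm_squared(u):
    """Sum of squared moduli of the entries of u."""
    return sum(z.real ** 2 + z.imag ** 2 for z in u)


def norm(u):
    """Norm per Definition NV: square root of the sum of squared moduli."""
    return math.sqrt(norm_squared(u))


# The two vectors of Example CNSV.
u = [3 + 2j, 1 - 6j, 2 + 4j, 2 + 1j]
v = [3, -1, 2, 4, -3]

print(norm_squared(u), norm(u))
print(norm_squared(v), norm(v))
```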
   Notice how the norm of a vector with real number entries is just the length of
the vector. Inner products and norms are related by the following theorem.
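Theorem IPN Inner Products and Norms
Suppose that $\mathbf{u}$ is a vector in $\mathbb{C}^m$. Then $\|\mathbf{u}\|^2 = \langle \mathbf{u}, \mathbf{u} \rangle$.

Proof.
$$\begin{aligned}
\|\mathbf{u}\|^2 &= \left( \sqrt{\sum_{i=1}^{m} |[\mathbf{u}]_i|^2} \right)^{2} && \text{Definition NV} \\
&= \sum_{i=1}^{m} |[\mathbf{u}]_i|^2 && \text{Inverse functions} \\
&= \sum_{i=1}^{m} \overline{[\mathbf{u}]_i}\,[\mathbf{u}]_i && \text{Definition MCN} \\
&= \langle \mathbf{u}, \mathbf{u} \rangle && \text{Definition IP}
\end{aligned}$$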
                                                                                      

   When our vectors have entries only from the real numbers Theorem IPN says
that the dot product of a vector with itself is equal to the length of the vector
squared.
Theorem PIP Positive Inner Products
Suppose that $\mathbf{u}$ is a vector in $\mathbb{C}^m$. Then $\langle \mathbf{u}, \mathbf{u} \rangle \ge 0$ with equality if and only if $\mathbf{u} = \mathbf{0}$.

Proof. From the proof of Theorem IPN we see that
$$\langle \mathbf{u}, \mathbf{u} \rangle = |[\mathbf{u}]_1|^2 + |[\mathbf{u}]_2|^2 + |[\mathbf{u}]_3|^2 + \cdots + |[\mathbf{u}]_m|^2$$
    Since each modulus is squared, every term is nonnegative, and the sum must also be
nonnegative. (Notice that in general the inner product is a complex number and cannot
be compared with zero, but in the special case of $\langle \mathbf{u}, \mathbf{u} \rangle$ the result is a real number.)
    The phrase, “with equality if and only if” means that we want to show that
the statement $\langle \mathbf{u}, \mathbf{u} \rangle = 0$ (i.e. with equality) is equivalent (“if and only if”) to the
statement $\mathbf{u} = \mathbf{0}$.
    If $\mathbf{u} = \mathbf{0}$, then it is a straightforward computation to see that $\langle \mathbf{u}, \mathbf{u} \rangle = 0$. In the
other direction, assume that $\langle \mathbf{u}, \mathbf{u} \rangle = 0$. As before, $\langle \mathbf{u}, \mathbf{u} \rangle$ is a sum of squared moduli. So we
have
$$0 = \langle \mathbf{u}, \mathbf{u} \rangle = |[\mathbf{u}]_1|^2 + |[\mathbf{u}]_2|^2 + |[\mathbf{u}]_3|^2 + \cdots + |[\mathbf{u}]_m|^2$$
   Now we have a sum of nonnegative squares equaling zero, so each term must be zero. Then
by similar logic, $|[\mathbf{u}]_i| = 0$ will imply that $[\mathbf{u}]_i = 0$, since $0 + 0i$ is the only complex
number with zero modulus. Thus every entry of $\mathbf{u}$ is zero and so $\mathbf{u} = \mathbf{0}$, as desired.

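     Notice that Theorem PIP contains three implications:
$$\begin{aligned}
\mathbf{u} \in \mathbb{C}^m &\Rightarrow \langle \mathbf{u}, \mathbf{u} \rangle \ge 0 \\
\mathbf{u} = \mathbf{0} &\Rightarrow \langle \mathbf{u}, \mathbf{u} \rangle = 0 \\
\langle \mathbf{u}, \mathbf{u} \rangle = 0 &\Rightarrow \mathbf{u} = \mathbf{0}
\end{aligned}$$
   The results contained in Theorem PIP are summarized by saying “the inner
product is positive definite.”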

Subsection OV
Orthogonal Vectors
“Orthogonal” is a generalization of “perpendicular.” You may have used mutually
perpendicular vectors in a physics class, or you may recall from a calculus class that
perpendicular vectors have a zero dot product. We will now extend these ideas into
the realm of higher dimensions and complex scalars.
Definition OV Orthogonal Vectors
A pair of vectors, $\mathbf{u}$ and $\mathbf{v}$, from $\mathbb{C}^m$ are orthogonal if their inner product is zero,
that is, $\langle \mathbf{u}, \mathbf{v} \rangle = 0$.
Example TOV Two orthogonal vectors
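The vectors
$$\mathbf{u} = \begin{bmatrix} 2 + 3i \\ 4 - 2i \\ 1 + i \\ 1 + i \end{bmatrix} \qquad \mathbf{v} = \begin{bmatrix} 1 - i \\ 2 + 3i \\ 4 - 6i \\ 1 \end{bmatrix}$$
are orthogonal since
$$\begin{aligned}
\langle \mathbf{u}, \mathbf{v} \rangle &= (2 - 3i)(1 - i) + (4 + 2i)(2 + 3i) + (1 - i)(4 - 6i) + (1 - i)(1) \\
&= (-1 - 5i) + (2 + 16i) + (-2 - 10i) + (1 - i) \\
&= 0 + 0i.
\end{aligned}$$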
   We extend this definition to whole sets by requiring vectors to be pairwise
orthogonal. Despite using the same word, careful thought about what objects you
are using will eliminate any source of confusion.
Definition OSV Orthogonal Set of Vectors
Suppose that $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n\}$ is a set of vectors from $\mathbb{C}^m$. Then $S$ is
an orthogonal set if every pair of different vectors from $S$ is orthogonal, that is
$\langle \mathbf{u}_i, \mathbf{u}_j \rangle = 0$ whenever $i \ne j$.
      We now define the prototypical orthogonal set, which we will reference repeatedly.
Definition SUV Standard Unit Vectors
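Let $\mathbf{e}_j \in \mathbb{C}^m$, $1 \le j \le m$ denote the column vectors defined by
$$[\mathbf{e}_j]_i = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases}$$
      Then the set
$$\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \ldots, \mathbf{e}_m\} = \{\, \mathbf{e}_j \mid 1 \le j \le m \,\}$$
is the set of standard unit vectors in $\mathbb{C}^m$.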
    Notice that $\mathbf{e}_j$ is identical to column $j$ of the $m \times m$ identity matrix $I_m$ (Definition
IM) and is a pivot column for $I_m$, since the identity matrix is in reduced row-echelon
form. These observations will often be useful. We will reserve the notation $\mathbf{e}_i$ for these
vectors. It is not hard to see that the set of standard unit vectors is an orthogonal
set.
Example SUVOS Standard Unit Vectors are an Orthogonal Set
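Compute the inner product of two distinct vectors from the set of standard unit
vectors (Definition SUV), say $\mathbf{e}_i$, $\mathbf{e}_j$, where $i \ne j$,
$$\begin{aligned}
\langle \mathbf{e}_i, \mathbf{e}_j \rangle &= \overline{0}0 + \overline{0}0 + \cdots + \overline{1}0 + \cdots + \overline{0}0 + \cdots + \overline{0}1 + \cdots + \overline{0}0 + \overline{0}0 \\
&= 0(0) + 0(0) + \cdots + 1(0) + \cdots + 0(1) + \cdots + 0(0) + 0(0) \\
&= 0
\end{aligned}$$
      So the set $\{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3, \ldots, \mathbf{e}_m\}$ is an orthogonal set.                         △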
Example AOS An orthogonal set
The set
$$\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4\} = \left\{ \begin{bmatrix} 1 + i \\ 1 \\ 1 - i \\ i \end{bmatrix},\ \begin{bmatrix} 1 + 5i \\ 6 + 5i \\ -7 - i \\ 1 - 6i \end{bmatrix},\ \begin{bmatrix} -7 + 34i \\ -8 - 23i \\ -10 + 22i \\ 30 + 13i \end{bmatrix},\ \begin{bmatrix} -2 - 4i \\ 6 + i \\ 4 + 3i \\ 6 - i \end{bmatrix} \right\}$$
is an orthogonal set.
    Since the inner product is anti-commutative (Theorem IPAC) we can test pairs
of different vectors in any order. If the result is zero, then it will also be zero if the
inner product is computed in the opposite order. This means there are six different
pairs of vectors to use in an inner product computation. We will do two and you
can practice your inner products on the other four.
$$\begin{aligned}
\langle \mathbf{x}_1, \mathbf{x}_3 \rangle &= (1 - i)(-7 + 34i) + (1)(-8 - 23i) + (1 + i)(-10 + 22i) + (-i)(30 + 13i) \\
&= (27 + 41i) + (-8 - 23i) + (-32 + 12i) + (13 - 30i) \\
&= 0 + 0i
\end{aligned}$$
and
$$\begin{aligned}
\langle \mathbf{x}_2, \mathbf{x}_4 \rangle &= (1 - 5i)(-2 - 4i) + (6 - 5i)(6 + i) + (-7 + i)(4 + 3i) + (1 + 6i)(6 - i) \\
&= (-22 + 6i) + (41 - 24i) + (-31 - 17i) + (12 + 35i) \\
&= 0 + 0i
\end{aligned}$$
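
All six pairs from Example AOS can be tested at once. The Python sketch below is an illustration only; `is_orthogonal_set` is a hypothetical helper name. Exact comparison with zero is safe here solely because every entry is a Gaussian integer; with general floating-point data a small tolerance would be needed instead.

```python
from itertools import combinations


def inner_product(u, v):
    """Inner product per Definition IP (conjugate the first vector)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def is_orthogonal_set(vectors):
    """True when every pair of different vectors has inner product zero."""
    return all(inner_product(a, b) == 0 for a, b in combinations(vectors, 2))


# The four vectors of Example AOS.
x1 = [1 + 1j, 1 + 0j, 1 - 1j, 1j]
x2 = [1 + 5j, 6 + 5j, -7 - 1j, 1 - 6j]
x3 = [-7 + 34j, -8 - 23j, -10 + 22j, 30 + 13j]
x4 = [-2 - 4j, 6 + 1j, 4 + 3j, 6 - 1j]
X = [x1, x2, x3, x4]

for (i, a), (j, b) in combinations(enumerate(X, start=1), 2):
    print(f"<x{i}, x{j}> =", inner_product(a, b))
print(is_orthogonal_set(X))
```

By Theorem IPAC it is enough to test each unordered pair once, which is what `combinations` provides.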
   So far, this section has seen lots of definitions, and lots of theorems establishing
un-surprising consequences of those definitions. But here is our first theorem that
suggests that inner products and orthogonal vectors have some utility. It is also one
of our first illustrations of how to arrive at linear independence as the conclusion of
a theorem.
Theorem OSLI Orthogonal Sets are Linearly Independent
Suppose that S is an orthogonal set of nonzero vectors. Then S is linearly independent.

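Proof. Let $S = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_n\}$ be an orthogonal set of nonzero vectors. To
prove the linear independence of $S$, we can appeal to the definition (Definition LICV)
and begin with an arbitrary relation of linear dependence (Definition RLDCV),
$$\alpha_1 \mathbf{u}_1 + \alpha_2 \mathbf{u}_2 + \alpha_3 \mathbf{u}_3 + \cdots + \alpha_n \mathbf{u}_n = \mathbf{0}.$$
     Then, for every $1 \le i \le n$, we have
$$\begin{aligned}
\alpha_i \langle \mathbf{u}_i, \mathbf{u}_i \rangle
&= \alpha_1 (0) + \alpha_2 (0) + \cdots + \alpha_i \langle \mathbf{u}_i, \mathbf{u}_i \rangle + \cdots + \alpha_n (0) && \text{Property ZCN} \\
&= \alpha_1 \langle \mathbf{u}_i, \mathbf{u}_1 \rangle + \cdots + \alpha_i \langle \mathbf{u}_i, \mathbf{u}_i \rangle + \cdots + \alpha_n \langle \mathbf{u}_i, \mathbf{u}_n \rangle && \text{Definition OSV} \\
&= \langle \mathbf{u}_i, \alpha_1 \mathbf{u}_1 \rangle + \langle \mathbf{u}_i, \alpha_2 \mathbf{u}_2 \rangle + \cdots + \langle \mathbf{u}_i, \alpha_n \mathbf{u}_n \rangle && \text{Theorem IPSM} \\
&= \langle \mathbf{u}_i, \alpha_1 \mathbf{u}_1 + \alpha_2 \mathbf{u}_2 + \alpha_3 \mathbf{u}_3 + \cdots + \alpha_n \mathbf{u}_n \rangle && \text{Theorem IPVA} \\
&= \langle \mathbf{u}_i, \mathbf{0} \rangle && \text{Definition RLDCV} \\
&= 0 && \text{Definition IP}
\end{aligned}$$
    Because $\mathbf{u}_i$ was assumed to be nonzero, Theorem PIP says $\langle \mathbf{u}_i, \mathbf{u}_i \rangle$ is nonzero
and thus $\alpha_i$ must be zero. So we conclude that $\alpha_i = 0$ for all $1 \le i \le n$ in any
relation of linear dependence on $S$. But this says that $S$ is a linearly independent
set since the only way to form a relation of linear dependence is the trivial way
(Definition LICV). Boom!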


Subsection GSP
Gram-Schmidt Procedure
The Gram-Schmidt Procedure is really a theorem. It says that if we begin with a
linearly independent set of p vectors, S, then we can do a number of calculations
with these vectors and produce an orthogonal set of $p$ vectors, $T$, so that $\langle S \rangle = \langle T \rangle$.
Given the large number of computations involved, it is indeed a procedure to do all
the necessary computations, and it is best employed on a computer. However, it also
has value in proofs where we may on occasion wish to replace a linearly independent
set by an orthogonal set.
    This is our first occasion to use the technique of “mathematical induction” for a
proof, a technique we will see again several times, especially in Chapter D. So study
the simple example described in Proof Technique I first.
Theorem GSP Gram-Schmidt Procedure
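Suppose that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_p\}$ is a linearly independent set of vectors in
$\mathbb{C}^m$. Define the vectors $\mathbf{u}_i$, $1 \le i \le p$ by
$$\mathbf{u}_i = \mathbf{v}_i - \frac{\langle \mathbf{u}_1, \mathbf{v}_i \rangle}{\langle \mathbf{u}_1, \mathbf{u}_1 \rangle} \mathbf{u}_1 - \frac{\langle \mathbf{u}_2, \mathbf{v}_i \rangle}{\langle \mathbf{u}_2, \mathbf{u}_2 \rangle} \mathbf{u}_2 - \frac{\langle \mathbf{u}_3, \mathbf{v}_i \rangle}{\langle \mathbf{u}_3, \mathbf{u}_3 \rangle} \mathbf{u}_3 - \cdots - \frac{\langle \mathbf{u}_{i-1}, \mathbf{v}_i \rangle}{\langle \mathbf{u}_{i-1}, \mathbf{u}_{i-1} \rangle} \mathbf{u}_{i-1}$$
   Let $T = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_p\}$. Then $T$ is an orthogonal set of nonzero vectors,
and $\langle T \rangle = \langle S \rangle$.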
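
Since the procedure is best employed on a computer, here is a minimal Python sketch of the formula in Theorem GSP. It is an illustration under stated assumptions rather than a definitive implementation: the function name `gram_schmidt` and the three sample vectors are hypothetical, the input is assumed to be linearly independent (otherwise a division by zero can occur), and floating-point arithmetic means the results are orthogonal only up to round-off.

```python
def inner_product(u, v):
    """Inner product per Definition IP (conjugate the first vector)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Build u_1, ..., u_p from v_1, ..., v_p using the formula of Theorem GSP."""
    orthogonal = []
    for v in vectors:
        u = list(v)
        for w in orthogonal:
            # Coefficient <u_k, v_i> / <u_k, u_k> for an earlier vector u_k.
            c = inner_product(w, v) / inner_product(w, w)
            u = [ui - c * wi for ui, wi in zip(u, w)]
        orthogonal.append(u)
    return orthogonal


# Hypothetical linearly independent set in C^3, chosen for illustration.
S = [[1 + 0j, 1j, 0j],
     [1 + 0j, 0j, 1 + 0j],
     [0j, 1 + 0j, 1 + 0j]]
T = gram_schmidt(S)

for a in range(len(T)):
    for b in range(a + 1, len(T)):
        # Each value should be zero up to floating-point round-off.
        print(a + 1, b + 1, abs(inner_product(T[a], T[b])))
```

Note that each coefficient is computed against the original vector `v`, exactly as the formula in the theorem is written.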

Proof. We will prove the result by using induction on $p$ (Proof Technique I). To
begin, we prove that $T$ has the desired properties when $p = 1$. In this case $\mathbf{u}_1 = \mathbf{v}_1$
and $T = \{\mathbf{u}_1\} = \{\mathbf{v}_1\} = S$. Because $S$ and $T$ are equal, $\langle S \rangle = \langle T \rangle$. Equally trivial,
$T$ is an orthogonal set. If $\mathbf{u}_1 = \mathbf{0}$, then $S$ would be a linearly dependent set, a
contradiction.
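   Suppose that the theorem is true for any set of $p - 1$ linearly independent
vectors. Let $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_p\}$ be a linearly independent set of $p$ vectors.
Then $S' = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_{p-1}\}$ is also linearly independent. So we can apply
the theorem to $S'$ and construct the vectors $T' = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_{p-1}\}$. $T'$ is
therefore an orthogonal set of nonzero vectors and $\langle S' \rangle = \langle T' \rangle$. Define
$$\mathbf{u}_p = \mathbf{v}_p - \frac{\langle \mathbf{u}_1, \mathbf{v}_p \rangle}{\langle \mathbf{u}_1, \mathbf{u}_1 \rangle} \mathbf{u}_1 - \frac{\langle \mathbf{u}_2, \mathbf{v}_p \rangle}{\langle \mathbf{u}_2, \mathbf{u}_2 \rangle} \mathbf{u}_2 - \frac{\langle \mathbf{u}_3, \mathbf{v}_p \rangle}{\langle \mathbf{u}_3, \mathbf{u}_3 \rangle} \mathbf{u}_3 - \cdots - \frac{\langle \mathbf{u}_{p-1}, \mathbf{v}_p \rangle}{\langle \mathbf{u}_{p-1}, \mathbf{u}_{p-1} \rangle} \mathbf{u}_{p-1}$$
and let $T = T' \cup \{\mathbf{u}_p\}$. We need to now show that $T$ has several properties by
building on what we know about $T'$. But first notice that the above equation has
no problems with the denominators ($\langle \mathbf{u}_i, \mathbf{u}_i \rangle$) being zero, since the $\mathbf{u}_i$ are from $T'$,
which is composed of nonzero vectors.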
   We show that ⟨T⟩ = ⟨S⟩, by first establishing that ⟨T⟩ ⊆ ⟨S⟩. Suppose x ∈ ⟨T⟩,
so
                            x = a1 u1 + a2 u2 + a3 u3 + · · · + ap up
The term ap up is a linear combination of vectors from T′ and the vector vp, while the
remaining terms are a linear combination of vectors from T′. Since ⟨T′⟩ = ⟨S′⟩, any
term that is a multiple of a vector from T′ can be rewritten as a linear combination
of vectors from S′. The remaining term ap vp is a multiple of a vector in S. So we
see that x can be rewritten as a linear combination of vectors from S, i.e. x ∈ ⟨S⟩.
    To show that ⟨S⟩ ⊆ ⟨T⟩, begin with y ∈ ⟨S⟩, so
                            y = a1 v1 + a2 v2 + a3 v3 + · · · + ap vp
    Rearrange our defining equation for up by solving for vp. Then the term ap vp
is a multiple of a linear combination of elements of T. The remaining terms are a
linear combination of v1, v2, v3, ..., v_{p−1}, hence an element of ⟨S′⟩ = ⟨T′⟩. Thus
these remaining terms can be written as a linear combination of the vectors in T′.
So y is a linear combination of vectors from T, i.e. y ∈ ⟨T⟩.
    The elements of T′ are nonzero, but what about up? Suppose to the contrary
that up = 0,
\begin{align*}
0 = u_p &= v_p
 - \frac{\langle u_1, v_p\rangle}{\langle u_1, u_1\rangle}u_1
 - \frac{\langle u_2, v_p\rangle}{\langle u_2, u_2\rangle}u_2
 - \frac{\langle u_3, v_p\rangle}{\langle u_3, u_3\rangle}u_3
 - \cdots
 - \frac{\langle u_{p-1}, v_p\rangle}{\langle u_{p-1}, u_{p-1}\rangle}u_{p-1} \\
v_p &= \frac{\langle u_1, v_p\rangle}{\langle u_1, u_1\rangle}u_1
 + \frac{\langle u_2, v_p\rangle}{\langle u_2, u_2\rangle}u_2
 + \frac{\langle u_3, v_p\rangle}{\langle u_3, u_3\rangle}u_3
 + \cdots
 + \frac{\langle u_{p-1}, v_p\rangle}{\langle u_{p-1}, u_{p-1}\rangle}u_{p-1}
\end{align*}
    Since ⟨S′⟩ = ⟨T′⟩ we can write the vectors u1, u2, u3, ..., u_{p−1} on the right side
of this equation in terms of the vectors v1, v2, v3, ..., v_{p−1} and we then have the
vector vp expressed as a linear combination of the other p − 1 vectors in S, implying
that S is a linearly dependent set (Theorem DLDS), contrary to our lone hypothesis
about S.
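    Finally, it is a simple matter to establish that T is an orthogonal set, though it
will not appear so simple looking. Think about your objects as you work through
the following: what is a vector and what is a scalar. Since T′ is an orthogonal set
by induction, most pairs of elements in T are already known to be orthogonal. We
just need to test “new” inner products, between up and ui, for 1 ≤ i ≤ p − 1. Here
we go, using summation notation,
\begin{align*}
\langle u_i, u_p\rangle
 &= \left\langle u_i,\ v_p - \sum_{k=1}^{p-1}\frac{\langle u_k, v_p\rangle}{\langle u_k, u_k\rangle}u_k\right\rangle \\
 &= \langle u_i, v_p\rangle - \left\langle u_i,\ \sum_{k=1}^{p-1}\frac{\langle u_k, v_p\rangle}{\langle u_k, u_k\rangle}u_k\right\rangle && \text{Theorem IPVA} \\
 &= \langle u_i, v_p\rangle - \sum_{k=1}^{p-1}\left\langle u_i,\ \frac{\langle u_k, v_p\rangle}{\langle u_k, u_k\rangle}u_k\right\rangle && \text{Theorem IPVA} \\
 &= \langle u_i, v_p\rangle - \sum_{k=1}^{p-1}\frac{\langle u_k, v_p\rangle}{\langle u_k, u_k\rangle}\langle u_i, u_k\rangle && \text{Theorem IPSM} \\
 &= \langle u_i, v_p\rangle - \frac{\langle u_i, v_p\rangle}{\langle u_i, u_i\rangle}\langle u_i, u_i\rangle - \sum_{k\neq i}\frac{\langle u_k, v_p\rangle}{\langle u_k, u_k\rangle}(0) && \text{Induction Hypothesis} \\
 &= \langle u_i, v_p\rangle - \langle u_i, v_p\rangle - \sum_{k\neq i} 0 \\
 &= 0
\end{align*}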
                                                                                         


Example GSTV Gram-Schmidt of three vectors
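We will illustrate the Gram-Schmidt process with three vectors. Begin with the
linearly independent (check this!) set
\[
S = \{v_1, v_2, v_3\} = \left\{
\begin{bmatrix} 1 \\ 1+i \\ 1 \end{bmatrix},\
\begin{bmatrix} -i \\ 1 \\ 1+i \end{bmatrix},\
\begin{bmatrix} 0 \\ i \\ i \end{bmatrix}
\right\}
\]
     Then
\begin{align*}
u_1 &= v_1 = \begin{bmatrix} 1 \\ 1+i \\ 1 \end{bmatrix} \\
u_2 &= v_2 - \frac{\langle u_1, v_2\rangle}{\langle u_1, u_1\rangle}u_1
     = \frac{1}{4}\begin{bmatrix} -2-3i \\ 1-i \\ 2+5i \end{bmatrix} \\
u_3 &= v_3 - \frac{\langle u_1, v_3\rangle}{\langle u_1, u_1\rangle}u_1
           - \frac{\langle u_2, v_3\rangle}{\langle u_2, u_2\rangle}u_2
     = \frac{1}{11}\begin{bmatrix} -3-i \\ 1+3i \\ -1-i \end{bmatrix}
\end{align*}
and
\[
T = \{u_1, u_2, u_3\} = \left\{
\begin{bmatrix} 1 \\ 1+i \\ 1 \end{bmatrix},\
\frac{1}{4}\begin{bmatrix} -2-3i \\ 1-i \\ 2+5i \end{bmatrix},\
\frac{1}{11}\begin{bmatrix} -3-i \\ 1+3i \\ -1-i \end{bmatrix}
\right\}
\]
is an orthogonal set (which you can check) of nonzero vectors and ⟨T⟩ = ⟨S⟩ (all
by Theorem GSP). Of course, as a by-product of orthogonality, the set T is also
linearly independent (Theorem OSLI). △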
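The formula of Theorem GSP translates almost line for line into code. The following is a minimal sketch in plain Python, assuming the inner product conjugates the entries of its first argument (the convention that reproduces the vectors of Example GSTV); the function names `inner` and `gram_schmidt` are our own, chosen only for illustration.

```python
def inner(u, v):
    """Inner product <u, v>: sum of conj(u_k) * v_k (conjugate on the first vector)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Build u_1, ..., u_p from a linearly independent list v_1, ..., v_p."""
    ortho = []
    for v in vectors:
        u = list(v)
        for w in ortho:
            # Subtract the term (<u_k, v_i> / <u_k, u_k>) u_k from the formula.
            coeff = inner(w, v) / inner(w, w)
            u = [x - coeff * y for x, y in zip(u, w)]
        ortho.append(u)
    return ortho


# The set S of Example GSTV.
S = [
    [1, 1 + 1j, 1],
    [-1j, 1, 1 + 1j],
    [0, 1j, 1j],
]
T = gram_schmidt(S)
for u in T:
    print(u)
```

Up to floating-point rounding, the second and third vectors printed should agree with the vectors u2 and u3 computed by hand in the example. No check for linear independence is made here; a dependent input would eventually produce a zero vector and a division by zero.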
     One final definition related to orthogonal vectors.
Definition ONS OrthoNormal Set
Suppose S = {u1, u2, u3, ..., un} is an orthogonal set of vectors such that ‖ui‖ = 1
for all 1 ≤ i ≤ n. Then S is an orthonormal set of vectors. □
   Once you have an orthogonal set, it is easy to convert it to an orthonormal set
— multiply each vector by the reciprocal of its norm, and the resulting vector will
have norm 1. This scaling of each vector will not affect the orthogonality properties
(apply Theorem IPSM).
Example ONTV Orthonormal set, three vectors
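The set
\[
T = \{u_1, u_2, u_3\} = \left\{
\begin{bmatrix} 1 \\ 1+i \\ 1 \end{bmatrix},\
\frac{1}{4}\begin{bmatrix} -2-3i \\ 1-i \\ 2+5i \end{bmatrix},\
\frac{1}{11}\begin{bmatrix} -3-i \\ 1+3i \\ -1-i \end{bmatrix}
\right\}
\]
from Example GSTV is an orthogonal set.
   We compute the norm of each vector,
\[
\|u_1\| = 2 \qquad \|u_2\| = \frac{1}{2}\sqrt{11} \qquad \|u_3\| = \frac{\sqrt{2}}{\sqrt{11}}
\]
     Converting each vector to a norm of 1 yields an orthonormal set,
\begin{align*}
w_1 &= \frac{1}{2}\begin{bmatrix} 1 \\ 1+i \\ 1 \end{bmatrix} \\
w_2 &= \frac{1}{\frac{1}{2}\sqrt{11}}\,\frac{1}{4}\begin{bmatrix} -2-3i \\ 1-i \\ 2+5i \end{bmatrix}
     = \frac{1}{2\sqrt{11}}\begin{bmatrix} -2-3i \\ 1-i \\ 2+5i \end{bmatrix} \\
w_3 &= \frac{1}{\frac{\sqrt{2}}{\sqrt{11}}}\,\frac{1}{11}\begin{bmatrix} -3-i \\ 1+3i \\ -1-i \end{bmatrix}
     = \frac{1}{\sqrt{22}}\begin{bmatrix} -3-i \\ 1+3i \\ -1-i \end{bmatrix}
\end{align*}
△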
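Scaling an orthogonal set to an orthonormal one is a short computation. Here is a sketch in plain Python, under the same assumption about the inner product (conjugation on the first argument); the helper names `norm` and `normalize` are hypothetical, introduced only for this illustration.

```python
import math


def inner(u, v):
    """Inner product <u, v>: sum of conj(u_k) * v_k."""
    return sum(a.conjugate() * b for a, b in zip(u, v))


def norm(u):
    """Norm of u, the square root of <u, u> (which is real and nonnegative)."""
    return math.sqrt(inner(u, u).real)


def normalize(vectors):
    """Multiply each vector by the reciprocal of its norm."""
    return [[x / norm(u) for x in u] for u in vectors]


# The orthogonal set T of Example GSTV.
T = [
    [1, 1 + 1j, 1],
    [(-2 - 3j) / 4, (1 - 1j) / 4, (2 + 5j) / 4],
    [(-3 - 1j) / 11, (1 + 3j) / 11, (-1 - 1j) / 11],
]
W = normalize(T)
print([norm(u) for u in T])   # compare with 2, sqrt(11)/2, sqrt(2)/sqrt(11)
print([norm(w) for w in W])   # each should be 1, up to rounding
```

Because each vector is only multiplied by a nonzero real scalar, the inner products between distinct vectors of `W` remain zero (up to rounding), in line with the appeal to Theorem IPSM.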
Example ONFV Orthonormal set, four vectors
As an exercise, convert the linearly independent set
\[
S = \left\{
\begin{bmatrix} 1+i \\ 1 \\ 1-i \\ i \end{bmatrix},\
\begin{bmatrix} i \\ 1+i \\ -1 \\ -i \end{bmatrix},\
\begin{bmatrix} i \\ -i \\ -1+i \\ 1 \end{bmatrix},\
\begin{bmatrix} -1-i \\ i \\ 1 \\ -1 \end{bmatrix}
\right\}
\]
to an orthogonal set via the Gram-Schmidt Process (Theorem GSP) and then scale
the vectors to norm 1 to create an orthonormal set. You should get the same set you
would if you scaled the orthogonal set of Example AOS to become an orthonormal
set. △
    We will see orthonormal sets again in Subsection MINM.UM. They are intimately
related to unitary matrices (Definition UM) through Theorem CUMOS. Some of
the utility of orthonormal sets is captured by Theorem COB in Subsection B.OBC.
Orthonormal sets appear once again in Section OD where they are key in orthonormal
diagonalization.

Reading Questions

1. Is the set
\[
\left\{
\begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix},\
\begin{bmatrix} 5 \\ 3 \\ -1 \end{bmatrix},\
\begin{bmatrix} 8 \\ 4 \\ -2 \end{bmatrix}
\right\}
\]
   an orthogonal set? Why?
2. What is the distinction between an orthogonal set and an orthonormal set?
3. What is nice about the output of the Gram-Schmidt process?

Exercises
C20 Complete Example AOS by verifying that the four remaining inner products are
zero.
C21 Verify that the set T created in Example GSTV by the Gram-Schmidt Procedure is
an orthogonal set.
M60 Suppose that {u, v, w} ⊆ Cn is an orthonormal set. Prove that u + v is not
orthogonal to v + w.
T10    Prove part 2 of the conclusion of Theorem IPVA.
T11    Prove part 2 of the conclusion of Theorem IPSM.
T20† Suppose that u, v, w ∈ Cn , α, β ∈ C and u is orthogonal to both v and w. Prove
that u is orthogonal to αv + βw.
T30 Suppose that the set S in the hypothesis of Theorem GSP is not just linearly
independent, but is also orthogonal. Prove that the set T created by the Gram-Schmidt
procedure is equal to S. (Note that we are getting a stronger conclusion than ⟨T⟩ = ⟨S⟩;
the conclusion is that T = S.) In other words, it is pointless to apply the Gram-Schmidt
procedure to a set that is already orthogonal.
T31 Suppose that the set S is linearly independent. Apply the Gram-Schmidt procedure
(Theorem GSP) twice, creating first the linearly independent set T1 from S, and then
creating T2 from T1 . As a consequence of Exercise O.T30, prove that T1 = T2 . In other
words, it is pointless to apply the Gram-Schmidt procedure twice.
Chapter M
Matrices

We have made frequent use of matrices for solving systems of equations, and we
have begun to investigate a few of their properties, such as the null space and
nonsingularity. In this chapter, we will take a more systematic approach to the study
of matrices.


Section MO
Matrix Operations
In this section we will back up and start simple. We begin with a definition of a
totally general set of matrices, and see where that takes us.

Subsection MEASM
Matrix Equality, Addition, Scalar Multiplication
Definition VSM Vector Space of m × n Matrices
The vector space Mmn is the set of all m × n matrices with entries from the set of
complex numbers.                                                                
   Just as we made, and used, a careful definition of equality for column vectors, so
too, we have precise definitions for matrices.
Definition ME Matrix Equality
The m × n matrices A and B are equal, written A = B provided [A]ij = [B]ij for
all 1 ≤ i ≤ m, 1 ≤ j ≤ n.                                                   
    So equality of matrices translates to the equality of complex numbers, on an
entry-by-entry basis. Notice that we now have yet another definition that uses the
symbol “=” for shorthand. Whenever a theorem has a conclusion saying two matrices
are equal (think about your objects), we will consider appealing to this definition as
a way of formulating the top-level structure of the proof.
    We will now define two operations on the set Mmn . Again, we will overload a
symbol (‘+’) and a convention (juxtaposition for scalar multiplication).
Definition MA Matrix Addition
Given the m × n matrices A and B, define the sum of A and B as an m × n matrix,
written A + B, according to
            [A + B]ij = [A]ij + [B]ij              1 ≤ i ≤ m, 1 ≤ j ≤ n
                                                                                    
   So matrix addition takes two matrices of the same size and combines them (in a
natural way!) to create a new matrix of the same size. Perhaps this is the “obvious”
thing to do, but it does not relieve us from the obligation to state it carefully.
Example MA Addition of two matrices in M23
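If
\[
A = \begin{bmatrix} 2 & -3 & 4 \\ 1 & 0 & -7 \end{bmatrix}
\qquad
B = \begin{bmatrix} 6 & 2 & -4 \\ 3 & 5 & 2 \end{bmatrix}
\]
then
\begin{align*}
A + B &= \begin{bmatrix} 2 & -3 & 4 \\ 1 & 0 & -7 \end{bmatrix}
       + \begin{bmatrix} 6 & 2 & -4 \\ 3 & 5 & 2 \end{bmatrix} \\
      &= \begin{bmatrix} 2+6 & -3+2 & 4+(-4) \\ 1+3 & 0+5 & -7+2 \end{bmatrix}
       = \begin{bmatrix} 8 & -1 & 0 \\ 4 & 5 & -5 \end{bmatrix}
\end{align*}
△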
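Definition MA can be mirrored directly in code. The sketch below represents a matrix as a Python list of rows, which is an assumption of this illustration rather than anything fixed by the text, and the name `mat_add` is our own.

```python
def mat_add(A, B):
    """Entry-by-entry sum of two matrices of the same size (lists of rows)."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same size")
    # [A + B]_ij = [A]_ij + [B]_ij
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


# The matrices of Example MA.
A = [[2, -3, 4], [1, 0, -7]]
B = [[6, 2, -4], [3, 5, 2]]
print(mat_add(A, B))  # [[8, -1, 0], [4, 5, -5]]
```

The size check reflects the fact that the sum is only defined for two matrices of the same size.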
   Our second operation takes two objects of different types, specifically a number
and a matrix, and combines them to create another matrix. As with vectors, in this
context we call a number a scalar in order to emphasize that it is not a matrix.
Definition MSM Matrix Scalar Multiplication
Given the m × n matrix A and the scalar α ∈ C, the scalar multiple of A is an
m × n matrix, written αA and defined according to
                [αA]ij = α [A]ij                    1 ≤ i ≤ m, 1 ≤ j ≤ n
                                                                                      
   Notice again that we have yet another kind of multiplication, and it is again written
putting two symbols side-by-side. Computationally, scalar matrix multiplication is
very easy.
Example MSM Scalar multiplication in M32
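If
\[
A = \begin{bmatrix} 2 & 8 \\ -3 & 5 \\ 0 & 1 \end{bmatrix}
\]
and α = 7, then
\[
\alpha A = 7\begin{bmatrix} 2 & 8 \\ -3 & 5 \\ 0 & 1 \end{bmatrix}
 = \begin{bmatrix} 7(2) & 7(8) \\ 7(-3) & 7(5) \\ 7(0) & 7(1) \end{bmatrix}
 = \begin{bmatrix} 14 & 56 \\ -21 & 35 \\ 0 & 7 \end{bmatrix}
\]
△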
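Definition MSM is equally brief when written as code. Again this is only a sketch, with a matrix stored as a list of rows and a function name, `scalar_mult`, of our own choosing.

```python
def scalar_mult(alpha, A):
    """Scalar multiple alpha*A, computed entry by entry: [alpha A]_ij = alpha [A]_ij."""
    return [[alpha * a for a in row] for row in A]


# The matrix and scalar of Example MSM.
A = [[2, 8], [-3, 5], [0, 1]]
print(scalar_mult(7, A))  # [[14, 56], [-21, 35], [0, 7]]
```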

Subsection VSP
Vector Space Properties
With definitions of matrix addition and scalar multiplication we can now state, and
prove, several properties of each operation, and some properties that involve their
interplay. We now collect ten of them here for later reference.
Theorem VSPM Vector Space Properties of Matrices
Suppose that Mmn is the set of all m × n matrices (Definition VSM) with addition
and scalar multiplication as defined in Definition MA and Definition MSM. Then

      • ACM Additive Closure, Matrices
        If A, B ∈ Mmn , then A + B ∈ Mmn .

      • SCM Scalar Closure, Matrices
        If α ∈ C and A ∈ Mmn , then αA ∈ Mmn .

      • CM Commutativity, Matrices
        If A, B ∈ Mmn , then A + B = B + A.

      • AAM Additive Associativity, Matrices
        If A, B, C ∈ Mmn , then A + (B + C) = (A + B) + C.

      • ZM Zero Matrix, Matrices
       There is a matrix, O, called the zero matrix, such that A + O = A for all
       A ∈ Mmn .

   • AIM Additive Inverses, Matrices
       If A ∈ Mmn , then there exists a matrix −A ∈ Mmn so that A + (−A) = O.

   • SMAM Scalar Multiplication Associativity, Matrices
       If α, β ∈ C and A ∈ Mmn , then α(βA) = (αβ)A.

   • DMAM Distributivity across Matrix Addition, Matrices
       If α ∈ C and A, B ∈ Mmn , then α(A + B) = αA + αB.

   • DSAM Distributivity across Scalar Addition, Matrices
       If α, β ∈ C and A ∈ Mmn , then (α + β)A = αA + βA.

   • OM One, Matrices
       If A ∈ Mmn , then 1A = A.

Proof. While some of these properties seem very obvious, they all require proof.
However, the proofs are not very interesting, and border on tedious. We will prove
one version of distributivity very carefully, and you can test your proof-building skills
on some of the others. We will give our new notation for matrix entries a workout
here. Compare the style of the proofs here with those given for vectors in Theorem
VSPCV — while the objects here are more complicated, our notation makes the
proofs cleaner.
    To prove Property DSAM, (α + β)A = αA + βA, we need to establish the equality
of two matrices (see Proof Technique GS). Definition ME says we need to establish
the equality of their entries, one-by-one. How do we do this, when we do not even
know how many entries the two matrices might have? This is where the notation for
matrix entries, given in Definition M, comes into play. Ready? Here we go.
    For any i and j, 1 ≤ i ≤ m, 1 ≤ j ≤ n,
            [(α + β)A]ij = (α + β) [A]ij                Definition MSM
                         = α [A]ij + β [A]ij            Distributivity in C
                         = [αA]ij + [βA]ij              Definition MSM
                         = [αA + βA]ij                  Definition MA
    There are several things to notice here. (1) Each equals sign is an equality of
scalars (numbers). (2) The two ends of the equation, being true for any i and j,
allow us to conclude the equality of the matrices by Definition ME. (3) There are
several plus signs, and several instances of juxtaposition. Identify each one, and state
exactly what operation is being represented by each.                                  
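A numerical spot check is not a proof, but it can make Property DSAM concrete. The sketch below evaluates both sides of (α + β)A = αA + βA for one arbitrarily chosen 3 × 2 complex matrix and one pair of scalars; the sample values and the helper names are assumptions made purely for illustration.

```python
def mat_add(A, B):
    """Entry-by-entry sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scalar_mult(alpha, A):
    """Scalar multiple, computed entry by entry."""
    return [[alpha * a for a in row] for row in A]


alpha, beta = 2 - 1j, -3 + 4j
A = [[1 + 1j, 2], [0, -1j], [3, 4 - 2j]]

lhs = scalar_mult(alpha + beta, A)                          # (alpha + beta) A
rhs = mat_add(scalar_mult(alpha, A), scalar_mult(beta, A))  # alpha A + beta A
print(lhs == rhs)  # True
```

The comparison `lhs == rhs` is itself an entry-by-entry test, which is exactly what Definition ME asks for. The entries here are small Gaussian integers, so the floating-point arithmetic is exact; for general entries a tolerance would be needed.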

   For now, note the similarities between Theorem VSPM about matrices and
Theorem VSPCV about vectors.
   The zero matrix described in this theorem, O, is what you would expect — a
matrix full of zeros.
Definition ZM Zero Matrix
The m × n zero matrix is written as O = Om×n and defined by [O]ij = 0, for all
1 ≤ i ≤ m, 1 ≤ j ≤ n.                                                       

Subsection TSM
Transposes and Symmetric Matrices
We describe one more common operation we can perform on matrices. Informally,
to transpose a matrix is to build a new matrix by swapping its rows and columns.
Definition TM Transpose of a Matrix
Given an m × n matrix A, its transpose is the n × m matrix A^t given by
\[
[A^t]_{ij} = [A]_{ji}, \qquad 1 \le i \le n,\ 1 \le j \le m.
\]
                                                                                      
Example TM Transpose of a 3 × 4 matrix
Suppose
\[
D = \begin{bmatrix} 3 & 7 & 2 & -3 \\ -1 & 4 & 2 & 8 \\ 0 & 3 & -2 & 5 \end{bmatrix}.
\]
   We could formulate the transpose, entry-by-entry, using the definition. But it is
easier to just systematically rewrite rows as columns (or vice-versa). The form of
the definition given will be more useful in proofs. So we have
\[
D^t = \begin{bmatrix} 3 & -1 & 0 \\ 7 & 4 & 3 \\ 2 & 2 & -2 \\ -3 & 8 & 5 \end{bmatrix}
\]
△
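The entry-by-entry form of Definition TM can be written out as a short function. This is a sketch only, with a matrix stored as a list of rows and zero-based indices in place of the one-based indices of the definition; the name `transpose` is our own.

```python
def transpose(A):
    """Transpose of an m x n matrix: entry (i, j) of the result is entry (j, i) of A."""
    m, n = len(A), len(A[0])
    return [[A[j][i] for j in range(m)] for i in range(n)]


# The 3 x 4 matrix D of Example TM.
D = [
    [3, 7, 2, -3],
    [-1, 4, 2, 8],
    [0, 3, -2, 5],
]
print(transpose(D))  # [[3, -1, 0], [7, 4, 3], [2, 2, -2], [-3, 8, 5]]
```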
    It will sometimes happen that a matrix is equal to its transpose. In this case, we
will call a matrix symmetric. These matrices occur naturally in certain situations,
and also have some nice properties, so it is worth stating the definition carefully.
Informally a matrix is symmetric if we can “flip” it about the main diagonal (upper-
left corner, running down to the lower-right corner) and have it look unchanged.
Definition SYM Symmetric Matrix
The matrix A is symmetric if A = A^t. □
Example SYM A symmetric 5 × 5 matrix
The matrix
\[
E = \begin{bmatrix}
 2 & 3 & -9 & 5 & 7 \\
 3 & 1 & 6 & -2 & -3 \\
 -9 & 6 & 0 & -1 & 9 \\
 5 & -2 & -1 & 4 & -8 \\
 7 & -3 & 9 & -8 & -3
\end{bmatrix}
\]
is symmetric. △
   You might have noticed that Definition SYM did not specify the size of the matrix
A, as has been our custom. That is because it was not necessary. An alternative
would have been to state the definition just for square matrices, but this is the
substance of the next proof.
   Before reading the next proof, we want to offer you some advice about how to
become more proficient at constructing proofs. Perhaps you can apply this advice to
the next theorem. Have a peek at Proof Technique P now.
Theorem SMS Symmetric Matrices are Square
Suppose that A is a symmetric matrix. Then A is square.

Proof. We start by specifying A’s size, without assuming it is square, since we are
trying to prove that, so we cannot also assume it. Suppose A is an m × n matrix.
Because A is symmetric, we know by Definition SYM that A = A^t. So, in particular,
Definition ME requires that A and A^t must have the same size. The size of A^t is
n × m. Because A has m rows and A^t has n rows, we conclude that m = n, and
hence A must be square by Definition SQM.                                        
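Definition SYM and Theorem SMS can both be seen in a few lines of code. In the sketch below (matrices as lists of rows, function names our own), a matrix is tested against its transpose; since list equality also compares sizes, a non-square matrix can never pass the test.

```python
def transpose(A):
    """Rewrite the rows of A as columns."""
    return [list(col) for col in zip(*A)]


def is_symmetric(A):
    """A is symmetric exactly when it equals its own transpose."""
    return A == transpose(A)


# The 5 x 5 matrix E of Example SYM.
E = [
    [2, 3, -9, 5, 7],
    [3, 1, 6, -2, -3],
    [-9, 6, 0, -1, 9],
    [5, -2, -1, 4, -8],
    [7, -3, 9, -8, -3],
]
print(is_symmetric(E))                       # True
print(is_symmetric([[1, 2, 3], [2, 4, 5]]))  # False: a 2 x 3 matrix versus its 3 x 2 transpose
```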

    We finish this section with three easy theorems, but they illustrate the interplay
of our three new operations, our new notation, and the techniques used to prove
matrix equalities.
Theorem TMA Transpose and Matrix Addition
Suppose that A and B are m × n matrices. Then (A + B)^t = A^t + B^t.

Proof. The statement to be proved is an equality of matrices, so we work entry-by-entry
and use Definition ME. Think carefully about the objects involved here, and
the many uses of the plus sign. Realize too that while A and B are m × n matrices,
the conclusion is a statement about the equality of two n × m matrices. So we begin
with: for 1 ≤ i ≤ n, 1 ≤ j ≤ m,
\begin{align*}
[(A + B)^t]_{ij} &= [A + B]_{ji} && \text{Definition TM} \\
 &= [A]_{ji} + [B]_{ji} && \text{Definition MA} \\
 &= [A^t]_{ij} + [B^t]_{ij} && \text{Definition TM} \\
 &= [A^t + B^t]_{ij} && \text{Definition MA}
\end{align*}
    Since the matrices (A + B)^t and A^t + B^t agree at each entry, Definition ME tells
us the two matrices are equal.                                                     

Theorem TMSM Transpose and Matrix Scalar Multiplication
Suppose that α ∈ C and A is an m × n matrix. Then (αA)^t = αA^t.

Proof. The statement to be proved is an equality of matrices, so we work entry-by-entry
and use Definition ME. Notice that the desired equality is of n × m matrices, and
think carefully about the objects involved here, plus the many uses of juxtaposition.
For 1 ≤ i ≤ m, 1 ≤ j ≤ n,
\begin{align*}
[(\alpha A)^t]_{ji} &= [\alpha A]_{ij} && \text{Definition TM} \\
 &= \alpha [A]_{ij} && \text{Definition MSM} \\
 &= \alpha [A^t]_{ji} && \text{Definition TM} \\
 &= [\alpha A^t]_{ji} && \text{Definition MSM}
\end{align*}
   Since the matrices (αA)^t and αA^t agree at each entry, Definition ME tells us the
two matrices are equal.                                                          

Theorem TT Transpose of a Transpose
Suppose that A is an m × n matrix. Then (A^t)^t = A.

Proof. We again want to prove an equality of matrices, so we work entry-by-entry
and use Definition ME. For 1 ≤ i ≤ m, 1 ≤ j ≤ n,
\begin{align*}
[(A^t)^t]_{ij} &= [A^t]_{ji} && \text{Definition TM} \\
 &= [A]_{ij} && \text{Definition TM}
\end{align*}
   Since the matrices (A^t)^t and A agree at each entry, Definition ME tells us the
two matrices are equal.                                                        
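The three theorems just proved (Theorem TMA, Theorem TMSM, Theorem TT) can be spot-checked on sample matrices. The values of `A`, `B` and `alpha` below are arbitrary choices made for this illustration, and agreement on one example is of course no substitute for the proofs.

```python
def mat_add(A, B):
    """Entry-by-entry sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scalar_mult(alpha, A):
    """Scalar multiple, computed entry by entry."""
    return [[alpha * a for a in row] for row in A]


def transpose(A):
    """Rewrite the rows of A as columns."""
    return [list(col) for col in zip(*A)]


A = [[1, 2 - 1j, 0], [3j, -4, 5]]
B = [[2, 1j, -1], [0, 6, 1 + 1j]]
alpha = 2 + 3j

tma = transpose(mat_add(A, B)) == mat_add(transpose(A), transpose(B))
tmsm = transpose(scalar_mult(alpha, A)) == scalar_mult(alpha, transpose(A))
tt = transpose(transpose(A)) == A
print(tma, tmsm, tt)  # True True True
```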

Subsection MCC
Matrices and Complex Conjugation
As we did with vectors (Definition CCCV), we can define what it means to take the
conjugate of a matrix.
Definition CCM Complex Conjugate of a Matrix
Suppose A is an m × n matrix. Then the conjugate of A, written \overline{A}, is an m × n
matrix defined by
\[
[\overline{A}]_{ij} = \overline{[A]_{ij}}
\]
                                                                                   
Example CCM Complex conjugate of a matrix
If
\[
A = \begin{bmatrix} 2-i & 3 & 5+4i \\ -3+6i & 2-3i & 0 \end{bmatrix}
\]
then
\[
\overline{A} = \begin{bmatrix} 2+i & 3 & 5-4i \\ -3-6i & 2+3i & 0 \end{bmatrix}
\]
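Conjugating a matrix entry by entry, as in Definition CCM, is a one-line computation with Python's built-in complex numbers. The list-of-rows representation and the name `conjugate` are assumptions of this sketch.

```python
def conjugate(A):
    """Complex conjugate of a matrix, taken entry by entry."""
    return [[complex(a).conjugate() for a in row] for row in A]


# The matrix of Example CCM.
A = [[2 - 1j, 3, 5 + 4j], [-3 + 6j, 2 - 3j, 0]]
expected = [[2 + 1j, 3, 5 - 4j], [-3 - 6j, 2 + 3j, 0]]
print(conjugate(A) == expected)  # True
```

Real entries such as 3 and 0 are unchanged by conjugation, which is why they appear identically in both matrices of the example.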

  The interplay between the conjugate of a matrix and the two operations on
matrices is what you might expect.
Theorem CRMA Conjugation Respects Matrix Addition
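Suppose that A and B are m × n matrices. Then \overline{A + B} = \overline{A} + \overline{B}.

Proof. For 1 ≤ i ≤ m, 1 ≤ j ≤ n,
\begin{align*}
[\overline{A + B}]_{ij} &= \overline{[A + B]_{ij}} && \text{Definition CCM} \\
 &= \overline{[A]_{ij} + [B]_{ij}} && \text{Definition MA} \\
 &= \overline{[A]_{ij}} + \overline{[B]_{ij}} && \text{Theorem CCRA} \\
 &= [\overline{A}]_{ij} + [\overline{B}]_{ij} && \text{Definition CCM} \\
 &= [\overline{A} + \overline{B}]_{ij} && \text{Definition MA}
\end{align*}
   Since the matrices \overline{A + B} and \overline{A} + \overline{B} are equal in each entry, Definition ME says
that \overline{A + B} = \overline{A} + \overline{B}. ■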

Theorem CRMSM Conjugation Respects Matrix Scalar Multiplication
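Suppose that α ∈ C and A is an m × n matrix. Then \overline{\alpha A} = \overline{\alpha}\,\overline{A}.

Proof. For 1 ≤ i ≤ m, 1 ≤ j ≤ n,
\begin{align*}
[\overline{\alpha A}]_{ij} &= \overline{[\alpha A]_{ij}} && \text{Definition CCM} \\
 &= \overline{\alpha [A]_{ij}} && \text{Definition MSM} \\
 &= \overline{\alpha}\,\overline{[A]_{ij}} && \text{Theorem CCRM} \\
 &= \overline{\alpha}\,[\overline{A}]_{ij} && \text{Definition CCM} \\
 &= [\overline{\alpha}\,\overline{A}]_{ij} && \text{Definition MSM}
\end{align*}
  Since the matrices \overline{\alpha A} and \overline{\alpha}\,\overline{A} are equal in each entry, Definition ME says that
\overline{\alpha A} = \overline{\alpha}\,\overline{A}. ■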

Theorem CCM Conjugate of the Conjugate of a Matrix
                                         
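Suppose that A is an m × n matrix. Then \overline{\overline{A}} = A.

Proof. For 1 ≤ i ≤ m, 1 ≤ j ≤ n,
\begin{align*}
[\overline{\overline{A}}]_{ij} &= \overline{[\overline{A}]_{ij}} && \text{Definition CCM} \\
 &= \overline{\overline{[A]_{ij}}} && \text{Definition CCM} \\
 &= [A]_{ij} && \text{Theorem CCT}
\end{align*}
   Since the matrices \overline{\overline{A}} and A are equal in each entry, Definition ME says that
\overline{\overline{A}} = A. ■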

    Finally, we will need the following result about matrix conjugation and transposes
later.
Theorem MCT Matrix Conjugation and Transposes
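Suppose that A is an m × n matrix. Then \overline{(A^t)} = (\overline{A})^t.

Proof. For 1 ≤ i ≤ m, 1 ≤ j ≤ n,
\begin{align*}
[\overline{(A^t)}]_{ji} &= \overline{[A^t]_{ji}} && \text{Definition CCM} \\
 &= \overline{[A]_{ij}} && \text{Definition TM} \\
 &= [\overline{A}]_{ij} && \text{Definition CCM} \\
 &= [(\overline{A})^t]_{ji} && \text{Definition TM}
\end{align*}
   Since the matrices \overline{(A^t)} and (\overline{A})^t are equal in each entry, Definition ME says
that \overline{(A^t)} = (\overline{A})^t. ■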
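The four conjugation results (Theorem CRMA, Theorem CRMSM, Theorem CCM, Theorem MCT) can be spot-checked in the same spirit. The sample matrices and scalar below are arbitrary, the helper names are our own, and the entries are small Gaussian integers so that exact comparison with `==` is safe.

```python
def mat_add(A, B):
    """Entry-by-entry sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scalar_mult(alpha, A):
    """Scalar multiple, computed entry by entry."""
    return [[alpha * a for a in row] for row in A]


def transpose(A):
    """Rewrite the rows of A as columns."""
    return [list(col) for col in zip(*A)]


def conjugate(A):
    """Complex conjugate of a matrix, taken entry by entry."""
    return [[complex(a).conjugate() for a in row] for row in A]


A = [[2 - 1j, 3, 5 + 4j], [-3 + 6j, 2 - 3j, 0]]
B = [[1j, 1, -2], [4, 1 + 1j, 7 - 2j]]
alpha = 2 + 3j

crma = conjugate(mat_add(A, B)) == mat_add(conjugate(A), conjugate(B))
crmsm = conjugate(scalar_mult(alpha, A)) == scalar_mult(alpha.conjugate(), conjugate(A))
ccm = conjugate(conjugate(A)) == A
mct = conjugate(transpose(A)) == transpose(conjugate(A))
print(crma, crmsm, ccm, mct)  # True True True True
```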

Subsection AM
Adjoint of a Matrix
The combination of transposing and conjugating a matrix will be important in
subsequent sections, such as Subsection MINM.UM and Section OD. We make a
key definition here and prove some basic results in the same spirit as those above.
Definition A Adjoint
If A is a matrix, then its adjoint is A^* = (\overline{A})^t. □
   You will see the adjoint written elsewhere variously as A^H, A^* or A^†. Notice that
Theorem MCT says it does not really matter if we conjugate and then transpose, or
transpose and then conjugate.
Theorem AMA Adjoint and Matrix Addition
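Suppose A and B are matrices of the same size. Then (A + B)^* = A^* + B^*.

Proof.
\begin{align*}
(A + B)^* &= \left(\overline{A + B}\right)^t && \text{Definition A} \\
 &= \left(\overline{A} + \overline{B}\right)^t && \text{Theorem CRMA} \\
 &= \left(\overline{A}\right)^t + \left(\overline{B}\right)^t && \text{Theorem TMA} \\
 &= A^* + B^* && \text{Definition A}
\end{align*}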
                                                                                     

Theorem AMSM Adjoint and Matrix Scalar Multiplication
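Suppose α ∈ C is a scalar and A is a matrix. Then (αA)^* = \overline{\alpha}A^*.

Proof.
\begin{align*}
(\alpha A)^* &= \left(\overline{\alpha A}\right)^t && \text{Definition A} \\
 &= \left(\overline{\alpha}\,\overline{A}\right)^t && \text{Theorem CRMSM} \\
 &= \overline{\alpha}\left(\overline{A}\right)^t && \text{Theorem TMSM} \\
 &= \overline{\alpha}A^* && \text{Definition A}
\end{align*}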
                                                                                     

Theorem AA Adjoint of an Adjoint
Suppose that A is a matrix. Then $(A^{*})^{*} = A$.

Proof.
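\begin{align*}
(A^{*})^{*} &= \left(\overline{(A^{*})}\right)^{t} && \text{Definition A} \\
&= \overline{\left((A^{*})^{t}\right)} && \text{Theorem MCT} \\
&= \overline{\left(\left(\left(\overline{A}\right)^{t}\right)^{t}\right)} && \text{Definition A} \\
&= \overline{\overline{A}} && \text{Theorem TT} \\
&= A && \text{Theorem CCM}
\end{align*}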
                                                                                     

   Take note of how the theorems in this section, while simple, build on earlier
theorems and definitions and never descend to the level of entry-by-entry proofs
based on Definition ME. In other words, the equal signs that appear in the previous
proofs are equalities of matrices, not scalars (which is the opposite of a proof like
that of Theorem TMA).
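The three adjoint theorems can also be checked numerically on a small example. The sketch below is only an illustration, not a proof: it assumes matrices are stored as Python lists of rows with complex entries, and the helper names `adjoint`, `add` and `scale`, as well as the sample matrices, are hypothetical choices made here rather than anything from the text.

```python
# Matrices are plain lists of rows; entries are Python complex numbers.

def adjoint(A):
    """Conjugate every entry, then transpose (Definition A)."""
    rows, cols = len(A), len(A[0])
    return [[A[i][j].conjugate() for i in range(rows)] for j in range(cols)]

def add(A, B):
    """Entry-by-entry sum of two matrices of the same size."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]

def scale(alpha, A):
    """Scalar multiple of a matrix."""
    return [[alpha * a for a in row] for row in A]

# Hypothetical sample data, not taken from the text.
A = [[1 + 2j, 3 - 1j, 0j], [2j, 4 + 0j, 5 + 5j]]
B = [[2 - 1j, 1j, 1 + 1j], [3 + 0j, -2j, 6 - 4j]]
alpha = 2 - 3j

ama_holds = adjoint(add(A, B)) == add(adjoint(A), adjoint(B))                  # Theorem AMA
amsm_holds = adjoint(scale(alpha, A)) == scale(alpha.conjugate(), adjoint(A))  # Theorem AMSM
aa_holds = adjoint(adjoint(A)) == A                                            # Theorem AA
print(ama_holds, amsm_holds, aa_holds)
```

The entries are chosen with integer real and imaginary parts so that the floating-point comparisons are exact. With other data a tolerance-based comparison would be the safer choice.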

Reading Questions

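1. Perform the following matrix computation.
\[
(6)\begin{bmatrix} 2 & -2 & 8 & 1 \\ 4 & 5 & -1 & 3 \\ 7 & -3 & 0 & 2 \end{bmatrix}
+ (-2)\begin{bmatrix} 2 & 7 & 1 & 2 \\ 3 & -1 & 0 & 5 \\ 1 & 7 & 3 & 3 \end{bmatrix}
\]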

2. Theorem VSPM reminds you of what previous theorem? How strong is the similarity?
3. Compute the transpose of the matrix below.
\[
\begin{bmatrix} 6 & 8 & 4 \\ -2 & 1 & 0 \\ 9 & -5 & 6 \end{bmatrix}
\]

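Exercises

C10†    Let
\[
A = \begin{bmatrix} 1 & 4 & -3 \\ 6 & 3 & 0 \end{bmatrix}, \quad
B = \begin{bmatrix} 3 & 2 & 1 \\ -2 & -6 & 5 \end{bmatrix} \quad\text{and}\quad
C = \begin{bmatrix} 2 & 4 \\ 4 & 0 \\ -2 & 2 \end{bmatrix}.
\]
Let α = 4 and β = 1/2. Perform the following calculations: (1) $A + B$, (2) $A + C$, (3) $B^t + C$, (4) $A + B^t$,
(5) $\beta C$, (6) $4A - 3B$, (7) $A^t + \alpha C$, (8) $A + B - C^t$, (9) $4A + 2B - 5C^t$.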
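C11†    Solve the given matrix equation for x, or explain why no solution exists:
\[
2\begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 2 \end{bmatrix}
- 3\begin{bmatrix} 1 & 1 & 2 \\ 0 & 1 & x \end{bmatrix}
= \begin{bmatrix} -1 & 1 & 0 \\ 0 & 5 & -2 \end{bmatrix}
\]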

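C12†    Solve the given matrix equation for α, or explain why no solution exists:
\[
\alpha\begin{bmatrix} 1 & 3 & 4 \\ 2 & 1 & -1 \end{bmatrix}
+ \begin{bmatrix} 4 & 3 & -6 \\ 0 & 1 & 1 \end{bmatrix}
= \begin{bmatrix} 7 & 12 & 6 \\ 6 & 4 & -2 \end{bmatrix}
\]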

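C13†    Solve the given matrix equation for α, or explain why no solution exists:
\[
\alpha\begin{bmatrix} 3 & 1 \\ 2 & 0 \\ 1 & 4 \end{bmatrix}
- \begin{bmatrix} 4 & 1 \\ 3 & 2 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & 1 \\ 1 & -2 \\ 2 & 6 \end{bmatrix}
\]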

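C14†    Find α and β that solve the following equation:
\[
\alpha\begin{bmatrix} 1 & 2 \\ 4 & 1 \end{bmatrix}
+ \beta\begin{bmatrix} 2 & 1 \\ 3 & 1 \end{bmatrix}
= \begin{bmatrix} -1 & 4 \\ 6 & 1 \end{bmatrix}
\]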


In Chapter V we defined the operations of vector addition and vector scalar multiplication
in Definition CVA and Definition CVSM. These two operations formed the underpinnings
of the remainder of the chapter. We have now defined similar operations for matrices in
Definition MA and Definition MSM. You will have noticed the resulting similarities between
Theorem VSPCV and Theorem VSPM.
In Exercises M20–M25, you will be asked to extend these similarities to other fundamental
definitions and concepts we first saw in Chapter V. This sequence of problems was suggested
by Martin Jackson.
    M20 Suppose S = {B1 , B2 , B3 , . . . , Bp } is a set of matrices from Mmn . Formulate
    appropriate definitions for the following terms and give an example of the use of each.
       1. A linear combination of elements of S.
       2. A relation of linear dependence on S, both trivial and nontrivial.
       3. S is a linearly independent set.
       4. $\langle S\rangle$.

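   M21†      Show that the set S is linearly independent in $M_{22}$.
\[
S = \left\{
\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix},\,
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix},\,
\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix},\,
\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}
\right\}
\]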

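   M22† Determine if the set S below is linearly independent in $M_{23}$.
\[
S = \left\{
\begin{bmatrix} -2 & 3 & 4 \\ -1 & 3 & -2 \end{bmatrix},\,
\begin{bmatrix} 4 & -2 & 2 \\ 0 & -1 & 1 \end{bmatrix},\,
\begin{bmatrix} -1 & -2 & -2 \\ 2 & 2 & 2 \end{bmatrix},\,
\begin{bmatrix} -1 & 1 & 0 \\ -1 & 0 & -2 \end{bmatrix},\,
\begin{bmatrix} -1 & 2 & -2 \\ 0 & -1 & -2 \end{bmatrix}
\right\}
\]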

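   M23† Determine if the matrix A is in the span of S. In other words, is $A \in \langle S\rangle$? If so
   write A as a linear combination of the elements of S.
\[
A = \begin{bmatrix} -13 & 24 & 2 \\ -8 & -2 & -20 \end{bmatrix}
\]
\[
S = \left\{
\begin{bmatrix} -2 & 3 & 4 \\ -1 & 3 & -2 \end{bmatrix},\,
\begin{bmatrix} 4 & -2 & 2 \\ 0 & -1 & 1 \end{bmatrix},\,
\begin{bmatrix} -1 & -2 & -2 \\ 2 & 2 & 2 \end{bmatrix},\,
\begin{bmatrix} -1 & 1 & 0 \\ -1 & 0 & -2 \end{bmatrix},\,
\begin{bmatrix} -1 & 2 & -2 \\ 0 & -1 & -2 \end{bmatrix}
\right\}
\]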

   M24† Suppose Y is the set of all 3 × 3 symmetric matrices (Definition SYM). Find a
   set T so that T is linearly independent and $\langle T\rangle = Y$.
   M25    Define a subset of $M_{33}$ by
\[
U_{33} = \left\{ A \in M_{33} \mid [A]_{ij} = 0 \text{ whenever } i > j \right\}
\]
   Find a set R so that R is linearly independent and $\langle R\rangle = U_{33}$.

T13† Prove Property CM of Theorem VSPM. Write your proof in the style of the proof
of Property DSAM given in this section.
T14 Prove Property AAM of Theorem VSPM. Write your proof in the style of the proof
of Property DSAM given in this section.
T17 Prove Property SMAM of Theorem VSPM. Write your proof in the style of the
proof of Property DSAM given in this section.
T18 Prove Property DMAM of Theorem VSPM. Write your proof in the style of the
proof of Property DSAM given in this section.

A matrix A is skew-symmetric if $A^t = -A$. Exercises T30–T37 employ this definition.
   T30 Prove that a skew-symmetric matrix is square. (Hint: study the proof of Theorem
   SMS.)
   T31 Prove that a skew-symmetric matrix must have zeros for its diagonal elements.
   In other words, if A is skew-symmetric of size n, then [A]ii = 0 for 1 ≤ i ≤ n. (Hint:
   carefully construct an example of a 3 × 3 skew-symmetric matrix before attempting a
   proof.)
   T32 Prove that a matrix A is both skew-symmetric and symmetric if and only if A is
   the zero matrix. (Hint: one half of this proof is very easy, the other half takes a little
   more work.)
   T33 Suppose A and B are both skew-symmetric matrices of the same size and α, β ∈ C.
   Prove that αA + βB is a skew-symmetric matrix.
   T34    Suppose A is a square matrix. Prove that $A + A^t$ is a symmetric matrix.
   T35    Suppose A is a square matrix. Prove that $A - A^t$ is a skew-symmetric matrix.
   T36 Suppose A is a square matrix. Prove that there is a symmetric matrix B and a
   skew-symmetric matrix C such that A = B + C. In other words, any square matrix can
   be decomposed into a symmetric matrix and a skew-symmetric matrix (Proof Technique
   DC). (Hint: consider building a proof on Exercise MO.T34 and Exercise MO.T35.)
   T37 Prove that the decomposition in Exercise MO.T36 is unique (see Proof Technique
   U). (Hint: a proof can turn on Exercise MO.T31.)
Section MM
Matrix Multiplication
We know how to add vectors and how to multiply them by scalars. Together, these
operations give us the possibility of making linear combinations. Similarly, we know
how to add matrices and how to multiply matrices by scalars. In this section we mix
all these ideas together and produce an operation known as “matrix multiplication.”
This will lead to some results that are both surprising and central. We begin with a
definition of how to multiply a vector by a matrix.


Subsection MVP
Matrix-Vector Product
We have repeatedly seen the importance of forming linear combinations of the
columns of a matrix. As one example of this, the oft-used Theorem SLSLC, said
that every solution to a system of linear equations gives rise to a linear combination
of the column vectors of the coefficient matrix that equals the vector of constants.
This theorem, and others, motivate the following central definition.
Definition MVP Matrix-Vector Product
Suppose A is an m × n matrix with columns A1 , A2 , A3 , . . . , An and u is a vector
of size n. Then the matrix-vector product of A with u is the linear combination
                 Au = [u]1 A1 + [u]2 A2 + [u]3 A3 + · · · + [u]n An
                                                                                     
    So, the matrix-vector product is yet another version of “multiplication,” at least
in the sense that we have yet again overloaded juxtaposition of two symbols as our
notation. Remember your objects: an m × n matrix times a vector of size n will
create a vector of size m. So if A is rectangular, then the size of the vector changes.
With all the linear combinations we have performed so far, this computation should
now seem second nature.
Example MTV A matrix times a vector
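Consider
\[
A = \begin{bmatrix} 1 & 4 & 2 & 3 & 4 \\ -3 & 2 & 0 & 1 & -2 \\ 1 & 6 & -3 & -1 & 5 \end{bmatrix}
\qquad
u = \begin{bmatrix} 2 \\ 1 \\ -2 \\ 3 \\ -1 \end{bmatrix}
\]
   Then
\[
Au = 2\begin{bmatrix} 1 \\ -3 \\ 1 \end{bmatrix}
+ 1\begin{bmatrix} 4 \\ 2 \\ 6 \end{bmatrix}
+ (-2)\begin{bmatrix} 2 \\ 0 \\ -3 \end{bmatrix}
+ 3\begin{bmatrix} 3 \\ 1 \\ -1 \end{bmatrix}
+ (-1)\begin{bmatrix} 4 \\ -2 \\ 5 \end{bmatrix}
= \begin{bmatrix} 7 \\ 1 \\ 6 \end{bmatrix}.
\]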
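Definition MVP translates almost word for word into a short program. The following is a minimal sketch, assuming a matrix is stored as a Python list of rows. The function name `matvec` is a hypothetical choice made for this illustration. It builds Au as a linear combination of the columns of A, rather than row by row, to mirror the definition.

```python
def matvec(A, u):
    """Return Au as the linear combination [u]_1 A_1 + ... + [u]_n A_n."""
    m, n = len(A), len(A[0])
    if len(u) != n:
        raise ValueError("vector size must equal the number of columns of A")
    result = [0] * m
    for j in range(n):
        column_j = [A[i][j] for i in range(m)]   # column A_j of the matrix
        for i in range(m):
            result[i] += u[j] * column_j[i]      # add [u]_j times A_j
    return result

# The matrix and vector of Example MTV.
A = [[1, 4, 2, 3, 4],
     [-3, 2, 0, 1, -2],
     [1, 6, -3, -1, 5]]
u = [2, 1, -2, 3, -1]
print(matvec(A, u))  # [7, 1, 6], matching Example MTV
```

Note how the size check reflects the requirement that an m × n matrix can only multiply a vector of size n, and that the result has size m.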
    We can now represent systems of linear equations compactly with a matrix-vector
product (Definition MVP) and column vector equality (Definition CVE). This finally
yields a very popular alternative to our unconventional LS(A, b) notation.
Theorem SLEMM Systems of Linear Equations as Matrix Multiplication
The set of solutions to the linear system LS(A, b) equals the set of solutions for x
in the vector equation Ax = b.

Proof. This theorem says that two sets (of solutions) are equal. So we need to show
that one set of solutions is a subset of the other, and vice versa (Definition SE). Let
A1 , A2 , A3 , . . . , An be the columns of A. Both of these set inclusions then follow
from the following chain of equivalences (Proof Technique E),
    x is a solution to LS(A, b)
     ⇐⇒ [x]1 A1 + [x]2 A2 + [x]3 A3 + · · · + [x]n An = b        Theorem SLSLC

      ⇐⇒ x is a solution to Ax = b                                Definition MVP


Example MNSLE Matrix notation for systems of linear equations
The system of linear equations from Example NSLE,
\begin{align*}
2x_1 + 4x_2 - 3x_3 + 5x_4 + x_5 &= 9 \\
3x_1 + x_2 + x_4 - 3x_5 &= 0 \\
-2x_1 + 7x_2 - 5x_3 + 2x_4 + 2x_5 &= -3
\end{align*}
has coefficient matrix and vector of constants
\[
A = \begin{bmatrix} 2 & 4 & -3 & 5 & 1 \\ 3 & 1 & 0 & 1 & -3 \\ -2 & 7 & -5 & 2 & 2 \end{bmatrix}
\qquad
b = \begin{bmatrix} 9 \\ 0 \\ -3 \end{bmatrix}
\]
and so will be described compactly by the vector equation Ax = b.
   The matrix-vector product is a very natural computation. We have motivated it
by its connections with systems of equations, but here is another example.
Example MBC Money’s best cities
Every year Money magazine selects several cities in the United States as the “best”
cities to live in, based on a wide array of statistics about each city. This is an example
of how the editors of Money might arrive at a single number that consolidates the
statistics about a city. We will analyze Los Angeles, Chicago and New York City,
based on four criteria: average high temperature in July (Fahrenheit), number of
colleges and universities in a 30-mile radius, number of toxic waste sites in the
Superfund environmental clean-up program and a personal crime index based on FBI
statistics (average = 100, smaller is safer). It should be apparent how to generalize
the example to a greater number of cities and a greater number of statistics.
    We begin by building a table of statistics. The rows will be labeled with the
cities, and the columns with statistical categories. These values are from Money’s
website in early 2005.


               City             Temp     Colleges   Superfund     Crime
               Los Angeles       77        28          93          254
               Chicago           84        38          85          363
               New York          84        99           1          193


    Conceivably these data might reside in a spreadsheet. Now we must combine
the statistics for each city. We could accomplish this by weighting each category,
scaling the values and summing them. The sizes of the weights would depend upon
the numerical size of each statistic generally, but more importantly, they would
reflect the editors’ opinions or beliefs about which statistics were most important to
their readers. Is the crime index more important than the number of colleges and
universities? Of course, there is no right answer to this question.
    Suppose the editors finally decide on the following weights to employ: temperature,
0.23; colleges, 0.46; Superfund, −0.05; crime, −0.20. Notice how negative weights
are used for undesirable statistics. Then, for example, the editors would compute for
Los Angeles,
          (0.23)(77) + (0.46)(28) + (−0.05)(93) + (−0.20)(254) = −24.86
   This computation might remind you of an inner product, but we will produce
the computations for all of the cities as a matrix-vector product. Write the table of
raw statistics as a matrix
\[
T = \begin{bmatrix} 77 & 28 & 93 & 254 \\ 84 & 38 & 85 & 363 \\ 84 & 99 & 1 & 193 \end{bmatrix}
\]
and the weights as a vector
\[
w = \begin{bmatrix} 0.23 \\ 0.46 \\ -0.05 \\ -0.20 \end{bmatrix}
\]
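then the matrix-vector product (Definition MVP) yields
\[
Tw = (0.23)\begin{bmatrix} 77 \\ 84 \\ 84 \end{bmatrix}
+ (0.46)\begin{bmatrix} 28 \\ 38 \\ 99 \end{bmatrix}
+ (-0.05)\begin{bmatrix} 93 \\ 85 \\ 1 \end{bmatrix}
+ (-0.20)\begin{bmatrix} 254 \\ 363 \\ 193 \end{bmatrix}
= \begin{bmatrix} -24.86 \\ -40.05 \\ 26.21 \end{bmatrix}
\]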
    This vector contains a single number for each of the cities being studied, so the
editors would rank New York best (26.21), Los Angeles next (−24.86), and Chicago
third (−40.05). Of course, the mayor’s offices in Chicago and Los Angeles are free to
counter with a different set of weights that cause their city to be ranked best. These
alternative weights would be chosen to play to each city’s strengths, and minimize
their problem areas.
    If a spreadsheet were used to make these computations, a row of weights would
be entered somewhere near the table of data and the formulas in the spreadsheet
would effect a matrix-vector product. This example is meant to illustrate how “linear”
computations (addition, multiplication) can be organized as a matrix-vector product.
    Another example would be the matrix of numerical scores on examinations and
exercises for students in a class. The rows would be indexed by students and the
columns would be indexed by exams and assignments. The instructor could then
assign weights to the different exams and assignments, and via a matrix-vector
product, compute a single score for each student.
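The weighted scores of Example MBC can be reproduced with a few lines of code. This is a sketch under the assumption that the table is stored as a list of rows; the names `matvec`, `cities`, `scores` and `ranking` are hypothetical and chosen only for this illustration. Because the weights are decimals, the computed scores are floating-point values and are rounded for display.

```python
def matvec(A, u):
    """Matrix-vector product as a linear combination of the columns of A."""
    m, n = len(A), len(A[0])
    result = [0.0] * m
    for j in range(n):
        for i in range(m):
            result[i] += u[j] * A[i][j]
    return result

cities = ["Los Angeles", "Chicago", "New York"]
# Rows: cities. Columns: temperature, colleges, Superfund sites, crime index.
T = [[77, 28, 93, 254],
     [84, 38, 85, 363],
     [84, 99, 1, 193]]
w = [0.23, 0.46, -0.05, -0.20]   # the editors' weights from the example

scores = matvec(T, w)
ranking = sorted(zip(cities, scores), key=lambda pair: pair[1], reverse=True)
for city, score in ranking:
    print(f"{city}: {score:.2f}")
```

Changing the entries of `w` is all that is needed to explore how a different set of weights reorders the cities.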
    Later (much later) we will need the following theorem, which is really a technical
lemma (see Proof Technique LC). Since we are in a position to prove it now, we
will. But you can safely skip it for the moment, if you promise to come back later to
study the proof when the theorem is employed. At that point you will also be able
to understand the comments in the paragraph following the proof.
Theorem EMMVP Equal Matrices and Matrix-Vector Products
Suppose that A and B are m × n matrices such that Ax = Bx for every x ∈ Cn .
Then A = B.

Proof. We are assuming Ax = Bx for all x ∈ Cn , so we can employ this equality
for any choice of the vector x. However, we will limit our use of this equality to the
standard unit vectors, ej , 1 ≤ j ≤ n (Definition SUV). For all 1 ≤ j ≤ n, 1 ≤ i ≤ m,
 [A]ij
 = 0 [A]i1 + · · · + 0 [A]i,j−1 + 1 [A]ij + 0 [A]i,j+1 + · · · + 0 [A]in   Theorem PCNA
 = [ej ]1 [A]i1 + [ej ]2 [A]i2 + [ej ]3 [A]i3 + · · · + [ej ]n [A]in       Definition SUV
 = [A]i1 [ej ]1 + [A]i2 [ej ]2 + [A]i3 [ej ]3 + · · · + [A]in [ej ]n       Property CMCN
 = [Aej ]i                                                                 Definition MVP
 = [Bej ]i                                                                 Hypothesis
 = [B]i1 [ej ]1 + [B]i2 [ej ]2 + [B]i3 [ej ]3 + · · · + [B]in [ej ]n       Definition MVP
 = [ej ]1 [B]i1 + [ej ]2 [B]i2 + [ej ]3 [B]i3 + · · · + [ej ]n [B]in       Property CMCN
 = 0 [B]i1 + · · · + 0 [B]i,j−1 + 1 [B]ij + 0 [B]i,j+1 + · · · + 0 [B]in   Definition SUV
 = [B]ij                                                                   Theorem PCNA
   So by Definition ME the matrices A and B are equal, as desired.                          

    You might notice from studying the proof that the hypotheses of this theorem
could be “weakened” (i.e. made less restrictive). We need only suppose the equality
of the matrix-vector products for just the standard unit vectors (Definition SUV) or
any other spanning set (Definition SSVS) of Cn (Exercise LISS.T40). However, in
practice, when we apply this theorem the stronger hypothesis will be in effect so this
version of the theorem will suffice for our purposes. (If we changed the statement of
 the theorem to have the less restrictive hypothesis, then we would call the theorem
“stronger.”)


Subsection MM
Matrix Multiplication
We now define how to multiply two matrices together. Stop for a minute and think
about how you might define this new operation.
   Many books would present this definition much earlier in the course. However,
we have taken great care to delay it as long as possible and to present as many
ideas as practical based mostly on the notion of linear combinations. Towards the
conclusion of the course, or when you perhaps take a second course in linear algebra,
you may be in a position to appreciate the reasons for this. For now, understand
that matrix multiplication is a central definition and perhaps you will appreciate its
importance more by having saved it for later.
Definition MM Matrix Multiplication
Suppose A is an m × n matrix and B1 , B2 , B3 , . . . , Bp are the columns of an n × p
matrix B. Then the matrix product of A with B is the m × p matrix where column
i is the matrix-vector product ABi . Symbolically,
              AB = A [B1 |B2 |B3 | . . . |Bp ] = [AB1 |AB2 |AB3 | . . . |ABp ] .
                                                                                          
Example PTM Product of two matrices
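Set
\[
A = \begin{bmatrix} 1 & 2 & -1 & 4 & 6 \\ 0 & -4 & 1 & 2 & 3 \\ -5 & 1 & 2 & -3 & 4 \end{bmatrix}
\qquad
B = \begin{bmatrix} 1 & 6 & 2 & 1 \\ -1 & 4 & 3 & 2 \\ 1 & 1 & 2 & 3 \\ 6 & 4 & -1 & 2 \\ 1 & -2 & 3 & 0 \end{bmatrix}
\]
      Then
\[
AB = \left[\,
A\begin{bmatrix} 1 \\ -1 \\ 1 \\ 6 \\ 1 \end{bmatrix}
\;\middle|\;
A\begin{bmatrix} 6 \\ 4 \\ 1 \\ 4 \\ -2 \end{bmatrix}
\;\middle|\;
A\begin{bmatrix} 2 \\ 3 \\ 2 \\ -1 \\ 3 \end{bmatrix}
\;\middle|\;
A\begin{bmatrix} 1 \\ 2 \\ 3 \\ 2 \\ 0 \end{bmatrix}
\,\right]
= \begin{bmatrix} 28 & 17 & 20 & 10 \\ 20 & -13 & -3 & -1 \\ -18 & -44 & 12 & -3 \end{bmatrix}.
\]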
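Definition MM can be sketched in code by computing one matrix-vector product per column of B and then assembling the results. The sketch below assumes matrices are stored as lists of rows; the names `matvec` and `matmul_by_columns` are hypothetical helpers introduced only for this illustration.

```python
def matvec(A, u):
    """Matrix-vector product as a linear combination of the columns of A."""
    m, n = len(A), len(A[0])
    result = [0] * m
    for j in range(n):
        for i in range(m):
            result[i] += u[j] * A[i][j]
    return result

def matmul_by_columns(A, B):
    """Product AB where column j of AB is the matrix-vector product A B_j."""
    n, p = len(B), len(B[0])
    if len(A[0]) != n:
        raise ValueError("columns of A must equal rows of B")
    # One matrix-vector product for each column B_j of B.
    columns = [matvec(A, [B[k][j] for k in range(n)]) for j in range(p)]
    # columns[j] holds column j of AB; reassemble the columns into rows.
    return [[columns[j][i] for j in range(p)] for i in range(len(A))]

# The matrices of Example PTM.
A = [[1, 2, -1, 4, 6],
     [0, -4, 1, 2, 3],
     [-5, 1, 2, -3, 4]]
B = [[1, 6, 2, 1],
     [-1, 4, 3, 2],
     [1, 1, 2, 3],
     [6, 4, -1, 2],
     [1, -2, 3, 0]]
AB = matmul_by_columns(A, B)
for row in AB:
    print(row)
```

Swapping the arguments, as in `matmul_by_columns(B, A)`, raises the size error, since B has 4 columns while A has 3 rows.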
    Is this the definition of matrix multiplication you expected? Perhaps our previous
operations for matrices caused you to think that we might multiply two matrices of
the same size, entry-by-entry? Notice that our current definition uses matrices of
different sizes (though the number of columns in the first must equal the number
of rows in the second), and the result is of a third size. Notice too in the previous
example that we cannot even consider the product BA, since the sizes of the two
matrices in this order are not right.
    But it gets weirder than that. Many of your old ideas about “multiplication” will
not apply to matrix multiplication, but some still will. So make no assumptions, and
do not do anything until you have a theorem that says you can. Even if the sizes are
right, matrix multiplication is not commutative — order matters.
Example MMNC Matrix multiplication is not commutative
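Set
\[
A = \begin{bmatrix} 1 & 3 \\ -1 & 2 \end{bmatrix}
\qquad
B = \begin{bmatrix} 4 & 0 \\ 5 & 1 \end{bmatrix}.
\]
   Then we have two square, 2 × 2 matrices, so Definition MM allows us to multiply
them in either order. We find
\[
AB = \begin{bmatrix} 19 & 3 \\ 6 & 2 \end{bmatrix}
\qquad
BA = \begin{bmatrix} 4 & 12 \\ 4 & 17 \end{bmatrix}
\]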
and AB ≠ BA. Not even close. It should not be hard for you to construct other
pairs of matrices that do not commute (try a couple of 3 × 3’s). Can you find a pair
of non-identical matrices that do commute?
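A short program makes it easy to experiment with other pairs of matrices. This is a sketch, assuming matrices are lists of rows; the function name `matmul` is a hypothetical helper that forms each column of the product as the matrix times a column of the second factor.

```python
def matmul(A, B):
    """Product AB, built one column at a time as in Definition MM."""
    columns_of_B = list(zip(*B))
    columns_of_AB = [
        [sum(b_k * row[k] for k, b_k in enumerate(column)) for row in A]
        for column in columns_of_B
    ]
    # Turn the list of columns back into a list of rows.
    return [list(row) for row in zip(*columns_of_AB)]

# The matrices of Example MMNC.
A = [[1, 3], [-1, 2]]
B = [[4, 0], [5, 1]]
AB = matmul(A, B)
BA = matmul(B, A)
print(AB)        # [[19, 3], [6, 2]]
print(BA)        # [[4, 12], [4, 17]]
print(AB == BA)  # False
```

Replacing `A` and `B` with candidates of your own is a quick way to test whether a given pair commutes.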

Subsection MMEE
Matrix Multiplication, Entry-by-Entry
While certain “natural” properties of multiplication do not hold, many more do. In
the next subsection, we will state and prove the relevant theorems. But first, we
need a theorem that provides an alternate means of multiplying two matrices. In
many texts, this would be given as the definition of matrix multiplication. We prefer
to turn it around and have the following formula as a consequence of our definition.
It will prove useful for proofs of matrix equality, where we need to examine products
of matrices, entry-by-entry.
Theorem EMP Entries of Matrix Products
Suppose A is an m × n matrix and B is an n × p matrix. Then for 1 ≤ i ≤ m,
1 ≤ j ≤ p, the individual entries of AB are given by
\begin{align*}
[AB]_{ij} &= [A]_{i1}[B]_{1j} + [A]_{i2}[B]_{2j} + [A]_{i3}[B]_{3j} + \cdots + [A]_{in}[B]_{nj} \\
&= \sum_{k=1}^{n} [A]_{ik}[B]_{kj}
\end{align*}

Proof. Let the vectors A1 , A2 , A3 , . . . , An denote the columns of A and let the
vectors B1 , B2 , B3 , . . . , Bp denote the columns of B. Then for 1 ≤ i ≤ m, 1 ≤ j ≤ p,
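\begin{align*}
[AB]_{ij} &= [AB_j]_i && \text{Definition MM} \\
&= \left[[B_j]_1 A_1 + [B_j]_2 A_2 + \cdots + [B_j]_n A_n\right]_i && \text{Definition MVP} \\
&= \left[[B_j]_1 A_1\right]_i + \left[[B_j]_2 A_2\right]_i + \cdots + \left[[B_j]_n A_n\right]_i && \text{Definition CVA} \\
&= [B_j]_1 [A_1]_i + [B_j]_2 [A_2]_i + \cdots + [B_j]_n [A_n]_i && \text{Definition CVSM} \\
&= [B]_{1j}[A]_{i1} + [B]_{2j}[A]_{i2} + \cdots + [B]_{nj}[A]_{in} && \text{Definition M} \\
&= [A]_{i1}[B]_{1j} + [A]_{i2}[B]_{2j} + \cdots + [A]_{in}[B]_{nj} && \text{Property CMCN} \\
&= \sum_{k=1}^{n} [A]_{ik}[B]_{kj}
\end{align*}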
                                                                                        

Example PTMEE Product of two matrices, entry-by-entry
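Consider again the two matrices from Example PTM
\[
A = \begin{bmatrix} 1 & 2 & -1 & 4 & 6 \\ 0 & -4 & 1 & 2 & 3 \\ -5 & 1 & 2 & -3 & 4 \end{bmatrix}
\qquad
B = \begin{bmatrix} 1 & 6 & 2 & 1 \\ -1 & 4 & 3 & 2 \\ 1 & 1 & 2 & 3 \\ 6 & 4 & -1 & 2 \\ 1 & -2 & 3 & 0 \end{bmatrix}
\]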
   Then suppose we just wanted the entry of AB in the second row, third column:
     [AB]23 = [A]21 [B]13 + [A]22 [B]23 + [A]23 [B]33 + [A]24 [B]43 + [A]25 [B]53
            =(0)(2) + (−4)(3) + (1)(2) + (2)(−1) + (3)(3) = −3
    Notice how there are 5 terms in the sum, since 5 is the common dimension of the
two matrices (column count for A, row count for B). In the conclusion of Theorem
EMP, it would be the index k that would run from 1 to 5 in this computation. Here
is a bit more practice.
    The entry of third row, first column:
     [AB]31 = [A]31 [B]11 + [A]32 [B]21 + [A]33 [B]31 + [A]34 [B]41 + [A]35 [B]51
               =(−5)(1) + (1)(−1) + (2)(1) + (−3)(6) + (4)(1) = −18
   To get some more practice on your own, complete the computation of the other
10 entries of this product. Construct some other pairs of matrices (of compatible
sizes) and compute their product two ways. First use Definition MM. Since linear
combinations are straightforward for you now, this should be easy to do and to do
correctly. Then do it again, using Theorem EMP. Since this process may take some
practice, use your first computation to check your work.
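The formula of Theorem EMP is also easy to sketch in code, which gives one more way to check hand computations. The helper names `entry` and `matmul_by_entries` below are hypothetical, and matrices are assumed to be lists of rows. Since Python lists are indexed from 0 while the text indexes from 1, `entry` shifts its row and column arguments by one.

```python
def entry(A, B, i, j):
    """[AB]_ij via Theorem EMP, with 1-based i and j as in the text."""
    n = len(B)  # the common dimension: columns of A, rows of B
    return sum(A[i - 1][k] * B[k][j - 1] for k in range(n))

def matmul_by_entries(A, B):
    """Assemble AB from its individual entries."""
    m, p = len(A), len(B[0])
    return [[entry(A, B, i, j) for j in range(1, p + 1)] for i in range(1, m + 1)]

# The matrices of Example PTM and Example PTMEE.
A = [[1, 2, -1, 4, 6],
     [0, -4, 1, 2, 3],
     [-5, 1, 2, -3, 4]]
B = [[1, 6, 2, 1],
     [-1, 4, 3, 2],
     [1, 1, 2, 3],
     [6, 4, -1, 2],
     [1, -2, 3, 0]]
print(entry(A, B, 2, 3))   # -3, second row, third column
print(entry(A, B, 3, 1))   # -18, third row, first column
print(matmul_by_entries(A, B))
```

The full product printed on the last line agrees with the matrix AB found column by column in Example PTM.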
    Theorem EMP is the way many people compute matrix products by hand. It
will also be very useful for the theorems we are going to prove shortly. However, the
definition (Definition MM) is frequently the most useful for its connections with
deeper ideas like the null space and the upcoming column space.

Subsection PMM
Properties of Matrix Multiplication
In this subsection, we collect properties of matrix multiplication and its interaction
with the zero matrix (Definition ZM), the identity matrix (Definition IM), matrix
addition (Definition MA), scalar matrix multiplication (Definition MSM), the inner
product (Definition IP), conjugation (Theorem MMCC), and the transpose (Definition
TM). Whew! Here we go. These are great proofs to practice with, so try to
concoct the proofs before reading them. They will get progressively more complicated
as we go.
Theorem MMZM Matrix Multiplication and the Zero Matrix
Suppose A is an m × n matrix. Then

  1. AOn×p = Om×p
  2. Op×m A = Op×n

Proof. We will prove (1) and leave (2) to you. Entry-by-entry, for 1 ≤ i ≤ m,
1 ≤ j ≤ p,
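\begin{align*}
[A\mathcal{O}_{n\times p}]_{ij} &= \sum_{k=1}^{n} [A]_{ik}[\mathcal{O}_{n\times p}]_{kj} && \text{Theorem EMP} \\
&= \sum_{k=1}^{n} [A]_{ik}\,0 && \text{Definition ZM} \\
&= \sum_{k=1}^{n} 0 \\
&= 0 && \text{Property ZCN} \\
&= [\mathcal{O}_{m\times p}]_{ij} && \text{Definition ZM}
\end{align*}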
  So by the definition of matrix equality (Definition ME), the matrices AOn×p and
Om×p are equal.                                                                 
Theorem MMIM Matrix Multiplication and Identity Matrix
Suppose A is an m × n matrix. Then

  1. AIn = A
  2. Im A = A

Proof. Again, we will prove (1) and leave (2) to you. Entry-by-entry, for 1 ≤ i ≤ m,
1 ≤ j ≤ n,
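\begin{align*}
[AI_n]_{ij} &= \sum_{k=1}^{n} [A]_{ik}[I_n]_{kj} && \text{Theorem EMP} \\
&= [A]_{ij}[I_n]_{jj} + \sum_{\substack{k=1 \\ k \neq j}}^{n} [A]_{ik}[I_n]_{kj} && \text{Property CACN} \\
&= [A]_{ij}(1) + \sum_{\substack{k=1 \\ k \neq j}}^{n} [A]_{ik}(0) && \text{Definition IM} \\
&= [A]_{ij} + \sum_{\substack{k=1 \\ k \neq j}}^{n} 0 \\
&= [A]_{ij}
\end{align*}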
  So the matrices A and AIn are equal, entry-by-entry, and by the definition of
matrix equality (Definition ME) we can say they are equal matrices.          

   It is this theorem that gives the identity matrix its name. It is a matrix that
behaves with matrix multiplication like the scalar 1 does with scalar multiplication.
To multiply by the identity matrix is to have no effect on the other matrix.
Theorem MMDAA Matrix Multiplication Distributes Across Addition
Suppose A is an m × n matrix and B and C are n × p matrices and D is a p × s
matrix. Then

  1. A(B + C) = AB + AC

  2. (B + C)D = BD + CD

Proof. We will do (1), you do (2). Entry-by-entry, for 1 ≤ i ≤ m, 1 ≤ j ≤ p,
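\begin{align*}
[A(B + C)]_{ij} &= \sum_{k=1}^{n} [A]_{ik}[B + C]_{kj} && \text{Theorem EMP} \\
&= \sum_{k=1}^{n} [A]_{ik}\left([B]_{kj} + [C]_{kj}\right) && \text{Definition MA} \\
&= \sum_{k=1}^{n} \left([A]_{ik}[B]_{kj} + [A]_{ik}[C]_{kj}\right) && \text{Property DCN} \\
&= \sum_{k=1}^{n} [A]_{ik}[B]_{kj} + \sum_{k=1}^{n} [A]_{ik}[C]_{kj} && \text{Property CACN} \\
&= [AB]_{ij} + [AC]_{ij} && \text{Theorem EMP} \\
&= [AB + AC]_{ij} && \text{Definition MA}
\end{align*}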
    So the matrices A(B + C) and AB + AC are equal, entry-by-entry, and by the
definition of matrix equality (Definition ME) we can say they are equal matrices.

Theorem MMSMM Matrix Multiplication and Scalar Matrix Multiplication
Suppose A is an m × n matrix and B is an n × p matrix. Let α be a scalar. Then
α(AB) = (αA)B = A(αB).

Proof. These are equalities of matrices. We will do the first one, the second is similar
and will be good practice for you. For 1 ≤ i ≤ m, 1 ≤ j ≤ p,
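\begin{align*}
[\alpha(AB)]_{ij} &= \alpha\,[AB]_{ij} && \text{Definition MSM}\\
&= \alpha \sum_{k=1}^{n} [A]_{ik}[B]_{kj} && \text{Theorem EMP}\\
&= \sum_{k=1}^{n} \alpha\,[A]_{ik}[B]_{kj} && \text{Property DCN}\\
&= \sum_{k=1}^{n} [\alpha A]_{ik}[B]_{kj} && \text{Definition MSM}\\
&= [(\alpha A)B]_{ij} && \text{Theorem EMP}
\end{align*}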
   So the matrices α(AB) and (αA)B are equal, entry-by-entry, and by the definition
of matrix equality (Definition ME) we can say they are equal matrices.            
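
Theorem MMDAA and Theorem MMSMM can be spot-checked on small integer matrices. The following is a minimal sketch; the helpers `matmul`, `matadd`, `scale` and all sample values are assumptions introduced for illustration. A check on one example is of course not a proof.

```python
def matmul(A, B):
    """[AB]_ij = sum over k of [A]_ik [B]_kj (Theorem EMP)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matadd(B, C):
    """Entry-by-entry sum of two matrices of the same size."""
    return [[b + c for b, c in zip(rb, rc)] for rb, rc in zip(B, C)]

def scale(alpha, A):
    """Scalar multiple of a matrix."""
    return [[alpha * a for a in row] for row in A]

# Hypothetical sample data: A is 3 x 2, B and C are 2 x 3.
A = [[1, 2], [3, -1], [0, 4]]
B = [[2, 0, 1], [1, -3, 5]]
C = [[-1, 4, 2], [0, 6, -2]]
alpha = 3

# A(B + C) = AB + AC
distributes = matmul(A, matadd(B, C)) == matadd(matmul(A, B), matmul(A, C))

# alpha(AB) = (alpha A)B = A(alpha B)
first = scale(alpha, matmul(A, B))
scalar_moves = first == matmul(scale(alpha, A), B) == matmul(A, scale(alpha, B))

print(distributes, scalar_moves)   # True True
```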

Theorem MMA Matrix Multiplication is Associative
Suppose A is an m × n matrix, B is an n × p matrix and D is a p × s matrix. Then
A(BD) = (AB)D.

Proof. A matrix equality, so we will go entry-by-entry, no surprise there. For 1 ≤
i ≤ m, 1 ≤ j ≤ s,
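\begin{align*}
[A(BD)]_{ij} &= \sum_{k=1}^{n} [A]_{ik}[BD]_{kj} && \text{Theorem EMP}\\
&= \sum_{k=1}^{n} [A]_{ik}\left(\sum_{\ell=1}^{p} [B]_{k\ell}[D]_{\ell j}\right) && \text{Theorem EMP}\\
&= \sum_{k=1}^{n}\sum_{\ell=1}^{p} [A]_{ik}[B]_{k\ell}[D]_{\ell j} && \text{Property DCN}
\end{align*}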

We can switch the order of the summation since these are finite sums,
\begin{align*}
&= \sum_{\ell=1}^{p}\sum_{k=1}^{n} [A]_{ik}[B]_{k\ell}[D]_{\ell j} && \text{Property CACN}
\end{align*}

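As $[D]_{\ell j}$ does not depend on the index k, we can use distributivity to move it outside
of the inner sum,
\begin{align*}
&= \sum_{\ell=1}^{p} [D]_{\ell j}\left(\sum_{k=1}^{n} [A]_{ik}[B]_{k\ell}\right) && \text{Property DCN}\\
&= \sum_{\ell=1}^{p} [D]_{\ell j}[AB]_{i\ell} && \text{Theorem EMP}\\
&= \sum_{\ell=1}^{p} [AB]_{i\ell}[D]_{\ell j} && \text{Property CMCN}\\
&= [(AB)D]_{ij} && \text{Theorem EMP}
\end{align*}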
   So the matrices (AB)D and A(BD) are equal, entry-by-entry, and by the definition
of matrix equality (Definition ME) we can say they are equal matrices.            

    Since Theorem MMA says matrix multiplication is associative, we do not have
to be careful about which of the individual multiplications we perform first, nor
how we parenthesize an expression with several matrices multiplied together. (The
left-to-right order of the matrices themselves still matters, since matrix
multiplication is not commutative.)
So this is where we draw the line on explaining every last detail in a proof. We will
frequently add, remove, or rearrange parentheses with no comment. Indeed, I only
see about a dozen places where Theorem MMA is cited in a proof. You could try to
count how many times we avoid making a reference to this theorem.
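
The difference between regrouping a product and reordering its factors can be seen on a small example. This is a sketch under assumed sample data; `matmul` and the matrices below are illustrative and not taken from the text.

```python
def matmul(A, B):
    """[AB]_ij = sum over k of [A]_ik [B]_kj (Theorem EMP)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hypothetical sizes: A is 2 x 3, B is 3 x 2, D is 2 x 2.
A = [[1, 0, 2], [-1, 3, 1]]
B = [[2, 1], [0, -1], [4, 5]]
D = [[1, 2], [3, -2]]

# Regrouping is harmless (Theorem MMA): A(BD) = (AB)D.
associative = matmul(A, matmul(B, D)) == matmul(matmul(A, B), D)

# Reordering the factors is not harmless, even for square matrices.
P = [[1, 2], [0, 1]]
Q = [[1, 0], [3, 1]]
PQ = matmul(P, Q)   # [[7, 2], [3, 1]]
QP = matmul(Q, P)   # [[1, 2], [3, 7]]

print(associative, PQ == QP)   # True False
```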
    The statement of our next theorem is technically inaccurate. If we upgrade the
vectors u, v to matrices with a single column, then the expression $\overline{u}^t v$ is a 1 × 1
matrix, though we will treat this small matrix as if it was simply the scalar quantity
in its lone entry. When we apply Theorem MMIP there should not be any confusion.
Notice that if we treat a column vector as a matrix with a single column, then we
can also construct the adjoint of a vector, though we will not make this a common
practice.
Theorem MMIP Matrix Multiplication and Inner Products
If we consider the vectors $u, v \in \mathbb{C}^m$ as m × 1 matrices then
\[
\langle u, v\rangle = \overline{u}^t v = u^* v
\]

Proof.
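\begin{align*}
\langle u, v\rangle &= \sum_{k=1}^{m} \overline{[u]_k}\,[v]_k && \text{Definition IP}\\
&= \sum_{k=1}^{m} \overline{[u]_{k1}}\,[v]_{k1} && \text{Column vectors as matrices}\\
&= \sum_{k=1}^{m} [\overline{u}]_{k1}[v]_{k1} && \text{Definition CCM}\\
&= \sum_{k=1}^{m} [\overline{u}^t]_{1k}[v]_{k1} && \text{Definition TM}\\
&= [\overline{u}^t v]_{11} && \text{Theorem EMP}
\end{align*}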


   To finish we just blur the distinction between a 1 × 1 matrix ($\overline{u}^t v$) and its lone
entry.                                                                            

Theorem MMCC Matrix Multiplication and Complex Conjugation
Suppose A is an m × n matrix and B is an n × p matrix. Then $\overline{AB} = \overline{A}\,\overline{B}$.

Proof. To obtain this matrix equality, we will work entry-by-entry. For 1 ≤ i ≤ m,
1 ≤ j ≤ p,
                
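\begin{align*}
[\overline{AB}]_{ij} &= \overline{[AB]_{ij}} && \text{Definition CCM}\\
&= \overline{\sum_{k=1}^{n} [A]_{ik}[B]_{kj}} && \text{Theorem EMP}\\
&= \sum_{k=1}^{n} \overline{[A]_{ik}[B]_{kj}} && \text{Theorem CCRA}\\
&= \sum_{k=1}^{n} \overline{[A]_{ik}}\;\overline{[B]_{kj}} && \text{Theorem CCRM}\\
&= \sum_{k=1}^{n} [\overline{A}]_{ik}[\overline{B}]_{kj} && \text{Definition CCM}\\
&= [\overline{A}\,\overline{B}]_{ij} && \text{Theorem EMP}
\end{align*}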

  So the matrices $\overline{AB}$ and $\overline{A}\,\overline{B}$ are equal, entry-by-entry, and by the definition of
matrix equality (Definition ME) we can say they are equal matrices.           

   Another theorem in this style, and it is a good one. If you have been practicing
with the previous proofs you should be able to do this one yourself.
Theorem MMT Matrix Multiplication and Transposes
Suppose A is an m × n matrix and B is an n × p matrix. Then $(AB)^t = B^t A^t$.

Proof. This theorem may be surprising but if we check the sizes of the matrices
involved, then maybe it will not seem so far-fetched. First, AB has size m × p, so
its transpose has size p × m. The product of $B^t$ with $A^t$ is a p × n matrix times an
n × m matrix, also resulting in a p × m matrix. So at least our objects are compatible
for equality (and would not be, in general, if we did not reverse the order of the
matrix multiplication).
    Here we go again, entry-by-entry. For 1 ≤ i ≤ m, 1 ≤ j ≤ p,
                  
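\begin{align*}
[(AB)^t]_{ji} &= [AB]_{ij} && \text{Definition TM}\\
&= \sum_{k=1}^{n} [A]_{ik}[B]_{kj} && \text{Theorem EMP}\\
&= \sum_{k=1}^{n} [B]_{kj}[A]_{ik} && \text{Property CMCN}\\
&= \sum_{k=1}^{n} [B^t]_{jk}[A^t]_{ki} && \text{Definition TM}\\
&= [B^t A^t]_{ji} && \text{Theorem EMP}
\end{align*}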
   So the matrices $(AB)^t$ and $B^t A^t$ are equal, entry-by-entry, and by the definition
of matrix equality (Definition ME) we can say they are equal matrices.             

    This theorem seems odd at first glance, since we have to switch the order of
A and B. But if we simply consider the sizes of the matrices involved, we can see
that the switch is necessary for this reason alone. That the individual entries of the
products then come along to be equal is a bonus.
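
The size argument and the entry-by-entry equality of Theorem MMT can both be seen in a short sketch. The helpers `matmul`, `transpose`, `size` and the sample matrices are assumptions made for this illustration.

```python
def matmul(A, B):
    """[AB]_ij = sum over k of [A]_ik [B]_kj (Theorem EMP)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    """[M^t]_ji = [M]_ij."""
    return [list(col) for col in zip(*M)]

def size(M):
    """Return (rows, columns)."""
    return (len(M), len(M[0]))

# Hypothetical sample data: A is 2 x 3 and B is 3 x 4, so AB is 2 x 4.
A = [[1, 2, 0], [-1, 3, 4]]
B = [[2, 0, 1, 5], [1, -1, 0, 2], [3, 4, -2, 1]]

lhs = transpose(matmul(A, B))              # (AB)^t, size 4 x 2
rhs = matmul(transpose(B), transpose(A))   # B^t A^t, (4 x 3)(3 x 2)

print(size(lhs), size(rhs), lhs == rhs)    # (4, 2) (4, 2) True
# A^t B^t would be a 3 x 2 matrix times a 4 x 3 matrix, which is not
# even defined here, so the switch in order is forced by the sizes.
```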
    As the adjoint of a matrix is a composition of a conjugate and a transpose, its
interaction with matrix multiplication is similar to that of a transpose. Here is the
last of our long list of basic properties of matrix multiplication.
Theorem MMAD Matrix Multiplication and Adjoints
Suppose A is an m × n matrix and B is an n × p matrix. Then $(AB)^* = B^* A^*$.

Proof.
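\begin{align*}
(AB)^* &= \left(\overline{AB}\right)^t && \text{Definition A}\\
&= \left(\overline{A}\,\overline{B}\right)^t && \text{Theorem MMCC}\\
&= \overline{B}^t\,\overline{A}^t && \text{Theorem MMT}\\
&= B^* A^* && \text{Definition A}
\end{align*}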
                                                                                      

    Notice how none of these proofs above relied on writing out huge general matrices
with lots of ellipses (“. . . ”) and trying to formulate the equalities a whole matrix at
a time. This messy business is a “proof technique” to be avoided at all costs. Notice
too how the proof of Theorem MMAD does not use an entry-by-entry approach,
but simply builds on previous results about matrix multiplication’s interaction with
conjugation and transposes.
    These theorems, along with Theorem VSPM and the other results in Section
MO, give you the “rules” for how matrices interact with the various operations
we have defined on matrices (addition, scalar multiplication, matrix multiplication,
conjugation, transposes and adjoints). Use them and use them often. But do not try
to do anything with a matrix that you do not have a rule for. Together, we would
informally call all these operations, and the attendant theorems, “the algebra of
matrices.” Notice, too, that every column vector is just an n × 1 matrix, so these
theorems apply to column vectors also. Finally, these results, taken as a whole, may
make us feel that the definition of matrix multiplication is not so unnatural.
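
As one instance of these rules in action, Theorem MMAD can be checked on matrices with complex entries. This is only a sketch: `matmul`, `adjoint` and the sample entries are assumptions for illustration, and Python's built-in `complex` type is exact here because all real and imaginary parts are small integers.

```python
def matmul(A, B):
    """[AB]_ij = sum over k of [A]_ik [B]_kj (Theorem EMP)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def adjoint(M):
    """Conjugate transpose: [M*]_ji is the conjugate of [M]_ij."""
    return [[M[i][j].conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

# Hypothetical sample data: A is 2 x 3, B is 3 x 2.
A = [[1 + 2j, 3, -1j],
     [2, 1 - 1j, 4]]
B = [[1j, 2],
     [1 + 1j, -3],
     [5, 2 - 2j]]

lhs = adjoint(matmul(A, B))            # (AB)*
rhs = matmul(adjoint(B), adjoint(A))   # B* A*
print(lhs == rhs)                      # True
```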


Subsection HM
Hermitian Matrices
The adjoint of a matrix has a basic property when employed in a matrix-vector
product as part of an inner product. At this point, you could even use the following
result as a motivation for the definition of an adjoint.
Theorem AIP Adjoint and Inner Product
Suppose that A is an m × n matrix and $x \in \mathbb{C}^n$, $y \in \mathbb{C}^m$. Then $\langle Ax, y\rangle = \langle x, A^* y\rangle$.

Proof.
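\begin{align*}
\langle Ax, y\rangle &= \left(\overline{Ax}\right)^t y && \text{Theorem MMIP}\\
&= \left(\overline{A}\,\overline{x}\right)^t y && \text{Theorem MMCC}\\
&= \overline{x}^t\,\overline{A}^t y && \text{Theorem MMT}\\
&= \overline{x}^t (A^* y) && \text{Definition A}\\
&= \langle x, A^* y\rangle && \text{Theorem MMIP}
\end{align*}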
                                                                                      

    Sometimes a matrix is equal to its adjoint (Definition A), and these matrices
have interesting properties. One of the most common situations where this occurs
is when a matrix has only real number entries. Then we are simply talking about
symmetric matrices (Definition SYM), so you can view this as a generalization of a
symmetric matrix.
Definition HM Hermitian Matrix
The square matrix A is Hermitian (or self-adjoint) if A = A∗ .                           
   Again, the set of real matrices that are Hermitian is exactly the set of symmetric
matrices. In Section PEE we will uncover some amazing properties of Hermitian
matrices, so when you get there, run back here to remind yourself of this definition.
Further properties will also appear in Section OD. Right now we prove a fundamental
result about Hermitian matrices, matrix vector products and inner products. As a
characterization, this could be employed as a definition of a Hermitian matrix and
some authors take this approach.
Theorem HMIP Hermitian Matrices and Inner Products
Suppose that A is a square matrix of size n. Then A is Hermitian if and only if
$\langle Ax, y\rangle = \langle x, Ay\rangle$ for all $x, y \in \mathbb{C}^n$.

Proof. (⇒) This is the “easy half” of the proof, and makes the rationale for a
definition of Hermitian matrices most obvious. Assume A is Hermitian,
\begin{align*}
\langle Ax, y\rangle &= \langle x, A^* y\rangle && \text{Theorem AIP}\\
&= \langle x, Ay\rangle && \text{Definition HM}
\end{align*}


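    (⇐) This “half” will take a bit more work. Assume that $\langle Ax, y\rangle = \langle x, Ay\rangle$ for
all $x, y \in \mathbb{C}^n$. We show that $A = A^*$ by establishing that $Ax = A^* x$ for all x, so we
can then apply Theorem EMMVP. With only this much motivation, consider the
inner product for any $x \in \mathbb{C}^n$.
\begin{align*}
\langle Ax - A^* x,\, Ax - A^* x\rangle &= \langle Ax - A^* x,\, Ax\rangle - \langle Ax - A^* x,\, A^* x\rangle && \text{Theorem IPVA}\\
&= \langle Ax - A^* x,\, Ax\rangle - \langle A(Ax - A^* x),\, x\rangle && \text{Theorem AIP}\\
&= \langle Ax - A^* x,\, Ax\rangle - \langle Ax - A^* x,\, Ax\rangle && \text{Hypothesis}\\
&= 0 && \text{Property AICN}
\end{align*}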
   Because this first inner product equals zero, and has the same vector in each
argument ($Ax - A^* x$), Theorem PIP gives the conclusion that $Ax - A^* x = 0$. With
$Ax = A^* x$ for all $x \in \mathbb{C}^n$, Theorem EMMVP says $A = A^*$, which is the defining
property of a Hermitian matrix (Definition HM).                                

    So, informally, Hermitian matrices are those that can be tossed around from one
side of an inner product to the other with reckless abandon. We will see later what
this buys us.
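
Theorem AIP and Theorem HMIP can both be illustrated numerically. The sketch below assumes the inner product conjugates its first argument, as in the proof of Theorem MMIP above; the helper names and every sample matrix and vector are hypothetical choices made for this example.

```python
def inner(u, v):
    """<u, v> = sum over k of conj([u]_k) [v]_k."""
    return sum(u_k.conjugate() * v_k for u_k, v_k in zip(u, v))

def matvec(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * x_k for a, x_k in zip(row, x)) for row in A]

def adjoint(M):
    """Conjugate transpose of M."""
    return [[M[i][j].conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

# Theorem AIP: <Ax, y> = <x, A*y> for a (non-square) 2 x 3 matrix A.
A = [[1 + 1j, 2, -1j],
     [3, 1j, 1 - 2j]]
x = [1, 1j, 2 - 1j]
y = [2 + 1j, -1]
aip_holds = inner(matvec(A, x), y) == inner(x, matvec(adjoint(A), y))

# Theorem HMIP: H is Hermitian; N is symmetric but not Hermitian.
H = [[2, 1 - 1j], [1 + 1j, 3]]
N = [[1, 1j], [1j, 1]]
u, v = [1, 0], [0, 1]
h_is_hermitian = H == adjoint(H)
n_is_hermitian = N == adjoint(N)
h_moves = inner(matvec(H, u), v) == inner(u, matvec(H, v))
n_moves = inner(matvec(N, u), v) == inner(u, matvec(N, v))

print(aip_holds, h_is_hermitian, h_moves)   # True True True
print(n_is_hermitian, n_moves)              # False False
```

The matrix N shows why the complex case needs the adjoint rather than the transpose: N equals its transpose, yet it cannot be moved across the inner product.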

Reading Questions

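1. Form the matrix vector product of
\[
\begin{bmatrix} 2 & 3 & -1 & 0 \\ 1 & -2 & 7 & 3 \\ 1 & 5 & 3 & 2 \end{bmatrix}
\quad\text{with}\quad
\begin{bmatrix} 2 \\ -3 \\ 0 \\ 5 \end{bmatrix}
\]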

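2. Multiply together the two matrices below (in the order given).
\[
\begin{bmatrix} 2 & 3 & -1 & 0 \\ 1 & -2 & 7 & 3 \\ 1 & 5 & 3 & 2 \end{bmatrix}
\qquad
\begin{bmatrix} 2 & 6 \\ -3 & -4 \\ 0 & 2 \\ 3 & -1 \end{bmatrix}
\]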

3. Rewrite the system of linear equations below as a vector equality and using a matrix-
   vector product. (This question does not ask for a solution to the system. But it does
   ask you to express the system of equations in a new form using tools from this section.)
\begin{align*}
2x_1 + 3x_2 - x_3 &= 0\\
x_1 + 2x_2 + x_3 &= 3\\
x_1 + 3x_2 + 3x_3 &= 7
\end{align*}


Exercises
C20† Compute the product of the two matrices below, AB. Do this using the definitions
of the matrix-vector product (Definition MVP) and the definition of matrix multiplication
(Definition MM).
                              
\[
A = \begin{bmatrix} 2 & 5 \\ -1 & 3 \\ 2 & -2 \end{bmatrix}
\qquad
B = \begin{bmatrix} 1 & 5 & -3 & 4 \\ 2 & 0 & 2 & -3 \end{bmatrix}
\]

C21† Compute the product AB of the two matrices below using both the definition of
the matrix-vector product (Definition MVP) and the definition of matrix multiplication
(Definition MM).
                                                               
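\[
A = \begin{bmatrix} 1 & 3 & 2 \\ -1 & 2 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\qquad
B = \begin{bmatrix} 4 & 1 & 2 \\ 1 & 0 & 1 \\ 3 & 1 & 5 \end{bmatrix}
\]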

C22† Compute the product AB of the two matrices below using both the definition of
the matrix-vector product (Definition MVP) and the definition of matrix multiplication
(Definition MM).
                                                             
\[
A = \begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}
\qquad
B = \begin{bmatrix} 2 & 3 \\ 4 & 6 \end{bmatrix}
\]

C23† Compute the product AB of the two matrices below using both the definition of
the matrix-vector product (Definition MVP) and the definition of matrix multiplication
(Definition MM).
\[
A = \begin{bmatrix} 3 & 1 \\ 2 & 4 \\ 6 & 5 \\ 1 & 2 \end{bmatrix}
\qquad
B = \begin{bmatrix} -3 & 1 \\ 4 & 2 \end{bmatrix}
\]

C24† Compute the product AB of the two matrices below.
\[
A = \begin{bmatrix} 1 & 2 & 3 & -2 \\ 0 & 1 & -2 & -1 \\ 1 & 1 & 3 & 1 \end{bmatrix}
\qquad
B = \begin{bmatrix} 3 \\ 4 \\ 0 \\ 2 \end{bmatrix}
\]

C25† Compute the product AB of the two matrices below.
\[
A = \begin{bmatrix} 1 & 2 & 3 & -2 \\ 0 & 1 & -2 & -1 \\ 1 & 1 & 3 & 1 \end{bmatrix}
\qquad
B = \begin{bmatrix} -7 \\ 3 \\ 1 \\ 1 \end{bmatrix}
\]

C26† Compute the product AB of the two matrices below using both the definition of
the matrix-vector product (Definition MVP) and the definition of matrix multiplication
(Definition MM).
                                                                 
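\[
A = \begin{bmatrix} 1 & 3 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 2 \end{bmatrix}
\qquad
B = \begin{bmatrix} 2 & -5 & -1 \\ 0 & 1 & 0 \\ -1 & 2 & 1 \end{bmatrix}
\]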
                                 
C30† For the matrix $A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$, find $A^2$, $A^3$, $A^4$. Find a general formula for $A^n$ for any
positive integer n.
C31† For the matrix $A = \begin{bmatrix} 1 & -1 \\ 0 & 1 \end{bmatrix}$, find $A^2$, $A^3$, $A^4$. Find a general formula for $A^n$ for
any positive integer n.
C32† For the matrix $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$, find $A^2$, $A^3$, $A^4$. Find a general formula for $A^n$
for any positive integer n.
C33† For the matrix $A = \begin{bmatrix} 0 & 1 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$, find $A^2$, $A^3$, $A^4$. Find a general formula for $A^n$
for any positive integer n.
T10† Suppose that A is a square matrix and there is a vector, b, such that LS(A, b) has
a unique solution. Prove that A is nonsingular. Give a direct proof (perhaps appealing to
Theorem PSPHS) rather than just negating a sentence from the text discussing a similar
situation.
T12 The conclusion of Theorem AIP is $\langle Ax, y\rangle = \langle x, A^* y\rangle$. Use the same hypotheses,
and prove the similar conclusion: $\langle x, Ay\rangle = \langle A^* x, y\rangle$. Two different approaches can be
based on an application of Theorem AIP. The first uses Theorem AA, while a second
uses Theorem IPAC. Can you provide two proofs?
T20     Prove the second part of Theorem MMZM.
T21     Prove the second part of Theorem MMIM.
T22     Prove the second part of Theorem MMDAA.
T23†    Prove the second part of Theorem MMSMM.
T31     Suppose that A is an m × n matrix and x, y ∈ N (A). Prove that x + y ∈ N (A).
T32     Suppose that A is an m × n matrix, α ∈ C, and x ∈ N (A). Prove that αx ∈ N (A).
T35     Suppose that A is an n × n matrix. Prove that A∗ A and AA∗ are Hermitian matrices.
T40† Suppose that A is an m × n matrix and B is an n × p matrix. Prove that the null
space of B is a subset of the null space of AB, that is $\mathcal{N}(B) \subseteq \mathcal{N}(AB)$. Provide an example
where the opposite is false, in other words give an example where $\mathcal{N}(AB) \not\subseteq \mathcal{N}(B)$.
T41† Suppose that A is an n × n nonsingular matrix and B is an n × p matrix. Prove that
the null space of B is equal to the null space of AB, that is N (B) = N (AB). (Compare
with Exercise MM.T40.)
T50 Suppose u and v are any two solutions of the linear system LS(A, b). Prove that
u − v is an element of the null space of A, that is, u − v ∈ N (A).
T51† Give a new proof of Theorem PSPHS replacing applications of Theorem SLSLC
with matrix-vector products (Theorem SLEMM).
T52† Suppose that $x, y \in \mathbb{C}^n$, $b \in \mathbb{C}^m$ and A is an m × n matrix. If x, y and x + y are
each a solution to the linear system LS(A, b), what can you say that is interesting about
b? Form an implication with the existence of the three solutions as the hypothesis and an
interesting statement about LS(A, b) as the conclusion, and then give a proof.
Section MISLE
Matrix Inverses and Systems of Linear Equations
The inverse of a square matrix, and solutions to linear systems with square coefficient
matrices, are intimately connected.


Subsection SI
Solutions and Inverses
We begin with a familiar example, performed in a novel way.
Example SABMI Solutions to Archetype B with a matrix inverse
Archetype B is the system of m = 3 linear equations in n = 3 variables,
\begin{align*}
-7x_1 - 6x_2 - 12x_3 &= -33\\
5x_1 + 5x_2 + 7x_3 &= 24\\
x_1 + 4x_3 &= 5
\end{align*}
   By Theorem SLEMM we can represent this system of equations as
                                         Ax = b
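where
\[
A = \begin{bmatrix} -7 & -6 & -12 \\ 5 & 5 & 7 \\ 1 & 0 & 4 \end{bmatrix}
\qquad
x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
\qquad
b = \begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}
\]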
   Now, entirely unmotivated, we define the 3 × 3 matrix B,
\[
B = \begin{bmatrix} -10 & -12 & -9 \\ \frac{13}{2} & 8 & \frac{11}{2} \\ \frac{5}{2} & 3 & \frac{5}{2} \end{bmatrix}
\]
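and note the remarkable fact that
\[
BA = \begin{bmatrix} -10 & -12 & -9 \\ \frac{13}{2} & 8 & \frac{11}{2} \\ \frac{5}{2} & 3 & \frac{5}{2} \end{bmatrix}
\begin{bmatrix} -7 & -6 & -12 \\ 5 & 5 & 7 \\ 1 & 0 & 4 \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]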

   Now apply this computation to the problem of solving the system of equations,
                   x = I3 x                        Theorem MMIM
                     = (BA)x                       Substitution
                     = B(Ax)                       Theorem MMA
                     = Bb                          Substitution


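   So we have
\[
x = Bb = \begin{bmatrix} -10 & -12 & -9 \\ \frac{13}{2} & 8 & \frac{11}{2} \\ \frac{5}{2} & 3 & \frac{5}{2} \end{bmatrix}
\begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}
= \begin{bmatrix} -3 \\ 5 \\ 2 \end{bmatrix}
\]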

    So with the help and assistance of B we have been able to determine a solution
to the system represented by Ax = b through judicious use of matrix multiplication.
We know by Theorem NMUS that since the coefficient matrix in this example is
nonsingular, there would be a unique solution, no matter what the choice of b. The
derivation above amplifies this result, since we were forced to conclude that x = Bb
and the solution could not be anything else. You should notice that this argument
would hold for any particular choice of b.                                         △
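
The arithmetic of Example SABMI can be reproduced exactly with rational numbers. This is a minimal sketch using Python's standard `fractions` module; the helper names `matmul` and `matvec` are assumptions for this illustration, while A, B and b are the matrices and vector of the example.

```python
from fractions import Fraction as F

def matmul(A, B):
    """[AB]_ij = sum over k of [A]_ik [B]_kj (Theorem EMP)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * x_k for a, x_k in zip(row, x)) for row in A]

A = [[-7, -6, -12],
     [5, 5, 7],
     [1, 0, 4]]
B = [[-10, -12, -9],
     [F(13, 2), 8, F(11, 2)],
     [F(5, 2), 3, F(5, 2)]]
b = [-33, 24, 5]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

BA = matmul(B, A)
x = matvec(B, b)        # x = Bb

print(BA == I3)         # True
print(x == [-3, 5, 2])  # True
print(matvec(A, x) == b)  # True: x really does solve Ax = b
```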
    The matrix B of the previous example is called the inverse of A. When A and B
are combined via matrix multiplication, the result is the identity matrix, which can
be inserted “in front” of x as the first step in finding the solution. This is entirely
analogous to how we might solve a single linear equation like 3x = 12.
\[
x = 1x = \left(\frac{1}{3}(3)\right)x = \frac{1}{3}(3x) = \frac{1}{3}(12) = 4
\]
    Here we have obtained a solution by employing the “multiplicative inverse” of
3, $3^{-1} = \frac{1}{3}$. This works fine for any scalar multiple of x, except for zero, since zero
does not have a multiplicative inverse. Consider separately the two linear equations,
                        0x = 12                          0x = 0
   The first has no solutions, while the second has infinitely many solutions. For
matrices, it is all just a little more complicated. Some matrices have inverses, some
do not. And when a matrix does have an inverse, just how would we compute it?
In other words, just where did that matrix B in the last example come from? Are
there other matrices that might have worked just as well?


Subsection IM
Inverse of a Matrix
Definition MI Matrix Inverse
Suppose A and B are square matrices of size n such that $AB = I_n$ and $BA = I_n$.
Then A is invertible and B is the inverse of A. In this situation, we write $B = A^{-1}$.

   Notice that if B is the inverse of A, then we can just as easily say A is the inverse
of B, or A and B are inverses of each other.
   Not every square matrix has an inverse. In Example SABMI the matrix B is
the inverse of the coefficient matrix of Archetype B. To see this it only remains to
check that AB = I3 . What about Archetype A? It is an example of a square matrix
without an inverse.
Example MWIAA A matrix without an inverse, Archetype A
Consider the coefficient matrix from Archetype A,
\[
A = \begin{bmatrix} 1 & -1 & 2 \\ 2 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}
\]
    Suppose that A is invertible and does have an inverse, say B. Choose the vector
of constants
\[
b = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}
\]
and consider the system of equations LS(A, b). Just as in Example SABMI, this
vector equation would have the unique solution x = Bb.
     However, the system LS(A, b) is inconsistent. Form the augmented matrix
[ A | b] and row-reduce to
\[
\begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]
which allows us to recognize the inconsistency by Theorem RCLS.
   So the assumption of A’s inverse leads to a logical inconsistency (the system
cannot be both consistent and inconsistent), so our assumption is false. A is not
invertible.
   It is possible this example is less than satisfying. Just where did that particular
choice of the vector b come from anyway? Stay tuned for an application of the future
Theorem CSCS in Example CSAA.                                                       △
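
The row-reduction in Example MWIAA can be reproduced with exact rational arithmetic. The routine `rref` below is a generic Gauss-Jordan sketch written for this illustration (it is an assumption, not a procedure quoted from the text); the augmented matrix is the one from the example.

```python
from fractions import Fraction

def rref(M):
    """Return the reduced row-echelon form of M, using exact fractions."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0  # index of the next pivot row
    for c in range(cols):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        lead = M[r][c]
        M[r] = [v / lead for v in M[r]]          # scale to a leading 1
        for i in range(rows):                     # clear the rest of column c
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [a - factor * p for a, p in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return M

# Augmented matrix [A | b] for Archetype A with b = (1, 3, 2).
augmented = [[1, -1, 2, 1],
             [2, 1, 1, 3],
             [1, 1, 0, 2]]
R = rref(augmented)

# A row of the form 0 0 0 | nonzero signals an inconsistent system.
inconsistent = any(all(v == 0 for v in row[:-1]) and row[-1] != 0 for row in R)
print(R == [[1, 0, 1, 0], [0, 1, -1, 0], [0, 0, 0, 1]], inconsistent)  # True True
```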
   Let us look at one more matrix inverse before we embark on a more systematic
study.
Example MI Matrix inverse
Consider the matrices,
\[
A = \begin{bmatrix}
1 & 2 & 1 & 2 & 1 \\
-2 & -3 & 0 & -5 & -1 \\
1 & 1 & 0 & 2 & 1 \\
-2 & -3 & -1 & -3 & -2 \\
-1 & -3 & -1 & -3 & 1
\end{bmatrix}
\qquad
B = \begin{bmatrix}
-3 & 3 & 6 & -1 & -2 \\
0 & -2 & -5 & -1 & 1 \\
1 & 2 & 4 & 1 & -1 \\
1 & 0 & 1 & 1 & 0 \\
1 & -1 & -2 & 0 & 1
\end{bmatrix}
\]
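   Then
\[
AB = \begin{bmatrix}
1 & 2 & 1 & 2 & 1 \\
-2 & -3 & 0 & -5 & -1 \\
1 & 1 & 0 & 2 & 1 \\
-2 & -3 & -1 & -3 & -2 \\
-1 & -3 & -1 & -3 & 1
\end{bmatrix}
\begin{bmatrix}
-3 & 3 & 6 & -1 & -2 \\
0 & -2 & -5 & -1 & 1 \\
1 & 2 & 4 & 1 & -1 \\
1 & 0 & 1 & 1 & 0 \\
1 & -1 & -2 & 0 & 1
\end{bmatrix}
= \begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]
and
\[
BA = \begin{bmatrix}
-3 & 3 & 6 & -1 & -2 \\
0 & -2 & -5 & -1 & 1 \\
1 & 2 & 4 & 1 & -1 \\
1 & 0 & 1 & 1 & 0 \\
1 & -1 & -2 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
1 & 2 & 1 & 2 & 1 \\
-2 & -3 & 0 & -5 & -1 \\
1 & 1 & 0 & 2 & 1 \\
-2 & -3 & -1 & -3 & -2 \\
-1 & -3 & -1 & -3 & 1
\end{bmatrix}
= \begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]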
so by Definition MI, we can say that A is invertible and write $B = A^{-1}$.             △
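
Both products required by Definition MI can be confirmed mechanically. In this sketch the helpers `matmul` and `identity` are assumptions introduced for illustration; A and B are the matrices of Example MI.

```python
def matmul(A, B):
    """[AB]_ij = sum over k of [A]_ik [B]_kj (Theorem EMP)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def identity(n):
    """Return the n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[1, 2, 1, 2, 1],
     [-2, -3, 0, -5, -1],
     [1, 1, 0, 2, 1],
     [-2, -3, -1, -3, -2],
     [-1, -3, -1, -3, 1]]
B = [[-3, 3, 6, -1, -2],
     [0, -2, -5, -1, 1],
     [1, 2, 4, 1, -1],
     [1, 0, 1, 1, 0],
     [1, -1, -2, 0, 1]]

# Definition MI asks for both AB = I_5 and BA = I_5.
ab_is_identity = matmul(A, B) == identity(5)
ba_is_identity = matmul(B, A) == identity(5)
print(ab_is_identity, ba_is_identity)   # True True
```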
    We will now concern ourselves less with whether or not an inverse of a matrix
exists, but instead with how you can find one when it does exist. In Section MINM
we will have some theorems that allow us to more quickly and easily determine just
when a matrix is invertible.

Subsection CIM
Computing the Inverse of a Matrix
We have seen that the matrices from Archetype B and Archetype K both have
inverses, but these inverse matrices have just dropped from the sky. How would
we compute an inverse? And just when is a matrix invertible, and when is it not?
Writing a putative inverse with $n^2$ unknowns and solving the resultant $n^2$ equations
is one approach. Applying this approach to 2 × 2 matrices can get us somewhere, so
just for fun, let us do it.
Theorem TTMI Two-by-Two Matrix Inverse
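Suppose
\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]
   Then A is invertible if and only if $ad - bc \neq 0$. When A is invertible, then
\[
A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
\]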

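Proof. (⇐) Assume that $ad - bc \neq 0$. We will use the definition of the inverse of a
matrix to establish that A has an inverse (Definition MI). Note that if $ad - bc \neq 0$
then the displayed formula for $A^{-1}$ is legitimate, since we are not dividing by zero.
Using this proposed formula for the inverse of A, we compute
\[
AA^{-1}
= \begin{bmatrix} a & b \\ c & d \end{bmatrix}
  \left( \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \right)
= \frac{1}{ad - bc} \begin{bmatrix} ad - bc & 0 \\ 0 & ad - bc \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]
and
\[
A^{-1}A
= \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
  \begin{bmatrix} a & b \\ c & d \end{bmatrix}
= \frac{1}{ad - bc} \begin{bmatrix} ad - bc & 0 \\ 0 & ad - bc \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]
   By Definition MI this is sufficient to establish that A is invertible, and that the
expression for $A^{-1}$ is correct.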
   (⇒) Assume that A is invertible, and proceed with a proof by contradiction
(Proof Technique CD), by assuming also that ad − bc = 0. This translates to ad = bc.
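Let
\[
B = \begin{bmatrix} e & f \\ g & h \end{bmatrix}
\]
be a putative inverse of A.
   This means that
\[
I_2 = AB
= \begin{bmatrix} a & b \\ c & d \end{bmatrix}
  \begin{bmatrix} e & f \\ g & h \end{bmatrix}
= \begin{bmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{bmatrix}
\]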
   Working on the matrices on two ends of this equation, we will multiply the top
row by c and the bottom row by a.
\[
\begin{bmatrix} c & 0 \\ 0 & a \end{bmatrix}
= \begin{bmatrix} ace + bcg & acf + bch \\ ace + adg & acf + adh \end{bmatrix}
\]
   We are assuming that ad = bc, so we can replace two occurrences of ad by bc in
the bottom row of the right matrix.
\[
\begin{bmatrix} c & 0 \\ 0 & a \end{bmatrix}
= \begin{bmatrix} ace + bcg & acf + bch \\ ace + bcg & acf + bch \end{bmatrix}
\]
    The matrix on the right now has two rows that are identical, and therefore the
same must be true of the matrix on the left. Comparing the rows (c, 0) and (0, a)
of the matrix on the left entry by entry gives c = 0 and a = 0.
    With this information, the product AB becomes
\[
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
= I_2 = AB
= \begin{bmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{bmatrix}
= \begin{bmatrix} bg & bh \\ dg & dh \end{bmatrix}
\]
   So bg = dh = 1 and thus b, g, d, h are all nonzero. But then bh and dg (the “other
corners”) must also be nonzero, while the equation above requires bh = dg = 0, so this
is (finally) a contradiction. So our assumption was false and we see that
$ad - bc \neq 0$ whenever A has an inverse.

    There are several ways one could try to prove this theorem, but there is a continual
temptation to divide by one of the eight entries involved (a through h), and we can
never be sure if these numbers are zero or not. This could lead to an analysis by
cases, which is messy, messy, messy. Note how the above proof never divides, but
always multiplies, and how zero/nonzero considerations are handled. Pay attention
to the expression ad − bc, as we will see it again in a while (Chapter D).
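The formula of Theorem TTMI translates directly into a few lines of code. The sketch below is only an illustration, in plain Python with exact rational arithmetic from the standard `fractions` module; the function name `inverse_2x2` is hypothetical, integer entries are assumed, and returning `None` for the non-invertible case is a choice made here, not something the theorem prescribes.

```python
from fractions import Fraction


def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] by Theorem TTMI, or None when ad - bc = 0.

    Assumes integer entries; results are exact fractions.
    """
    det = a * d - b * c
    if det == 0:
        return None  # Theorem TTMI: the matrix is not invertible
    s = Fraction(1, det)  # the scalar 1/(ad - bc)
    return [[s * d, s * (-b)], [s * (-c), s * a]]


print(inverse_2x2(1, 2, 3, 4))  # entries -2, 1, 3/2, -1/2 (as Fractions)
print(inverse_2x2(1, 2, 2, 4))  # prints None, since 1*4 - 2*2 = 0
```

Like the proof, the code only ever divides by the single quantity ad − bc, and only after checking that it is nonzero.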
    This theorem is cute, and it is nice to have a formula for the inverse, and a condition
that tells us when we can use it. However, this approach becomes impractical for
larger matrices, even though it is possible to demonstrate that, in theory, there is
a general formula. (Think for a minute about extending this result to just 3 × 3
matrices. For starters, we need 18 letters!) Instead, we will work column-by-column.
Let us first work an example that will motivate the main theorem and remove some
of the previous mystery.
Example CMI Computing a matrix inverse
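Consider the matrix defined in Example MI as,
\[
A = \begin{bmatrix}
1 & 2 & 1 & 2 & 1 \\
-2 & -3 & 0 & -5 & -1 \\
1 & 1 & 0 & 2 & 1 \\
-2 & -3 & -1 & -3 & -2 \\
-1 & -3 & -1 & -3 & 1
\end{bmatrix}
\]
    For its inverse, we desire a matrix B so that $AB = I_5$. Emphasizing the structure
of the columns and employing the definition of matrix multiplication (Definition MM),
\begin{align*}
AB &= I_5 \\
A[\mathbf{B}_1|\mathbf{B}_2|\mathbf{B}_3|\mathbf{B}_4|\mathbf{B}_5] &= [\mathbf{e}_1|\mathbf{e}_2|\mathbf{e}_3|\mathbf{e}_4|\mathbf{e}_5] \\
[A\mathbf{B}_1|A\mathbf{B}_2|A\mathbf{B}_3|A\mathbf{B}_4|A\mathbf{B}_5] &= [\mathbf{e}_1|\mathbf{e}_2|\mathbf{e}_3|\mathbf{e}_4|\mathbf{e}_5]
\end{align*}
      Equating the matrices column-by-column we have
\[
A\mathbf{B}_1 = \mathbf{e}_1 \qquad A\mathbf{B}_2 = \mathbf{e}_2 \qquad A\mathbf{B}_3 = \mathbf{e}_3 \qquad A\mathbf{B}_4 = \mathbf{e}_4 \qquad A\mathbf{B}_5 = \mathbf{e}_5.
\]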
    Since the matrix B is what we are trying to compute, we can view each column,
Bi , as a column vector of unknowns. Then we have five systems of equations to
solve, each with 5 equations in 5 variables. Notice that all 5 of these systems have
the same coefficient matrix. We will now solve each system in turn,




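Row-reduce the augmented matrix of the linear system LS(A, $\mathbf{e}_1$),
\[
\begin{bmatrix}
1 & 2 & 1 & 2 & 1 & 1 \\
-2 & -3 & 0 & -5 & -1 & 0 \\
1 & 1 & 0 & 2 & 1 & 0 \\
-2 & -3 & -1 & -3 & -2 & 0 \\
-1 & -3 & -1 & -3 & 1 & 0
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & -3 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 1
\end{bmatrix};
\quad
\mathbf{B}_1 = \begin{bmatrix} -3 \\ 0 \\ 1 \\ 1 \\ 1 \end{bmatrix}
\]
Row-reduce the augmented matrix of the linear system LS(A, $\mathbf{e}_2$),
\[
\begin{bmatrix}
1 & 2 & 1 & 2 & 1 & 0 \\
-2 & -3 & 0 & -5 & -1 & 1 \\
1 & 1 & 0 & 2 & 1 & 0 \\
-2 & -3 & -1 & -3 & -2 & 0 \\
-1 & -3 & -1 & -3 & 1 & 0
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 3 \\
0 & 1 & 0 & 0 & 0 & -2 \\
0 & 0 & 1 & 0 & 0 & 2 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & -1
\end{bmatrix};
\quad
\mathbf{B}_2 = \begin{bmatrix} 3 \\ -2 \\ 2 \\ 0 \\ -1 \end{bmatrix}
\]
Row-reduce the augmented matrix of the linear system LS(A, $\mathbf{e}_3$),
\[
\begin{bmatrix}
1 & 2 & 1 & 2 & 1 & 0 \\
-2 & -3 & 0 & -5 & -1 & 0 \\
1 & 1 & 0 & 2 & 1 & 1 \\
-2 & -3 & -1 & -3 & -2 & 0 \\
-1 & -3 & -1 & -3 & 1 & 0
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 6 \\
0 & 1 & 0 & 0 & 0 & -5 \\
0 & 0 & 1 & 0 & 0 & 4 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & -2
\end{bmatrix};
\quad
\mathbf{B}_3 = \begin{bmatrix} 6 \\ -5 \\ 4 \\ 1 \\ -2 \end{bmatrix}
\]
Row-reduce the augmented matrix of the linear system LS(A, $\mathbf{e}_4$),
\[
\begin{bmatrix}
1 & 2 & 1 & 2 & 1 & 0 \\
-2 & -3 & 0 & -5 & -1 & 0 \\
1 & 1 & 0 & 2 & 1 & 0 \\
-2 & -3 & -1 & -3 & -2 & 1 \\
-1 & -3 & -1 & -3 & 1 & 0
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & -1 \\
0 & 1 & 0 & 0 & 0 & -1 \\
0 & 0 & 1 & 0 & 0 & 1 \\
0 & 0 & 0 & 1 & 0 & 1 \\
0 & 0 & 0 & 0 & 1 & 0
\end{bmatrix};
\quad
\mathbf{B}_4 = \begin{bmatrix} -1 \\ -1 \\ 1 \\ 1 \\ 0 \end{bmatrix}
\]
Row-reduce the augmented matrix of the linear system LS(A, $\mathbf{e}_5$),
\[
\begin{bmatrix}
1 & 2 & 1 & 2 & 1 & 0 \\
-2 & -3 & 0 & -5 & -1 & 0 \\
1 & 1 & 0 & 2 & 1 & 0 \\
-2 & -3 & -1 & -3 & -2 & 0 \\
-1 & -3 & -1 & -3 & 1 & 1
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & -2 \\
0 & 1 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & -1 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1
\end{bmatrix};
\quad
\mathbf{B}_5 = \begin{bmatrix} -2 \\ 1 \\ -1 \\ 0 \\ 1 \end{bmatrix}
\]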


We can now collect our 5 solution vectors into the matrix B,
\begin{align*}
B &= [\mathbf{B}_1|\mathbf{B}_2|\mathbf{B}_3|\mathbf{B}_4|\mathbf{B}_5] \\
&= \left[\begin{array}{r|r|r|r|r}
-3 & 3 & 6 & -1 & -2 \\
0 & -2 & -5 & -1 & 1 \\
1 & 2 & 4 & 1 & -1 \\
1 & 0 & 1 & 1 & 0 \\
1 & -1 & -2 & 0 & 1
\end{array}\right] \\
&= \begin{bmatrix}
-3 & 3 & 6 & -1 & -2 \\
0 & -2 & -5 & -1 & 1 \\
1 & 2 & 4 & 1 & -1 \\
1 & 0 & 1 & 1 & 0 \\
1 & -1 & -2 & 0 & 1
\end{bmatrix}
\end{align*}
   By this method, we know that $AB = I_5$. Check that $BA = I_5$, and then we will
know that we have the inverse of A.
   Notice how the five systems of equations in the preceding example were all solved
by exactly the same sequence of row operations. Would it not be nice to avoid this
obvious duplication of effort? Our main theorem for this section follows, and it
mimics this previous example, while also avoiding all the overhead.
Theorem CINM Computing the Inverse of a Nonsingular Matrix
Suppose A is a nonsingular square matrix of size n. Create the n × 2n matrix M by
placing the n × n identity matrix In to the right of the matrix A. Let N be a matrix
that is row-equivalent to M and in reduced row-echelon form. Finally, let J be the
matrix formed from the final n columns of N . Then AJ = In .


Proof. A is nonsingular, so by Theorem NMRRI there is a sequence of row operations
that will convert A into In . It is this same sequence of row operations that will
convert M into N , since having the identity matrix in the first n columns of N is
sufficient to guarantee that N is in reduced row-echelon form.
    If we consider the systems of linear equations, LS(A, $\mathbf{e}_i$), $1 \leq i \leq n$, we see that
the aforementioned sequence of row operations will also bring the augmented matrix
of each of these systems into reduced row-echelon form. Furthermore, the unique
solution to LS(A, $\mathbf{e}_i$) appears in column n + 1 of the row-reduced augmented matrix
of the system and is identical to column n + i of N. Let
$\mathbf{N}_1, \mathbf{N}_2, \mathbf{N}_3, \ldots, \mathbf{N}_{2n}$ denote the columns of N. So we find,
\begin{align*}
AJ &= A[\mathbf{N}_{n+1}|\mathbf{N}_{n+2}|\mathbf{N}_{n+3}|\ldots|\mathbf{N}_{n+n}] \\
&= [A\mathbf{N}_{n+1}|A\mathbf{N}_{n+2}|A\mathbf{N}_{n+3}|\ldots|A\mathbf{N}_{n+n}] && \text{Definition MM} \\
&= [\mathbf{e}_1|\mathbf{e}_2|\mathbf{e}_3|\ldots|\mathbf{e}_n] \\
&= I_n && \text{Definition IM}
\end{align*}
as desired.
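The procedure in Theorem CINM is short enough to sketch in code. What follows is an illustration only, written in plain Python with exact arithmetic from the standard `fractions` module; the function names `rref` and `inverse_by_row_reduction` are hypothetical, and the particular order of row operations is one reasonable choice among many. Returning `None` when the first n columns do not reduce to the identity is a convention adopted here for matrices the theorem does not cover.

```python
from fractions import Fraction


def rref(M):
    """Reduced row-echelon form of M (a list of rows), using exact fractions."""
    M = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0  # index of the next pivot row
    for c in range(cols):
        # Look for a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]        # swap rows
        p = M[r][c]
        M[r] = [x / p for x in M[r]]           # scale to get a leading 1
        for i in range(rows):                  # clear the rest of column c
            if i != r and M[i][c] != 0:
                factor = M[i][c]
                M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return M


def inverse_by_row_reduction(A):
    """Form M = [A | I_n], row-reduce to N, and return J, the final n columns of N.

    Returns None if the first n columns of N are not I_n, in which case
    A is singular and Theorem CINM does not apply.
    """
    n = len(A)
    M = [list(A[i]) + [1 if i == j else 0 for j in range(n)] for i in range(n)]
    N = rref(M)
    if any(N[i][j] != (1 if i == j else 0) for i in range(n) for j in range(n)):
        return None
    return [row[n:] for row in N]


# The matrix A of Example CMI.
A = [
    [1, 2, 1, 2, 1],
    [-2, -3, 0, -5, -1],
    [1, 1, 0, 2, 1],
    [-2, -3, -1, -3, -2],
    [-1, -3, -1, -3, 1],
]
J = inverse_by_row_reduction(A)
for row in J:
    print([str(x) for x in row])
```

A single row reduction of the 5 × 10 matrix replaces the five separate reductions of Example CMI, and the rows printed are the rows of the matrix B found there.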


    We have to be just a bit careful here about both what this theorem says and
what it does not say. If A is a nonsingular matrix, then we are guaranteed a matrix
B such that AB = In , and the proof gives us a process for constructing B. However,
the definition of the inverse of a matrix (Definition MI) requires that BA = In also.
So at this juncture we must compute the matrix product in the “opposite” order
before we claim B as the inverse of A. However, we will soon see that this is always
the case, in Theorem OSIS, so the title of this theorem is not inaccurate.
    What if A is singular? At this point we only know that Theorem CINM cannot
be applied. The question of A’s inverse is still open. (But see Theorem NI in the
next section.)
    We will finish by computing the inverse for the coefficient matrix of Archetype
B, the one we just pulled from a hat in Example SABMI. There are more examples
in the Archetypes (Archetypes) to practice with, though notice that it is silly to ask
for the inverse of a rectangular matrix (the sizes are not right) and not every square
matrix has an inverse (remember Example MWIAA?).
Example CMIAB Computing a matrix inverse, Archetype B
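Archetype B has a coefficient matrix given as
\[
B = \begin{bmatrix} -7 & -6 & -12 \\ 5 & 5 & 7 \\ 1 & 0 & 4 \end{bmatrix}
\]
Exercising Theorem CINM we set
\[
M = \begin{bmatrix}
-7 & -6 & -12 & 1 & 0 & 0 \\
5 & 5 & 7 & 0 & 1 & 0 \\
1 & 0 & 4 & 0 & 0 & 1
\end{bmatrix}
\]
which row reduces to
\[
N = \begin{bmatrix}
1 & 0 & 0 & -10 & -12 & -9 \\
0 & 1 & 0 & \frac{13}{2} & 8 & \frac{11}{2} \\
0 & 0 & 1 & \frac{5}{2} & 3 & \frac{5}{2}
\end{bmatrix}.
\]
So
\[
B^{-1} = \begin{bmatrix}
-10 & -12 & -9 \\
\frac{13}{2} & 8 & \frac{11}{2} \\
\frac{5}{2} & 3 & \frac{5}{2}
\end{bmatrix}
\]
once we check that $B^{-1}B = I_3$ (the product in the opposite order is a consequence
of the theorem).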
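The remaining check, that the product of the computed inverse with B is the identity, involves fractions and is easy to get wrong by hand. A minimal sketch of that check in plain Python follows, using exact rationals from the standard `fractions` module; the names `B_inv` and `mat_mul` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction as F

# Coefficient matrix of Archetype B and the candidate inverse from Example CMIAB.
B = [[-7, -6, -12], [5, 5, 7], [1, 0, 4]]
B_inv = [
    [F(-10), F(-12), F(-9)],
    [F(13, 2), F(8), F(11, 2)],
    [F(5, 2), F(3), F(5, 2)],
]


def mat_mul(X, Y):
    """Entry (i, j) of XY is the dot product of row i of X with column j of Y."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(mat_mul(B_inv, B) == I3)  # prints True
print(mat_mul(B, B_inv) == I3)  # prints True
```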


Subsection PMI
Properties of Matrix Inverses
The inverse of a matrix enjoys some nice properties. We collect a few here. First, a
matrix can have but one inverse.
Theorem MIU Matrix Inverse is Unique
Suppose the square matrix A has an inverse. Then A−1 is unique.

Proof. As described in Proof Technique U, we will assume that A has two inverses.
The hypothesis tells us there is at least one. Suppose then that B and C are both
inverses for A, so we know by Definition MI that $AB = BA = I_n$ and $AC = CA = I_n$.
Then we have,
\begin{align*}
B &= BI_n && \text{Theorem MMIM} \\
&= B(AC) && \text{Definition MI} \\
&= (BA)C && \text{Theorem MMA} \\
&= I_nC && \text{Definition MI} \\
&= C && \text{Theorem MMIM}
\end{align*}
  So we conclude that B and C are the same, and cannot be different. So any
matrix that acts like an inverse, must be the inverse.

   When most of us dress in the morning, we put on our socks first, followed by our
shoes. In the evening we must then first remove our shoes, followed by our socks.
Try to connect the conclusion of the following theorem with this everyday example.
Theorem SS Socks and Shoes
Suppose A and B are invertible matrices of size n. Then AB is an invertible matrix
and $(AB)^{-1} = B^{-1}A^{-1}$.

Proof. At the risk of carrying our everyday analogies too far, the proof of this
theorem is quite easy when we compare it to the workings of a dating service. We
have a statement about the inverse of the matrix AB, which for all we know right
now might not even exist. Suppose AB was to sign up for a dating service with
two requirements for a compatible date. Upon multiplication on the left, and on
the right, the result should be the identity matrix. In other words, AB’s ideal date
would be its inverse.
   Now along comes the matrix $B^{-1}A^{-1}$ (which we know exists because our
hypothesis says both A and B are invertible and we can form the product of these
two matrices), also looking for a date. Let us see if $B^{-1}A^{-1}$ is a good match for
AB. First they meet at a noncommittal neutral location, say a coffee shop, for quiet
conversation:
\begin{align*}
(B^{-1}A^{-1})(AB) &= B^{-1}(A^{-1}A)B && \text{Theorem MMA} \\
&= B^{-1}I_nB && \text{Definition MI} \\
&= B^{-1}B && \text{Theorem MMIM} \\
&= I_n && \text{Definition MI}
\end{align*}
The first date having gone smoothly, a second, more serious, date is arranged, say
dinner and a show:
\begin{align*}
(AB)(B^{-1}A^{-1}) &= A(BB^{-1})A^{-1} && \text{Theorem MMA} \\
&= AI_nA^{-1} && \text{Definition MI} \\
&= AA^{-1} && \text{Theorem MMIM} \\
&= I_n && \text{Definition MI}
\end{align*}
   So the matrix $B^{-1}A^{-1}$ has met all of the requirements to be AB’s inverse (date)
and with the ensuing marriage proposal we can announce that $(AB)^{-1} = B^{-1}A^{-1}$.
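The order of the factors in Theorem SS matters, and a small numerical experiment makes this concrete. The sketch below is an illustration in plain Python, not part of the proof; the helpers `inv2` and `mul2` are hypothetical names, `inv2` uses the 2 × 2 formula of Theorem TTMI, and the two matrices are arbitrary invertible examples chosen for this purpose.

```python
from fractions import Fraction


def inv2(M):
    """Inverse of a 2 x 2 matrix via the formula of Theorem TTMI."""
    a, b, c, d = (Fraction(x) for row in M for x in row)
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def mul2(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]

lhs = inv2(mul2(A, B))            # (AB)^{-1}
rhs = mul2(inv2(B), inv2(A))      # B^{-1} A^{-1}, the order in Theorem SS
wrong = mul2(inv2(A), inv2(B))    # A^{-1} B^{-1}, the tempting but wrong order

print(lhs == rhs)    # prints True
print(lhs == wrong)  # prints False for this choice of A and B
```

For these two matrices the product in the "socks before shoes" order fails, which is the point of the everyday analogy: the last operation applied is the first one undone.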


Theorem MIMI Matrix Inverse of a Matrix Inverse
Suppose A is an invertible matrix. Then $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$.

Proof. As with the proof of Theorem SS, we examine if A is a suitable inverse for
$A^{-1}$ (by definition, the opposite is true).
\begin{align*}
AA^{-1} &= I_n && \text{Definition MI}
\end{align*}
and
\begin{align*}
A^{-1}A &= I_n && \text{Definition MI}
\end{align*}
   The matrix A has met all the requirements to be the inverse of $A^{-1}$, and so
$A^{-1}$ is invertible and we can write $A = (A^{-1})^{-1}$.

Theorem MIT Matrix Inverse of a Transpose
Suppose A is an invertible matrix. Then $A^t$ is invertible and $(A^t)^{-1} = (A^{-1})^t$.

Proof. As with the proof of Theorem SS, we see if $(A^{-1})^t$ is a suitable inverse for
$A^t$. Apply Theorem MMT to see that
\begin{align*}
(A^{-1})^tA^t &= (AA^{-1})^t && \text{Theorem MMT} \\
&= I_n^t && \text{Definition MI} \\
&= I_n && \text{Definition SYM}
\end{align*}
and
\begin{align*}
A^t(A^{-1})^t &= (A^{-1}A)^t && \text{Theorem MMT} \\
&= I_n^t && \text{Definition MI} \\
&= I_n && \text{Definition SYM}
\end{align*}
    The matrix $(A^{-1})^t$ has met all the requirements to be the inverse of $A^t$, and so
$A^t$ is invertible and we can write $(A^t)^{-1} = (A^{-1})^t$.

Theorem MISM Matrix Inverse of a Scalar Multiple
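Suppose A is an invertible matrix and α is a nonzero scalar. Then
$(\alpha A)^{-1} = \frac{1}{\alpha}A^{-1}$ and αA is invertible.

Proof. As with the proof of Theorem SS, we see if $\frac{1}{\alpha}A^{-1}$ is a suitable inverse for αA.
\begin{align*}
\left(\frac{1}{\alpha}A^{-1}\right)(\alpha A) &= \left(\frac{1}{\alpha}\alpha\right)\left(A^{-1}A\right) && \text{Theorem MMSMM} \\
&= 1I_n && \text{Scalar multiplicative inverses} \\
&= I_n && \text{Property OM}
\end{align*}
and
\begin{align*}
(\alpha A)\left(\frac{1}{\alpha}A^{-1}\right) &= \left(\alpha\frac{1}{\alpha}\right)\left(AA^{-1}\right) && \text{Theorem MMSMM} \\
&= 1I_n && \text{Scalar multiplicative inverses} \\
&= I_n && \text{Property OM}
\end{align*}
   The matrix $\frac{1}{\alpha}A^{-1}$ has met all the requirements to be the inverse of αA, so we
can write $(\alpha A)^{-1} = \frac{1}{\alpha}A^{-1}$.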
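Theorem MIMI, Theorem MIT and Theorem MISM can each be spot-checked on a concrete matrix. The following sketch, in plain Python with exact fractions, is only a sanity check on one arbitrary example and proves nothing in general; the helper names `inv2`, `transpose` and `scale` are hypothetical, and `inv2` relies on the 2 × 2 formula of Theorem TTMI.

```python
from fractions import Fraction


def inv2(M):
    """Inverse of a 2 x 2 matrix via the formula of Theorem TTMI."""
    a, b, c, d = (Fraction(x) for row in M for x in row)
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def transpose(M):
    """Rows become columns."""
    return [list(col) for col in zip(*M)]


def scale(alpha, M):
    """Scalar multiple alpha * M."""
    return [[alpha * x for x in row] for row in M]


A = [[1, 2], [3, 4]]
alpha = Fraction(5)

print(inv2(inv2(A)) == A)                                  # Theorem MIMI: prints True
print(inv2(transpose(A)) == transpose(inv2(A)))            # Theorem MIT: prints True
print(inv2(scale(alpha, A)) == scale(1 / alpha, inv2(A)))  # Theorem MISM: prints True
```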

    Notice that there are some likely theorems that are missing here. For example, it
would be tempting to think that $(A + B)^{-1} = A^{-1} + B^{-1}$, but this is false. Can you
find a counterexample? (See Exercise MISLE.T10.)

Reading Questions

1. Compute the inverse of the matrix below.
\[
\begin{bmatrix} -2 & 3 \\ -3 & 4 \end{bmatrix}
\]

2. Compute the inverse of the matrix below.
\[
\begin{bmatrix} 2 & 3 & 1 \\ 1 & -2 & -3 \\ -2 & 4 & 6 \end{bmatrix}
\]

3. Explain why Theorem SS has the title it does. (Do not just state the theorem, explain
   the choice of the title making reference to the theorem itself.)


Exercises
                                                                    
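C16†       If it exists, find the inverse of
\[
A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \\ 2 & -1 & 1 \end{bmatrix},
\]
and check your answer.
C17†       If it exists, find the inverse of
\[
A = \begin{bmatrix} 2 & -1 & 1 \\ 1 & 2 & 1 \\ 3 & 1 & 2 \end{bmatrix},
\]
and check your answer.
C18†       If it exists, find the inverse of
\[
A = \begin{bmatrix} 1 & 3 & 1 \\ 1 & 2 & 1 \\ 2 & 2 & 1 \end{bmatrix},
\]
and check your answer.
C19†       If it exists, find the inverse of
\[
A = \begin{bmatrix} 1 & 3 & 1 \\ 0 & 2 & 1 \\ 2 & 2 & 1 \end{bmatrix},
\]
and check your answer.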
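C21†       Verify that B is the inverse of A.
\[
A = \begin{bmatrix}
1 & 1 & -1 & 2 \\
-2 & -1 & 2 & -3 \\
1 & 1 & 0 & 2 \\
-1 & 2 & 0 & 2
\end{bmatrix}
\qquad
B = \begin{bmatrix}
4 & 2 & 0 & -1 \\
8 & 4 & -1 & -1 \\
-1 & 0 & 1 & 0 \\
-6 & -3 & 1 & 1
\end{bmatrix}
\]

C22†       Recycle the matrices A and B from Exercise MISLE.C21 and set
\[
\mathbf{c} = \begin{bmatrix} 2 \\ 1 \\ -3 \\ 2 \end{bmatrix}
\qquad
\mathbf{d} = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}
\]
Employ the matrix B to solve the two linear systems LS(A, c) and LS(A, d).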
C23       If it exists, find the inverse of the 2 × 2 matrix
\[
A = \begin{bmatrix} 7 & 3 \\ 5 & 2 \end{bmatrix}
\]
and check your answer. (See Theorem TTMI.)
C24    If it exists, find the inverse of the 2 × 2 matrix
\[
A = \begin{bmatrix} 6 & 3 \\ 4 & 2 \end{bmatrix}
\]
and check your answer. (See Theorem TTMI.)
C25 At the conclusion of Example CMI, verify that BA = I5 by computing the matrix
product.
C26†    Let
\[
D = \begin{bmatrix}
1 & -1 & 3 & -2 & 1 \\
-2 & 3 & -5 & 3 & 0 \\
1 & -1 & 4 & -2 & 2 \\
-1 & 4 & -1 & 0 & 4 \\
1 & 0 & 5 & -2 & 5
\end{bmatrix}
\]
Compute the inverse of D, $D^{-1}$, by forming the 5 × 10 matrix $[\, D \mid I_5 \,]$ and row-reducing
(Theorem CINM). Then use a calculator to compute $D^{-1}$ directly.
C27†    Let
\[
E = \begin{bmatrix}
1 & -1 & 3 & -2 & 1 \\
-2 & 3 & -5 & 3 & -1 \\
1 & -1 & 4 & -2 & 2 \\
-1 & 4 & -1 & 0 & 2 \\
1 & 0 & 5 & -2 & 4
\end{bmatrix}
\]
Compute the inverse of E, $E^{-1}$, by forming the 5 × 10 matrix $[\, E \mid I_5 \,]$ and row-reducing
(Theorem CINM). Then use a calculator to compute $E^{-1}$ directly.
C28†    Let
\[
C = \begin{bmatrix}
1 & 1 & 3 & 1 \\
-2 & -1 & -4 & -1 \\
1 & 4 & 10 & 2 \\
-2 & 0 & -4 & 5
\end{bmatrix}
\]
Compute the inverse of C, $C^{-1}$, by forming the 4 × 8 matrix $[\, C \mid I_4 \,]$ and row-reducing
(Theorem CINM). Then use a calculator to compute $C^{-1}$ directly.
C40† Find all solutions to the system of equations below, making use of the matrix inverse
found in Exercise MISLE.C28.
\begin{align*}
x_1 + x_2 + 3x_3 + x_4 &= -4 \\
-2x_1 - x_2 - 4x_3 - x_4 &= 4 \\
x_1 + 4x_2 + 10x_3 + 2x_4 &= -20 \\
-2x_1 - 4x_3 + 5x_4 &= 9
\end{align*}

C41† Use the inverse of a matrix to find all the solutions to the following system of
equations.
\begin{align*}
x_1 + 2x_2 - x_3 &= -3 \\
2x_1 + 5x_2 - x_3 &= -4 \\
-x_1 - 4x_2 &= 2
\end{align*}

C42†    Use a matrix inverse to solve the linear system of equations.
\begin{align*}
x_1 - x_2 + 2x_3 &= 5 \\
x_1 - 2x_3 &= -8 \\
2x_1 - x_2 - x_3 &= -6
\end{align*}

T10† Construct an example to demonstrate that $(A + B)^{-1} = A^{-1} + B^{-1}$ is not true for
all square matrices A and B of the same size.
Section MINM
Matrix Inverses and Nonsingular Matrices
We saw in Theorem CINM that if a square matrix A is nonsingular, then there
is a matrix B so that AB = In . In other words, B is halfway to being an inverse
of A. We will see in this section that B automatically fulfills the second condition
(BA = In ). Example MWIAA showed us that the coefficient matrix from Archetype
A had no inverse. Not coincidentally, this coefficient matrix is singular. We will make
all these connections precise now. Not many examples or definitions in this section,
just theorems.

Subsection NMI
Nonsingular Matrices are Invertible
We need a couple of technical results for starters. Some books would call these minor,
but essential, results “lemmas.” We’ll just call ’em theorems. See Proof Technique
LC for more on the distinction.
    The first of these technical results is interesting in that the hypothesis says
something about a product of two square matrices and the conclusion then says the
same thing about each individual matrix in the product. This result has an analogy
in the algebra of complex numbers: suppose $\alpha, \beta \in \mathbb{C}$, then $\alpha\beta \neq 0$ if and only if
$\alpha \neq 0$ and $\beta \neq 0$. We can view this result as suggesting that the term “nonsingular”
for matrices is like the term “nonzero” for scalars. Consider too that we know
singular matrices, as coefficient matrices for systems of equations, will sometimes
lead to systems with no solutions, or systems with infinitely many solutions (Theorem
NMUS). What do linear equations with zero look like? Consider 0x = 5, which has
no solution, and 0x = 0, which has infinitely many solutions. In the algebra of scalars,
zero is exceptional (meaning different, not better), and in the algebra of matrices,
singular matrices are also the exception. While there is only one zero scalar, and
there are infinitely many singular matrices, we will see that singular matrices are a
distinct minority.
Theorem NPNT Nonsingular Product has Nonsingular Terms
Suppose that A and B are square matrices of size n. The product AB is nonsingular
if and only if A and B are both nonsingular.

Proof. (⇒) For this portion of the proof we will form the logically-equivalent
contrapositive and prove that statement using two cases. “AB is nonsingular implies A
and B are both nonsingular” becomes “A or B is singular implies AB is singular.”
(Be sure to understand why the “and” became an “or”, see Proof Technique CP.)
   Case 1. Suppose B is singular. Then there is a nonzero vector z that is a solution
to LS(B, 0). So
                 (AB)z = A(Bz)                     Theorem MMA
                        = A0                       Theorem SLEMM
                        =0                         Theorem MMZM


    With Theorem SLEMM we can translate this vector equality to the statement
that z is a nonzero solution to LS(AB, 0). Thus AB is singular (Definition NM), as
desired.
    Case 2. Suppose A is singular, and B is not singular. In other words, with Case
1 complete, we can be more precise about this remaining case and assume that B is
nonsingular. Because A is singular, there is a nonzero vector y that is a solution
to LS(A, 0). Now consider the linear system LS(B, y). Since B is nonsingular, the
system has a unique solution (Theorem NMUS), which we will denote as w. We first
claim w is not the zero vector either. Assuming the opposite, suppose that w = 0
(Proof Technique CD). Then
                      y = Bw                          Theorem SLEMM
                        = B0                          Hypothesis
                        = 0                           Theorem MMZM
contrary to y being nonzero. So w ≠ 0. The pieces are in place, so here we go,
                (AB)w = A(Bw)                         Theorem MMA
                      = Ay                            Theorem SLEMM
                      = 0                             Theorem SLEMM


    With Theorem SLEMM we can translate this vector equality to the statement
that w is a nonzero solution to LS(AB, 0). Thus AB is singular (Definition NM),
as desired. And this conclusion holds for both cases.
    (⇐) Now assume that both A and B are nonsingular. Suppose that x ∈ Cn is a
solution to LS(AB, 0). Then
                   0 = (AB) x                     Theorem SLEMM
                     = A (Bx)                     Theorem MMA
   By Theorem SLEMM, Bx is a solution to LS(A, 0), and by the definition of a
nonsingular matrix (Definition NM), we conclude that Bx = 0. Now, by an entirely
similar argument, the nonsingularity of B forces us to conclude that x = 0. So
the only solution to LS(AB, 0) is the zero vector and we conclude that AB is
nonsingular by Definition NM. ∎

    This is a powerful result in the “forward” direction, because it allows us to begin
with a hypothesis that something complicated (the matrix product AB) has the
property of being nonsingular, and we can then conclude that the simpler constituents
(A and B individually) then also have the property of being nonsingular. If we had
thought that the matrix product was an artificial construction, results like this would
make us begin to think twice.
    The contrapositive of this entire result is equally interesting. It says that A or B
(or both) is a singular matrix if and only if the product AB is singular. (See Proof
Technique CP.)
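    The two cases in the proof of Theorem NPNT can be traced on small matrices. The sketch below is only an illustration: the 2 × 2 matrices, the vectors z, y, w, and the helper names mat_mul and mat_vec are all made up for this purpose and do not come from the text.

```python
def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def mat_vec(X, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in X]

# Case 1 (hypothetical data): B is singular, since row 2 is twice row 1.
A = [[1, 2], [3, 5]]
B = [[1, 2], [2, 4]]
z = [2, -1]                                   # nonzero solution of LS(B, 0)
assert mat_vec(B, z) == [0, 0]
assert mat_vec(mat_mul(A, B), z) == [0, 0]    # so z also solves LS(AB, 0)

# Case 2 (hypothetical data): A2 is singular, B2 is nonsingular.
A2 = [[1, 2], [2, 4]]
B2 = [[1, 2], [3, 5]]
y = [2, -1]                                   # nonzero solution of LS(A2, 0)
w = [-12, 7]                                  # the unique solution of LS(B2, y)
assert mat_vec(B2, w) == y
assert mat_vec(mat_mul(A2, B2), w) == [0, 0]  # w is a nonzero solution of LS(A2 B2, 0)
```

In both cases the product has a nonzero vector in its null space, so it is singular, just as the contrapositive predicts.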
Theorem OSIS One-Sided Inverse is Sufficient
Suppose A and B are square matrices of size n such that AB = In . Then BA = In .

Proof. The matrix In is nonsingular (since it row-reduces easily to In , Theorem
NMRRI). So A and B are nonsingular by Theorem NPNT, so in particular B is
nonsingular. We can therefore apply Theorem CINM to assert the existence of a
matrix C so that BC = In . This application of Theorem CINM could be a bit
confusing, mostly because of the names of the matrices involved. B is nonsingular,
so there must be a “right-inverse” for B, and we are calling it C.
    Now
                 BA = (BA)In                          Theorem MMIM
                     = (BA)(BC)                       Theorem CINM
                     = B(AB)C                         Theorem MMA
                     = BIn C                          Hypothesis
                     = BC                             Theorem MMIM
                     = In                             Theorem CINM
which is the desired conclusion. ∎

   So Theorem OSIS tells us that if A is nonsingular, then the matrix B guaranteed
by Theorem CINM will be both a “right-inverse” and a “left-inverse” for A, so A is
invertible and A−1 = B.
   So if you have a nonsingular matrix, A, you can use the procedure described
in Theorem CINM to find an inverse for A. If A is singular, then the procedure
in Theorem CINM will fail as the first n columns of M will not row-reduce to
the identity matrix. However, we can say a bit more. When A is singular, then A
does not have an inverse (which is very different from saying that the procedure in
Theorem CINM fails to find an inverse). This may feel like we are splitting hairs,
but it is important that we do not make unfounded assumptions. These observations
motivate the next theorem.
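    As a rough sketch of the procedure from Theorem CINM, the code below row-reduces [A | In] with exact fractions and reads the inverse off the last n columns. The function name inverse_via_row_reduction, the decision to return None for a singular matrix, and the sample matrices are all assumptions made for illustration.

```python
from fractions import Fraction

def inverse_via_row_reduction(A):
    """Row-reduce [A | I_n]; return the right half, or None if A is singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Look for a nonzero entry in this column, at or below the diagonal.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None              # the first n columns cannot reduce to I_n
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [x / lead for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[2, 1], [5, 3]]                 # hypothetical nonsingular matrix
B = inverse_via_row_reduction(A)
I2 = [[1, 0], [0, 1]]
assert mat_mul(A, B) == I2           # B is a right-inverse (Theorem CINM)
assert mat_mul(B, A) == I2           # and also a left-inverse (Theorem OSIS)

S = [[1, 2], [2, 4]]                 # hypothetical singular matrix
assert inverse_via_row_reduction(S) is None
```

Note that a return value of None only reports that this procedure failed; the fact that no inverse exists at all for a singular matrix is a separate statement that needs its own proof.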
Theorem NI Nonsingularity is Invertibility
Suppose that A is a square matrix. Then A is nonsingular if and only if A is invertible.

Proof. (⇐) Since A is invertible, we can write In = AA−1 (Definition MI). Notice
that In is nonsingular (Theorem NMRRI) so Theorem NPNT implies that A (and
A−1 ) is nonsingular.
   (⇒) Suppose now that A is nonsingular. By Theorem CINM we find B so that
AB = In . Then Theorem OSIS tells us that BA = In . So B is A’s inverse, and by
construction, A is invertible. ∎

    So for a square matrix, the properties of having an inverse and of having a trivial
null space are one and the same. Cannot have one without the other.
Theorem NME3 Nonsingular Matrix Equivalences, Round 3
Suppose that A is a square matrix of size n. The following are equivalent.

   1. A is nonsingular.

   2. A row-reduces to the identity matrix.

   3. The null space of A contains only the zero vector, N (A) = {0}.

   4. The linear system LS(A, b) has a unique solution for every possible choice of
      b.

   5. The columns of A are a linearly independent set.

   6. A is invertible.

Proof. We can update our list of equivalences for nonsingular matrices (Theorem
NME2) with the equivalent condition from Theorem NI. ∎

   In the case that A is a nonsingular coefficient matrix of a system of equations,
the inverse allows us to very quickly compute the unique solution, for any vector of
constants.
Theorem SNCM Solution with Nonsingular Coefficient Matrix
Suppose that A is nonsingular. Then the unique solution to LS(A, b) is A−1 b.

Proof. By Theorem NMUS we know already that LS(A, b) has a unique solution
for every choice of b. We need to show that the expression stated is indeed a solution
(the solution). That is easy, just “plug it in” to the vector equation representation
of the system (Theorem SLEMM),
               A (A−1 b) = (AA−1 ) b                  Theorem MMA
                         = In b                       Definition MI
                         = b                          Theorem MMIM
   Since Ax = b is true when we substitute A−1 b for x, A−1 b is a (the!) solution
to LS(A, b). ∎
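    A small numerical illustration of Theorem SNCM follows. The matrix A, its inverse A_inv (assumed to be already known), and the two vectors of constants are hypothetical values chosen so the arithmetic is easy to follow by hand.

```python
def mat_vec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[2, 1], [5, 3]]          # hypothetical nonsingular coefficient matrix
A_inv = [[3, -1], [-5, 2]]    # its inverse, assumed to be known already

# Once the inverse is in hand, each new vector of constants costs one product.
b1 = [4, 11]
x1 = mat_vec(A_inv, b1)
assert mat_vec(A, x1) == b1   # x1 really solves LS(A, b1)

b2 = [0, 1]
x2 = mat_vec(A_inv, b2)
assert mat_vec(A, x2) == b2   # and x2 solves LS(A, b2)
```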

Subsection UM
Unitary Matrices
Recall that the adjoint of a matrix is A∗ = (Ā)^t, the transpose of the entrywise
complex conjugate of A (Definition A).
Definition UM Unitary Matrices
Suppose that U is a square matrix of size n such that U ∗ U = In . Then we say U is
unitary. □

  This condition may seem rather far-fetched at first glance. Would there be any
matrix that behaved this way? Well, yes, here is one.
Example UM3 Unitary matrix of size 3
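\[
U = \begin{bmatrix}
\frac{1+i}{\sqrt{5}} & \frac{3+2i}{\sqrt{55}} & \frac{2+2i}{\sqrt{22}} \\
\frac{1-i}{\sqrt{5}} & \frac{2+2i}{\sqrt{55}} & \frac{-3+i}{\sqrt{22}} \\
\frac{i}{\sqrt{5}} & \frac{3-5i}{\sqrt{55}} & -\frac{2}{\sqrt{22}}
\end{bmatrix}
\]
The computations get a bit tiresome, but if you work your way through the computation
of U ∗ U , you will arrive at the 3 × 3 identity matrix I3 . △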
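   For readers who would rather let a machine do the tiresome part, here is a minimal floating-point check of Example UM3. The helper names adjoint, mat_mul and is_identity, and the tolerance 1e-12, are choices made for this sketch; a floating-point check is evidence, not a proof.

```python
from math import sqrt

s5, s55, s22 = sqrt(5), sqrt(55), sqrt(22)
U = [[(1 + 1j) / s5, (3 + 2j) / s55, (2 + 2j) / s22],
     [(1 - 1j) / s5, (2 + 2j) / s55, (-3 + 1j) / s22],
     [1j / s5,       (3 - 5j) / s55, (-2 + 0j) / s22]]

def adjoint(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[M[i][j].conjugate() for i in range(len(M))]
            for j in range(len(M[0]))]

def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_identity(M, tol=1e-12):
    """True when every entry of M is within tol of the identity matrix."""
    return all(abs(M[i][j] - (1 if i == j else 0)) < tol
               for i in range(len(M)) for j in range(len(M)))

# Entry (i, j) of U* U is the sum over k of conj(U[k][i]) * U[k][j].
assert is_identity(mat_mul(adjoint(U), U))
```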
   Unitary matrices do not have to look quite so gruesome. Here is a larger one
that is a bit more pleasing.
Example UPM Unitary permutation matrix
The matrix
\[
P = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0
\end{bmatrix}
\]
is unitary as can be easily checked. Notice that it is just a rearrangement of the
columns of the 5 × 5 identity matrix, I5 (Definition IM).
    An interesting exercise is to build another 5 × 5 unitary matrix, R, using a
different rearrangement of the columns of I5 . Then form the product P R. This
will be another unitary matrix (Exercise MINM.T10). If you were to build all
5! = 5 × 4 × 3 × 2 × 1 = 120 matrices of this type you would have a set that
remains closed under matrix multiplication. It is an example of another algebraic
structure known as a group since together the set and the one operation (matrix
multiplication here) is closed, associative, has an identity (I5 ), and inverses (Theorem
UMI). Notice though that the operation in this group is not commutative! △
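    The claims in Example UPM can be spot-checked by brute force. In the sketch below, the encoding of a rearrangement as a tuple p (column j of the matrix is column p[j] of I5), the helper names, and the second matrix R are assumptions introduced here; only P comes from the example.

```python
from itertools import permutations

def perm_matrix(p):
    """Matrix whose column j is column p[j] of the identity matrix."""
    n = len(p)
    return [[1 if p[j] == i else 0 for j in range(n)] for i in range(n)]

def mat_mul(X, Y):
    """Product of two matrices stored as sequences of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def frozen(X):
    """Hashable copy of a matrix, so matrices can be stored in a set."""
    return tuple(map(tuple, X))

I5 = perm_matrix((0, 1, 2, 3, 4))
P = perm_matrix((2, 0, 4, 1, 3))   # the matrix P of Example UPM
R = perm_matrix((1, 0, 2, 3, 4))   # hypothetical R: swap the first two columns of I5

# The entries are real, so the adjoint is just the transpose.
assert mat_mul(transpose(P), P) == I5

all_perms = {frozen(perm_matrix(p)) for p in permutations(range(5))}
assert len(all_perms) == 120
# Multiplying P by any of the 120 matrices lands back in the same set.
assert all(frozen(mat_mul(P, M)) in all_perms for M in all_perms)
# The operation is not commutative.
assert mat_mul(P, R) != mat_mul(R, P)
```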
    If a matrix A has only real number entries (we say it is a real matrix) then
the defining property of being unitary simplifies to At A = In . In this case we, and
everybody else, call the matrix orthogonal, so you may often encounter this term
in your other reading when the complex numbers are not under consideration.
    Unitary matrices have easily computed inverses. They also have columns that
form orthonormal sets. Here are the theorems that show us that unitary matrices
are not as strange as they might initially appear.
Theorem UMI Unitary Matrices are Invertible
Suppose that U is a unitary matrix of size n. Then U is nonsingular, and U −1 = U ∗ .

Proof. By Definition UM, we know that U ∗ U = In . The matrix In is nonsingular
(since it row-reduces easily to In , Theorem NMRRI). So by Theorem NPNT, U and
U ∗ are both nonsingular matrices.
    The equation U ∗ U = In gets us halfway to an inverse of U , and Theorem OSIS
tells us that then U U ∗ = In also. So U and U ∗ are inverses of each other (Definition
MI). ∎

Theorem CUMOS Columns of Unitary Matrices are Orthonormal Sets
Suppose that S = {A1 , A2 , A3 , . . . , An } is the set of columns of a square matrix A
of size n. Then A is a unitary matrix if and only if S is an orthonormal set.

Proof. The proof revolves around recognizing that a typical entry of the product
A∗ A is an inner product of columns of A. Here are the details to support this claim.
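\begin{align*}
[A^* A]_{ij} &= \sum_{k=1}^{n} [A^*]_{ik} [A]_{kj} && \text{Theorem EMP} \\
&= \sum_{k=1}^{n} \left[\overline{A}^{\,t}\right]_{ik} [A]_{kj} && \text{Definition A} \\
&= \sum_{k=1}^{n} \left[\overline{A}\right]_{ki} [A]_{kj} && \text{Definition TM} \\
&= \sum_{k=1}^{n} \overline{[A]_{ki}}\, [A]_{kj} && \text{Definition CCM} \\
&= \sum_{k=1}^{n} \overline{[A_i]_k}\, [A_j]_k \\
&= \langle A_i, A_j \rangle && \text{Definition IP}
\end{align*}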
   We now employ this equality in a chain of equivalences,
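\begin{align*}
& S = \{A_1, A_2, A_3, \ldots, A_n\} \text{ is an orthonormal set} \\
\iff{} & \langle A_i, A_j \rangle = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} && \text{Definition ONS} \\
\iff{} & [A^* A]_{ij} = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases} \\
\iff{} & [A^* A]_{ij} = [I_n]_{ij}, \quad 1 \le i \le n,\ 1 \le j \le n && \text{Definition IM} \\
\iff{} & A^* A = I_n && \text{Definition ME} \\
\iff{} & A \text{ is a unitary matrix} && \text{Definition UM}
\end{align*}
∎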

Example OSMC Orthonormal set from matrix columns
The matrix
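\[
U = \begin{bmatrix}
\frac{1+i}{\sqrt{5}} & \frac{3+2i}{\sqrt{55}} & \frac{2+2i}{\sqrt{22}} \\
\frac{1-i}{\sqrt{5}} & \frac{2+2i}{\sqrt{55}} & \frac{-3+i}{\sqrt{22}} \\
\frac{i}{\sqrt{5}} & \frac{3-5i}{\sqrt{55}} & -\frac{2}{\sqrt{22}}
\end{bmatrix}
\]
from Example UM3 is a unitary matrix. By Theorem CUMOS, its columns
\[
\left\{
\begin{bmatrix} \frac{1+i}{\sqrt{5}} \\ \frac{1-i}{\sqrt{5}} \\ \frac{i}{\sqrt{5}} \end{bmatrix},\
\begin{bmatrix} \frac{3+2i}{\sqrt{55}} \\ \frac{2+2i}{\sqrt{55}} \\ \frac{3-5i}{\sqrt{55}} \end{bmatrix},\
\begin{bmatrix} \frac{2+2i}{\sqrt{22}} \\ \frac{-3+i}{\sqrt{22}} \\ -\frac{2}{\sqrt{22}} \end{bmatrix}
\right\}
\]
form an orthonormal set. You might find checking the six inner products of pairs
of these vectors easier than doing the matrix product U ∗ U . Or, because the inner
product is anti-commutative (Theorem IPAC) you only need check three inner
products (see Exercise MINM.T12). △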
   When using vectors and matrices that only have real number entries, orthogonal
matrices are those matrices with inverses that equal their transpose. Similarly, the
inner product is the familiar dot product. Keep this special case in mind as you read
the next theorem.
Theorem UMPIP Unitary Matrices Preserve Inner Products
Suppose that U is a unitary matrix of size n and u and v are two vectors from Cn .
Then
             ⟨U u, U v⟩ = ⟨u, v⟩                            and                 ‖U v‖ = ‖v‖

Proof.
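\begin{align*}
\langle Uu, Uv \rangle &= \left(\overline{Uu}\right)^t Uv && \text{Theorem MMIP} \\
&= \left(\overline{U}\,\overline{u}\right)^t Uv && \text{Theorem MMCC} \\
&= \overline{u}^{\,t}\, \overline{U}^{\,t}\, U v && \text{Theorem MMT} \\
&= \overline{u}^{\,t}\, U^* U v && \text{Definition A} \\
&= \overline{u}^{\,t}\, I_n v && \text{Definition UM} \\
&= \overline{u}^{\,t}\, v && \text{Theorem MMIM} \\
&= \langle u, v \rangle && \text{Theorem MMIP}
\end{align*}
      The second conclusion is just a specialization of the first conclusion.
\begin{align*}
\|Uv\| &= \sqrt{\|Uv\|^2} \\
&= \sqrt{\langle Uv, Uv \rangle} && \text{Theorem IPN} \\
&= \sqrt{\langle v, v \rangle} \\
&= \sqrt{\|v\|^2} && \text{Theorem IPN} \\
&= \|v\|
\end{align*}
∎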

    Aside from the inherent interest in this theorem, it makes a bigger statement
about unitary matrices. When we view vectors geometrically as directions or forces,
then the norm equates to a notion of length. If we transform a vector by multiplication
with a unitary matrix, then the length (norm) of that vector stays the same. If we
consider column vectors with two or three slots containing only real numbers, then
the inner product of two such vectors is just the dot product, and this quantity can
be used to compute the angle between two vectors. When two vectors are multiplied
(transformed) by the same unitary matrix, their dot product is unchanged and their
individual lengths are unchanged. This results in the angle between the two vectors
remaining unchanged.
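    The preservation of lengths and angles can be observed numerically in the real two-slot case. The sketch below uses a rotation matrix Q; the angle 0.7 and the vectors u and v are arbitrary, hypothetical choices, and the comparisons are made up to floating-point tolerance.

```python
from math import cos, sin, sqrt, acos, isclose

theta = 0.7                                   # hypothetical rotation angle (radians)
Q = [[cos(theta), -sin(theta)],
     [sin(theta),  cos(theta)]]               # real and orthogonal, hence unitary

def apply(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    """Dot product: the inner product for vectors with real entries."""
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return sqrt(dot(v, v))

def angle(u, v):
    """Angle between two nonzero real vectors."""
    return acos(dot(u, v) / (norm(u) * norm(v)))

u, v = [3.0, 1.0], [-1.0, 2.0]                # hypothetical vectors
Qu, Qv = apply(Q, u), apply(Q, v)

assert isclose(dot(Qu, Qv), dot(u, v))        # inner product is preserved
assert isclose(norm(Qv), norm(v))             # length is preserved
assert isclose(angle(Qu, Qv), angle(u, v))    # so the angle is preserved too
```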
    A “unitary transformation” (a matrix-vector product with a unitary matrix) thus
preserves geometrical relationships among vectors representing directions, forces, or
other physical quantities. In the case of a two-slot vector with real entries, this is
simply a rotation, possibly combined with a reflection. These sorts of computations are exceedingly important in computer
graphics such as games and real-time simulations, especially when increased realism
is achieved by performing many such computations quickly. We will see unitary
matrices again in subsequent sections (especially Theorem OD) and in each instance,
consider the interpretation of the unitary matrix as a sort of geometry-preserving
transformation. Some authors use the term isometry to highlight this behavior. We
will speak loosely of a unitary matrix as being a sort of generalized rotation.
    A final reminder: the terms “dot product,” “symmetric matrix” and “orthogonal
matrix” used in reference to vectors or matrices with real number entries are special
cases of the terms “inner product,” “Hermitian matrix” and “unitary matrix” that
we use for vectors or matrices with complex number entries, so keep that in mind as
you read elsewhere.

Reading Questions

1. Compute the inverse of the coefficient matrix of the system of equations below and use
   the inverse to solve the system.
                                       4x1 + 10x2 = 12
                                        2x1 + 6x2 = 4

2. In the reading questions for Section MISLE you were asked to find the inverse of the
   3 × 3 matrix below.
\[
\begin{bmatrix}
2 & 3 & 1 \\
1 & -2 & -3 \\
-2 & 4 & 6
\end{bmatrix}
\]
   Because the matrix was not nonsingular, you had no theorems at that point that would
   allow you to compute the inverse. Explain why you now know that the inverse does
   not exist (which is different than not being able to compute it) by quoting the relevant
   theorem’s acronym.
3. Is the matrix A unitary? Why?
\[
A = \begin{bmatrix}
\frac{1}{\sqrt{22}}(4 + 2i) & \frac{1}{\sqrt{374}}(5 + 3i) \\
\frac{1}{\sqrt{22}}(-1 - i) & \frac{1}{\sqrt{374}}(12 + 14i)
\end{bmatrix}
\]

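Exercises
C20       Let
\[
A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 2 \end{bmatrix}
\quad\text{and}\quad
B = \begin{bmatrix} -1 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 1 \end{bmatrix}.
\]
Verify that AB is nonsingular.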
C40†      Solve the system of equations below using the inverse of a matrix.
                                    x1 + x2 + 3x3 + x4 = 5
                                 −2x1 − x2 − 4x3 − x4 = −7
                                 x1 + 4x2 + 10x3 + 2x4 = 9
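                                       −2x1 − 4x3 + 5x4 = 9
M10†       Find values of x, y, z so that the matrix
\[
A = \begin{bmatrix} 1 & 2 & x \\ 3 & 0 & y \\ 1 & 1 & z \end{bmatrix}
\]
is invertible.
M11†       Find values of x, y, z so that the matrix
\[
A = \begin{bmatrix} 1 & x & 1 \\ 1 & y & 4 \\ 0 & z & 5 \end{bmatrix}
\]
is singular.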
M15† If A and B are n × n matrices, A is nonsingular, and B is singular, show directly
that AB is singular, without using Theorem NPNT.
M20†       Construct an example of a 4 × 4 unitary matrix.
M80† Matrix multiplication interacts nicely with many operations. But not always with
transforming a matrix to reduced row-echelon form. Suppose that A is an m × n matrix
and B is an n × p matrix. Let P be a matrix that is row-equivalent to A and in reduced
row-echelon form, Q be a matrix that is row-equivalent to B and in reduced row-echelon
form, and let R be a matrix that is row-equivalent to AB and in reduced row-echelon form.
Is P Q = R? (In other words, with nonstandard notation, is rref(A)rref(B) = rref(AB)?)

Construct a counterexample to show that, in general, this statement is false. Then find a
large class of matrices where if A and B are in the class, then the statement is true.
T10 Suppose that Q and P are unitary matrices of size n. Prove that QP is a unitary
matrix.
T11 Prove that Hermitian matrices (Definition HM) have real entries on the diagonal.
More precisely, suppose that A is a Hermitian matrix of size n. Then [A]ii ∈ R, 1 ≤ i ≤ n.
T12 Suppose that we are checking if a square matrix of size n is unitary. Show that
a straightforward application of Theorem CUMOS requires the computation of n^2 inner
products when the matrix is unitary, and fewer when the matrix is not orthogonal. Then
show that this maximum number of inner products can be reduced to (1/2) n(n + 1) in light of
Theorem IPAC.
T25 The notation A^k means a repeated matrix product between k copies of the square
matrix A.

   1. Assume A is an n × n matrix where A^2 = O (which does not imply that A = O).
      Prove that In − A is invertible by showing that In + A is an inverse of In − A.

   2. Assume that A is an n × n matrix where A^3 = O. Prove that In − A is invertible.

   3. Form a general theorem based on your observations from parts (1) and (2) and
      provide a proof.
Section CRS
Column and Row Spaces
A matrix-vector product (Definition MVP) is a linear combination of the columns
of the matrix and this allows us to connect matrix multiplication with systems of
equations via Theorem SLSLC. Row operations are linear combinations of the rows
of a matrix, and of course, reduced row-echelon form (Definition RREF) is also
intimately related to solving systems of equations. In this section we will formalize
these ideas with two key definitions of sets of vectors derived from a matrix.


Subsection CSSE
Column Spaces and Systems of Equations
Theorem SLSLC showed us that there is a natural correspondence between solutions
to linear systems and linear combinations of the columns of the coefficient matrix.
This idea motivates the following important definition.
Definition CSM Column Space of a Matrix
Suppose that A is an m × n matrix with columns A1 , A2 , A3 , . . . , An . Then
the column space of A, written C(A), is the subset of Cm containing all linear
combinations of the columns of A,
                         C(A) = ⟨{A1 , A2 , A3 , . . . , An }⟩
□
    Some authors refer to the column space of a matrix as the range, but we will
reserve this term for use with linear transformations (Definition RLT).
    Upon encountering any new set, the first question we ask is what objects are in
the set, and which objects are not? Here is an example of one way to answer this
question, and it will motivate a theorem that will then answer the question precisely.
Example CSMCS Column space of a matrix and consistent systems
Archetype D and Archetype E are linear systems of equations, with an identical
3 × 4 coefficient matrix, which we call A here. However, Archetype D is consistent,
while Archetype E is not. We can explain this difference by employing the column
space of the matrix A.
    The column vector of constants, b, in Archetype D is given below, and one
solution listed for LS(A, b) is x,
\[
b = \begin{bmatrix} 8 \\ -12 \\ 4 \end{bmatrix}
\qquad
x = \begin{bmatrix} 7 \\ 8 \\ 1 \\ 3 \end{bmatrix}
\]
   By Theorem SLSLC, we can summarize this solution as a linear combination of
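the columns of A that equals b,
\[
7 \begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix}
+ 8 \begin{bmatrix} 1 \\ 4 \\ 1 \end{bmatrix}
+ 1 \begin{bmatrix} 7 \\ -5 \\ 4 \end{bmatrix}
+ 3 \begin{bmatrix} -7 \\ -6 \\ -5 \end{bmatrix}
= \begin{bmatrix} 8 \\ -12 \\ 4 \end{bmatrix} = b.
\]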
    This equation says that b is a linear combination of the columns of A, and then
by Definition CSM, we can say that b ∈ C(A).
    On the other hand, Archetype E is the linear system LS(A, c), where the vector
of constants is
\[
c = \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}
\]
and this system of equations is inconsistent. This means c ∉ C(A), for if it were,
then it would equal a linear combination of the columns of A and Theorem SLSLC
would lead us to a solution of the system LS(A, c). △
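    The linear combination used for Archetype D is easy to recompute. In this short sketch the columns of A are read off from the linear combination displayed in the example, and the name linear_combination is a helper invented here.

```python
def linear_combination(scalars, vectors):
    """Return the sum of scalars[j] * vectors[j], computed entry by entry."""
    size = len(vectors[0])
    return [sum(s * vec[i] for s, vec in zip(scalars, vectors))
            for i in range(size)]

# Columns of the coefficient matrix A shared by Archetype D and Archetype E.
columns = [[2, -3, 1], [1, 4, 1], [7, -5, 4], [-7, -6, -5]]
x = [7, 8, 1, 3]                     # the listed solution of LS(A, b)
b = [8, -12, 4]

# The entries of the solution are the scalars of the combination (Theorem SLSLC).
assert linear_combination(x, columns) == b
```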


    So if we fix the coefficient matrix, and vary the vector of constants, we can
sometimes find consistent systems, and sometimes inconsistent systems. The vectors
of constants that lead to consistent systems are exactly the elements of the column
space. This is the content of the next theorem, and since it is an equivalence, it
provides an alternate view of the column space.
Theorem CSCS Column Spaces and Consistent Systems
Suppose A is an m × n matrix and b is a vector of size m. Then b ∈ C(A) if and
only if LS(A, b) is consistent.

Proof. (⇒) Suppose b ∈ C(A). Then we can write b as some linear combination
of the columns of A. By Theorem SLSLC we can use the scalars from this linear
combination to form a solution to LS(A, b), so this system is consistent.
    (⇐) If LS(A, b) is consistent, there is a solution that may be used with Theorem
SLSLC to write b as a linear combination of the columns of A. This qualifies b for
membership in C(A). ∎

    This theorem tells us that asking if the system LS(A, b) is consistent is exactly
the same question as asking if b is in the column space of A. Or equivalently, it tells
us that the column space of the matrix A is precisely those vectors of constants, b,
that can be paired with A to create a system of linear equations LS(A, b) that is
consistent.
    Employing Theorem SLEMM we can form the chain of equivalences
         b ∈ C(A) ⇐⇒ LS(A, b) is consistent ⇐⇒ Ax = b for some x
  Thus, an alternative (and popular) definition of the column space of an m × n
matrix A is
        C(A) = { y ∈ C^m | y = Ax for some x ∈ C^n } = { Ax | x ∈ C^n } ⊆ C^m
    This says: form every possible matrix-vector product with the matrix A by
letting x range over all of C^n. By Definition MVP this means taking all possible
linear combinations of the columns of A, which is precisely the definition of the
column space (Definition CSM) we have chosen.
    Notice how this formulation of the column space looks very much like the definition
of the null space of a matrix (Definition NSM), but for a rectangular matrix the
column vectors of C(A) and N (A) have different sizes, so the sets are very different.
    Given a vector b and a matrix A it is now very mechanical to test if b ∈ C(A).
Form the linear system LS(A, b), row-reduce the augmented matrix, [ A | b], and
test for consistency with Theorem RCLS. Here is an example of this procedure.
Example MCSM Membership in the column space of a matrix
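Consider the column space of the 3 × 4 matrix A,
\[
A = \begin{bmatrix}
3 & 2 & 1 & -4 \\
-1 & 1 & -2 & 3 \\
2 & -4 & 6 & -8
\end{bmatrix}
\]
   We first show that
\[
v = \begin{bmatrix} 18 \\ -6 \\ 12 \end{bmatrix}
\]
is in the column space of A, v ∈ C(A). Theorem CSCS says we need only check the
consistency of LS(A, v). Form the augmented matrix and row-reduce,
\[
\left[\begin{array}{cccc|c}
3 & 2 & 1 & -4 & 18 \\
-1 & 1 & -2 & 3 & -6 \\
2 & -4 & 6 & -8 & 12
\end{array}\right]
\xrightarrow{\text{RREF}}
\left[\begin{array}{cccc|c}
1 & 0 & 1 & -2 & 6 \\
0 & 1 & -1 & 1 & 0 \\
0 & 0 & 0 & 0 & 0
\end{array}\right]
\]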
    Since the final column is not a pivot column, Theorem RCLS tells us the system
is consistent and therefore by Theorem CSCS, v ∈ C(A).
    If we wished to demonstrate explicitly that v is a linear combination of the
columns of A, we can find a solution (any solution) of LS(A, v) and use Theorem
SLSLC to construct the desired linear combination. For example, set the free variables
to x_3 = 2 and x_4 = 1. Then a solution has x_2 = 1 and x_1 = 6. Then by Theorem
SLSLC,
\[
v = \begin{bmatrix} 18 \\ -6 \\ 12 \end{bmatrix}
  = 6 \begin{bmatrix} 3 \\ -1 \\ 2 \end{bmatrix}
  + 1 \begin{bmatrix} 2 \\ 1 \\ -4 \end{bmatrix}
  + 2 \begin{bmatrix} 1 \\ -2 \\ 6 \end{bmatrix}
  + 1 \begin{bmatrix} -4 \\ 3 \\ -8 \end{bmatrix}
\]
   Now we show that
\[
w = \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix}
\]
is not in the column space of A, w ∉ C(A).
Theorem CSCS says we need only check the consistency of LS(A, w). Form the
augmented matrix and row-reduce,
\[
\begin{bmatrix} 3 & 2 & 1 & -4 & 2 \\ -1 & 1 & -2 & 3 & 1 \\ 2 & -4 & 6 & -8 & -3 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 0 & 1 & -2 & 0 \\ 0 & 1 & -1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
\]
   Since the final column is a pivot column, Theorem RCLS tells us the system is
inconsistent and therefore by Theorem CSCS, w ∉ C(A). △
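
The membership test of Example MCSM can be sketched in code. What follows is a minimal Python sketch and not part of the original text: the helper names rref and in_column_space are hypothetical, and exact arithmetic with fractions.Fraction is an assumption made to avoid rounding error. It row-reduces the augmented matrix and reports whether the final column is a pivot column, which is the consistency test of Theorem RCLS.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, 0-based pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        # look for a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]          # scale to a leading 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:             # clear the rest of column c
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def in_column_space(A, b):
    """b is in C(A) exactly when LS(A, b) is consistent (Theorem CSCS)."""
    augmented = [list(row) + [bi] for row, bi in zip(A, b)]
    _, pivots = rref(augmented)
    # consistent exactly when the final column is not a pivot column
    return len(A[0]) not in pivots

A = [[3, 2, 1, -4], [-1, 1, -2, 3], [2, -4, 6, -8]]
v = [18, -6, 12]
w = [2, 1, -3]
print(in_column_space(A, v))  # True
print(in_column_space(A, w))  # False
```

Column indices in the sketch are 0-based, so the final column of the augmented matrix has index n = 4 rather than 5.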

    Theorem CSCS completes a collection of three theorems, and one definition,
that deserve comment. Many questions about spans, linear independence, null space,
column spaces and similar objects can be converted to questions about systems
of equations (homogeneous or not), which we understand well from our previous
results, especially those in Chapter SLE. These previous results include theorems
like Theorem RCLS which allows us to quickly decide consistency of a system, and
Theorem BNS which allows us to describe solution sets for homogeneous systems
compactly as the span of a linearly independent set of column vectors.
   The table below lists these four definitions and theorems along with a brief
reminder of the statement and an example of how the statement is used.


 Definition NSM
           Synopsis       Null space is solution set of homogeneous system
           Example        General solution sets described by Theorem PSPHS
 Theorem SLSLC
           Synopsis       Solutions for linear combinations with unknown scalars
           Example        Deciding membership in spans
 Theorem SLEMM
           Synopsis       System of equations represented by matrix-vector product
           Example        Solution to LS(A, b) is A^{-1} b when A is nonsingular
 Theorem CSCS
           Synopsis       Column space vectors create consistent systems
           Example        Deciding membership in column spaces



Subsection CSSOC
Column Space Spanned by Original Columns

So we have a foolproof, automated procedure for determining membership in C(A).
While this works just fine a vector at a time, we would like to have a more useful
description of the set C(A) as a whole. The next example will preview the first of
two fundamental results about the column space of a matrix.

Example CSTW Column space, two ways
Consider the 5 × 7 matrix A,
\[
A = \begin{bmatrix}
 2 &  4 &  1 & -1 &  1 &  4 &  4 \\
 1 &  2 &  1 &  0 &  2 &  4 &  7 \\
 0 &  0 &  1 &  4 &  1 &  8 &  7 \\
 1 &  2 & -1 &  2 &  1 &  9 &  6 \\
-2 & -4 &  1 &  3 & -1 & -2 & -2
\end{bmatrix}
\]
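      According to the definition (Definition CSM), the column space of A is
\[
C(A) = \left\langle \left\{
\begin{bmatrix} 2 \\ 1 \\ 0 \\ 1 \\ -2 \end{bmatrix},
\begin{bmatrix} 4 \\ 2 \\ 0 \\ 2 \\ -4 \end{bmatrix},
\begin{bmatrix} 1 \\ 1 \\ 1 \\ -1 \\ 1 \end{bmatrix},
\begin{bmatrix} -1 \\ 0 \\ 4 \\ 2 \\ 3 \end{bmatrix},
\begin{bmatrix} 1 \\ 2 \\ 1 \\ 1 \\ -1 \end{bmatrix},
\begin{bmatrix} 4 \\ 4 \\ 8 \\ 9 \\ -2 \end{bmatrix},
\begin{bmatrix} 4 \\ 7 \\ 7 \\ 6 \\ -2 \end{bmatrix}
\right\} \right\rangle
\]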
   While this is a concise description of an infinite set, we might be able to describe
the span with fewer than seven vectors. This is the substance of Theorem BS. So we
take these seven vectors and make them the columns of a matrix, which is simply
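the original matrix A again. Now we row-reduce,
\[
\begin{bmatrix}
 2 &  4 &  1 & -1 &  1 &  4 &  4 \\
 1 &  2 &  1 &  0 &  2 &  4 &  7 \\
 0 &  0 &  1 &  4 &  1 &  8 &  7 \\
 1 &  2 & -1 &  2 &  1 &  9 &  6 \\
-2 & -4 &  1 &  3 & -1 & -2 & -2
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 2 & 0 & 0 & 0 &  3 & 1 \\
0 & 0 & 1 & 0 & 0 & -1 & 0 \\
0 & 0 & 0 & 1 & 0 &  2 & 1 \\
0 & 0 & 0 & 0 & 1 &  1 & 3 \\
0 & 0 & 0 & 0 & 0 &  0 & 0
\end{bmatrix}
\]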
      The pivot columns are D = {1, 3, 4, 5}, so we can create the set
\[
T = \left\{
\begin{bmatrix} 2 \\ 1 \\ 0 \\ 1 \\ -2 \end{bmatrix},
\begin{bmatrix} 1 \\ 1 \\ 1 \\ -1 \\ 1 \end{bmatrix},
\begin{bmatrix} -1 \\ 0 \\ 4 \\ 2 \\ 3 \end{bmatrix},
\begin{bmatrix} 1 \\ 2 \\ 1 \\ 1 \\ -1 \end{bmatrix}
\right\}
\]
and know that C(A) = ⟨T⟩ and T is a linearly independent set of columns from the
set of columns of A. △
    We will now formalize the previous example, which will make it trivial to determine
a linearly independent set of vectors that will span the column space of a matrix,
and is constituted of just columns of A.
Theorem BCS Basis of the Column Space
Suppose that A is an m × n matrix with columns A_1, A_2, A_3, . . . , A_n, and B is
a row-equivalent matrix in reduced row-echelon form with r pivot columns. Let
D = {d_1, d_2, d_3, . . . , d_r} be the set of indices for the pivot columns of B. Let T =
{A_{d_1}, A_{d_2}, A_{d_3}, . . . , A_{d_r}}. Then

   1. T is a linearly independent set.
   2. C(A) = ⟨T⟩.

Proof. Definition CSM describes the column space as the span of the set of columns
of A. Theorem BS tells us that we can reduce the set of vectors used in a span. If
we apply Theorem BS to C(A), we would collect the columns of A into a matrix
(which would just be A again) and bring the matrix to reduced row-echelon form,
which is the matrix B in the statement of the theorem. In this case, the conclusions
of Theorem BS applied to A, B and C(A) are exactly the conclusions we desire. 
    This is a nice result since it gives us a handful of vectors that describe the entire
column space (through the span), and we believe this set is as small as possible
because we cannot create any more relations of linear dependence to trim it down
further. Furthermore, we defined the column space (Definition CSM) as all linear
combinations of the columns of the matrix, and the elements of the set T are still
columns of the matrix (we will not be so lucky in the next two constructions of the
column space).
    Procedurally this theorem is extremely easy to apply. Row-reduce the original
matrix, identify the r pivot columns of the reduced matrix, and grab the columns of the
original matrix with the same indices as the pivot columns. But it is still important
to study the proof of Theorem BS and its motivation in Example COV which lie at
the root of this theorem. We will trot through an example all the same.
Example CSOCD Column space, original columns, Archetype D
Let us determine a compact expression for the entire column space of the coefficient
matrix of the system of equations that is Archetype D. Notice that in Example
CSMCS we were only determining if individual vectors were in the column space or
not; now we are describing the entire column space.
   To start with the application of Theorem BCS, call the coefficient matrix A and
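row-reduce it to reduced row-echelon form B,
\[
A = \begin{bmatrix} 2 & 1 & 7 & -7 \\ -3 & 4 & -5 & -6 \\ 1 & 1 & 4 & -5 \end{bmatrix}
\qquad
B = \begin{bmatrix} 1 & 0 & 3 & -2 \\ 0 & 1 & 1 & -3 \\ 0 & 0 & 0 & 0 \end{bmatrix}
\]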
   Since columns 1 and 2 are pivot columns, D = {1, 2}. To construct a set that
spans C(A), just grab the columns of A with indices in D, so
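\[
C(A) = \left\langle \left\{
\begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix},
\begin{bmatrix} 1 \\ 4 \\ 1 \end{bmatrix}
\right\} \right\rangle .
\]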
That’s it.
   In Example CSMCS we determined that the vector
\[
c = \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}
\]
was not in the column space of A. Try to write c as a linear combination of the first
two columns of A. What happens?
   Also in Example CSMCS we determined that the vector
\[
b = \begin{bmatrix} 8 \\ -12 \\ 4 \end{bmatrix}
\]
was in the column space of A. Try to write b as a linear combination of the first
two columns of A. What happens? Did you find a unique solution to this question?
Hmmmm. △
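
The procedure of Theorem BCS (row-reduce, read off the pivot column indices, keep the original columns with those indices) can be sketched as follows. This is an illustrative Python sketch, not part of the original text: the function names rref and column_space_basis are hypothetical, exact arithmetic with fractions.Fraction is an assumption, and indices are 0-based, so the text's D = {1, 2} appears as [0, 1].

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, 0-based pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def column_space_basis(A):
    """Columns of the ORIGINAL matrix A sitting at the pivot positions of its RREF."""
    _, pivots = rref(A)
    return [[row[j] for row in A] for j in pivots]

# Coefficient matrix of Archetype D, as given in Example CSOCD
A = [[2, 1, 7, -7], [-3, 4, -5, -6], [1, 1, 4, -5]]
T = column_space_basis(A)
print(T)  # [[2, -3, 1], [1, 4, 1]]
```

Note that the sketch returns columns of A itself, not columns of the reduced matrix; the reduced matrix is used only to locate the pivot indices.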

Subsection CSNM
Column Space of a Nonsingular Matrix
Let us specialize to square matrices and contrast the column spaces of the coefficient
matrices in Archetype A and Archetype B.
Example CSAA Column space of Archetype A
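The coefficient matrix in Archetype A is A, which row-reduces to B,
\[
A = \begin{bmatrix} 1 & -1 & 2 \\ 2 & 1 & 1 \\ 1 & 1 & 0 \end{bmatrix}
\qquad
B = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 0 \end{bmatrix}
\]
   Columns 1 and 2 are pivot columns, so by Theorem BCS we can write
\[
C(A) = \left\langle \{A_1, A_2\} \right\rangle
     = \left\langle \left\{
       \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix},
       \begin{bmatrix} -1 \\ 1 \\ 1 \end{bmatrix}
       \right\} \right\rangle .
\]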
     We want to show in this example that C(A) ≠ C^3. So take, for example, the vector
\[
b = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}.
\]
Then there is no solution to the system LS(A, b), or equivalently,
it is not possible to write b as a linear combination of A_1 and A_2. Try one of these
two computations yourself. (Or try both!). Since b ∉ C(A), the column space of A
cannot be all of C^3. So by varying the vector of constants, it is possible to create
inconsistent systems of equations with this coefficient matrix (the vector b being
one such example).
     In Example MWIAA we wished to show that the coefficient matrix from Archetype A
was not invertible as a first example of a matrix without an inverse. Our
device there was to find an inconsistent linear system with A as the coefficient
matrix. The vector of constants in that example was b, deliberately chosen outside
the column space of A. △
Example CSAB Column space of Archetype B
The coefficient matrix in Archetype B, call it B here, is known to be nonsingular
(see Example NM). By Theorem NMUS, the linear system LS(B, b) has a (unique)
solution for every choice of b. Theorem CSCS then says that b ∈ C(B) for all b ∈ C^3.
Stated differently, there is no way to build an inconsistent system with the coefficient
matrix B, but then we knew that already from Theorem NMUS. △
   Example CSAA and Example CSAB together motivate the following equivalence,
which says that nonsingular matrices have column spaces that are as big as possible.
Theorem CSNM Column Space of a Nonsingular Matrix
Suppose A is a square matrix of size n. Then A is nonsingular if and only if
C(A) = C^n.

Proof. (⇒) Suppose A is nonsingular. We wish to establish the set equality C(A) =
C^n. By Definition CSM, C(A) ⊆ C^n. To show that C^n ⊆ C(A) choose b ∈ C^n. By
Theorem NMUS, we know the linear system LS(A, b) has a (unique) solution and
therefore is consistent. Theorem CSCS then says that b ∈ C(A). So by Definition
SE, C(A) = C^n.
    (⇐) If e_i is column i of the n × n identity matrix (Definition SUV) and by
hypothesis C(A) = C^n, then e_i ∈ C(A) for 1 ≤ i ≤ n. By Theorem CSCS, the system
LS(A, e_i) is consistent for 1 ≤ i ≤ n. Let b_i denote any one particular solution to
LS(A, e_i), 1 ≤ i ≤ n.
    Define the n × n matrix B = [b_1|b_2|b_3| . . . |b_n]. Then
\begin{align*}
AB &= A [b_1|b_2|b_3|\ldots|b_n] \\
   &= [Ab_1|Ab_2|Ab_3|\ldots|Ab_n] && \text{Definition MM} \\
   &= [e_1|e_2|e_3|\ldots|e_n] \\
   &= I_n && \text{Definition SUV}
\end{align*}
   So the matrix B is a “right-inverse” for A. By Theorem NMRRI, I_n is a nonsingular
matrix, so by Theorem NPNT both A and B are nonsingular. Thus, in
particular, A is nonsingular. (Travis Osborne contributed to this proof.)
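
The construction in the (⇐) half of the proof can be tried numerically. The sketch below is illustrative Python and not part of the original text. The names rref, right_inverse and matmul are hypothetical, and the 2 × 2 matrix M is a made-up nonsingular example rather than one of the book's archetypes. Instead of solving each system LS(A, e_i) separately, as the proof does, it solves all n at once by row-reducing [A | I_n]; if the pivots land exactly in the first n columns, the right-hand block holds solutions b_1, ..., b_n and so is a right-inverse.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, 0-based pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def right_inverse(A):
    """Return B with A*B = I_n, or None when some LS(A, e_i) is inconsistent."""
    n = len(A)
    augmented = [list(row) + [int(i == j) for j in range(n)]
                 for i, row in enumerate(A)]
    R, pivots = rref(augmented)
    if pivots != list(range(n)):
        return None            # a pivot fell in the identity block: C(A) != C^n
    return [row[n:] for row in R]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

M = [[2, 1], [5, 3]]                              # hypothetical nonsingular matrix
B = right_inverse(M)
print(matmul(M, B) == [[1, 0], [0, 1]])           # True

archetype_A = [[1, -1, 2], [2, 1, 1], [1, 1, 0]]  # from Example CSAA
print(right_inverse(archetype_A))                 # None
```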

  With this equivalence for nonsingular matrices we can update our list, Theorem
NME3.
Theorem NME4 Nonsingular Matrix Equivalences, Round 4
Suppose that A is a square matrix of size n. The following are equivalent.

   1. A is nonsingular.

   2. A row-reduces to the identity matrix.

   3. The null space of A contains only the zero vector, N (A) = {0}.

   4. The linear system LS(A, b) has a unique solution for every possible choice of
      b.

   5. The columns of A are a linearly independent set.

   6. A is invertible.

   7. The column space of A is C^n, C(A) = C^n.

Proof. Since Theorem CSNM is an equivalence, we can add it to the list in Theorem
NME3.                                                                           

Subsection RSM
Row Space of a Matrix

The rows of a matrix can be viewed as vectors, since they are just lists of numbers,
arranged horizontally. So we will transpose a matrix, turning rows into columns, so
we can then manipulate rows as column vectors. As a result we will be able to make
some new connections between row operations and solutions to systems of equations.
OK, here is the second primary definition of this section.

Definition RSM Row Space of a Matrix
Suppose A is an m × n matrix. Then the row space of A, R(A), is the column
space of A^t, i.e. R(A) = C(A^t).

    Informally, the row space is the set of all linear combinations of the rows of
A. However, we write the rows as column vectors, thus the necessity of using the
transpose to make the rows into columns. Additionally, with the row space defined
in terms of the column space, all of the previous results of this section can be applied
to row spaces.
   Notice that if A is a rectangular m × n matrix, then C(A) ⊆ C^m, while R(A) ⊆ C^n
and the two sets are not comparable since they do not even hold objects of the same
type. However, when A is square of size n, both C(A) and R(A) are subsets of C^n,
though usually the sets will not be equal (but see Exercise CRS.M20).

Example RSAI Row space of Archetype I
The coefficient matrix in Archetype I is
\[
I = \begin{bmatrix}
 1 &  4 &  0 & -1 &  0 &   7 & -9 \\
 2 &  8 & -1 &  3 &  9 & -13 &  7 \\
 0 &  0 &  2 & -3 & -4 &  12 & -8 \\
-1 & -4 &  2 &  4 &  8 & -31 & 37
\end{bmatrix}
\]

   To build the row space, we transpose the matrix,
\[
I^t = \begin{bmatrix}
 1 &   2 &  0 &  -1 \\
 4 &   8 &  0 &  -4 \\
 0 &  -1 &  2 &   2 \\
-1 &   3 & -3 &   4 \\
 0 &   9 & -4 &   8 \\
 7 & -13 & 12 & -31 \\
-9 &   7 & -8 &  37
\end{bmatrix}
\]

   Then the columns of this matrix are used in a span to build the row space,
\[
R(I) = C\left(I^t\right) = \left\langle \left\{
\begin{bmatrix} 1 \\ 4 \\ 0 \\ -1 \\ 0 \\ 7 \\ -9 \end{bmatrix},
\begin{bmatrix} 2 \\ 8 \\ -1 \\ 3 \\ 9 \\ -13 \\ 7 \end{bmatrix},
\begin{bmatrix} 0 \\ 0 \\ 2 \\ -3 \\ -4 \\ 12 \\ -8 \end{bmatrix},
\begin{bmatrix} -1 \\ -4 \\ 2 \\ 4 \\ 8 \\ -31 \\ 37 \end{bmatrix}
\right\} \right\rangle .
\]

   However, we can use Theorem BCS to get a slightly better description. First,
row-reduce I^t,
\[
\begin{bmatrix}
1 & 0 & 0 & -\frac{31}{7} \\
0 & 1 & 0 & \frac{12}{7} \\
0 & 0 & 1 & \frac{13}{7} \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix} .
\]

   Since the pivot columns have indices D = {1, 2, 3}, the column space of I^t can
be spanned by just the first three columns of I^t,
\[
R(I) = C\left(I^t\right) = \left\langle \left\{
\begin{bmatrix} 1 \\ 4 \\ 0 \\ -1 \\ 0 \\ 7 \\ -9 \end{bmatrix},
\begin{bmatrix} 2 \\ 8 \\ -1 \\ 3 \\ 9 \\ -13 \\ 7 \end{bmatrix},
\begin{bmatrix} 0 \\ 0 \\ 2 \\ -3 \\ -4 \\ 12 \\ -8 \end{bmatrix}
\right\} \right\rangle .
\]
△
   The row space would not be too interesting if it was simply the column space of
the transpose. However, when we do row operations on a matrix we have no effect
on the many linear combinations that can be formed with the rows of the matrix.
This is stated more carefully in the following theorem.
Theorem REMRS Row-Equivalent Matrices have equal Row Spaces
Suppose A and B are row-equivalent matrices. Then R(A) = R(B).

Proof. Two matrices are row-equivalent (Definition REM) if one can be obtained
from another by a sequence of (possibly many) row operations. We will prove the
theorem for two matrices that differ by a single row operation, and then this result
can be applied repeatedly to get the full statement of the theorem. The row spaces
of A and B are spans of the columns of their transposes. For each row operation
we perform on a matrix, we can define an analogous operation on the columns.
Perhaps we should call these column operations. Instead, we will still call them
row operations, but we will apply them to the columns of the transposes.
    Refer to the columns of A^t and B^t as A_i and B_i, 1 ≤ i ≤ m. The row operation
that switches rows will just switch columns of the transposed matrices. This will
have no effect on the possible linear combinations formed by the columns.
    Suppose that B^t is formed from A^t by multiplying column A_t by α ≠ 0. In other
words, B_t = αA_t, and B_i = A_i for all i ≠ t. We need to establish that two sets
are equal, C(A^t) = C(B^t). We will take a generic element of one and show that it is
contained in the other.
\begin{align*}
\beta_1 B_1 + \beta_2 B_2 + \beta_3 B_3 + \cdots + \beta_t B_t + \cdots + \beta_m B_m
  &= \beta_1 A_1 + \beta_2 A_2 + \beta_3 A_3 + \cdots + \beta_t (\alpha A_t) + \cdots + \beta_m A_m \\
  &= \beta_1 A_1 + \beta_2 A_2 + \beta_3 A_3 + \cdots + (\alpha\beta_t) A_t + \cdots + \beta_m A_m
\end{align*}
says that C(B^t) ⊆ C(A^t). Similarly,

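\begin{align*}
\gamma_1 A_1 + \gamma_2 A_2 + \gamma_3 A_3 + \cdots + \gamma_t A_t + \cdots + \gamma_m A_m
  &= \gamma_1 A_1 + \gamma_2 A_2 + \gamma_3 A_3 + \cdots + \left(\frac{\gamma_t}{\alpha}\,\alpha\right) A_t + \cdots + \gamma_m A_m \\
  &= \gamma_1 A_1 + \gamma_2 A_2 + \gamma_3 A_3 + \cdots + \frac{\gamma_t}{\alpha} (\alpha A_t) + \cdots + \gamma_m A_m \\
  &= \gamma_1 B_1 + \gamma_2 B_2 + \gamma_3 B_3 + \cdots + \frac{\gamma_t}{\alpha} B_t + \cdots + \gamma_m B_m
\end{align*}
says that C(A^t) ⊆ C(B^t). So R(A) = C(A^t) = C(B^t) = R(B) when a single row
operation of the second type is performed.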
   Suppose now that B^t is formed from A^t by replacing A_t with αA_s + A_t for some
α ∈ C and s ≠ t. In other words, B_t = αA_s + A_t, and B_i = A_i for i ≠ t.
\begin{align*}
\beta_1 B_1 + \beta_2 B_2 &+ \cdots + \beta_s B_s + \cdots + \beta_t B_t + \cdots + \beta_m B_m \\
  &= \beta_1 A_1 + \beta_2 A_2 + \cdots + \beta_s A_s + \cdots + \beta_t (\alpha A_s + A_t) + \cdots + \beta_m A_m \\
  &= \beta_1 A_1 + \beta_2 A_2 + \cdots + \beta_s A_s + \cdots + (\beta_t \alpha) A_s + \beta_t A_t + \cdots + \beta_m A_m \\
  &= \beta_1 A_1 + \beta_2 A_2 + \cdots + \beta_s A_s + (\beta_t \alpha) A_s + \cdots + \beta_t A_t + \cdots + \beta_m A_m \\
  &= \beta_1 A_1 + \beta_2 A_2 + \cdots + (\beta_s + \beta_t \alpha) A_s + \cdots + \beta_t A_t + \cdots + \beta_m A_m
\end{align*}
says that C(B^t) ⊆ C(A^t). Similarly,

\begin{align*}
\gamma_1 A_1 + \gamma_2 A_2 &+ \cdots + \gamma_s A_s + \cdots + \gamma_t A_t + \cdots + \gamma_m A_m \\
  &= \gamma_1 A_1 + \gamma_2 A_2 + \cdots + \gamma_s A_s + \cdots + (-\alpha\gamma_t A_s + \alpha\gamma_t A_s) + \gamma_t A_t + \cdots + \gamma_m A_m \\
  &= \gamma_1 A_1 + \gamma_2 A_2 + \cdots + (-\alpha\gamma_t + \gamma_s) A_s + \cdots + \gamma_t (\alpha A_s + A_t) + \cdots + \gamma_m A_m \\
  &= \gamma_1 B_1 + \gamma_2 B_2 + \cdots + (-\alpha\gamma_t + \gamma_s) B_s + \cdots + \gamma_t B_t + \cdots + \gamma_m B_m
\end{align*}
says that C(A^t) ⊆ C(B^t). So R(A) = C(A^t) = C(B^t) = R(B) when a single row
operation of the third type is performed.
   So the row space of a matrix is preserved by each row operation, and hence row
spaces of row-equivalent matrices are equal sets.                               

Example RSREM Row spaces of two row-equivalent matrices
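In Example TREM we saw that the matrices
\[
A = \begin{bmatrix} 2 & -1 & 3 & 4 \\ 5 & 2 & -2 & 3 \\ 1 & 1 & 0 & 6 \end{bmatrix}
\qquad
B = \begin{bmatrix} 1 & 1 & 0 & 6 \\ 3 & 0 & -2 & -9 \\ 2 & -1 & 3 & 4 \end{bmatrix}
\]
are row-equivalent by demonstrating a sequence of two row operations that converted
A into B. Applying Theorem REMRS we can say
\[
R(A) = \left\langle \left\{
\begin{bmatrix} 2 \\ -1 \\ 3 \\ 4 \end{bmatrix},
\begin{bmatrix} 5 \\ 2 \\ -2 \\ 3 \end{bmatrix},
\begin{bmatrix} 1 \\ 1 \\ 0 \\ 6 \end{bmatrix}
\right\} \right\rangle
= \left\langle \left\{
\begin{bmatrix} 1 \\ 1 \\ 0 \\ 6 \end{bmatrix},
\begin{bmatrix} 3 \\ 0 \\ -2 \\ -9 \end{bmatrix},
\begin{bmatrix} 2 \\ -1 \\ 3 \\ 4 \end{bmatrix}
\right\} \right\rangle
= R(B)
\]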
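
Theorem REMRS can be checked on this pair of matrices with a short computation. The following Python sketch is illustrative only and not part of the original text. The helper rref is hypothetical, and the particular two row operations used (swap rows 1 and 3, then replace row 2 by −2 times row 1 plus row 2) are an assumption about what Example TREM did; the arithmetic shows that this sequence does convert A into B. Since two matrices of the same size with the same reduced row-echelon form are row-equivalent to each other, equal output from rref is one way to confirm that the row spaces agree.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, 0-based pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

A = [[2, -1, 3, 4], [5, 2, -2, 3], [1, 1, 0, 6]]

# Row operation of the first type: swap rows 1 and 3
step = [A[2], A[1], A[0]]
# Row operation of the third type: replace row 2 by -2*(row 1) + (row 2)
B = [step[0], [-2 * a + b for a, b in zip(step[0], step[1])], step[2]]
print(B)  # [[1, 1, 0, 6], [3, 0, -2, -9], [2, -1, 3, 4]]

# Same reduced row-echelon form, hence the same row space
print(rref(A)[0] == rref(B)[0])  # True
```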
   Theorem REMRS is at its best when one of the row-equivalent matrices is in
reduced row-echelon form. The vectors that are zero rows can be ignored. (Who
needs the zero vector when building a span? See Exercise LI.T10.) The echelon
pattern ensures that the nonzero rows yield vectors that are linearly independent.
Here is the theorem.
Theorem BRS Basis for the Row Space
Suppose that A is a matrix and B is a row-equivalent matrix in reduced row-echelon
form. Let S be the set of nonzero columns of B^t. Then

  1. R(A) = ⟨S⟩.

  2. S is a linearly independent set.

Proof. From Theorem REMRS we know that R(A) = R(B). If B has any zero rows,
these are columns of B^t that are the zero vector. We can safely toss out the zero
vector in the span construction, since it can be recreated from the nonzero vectors
by a linear combination where all the scalars are zero. So R(A) = ⟨S⟩.
    Suppose B has r nonzero rows and let D = {d_1, d_2, d_3, . . . , d_r} denote the
indices of the pivot columns of B. Denote the r column vectors of B^t, the vectors
in S, as B_1, B_2, B_3, . . . , B_r. To show that S is linearly independent, start with a
relation of linear dependence
\[
\alpha_1 B_1 + \alpha_2 B_2 + \alpha_3 B_3 + \cdots + \alpha_r B_r = 0
\]
   Now consider this vector equality in location d_i. Since B is in reduced row-echelon
form, the entries of column d_i of B are all zero, except for a leading 1 in row i. Thus,
in B^t, row d_i is all zeros, excepting a 1 in column i. So, for 1 ≤ i ≤ r,
\begin{align*}
0 &= [0]_{d_i} && \text{Definition ZCV} \\
  &= [\alpha_1 B_1 + \alpha_2 B_2 + \alpha_3 B_3 + \cdots + \alpha_r B_r]_{d_i} && \text{Definition RLDCV} \\
  &= [\alpha_1 B_1]_{d_i} + [\alpha_2 B_2]_{d_i} + [\alpha_3 B_3]_{d_i} + \cdots + [\alpha_r B_r]_{d_i} && \text{Definition MA} \\
  &= \alpha_1 [B_1]_{d_i} + \alpha_2 [B_2]_{d_i} + \alpha_3 [B_3]_{d_i} + \cdots + \alpha_r [B_r]_{d_i} && \text{Definition MSM} \\
  &= \alpha_1 (0) + \alpha_2 (0) + \alpha_3 (0) + \cdots + \alpha_i (1) + \cdots + \alpha_r (0) && \text{Definition RREF} \\
  &= \alpha_i
\end{align*}
    So we conclude that α_i = 0 for all 1 ≤ i ≤ r, establishing the linear independence
of S (Definition LICV).
Example IAS Improving a span
Suppose in the course of analyzing a matrix (its column space, its null space, its
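. . . ) we encounter the following set of vectors, described by a span
\[
X = \left\langle \left\{
\begin{bmatrix} 1 \\ 2 \\ 1 \\ 6 \\ 6 \end{bmatrix},
\begin{bmatrix} 3 \\ -1 \\ 2 \\ -1 \\ 6 \end{bmatrix},
\begin{bmatrix} 1 \\ -1 \\ 0 \\ -1 \\ -2 \end{bmatrix},
\begin{bmatrix} -3 \\ 2 \\ -3 \\ 6 \\ -10 \end{bmatrix}
\right\} \right\rangle
\]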
      Let A be the matrix whose rows are the vectors in X, so by design X = R(A),
\[
A = \begin{bmatrix}
 1 &  2 &  1 &  6 &   6 \\
 3 & -1 &  2 & -1 &   6 \\
 1 & -1 &  0 & -1 &  -2 \\
-3 &  2 & -3 &  6 & -10
\end{bmatrix}
\]
      Row-reduce A to form a row-equivalent matrix in reduced row-echelon form,
\[
B = \begin{bmatrix}
1 & 0 & 0 &  2 & -1 \\
0 & 1 & 0 &  3 &  1 \\
0 & 0 & 1 & -2 &  5 \\
0 & 0 & 0 &  0 &  0
\end{bmatrix}
\]
      Then Theorem BRS says we can grab the nonzero columns of B^t and write
\[
X = R(A) = R(B) = \left\langle \left\{
\begin{bmatrix} 1 \\ 0 \\ 0 \\ 2 \\ -1 \end{bmatrix},
\begin{bmatrix} 0 \\ 1 \\ 0 \\ 3 \\ 1 \end{bmatrix},
\begin{bmatrix} 0 \\ 0 \\ 1 \\ -2 \\ 5 \end{bmatrix}
\right\} \right\rangle
\]
   These three vectors provide a much-improved description of X. There are fewer
vectors, and the pattern of zeros and ones in the first three entries makes it easier to
determine membership in X. △
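
The technique of Example IAS (put the vectors in as rows, row-reduce, discard zero rows) is easy to sketch in code. The Python below is an illustration and not part of the original text; the names rref and row_space_basis are hypothetical, and exact arithmetic via fractions.Fraction is an assumption.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row-echelon form, 0-based pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        m[r] = [x / m[r][c] for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def row_space_basis(A):
    """Nonzero rows of the RREF of A: a linearly independent spanning set for R(A)."""
    R, _ = rref(A)
    return [row for row in R if any(x != 0 for x in row)]

# Rows are the four vectors of the span X in Example IAS
A = [[1, 2, 1, 6, 6], [3, -1, 2, -1, 6], [1, -1, 0, -1, -2], [-3, 2, -3, 6, -10]]
basis = row_space_basis(A)
print([[int(x) for x in row] for row in basis])
# [[1, 0, 0, 2, -1], [0, 1, 0, 3, 1], [0, 0, 1, -2, 5]]
```

The conversion to int in the last line is only for display and is safe here because every entry of this particular reduced matrix is an integer.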
    Notice that in Example IAS all we had to do was row-reduce the right matrix and
toss out a zero row. Next to row operations themselves, Theorem BRS is probably
the most powerful computational technique at your disposal as it quickly provides a
much improved description of a span, any span (row space, column space, . . . ).
    Theorem BRS and the techniques of Example IAS will provide yet another
description of the column space of a matrix. First we state a triviality as a theorem,
so we can reference it later.
Theorem CSRST Column Space, Row Space, Transpose
Suppose A is a matrix. Then C(A) = R(A^t).

Proof.
\begin{align*}
C(A) &= C\left(\left(A^t\right)^t\right) && \text{Theorem TT} \\
     &= R\left(A^t\right) && \text{Definition RSM}
\end{align*}

   So to find another expression for the column space of a matrix, build its transpose,
row-reduce it, toss out the zero rows, and convert the nonzero rows to column vectors
to yield an improved set for the span construction. We will do Archetype I, then
you do Archetype J.
Example CSROI Column space from row operations, Archetype I
To find the column space of the coefficient matrix of Archetype I, we proceed as
follows. The matrix is
                                                          
\[
I = \begin{bmatrix}
1 & 4 & 0 & -1 & 0 & 7 & -9 \\
2 & 8 & -1 & 3 & 9 & -13 & 7 \\
0 & 0 & 2 & -3 & -4 & 12 & -8 \\
-1 & -4 & 2 & 4 & 8 & -31 & 37
\end{bmatrix}
\]
The transpose is
                                                      
\[
\begin{bmatrix}
1 & 2 & 0 & -1 \\
4 & 8 & 0 & -4 \\
0 & -1 & 2 & 2 \\
-1 & 3 & -3 & 4 \\
0 & 9 & -4 & 8 \\
7 & -13 & 12 & -31 \\
-9 & 7 & -8 & 37
\end{bmatrix}.
\]
   Row-reduced, this becomes
\[
\begin{bmatrix}
1 & 0 & 0 & -\frac{31}{7} \\
0 & 1 & 0 & \frac{12}{7} \\
0 & 0 & 1 & \frac{13}{7} \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}.
\]
   Now, using Theorem CSRST and Theorem BRS
                                        
                              * 1        0      0 +
                                                  
                          t       0  1 0
                C(I) = R I =      0 ,  0 ,  1   .
                               
                                                   
                                   31    12     13 
                                  −7      7      7
   This is a very nice description of the column space. Fewer vectors than the 7
involved in the definition, and the pattern of the zeros and ones in the first 3 slots
can be used to advantage. For example, Archetype I is presented as a consistent
system of equations with a vector of constants
                                           
\[
b = \begin{bmatrix} 3 \\ 9 \\ 1 \\ 4 \end{bmatrix}.
\]
    Since LS(I, b) is consistent, Theorem CSCS tells us that b ∈ C(I). But we could
see this quickly with the following computation, which really only involves any work
in the 4th entry of the vectors as the scalars in the linear combination are dictated
by the first three entries of b.
                                                      
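\[
b = \begin{bmatrix} 3 \\ 9 \\ 1 \\ 4 \end{bmatrix}
= 3\begin{bmatrix} 1 \\ 0 \\ 0 \\ -\frac{31}{7} \end{bmatrix}
+ 9\begin{bmatrix} 0 \\ 1 \\ 0 \\ \frac{12}{7} \end{bmatrix}
+ 1\begin{bmatrix} 0 \\ 0 \\ 1 \\ \frac{13}{7} \end{bmatrix}
\]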
   Can you now rapidly construct several vectors, b, so that LS(I, b) is consistent,
and several more so that the system is inconsistent?
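
The computation in Example CSROI can be reproduced mechanically. What follows is a minimal sketch in Python, assuming exact rational arithmetic from the standard library; the helper rref and the variable names are illustrative choices of ours, not notation from the text. It row-reduces the transpose of I, keeps the nonzero rows, and then tests b by using its first three entries as the scalars.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivot_row = 0
    for col in range(n_cols):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pr = next((r for r in range(pivot_row, n_rows) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        # Clear every other entry in the pivot column.
        for r in range(n_rows):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * p for a, p in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return m

# Coefficient matrix of Archetype I, as given in Example CSROI.
I = [[1, 4, 0, -1, 0, 7, -9],
     [2, 8, -1, 3, 9, -13, 7],
     [0, 0, 2, -3, -4, 12, -8],
     [-1, -4, 2, 4, 8, -31, 37]]

It = [list(col) for col in zip(*I)]            # transpose
basis = [row for row in rref(It) if any(x != 0 for x in row)]

# Membership test for b: the first three entries dictate the scalars,
# so only the fourth entry needs checking.
b = [3, 9, 1, 4]
combo = [sum(b[k] * basis[k][i] for k in range(3)) for i in range(4)]
in_column_space = (combo == b)

for v in basis:
    print([str(x) for x in v])
print(in_column_space)
```

Changing the last entry of b to any value other than 4 makes the test fail, which is one quick way to manufacture vectors of constants that give an inconsistent system. Note that the shortcut of reading the scalars off the first three entries depends on the pivots of the row-reduced transpose landing in the first three columns, as they do here.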

Reading Questions
1. Write the column space of the matrix below as the span of a set of three vectors and
   explain your choice of method.
\[
\begin{bmatrix}
1 & 3 & 1 & 3 \\
2 & 0 & 1 & 1 \\
-1 & 2 & 1 & 0
\end{bmatrix}
\]

2. Suppose that A is an n × n nonsingular matrix. What can you say about its column
   space?
3. Is the vector $\begin{bmatrix} 0 \\ 5 \\ 2 \\ 3 \end{bmatrix}$ in the row space of the following matrix? Why or why not?
\[
\begin{bmatrix}
1 & 3 & 1 & 3 \\
2 & 0 & 1 & 1 \\
-1 & 2 & 1 & 0
\end{bmatrix}
\]
Exercises
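C20 For each matrix below, find a set of linearly independent vectors X so that $\langle X\rangle$
equals the column space of the matrix, and a set of linearly independent vectors Y so that
$\langle Y\rangle$ equals the row space of the matrix.
\[
A = \begin{bmatrix}
1 & 2 & 3 & 1 \\
0 & 1 & 1 & 2 \\
1 & -1 & 2 & 3 \\
1 & 1 & 2 & -1
\end{bmatrix}
\qquad
B = \begin{bmatrix}
1 & 2 & 1 & 1 & 1 \\
3 & 2 & -1 & 4 & 5 \\
0 & 1 & 1 & 1 & 2
\end{bmatrix}
\qquad
C = \begin{bmatrix}
2 & 1 & 0 \\
3 & 0 & 3 \\
1 & 2 & -3 \\
1 & 1 & -1 \\
1 & 1 & -1
\end{bmatrix}
\]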
From your results for these three matrices, can you formulate a conjecture about the sets
X and Y ?
C30† Example CSOCD expresses the column space of the coefficient matrix from Arche-
type D (call the matrix A here) as the span of the first two columns of A. In Example
CSMCS we determined that the vector
                                            
\[
c = \begin{bmatrix} 2 \\ 3 \\ 2 \end{bmatrix}
\]
was not in the column space of A and that the vector
                                              
\[
b = \begin{bmatrix} 8 \\ -12 \\ 4 \end{bmatrix}
\]
was in the column space of A. Attempt to write c and b as linear combinations of the two
vectors in the span construction for the column space in Example CSOCD and record your
observations.
C31† For the matrix A below find a set of vectors T meeting the following requirements:
(1) the span of T is the column space of A, that is, $\langle T\rangle = \mathcal{C}(A)$, (2) T is linearly independent,
and (3) the elements of T are columns of A.
\[
A = \begin{bmatrix}
2 & 1 & 4 & -1 & 2 \\
1 & -1 & 5 & 1 & 1 \\
-1 & 2 & -7 & 0 & 1 \\
2 & -1 & 8 & -1 & 2
\end{bmatrix}
\]
C32 In Example CSAA, verify that the vector b is not in the column space of the
coefficient matrix.
C33† Find a linearly independent set S so that the span of S, $\langle S\rangle$, is the row space of the
matrix B.
\[
B = \begin{bmatrix}
2 & 3 & 1 & 1 \\
1 & 1 & 0 & 1 \\
-1 & 2 & 3 & -4
\end{bmatrix}
\]

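C34† For the 3 × 4 matrix A and the column vector $y \in \mathbb{C}^4$ given below, determine if y
is in the row space of A. In other words, answer the question: $y \in \mathcal{R}(A)$?
\[
A = \begin{bmatrix}
-2 & 6 & 7 & -1 \\
7 & -3 & 0 & -3 \\
8 & 0 & 7 & 6
\end{bmatrix}
\qquad
y = \begin{bmatrix} 2 \\ 1 \\ 3 \\ -2 \end{bmatrix}
\]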

C35† For the matrix A below, find two different linearly independent sets whose spans
equal the column space of A, C(A), such that

   1. the elements are each columns of A.

   2. the set is obtained by a procedure that is substantially different from the procedure
      you use in part (1).

                                                          
\[
A = \begin{bmatrix}
3 & 5 & 1 & -2 \\
1 & 2 & 3 & 3 \\
-3 & -4 & 7 & 13
\end{bmatrix}
\]

C40 The following archetypes are systems of equations. For each system, write the vector
of constants as a linear combination of the vectors in the span construction for the column
space provided by Theorem BCS (these vectors are listed for each of these archetypes).

Archetype A, Archetype B, Archetype C, Archetype D, Archetype E, Archetype F, Archetype
G, Archetype H, Archetype I, Archetype J
C42 The following archetypes are either matrices or systems of equations with coefficient
matrices. For each matrix, compute a set of column vectors such that (1) the vectors are
columns of the matrix, (2) the set is linearly independent, and (3) the span of the set is
the column space of the matrix. See Theorem BCS.

Archetype A, Archetype B, Archetype C, Archetype D/Archetype E, Archetype F, Archetype
G/Archetype H, Archetype I, Archetype J, Archetype K, Archetype L
C50 The following archetypes are either matrices or systems of equations with coefficient
matrices. For each matrix, compute a set of column vectors such that (1) the set is linearly
independent, and (2) the span of the set is the row space of the matrix. See Theorem BRS.

Archetype A, Archetype B, Archetype C, Archetype D/Archetype E, Archetype F, Archetype
G/Archetype H, Archetype I, Archetype J, Archetype K, Archetype L
C51 The following archetypes are either matrices or systems of equations with coefficient
matrices. For each matrix, compute the column space as the span of a linearly independent
set as follows: transpose the matrix, row-reduce, toss out zero rows, convert rows into
column vectors. See Example CSROI.

Archetype A, Archetype B, Archetype C, Archetype D/Archetype E, Archetype F, Archetype
G/Archetype H, Archetype I, Archetype J, Archetype K, Archetype L
C52 The following archetypes are systems of equations. For each different coefficient
matrix build two new vectors of constants. The first should lead to a consistent system
and the second should lead to an inconsistent system. Descriptions of the column space as
spans of linearly independent sets of vectors with “nice patterns” of zeros and ones might
be most useful and instructive in connection with this exercise. (See the end of Example
CSROI.)

Archetype A, Archetype B, Archetype C, Archetype D/Archetype E, Archetype F, Archetype
G/Archetype H, Archetype I, Archetype J
M10† For the matrix E below, find vectors b and c so that the system LS(E, b) is
consistent and LS(E, c) is inconsistent.
\[
E = \begin{bmatrix}
-2 & 1 & 1 & 0 \\
3 & -1 & 0 & 2 \\
4 & 1 & 1 & 6
\end{bmatrix}
\]

M20† Usually the column space and null space of a matrix contain vectors of different
sizes. For a square matrix, though, the vectors in these two sets are the same size. Usually
the two sets will be different. Construct an example of a square matrix where the column
space and null space are equal.
M21† We have a variety of theorems about how to create column spaces and row spaces
and they frequently involve row-reducing a matrix. Here is a procedure that some try to
use to get a column space. Begin with an m × n matrix A and row-reduce to a matrix B
with columns $B_1, B_2, B_3, \ldots, B_n$. Then form the column space of A as
\[
\mathcal{C}(A) = \langle\{B_1, B_2, B_3, \ldots, B_n\}\rangle = \mathcal{C}(B)
\]
This is not a legitimate procedure, and therefore is not a theorem. Construct an example
to show that the procedure will not in general create the column space of A.
T40† Suppose that A is an m × n matrix and B is an n × p matrix. Prove that the
column space of AB is a subset of the column space of A, that is C(AB) ⊆ C(A). Provide an
example where the opposite is false, in other words give an example where $\mathcal{C}(A) \not\subseteq \mathcal{C}(AB)$.
(Compare with Exercise MM.T40.)
T41† Suppose that A is an m × n matrix and B is an n × n nonsingular matrix. Prove
that the column space of A is equal to the column space of AB, that is C(A) = C(AB).
(Compare with Exercise MM.T41 and Exercise CRS.T40.)
T45† Suppose that A is an m × n matrix and B is an n × m matrix where AB is a
nonsingular matrix. Prove that

   1. N (B) = {0}
  2. C(B) ∩ N (A) = {0}

      Discuss the case when m = n in connection with Theorem NPNT.
Section FS
Four Subsets
There are four natural subsets associated with a matrix. We have met three already:
the null space, the column space and the row space. In this section we will introduce
a fourth, the left null space. The objective of this section is to describe one procedure
that will allow us to find linearly independent sets that span each of these four sets
of column vectors. Along the way, we will make a connection with the inverse of
a matrix, so Theorem FS will tie together most all of this chapter (and the entire
course so far).



Subsection LNS
Left Null Space
Definition LNS Left Null Space
Suppose A is an m × n matrix. Then the left null space is defined as
$\mathcal{L}(A) = \mathcal{N}\left(A^t\right) \subseteq \mathbb{C}^m$.
    The left null space will not feature prominently in the sequel, but we can explain
its name and connect it to row operations. Suppose y ∈ L(A). Then by Definition
LNS, $A^t y = 0$. We can then write
\[
\begin{aligned}
0^t &= \left(A^t y\right)^t && \text{Definition LNS} \\
    &= y^t \left(A^t\right)^t && \text{Theorem MMT} \\
    &= y^t A && \text{Theorem TT}
\end{aligned}
\]
    The product $y^t A$ can be viewed as the components of y acting as the scalars in
a linear combination of the rows of A. And the result is a “row vector”, $0^t$, that is
totally zeros. When we apply a sequence of row operations to a matrix, each row of
the resulting matrix is some linear combination of the rows. These observations tell
us that the vectors in the left null space are scalars that record a sequence of row
operations that result in a row of zeros in the row-reduced version of the matrix.
We will see this idea more explicitly in the course of proving Theorem FS.
Example LNS Left null space
We will find the left null space of
                                                      
\[
A = \begin{bmatrix}
1 & -3 & 1 \\
-2 & 1 & 1 \\
1 & 5 & 1 \\
9 & -4 & 0
\end{bmatrix}
\]
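   We transpose A and row-reduce,
\[
A^t = \begin{bmatrix}
1 & -2 & 1 & 9 \\
-3 & 1 & 5 & -4 \\
1 & 1 & 1 & 0
\end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix}
1 & 0 & 0 & 2 \\
0 & 1 & 0 & -3 \\
0 & 0 & 1 & 1
\end{bmatrix}
\]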
   Applying Definition LNS and Theorem BNS we have
                                         
                                       * −2 +
                                             
                                          3
                        L(A) = N At =     −1
                                        
                                              
                                               
                                            1
    If you row-reduce A you will discover one zero row in the reduced row-echelon
form. This zero row is created by a sequence of row operations, which in total amounts
to a linear combination, with scalars a1 = −2, a2 = 3, a3 = −1 and a4 = 1, on the
rows of A and which results in the zero vector (check this!). So the components of
the vector describing the left null space of A provide a relation of linear dependence
on the rows of A.
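
The "check this!" in Example LNS takes only a few lines to carry out. The sketch below, in plain Python with names of our own choosing, forms the linear combination of the rows of A with the scalars −2, 3, −1, 1; it is a verification of this one example, not a general procedure for finding the left null space.

```python
# Matrix A from Example LNS, one list per row.
A = [[1, -3, 1],
     [-2, 1, 1],
     [1, 5, 1],
     [9, -4, 0]]

# The vector spanning the left null space; its entries act as scalars on the rows.
y = [-2, 3, -1, 1]

# Compute y^t A: entry j is the sum over rows i of y[i] * A[i][j].
row_combination = [sum(y[i] * A[i][j] for i in range(len(A)))
                   for j in range(len(A[0]))]

print(row_combination)  # [0, 0, 0]
```

The result is the zero row vector, so the entries of y give a relation of linear dependence on the rows of A, as the example claims.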


Subsection CCS
Computing Column Spaces
We have three ways to build the column space of a matrix. First, we can use just the
definition, Definition CSM, and express the column space as a span of the columns of
the matrix. A second approach gives us the column space as the span of some of the
columns of the matrix, and additionally, this set is linearly independent (Theorem
BCS). Finally, we can transpose the matrix, row-reduce the transpose, kick out
zero rows, and write the remaining rows as column vectors. Theorem CSRST and
Theorem BRS tell us that the resulting vectors are linearly independent and their
span is the column space of the original matrix.
    We will now demonstrate a fourth method by way of a rather complicated example.
Study this example carefully, but realize that its main purpose is to motivate a
theorem that simplifies much of the apparent complexity. So other than an instructive
exercise or two, the procedure we are about to describe will not be a usual approach
to computing a column space.
Example CSANS Column space as null space
Let us find the column space of the matrix A below with a new approach.
\[
A = \begin{bmatrix}
10 & 0 & 3 & 8 & 7 \\
-16 & -1 & -4 & -10 & -13 \\
-6 & 1 & -3 & -6 & -6 \\
0 & 2 & -2 & -3 & -2 \\
3 & 0 & 1 & 2 & 3 \\
-1 & -1 & 1 & 1 & 0
\end{bmatrix}
\]
    By Theorem CSCS we know that the column vector b is in the column space
of A if and only if the linear system LS(A, b) is consistent. So let us try to solve
this system in full generality, using a vector of variables for the vector of constants.
In other words, which vectors b lead to consistent systems? Begin by forming the
augmented matrix [ A | b] with a general version of b,
                                                                  
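\[
[\,A \mid b\,] = \left[\begin{array}{ccccc|c}
10 & 0 & 3 & 8 & 7 & b_1 \\
-16 & -1 & -4 & -10 & -13 & b_2 \\
-6 & 1 & -3 & -6 & -6 & b_3 \\
0 & 2 & -2 & -3 & -2 & b_4 \\
3 & 0 & 1 & 2 & 3 & b_5 \\
-1 & -1 & 1 & 1 & 0 & b_6
\end{array}\right]
\]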
   To identify solutions we will bring this matrix to reduced row-echelon form.
Despite the presence of variables in the last column, there is nothing to stop us
from doing this, except numerical computational routines cannot be used, and even
some of the symbolic algebra routines do some unexpected maneuvers with this
computation. So do it by hand. Yes, it is a bit of work. But worth it. We’ll still be
here when you get back. Notice along the way that the row operations are exactly
the same ones you would do if you were just row-reducing the coefficient matrix
alone, say in connection with a homogeneous system of equations. The column with
the bi acts as a sort of bookkeeping device. There are many different possibilities for
the result, depending on what order you choose to perform the row operations, but
shortly we will all be on the same page. If you want to match our work right now,
use row 5 to remove any occurrence of b1 from the other entries of the last column,
and use row 6 to remove any occurrence of $b_2$ from the last column. We have:
\[
\left[\begin{array}{ccccc|c}
1 & 0 & 0 & 0 & 2 & b_3 - b_4 + 2b_5 - b_6 \\
0 & 1 & 0 & 0 & -3 & -2b_3 + 3b_4 - 3b_5 + 3b_6 \\
0 & 0 & 1 & 0 & 1 & b_3 + b_4 + 3b_5 + 3b_6 \\
0 & 0 & 0 & 1 & -2 & -2b_3 + b_4 - 4b_5 \\
0 & 0 & 0 & 0 & 0 & b_1 + 3b_3 - b_4 + 3b_5 + b_6 \\
0 & 0 & 0 & 0 & 0 & b_2 - 2b_3 + b_4 + b_5 - b_6
\end{array}\right]
\]
   Our goal is to identify those vectors b which make LS(A, b) consistent. By
Theorem RCLS we know that the consistent systems are precisely those without a
pivot column in the last column. Are the expressions in the last column of rows 5
and 6 equal to zero, or are they leading 1’s? The answer is: maybe. It depends on b.
With a nonzero value for either of these expressions, we would scale the row and
produce a leading 1. So we get a consistent system, and b is in the column space,
if and only if these two expressions are both simultaneously zero. In other words,
members of the column space of A are exactly those vectors b that satisfy
\[
\begin{aligned}
b_1 + 3b_3 - b_4 + 3b_5 + b_6 &= 0 \\
b_2 - 2b_3 + b_4 + b_5 - b_6 &= 0
\end{aligned}
\]

    Hmmm. Looks suspiciously like a homogeneous system of two equations with
six variables. If you have been playing along (and we hope you have) then you may
have a slightly different system, but you should have just two equations. Form the
coefficient matrix and row-reduce (notice that the system above has a coefficient
matrix that is already in reduced row-echelon form). We should all be together now
with the same matrix,
                                                         
\[
L = \begin{bmatrix}
1 & 0 & 3 & -1 & 3 & 1 \\
0 & 1 & -2 & 1 & 1 & -1
\end{bmatrix}
\]

    So, $\mathcal{C}(A) = \mathcal{N}(L)$ and we can apply Theorem BNS to obtain a linearly independent
set to use in a span construction,
\[
\mathcal{C}(A) = \mathcal{N}(L) = \left\langle\left\{
\begin{bmatrix} -3 \\ 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix},\,
\begin{bmatrix} 1 \\ -1 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix},\,
\begin{bmatrix} -3 \\ -1 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix},\,
\begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\right\}\right\rangle
\]

   Whew! As a postscript to this central example, you may wish to convince yourself
that the four vectors above really are elements of the column space. Do they create
consistent systems with A as coefficient matrix? Can you recognize the constant
vector in your description of these solution sets?
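
One way to begin the check suggested in this postscript is sketched below in plain Python; the helper mat_vec and the variable names are hypothetical, chosen only for illustration. It confirms that each of the four vectors satisfies the two homogeneous equations (so each lies in N(L)), and that every column of A does too. This verifies only that these vectors belong to the null space of L; it does not by itself establish the equality C(A) = N(L).

```python
# L and A as they appear in Example CSANS.
L = [[1, 0, 3, -1, 3, 1],
     [0, 1, -2, 1, 1, -1]]

A = [[10, 0, 3, 8, 7],
     [-16, -1, -4, -10, -13],
     [-6, 1, -3, -6, -6],
     [0, 2, -2, -3, -2],
     [3, 0, 1, 2, 3],
     [-1, -1, 1, 1, 0]]

# The four spanning vectors obtained from Theorem BNS.
basis = [[-3, 2, 1, 0, 0, 0],
         [1, -1, 0, 1, 0, 0],
         [-3, -1, 0, 0, 1, 0],
         [-1, 1, 0, 0, 0, 1]]

def mat_vec(matrix, vector):
    """Matrix-vector product, with the matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

columns_of_A = [list(col) for col in zip(*A)]

basis_in_null_space = all(mat_vec(L, v) == [0, 0] for v in basis)
columns_in_null_space = all(mat_vec(L, c) == [0, 0] for c in columns_of_A)

print(basis_in_null_space, columns_in_null_space)
```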
    OK, that was so much fun, let us do it again. But simpler this time. And we will
all get the same results all the way through. Doing row operations by hand with
variables can be a bit error prone, so let us see if we can improve the process some.
Rather than row-reduce a column vector b full of variables, let us write $b = I_6 b$
and we will row-reduce the matrix $I_6$ and when we finish row-reducing, then we will
compute the matrix-vector product. You should first convince yourself that we can
operate like this (this is the subject of a future homework exercise).
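    Rather than augmenting A with b, we will instead augment it with $I_6$ (does this
feel familiar?),
\[
M = \begin{bmatrix}
10 & 0 & 3 & 8 & 7 & 1 & 0 & 0 & 0 & 0 & 0 \\
-16 & -1 & -4 & -10 & -13 & 0 & 1 & 0 & 0 & 0 & 0 \\
-6 & 1 & -3 & -6 & -6 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 2 & -2 & -3 & -2 & 0 & 0 & 0 & 1 & 0 & 0 \\
3 & 0 & 1 & 2 & 3 & 0 & 0 & 0 & 0 & 1 & 0 \\
-1 & -1 & 1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]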

    We want to row-reduce the left-hand side of this matrix, but we will apply the
same row operations to the right-hand side as well. And once we get the left-hand
side in reduced row-echelon form, we will continue on to put leading 1’s in the final
two rows, as well as making pivot columns that contain these two additional leading
1’s. It is these additional row operations that will ensure that we all get to the same
place, since the reduced row-echelon form is unique (Theorem RREFU),
                                                                     
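\[
N = \begin{bmatrix}
1 & 0 & 0 & 0 & 2 & 0 & 0 & 1 & -1 & 2 & -1 \\
0 & 1 & 0 & 0 & -3 & 0 & 0 & -2 & 3 & -3 & 3 \\
0 & 0 & 1 & 0 & 1 & 0 & 0 & 1 & 1 & 3 & 3 \\
0 & 0 & 0 & 1 & -2 & 0 & 0 & -2 & 1 & -4 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 3 & -1 & 3 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & -2 & 1 & 1 & -1
\end{bmatrix}
\]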
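      We are after the final six columns of this matrix, which we will multiply by b,
\[
J = \begin{bmatrix}
0 & 0 & 1 & -1 & 2 & -1 \\
0 & 0 & -2 & 3 & -3 & 3 \\
0 & 0 & 1 & 1 & 3 & 3 \\
0 & 0 & -2 & 1 & -4 & 0 \\
1 & 0 & 3 & -1 & 3 & 1 \\
0 & 1 & -2 & 1 & 1 & -1
\end{bmatrix}
\]
so
\[
Jb = \begin{bmatrix}
0 & 0 & 1 & -1 & 2 & -1 \\
0 & 0 & -2 & 3 & -3 & 3 \\
0 & 0 & 1 & 1 & 3 & 3 \\
0 & 0 & -2 & 1 & -4 & 0 \\
1 & 0 & 3 & -1 & 3 & 1 \\
0 & 1 & -2 & 1 & 1 & -1
\end{bmatrix}
\begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \\ b_5 \\ b_6 \end{bmatrix}
= \begin{bmatrix}
b_3 - b_4 + 2b_5 - b_6 \\
-2b_3 + 3b_4 - 3b_5 + 3b_6 \\
b_3 + b_4 + 3b_5 + 3b_6 \\
-2b_3 + b_4 - 4b_5 \\
b_1 + 3b_3 - b_4 + 3b_5 + b_6 \\
b_2 - 2b_3 + b_4 + b_5 - b_6
\end{bmatrix}
\]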

    So by applying the same row operations that row-reduce A to the identity matrix
(which we could do computationally once I6 is placed alongside of A), we can then
arrive at the result of row-reducing a column of symbols where the vector of constants
usually resides. Since the row-reduced version of A has two zero rows, for a consistent
system we require that
\[
\begin{aligned}
b_1 + 3b_3 - b_4 + 3b_5 + b_6 &= 0 \\
b_2 - 2b_3 + b_4 + b_5 - b_6 &= 0
\end{aligned}
\]

   Now we are exactly back where we were on the first go-round. Notice that we
obtain the matrix L as simply the last two rows and last six columns of N.

    This example motivates the remainder of this section, so it is worth careful study.
You might attempt to mimic the second approach with the coefficient matrices of
Archetype I and Archetype J. We will see shortly that the matrix L contains more
information about A than just the column space.
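
For readers who would like to mimic the second approach by machine, here is a minimal sketch in Python, assuming exact rational arithmetic from the standard library; the function rref and all variable names are our own illustrative choices. It augments the matrix A of Example CSANS with the identity, row-reduces, and reads L off the rows whose first five entries are all zero.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    pivot_row = 0
    for col in range(n_cols):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pr = next((r for r in range(pivot_row, n_rows) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        # Clear every other entry in the pivot column.
        for r in range(n_rows):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * p for a, p in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return m

A = [[10, 0, 3, 8, 7],
     [-16, -1, -4, -10, -13],
     [-6, 1, -3, -6, -6],
     [0, 2, -2, -3, -2],
     [3, 0, 1, 2, 3],
     [-1, -1, 1, 1, 0]]
m, n = len(A), len(A[0])

# M = [A | I_6]: append row i of the identity to row i of A.
M = [row + [1 if i == j else 0 for j in range(m)] for i, row in enumerate(A)]
N = rref(M)

# L consists of the last m columns of those rows that are zero in the first n columns.
L = [row[n:] for row in N if all(x == 0 for x in row[:n])]

for row in L:
    print([str(x) for x in row])
```

Replacing A with the coefficient matrix of Archetype I or Archetype J should carry out the experiment suggested above, though the number of rows in L will depend on the matrix.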



Subsection EEF
Extended Echelon Form

The final matrix that we row-reduced in Example CSANS should look familiar in
most respects to the procedure we used to compute the inverse of a nonsingular
matrix, Theorem CINM. We will now generalize that procedure to matrices that are
not necessarily nonsingular, or even square. First a definition.

Definition EEF Extended Echelon Form
Suppose A is an m × n matrix. Extend A on its right side with the addition of an
m×m identity matrix to form an m×(n+m) matrix M . Use row operations to bring
M to reduced row-echelon form and call the result N . N is the extended reduced
row-echelon form of A, and we will standardize on names for five submatrices (B,
C, J, K, L) of N .
    Let B denote the m × n matrix formed from the first n columns of N and let J
denote the m × m matrix formed from the last m columns of N . Suppose that B has
r nonzero rows. Further partition N by letting C denote the r × n matrix formed
from all of the nonzero rows of B. Let K be the r × m matrix formed from the first
r rows of J, while L will be the (m − r) × m matrix formed from the bottom m − r
rows of J. Pictorially,
                                                             
\[
M = [\,A \mid I_m\,] \xrightarrow{\text{RREF}} N = [\,B \mid J\,] =
\begin{bmatrix} C & K \\ 0 & L \end{bmatrix}
\]

                                                                                     

Example SEEF Submatrices of extended echelon form
We illustrate Definition EEF with the matrix A,
                                                           
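\[
A = \begin{bmatrix}
1 & -1 & -2 & 7 & 1 & 6 \\
-6 & 2 & -4 & -18 & -3 & -26 \\
4 & -1 & 4 & 10 & 2 & 17 \\
3 & -1 & 2 & 9 & 1 & 12
\end{bmatrix}
\]
   Augmenting with the 4 × 4 identity matrix,
\[
M = \begin{bmatrix}
1 & -1 & -2 & 7 & 1 & 6 & 1 & 0 & 0 & 0 \\
-6 & 2 & -4 & -18 & -3 & -26 & 0 & 1 & 0 & 0 \\
4 & -1 & 4 & 10 & 2 & 17 & 0 & 0 & 1 & 0 \\
3 & -1 & 2 & 9 & 1 & 12 & 0 & 0 & 0 & 1
\end{bmatrix}
\]
and row-reducing, we obtain
\[
N = \begin{bmatrix}
1 & 0 & 2 & 1 & 0 & 3 & 0 & 1 & 1 & 1 \\
0 & 1 & 4 & -6 & 0 & -1 & 0 & 2 & 3 & 0 \\
0 & 0 & 0 & 0 & 1 & 2 & 0 & -1 & 0 & -2 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 2 & 2 & 1
\end{bmatrix}
\]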
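   So we then obtain
\[
B = \begin{bmatrix}
1 & 0 & 2 &  1 & 0 &  3 \\
0 & 1 & 4 & -6 & 0 & -1 \\
0 & 0 & 0 &  0 & 1 &  2 \\
0 & 0 & 0 &  0 & 0 &  0
\end{bmatrix}
\]
\[
C = \begin{bmatrix}
1 & 0 & 2 &  1 & 0 &  3 \\
0 & 1 & 4 & -6 & 0 & -1 \\
0 & 0 & 0 &  0 & 1 &  2
\end{bmatrix}
\]
\[
J = \begin{bmatrix}
0 &  1 & 1 &  1 \\
0 &  2 & 3 &  0 \\
0 & -1 & 0 & -2 \\
1 &  2 & 2 &  1
\end{bmatrix}
\]
\[
K = \begin{bmatrix}
0 &  1 & 1 &  1 \\
0 &  2 & 3 &  0 \\
0 & -1 & 0 & -2
\end{bmatrix}
\]
\[
L = \begin{bmatrix} 1 & 2 & 2 & 1 \end{bmatrix}
\]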
   You can observe (or verify) the properties of the following theorem with this
example.
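
   The construction in Definition EEF is mechanical enough to hand to a computer.
What follows is a minimal sketch in Python, not part of the original text, that assumes
exact rational arithmetic from the standard fractions module. The function names
rref and extended_echelon_form are our own choices for illustration. The sketch builds
M, row-reduces it, and slices out the five submatrices for the matrix A of Example SEEF.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form of a matrix given as a list of rows (exact arithmetic)."""
    M = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(n_cols):
        # Look for a nonzero entry in this column, at or below the current pivot row.
        src = next((i for i in range(pivot_row, n_rows) if M[i][col] != 0), None)
        if src is None:
            continue
        M[pivot_row], M[src] = M[src], M[pivot_row]        # swap rows
        lead = M[pivot_row][col]
        M[pivot_row] = [x / lead for x in M[pivot_row]]    # scale to a leading 1
        for i in range(n_rows):                            # clear the rest of the column
            if i != pivot_row and M[i][col] != 0:
                factor = M[i][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return M

def extended_echelon_form(A):
    """Return N and the submatrices B, J, C, K, L of Definition EEF, plus r."""
    m, n = len(A), len(A[0])
    M = [list(A[i]) + [1 if i == j else 0 for j in range(m)] for i in range(m)]
    N = rref(M)
    B = [row[:n] for row in N]          # first n columns
    J = [row[n:] for row in N]          # last m columns
    r = sum(1 for row in B if any(x != 0 for x in row))   # nonzero rows of B
    return N, B, J, B[:r], J[:r], J[r:], r

A = [[ 1, -1, -2,   7,  1,   6],
     [-6,  2, -4, -18, -3, -26],
     [ 4, -1,  4,  10,  2,  17],
     [ 3, -1,  2,   9,  1,  12]]

N, B, J, C, K, L, r = extended_echelon_form(A)
print(r)
print([[int(x) for x in row] for row in L])
```

Since the reduced row-echelon form of a matrix is unique, any correct sequence of row
operations produces the same N, so the submatrices computed this way should agree
with the ones displayed in the example.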
Theorem PEEF Properties of Extended Echelon Form
Suppose that A is an m × n matrix and that N is its extended echelon form. Then

   1. J is nonsingular.

   2. B = JA.

   3. If $x \in \mathbb{C}^n$ and $y \in \mathbb{C}^m$, then $Ax = y$ if and only if $Bx = Jy$.

   4. C is in reduced row-echelon form, has no zero rows and has r pivot columns.

   5. L is in reduced row-echelon form, has no zero rows and has m−r pivot columns.

Proof. J is the result of applying a sequence of row operations to $I_m$, and therefore
J and $I_m$ are row-equivalent. $\mathcal{LS}(I_m, \mathbf{0})$ has only the zero solution, since $I_m$ is
nonsingular (Theorem NMRRI). Thus, $\mathcal{LS}(J, \mathbf{0})$ also has only the zero solution
(Theorem REMES, Definition ESYS) and J is therefore nonsingular (Definition
NSM).
   To prove the second part of this conclusion, first convince yourself that row
operations and the matrix-vector product are associative operations. By this we
mean the following. Suppose that F is an m × n matrix that is row-equivalent to
the matrix G. Apply to the column vector Fw the same sequence of row operations
that converts F to G. Then the result is Gw. So we can do row operations on the
matrix, then do a matrix-vector product, or do a matrix-vector product and then do
row operations on a column vector, and the result will be the same either way. Since
matrix multiplication is defined by a collection of matrix-vector products (Definition
MM), the matrix product FH will become GH if we apply to FH the same sequence of
row operations that converts F to G. (This argument can be made more
rigorous using elementary matrices from the upcoming Subsection DM.EM and the
associative property of matrix multiplication established in Theorem MMA.) Now
apply these observations to A.
    Write $A I_n = I_m A$ and apply the row operations that convert M to N. A is
converted to B, while $I_m$ is converted to J, so we have $B I_n = JA$. Simplifying the
left side gives the desired conclusion.
    For the third conclusion, we now establish the two equivalences
       Ax = y           ⇐⇒          JAx = Jy            ⇐⇒          Bx = Jy
    The forward direction of the first equivalence is accomplished by multiplying
both sides of the matrix equality by J, while the backward direction is accomplished
by multiplying by the inverse of J (which we know exists by Theorem NI since J is
nonsingular). The second equivalence is obtained simply by the substitutions given
by JA = B.
    The first r rows of N are in reduced row-echelon form, since any contiguous
collection of rows taken from a matrix in reduced row-echelon form will form a
matrix that is again in reduced row-echelon form (Exercise RREF.T12). Since the
matrix C is formed by removing the last m entries of each of these rows, the remainder
is still in reduced row-echelon form. By its construction, C has no zero rows. C has
r rows and each contains a leading 1, so there are r pivot columns in C.
    The final m − r rows of N are in reduced row-echelon form, since any contiguous
collection of rows taken from a matrix in reduced row-echelon form will form a
matrix that is again in reduced row-echelon form. Since the matrix L is formed by
removing the first n entries of each of these rows, and these entries are all zero (they
form the zero rows of B), the remainder is still in reduced row-echelon form. L is the
final m − r rows of the nonsingular matrix J, so none of these rows can be totally
zero, or J would not row-reduce to the identity matrix. L has m − r rows and each
contains a leading 1, so there are m − r pivot columns in L.                        
    Notice that in the case where A is a nonsingular matrix we know that the reduced
row-echelon form of A is the identity matrix (Theorem NMRRI), so $B = I_n$. Then
the second conclusion above says $JA = B = I_n$, so J is the inverse of A. Thus this
theorem generalizes Theorem CINM, though the result is a “left-inverse” of A rather
than a “right-inverse.”
    The third conclusion of Theorem PEEF is the most telling. It says that x is a
solution to the linear system LS(A, y) if and only if x is a solution to the linear
system LS(B, Jy). Or said differently, if we row-reduce the augmented matrix [ A | y]
we will get the augmented matrix [ B | Jy]. The matrix J tracks the cumulative effect
of the row operations that convert A to reduced row-echelon form, here effectively
applying them to the vector of constants in a system of equations having A as a
coefficient matrix. When A row-reduces to a matrix with zero rows, then Jy should
also have zero entries in the same rows if the system is to be consistent.
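
    The second and third conclusions of Theorem PEEF can be checked numerically on
Example SEEF. The following is a small Python sketch under our own assumptions: the
helper matmul and the test vector x are hypothetical choices made only for illustration,
while A, J and B are copied from the example.

```python
def matmul(X, Y):
    """Product of two matrices, each stored as a list of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]

A = [[ 1, -1, -2,   7,  1,   6],
     [-6,  2, -4, -18, -3, -26],
     [ 4, -1,  4,  10,  2,  17],
     [ 3, -1,  2,   9,  1,  12]]
J = [[0,  1, 1,  1],
     [0,  2, 3,  0],
     [0, -1, 0, -2],
     [1,  2, 2,  1]]
B = [[1, 0, 2,  1, 0,  3],
     [0, 1, 4, -6, 0, -1],
     [0, 0, 0,  0, 1,  2],
     [0, 0, 0,  0, 0,  0]]

# Conclusion 2: the rows of J record how rows of A combine into rows of B.
print(matmul(J, A) == B)                   # True

# Conclusion 3 (forward direction): choose any x, set y = Ax, then Bx = Jy.
x = [[1], [2], [0], [-1], [3], [1]]        # arbitrary column vector, for illustration
y = matmul(A, x)
print(matmul(B, x) == matmul(J, y))        # True
```

The last entry of Jy is zero here because y was built to lie in the column space of A,
which is the consistency condition described in the paragraph above.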

Subsection FS
Four Subsets
With all the preliminaries in place we can state our main result for this section. In
essence this result will allow us to say that we can find linearly independent sets to
use in span constructions for all four subsets (null space, column space, row space,
left null space) by analyzing only the extended echelon form of the matrix, and
specifically, just the two submatrices C and L, which will be ripe for analysis since
they are already in reduced row-echelon form (Theorem PEEF).
Theorem FS Four Subsets
Suppose A is an m × n matrix with extended echelon form N . Suppose the reduced
row-echelon form of A has r nonzero rows. Then C is the submatrix of N formed
from the first r rows and the first n columns and L is the submatrix of N formed
from the last m columns and the last m − r rows. Then

  1. The null space of A is the null space of C, N (A) = N (C).
  2. The row space of A is the row space of C, R(A) = R(C).
  3. The column space of A is the null space of L, C(A) = N (L).
  4. The left null space of A is the row space of L, L(A) = R(L).

Proof. First, N (A) = N (B) since B is row-equivalent to A (Theorem REMES). The
zero rows of B represent equations that are always true in the homogeneous system
LS(B, 0), so the removal of these equations will not change the solution set. Thus,
in turn, N (B) = N (C).
    Second, R(A) = R(B) since B is row-equivalent to A (Theorem REMRS). The
zero rows of B contribute nothing to the span that is the row space of B, so the
removal of these rows will not change the row space. Thus, in turn, R(B) = R(C).
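    Third, we prove the set equality $\mathcal{C}(A) = \mathcal{N}(L)$ with Definition SE. Begin by
showing that $\mathcal{C}(A) \subseteq \mathcal{N}(L)$. Choose $y \in \mathcal{C}(A) \subseteq \mathbb{C}^m$. Then there exists a vector
$x \in \mathbb{C}^n$ such that $Ax = y$ (Theorem CSCS). Then for $1 \le k \le m - r$,
\begin{align*}
[Ly]_k &= [Jy]_{r+k} && \text{L a submatrix of J} \\
       &= [Bx]_{r+k} && \text{Theorem PEEF} \\
       &= [\mathcal{O}x]_k && \text{Zero matrix a submatrix of B} \\
       &= [\mathbf{0}]_k && \text{Theorem MMZM}
\end{align*}
   So, for all $1 \le k \le m - r$, $[Ly]_k = [\mathbf{0}]_k$. So by Definition CVE we have $Ly = \mathbf{0}$
and thus $y \in \mathcal{N}(L)$.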
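   Now, show that $\mathcal{N}(L) \subseteq \mathcal{C}(A)$. Choose $y \in \mathcal{N}(L) \subseteq \mathbb{C}^m$. Form the vector
$Ky \in \mathbb{C}^r$. The linear system $\mathcal{LS}(C, Ky)$ is consistent since C is in reduced
row-echelon form and has no zero rows (Theorem PEEF). Let $x \in \mathbb{C}^n$ denote a solution
to $\mathcal{LS}(C, Ky)$.
   Then for $1 \le j \le r$,
\begin{align*}
[Bx]_j &= [Cx]_j && \text{C a submatrix of B} \\
       &= [Ky]_j && \text{x a solution to } \mathcal{LS}(C, Ky) \\
       &= [Jy]_j && \text{K a submatrix of J}
\end{align*}
   And for $r + 1 \le k \le m$,
\begin{align*}
[Bx]_k &= [\mathcal{O}x]_{k-r} && \text{Zero matrix a submatrix of B} \\
       &= [\mathbf{0}]_{k-r} && \text{Theorem MMZM} \\
       &= [Ly]_{k-r} && \text{y in } \mathcal{N}(L) \\
       &= [Jy]_k && \text{L a submatrix of J}
\end{align*}
   So for all $1 \le i \le m$, $[Bx]_i = [Jy]_i$ and by Definition CVE we have $Bx = Jy$.
From Theorem PEEF we know then that $Ax = y$, and therefore $y \in \mathcal{C}(A)$ (Theorem
CSCS). By Definition SE we now have $\mathcal{C}(A) = \mathcal{N}(L)$.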
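   Fourth, we prove the set equality $\mathcal{L}(A) = \mathcal{R}(L)$ with Definition SE. Begin by
showing that $\mathcal{R}(L) \subseteq \mathcal{L}(A)$. Choose $y \in \mathcal{R}(L) \subseteq \mathbb{C}^m$. Then there exists a vector
$w \in \mathbb{C}^{m-r}$ such that $y = L^t w$ (Definition RSM, Theorem CSCS). Then for $1 \le i \le n$,
\begin{align*}
[A^t y]_i
  &= \sum_{k=1}^{m} [A^t]_{ik} [y]_k && \text{Theorem EMP} \\
  &= \sum_{k=1}^{m} [A^t]_{ik} [L^t w]_k && \text{Definition of w} \\
  &= \sum_{k=1}^{m} [A^t]_{ik} \sum_{\ell=1}^{m-r} [L^t]_{k\ell} [w]_\ell && \text{Theorem EMP} \\
  &= \sum_{k=1}^{m} \sum_{\ell=1}^{m-r} [A^t]_{ik} [L^t]_{k\ell} [w]_\ell && \text{Property DCN} \\
  &= \sum_{\ell=1}^{m-r} \sum_{k=1}^{m} [A^t]_{ik} [L^t]_{k\ell} [w]_\ell && \text{Property CACN} \\
  &= \sum_{\ell=1}^{m-r} \left( \sum_{k=1}^{m} [A^t]_{ik} [L^t]_{k\ell} \right) [w]_\ell && \text{Property DCN} \\
  &= \sum_{\ell=1}^{m-r} \left( \sum_{k=1}^{m} [A^t]_{ik} [J^t]_{k,r+\ell} \right) [w]_\ell && \text{L a submatrix of J} \\
  &= \sum_{\ell=1}^{m-r} [A^t J^t]_{i,r+\ell} [w]_\ell && \text{Theorem EMP} \\
  &= \sum_{\ell=1}^{m-r} \left[ (JA)^t \right]_{i,r+\ell} [w]_\ell && \text{Theorem MMT} \\
  &= \sum_{\ell=1}^{m-r} [B^t]_{i,r+\ell} [w]_\ell && \text{Theorem PEEF} \\
  &= \sum_{\ell=1}^{m-r} 0 \, [w]_\ell && \text{Zero rows in B} \\
  &= 0 && \text{Property ZCN} \\
  &= [\mathbf{0}]_i && \text{Definition ZCV}
\end{align*}
   Since $[A^t y]_i = [\mathbf{0}]_i$ for $1 \le i \le n$, Definition CVE implies that $A^t y = \mathbf{0}$. This
means that $y \in \mathcal{N}(A^t)$.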
   Now, show that $\mathcal{L}(A) \subseteq \mathcal{R}(L)$. Choose $y \in \mathcal{L}(A) \subseteq \mathbb{C}^m$. The matrix J is
nonsingular (Theorem PEEF), so $J^t$ is also nonsingular (Theorem MIT) and therefore
the linear system $\mathcal{LS}(J^t, y)$ has a unique solution. Denote this solution as $x \in \mathbb{C}^m$.
We will need to work with two “halves” of x, which we will denote as z and w with
formal definitions given by
\[
[z]_j = [x]_j \quad 1 \le j \le r, \qquad\qquad [w]_k = [x]_{r+k} \quad 1 \le k \le m - r
\]
      Now, for $1 \le j \le n$ (the vector $C^t z$ has n entries),
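\begin{align*}
[C^t z]_j
  &= \sum_{k=1}^{r} [C^t]_{jk} [z]_k && \text{Theorem EMP} \\
  &= \sum_{k=1}^{r} [C^t]_{jk} [z]_k + \sum_{\ell=1}^{m-r} [\mathcal{O}]_{j\ell} [w]_\ell && \text{Definition ZM} \\
  &= \sum_{k=1}^{r} [B^t]_{jk} [z]_k + \sum_{\ell=1}^{m-r} [B^t]_{j,r+\ell} [w]_\ell && \text{C, O submatrices of B} \\
  &= \sum_{k=1}^{r} [B^t]_{jk} [x]_k + \sum_{\ell=1}^{m-r} [B^t]_{j,r+\ell} [x]_{r+\ell} && \text{Definitions of z and w} \\
  &= \sum_{k=1}^{r} [B^t]_{jk} [x]_k + \sum_{k=r+1}^{m} [B^t]_{jk} [x]_k && \text{Re-index second sum} \\
  &= \sum_{k=1}^{m} [B^t]_{jk} [x]_k && \text{Combine sums} \\
  &= \sum_{k=1}^{m} \left[ (JA)^t \right]_{jk} [x]_k && \text{Theorem PEEF} \\
  &= \sum_{k=1}^{m} [A^t J^t]_{jk} [x]_k && \text{Theorem MMT} \\
  &= \sum_{k=1}^{m} \sum_{\ell=1}^{m} [A^t]_{j\ell} [J^t]_{\ell k} [x]_k && \text{Theorem EMP} \\
  &= \sum_{\ell=1}^{m} \sum_{k=1}^{m} [A^t]_{j\ell} [J^t]_{\ell k} [x]_k && \text{Property CACN} \\
  &= \sum_{\ell=1}^{m} [A^t]_{j\ell} \left( \sum_{k=1}^{m} [J^t]_{\ell k} [x]_k \right) && \text{Property DCN} \\
  &= \sum_{\ell=1}^{m} [A^t]_{j\ell} [J^t x]_\ell && \text{Theorem EMP} \\
  &= \sum_{\ell=1}^{m} [A^t]_{j\ell} [y]_\ell && \text{Definition of x} \\
  &= [A^t y]_j && \text{Theorem EMP} \\
  &= [\mathbf{0}]_j && y \in \mathcal{L}(A)
\end{align*}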
    So, by Definition CVE, $C^t z = \mathbf{0}$ and the vector z gives us a linear combination
of the columns of $C^t$ that equals the zero vector. In other words, z gives a relation
of linear dependence on the rows of C. However, the rows of C are a linearly
independent set by Theorem BRS. According to Definition LICV we must conclude
that the entries of z are all zero, i.e. $z = \mathbf{0}$.
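    Now, for $1 \le i \le m$, we have
\begin{align*}
[y]_i
  &= [J^t x]_i && \text{Definition of x} \\
  &= \sum_{k=1}^{m} [J^t]_{ik} [x]_k && \text{Theorem EMP} \\
  &= \sum_{k=1}^{r} [J^t]_{ik} [x]_k + \sum_{k=r+1}^{m} [J^t]_{ik} [x]_k && \text{Break apart sum} \\
  &= \sum_{k=1}^{r} [J^t]_{ik} [z]_k + \sum_{k=r+1}^{m} [J^t]_{ik} [w]_{k-r} && \text{Definition of z and w} \\
  &= \sum_{k=1}^{r} [J^t]_{ik} \, 0 + \sum_{\ell=1}^{m-r} [J^t]_{i,r+\ell} [w]_\ell && z = \mathbf{0}\text{, re-index} \\
  &= 0 + \sum_{\ell=1}^{m-r} [L^t]_{i,\ell} [w]_\ell && \text{L a submatrix of J} \\
  &= [L^t w]_i && \text{Theorem EMP}
\end{align*}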


   So by Definition CVE, $y = L^t w$. The existence of w implies that $y \in \mathcal{R}(L)$, and
therefore $\mathcal{L}(A) \subseteq \mathcal{R}(L)$. So by Definition SE we have $\mathcal{L}(A) = \mathcal{R}(L)$.

    The first two conclusions of this theorem are nearly trivial. But they set up a
pattern of results for C that is reflected in the latter two conclusions about L. In
total, they tell us that we can compute all four subsets just by finding null spaces and
row spaces. This theorem does not tell us exactly how to compute these subsets, but
instead simply expresses them as null spaces and row spaces of matrices in reduced
row-echelon form without any zero rows (C and L). A linearly independent set that
spans the null space of a matrix in reduced row-echelon form can be found easily
with Theorem BNS. It is an even easier matter to find a linearly independent set
that spans the row space of a matrix in reduced row-echelon form with Theorem
BRS, especially when there are no zero rows present. So an application of Theorem
FS is typically followed by two applications each of Theorem BNS and Theorem
BRS.
    The situation when r = m deserves comment, since now the matrix L has no
rows. What is C(A) when we try to apply Theorem FS and encounter N (L)? One
interpretation of this situation is that L is the coefficient matrix of a homogeneous
system that has no equations. How hard is it to find a solution vector to this system?
Some thought will convince you that any proposed vector will qualify as a solution,
since it makes all of the equations true. So every possible vector is in the null space
of L and therefore $\mathcal{C}(A) = \mathcal{N}(L) = \mathbb{C}^m$. OK, perhaps this sounds like some twisted
argument from Alice in Wonderland. Let us try another argument that might solidly
convince you of this logic.
    If r = m, when we row-reduce the augmented matrix of $\mathcal{LS}(A, b)$ the result
will have no zero rows, and each of the m rows will have its leading 1 in one of the
first n columns, so the final column is not a pivot column and by Theorem RCLS the
system will be consistent. By Theorem CSCS, $b \in \mathcal{C}(A)$. Since b was arbitrary, every possible
vector is in the column space of A, so we again have $\mathcal{C}(A) = \mathbb{C}^m$. The situation when a matrix has
r = m is known by the term full rank, and in the case of a square matrix coincides
with nonsingularity (see Exercise FS.M50).
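
    The "no equations" argument is an instance of vacuous truth, and a computer
agrees with it. Here is a tiny Python sketch, with a hypothetical helper in_null_space
of our own naming, in which a matrix L with no rows is stored as an empty list of rows.

```python
def in_null_space(L, y):
    """True when every equation (row of L) of the homogeneous system holds for y."""
    return all(sum(a * b for a, b in zip(row, y)) == 0 for row in L)

L_empty = []                  # the case r = m: L has m - r = 0 rows
L_one_row = [[1, 2, 2, 1]]    # for contrast, L from Example SEEF

print(in_null_space(L_empty, [5, -7]))           # True: there is no equation to fail
print(in_null_space(L_one_row, [1, 0, 0, 0]))    # False: 1*1 + 0 + 0 + 0 is not 0
```

Python's all returns True on an empty collection, which mirrors the reasoning that any
proposed vector makes all (zero) of the equations true.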
    The properties of the matrix L described by this theorem can be explained
informally as follows. A column vector $y \in \mathbb{C}^m$ is in the column space of A if the
linear system LS(A, y) is consistent (Theorem CSCS). By Theorem RCLS, the
reduced row-echelon form of the augmented matrix [ A | y] of a consistent system
will have zeros in the bottom m − r locations of the last column. By Theorem PEEF
this final column is the vector Jy and so should then have zeros in the final m − r
locations. But since L comprises the final m − r rows of J, this condition is expressed
by saying y ∈ N (L).
    Additionally, the rows of J are the scalars in linear combinations of the rows
of A that create the rows of B. That is, the rows of J record the net effect of the
sequence of row operations that takes A to its reduced row-echelon form, B. This
can be seen in the equation JA = B (Theorem PEEF). As such, the rows of L are
scalars for linear combinations of the rows of A that yield zero rows. But such linear
combinations are precisely the elements of the left null space. So any element of the
row space of L is also an element of the left null space of A.
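
    Both informal readings of L can be tried out on Example SEEF. The Python sketch
below is ours, not the text's: the helper names times and in_column_space are
hypothetical, and A and L are copied from the example. It tests membership in the
column space by checking Ly = 0, and it uses the row of L as scalars on the rows of A.

```python
A = [[ 1, -1, -2,   7,  1,   6],
     [-6,  2, -4, -18, -3, -26],
     [ 4, -1,  4,  10,  2,  17],
     [ 3, -1,  2,   9,  1,  12]]
L = [[1, 2, 2, 1]]

def times(M, v):
    """Matrix-vector product, with M a list of rows and v a list of entries."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def in_column_space(y):
    """Test y for membership in C(A) by checking Ly = 0 (Theorem FS, conclusion 3)."""
    return all(entry == 0 for entry in times(L, y))

# Every column of A certainly lies in the column space of A.
columns_of_A = [list(col) for col in zip(*A)]
print(all(in_column_space(col) for col in columns_of_A))   # True

# A vector that fails the test, so LS(A, y) is inconsistent for this y.
print(in_column_space([1, 0, 0, 0]))                       # False

# The row of L, used as scalars on the rows of A, produces a zero row.
combo = [sum(L[0][i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]
print(combo)                                               # [0, 0, 0, 0, 0, 0]
```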
    We will now illustrate Theorem FS with a few examples.
Example FS1 Four subsets, no. 1
In Example SEEF we found the five relevant submatrices of the extended echelon
form for the matrix
\[
A = \begin{bmatrix}
 1 & -1 & -2 &   7 &  1 &   6 \\
-6 &  2 & -4 & -18 & -3 & -26 \\
 4 & -1 &  4 &  10 &  2 &  17 \\
 3 & -1 &  2 &   9 &  1 &  12
\end{bmatrix}
\]
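      To apply Theorem FS we only need C and L,
\[
C = \begin{bmatrix}
1 & 0 & 2 &  1 & 0 &  3 \\
0 & 1 & 4 & -6 & 0 & -1 \\
0 & 0 & 0 &  0 & 1 &  2
\end{bmatrix}
\qquad
L = \begin{bmatrix} 1 & 2 & 2 & 1 \end{bmatrix}
\]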
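      Then we use Theorem FS to obtain
\begin{align*}
\mathcal{N}(A) = \mathcal{N}(C) &= \left\langle\left\{
\begin{bmatrix} -2 \\ -4 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix},\,
\begin{bmatrix} -1 \\ 6 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix},\,
\begin{bmatrix} -3 \\ 1 \\ 0 \\ 0 \\ -2 \\ 1 \end{bmatrix}
\right\}\right\rangle && \text{Theorem BNS} \\
\mathcal{R}(A) = \mathcal{R}(C) &= \left\langle\left\{
\begin{bmatrix} 1 \\ 0 \\ 2 \\ 1 \\ 0 \\ 3 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 1 \\ 4 \\ -6 \\ 0 \\ -1 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 2 \end{bmatrix}
\right\}\right\rangle && \text{Theorem BRS} \\
\mathcal{C}(A) = \mathcal{N}(L) &= \left\langle\left\{
\begin{bmatrix} -2 \\ 1 \\ 0 \\ 0 \end{bmatrix},\,
\begin{bmatrix} -2 \\ 0 \\ 1 \\ 0 \end{bmatrix},\,
\begin{bmatrix} -1 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\right\}\right\rangle && \text{Theorem BNS} \\
\mathcal{L}(A) = \mathcal{R}(L) &= \left\langle\left\{
\begin{bmatrix} 1 \\ 2 \\ 2 \\ 1 \end{bmatrix}
\right\}\right\rangle && \text{Theorem BRS}
\end{align*}
   Boom!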
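
   The spanning sets for the two null spaces in this example can be read off from C and
L by the recipe of Theorem BNS: one vector per non-pivot column, with a 1 in that
position and the negated entries of that column placed in the pivot positions. The
Python sketch below is a hedged illustration of that recipe. The function name
null_space_basis is our own, and it assumes its input is already in reduced
row-echelon form with no zero rows.

```python
def null_space_basis(C):
    """Spanning vectors for N(C), following the construction in Theorem BNS.

    Assumes C (a list of rows) is in reduced row-echelon form with no zero rows.
    """
    n = len(C[0])
    # The pivot column of each row is the position of its leading 1.
    pivots = [next(j for j, x in enumerate(row) if x != 0) for row in C]
    free = [j for j in range(n) if j not in pivots]
    basis = []
    for f in free:
        v = [0] * n
        v[f] = 1                      # this free variable is set to 1
        for i, d in enumerate(pivots):
            v[d] = -C[i][f]           # dependent variables are forced by row i
        basis.append(v)
    return basis

A = [[ 1, -1, -2,   7,  1,   6],
     [-6,  2, -4, -18, -3, -26],
     [ 4, -1,  4,  10,  2,  17],
     [ 3, -1,  2,   9,  1,  12]]
C = [[1, 0, 2,  1, 0,  3],
     [0, 1, 4, -6, 0, -1],
     [0, 0, 0,  0, 1,  2]]
L = [[1, 2, 2, 1]]

basis_NA = null_space_basis(C)    # spans N(A) = N(C)
basis_CA = null_space_basis(L)    # spans C(A) = N(L)
print(basis_NA)
print(basis_CA)

# Each vector found from C really is sent to the zero vector by A itself.
print(all(sum(a * b for a, b in zip(row, v)) == 0 for row in A for v in basis_NA))
```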
Example FS2 Four subsets, no. 2
Now let us return to the matrix A that we used to motivate this section in Example
CSANS,
\[
A = \begin{bmatrix}
 10 &  0 &  3 &   8 &   7 \\
-16 & -1 & -4 & -10 & -13 \\
 -6 &  1 & -3 &  -6 &  -6 \\
  0 &  2 & -2 &  -3 &  -2 \\
  3 &  0 &  1 &   2 &   3 \\
 -1 & -1 &  1 &   1 &   0
\end{bmatrix}
\]
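   We form the matrix M by adjoining the 6 × 6 identity matrix $I_6$,
\[
M = \left[\begin{array}{rrrrr|rrrrrr}
 10 &  0 &  3 &   8 &   7 & 1 & 0 & 0 & 0 & 0 & 0 \\
-16 & -1 & -4 & -10 & -13 & 0 & 1 & 0 & 0 & 0 & 0 \\
 -6 &  1 & -3 &  -6 &  -6 & 0 & 0 & 1 & 0 & 0 & 0 \\
  0 &  2 & -2 &  -3 &  -2 & 0 & 0 & 0 & 1 & 0 & 0 \\
  3 &  0 &  1 &   2 &   3 & 0 & 0 & 0 & 0 & 1 & 0 \\
 -1 & -1 &  1 &   1 &   0 & 0 & 0 & 0 & 0 & 0 & 1
\end{array}\right]
\]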
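and row-reduce to obtain N
\[
N = \left[\begin{array}{rrrrr|rrrrrr}
1 & 0 & 0 & 0 &  2 & 0 & 0 &  1 & -1 &  2 & -1 \\
0 & 1 & 0 & 0 & -3 & 0 & 0 & -2 &  3 & -3 &  3 \\
0 & 0 & 1 & 0 &  1 & 0 & 0 &  1 &  1 &  3 &  3 \\
0 & 0 & 0 & 1 & -2 & 0 & 0 & -2 &  1 & -4 &  0 \\
0 & 0 & 0 & 0 &  0 & 1 & 0 &  3 & -1 &  3 &  1 \\
0 & 0 & 0 & 0 &  0 & 0 & 1 & -2 &  1 &  1 & -1
\end{array}\right]
\]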
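   To find the four subsets for A, we only need identify the 4 × 5 matrix C and the
2 × 6 matrix L,
\[
C = \begin{bmatrix}
1 & 0 & 0 & 0 &  2 \\
0 & 1 & 0 & 0 & -3 \\
0 & 0 & 1 & 0 &  1 \\
0 & 0 & 0 & 1 & -2
\end{bmatrix}
\qquad
L = \begin{bmatrix}
1 & 0 &  3 & -1 & 3 &  1 \\
0 & 1 & -2 &  1 & 1 & -1
\end{bmatrix}
\]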
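   Then we apply Theorem FS,
\begin{align*}
\mathcal{N}(A) = \mathcal{N}(C) &= \left\langle\left\{
\begin{bmatrix} -2 \\ 3 \\ -1 \\ 2 \\ 1 \end{bmatrix}
\right\}\right\rangle && \text{Theorem BNS} \\
\mathcal{R}(A) = \mathcal{R}(C) &= \left\langle\left\{
\begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 2 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \\ -3 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 0 \\ 1 \\ 0 \\ 1 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1 \\ -2 \end{bmatrix}
\right\}\right\rangle && \text{Theorem BRS} \\
\mathcal{C}(A) = \mathcal{N}(L) &= \left\langle\left\{
\begin{bmatrix} -3 \\ 2 \\ 1 \\ 0 \\ 0 \\ 0 \end{bmatrix},\,
\begin{bmatrix} 1 \\ -1 \\ 0 \\ 1 \\ 0 \\ 0 \end{bmatrix},\,
\begin{bmatrix} -3 \\ -1 \\ 0 \\ 0 \\ 1 \\ 0 \end{bmatrix},\,
\begin{bmatrix} -1 \\ 1 \\ 0 \\ 0 \\ 0 \\ 1 \end{bmatrix}
\right\}\right\rangle && \text{Theorem BNS} \\
\mathcal{L}(A) = \mathcal{R}(L) &= \left\langle\left\{
\begin{bmatrix} 1 \\ 0 \\ 3 \\ -1 \\ 3 \\ 1 \end{bmatrix},\,
\begin{bmatrix} 0 \\ 1 \\ -2 \\ 1 \\ 1 \\ -1 \end{bmatrix}
\right\}\right\rangle && \text{Theorem BRS}
\end{align*}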
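
    As a quick check of the fourth conclusion in this example, each row of L should
supply scalars that combine the rows of A into the zero row. The worked computation
below uses the first row of L, namely (1, 0, 3, −1, 3, 1), and the rows of A written as
row vectors. This layout is our own, added only as a sanity check.

```latex
\begin{align*}
& 1\,(10,\ 0,\ 3,\ 8,\ 7) + 0\,(-16,\ -1,\ -4,\ -10,\ -13) + 3\,(-6,\ 1,\ -3,\ -6,\ -6) \\
& \qquad {} - 1\,(0,\ 2,\ -2,\ -3,\ -2) + 3\,(3,\ 0,\ 1,\ 2,\ 3) + 1\,(-1,\ -1,\ 1,\ 1,\ 0) \\
&= (10 - 18 - 0 + 9 - 1,\ \ 0 + 3 - 2 + 0 - 1,\ \ 3 - 9 + 2 + 3 + 1,\ \ 8 - 18 + 3 + 6 + 1,\ \ 7 - 18 + 2 + 9 + 0) \\
&= (0,\ 0,\ 0,\ 0,\ 0)
\end{align*}
```

The second row of L, (0, 1, −2, 1, 1, −1), can be checked the same way.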
    The next example is just a bit different since the matrix has more rows than
columns, and a trivial null space.
Example FSAG Four subsets, Archetype G
Archetype G and Archetype H are both systems of m = 5 equations in n = 2
variables. They have identical coefficient matrices, which we will denote here as the
matrix G,
\[
G = \begin{bmatrix}
 2 &  3 \\
-1 &  4 \\
 3 & 10 \\
 3 & -1 \\
 6 &  9
\end{bmatrix}
\]
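      Adjoin the 5 × 5 identity matrix, $I_5$, to form
\[
M = \left[\begin{array}{rr|rrrrr}
 2 &  3 & 1 & 0 & 0 & 0 & 0 \\
-1 &  4 & 0 & 1 & 0 & 0 & 0 \\
 3 & 10 & 0 & 0 & 1 & 0 & 0 \\
 3 & -1 & 0 & 0 & 0 & 1 & 0 \\
 6 &  9 & 0 & 0 & 0 & 0 & 1
\end{array}\right]
\]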
      This row-reduces to
                                                                       
                                                              3     1
                                1    0    0     0    0       11    33
                               0    1    0     0    0         2
                                                            − 11    1 
                                                                  11 
                                                                       
                       N =     0    0    1     0    0       0     − 13 
                                                                       
                               0    0    0     1    0       1     − 13 
                                0    0    0     0    1       1     −1
    The first n = 2 columns contain r = 2 leading 1’s, so we obtain C as the 2 × 2
identity matrix and extract L from the final m − r = 3 rows in the final m = 5
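columns.
\[
C = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\qquad
L = \begin{bmatrix}
1 & 0 & 0 & 0 & -\frac{1}{3} \\
0 & 1 & 0 & 1 & -\frac{1}{3} \\
0 & 0 & 1 & 1 & -1
\end{bmatrix}
\]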
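The row reduction in this example can be checked mechanically. The following is a minimal sketch, assuming exact rational arithmetic from Python's standard `fractions` module; the helper name `rref` and the way r is counted are choices made for this illustration and are not part of the text.

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row-echelon form of a matrix given as a list of rows."""
    A = [[Fraction(x) for x in row] for row in rows]
    m, n = len(A), len(A[0])
    pivot_row = 0
    for col in range(n):
        # Look for a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, m) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[pivot_row], A[pivot] = A[pivot], A[pivot_row]
        lead = A[pivot_row][col]
        A[pivot_row] = [x / lead for x in A[pivot_row]]      # make the leading entry 1
        for r in range(m):
            if r != pivot_row and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[pivot_row])]
        pivot_row += 1
        if pivot_row == m:
            break
    return A

G = [[2, 3], [-1, 4], [3, 10], [3, -1], [6, 9]]
m, n = len(G), len(G[0])

# M = [G | I_m], the matrix G with the m x m identity adjoined.
M = [row + [1 if i == j else 0 for j in range(m)] for i, row in enumerate(G)]
N = rref(M)

# r counts the nonzero rows in the first n columns of N.
r = sum(1 for row in N if any(x != 0 for x in row[:n]))
C = [row[:n] for row in N[:r]]      # first r rows, first n columns
L = [row[n:] for row in N[r:]]      # final m - r rows, final m columns

for row in L:
    print([str(x) for x in row])
```

Because the reduced row-echelon form of a matrix is unique, any correct row reduction of M should reproduce the matrices C and L displayed above.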
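      Then we apply Theorem FS,
\begin{align*}
\mathcal{N}(G) &= \mathcal{N}(C) = \langle\emptyset\rangle = \{\mathbf{0}\}
  && \text{Theorem BNS} \\
\mathcal{R}(G) &= \mathcal{R}(C) = \left\langle\left\{
  \begin{bmatrix} 1 \\ 0 \end{bmatrix},\,
  \begin{bmatrix} 0 \\ 1 \end{bmatrix}
  \right\}\right\rangle = \mathbb{C}^2
  && \text{Theorem BRS} \\
\mathcal{C}(G) &= \mathcal{N}(L) = \left\langle\left\{
  \begin{bmatrix} 0 \\ -1 \\ -1 \\ 1 \\ 0 \end{bmatrix},\,
  \begin{bmatrix} \frac{1}{3} \\ \frac{1}{3} \\ 1 \\ 0 \\ 1 \end{bmatrix}
  \right\}\right\rangle
  && \text{Theorem BNS} \\
&= \left\langle\left\{
  \begin{bmatrix} 0 \\ -1 \\ -1 \\ 1 \\ 0 \end{bmatrix},\,
  \begin{bmatrix} 1 \\ 1 \\ 3 \\ 0 \\ 3 \end{bmatrix}
  \right\}\right\rangle \\
\mathcal{L}(G) &= \mathcal{R}(L) = \left\langle\left\{
  \begin{bmatrix} 1 \\ 0 \\ 0 \\ 0 \\ -\frac{1}{3} \end{bmatrix},\,
  \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \\ -\frac{1}{3} \end{bmatrix},\,
  \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \\ -1 \end{bmatrix}
  \right\}\right\rangle
  && \text{Theorem BRS} \\
&= \left\langle\left\{
  \begin{bmatrix} 3 \\ 0 \\ 0 \\ 0 \\ -1 \end{bmatrix},\,
  \begin{bmatrix} 0 \\ 3 \\ 0 \\ 3 \\ -1 \end{bmatrix},\,
  \begin{bmatrix} 0 \\ 0 \\ 1 \\ 1 \\ -1 \end{bmatrix}
  \right\}\right\rangle
\end{align*}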
    As mentioned earlier, Archetype G is consistent, while Archetype H is inconsistent.
See if you can write the two different vectors of constants from these two archetypes
as linear combinations of the two vectors that form the spanning set for C(G). How
about the two columns of G? Can you write each individually as a linear combination
of the two vectors that form the spanning set for C(G)? They must be in the column
space of G also. Are your answers unique? Do you notice anything about the scalars
that appear in the linear combinations you are forming? △
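Before attempting these questions by hand, it may help to confirm that the vectors involved really do lie in N(L). The sketch below is only a sanity check under that reading of Theorem FS; the helper `mat_vec` is a name introduced here for illustration, and the code deliberately stops short of finding the scalars the example asks about.

```python
from fractions import Fraction as F

G = [[2, 3], [-1, 4], [3, 10], [3, -1], [6, 9]]
L = [[1, 0, 0, 0, F(-1, 3)],
     [0, 1, 0, 1, F(-1, 3)],
     [0, 0, 1, 1, -1]]

def mat_vec(A, v):
    """Matrix-vector product, with A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# The two columns of G, and the (scaled) spanning set for C(G) = N(L).
columns_of_G = [[row[j] for row in G] for j in range(2)]
spanning_set = [[0, -1, -1, 1, 0], [1, 1, 3, 0, 3]]

# Every one of these vectors should be sent to the zero vector by L.
for v in columns_of_G + spanning_set:
    print(v, "->", mat_vec(L, v))
```

Each product is the zero vector of size 3, which is consistent with the columns of G belonging to the span described by Theorem FS.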
    Example COV and Example CSROI each describes the column space of the
coefficient matrix from Archetype I as the span of a set of r = 3 linearly independent
vectors. It is no accident that these two different sets both have the same size. If we
(you?) were to calculate the column space of this matrix using the null space of the
matrix L from Theorem FS then we would again find a set of 3 linearly independent
vectors that span the column space. More on this later.
    So we have three different methods to obtain a description of the column space of
a matrix as the span of a linearly independent set. Theorem BCS is sometimes useful
since the vectors it specifies are equal to actual columns of the matrix. Theorem BRS
and Theorem CSRST combine to create vectors with lots of zeros, and strategically
placed 1’s near the top of the vector. Theorem FS and the matrix L from the
extended echelon form give us a third method, which tends to create vectors with
lots of zeros, and strategically placed 1’s near the bottom of the vector. If we do not
care about linear independence we can also appeal to Definition CSM and simply
express the column space as the span of all the columns of the matrix, giving us a
fourth description.
    With Theorem CSRST and Definition RSM, we can compute column spaces
with theorems about row spaces, and we can compute row spaces with theorems
about column spaces, but in each case we must transpose the matrix first. At this
point you may be overwhelmed by all the possibilities for computing column and
row spaces. Diagram CSRST is meant to help. For both the column space and row
space, it suggests four techniques. One is to appeal to the definition, another yields a
span of a linearly independent set, and a third uses Theorem FS. A fourth suggests
transposing the matrix and the dashed line implies that then the companion set of
techniques can be applied. This can lead to a bit of silliness, since if you were to
follow the dashed lines twice you would transpose the matrix twice, and by Theorem
TT would accomplish nothing productive.

                            Definition CSM
                            Theorem BCS
        C(A)
                            Theorem FS, N(L)
                            Theorem CSRST, R(A^t)


                            Definition RSM, C(A^t)
                            Theorem FS, R(C)
        R(A)
                            Theorem BRS
                            Definition RSM

            Diagram CSRST: Column Space and Row Space Techniques

    Although we have many ways to describe a column space, notice that one tempting
strategy will usually fail. It is not valid to simply row-reduce a matrix directly
and then use the columns of the row-reduced matrix as a set whose span equals
the column space. In other words, row operations do not preserve column spaces
(however, row operations do preserve row spaces, by Theorem REMRS). See Exercise
CRS.M21.
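A very small illustration of this failure, using a 2 × 2 matrix chosen here for the purpose (it is not one of the archetypes): row-reducing A to B changes the column space but leaves the row space alone.

```latex
\[
A = \begin{bmatrix} 1 & 2 \\ 1 & 2 \end{bmatrix}
\xrightarrow{\text{RREF}}
B = \begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}
\]
\[
\mathcal{C}(A) = \left\langle\left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix} \right\}\right\rangle
\neq
\left\langle\left\{ \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\}\right\rangle = \mathcal{C}(B),
\qquad
\mathcal{R}(A) = \left\langle\left\{ \begin{bmatrix} 1 \\ 2 \end{bmatrix} \right\}\right\rangle = \mathcal{R}(B)
\]
```

The first column of A is not a scalar multiple of the first column of B, so it cannot belong to the span of the columns of B.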

Reading Questions

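1. Find a nontrivial element of the left null space of A.
\[
A = \begin{bmatrix} 2 & 1 & -3 & 4 \\ -1 & -1 & 2 & -1 \\ 0 & -1 & 1 & 2 \end{bmatrix}
\]

2. Find the matrices C and L in the extended echelon form of A.
\[
A = \begin{bmatrix} -9 & 5 & -3 \\ 2 & -1 & 1 \\ -5 & 3 & -1 \end{bmatrix}
\]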

3. Why is Theorem FS a great conclusion to Chapter M?

Exercises
C20 Example FSAG concludes with several questions. Perform the analysis suggested by
these questions.
C25† Given the matrix A below, use the extended echelon form of A to answer each part
of this problem. In each part, find a linearly independent set of vectors, S, so that the span
of S, ⟨S⟩, equals the specified set of vectors.
\[
A = \begin{bmatrix} -5 & 3 & -1 \\ -1 & 1 & 1 \\ -8 & 5 & -1 \\ 3 & -2 & 0 \end{bmatrix}
\]

   1. The row space of A, R(A).

   2. The column space of A, C(A).

   3. The null space of A, N (A).

   4. The left null space of A, L(A).

C26†    For the matrix D below use the extended echelon form to find:

   1. A linearly independent set whose span is the column space of D.

   2. A linearly independent set whose span is the left null space of D.

\[
D = \begin{bmatrix}
-7 & -11 & -19 & -15 \\
6 & 10 & 18 & 14 \\
3 & 5 & 9 & 7 \\
-1 & -2 & -4 & -3
\end{bmatrix}
\]

C41 The following archetypes are systems of equations. For each system, write the vector
of constants as a linear combination of the vectors in the span construction for the column
space provided by Theorem FS and Theorem BNS (these vectors are listed for each of these
archetypes).

Archetype A, Archetype B, Archetype C, Archetype D, Archetype E, Archetype F, Archetype
G, Archetype H, Archetype I, Archetype J
C43 The following archetypes are either matrices or systems of equations with coefficient
matrices. For each matrix, compute the extended echelon form N and identify the matrices
C and L. Using Theorem FS, Theorem BNS and Theorem BRS express the null space, the
row space, the column space and left null space of each coefficient matrix as a span of a
linearly independent set.

Archetype A, Archetype B, Archetype C, Archetype D/Archetype E, Archetype F, Archetype
G/Archetype H, Archetype I, Archetype J, Archetype K, Archetype L
C60† For the matrix B below, find sets of vectors whose span equals the column space of
B (C(B)) and which individually meet the following extra requirements.

   1. The set illustrates the definition of the column space.

   2. The set is linearly independent and the members of the set are columns of B.

   3. The set is linearly independent with a “nice pattern of zeros and ones” at the top of
      each vector.

   4. The set is linearly independent with a “nice pattern of zeros and ones” at the bottom
      of each vector.

\[
B = \begin{bmatrix} 2 & 3 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ -1 & 2 & 3 & -4 \end{bmatrix}
\]

C61†   Let A be the matrix below, and find the indicated sets with the requested properties.
\[
A = \begin{bmatrix} 2 & -1 & 5 & -3 \\ -5 & 3 & -12 & 7 \\ 1 & 1 & 4 & -3 \end{bmatrix}
\]

   1. A linearly independent set S so that C(A) = ⟨S⟩ and S is composed of columns of A.

   2. A linearly independent set S so that C(A) = ⟨S⟩ and the vectors in S have a nice
      pattern of zeros and ones at the top of the vectors.

   3. A linearly independent set S so that C(A) = ⟨S⟩ and the vectors in S have a nice
      pattern of zeros and ones at the bottom of the vectors.

   4. A linearly independent set S so that R(A) = ⟨S⟩.

M50 Suppose that A is a nonsingular matrix. Extend the four conclusions of Theorem
FS in this special case and discuss connections with previous results (such as Theorem
NME4).
M51 Suppose that A is a singular matrix. Extend the four conclusions of Theorem FS in
this special case and discuss connections with previous results (such as Theorem NME4).
Chapter VS
Vector Spaces

We now have a computational toolkit in place and so we can begin our study of
linear algebra at a more theoretical level.
    Linear algebra is the study of two fundamental objects, vector spaces and linear
transformations (see Chapter LT). This chapter will focus on the former. The power
of mathematics is often derived from generalizing many different situations into one
abstract formulation, and that is exactly what we will be doing throughout this
chapter.


Section VS
Vector Spaces
In this section we present a formal definition of a vector space, which will lead to an
extra increment of abstraction. Once defined, we study its most basic properties.

Subsection VS
Vector Spaces
Here is one of the two most important definitions in the entire course.
Definition VS Vector Space
Suppose that V is a set upon which we have defined two operations: (1) vector
addition, which combines two elements of V and is denoted by “+”, and (2) scalar
multiplication, which combines a complex number with an element of V and is
denoted by juxtaposition. Then V , along with the two operations, is a vector space
over C if the following ten properties hold.

   • AC Additive Closure
      If u, v ∈ V , then u + v ∈ V .

   • SC Scalar Closure
      If α ∈ C and u ∈ V , then αu ∈ V .

   • C Commutativity
      If u, v ∈ V , then u + v = v + u.

   • AA Additive Associativity
      If u, v, w ∈ V , then u + (v + w) = (u + v) + w.

   • Z Zero Vector
     There is a vector, 0, called the zero vector, such that u + 0 = u for all u ∈ V .

   • AI Additive Inverses
      If u ∈ V , then there exists a vector −u ∈ V so that u + (−u) = 0.


      • SMA Scalar Multiplication Associativity
        If α, β ∈ C and u ∈ V , then α(βu) = (αβ)u.

      • DVA Distributivity across Vector Addition
        If α ∈ C and u, v ∈ V , then α(u + v) = αu + αv.

      • DSA Distributivity across Scalar Addition
        If α, β ∈ C and u ∈ V , then (α + β)u = αu + βu.

      • O One
        If u ∈ V , then 1u = u.

   The objects in V are called vectors, no matter what else they might really be,
simply by virtue of being elements of a vector space. □
    Now, there are several important observations to make. Many of these will be
easier to understand on a second or third reading, and especially after carefully
studying the examples in Subsection VS.EVS.
    An axiom is often a “self-evident” truth, something so fundamental that we
all agree it is true and accept it without proof. Typically, it would be the logical
underpinning that we would begin to build theorems upon. Some might refer to
the ten properties of Definition VS as axioms, implying that a vector space is a
very natural object and the ten properties are the essence of a vector space. We
will instead emphasize that we will begin with a definition of a vector space. After
studying the remainder of this chapter, you might return here and remind yourself
how all our forthcoming theorems and definitions rest on this foundation.
    As we will see shortly, the objects in V can be anything, even though we will
call them vectors. We have been working with vectors frequently, but we should
stress here that these have so far just been column vectors — scalars arranged in a
columnar list of fixed length. In a similar vein, you have used the symbol “+” for
many years to represent the addition of numbers (scalars). We have extended its
use to the addition of column vectors and to the addition of matrices, and now we
are going to recycle it even further and let it denote vector addition in any possible
vector space. So when describing a new vector space, we will have to define exactly
what “+” is. Similar comments apply to scalar multiplication. Conversely, we can
define our operations any way we like, so long as the ten properties are fulfilled (see
Example CVS).
    In Definition VS, the scalars do not have to be complex numbers. They can come
from what are called in more advanced mathematics, “fields”. Examples of fields are
the set of complex numbers, the set of real numbers, the set of rational numbers,
and even the finite set of “binary numbers”, {0, 1}. There are many, many others.
When the scalars come from a field F, we call V a vector space over (the field) F.
    A vector space is composed of three objects, a set and two operations. Some
would explicitly state in the definition that V must be a nonempty set, but we can
infer this from Property Z, since the set cannot be empty and contain a vector that
behaves as the zero vector. Also, we usually use the same symbol for both the set
and the vector space itself. Do not let this convenience fool you into thinking the
operations are secondary!
    This discussion has either convinced you that we are really embarking on a new
level of abstraction, or it has seemed cryptic, mysterious or nonsensical. You might
want to return to this section in a few days and give it another read then. In any
case, let us look at some concrete examples now.

Subsection EVS
Examples of Vector Spaces
Our aim in this subsection is to give you a storehouse of examples to work with, to
become comfortable with the ten vector space properties and to convince you that
the multitude of examples justifies (at least initially) making such a broad definition
as Definition VS. Some of our claims will be justified by reference to previous
theorems, we will prove some facts from scratch, and we will do one nontrivial
example completely. In other places, our usual thoroughness will be neglected, so
grab paper and pencil and play along.
Example VSCV The vector space \(\mathbb{C}^m\)
Set: \(\mathbb{C}^m\), all column vectors of size m, Definition VSCV.
    Equality: Entry-wise, Definition CVE.
    Vector Addition: The “usual” addition, given in Definition CVA.
    Scalar Multiplication: The “usual” scalar multiplication, given in Definition
CVSM.
    Does this set with these operations fulfill the ten properties? Yes. And by design
all we need to do is quote Theorem VSPCV. That was easy. △
Example VSM The vector space of matrices, \(M_{mn}\)
Set: \(M_{mn}\), the set of all matrices of size m × n and entries from C, Definition VSM.
   Equality: Entry-wise, Definition ME.
   Vector Addition: The “usual” addition, given in Definition MA.
   Scalar Multiplication: The “usual” scalar multiplication, given in Definition MSM.
   Does this set with these operations fulfill the ten properties? Yes. And all we
need to do is quote Theorem VSPM. Another easy one (by design). △
     So, the set of all matrices of a fixed size forms a vector space. That entitles us to
call a matrix a vector, since a matrix is an element of a vector space. For example, if
\(A, B \in M_{34}\) then we call A and B “vectors,” and we even use our previous notation
for column vectors to refer to A and B. So we could legitimately write expressions
like
\[
\mathbf{u} + \mathbf{v} = A + B = B + A = \mathbf{v} + \mathbf{u}
\]
This could lead to some confusion, but the danger is not great. Still, it is worth a
comment.
     The previous two examples may be less than satisfying. We made all the relevant
definitions long ago. And the required verifications were all handled by quoting old
theorems. However, it is important to consider these two examples first. We have
been studying vectors and matrices carefully (Chapter V, Chapter M), and both
objects, along with their operations, have certain properties in common, as you
may have noticed in comparing Theorem VSPCV with Theorem VSPM. Indeed,
it is these two theorems that motivate us to formulate the abstract definition of a
vector space, Definition VS. Now, if we prove some general theorems about vector
spaces (as we will shortly in Subsection VS.VSP), we can then instantly apply the
conclusions to both \(\mathbb{C}^m\) and \(M_{mn}\). Notice too, how we have taken six definitions and
two theorems and reduced them down to two examples. With greater generalization
and abstraction our old ideas get downgraded in stature.
     Let us look at some more examples, now considering some new vector spaces.
Example VSP The vector space of polynomials, \(P_n\)
Set: \(P_n\), the set of all polynomials of degree n or less in the variable x with coefficients
from C.
    Equality:
\[
a_0 + a_1x + a_2x^2 + \cdots + a_nx^n = b_0 + b_1x + b_2x^2 + \cdots + b_nx^n
\quad\text{if and only if } a_i = b_i \text{ for } 0 \le i \le n
\]
   Vector Addition:
\begin{align*}
(a_0 + a_1x + a_2x^2 + \cdots + a_nx^n) &+ (b_0 + b_1x + b_2x^2 + \cdots + b_nx^n) \\
&= (a_0 + b_0) + (a_1 + b_1)x + (a_2 + b_2)x^2 + \cdots + (a_n + b_n)x^n
\end{align*}
   Scalar Multiplication:
\[
\alpha(a_0 + a_1x + a_2x^2 + \cdots + a_nx^n) = (\alpha a_0) + (\alpha a_1)x + (\alpha a_2)x^2 + \cdots + (\alpha a_n)x^n
\]
   This set, with these operations, will fulfill the ten properties, though we will not
work all the details here. However, we will make a few comments and prove one of
the properties. First, the zero vector (Property Z) is what you might expect, and
you can check that it has the required property.
\[
\mathbf{0} = 0 + 0x + 0x^2 + \cdots + 0x^n
\]
   The additive inverse (Property AI) is also no surprise, though consider how we
have chosen to write it.
\[
-\left(a_0 + a_1x + a_2x^2 + \cdots + a_nx^n\right) = (-a_0) + (-a_1)x + (-a_2)x^2 + \cdots + (-a_n)x^n
\]
    Now let us prove the associativity of vector addition (Property AA). This is a
bit tedious, though necessary. Throughout, the plus sign (“+”) does triple-duty. You
might ask yourself what each plus sign represents as you work through this proof.
\begin{align*}
\mathbf{u} + (\mathbf{v} + \mathbf{w})
&= (a_0 + a_1x + \cdots + a_nx^n) + \left((b_0 + b_1x + \cdots + b_nx^n) + (c_0 + c_1x + \cdots + c_nx^n)\right) \\
&= (a_0 + a_1x + \cdots + a_nx^n) + \left((b_0 + c_0) + (b_1 + c_1)x + \cdots + (b_n + c_n)x^n\right) \\
&= (a_0 + (b_0 + c_0)) + (a_1 + (b_1 + c_1))x + \cdots + (a_n + (b_n + c_n))x^n \\
&= ((a_0 + b_0) + c_0) + ((a_1 + b_1) + c_1)x + \cdots + ((a_n + b_n) + c_n)x^n \\
&= \left((a_0 + b_0) + (a_1 + b_1)x + \cdots + (a_n + b_n)x^n\right) + (c_0 + c_1x + \cdots + c_nx^n) \\
&= \left((a_0 + a_1x + \cdots + a_nx^n) + (b_0 + b_1x + \cdots + b_nx^n)\right) + (c_0 + c_1x + \cdots + c_nx^n) \\
&= (\mathbf{u} + \mathbf{v}) + \mathbf{w}
\end{align*}
    Notice how it is the application of the associativity of the (old) addition of
complex numbers in the middle of this chain of equalities that makes the whole proof
happen. The remainder is successive applications of our (new) definition of vector
(polynomial) addition. Proving the remainder of the ten properties is similar in style
and tedium. You might try proving the commutativity of vector addition (Property
C), or one of the distributivity properties (Property DVA, Property DSA). △
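The coefficient-by-coefficient definitions above translate directly into code. What follows is a minimal sketch, assuming a polynomial of degree n or less is stored as the list of its n + 1 coefficients; the names `poly_add`, `poly_scale` and `poly_neg` are invented for this illustration. Checking a property on a few sample vectors is of course not a proof, but it is a quick way to catch a mistaken definition.

```python
# A polynomial a_0 + a_1 x + ... + a_n x^n is stored as [a_0, a_1, ..., a_n].

def poly_add(p, q):
    """Vector addition: add coefficients of like powers of x."""
    return [a + b for a, b in zip(p, q)]

def poly_scale(alpha, p):
    """Scalar multiplication: multiply every coefficient by alpha."""
    return [alpha * a for a in p]

def poly_neg(p):
    """Additive inverse: negate every coefficient."""
    return [-a for a in p]

n = 2
zero = [0] * (n + 1)            # 0 + 0x + 0x^2

# Sample vectors with small integer real and imaginary parts,
# so that the complex arithmetic below is exact.
u = [1 + 2j, 0, 3]
v = [2, -1j, 4]
w = [-5, 7, 1j]
alpha, beta = 2, 1j

assert poly_add(u, poly_add(v, w)) == poly_add(poly_add(u, v), w)    # Property AA
assert poly_add(u, v) == poly_add(v, u)                              # Property C
assert poly_add(u, zero) == u                                        # Property Z
assert poly_add(u, poly_neg(u)) == zero                              # Property AI
assert poly_scale(alpha + beta, u) == poly_add(poly_scale(alpha, u),
                                               poly_scale(beta, u))  # Property DSA
```

Notice that each assertion reduces to ordinary arithmetic of complex numbers inside the list comprehensions, which mirrors the structure of the proof of Property AA.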
Example VSIS The vector space of infinite sequences
Set: \(\mathbb{C}^\infty = \{ (c_0, c_1, c_2, c_3, \ldots) \mid c_i \in \mathbb{C},\ i \in \mathbb{N} \}\).
   Equality:
\[
(c_0, c_1, c_2, \ldots) = (d_0, d_1, d_2, \ldots) \quad\text{if and only if } c_i = d_i \text{ for all } i \ge 0
\]
      Vector Addition:
\[
(c_0, c_1, c_2, \ldots) + (d_0, d_1, d_2, \ldots) = (c_0 + d_0, c_1 + d_1, c_2 + d_2, \ldots)
\]
      Scalar Multiplication:
\[
\alpha(c_0, c_1, c_2, c_3, \ldots) = (\alpha c_0, \alpha c_1, \alpha c_2, \alpha c_3, \ldots)
\]
    This should remind you of the vector space \(\mathbb{C}^m\), though now our lists of scalars are
written horizontally with commas as delimiters and they are allowed to be infinite in
length. What does the zero vector look like (Property Z)? Additive inverses (Property
AI)? Can you prove the associativity of vector addition (Property AA)? △
Example VSF The vector space of functions
Let X be any set.
    Set: F = { f | f : X → C}.
    Equality: f = g if and only if f (x) = g(x) for all x ∈ X.
    Vector Addition: f + g is the function with outputs defined by (f + g)(x) =
f (x) + g(x).
    Scalar Multiplication: αf is the function with outputs defined by (αf )(x) =
αf (x).
    So this is the set of all functions of one variable that take elements of the set X
to a complex number. You might have studied functions of one variable that take a
real number to a real number, and that might be a more natural set to use as X.
But since we are allowing our scalars to be complex numbers, we need to specify
that the range of our functions is the complex numbers. Study carefully how the
definitions of the operation are made, and think about the different uses of “+” and
juxtaposition. As an example of what is required when verifying that this is a vector
space, consider that the zero vector (Property Z) is the function z whose definition
is z(x) = 0 for every input x ∈ X.
    Vector spaces of functions are very important in mathematics and physics, where
the field of scalars may be the real numbers, so the ranges of the functions can in
turn also be the set of real numbers. △
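The different roles of “+” and juxtaposition in this example can be made concrete with functions that build new functions. The sketch below assumes a small finite set X so that equality of functions can be tested input by input; the names `f_add`, `f_scale` and `f_equal` are hypothetical helpers for this illustration only.

```python
def f_add(f, g):
    """Vector addition: the function whose output at x is f(x) + g(x)."""
    return lambda x: f(x) + g(x)

def f_scale(alpha, f):
    """Scalar multiplication: the function whose output at x is alpha * f(x)."""
    return lambda x: alpha * f(x)

def f_equal(f, g, X):
    """Equality: f = g exactly when f(x) = g(x) for every x in X."""
    return all(f(x) == g(x) for x in X)

X = ["a", "b", "c"]                          # X can be any set; a tiny one is used here
f = {"a": 1 + 1j, "b": 0, "c": -2}.get       # a function from X to the complex numbers
g = {"a": 3, "b": 2j, "c": 5}.get
z = lambda x: 0                              # the zero vector: z(x) = 0 for every x

assert f_equal(f_add(f, z), f, X)                                   # Property Z
assert f_equal(f_add(f, g), f_add(g, f), X)                         # Property C
assert f_equal(f_scale(2, f_add(f, g)),
               f_add(f_scale(2, f), f_scale(2, g)), X)              # Property DVA
```

Inside `f_add` the plus sign adds two complex numbers, while the call `f_add(f, g)` plays the part of the new “+” that adds two vectors of F.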
    Here is a unique example.
Example VSS The singleton vector space
Set: Z = {z}.
    Equality: Huh?
    Vector Addition: z + z = z.
    Scalar Multiplication: αz = z.
    This should look pretty wild. First, just what is z? Column vector, matrix,
polynomial, sequence, function? Mineral, plant, or animal? We aren’t saying! z just
is. And we have definitions of vector addition and scalar multiplication that are
sufficient for an occurrence of either that may come along.
    Our only concern is if this set, along with the definitions of two operations, fulfills
the ten properties of Definition VS. Let us check associativity of vector addition
(Property AA). For all u, v, w ∈ Z,
                              u + (v + w) = z + (z + z)
                                             =z+z
                                             = (z + z) + z
                                             = (u + v) + w
    What is the zero vector in this vector space (Property Z)? With only one element
in the set, we do not have much choice. Is z = 0? It appears that z behaves like the
zero vector should, so it gets the title. Maybe now the definition of this vector space
does not seem so bizarre. It is a set whose only element is the element that behaves
like the zero vector, so that lone element is the zero vector. △
    Perhaps some of the above definitions and verifications seem obvious or like
splitting hairs, but the next example should convince you that they are necessary.
We will study this one carefully. Ready? Check your preconceptions at the door.
Example CVS The crazy vector space
Set: \(C = \{ (x_1, x_2) \mid x_1, x_2 \in \mathbb{C} \}\).
    Vector Addition: \((x_1, x_2) + (y_1, y_2) = (x_1 + y_1 + 1,\ x_2 + y_2 + 1)\).
    Scalar Multiplication: \(\alpha(x_1, x_2) = (\alpha x_1 + \alpha - 1,\ \alpha x_2 + \alpha - 1)\).
    Now, the first thing I hear you say is “You can’t do that!” And my response is,
“Oh yes, I can!” I am free to define my set and my operations any way I please. They
 may not look natural, or even useful, but we will now verify that they provide us
with another example of a vector space. And that is enough. If you are adventurous,
you might try first checking some of the properties yourself. What is the zero vector?
Additive inverses? Can you prove associativity? Ready, here we go.
    Property AC, Property SC: The result of each operation is a pair of complex
 numbers, so these two closure properties are fulfilled.
    Property C:
\begin{align*}
\mathbf{u} + \mathbf{v} &= (x_1, x_2) + (y_1, y_2) = (x_1 + y_1 + 1,\ x_2 + y_2 + 1) \\
&= (y_1 + x_1 + 1,\ y_2 + x_2 + 1) = (y_1, y_2) + (x_1, x_2) \\
&= \mathbf{v} + \mathbf{u}
\end{align*}
    Property AA:
\begin{align*}
\mathbf{u} + (\mathbf{v} + \mathbf{w}) &= (x_1, x_2) + \left((y_1, y_2) + (z_1, z_2)\right) \\
&= (x_1, x_2) + (y_1 + z_1 + 1,\ y_2 + z_2 + 1) \\
&= (x_1 + (y_1 + z_1 + 1) + 1,\ x_2 + (y_2 + z_2 + 1) + 1) \\
&= (x_1 + y_1 + z_1 + 2,\ x_2 + y_2 + z_2 + 2) \\
&= ((x_1 + y_1 + 1) + z_1 + 1,\ (x_2 + y_2 + 1) + z_2 + 1) \\
&= (x_1 + y_1 + 1,\ x_2 + y_2 + 1) + (z_1, z_2) \\
&= \left((x_1, x_2) + (y_1, y_2)\right) + (z_1, z_2) \\
&= (\mathbf{u} + \mathbf{v}) + \mathbf{w}
\end{align*}
   Property Z: The zero vector is . . . 0 = (−1, −1). Now I hear you say, “No, no,
that can’t be, it must be (0, 0)!”. Indulge me for a moment and let us check my
proposal.
\[
\mathbf{u} + \mathbf{0} = (x_1, x_2) + (-1, -1) = (x_1 + (-1) + 1,\ x_2 + (-1) + 1) = (x_1, x_2) = \mathbf{u}
\]
Feeling better? Or worse?
    Property AI: For each vector, u, we must locate an additive inverse, −u. Here it
is, \(-(x_1, x_2) = (-x_1 - 2,\ -x_2 - 2)\). As odd as it may look, I hope you are withholding
judgment. Check:
\begin{align*}
\mathbf{u} + (-\mathbf{u}) &= (x_1, x_2) + (-x_1 - 2,\ -x_2 - 2) \\
&= (x_1 + (-x_1 - 2) + 1,\ x_2 + (-x_2 - 2) + 1) = (-1, -1) = \mathbf{0}
\end{align*}
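      Property SMA:
\begin{align*}
\alpha(\beta\mathbf{u}) &= \alpha(\beta(x_1, x_2)) \\
&= \alpha(\beta x_1 + \beta - 1,\ \beta x_2 + \beta - 1) \\
&= (\alpha(\beta x_1 + \beta - 1) + \alpha - 1,\ \alpha(\beta x_2 + \beta - 1) + \alpha - 1) \\
&= ((\alpha\beta x_1 + \alpha\beta - \alpha) + \alpha - 1,\ (\alpha\beta x_2 + \alpha\beta - \alpha) + \alpha - 1) \\
&= (\alpha\beta x_1 + \alpha\beta - 1,\ \alpha\beta x_2 + \alpha\beta - 1) \\
&= (\alpha\beta)(x_1, x_2) \\
&= (\alpha\beta)\mathbf{u}
\end{align*}
   Property DVA: If you have hung on so far, here is where it gets even wilder. In
the next two properties we mix and mash the two operations.
\begin{align*}
\alpha(\mathbf{u} + \mathbf{v})
&= \alpha\left((x_1, x_2) + (y_1, y_2)\right) \\
&= \alpha(x_1 + y_1 + 1,\ x_2 + y_2 + 1) \\
&= (\alpha(x_1 + y_1 + 1) + \alpha - 1,\ \alpha(x_2 + y_2 + 1) + \alpha - 1) \\
&= (\alpha x_1 + \alpha y_1 + \alpha + \alpha - 1,\ \alpha x_2 + \alpha y_2 + \alpha + \alpha - 1) \\
&= (\alpha x_1 + \alpha - 1 + \alpha y_1 + \alpha - 1 + 1,\ \alpha x_2 + \alpha - 1 + \alpha y_2 + \alpha - 1 + 1) \\
&= ((\alpha x_1 + \alpha - 1) + (\alpha y_1 + \alpha - 1) + 1,\ (\alpha x_2 + \alpha - 1) + (\alpha y_2 + \alpha - 1) + 1) \\
&= (\alpha x_1 + \alpha - 1,\ \alpha x_2 + \alpha - 1) + (\alpha y_1 + \alpha - 1,\ \alpha y_2 + \alpha - 1) \\
&= \alpha(x_1, x_2) + \alpha(y_1, y_2) \\
&= \alpha\mathbf{u} + \alpha\mathbf{v}
\end{align*}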
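      Property DSA:
\begin{align*}
(\alpha + \beta)\mathbf{u}
&= (\alpha + \beta)(x_1, x_2) \\
&= ((\alpha + \beta)x_1 + (\alpha + \beta) - 1,\ (\alpha + \beta)x_2 + (\alpha + \beta) - 1) \\
&= (\alpha x_1 + \beta x_1 + \alpha + \beta - 1,\ \alpha x_2 + \beta x_2 + \alpha + \beta - 1) \\
&= (\alpha x_1 + \alpha - 1 + \beta x_1 + \beta - 1 + 1,\ \alpha x_2 + \alpha - 1 + \beta x_2 + \beta - 1 + 1) \\
&= ((\alpha x_1 + \alpha - 1) + (\beta x_1 + \beta - 1) + 1,\ (\alpha x_2 + \alpha - 1) + (\beta x_2 + \beta - 1) + 1) \\
&= (\alpha x_1 + \alpha - 1,\ \alpha x_2 + \alpha - 1) + (\beta x_1 + \beta - 1,\ \beta x_2 + \beta - 1) \\
&= \alpha(x_1, x_2) + \beta(x_1, x_2) \\
&= \alpha\mathbf{u} + \beta\mathbf{u}
\end{align*}
      Property O: After all that, this one is easy, but no less pleasing.
\[
1\mathbf{u} = 1(x_1, x_2) = (x_1 + 1 - 1,\ x_2 + 1 - 1) = (x_1, x_2) = \mathbf{u}
\]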
   That is it, C is a vector space, as crazy as that may seem.
   Notice that in the case of the zero vector and additive inverses, we only had to
propose possibilities and then verify that they were the correct choices. You might
try to discover how you would arrive at these choices, though you should understand
why the process of discovering them is not a necessary component of the proof itself.
△
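For readers who like to experiment, the two operations of the crazy vector space are easy to code, and the proposed zero vector and additive inverses can then be tested on sample vectors. This is only a sketch: the function names `add`, `scale` and `neg` are chosen here for illustration, and passing these spot checks does not replace the proofs in the example.

```python
def add(u, v):
    """Crazy vector addition: (x1 + y1 + 1, x2 + y2 + 1)."""
    return (u[0] + v[0] + 1, u[1] + v[1] + 1)

def scale(alpha, u):
    """Crazy scalar multiplication: (alpha x1 + alpha - 1, alpha x2 + alpha - 1)."""
    return (alpha * u[0] + alpha - 1, alpha * u[1] + alpha - 1)

def neg(u):
    """Proposed additive inverse: (-x1 - 2, -x2 - 2)."""
    return (-u[0] - 2, -u[1] - 2)

ZERO = (-1, -1)                      # proposed zero vector

# Sample vectors and scalars with small integer parts, so arithmetic is exact.
u = (2 + 1j, -3)
v = (0, 4 - 2j)
w = (1j, 5)
alpha, beta = 2 - 1j, 3

assert add(u, v) == add(v, u)                                          # Property C
assert add(u, add(v, w)) == add(add(u, v), w)                          # Property AA
assert add(u, ZERO) == u                                               # Property Z
assert add(u, neg(u)) == ZERO                                          # Property AI
assert scale(alpha, scale(beta, u)) == scale(alpha * beta, u)          # Property SMA
assert scale(alpha, add(u, v)) == add(scale(alpha, u), scale(alpha, v))  # Property DVA
assert scale(alpha + beta, u) == add(scale(alpha, u), scale(beta, u))  # Property DSA
assert scale(1, u) == u                                                # Property O

# The "obvious" candidate (0, 0) does not behave like a zero vector here.
assert add(u, (0, 0)) != u
```

The final assertion makes the point of Property Z concrete: with this definition of addition, adding (0, 0) shifts every entry by 1.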

Subsection VSP
Vector Space Properties
Subsection VS.EVS has provided us with an abundance of examples of vector spaces,
most of them containing useful and interesting mathematical objects along with
natural operations. In this subsection we will prove some general properties of
vector spaces. Some of these results will again seem obvious, but it is important
to understand why it is necessary to state and prove them. A typical hypothesis
will be “Let V be a vector space.” From this we may assume the ten properties of
Definition VS, and nothing more. It is like starting over, as we learn about what can
happen in this new algebra we are learning. But the power of this careful approach is
that we can apply these theorems to any vector space we encounter — those in the
previous examples, or new ones we have not yet contemplated. Or perhaps new ones
that nobody has ever contemplated. We will illustrate some of these results with
examples from the crazy vector space (Example CVS), but mostly we are stating
theorems and doing proofs. These proofs do not get too involved, but are not trivial
either, so these are good theorems to try proving yourself before you study the proof
given here. (See Proof Technique P.)
    First we show that there is just one zero vector. Notice that the properties only
require there to be at least one, and say nothing about there possibly being more.
That is because we can use the ten properties of a vector space (Definition VS) to
learn that there can never be more than one. To require that this extra condition
be stated as an eleventh property would make the definition of a vector space more
complicated than it needs to be.
Theorem ZVU Zero Vector is Unique
Suppose that V is a vector space. The zero vector, 0, is unique.

Proof. To prove uniqueness, a standard technique is to suppose the existence of two
objects (Proof Technique U). So let 01 and 02 be two zero vectors in V . Then
                  01 = 01 + 02                    Property Z for 02
                     = 02 + 01                    Property C
                     = 02                         Property Z for 01
   This proves the uniqueness since the two zero vectors are really the same.        

Theorem AIU Additive Inverses are Unique
Suppose that V is a vector space. For each u ∈ V , the additive inverse, −u, is unique.

Proof. To prove uniqueness, a standard technique is to suppose the existence of two
objects (Proof Technique U). So let −u1 and −u2 be two additive inverses for u.
Then
               −u1 = −u1 + 0                             Property Z
                    = −u1 + (u + −u2 )                   Property AI
                    = (−u1 + u) + −u2                    Property AA
                    = 0 + −u2                            Property AI
                    = −u2                                Property Z
   So the two additive inverses are really the same.                                 

   As obvious as the next three theorems appear, nowhere have we guaranteed that
the zero scalar, scalar multiplication and the zero vector all interact this way. Until
we have proved it, anyway.
Theorem ZSSM Zero Scalar in Scalar Multiplication
Suppose that V is a vector space and u ∈ V . Then 0u = 0.

Proof. Notice that 0 is a scalar, u is a vector, so Property SC says 0u is again a
vector. As such, 0u has an additive inverse, −(0u) by Property AI.
               0u = 0 + 0u                               Property Z
                   = (−(0u) + 0u) + 0u                   Property AI
                   = −(0u) + (0u + 0u)                   Property AA
                   = −(0u) + (0 + 0)u                    Property DSA
                   = −(0u) + 0u                          Property ZCN
                   =0                                    Property AI
                                                                                      

    Here is another theorem that looks like it should be obvious, but is still in need
of a proof.
Theorem ZVSM Zero Vector in Scalar Multiplication
Suppose that V is a vector space and α ∈ C. Then α0 = 0.

Proof. Notice that α is a scalar, 0 is a vector, so Property SC means α0 is again a
vector. As such, α0 has an additive inverse, −(α0) by Property AI.
               α0 = 0 + α0                               Property Z
                   = (−(α0) + α0) + α0                   Property AI
                   = −(α0) + (α0 + α0)                   Property AA
                   = −(α0) + α (0 + 0)                   Property DVA
                   = −(α0) + α0                          Property Z
                   =0                                    Property AI

    Here is another one that sure looks obvious. But understand that we have chosen
to use certain notation because it makes the theorem’s conclusion look so nice. The
theorem is not true because the notation looks so good; it still needs a proof. If we
had really wanted to make this point, we might have used notation like u♯ for the
additive inverse of u. Then we would have written the defining property, Property
AI, as u + u♯ = 0. This theorem would become u♯ = (−1)u. Not really quite as
pretty, is it?
Theorem AISM Additive Inverses from Scalar Multiplication
Suppose that V is a vector space and u ∈ V . Then −u = (−1)u.

Proof.
              −u = −u + 0                               Property Z
                  = −u + 0u                             Theorem ZSSM
                  = −u + (1 + (−1)) u                   Property AICN
                  = −u + (1u + (−1)u)                   Property DSA
                  = −u + (u + (−1)u)                    Property O
                  = (−u + u) + (−1)u                    Property AA
                  = 0 + (−1)u                           Property AI
                  = (−1)u                               Property Z
                                                                                      

    Because of this theorem, we can now write linear combinations like 6u1 + (−4)u2
as 6u1 − 4u2 , even though we have not formally defined an operation called vector
subtraction.
    Our next theorem is a bit different from several of the others in the list. Rather
than making a declaration (“the zero vector is unique”) it is an implication (“if. . . ,
then. . . ”) and so can be used in proofs to convert a vector equality into two possibil-
ities, one a scalar equality and the other a vector equality. It should remind you of
the situation for complex numbers. If α, β ∈ C and αβ = 0, then α = 0 or β = 0.
This critical property is the driving force behind using a factorization to solve a
polynomial equation.
Theorem SMEZV Scalar Multiplication Equals the Zero Vector
Suppose that V is a vector space and α ∈ C. If αu = 0, then either α = 0 or u = 0.
Proof. We prove this theorem by breaking up the analysis into two cases. The first
seems too trivial, and it is, but the logic of the argument is still legitimate.
    Case 1. Suppose α = 0. In this case our conclusion is true (the first part of the
either/or is true) and we are done. That was easy.
    Case 2. Suppose α ≠ 0.
                u = 1u                                   Property O
                  = ((1/α) α) u                          α ≠ 0, Property MICN
                  = (1/α) (αu)                           Property SMA
                  = (1/α) (0)                            Hypothesis
                  = 0                                    Theorem ZVSM
   So in this case, the conclusion is true (the second part of the either/or is true)
and we are done since the conclusion was true in each of the two cases.            
Example PCVS Properties for the Crazy Vector Space
Several of the above theorems have interesting demonstrations when applied to the
crazy vector space, C (Example CVS). We are not proving anything new here, or
learning anything we did not know already about C. It is just plain fun to see how
these general theorems apply in a specific instance. For most of our examples, the
applications are obvious or trivial, but not with C.
    Suppose u ∈ C. Then, as given by Theorem ZSSM,
            0u = 0(x1 , x2 ) = (0x1 + 0 − 1, 0x2 + 0 − 1) = (−1, −1) = 0
And as given by Theorem ZVSM,
                α0 = α(−1, −1) = (α(−1) + α − 1, α(−1) + α − 1)
                    = (−α + α − 1, −α + α − 1) = (−1, −1) = 0
Finally, as given by Theorem AISM,
         (−1)u = (−1)(x1 , x2 ) = ((−1)x1 + (−1) − 1, (−1)x2 + (−1) − 1)
                = (−x1 − 2, −x2 − 2) = −u
                                                                                   △
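The three computations in Example PCVS can also be spot-checked by machine. The following is a minimal sketch, assuming exact rational scalars in place of arbitrary complex numbers; the function names (cvs_add, cvs_smul, cvs_neg) and the sample vector and scalar are choices made here for illustration and do not come from the text. A finite check like this proves nothing, but it is a handy way to catch an arithmetic slip.

```python
from fractions import Fraction

# Operations of the crazy vector space C (Example CVS) on pairs of numbers.
# All names below are illustrative; they are not notation from the text.
def cvs_add(u, v):
    """(x1, x2) + (y1, y2) = (x1 + y1 + 1, x2 + y2 + 1)"""
    return (u[0] + v[0] + 1, u[1] + v[1] + 1)

def cvs_smul(alpha, u):
    """alpha (x1, x2) = (alpha x1 + alpha - 1, alpha x2 + alpha - 1)"""
    return (alpha * u[0] + alpha - 1, alpha * u[1] + alpha - 1)

def cvs_neg(u):
    """-(x1, x2) = (-x1 - 2, -x2 - 2)"""
    return (-u[0] - 2, -u[1] - 2)

ZERO = (-1, -1)  # the zero vector of C

# An arbitrary sample vector and scalar (assumed for this sketch).
u = (Fraction(2), Fraction(5))
alpha = Fraction(5, 7)

print(cvs_smul(0, u) == ZERO)          # Theorem ZSSM: 0u = 0, prints True
print(cvs_smul(alpha, ZERO) == ZERO)   # Theorem ZVSM: alpha 0 = 0, prints True
print(cvs_smul(-1, u) == cvs_neg(u))   # Theorem AISM: (-1)u = -u, prints True
print(cvs_add(u, cvs_neg(u)) == ZERO)  # Property AI: u + (-u) = 0, prints True
```

Changing u and alpha to other values should leave all four answers unchanged, which is exactly what the theorems promise.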

Subsection RD
Recycling Definitions
When we say that V is a vector space, we then know we have a set of objects (the
“vectors”), but we also know we have been provided with two operations (“vector
addition” and “scalar multiplication”) and these operations behave with these objects
according to the ten properties of Definition VS. One combines two vectors and
produces a vector, the other takes a scalar and a vector, producing a vector as the
result. So if u1 , u2 , u3 ∈ V then an expression like
                                 5u1 + 7u2 − 13u3
would be unambiguous in any of the vector spaces we have discussed in this section.
And the resulting object would be another vector in the vector space. If you were
tempted to call the above expression a linear combination, you would be right. Four
of the definitions that were central to our discussions in Chapter V were stated in
the context of vectors being column vectors, but were purposely kept broad enough
that they could be applied in the context of any vector space. They only rely on the
presence of scalars, vectors, vector addition and scalar multiplication to make sense.
We will restate them shortly, unchanged, except that their titles and acronyms no
longer refer to column vectors, and the hypothesis of being in a vector space has
been added. Take the time now to look forward and review each one, and begin
to form some connections to what we have done earlier and what we will be doing
in subsequent sections and chapters. Specifically, compare the following pairs of
definitions:
      • Definition LCCV and Definition LC
      • Definition SSCV and Definition SS
      • Definition RLDCV and Definition RLD
      • Definition LICV and Definition LI

Reading Questions
1. Comment on how the vector space Cm went from a theorem (Theorem VSPCV) to an
   example (Example VSCV).
2. In the crazy vector space, C, (Example CVS) compute the linear combination
                                     2(3, 4) + (−6)(1, 2).

3. Suppose that α is a scalar and 0 is the zero vector. Why should we prove anything as
   obvious as α0 = 0 such as we did in Theorem ZVSM?

Exercises
M10 Define a possibly new vector space by beginning with the set and vector addition
from C2 (Example VSCV) but change the definition of scalar multiplication to
                            
                  αx = 0 = [ 0 ]                          α ∈ C, x ∈ C2
                           [ 0 ]
Prove that the first nine properties required for a vector space hold, but Property O does
not hold.

This example shows us that we cannot expect to be able to derive Property O as a
consequence of assuming the first nine properties. In other words, we cannot slim down our
list of properties by jettisoning the last one, and still have the same collection of objects
qualify as vector spaces.
M11† Let V be the set C2 with the usual vector addition, but with scalar multiplication
defined by
                                       
                                  α [ x ] = [ αy ]
                                    [ y ]   [ αx ]
Determine whether or not V is a vector space with these operations.
M12† Let V be the set C2 with the usual scalar multiplication, but with vector addition
defined by
                                              
                               [ x ] + [ z ] = [ y + w ]
                               [ y ]   [ w ]   [ x + z ]
Determine whether or not V is a vector space with these operations.
M13† Let V be the set M22 with the usual scalar multiplication, but with addition defined
by A + B = O2,2 for all 2 × 2 matrices A and B. Determine whether or not V is a vector
space with these operations.
M14† Let V be the set M22 with the usual addition, but with scalar multiplication defined
by αA = O2,2 for all 2 × 2 matrices A and scalars α. Determine whether or not V is a
vector space with these operations.
M15† Consider the following sets of 3 × 3 matrices, where the symbol ∗ indicates the
position of an arbitrary complex number. Determine whether or not these sets form vector
spaces with the usual operations of addition and scalar multiplication for matrices.
                                        
                               [ ∗ ∗ 1 ]
   1. All matrices of the form [ ∗ 1 ∗ ]
                               [ 1 ∗ ∗ ]

                               [ ∗ 0 ∗ ]
   2. All matrices of the form [ 0 ∗ 0 ]
                               [ ∗ 0 ∗ ]

                               [ ∗ 0 0 ]
   3. All matrices of the form [ 0 ∗ 0 ] (These are the diagonal matrices.)
                               [ 0 0 ∗ ]

                               [ ∗ ∗ ∗ ]
   4. All matrices of the form [ 0 ∗ ∗ ] (These are the upper triangular matrices.)
                               [ 0 0 ∗ ]

M20† Explain why we need to define the vector space Pn as the set of all polynomials
with degree up to and including n instead of the more obvious set of all polynomials of
degree exactly n.
                                                                           
M21† The set of integers is denoted Z. Does the set
                          Z2 = { [ m ] | m, n ∈ Z }
                                 [ n ]
with the operations of standard addition and scalar multiplication of vectors form a
vector space?
T10 Prove each of the ten properties of Definition VS for each of the following examples
of a vector space: Example VSP, Example VSIS, Example VSF, Example VSS.

The next three problems suggest that under the right situations we can “cancel.” In practice,
these techniques should be avoided in other proofs. Prove each of the following statements.
   T21† Suppose that V is a vector space, and u, v, w ∈ V . If w + u = w + v, then
   u = v.
   T22† Suppose V is a vector space, u, v ∈ V and α is a nonzero scalar from C. If
   αu = αv, then u = v.
   T23† Suppose V is a vector space, u ≠ 0 is a vector in V and α, β ∈ C. If αu = βu,
   then α = β.
   T30† Suppose that V is a vector space and α ∈ C is a scalar such that αx = x
   for every x ∈ V . Prove that α = 1. In other words, Property O is not duplicated for
   any other scalar but the “special” scalar, 1. (This question was suggested by James
   Gallagher.)
Section S
Subspaces
A subspace is a vector space that is contained within another vector space. So every
subspace is a vector space in its own right, but it is also defined relative to some
other (larger) vector space. We will discover shortly that we are already familiar
with a wide variety of subspaces from previous sections.


Subsection S
Subspaces
Here is the principal definition for this section.
Definition S Subspace
Suppose that V and W are two vector spaces that have identical definitions of vector
addition and scalar multiplication, and that W is a subset of V , W ⊆ V . Then W is
a subspace of V .                                                                 
   Let us look at an example of a vector space inside another vector space.
Example SC3 A subspace of C3
We know that C3 is a vector space (Example VSCV). Consider the subset,
                            { [ x1 ]                          }
                        W = { [ x2 ] | 2x1 − 5x2 + 7x3 = 0    }
                            { [ x3 ]                          }
    It is clear that W ⊆ C3 , since the objects in W are column vectors of size 3. But
is W a vector space? Does it satisfy the ten properties of Definition VS when we use
the same operations? That is the main question.
                [ x1 ]         [ y1 ]
    Suppose x = [ x2 ] and y = [ y2 ] are vectors from W . Then we know that these
                [ x3 ]         [ y3 ]
vectors cannot be totally arbitrary, they must have gained membership in W by
virtue of meeting the membership test. For example, we know that x must satisfy
2x1 − 5x2 + 7x3 = 0 while y must satisfy 2y1 − 5y2 + 7y3 = 0. Our first property
(Property AC) asks the question, is x + y ∈ W ? When our set of vectors was C3 ,
this was an easy question to answer. Now it is not so obvious. Notice first that
                                 [ x1 ]   [ y1 ]   [ x1 + y1 ]
                         x + y = [ x2 ] + [ y2 ] = [ x2 + y2 ]
                                 [ x3 ]   [ y3 ]   [ x3 + y3 ]
and we can test this vector for membership in W as follows. Because x ∈ W we know
2x1 − 5x2 + 7x3 = 0 and because y ∈ W we know 2y1 − 5y2 + 7y3 = 0. Therefore,
   2(x1 + y1 ) − 5(x2 + y2 ) + 7(x3 + y3 ) = 2x1 + 2y1 − 5x2 − 5y2 + 7x3 + 7y3
                                          = (2x1 − 5x2 + 7x3 ) + (2y1 − 5y2 + 7y3 )
                                          =0+0
                                          =0
and by this computation we see that x + y ∈ W . One property down, nine to go.
    If α is a scalar and x ∈ W , is it always true that αx ∈ W ? This is what we need
to establish Property SC. Again, the answer is not as obvious as it was when our
set of vectors was all of C3 . Let us see. Notice first that
                                     [ x1 ]   [ αx1 ]
                              αx = α [ x2 ] = [ αx2 ]
                                     [ x3 ]   [ αx3 ]
and we can test this vector for membership in W . Because x ∈ W
we know 2x1 − 5x2 + 7x3 = 0. Therefore,
                  2(αx1 ) − 5(αx2 ) + 7(αx3 ) = α(2x1 − 5x2 + 7x3 )
                                              = α0
                                              =0
and we see that indeed αx ∈ W . Always.
   If W has a zero vector, it will be unique (Theorem ZVU). The zero vector for C3
should also perform the required duties when added to elements of W . So the likely
candidate for a zero vector in W is the same zero vector that we know C3 has. You
                   [ 0 ]
can check that 0 = [ 0 ] is a zero vector in W too (Property Z).
                   [ 0 ]
   With a zero vector, we can now ask about additive inverses (Property AI). As
you might suspect, the natural candidate for an additive inverse in W is the same as
the additive inverse from C3 . However, we must ensure that these additive inverses
actually are elements of W . Given x ∈ W , is −x ∈ W ?
                                        [ −x1 ]
                                   −x = [ −x2 ]
                                        [ −x3 ]
and we can test this vector for membership in W . As before, because x ∈ W we
know 2x1 − 5x2 + 7x3 = 0.
                 2(−x1 ) − 5(−x2 ) + 7(−x3 ) = −(2x1 − 5x2 + 7x3 )
                                              = −0
                                              =0
and we now believe that −x ∈ W .
    Is the vector addition in W commutative (Property C)? Is x + y = y + x? Of
course! Nothing about restricting the scope of our set of vectors will prevent the
operation from still being commutative. Indeed, the remaining five properties are
unaffected by the transition to a smaller set of vectors, and so remain true. That
was convenient.
    So W satisfies all ten properties, is therefore a vector space, and thus earns the
title of being a subspace of C3 .                                                   △
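The membership computations of Example SC3 are easy to mimic numerically. Here is a minimal sketch, assuming exact rational entries instead of arbitrary complex numbers; the helper names (in_W, add, smul) and the sample vectors are illustrative choices, not part of the text. Passing these checks for a few vectors is no substitute for the algebra above, which handles every x and y at once.

```python
from fractions import Fraction

def in_W(x):
    """Membership test for W: 2 x1 - 5 x2 + 7 x3 = 0."""
    return 2 * x[0] - 5 * x[1] + 7 * x[2] == 0

def add(x, y):
    """Vector addition in C3, entry by entry."""
    return tuple(a + b for a, b in zip(x, y))

def smul(alpha, x):
    """Scalar multiplication in C3, entry by entry."""
    return tuple(alpha * a for a in x)

# Two sample elements of W, chosen here for illustration.
x = (Fraction(5), Fraction(2), Fraction(0))    # 2(5) - 5(2) + 7(0) = 0
y = (Fraction(1), Fraction(-1), Fraction(-1))  # 2(1) - 5(-1) + 7(-1) = 0
alpha = Fraction(-3, 4)

print(in_W(x), in_W(y))        # True True
print(in_W(add(x, y)))         # additive closure for this pair: True
print(in_W(smul(alpha, x)))    # scalar closure for this pair: True
print(in_W((0, 0, 0)))         # the zero vector of C3 lies in W: True
print(in_W(smul(-1, x)))       # the additive inverse of x lies in W: True
print(in_W((1, 0, 0)))         # not every vector of C3 is in W: False
```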

Subsection TS
Testing Subspaces
In Example SC3 we proceeded through all ten of the vector space properties before
believing that a subset was a subspace. But six of the properties were easy to prove,
and we can lean on some of the properties of the vector space (the superset) to make
the other four easier. Here is a theorem that will make it easier to test if a subset is
a vector space. A shortcut if there ever was one.
Theorem TSS Testing Subsets for Subspaces
Suppose that V is a vector space and W is a subset of V , W ⊆ V . Endow W with
the same operations as V . Then W is a subspace if and only if three conditions are
met

   1. W is nonempty, W ≠ ∅.

   2. If x ∈ W and y ∈ W , then x + y ∈ W .

   3. If α ∈ C and x ∈ W , then αx ∈ W .

Proof. (⇒) We have the hypothesis that W is a subspace, so by Property Z we know
that W contains a zero vector. This is enough to show that W ≠ ∅. Also, since W is
a vector space it satisfies the additive and scalar multiplication closure properties
(Property AC, Property SC), and so exactly meets the second and third conditions.
If that was easy, the other direction might require a bit more work.
    (⇐) We have three properties for our hypothesis, and from this we should
conclude that W has the ten defining properties of a vector space. The second and
third conditions of our hypothesis are exactly Property AC and Property SC. Our
hypothesis that V is a vector space implies that Property C, Property AA, Property
SMA, Property DVA, Property DSA and Property O all hold. They continue to be
true for vectors from W since passing to a subset, and keeping the operation the
same, leaves their statements unchanged. Eight down, two to go.
    Suppose x ∈ W . Then by the third part of our hypothesis (scalar closure), we
know that (−1)x ∈ W . By Theorem AISM (−1)x = −x, so together these statements
show us that −x ∈ W . −x is the additive inverse of x in V , but will continue in
this role when viewed as an element of the subset W . So every element of W has an
additive inverse that is an element of W and Property AI is completed. Just one
property left.
    While we have implicitly discussed the zero vector in the previous paragraph, we
need to be certain that the zero vector (of V ) really lives in W . Since W is nonempty,
we can choose some vector z ∈ W . Then by the argument in the previous paragraph,
we know −z ∈ W . Now by Property AI for V and then by the second part of our
hypothesis (additive closure) we see that
                                  0 = z + (−z) ∈ W
    So W contains the zero vector from V . Since this vector performs the required
duties of a zero vector in V , it will continue in that role as an element of W . This
gives us Property Z, the final property of the ten required. (Sarah Fellez contributed
to this proof.)                                                                      

    So just three conditions, plus being a subset of a known vector space, gets us all
ten properties. Fabulous! This theorem can be paraphrased by saying that a subspace
is “a nonempty subset (of a vector space) that is closed under vector addition and
scalar multiplication.”
    You might want to go back and rework Example SC3 in light of this result,
perhaps seeing where we can now economize or where the work done in the example
mirrored the proof and where it did not. We will press on and apply this theorem in
a slightly more abstract setting.
Example SP4 A subspace of P4
P4 is the vector space of polynomials with degree at most 4 (Example VSP). Define
a subset W as
                             W = { p(x)| p ∈ P4 , p(2) = 0}
so W is the collection of those polynomials (with degree 4 or less) whose graphs
cross the x-axis at x = 2. Whenever we encounter a new set it is a good idea to gain
a better understanding of the set by finding a few elements in the set, and a few
outside it. For example x^2 − x − 2 ∈ W , while x^4 + x^3 − 7 ∉ W .
   Is W nonempty? Yes, x − 2 ∈ W .
   Additive closure? Suppose p ∈ W and q ∈ W . Is p + q ∈ W ? p and q are not
totally arbitrary, we know that p(2) = 0 and q(2) = 0. Then we can check p + q for
membership in W ,
               (p + q)(2) = p(2) + q(2)                 Addition in P4
                           =0+0                         p ∈ W, q ∈ W
                           =0
so we see that p + q qualifies for membership in W .
   Scalar multiplication closure? Suppose that α ∈ C and p ∈ W . Then we know
that p(2) = 0. Testing αp for membership,
              (αp)(2) = αp(2)                Scalar multiplication in P4
                      = α0                   p∈W
                      =0
so αp ∈ W .
   We have shown that W meets the three conditions of Theorem TSS and so
qualifies as a subspace of P4 . Notice that by Definition S we now know that W is
also a vector space. So all the properties of a vector space (Definition VS) and the
theorems of Section VS apply in full.                                             △
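The three checks of Example SP4 can be tried out on specific polynomials. The sketch below is one possible encoding, assuming a polynomial of degree at most 4 is stored as a list of its five coefficients with the constant term first; this representation and the helper names are assumptions made here, not anything fixed by the text.

```python
from fractions import Fraction

def evaluate(p, t):
    """Evaluate the polynomial with coefficients p (constant term first) at t,
    using Horner's rule."""
    result = 0
    for coeff in reversed(p):
        result = result * t + coeff
    return result

def in_W(p):
    """Membership test for W: p(2) = 0."""
    return evaluate(p, 2) == 0

def add(p, q):
    """Addition in P4: add corresponding coefficients."""
    return [a + b for a, b in zip(p, q)]

def smul(alpha, p):
    """Scalar multiplication in P4: scale every coefficient."""
    return [alpha * a for a in p]

p = [-2, -1, 1, 0, 0]   # x^2 - x - 2, in W
q = [-2, 1, 0, 0, 0]    # x - 2, in W
r = [-7, 0, 0, 1, 1]    # x^4 + x^3 - 7, not in W

print(in_W(p), in_W(q), in_W(r))       # True True False
print(in_W(add(p, q)))                 # additive closure for this pair: True
print(in_W(smul(Fraction(3, 5), p)))   # scalar closure for this pair: True
```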
    Much of the power of Theorem TSS is that we can easily establish new vector
spaces if we can locate them as subsets of other vector spaces, such as the vector
spaces presented in Subsection VS.EVS.
    It can be as instructive to consider some subsets that are not subspaces. Since
Theorem TSS is an equivalence (see Proof Technique E) we can be assured that a
subset is not a subspace if it violates one of the three conditions, and in any example
of interest this will not be the “nonempty” condition. However, since a subspace has
to be a vector space in its own right, we can also search for a violation of any one
of the ten defining properties in Definition VS or any inherent property of a vector
space, such as those given by the basic theorems of Subsection VS.VSP. Notice also
that a violation need only be for a specific vector or pair of vectors.
Example NSC2Z A non-subspace in C2 , zero vector
Consider the subset W below as a candidate for being a subspace of C2
                                                        
                             W = { [ x1 ] | 3x1 − 5x2 = 12 }
                                   [ x2 ]
                                 
    The zero vector of C2 , 0 = [ 0 ] , will need to be the zero vector in W also. However,
                                [ 0 ]
0 ∉ W since 3(0) − 5(0) = 0 ≠ 12. So W has no zero vector and fails Property Z
of Definition VS. This subset also fails to be closed under addition and scalar
multiplication. Can you find examples of this?                                      △
Example NSC2A A non-subspace in C2 , additive closure
Consider the subset X below as a candidate for being a subspace of C2
                                               
                             X = { [ x1 ] | x1 x2 = 0 }
                                   [ x2 ]
   You can check that 0 ∈ X, so the approach of the last example will not get us
anywhere. However, notice that

                     x = [ 1 ] ∈ X   and   y = [ 0 ] ∈ X.
                         [ 0 ]                 [ 1 ]
Yet
                         x + y = [ 1 ] + [ 0 ] = [ 1 ] ∉ X
                                 [ 0 ]   [ 1 ]   [ 1 ]
  So X fails the additive closure requirement of either Property AC or Theorem
TSS, and is therefore not a subspace.                                        △
Example NSC2S A non-subspace in C2 , scalar multiplication closure
Consider the subset Y below as a candidate for being a subspace of C2
                                                  
                             Y = { [ x1 ] | x1 ∈ Z, x2 ∈ Z }
                                   [ x2 ]
Z is the set of integers, so we are only allowing “whole numbers” as the constituents
of our vectors. Now, 0 ∈ Y , and additive closure also holds (can you prove these
claims?).
So we will have to try something different. Note that α = 1/2 ∈ C and
x = [ 2 ] ∈ Y , but
    [ 3 ]
                                αx = (1/2) [ 2 ] = [  1  ] ∉ Y
                                           [ 3 ]   [ 3/2 ]
So Y fails the scalar multiplication closure requirement of either Property SC or
Theorem TSS, and is therefore not a subspace.                                  △
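Since a single violation is enough to rule out a subspace, each of the three examples above comes down to one concrete computation. The following minimal sketch replays them, assuming exact rational entries; the membership functions in_W, in_X and in_Y are names invented here for the three subsets.

```python
from fractions import Fraction

def in_W(v):
    """Example NSC2Z: 3 x1 - 5 x2 = 12."""
    return 3 * v[0] - 5 * v[1] == 12

def in_X(v):
    """Example NSC2A: x1 x2 = 0."""
    return v[0] * v[1] == 0

def in_Y(v):
    """Example NSC2S: both entries are integers."""
    return all(Fraction(a).denominator == 1 for a in v)

# NSC2Z: the zero vector of C2 is missing from W.
print(in_W((0, 0)))                        # False

# NSC2A: x and y are in X, but their sum is not.
x, y = (1, 0), (0, 1)
total = (x[0] + y[0], x[1] + y[1])
print(in_X(x), in_X(y), in_X(total))       # True True False

# NSC2S: (2, 3) is in Y, but one half of it is not.
v = (2, 3)
half_v = tuple(Fraction(1, 2) * a for a in v)
print(in_Y(v), in_Y(half_v))               # True False
```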
    There are two examples of subspaces that are trivial. Suppose that V is any
vector space. Then V is a subset of itself and is a vector space. By Definition S, V
qualifies as a subspace of itself. The set containing just the zero vector Z = {0} is
also a subspace as can be seen by applying Theorem TSS or by simple modifications
of the techniques hinted at in Example VSS. Since these subspaces are so obvious
(and therefore not too interesting) we will refer to them as being trivial.
Definition TS Trivial Subspaces
Given the vector space V , the subspaces V and {0} are each called a trivial
subspace.                                                                 
    We can also use Theorem TSS to prove more general statements about subspaces,
as illustrated in the next theorem.
Theorem NSMS Null Space of a Matrix is a Subspace
Suppose that A is an m × n matrix. Then the null space of A, N (A), is a subspace
of Cn .

Proof. We will examine the three requirements of Theorem TSS. Recall that Defini-
tion NSM can be formulated as N (A) = { x ∈ Cn | Ax = 0}.
    First, 0 ∈ N (A), which can be inferred as a consequence of Theorem HSC. So
N (A) ≠ ∅.
    Second, check additive closure by supposing that x ∈ N (A) and y ∈ N (A). So
we know a little something about x and y: Ax = 0 and Ay = 0, and that is all we
know. Question: Is x + y ∈ N (A)? Let us check.
             A(x + y) = Ax + Ay                  Theorem MMDAA
                       =0+0                      x ∈ N (A) , y ∈ N (A)
                       =0                        Theorem VSPCV
So, yes, x + y qualifies for membership in N (A).
    Third, check scalar multiplication closure by supposing that α ∈ C and x ∈ N (A).
So we know a little something about x: Ax = 0, and that is all we know. Question:
Is αx ∈ N (A)? Let us check.
                A(αx) = α(Ax)                    Theorem MMSMM
                       = α0                      x ∈ N (A)
                       =0                        Theorem ZVSM
So, yes, αx qualifies for membership in N (A).
    Having met the three conditions in Theorem TSS we can now say that the null
space of a matrix is a subspace (and hence a vector space in its own right!). 

     Here is an example where we can exercise Theorem NSMS.
Example RSNS Recasting a subspace as a null space
Consider the subset of C5 defined as
                                                                
                    
                   { [ x1 ]                                      }
                   { [ x2 ]   3x1 + x2 − 5x3 + 7x4 + x5 = 0,     }
               W = { [ x3 ] | 4x1 + 6x2 + 3x3 − 6x4 − 5x5 = 0,   }
                   { [ x4 ]   −2x1 + 4x2 + 7x4 + x5 = 0          }
                   { [ x5 ]                                      }
    It is possible to show that W is a subspace of C5 by checking the three conditions
of Theorem TSS directly, but it will get tedious rather quickly. Instead, give W
a fresh look and notice that it is a set of solutions to a homogeneous system of
equations. Define the matrix
                                 "                       #
                                    3 1 −5 7           1
                             A = 4 6 3 −6 −5
                                   −2 4 0         7    1
and then recognize that W = N(A). By Theorem NSMS we can immediately see
that W is a subspace. Boom! △
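If you would like to watch the closure properties at work on this example, here is a minimal sketch in Python. It is an illustration and not part of the proof. The helper names (rref, null_space_basis, mat_vec, in_W) are our own inventions, and the scalars are restricted to exact rational numbers so that equality with zero can be tested reliably.

```python
from fractions import Fraction

# Coefficient matrix A from Example RSNS, so that W = N(A).
A = [[3, 1, -5, 7, 1],
     [4, 6, 3, -6, -5],
     [-2, 4, 0, 7, 1]]

def rref(rows):
    """Row-reduce with exact fractions; return (matrix, pivot column indices)."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def null_space_basis(rows):
    """One null space vector per free variable (hypothetical helper)."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -R[i][f]
        basis.append(v)
    return basis

def mat_vec(rows, v):
    return [sum(Fraction(a) * b for a, b in zip(row, v)) for row in rows]

def in_W(v):
    """Membership test for W = N(A): is Av the zero vector?"""
    return all(entry == 0 for entry in mat_vec(A, v))

basis = null_space_basis(A)
x, y = basis[0], basis[1]
total = [a + b for a, b in zip(x, y)]        # x + y
scaled = [Fraction(-7, 3) * a for a in x]    # a scalar multiple of x
print(in_W(x), in_W(y), in_W(total), in_W(scaled))
```

All four tests should report membership, which is what Theorem NSMS promises for any choice of x, y and the scalar.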

Subsection TSS
The Span of a Set
The span of a set of column vectors got a heavy workout in Chapter V and Chapter
M. The definition of the span depended only on being able to formulate linear
combinations. In any of our more general vector spaces we always have a definition
of vector addition and of scalar multiplication. So we can build linear combinations
and manufacture spans. This subsection contains two definitions that are just mild
variants of definitions we have seen earlier for column vectors. If you have not already,
compare them with Definition LCCV and Definition SSCV.
Definition LC Linear Combination
Suppose that V is a vector space. Given n vectors u_1, u_2, u_3, ..., u_n and n scalars
α_1, α_2, α_3, ..., α_n, their linear combination is the vector
\[
\alpha_1 u_1 + \alpha_2 u_2 + \alpha_3 u_3 + \cdots + \alpha_n u_n .
\]
□
Example LCM A linear combination of matrices
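In the vector space M_23 of 2 × 3 matrices, we have the vectors
\[
x = \begin{bmatrix} 1 & 3 & -2 \\ 2 & 0 & 7 \end{bmatrix} \qquad
y = \begin{bmatrix} 3 & -1 & 2 \\ 5 & 5 & 1 \end{bmatrix} \qquad
z = \begin{bmatrix} 4 & 2 & -4 \\ 1 & 1 & 1 \end{bmatrix}
\]
and we can form linear combinations such as
\begin{align*}
2x + 4y + (-1)z
&= 2\begin{bmatrix} 1 & 3 & -2 \\ 2 & 0 & 7 \end{bmatrix}
 + 4\begin{bmatrix} 3 & -1 & 2 \\ 5 & 5 & 1 \end{bmatrix}
 + (-1)\begin{bmatrix} 4 & 2 & -4 \\ 1 & 1 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 2 & 6 & -4 \\ 4 & 0 & 14 \end{bmatrix}
 + \begin{bmatrix} 12 & -4 & 8 \\ 20 & 20 & 4 \end{bmatrix}
 + \begin{bmatrix} -4 & -2 & 4 \\ -1 & -1 & -1 \end{bmatrix} \\
&= \begin{bmatrix} 10 & 0 & 8 \\ 23 & 19 & 17 \end{bmatrix}
\end{align*}
or,
\begin{align*}
4x - 2y + 3z
&= 4\begin{bmatrix} 1 & 3 & -2 \\ 2 & 0 & 7 \end{bmatrix}
 - 2\begin{bmatrix} 3 & -1 & 2 \\ 5 & 5 & 1 \end{bmatrix}
 + 3\begin{bmatrix} 4 & 2 & -4 \\ 1 & 1 & 1 \end{bmatrix} \\
&= \begin{bmatrix} 4 & 12 & -8 \\ 8 & 0 & 28 \end{bmatrix}
 + \begin{bmatrix} -6 & 2 & -4 \\ -10 & -10 & -2 \end{bmatrix}
 + \begin{bmatrix} 12 & 6 & -12 \\ 3 & 3 & 3 \end{bmatrix} \\
&= \begin{bmatrix} 10 & 20 & -24 \\ 1 & -7 & 29 \end{bmatrix}
\end{align*}
△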
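The two computations in Example LCM can be reproduced mechanically. The following is a small sketch, assuming matrices are stored as lists of rows; the function name lin_comb is hypothetical and chosen only for this illustration.

```python
def lin_comb(scalars, matrices):
    """Entry-by-entry linear combination of equally sized matrices."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(a * M[i][j] for a, M in zip(scalars, matrices))
             for j in range(cols)]
            for i in range(rows)]

# The vectors x, y, z of Example LCM, as 2 x 3 matrices.
x = [[1, 3, -2], [2, 0, 7]]
y = [[3, -1, 2], [5, 5, 1]]
z = [[4, 2, -4], [1, 1, 1]]

first = lin_comb([2, 4, -1], [x, y, z])    # 2x + 4y + (-1)z
second = lin_comb([4, -2, 3], [x, y, z])   # 4x - 2y + 3z
print(first)    # [[10, 0, 8], [23, 19, 17]]
print(second)   # [[10, 20, -24], [1, -7, 29]]
```

Nothing here depends on the vectors being matrices beyond the entrywise definitions of addition and scalar multiplication, which is the point of Definition LC.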
    When we realize that we can form linear combinations in any vector space, then
it is natural to revisit our definition of the span of a set, since it is the set of all
possible linear combinations of a set of vectors.
Definition SS Span of a Set
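Suppose that V is a vector space. Given a set of vectors S = {u_1, u_2, u_3, ..., u_t},
their span, ⟨S⟩, is the set of all possible linear combinations of u_1, u_2, u_3, ..., u_t.
Symbolically,
\begin{align*}
\langle S \rangle
&= \{\, \alpha_1 u_1 + \alpha_2 u_2 + \alpha_3 u_3 + \cdots + \alpha_t u_t \mid \alpha_i \in \mathbb{C},\ 1 \le i \le t \,\} \\
&= \left\{ \sum_{i=1}^{t} \alpha_i u_i \;\middle|\; \alpha_i \in \mathbb{C},\ 1 \le i \le t \right\}
\end{align*}
□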
Theorem SSS Span of a Set is a Subspace
Suppose V is a vector space. Given a set of vectors S = {u_1, u_2, u_3, ..., u_t} ⊆ V,
their span, ⟨S⟩, is a subspace.

Proof. By Definition SS, the span contains linear combinations of vectors from the
vector space V , so by repeated use of the closure properties, Property AC and
Property SC, ⟨S⟩ can be seen to be a subset of V .
   We will then verify the three conditions of Theorem TSS. First,
            0 = 0 + 0 + 0 + ... + 0                            Property Z
               = 0u1 + 0u2 + 0u3 + · · · + 0ut                 Theorem ZSSM
    So we have written 0 as a linear combination of the vectors in S and by Definition
SS, 0 ∈ ⟨S⟩ and therefore ⟨S⟩ ≠ ∅.
    Second, suppose x ∈ ⟨S⟩ and y ∈ ⟨S⟩. Can we conclude that x + y ∈ ⟨S⟩? What
do we know about x and y by virtue of their membership in ⟨S⟩? There must be
scalars from C, α1 , α2 , α3 , . . . , αt and β1 , β2 , β3 , . . . , βt so that
                          x = α1 u1 + α2 u2 + α3 u3 + · · · + αt ut
                          y = β1 u1 + β2 u2 + β3 u3 + · · · + βt ut
Then
      x + y = α1 u1 + α2 u2 + α3 u3 + · · · + αt ut
                 + β1 u1 + β2 u2 + β3 u3 + · · · + βt ut
            = α1 u1 + β1 u1 + α2 u2 + β2 u2
                 + α3 u3 + β3 u3 + · · · + αt ut + βt ut     Property AA, Property C
            = (α1 + β1 )u1 + (α2 + β2 )u2
                 + (α3 + β3 )u3 + · · · + (αt + βt )ut       Property DSA
Since each αi + βi is again a scalar from C we have expressed the vector sum x + y
as a linear combination of the vectors from S, and therefore by Definition SS we can
say that x + y ∈ ⟨S⟩.
    Third, suppose α ∈ C and x ∈ ⟨S⟩. Can we conclude that αx ∈ ⟨S⟩? What do
we know about x by virtue of its membership in ⟨S⟩? There must be scalars from C,
α1 , α2 , α3 , . . . , αt so that
                          x = α1 u1 + α2 u2 + α3 u3 + · · · + αt ut


Then
       αx = α (α1 u1 + α2 u2 + α3 u3 + · · · + αt ut )
           = α(α1 u1 ) + α(α2 u2 ) + α(α3 u3 ) + · · · + α(αt ut )    Property DVA
           = (αα1 )u1 + (αα2 )u2 + (αα3 )u3 + · · · + (ααt )ut        Property SMA


Since each ααi is again a scalar from C we have expressed the scalar multiple αx as
a linear combination of the vectors from S, and therefore by Definition SS we can
say that αx ∈ ⟨S⟩.
    With the three conditions of Theorem TSS met, we can say that ⟨S⟩ is a subspace
(and so is also a vector space, Definition VS). (See Exercise SS.T20, Exercise SS.T21,
Exercise SS.T22.) □

Example SSP Span of a set of polynomials
In Example SP4 we proved that
\[
W = \{\, p(x) \mid p \in P_4,\ p(2) = 0 \,\}
\]
is a subspace of P_4, the vector space of polynomials of degree at most 4. Since W is
a vector space itself, let us construct a span within W. First let
\[
S = \{\, x^4 - 4x^3 + 5x^2 - x - 2,\ \ 2x^4 - 3x^3 - 6x^2 + 6x + 4 \,\}
\]
and verify that S is a subset of W by checking that each of these two polynomials
has x = 2 as a root. Now, if we define U = ⟨S⟩, then Theorem SSS tells us that U is
a subspace of W. So quite quickly we have built a chain of subspaces, U inside W,
and W inside P_4.
   Rather than dwell on how quickly we can build subspaces, let us try to gain a
better understanding of just how the span construction creates subspaces, in the
context of this example. We can quickly build representative elements of U,
\[
3(x^4 - 4x^3 + 5x^2 - x - 2) + 5(2x^4 - 3x^3 - 6x^2 + 6x + 4) = 13x^4 - 27x^3 - 15x^2 + 27x + 14
\]
and
\[
(-2)(x^4 - 4x^3 + 5x^2 - x - 2) + 8(2x^4 - 3x^3 - 6x^2 + 6x + 4) = 14x^4 - 16x^3 - 58x^2 + 50x + 36
\]
and each of these polynomials must be in W since it is closed under addition and
scalar multiplication. But you might check for yourself that both of these polynomials
have x = 2 as a root.
    I can tell you that y = 3x^4 − 7x^3 − x^2 + 7x − 2 is not in U, but would you believe
me? A first check shows that y does have x = 2 as a root, but that only shows that
y ∈ W. What does y have to do to gain membership in U = ⟨S⟩? It must be a linear
combination of the vectors in S, x^4 − 4x^3 + 5x^2 − x − 2 and 2x^4 − 3x^3 − 6x^2 + 6x + 4.
So let us suppose that y is such a linear combination,
\begin{align*}
y &= 3x^4 - 7x^3 - x^2 + 7x - 2 \\
  &= \alpha_1 (x^4 - 4x^3 + 5x^2 - x - 2) + \alpha_2 (2x^4 - 3x^3 - 6x^2 + 6x + 4) \\
  &= (\alpha_1 + 2\alpha_2)x^4 + (-4\alpha_1 - 3\alpha_2)x^3 + (5\alpha_1 - 6\alpha_2)x^2 \\
  &\qquad + (-\alpha_1 + 6\alpha_2)x + (-2\alpha_1 + 4\alpha_2)
\end{align*}
   Notice that operations above are done in accordance with the definition of the
vector space of polynomials (Example VSP). Now, if we equate coefficients, which is
the definition of equality for polynomials, then we obtain the system of five linear
equations in two variables
                                     α1 + 2α2 = 3
                                  −4α1 − 3α2 = −7
                                    5α1 − 6α2 = −1
                                    −α1 + 6α2 = 7
                                  −2α1 + 4α2 = −2
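      Build an augmented matrix from the system and row-reduce,
\[
\begin{bmatrix} 1 & 2 & 3 \\ -4 & -3 & -7 \\ 5 & -6 & -1 \\ -1 & 6 & 7 \\ -2 & 4 & -2 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]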
   Since the final column of the row-reduced augmented matrix is a pivot column,
Theorem RCLS tells us the system of equations is inconsistent. Therefore, there are
no scalars, α1 and α2 , to establish y as a linear combination of the elements of S.
So y ∉ U. △
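The membership test of Example SSP can be automated. Below is a sketch, assuming each polynomial in P4 is stored as its list of coefficients, ordered from the x^4 term down to the constant term. The helpers rref and in_span are hypothetical names, and the consistency check is the one described by Theorem RCLS: the system is consistent exactly when the final column of the row-reduced augmented matrix is not a pivot column.

```python
from fractions import Fraction

def rref(rows):
    """Row-reduce with exact fractions; return (matrix, pivot column indices)."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def in_span(vectors, target):
    """True when target is a linear combination of the given vectors."""
    # Columns of the augmented matrix: the vectors, then the target.
    aug = [[v[i] for v in vectors] + [target[i]] for i in range(len(target))]
    _, pivots = rref(aug)
    return len(vectors) not in pivots  # is the last column a pivot column?

# Coefficient lists [x^4, x^3, x^2, x, 1] for the two polynomials in S.
S = [[1, -4, 5, -1, -2],
     [2, -3, -6, 6, 4]]

y = [3, -7, -1, 7, -2]          # the polynomial y of the example
u = [13, -27, -15, 27, 14]      # the first representative element of U

print(in_span(S, u))   # True
print(in_span(S, y))   # False
```

Storing a polynomial as a coefficient list is a design choice made for this sketch; it mirrors the step in the example where equating coefficients turns a polynomial equation into a system of linear equations.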
      Let us again examine membership in a span.
Example SM32 A subspace of M32
The set of all 3 × 2 matrices forms a vector space when we use the operations of
matrix addition (Definition MA) and scalar matrix multiplication (Definition MSM),
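as was shown in Example VSM. Consider the subset
\[
S = \left\{
\begin{bmatrix} 3 & 1 \\ 4 & 2 \\ 5 & -5 \end{bmatrix},\
\begin{bmatrix} 1 & 1 \\ 2 & -1 \\ 14 & -1 \end{bmatrix},\
\begin{bmatrix} 3 & -1 \\ -1 & 2 \\ -19 & -11 \end{bmatrix},\
\begin{bmatrix} 4 & 2 \\ 1 & -2 \\ 14 & -2 \end{bmatrix},\
\begin{bmatrix} 3 & 1 \\ -4 & 0 \\ -17 & 7 \end{bmatrix}
\right\}
\]
and define a new subset of vectors W in M_32 using the span (Definition SS), W = ⟨S⟩.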
So by Theorem SSS we know that W is a subspace of M32 . While W is an infinite set,
and this is a precise description, it would still be worthwhile to investigate whether
or not W contains certain elements.
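   First, is
\[
y = \begin{bmatrix} 9 & 3 \\ 7 & 3 \\ 10 & -11 \end{bmatrix}
\]
in W? To answer this, we want to determine if y can be written as a linear combination
of the five matrices in S. Can we find scalars, α1 , α2 , α3 , α4 , α5 so that
\begin{align*}
\begin{bmatrix} 9 & 3 \\ 7 & 3 \\ 10 & -11 \end{bmatrix}
&= \alpha_1 \begin{bmatrix} 3 & 1 \\ 4 & 2 \\ 5 & -5 \end{bmatrix}
 + \alpha_2 \begin{bmatrix} 1 & 1 \\ 2 & -1 \\ 14 & -1 \end{bmatrix}
 + \alpha_3 \begin{bmatrix} 3 & -1 \\ -1 & 2 \\ -19 & -11 \end{bmatrix}
 + \alpha_4 \begin{bmatrix} 4 & 2 \\ 1 & -2 \\ 14 & -2 \end{bmatrix}
 + \alpha_5 \begin{bmatrix} 3 & 1 \\ -4 & 0 \\ -17 & 7 \end{bmatrix} \\
&= \begin{bmatrix}
3\alpha_1 + \alpha_2 + 3\alpha_3 + 4\alpha_4 + 3\alpha_5 & \alpha_1 + \alpha_2 - \alpha_3 + 2\alpha_4 + \alpha_5 \\
4\alpha_1 + 2\alpha_2 - \alpha_3 + \alpha_4 - 4\alpha_5 & 2\alpha_1 - \alpha_2 + 2\alpha_3 - 2\alpha_4 \\
5\alpha_1 + 14\alpha_2 - 19\alpha_3 + 14\alpha_4 - 17\alpha_5 & -5\alpha_1 - \alpha_2 - 11\alpha_3 - 2\alpha_4 + 7\alpha_5
\end{bmatrix}
\end{align*}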
    Using our definition of matrix equality (Definition ME) we can translate this
statement into six equations in the five unknowns,
                           3α1 + α2 + 3α3 + 4α4 + 3α5 = 9
                               α1 + α2 − α3 + 2α4 + α5 = 3
                             4α1 + 2α2 − α3 + α4 − 4α5 = 7
                                  2α1 − α2 + 2α3 − 2α4 = 3
                     5α1 + 14α2 − 19α3 + 14α4 − 17α5 = 10
                        −5α1 − α2 − 11α3 − 2α4 + 7α5 = −11
   This is a linear system of equations, which we can represent with an augmented
matrix and row-reduce in search of solutions. The matrix that is row-equivalent to
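the augmented matrix is
\[
\begin{bmatrix}
1 & 0 & 0 & 0 & \frac{5}{8} & 2 \\
0 & 1 & 0 & 0 & -\frac{19}{4} & -1 \\
0 & 0 & 1 & 0 & -\frac{7}{8} & 0 \\
0 & 0 & 0 & 1 & \frac{17}{8} & 1 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]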
    So we recognize that the system is consistent since the final column is not a pivot
column (Theorem RCLS), and compute n − r = 5 − 4 = 1 free variables (Theorem
FVCS). While there are infinitely many solutions, we are only in pursuit of a single
solution, so let us choose the free variable α5 = 0 for simplicity’s sake. Then we
easily see that α1 = 2, α2 = −1, α3 = 0, α4 = 1. So the scalars α1 = 2, α2 = −1,
α3 = 0, α4 = 1, α5 = 0 will provide a linear combination of the elements of S that
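equals y, as we can verify by checking,
\[
\begin{bmatrix} 9 & 3 \\ 7 & 3 \\ 10 & -11 \end{bmatrix}
= 2 \begin{bmatrix} 3 & 1 \\ 4 & 2 \\ 5 & -5 \end{bmatrix}
+ (-1) \begin{bmatrix} 1 & 1 \\ 2 & -1 \\ 14 & -1 \end{bmatrix}
+ (1) \begin{bmatrix} 4 & 2 \\ 1 & -2 \\ 14 & -2 \end{bmatrix}
\]
So with one particular linear combination in hand, we are convinced that y deserves
to be a member of W = ⟨S⟩.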
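   Second, is
\[
x = \begin{bmatrix} 2 & 1 \\ 3 & 1 \\ 4 & -2 \end{bmatrix}
\]
in W? To answer this, we want to determine if x can be written as a linear combination
of the five matrices in S. Can we find scalars, α1 , α2 , α3 , α4 , α5 so that
\begin{align*}
\begin{bmatrix} 2 & 1 \\ 3 & 1 \\ 4 & -2 \end{bmatrix}
&= \alpha_1 \begin{bmatrix} 3 & 1 \\ 4 & 2 \\ 5 & -5 \end{bmatrix}
 + \alpha_2 \begin{bmatrix} 1 & 1 \\ 2 & -1 \\ 14 & -1 \end{bmatrix}
 + \alpha_3 \begin{bmatrix} 3 & -1 \\ -1 & 2 \\ -19 & -11 \end{bmatrix}
 + \alpha_4 \begin{bmatrix} 4 & 2 \\ 1 & -2 \\ 14 & -2 \end{bmatrix}
 + \alpha_5 \begin{bmatrix} 3 & 1 \\ -4 & 0 \\ -17 & 7 \end{bmatrix} \\
&= \begin{bmatrix}
3\alpha_1 + \alpha_2 + 3\alpha_3 + 4\alpha_4 + 3\alpha_5 & \alpha_1 + \alpha_2 - \alpha_3 + 2\alpha_4 + \alpha_5 \\
4\alpha_1 + 2\alpha_2 - \alpha_3 + \alpha_4 - 4\alpha_5 & 2\alpha_1 - \alpha_2 + 2\alpha_3 - 2\alpha_4 \\
5\alpha_1 + 14\alpha_2 - 19\alpha_3 + 14\alpha_4 - 17\alpha_5 & -5\alpha_1 - \alpha_2 - 11\alpha_3 - 2\alpha_4 + 7\alpha_5
\end{bmatrix}
\end{align*}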
Using our definition of matrix equality (Definition ME) we can translate this statement
into six equations in the five unknowns,
                            3α1 + α2 + 3α3 + 4α4 + 3α5 = 2
                                α1 + α2 − α3 + 2α4 + α5 = 1
                             4α1 + 2α2 − α3 + α4 − 4α5 = 3
                                   2α1 − α2 + 2α3 − 2α4 = 1
                      5α1 + 14α2 − 19α3 + 14α4 − 17α5 = 4
                         −5α1 − α2 − 11α3 − 2α4 + 7α5 = −2
This is a linear system of equations, which we can represent with an augmented
matrix and row-reduce in search of solutions. The matrix that is row-equivalent to
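the augmented matrix is
\[
\begin{bmatrix}
1 & 0 & 0 & 0 & \frac{5}{8} & 0 \\
0 & 1 & 0 & 0 & -\frac{19}{4} & 0 \\
0 & 0 & 1 & 0 & -\frac{7}{8} & 0 \\
0 & 0 & 0 & 1 & \frac{17}{8} & 0 \\
0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]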
Since the last column is a pivot column, Theorem RCLS tells us that the system is
inconsistent. Therefore, there are no values for the scalars that will place x in W ,
and so we conclude that x ∉ W. △
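Both membership questions in Example SM32 can be checked with a few lines of Python. This is a sketch under one stated assumption: each 3 × 2 matrix is flattened row by row into a list of six entries, so that Definition ME becomes equality of lists. The helper names rref and in_span are hypothetical.

```python
from fractions import Fraction

def rref(rows):
    """Row-reduce with exact fractions; return (matrix, pivot column indices)."""
    m = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [v / piv for v in m[r]]
        for i in range(len(m)):
            if i != r:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def in_span(vectors, target):
    """Consistent system (last column not a pivot column) means membership."""
    aug = [[v[i] for v in vectors] + [target[i]] for i in range(len(target))]
    _, pivots = rref(aug)
    return len(vectors) not in pivots

# The five matrices of S, each flattened row by row.
S = [[3, 1, 4, 2, 5, -5],
     [1, 1, 2, -1, 14, -1],
     [3, -1, -1, 2, -19, -11],
     [4, 2, 1, -2, 14, -2],
     [3, 1, -4, 0, -17, 7]]
y = [9, 3, 7, 3, 10, -11]
x = [2, 1, 3, 1, 4, -2]

print(in_span(S, y))   # True
print(in_span(S, x))   # False

# The particular combination found in the example: 2*S[0] - S[1] + S[3].
combo = [2 * a - b + d for a, b, d in zip(S[0], S[1], S[3])]
print(combo == y)      # True
```

The flattening order is arbitrary, so long as the same order is used for every matrix, since all we need is one equation per entry.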
    Notice how Example SSP and Example SM32 contained questions about mem-
bership in a span, but these questions quickly became questions about solutions to
a system of linear equations. This will be a common theme going forward.

Subsection SC
Subspace Constructions
Several of the subsets of vector spaces that we worked with in Chapter M are also
subspaces; they are closed under vector addition and scalar multiplication in C^m .
Theorem CSMS Column Space of a Matrix is a Subspace
Suppose that A is an m × n matrix. Then C(A) is a subspace of C^m .

Proof. Definition CSM shows us that C(A) is a subset of C^m , and that it is defined
as the span of a set of vectors from C^m (the columns of the matrix). Since C(A) is a
span, Theorem SSS says it is a subspace. □

   That was easy! Notice that we could have used this same approach to prove that
the null space is a subspace, since Theorem SSNS provided a description of the null
space of a matrix as the span of a set of vectors. However, I much prefer the current
proof of Theorem NSMS. Speaking of easy, here is a very easy theorem that exposes
another of our constructions as creating subspaces.
Theorem RSMS Row Space of a Matrix is a Subspace
Suppose that A is an m × n matrix. Then R(A) is a subspace of C^n .

Proof. Definition RSM says R(A) = C(A^t), so the row space of a matrix is a column
space, and every column space is a subspace by Theorem CSMS. That’s enough. □

      One more.
Theorem LNSMS Left Null Space of a Matrix is a Subspace
Suppose that A is an m × n matrix. Then L(A) is a subspace of C^m .

Proof. Definition LNS says L(A) = N(A^t), so the left null space is a null space, and
every null space is a subspace by Theorem NSMS. Done. □

    So the span of a set of vectors, and the null space, column space, row space and
left null space of a matrix are all subspaces, and hence are all vector spaces, meaning
they have all the properties detailed in Definition VS and in the basic theorems
presented in Section VS. We have worked with these objects as just sets in Chapter
V and Chapter M, but now we understand that they have much more structure. In
particular, being closed under vector addition and scalar multiplication means a
subspace is also closed under linear combinations.
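This last remark deserves one more line of justification. As a sketch, suppose W is a subspace, u_1, u_2, ..., u_n are vectors of W and α_1, α_2, ..., α_n are scalars (these names are ours, chosen to match Definition LC). Then the two closure properties can be applied in turn:

```latex
\begin{align*}
\alpha_i u_i &\in W \quad (1 \le i \le n) && \text{scalar multiplication closure} \\
\alpha_1 u_1 + \alpha_2 u_2 &\in W && \text{additive closure} \\
(\alpha_1 u_1 + \alpha_2 u_2) + \alpha_3 u_3 &\in W && \text{additive closure} \\
&\ \ \vdots \\
\alpha_1 u_1 + \alpha_2 u_2 + \cdots + \alpha_n u_n &\in W && \text{additive closure, } n - 1 \text{ uses in all}
\end{align*}
```

A fully careful version of this argument would proceed by induction on n, but the pattern above is the whole idea.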

Reading Questions
1. Summarize the three conditions that allow us to quickly test if a set is a subspace.
2. Consider the set of vectors
\[
W = \left\{ \begin{bmatrix} a \\ b \\ c \end{bmatrix} \;\middle|\; 3a - 2b + c = 5 \right\}
\]
     Is the set W a subspace of C^3? Explain your answer.
3. Name five general constructions of sets of column vectors (subsets of C^m) that we now
   know as subspaces.

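Exercises
C15†     Working within the vector space C^3, determine if the vector b is in the subspace W,
\[
b = \begin{bmatrix} 4 \\ 3 \\ 1 \end{bmatrix} \qquad
W = \left\langle \left\{
\begin{bmatrix} 3 \\ 2 \\ 3 \end{bmatrix},\
\begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix},\
\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},\
\begin{bmatrix} 2 \\ 1 \\ 3 \end{bmatrix}
\right\} \right\rangle
\]

C16†     Working within the vector space C^4, determine if the vector b is in the subspace W,
\[
b = \begin{bmatrix} 1 \\ 1 \\ 0 \\ 1 \end{bmatrix} \qquad
W = \left\langle \left\{
\begin{bmatrix} 1 \\ 2 \\ -1 \\ 1 \end{bmatrix},\
\begin{bmatrix} 1 \\ 0 \\ 3 \\ 1 \end{bmatrix},\
\begin{bmatrix} 2 \\ 1 \\ 1 \\ 2 \end{bmatrix}
\right\} \right\rangle
\]

C17†     Working within the vector space C^4, determine if the vector b is in the subspace W,
\[
b = \begin{bmatrix} 2 \\ 1 \\ 2 \\ 1 \end{bmatrix} \qquad
W = \left\langle \left\{
\begin{bmatrix} 1 \\ 2 \\ 0 \\ 2 \end{bmatrix},\
\begin{bmatrix} 1 \\ 0 \\ 3 \\ 1 \end{bmatrix},\
\begin{bmatrix} 0 \\ 1 \\ 0 \\ 2 \end{bmatrix},\
\begin{bmatrix} 1 \\ 1 \\ 2 \\ 0 \end{bmatrix}
\right\} \right\rangle
\]

C20† Working within the vector space P_3 of polynomials of degree 3 or less, determine if
p(x) = x^3 + 6x + 4 is in the subspace W below.
\[
W = \left\langle \left\{ x^3 + x^2 + x,\ x^3 + 2x - 6,\ x^2 - 5 \right\} \right\rangle
\]

C21†     Consider the subspace
\[
W = \left\langle \left\{
\begin{bmatrix} 2 & 1 \\ 3 & -1 \end{bmatrix},\
\begin{bmatrix} 4 & 0 \\ 2 & 3 \end{bmatrix},\
\begin{bmatrix} -3 & 1 \\ 2 & 1 \end{bmatrix}
\right\} \right\rangle
\]
of the vector space of 2 × 2 matrices, M_22. Is
\[
C = \begin{bmatrix} -3 & 3 \\ 6 & -4 \end{bmatrix}
\]
an element of W?

C25 Show that the set
\[
W = \left\{ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \;\middle|\; 3x_1 - 5x_2 = 12 \right\}
\]
from Example NSC2Z fails Property AC and Property SC.

C26 Show that the set
\[
Y = \left\{ \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \;\middle|\; x_1 \in \mathbb{Z},\ x_2 \in \mathbb{Z} \right\}
\]
from Example NSC2S has Property AC.

M20† In C^3, the vector space of column vectors of size 3, prove that the set Z is a
subspace.
\[
Z = \left\{ \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \;\middle|\; 4x_1 - x_2 + 5x_3 = 0 \right\}
\]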
T20† A square matrix A of size n is upper triangular if [A]_ij = 0 whenever i > j. Let
UT_n be the set of all upper triangular matrices of size n. Prove that UT_n is a subspace of
the vector space of all square matrices of size n, M_nn .
T30† Let P be the set of all polynomials, of any degree. The set P is a vector space. Let
E be the subset of P consisting of all polynomials with only terms of even degree. Prove or
disprove: the set E is a subspace of P .
T31† Let P be the set of all polynomials, of any degree. The set P is a vector space. Let
F be the subset of P consisting of all polynomials with only terms of odd degree. Prove or
disprove: the set F is a subspace of P .
Section LISS
Linear Independence and Spanning Sets
A vector space is defined as a set with two operations, meeting ten properties
(Definition VS). Just as the definition of span of a set of vectors only required
knowing how to add vectors and how to multiply vectors by scalars, so it is with
linear independence. A definition of a linearly independent set of vectors in an
arbitrary vector space only requires knowing how to form linear combinations and
equating these with the zero vector. Since every vector space must have a zero vector
(Property Z), we always have a zero vector at our disposal.
    In this section we will also put a twist on the notion of the span of a set of
vectors. Rather than beginning with a set of vectors and creating a subspace that is
the span, we will instead begin with a subspace and look for a set of vectors whose
span equals the subspace.
    The combination of linear independence and spanning will be very important
going forward.


Subsection LI
Linear Independence
Our previous definition of linear independence (Definition LICV) employed a relation
of linear dependence that was a linear combination on one side of an equality and a
zero vector on the other side. As a linear combination in a vector space (Definition
LC) depends only on vector addition and scalar multiplication, and every vector
space must have a zero vector (Property Z), we can extend our definition of linear
independence from the setting of C^m to the setting of a general vector space V with
almost no changes. Compare these next two definitions with Definition RLDCV and
Definition LICV.
Definition RLD Relation of Linear Dependence
Suppose that V is a vector space. Given a set of vectors S = {u_1, u_2, u_3, ..., u_n},
an equation of the form
\[
\alpha_1 u_1 + \alpha_2 u_2 + \alpha_3 u_3 + \cdots + \alpha_n u_n = 0
\]
is a relation of linear dependence on S. If this equation is formed in a trivial
fashion, i.e. α_i = 0, 1 ≤ i ≤ n, then we say it is a trivial relation of linear
dependence on S.
□
Definition LI Linear Independence
Suppose that V is a vector space. The set of vectors S = {u_1, u_2, u_3, ..., u_n} from
V is linearly dependent if there is a relation of linear dependence on S that is not
trivial. In the case where the only relation of linear dependence on S is the trivial
one, then S is a linearly independent set of vectors. □
    Notice the emphasis on the word “only.” This might remind you of the definition
of a nonsingular matrix, where if the matrix is employed as the coefficient matrix of
a homogeneous system then the only solution is the trivial one.
Example LIP4 Linear independence in P4
In the vector space of polynomials with degree 4 or less, P_4 (Example VSP) consider
the set S below
\[
S = \{\, 2x^4 + 3x^3 + 2x^2 - x + 10,\ \ -x^4 - 2x^3 + x^2 + 5x - 8,\ \ 2x^4 + x^3 + 10x^2 + 17x - 2 \,\}
\]
   Is this set of vectors linearly independent or dependent? Consider that
\begin{align*}
& 3\left(2x^4 + 3x^3 + 2x^2 - x + 10\right) + 4\left(-x^4 - 2x^3 + x^2 + 5x - 8\right) \\
& \qquad + (-1)\left(2x^4 + x^3 + 10x^2 + 17x - 2\right) = 0x^4 + 0x^3 + 0x^2 + 0x + 0 = 0
\end{align*}
This is a nontrivial relation of linear dependence (Definition RLD) on the set S and
so convinces us that S is linearly dependent (Definition LI).

   Now, I hear you say, “Where did those scalars come from?” Do not worry about
that right now, just be sure you understand why the above explanation is sufficient to
prove that S is linearly dependent. The remainder of the example will demonstrate
how we might find these scalars if they had not been provided so readily.
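   Let us look at another set of vectors (polynomials) from P_4. Let
\begin{align*}
T = \{\, & 3x^4 - 2x^3 + 4x^2 + 6x - 1,\ \ -3x^4 + 1x^3 + 0x^2 + 4x + 2, \\
         & 4x^4 + 5x^3 - 2x^2 + 3x + 1,\ \ 2x^4 - 7x^3 + 4x^2 + 2x + 1 \,\}
\end{align*}
      Suppose we have a relation of linear dependence on this set,
\begin{align*}
0 &= 0x^4 + 0x^3 + 0x^2 + 0x + 0 \\
  &= \alpha_1\left(3x^4 - 2x^3 + 4x^2 + 6x - 1\right) + \alpha_2\left(-3x^4 + 1x^3 + 0x^2 + 4x + 2\right) \\
  &\qquad + \alpha_3\left(4x^4 + 5x^3 - 2x^2 + 3x + 1\right) + \alpha_4\left(2x^4 - 7x^3 + 4x^2 + 2x + 1\right)
\end{align*}
  Using our definitions of vector addition and scalar multiplication in P_4 (Example
VSP), we arrive at,
\begin{align*}
0x^4 + 0x^3 + 0x^2 + 0x + 0 ={}
 & (3\alpha_1 - 3\alpha_2 + 4\alpha_3 + 2\alpha_4)x^4 + (-2\alpha_1 + \alpha_2 + 5\alpha_3 - 7\alpha_4)x^3 \\
 & + (4\alpha_1 - 2\alpha_3 + 4\alpha_4)x^2 + (6\alpha_1 + 4\alpha_2 + 3\alpha_3 + 2\alpha_4)x \\
 & + (-\alpha_1 + 2\alpha_2 + \alpha_3 + \alpha_4)
\end{align*}
      Equating coefficients, we arrive at the homogeneous system of equations,
\begin{align*}
3\alpha_1 - 3\alpha_2 + 4\alpha_3 + 2\alpha_4 &= 0 \\
-2\alpha_1 + \alpha_2 + 5\alpha_3 - 7\alpha_4 &= 0 \\
4\alpha_1 - 2\alpha_3 + 4\alpha_4 &= 0 \\
6\alpha_1 + 4\alpha_2 + 3\alpha_3 + 2\alpha_4 &= 0 \\
-\alpha_1 + 2\alpha_2 + \alpha_3 + \alpha_4 &= 0
\end{align*}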
   We form the coefficient matrix of this homogeneous system of equations and
row-reduce to find
\[
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]
     We expected the system to be consistent (Theorem HSC) and so can compute
n − r = 4 − 4 = 0 and Theorem CSRN tells us that the solution is unique. Since
this is a homogeneous system, this unique solution is the trivial solution (Definition
TSHSE), α1 = 0, α2 = 0, α3 = 0, α4 = 0. So by Definition LI the set T is linearly
independent.
     A few observations. If we had discovered infinitely many solutions, then we could
have used one of the nontrivial solutions to provide a linear combination in the
manner we used to show that S was linearly dependent. It is important to realize
that it is not interesting that we can create a relation of linear dependence with zero
scalars — we can always do that — but for T , this is the only way to create a relation
of linear dependence. It was no accident that we arrived at a homogeneous system
of equations in this example; it is related to our use of the zero vector in defining a
relation of linear dependence. It is easy to present a convincing statement that a set
is linearly dependent (just exhibit a nontrivial relation of linear dependence) but a
convincing statement of linear independence requires demonstrating that there is
no relation of linear dependence other than the trivial one. Notice how we relied
on theorems from Chapter SLE to provide this demonstration. Whew! There is a
lot going on in this example. Spend some time with it; we will be waiting patiently
right here when you get back.
Example LIM32 Linear independence in M32
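Consider the two sets of vectors R and S from the vector space of all 3 × 2 matrices,
M32 (Example VSM)
\[
R = \left\{
\begin{bmatrix} 3 & -1 \\ 1 & 4 \\ 6 & -6 \end{bmatrix},\
\begin{bmatrix} -2 & 3 \\ 1 & -3 \\ -2 & -6 \end{bmatrix},\
\begin{bmatrix} 6 & -6 \\ -1 & 0 \\ 7 & -9 \end{bmatrix},\
\begin{bmatrix} 7 & 9 \\ -4 & -5 \\ 2 & 5 \end{bmatrix}
\right\}
\]
\[
S = \left\{
\begin{bmatrix} 2 & 0 \\ 1 & -1 \\ 1 & 3 \end{bmatrix},\
\begin{bmatrix} -4 & 0 \\ -2 & 2 \\ -2 & -6 \end{bmatrix},\
\begin{bmatrix} 1 & 1 \\ -2 & 1 \\ 2 & 4 \end{bmatrix},\
\begin{bmatrix} -5 & 3 \\ -10 & 7 \\ 2 & 0 \end{bmatrix}
\right\}
\]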
   One set is linearly independent, the other is not. Which is which? Let us examine
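R first. Build a generic relation of linear dependence (Definition RLD),
\[
\alpha_1 \begin{bmatrix} 3 & -1 \\ 1 & 4 \\ 6 & -6 \end{bmatrix}
+ \alpha_2 \begin{bmatrix} -2 & 3 \\ 1 & -3 \\ -2 & -6 \end{bmatrix}
+ \alpha_3 \begin{bmatrix} 6 & -6 \\ -1 & 0 \\ 7 & -9 \end{bmatrix}
+ \alpha_4 \begin{bmatrix} 7 & 9 \\ -4 & -5 \\ 2 & 5 \end{bmatrix}
= \mathbf{0}
\]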
   Massaging the left-hand side with our definitions of vector addition and scalar
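multiplication in M32 (Example VSM) we obtain,
\[
\begin{bmatrix}
3\alpha_1 - 2\alpha_2 + 6\alpha_3 + 7\alpha_4 & -\alpha_1 + 3\alpha_2 - 6\alpha_3 + 9\alpha_4 \\
\alpha_1 + \alpha_2 - \alpha_3 - 4\alpha_4 & 4\alpha_1 - 3\alpha_2 - 5\alpha_4 \\
6\alpha_1 - 2\alpha_2 + 7\alpha_3 + 2\alpha_4 & -6\alpha_1 - 6\alpha_2 - 9\alpha_3 + 5\alpha_4
\end{bmatrix}
=
\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}
\]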
   Using our definition of matrix equality (Definition ME) to equate entries we get
the homogeneous system of six equations in four variables,
                             3α1 − 2α2 + 6α3 + 7α4 = 0
                             −α1 + 3α2 − 6α3 + 9α4 = 0
                                 α1 + α2 − α3 − 4α4 = 0
                                     4α1 − 3α2 − 5α4 = 0
                             6α1 − 2α2 + 7α3 + 2α4 = 0
                           −6α1 − 6α2 − 9α3 + 5α4 = 0
   Form the coefficient matrix of this homogeneous system and row-reduce to obtain
\[
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]
    Analyzing this matrix we are led to conclude that α1 = 0, α2 = 0, α3 = 0, α4 = 0.
This means there is only a trivial relation of linear dependence on the vectors of R
and so we call R a linearly independent set (Definition LI).
    So it must be that S is linearly dependent. Let us see if we can find a nontrivial
relation of linear dependence on S. We will begin as with R, by constructing a
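relation of linear dependence (Definition RLD) with unknown scalars,
\[
\alpha_1 \begin{bmatrix} 2 & 0 \\ 1 & -1 \\ 1 & 3 \end{bmatrix}
+ \alpha_2 \begin{bmatrix} -4 & 0 \\ -2 & 2 \\ -2 & -6 \end{bmatrix}
+ \alpha_3 \begin{bmatrix} 1 & 1 \\ -2 & 1 \\ 2 & 4 \end{bmatrix}
+ \alpha_4 \begin{bmatrix} -5 & 3 \\ -10 & 7 \\ 2 & 0 \end{bmatrix}
= \mathbf{0}
\]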
   Massaging the left-hand side with our definitions of vector addition and scalar
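multiplication in M32 (Example VSM) we obtain,
\[
\begin{bmatrix}
2\alpha_1 - 4\alpha_2 + \alpha_3 - 5\alpha_4 & \alpha_3 + 3\alpha_4 \\
\alpha_1 - 2\alpha_2 - 2\alpha_3 - 10\alpha_4 & -\alpha_1 + 2\alpha_2 + \alpha_3 + 7\alpha_4 \\
\alpha_1 - 2\alpha_2 + 2\alpha_3 + 2\alpha_4 & 3\alpha_1 - 6\alpha_2 + 4\alpha_3
\end{bmatrix}
=
\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}
\]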
   Using our definition of matrix equality (Definition ME) to equate entries we get
the homogeneous system of six equations in four variables,
                             2α1 − 4α2 + α3 − 5α4 = 0
                                           α3 + 3α4 = 0
                            α1 − 2α2 − 2α3 − 10α4 = 0
                               −α1 + 2α2 + α3 + 7α4 = 0
                                α1 − 2α2 + 2α3 + 2α4 = 0
                                      3α1 − 6α2 + 4α3 = 0
      Form the coefficient matrix of this homogeneous system and row-reduce to obtain
\[
\begin{bmatrix}
1 & -2 & 0 & -4 \\
0 & 0 & 1 & 3 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]
    Analyzing this we see that the system is consistent (we expected this since the
system is homogeneous, Theorem HSC) and has n − r = 4 − 2 = 2 free variables,
namely α2 and α4 . This means there are infinitely many solutions, and in particular,
we can find a nontrivial solution, so long as we do not pick all of our free variables
to be zero. The mere presence of a nontrivial solution for these scalars is enough to
conclude that S is a linearly dependent set (Definition LI). But let us go ahead and
explicitly construct a nontrivial relation of linear dependence.
    Choose α2 = 1 and α4 = −1. There is nothing special about this choice; there
are infinitely many possibilities, some “easier” than this one. Just avoid picking both
variables to be zero. (Why not?) Then we find the dependent variables to have values
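α1 = −2 and α3 = 3. So the relation of linear dependence,
\[
(-2) \begin{bmatrix} 2 & 0 \\ 1 & -1 \\ 1 & 3 \end{bmatrix}
+ (1) \begin{bmatrix} -4 & 0 \\ -2 & 2 \\ -2 & -6 \end{bmatrix}
+ (3) \begin{bmatrix} 1 & 1 \\ -2 & 1 \\ 2 & 4 \end{bmatrix}
+ (-1) \begin{bmatrix} -5 & 3 \\ -10 & 7 \\ 2 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}
\]
is an iron-clad demonstration that S is linearly dependent. Can you construct another
such demonstration?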
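One way to explore other relations of linear dependence on S is to let the two free variables range over several choices and test each candidate directly. The following Python sketch does exactly that; the function names `combine` and `relation` are hypothetical helpers of our own, and the formulas inside `relation` are simply the two nonzero rows of the reduced row-echelon form above solved for α1 and α3.

```python
# The four 3x2 matrices of S, each stored as a list of rows.
S = [
    [[2, 0], [1, -1], [1, 3]],
    [[-4, 0], [-2, 2], [-2, -6]],
    [[1, 1], [-2, 1], [2, 4]],
    [[-5, 3], [-10, 7], [2, 0]],
]

def combine(scalars, matrices):
    """Entry-by-entry linear combination of 3x2 matrices."""
    return [
        [sum(a * M[i][j] for a, M in zip(scalars, matrices)) for j in range(2)]
        for i in range(3)
    ]

def relation(alpha2, alpha4):
    """Scalars [alpha1, alpha2, alpha3, alpha4] for a choice of free variables.

    From the row-reduced system: alpha1 = 2*alpha2 + 4*alpha4, alpha3 = -3*alpha4.
    """
    return [2 * alpha2 + 4 * alpha4, alpha2, -3 * alpha4, alpha4]

zero = [[0, 0], [0, 0], [0, 0]]

# The choice made in the example, followed by two other choices.
for free in [(1, -1), (1, 0), (0, 1)]:
    scalars = relation(*free)
    print(scalars, combine(scalars, S) == zero)
```

Any choice of the free variables other than both zero gives a nontrivial relation, so the loop can be extended at will.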
Example LIC Linearly independent set in the crazy vector space
Is the set R = {(1, 0), (6, 3)} linearly independent in the crazy vector space C
(Example CVS)?
    We begin with an arbitrary relation of linear dependence on R
                 0 = a1 (1, 0) + a2 (6, 3)              Definition RLD


and then massage it to a point where we can apply the definition of equality in C.
Recall the definitions of vector addition and scalar multiplication in C are not what
you would expect.
  (−1, −1)
  =0                                                                 Example CVS
  = a1 (1, 0) + a2 (6, 3)                                            Definition RLD
  = (1a1 + a1 − 1, 0a1 + a1 − 1) + (6a2 + a2 − 1, 3a2 + a2 − 1)      Example CVS
  = (2a1 − 1, a1 − 1) + (7a2 − 1, 4a2 − 1)
  = (2a1 − 1 + 7a2 − 1 + 1, a1 − 1 + 4a2 − 1 + 1)                    Example CVS
  = (2a1 + 7a2 − 1, a1 + 4a2 − 1)
      Equality in C (Example CVS) then yields the two equations,
                                   2a1 + 7a2 − 1 = −1
                                    a1 + 4a2 − 1 = −1
which becomes the homogeneous system
                                      2a1 + 7a2 = 0
                                       a1 + 4a2 = 0
   Since the coefficient matrix of this system is nonsingular (check this!) the system
has only the trivial solution a1 = a2 = 0. By Definition LI the set R is linearly
independent. Notice that even though the zero vector of C is not what we might
have first suspected, a question about linear independence still concludes with a
question about a homogeneous system of equations. Hmmm.
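Because the operations in C are so unfamiliar, it can be reassuring to let a computer carry them out. Here is a small Python sketch, under the assumption that Example CVS defines addition and scalar multiplication as they are used in the computation above; the names `add`, `scale`, `combo` and `ZERO` are ours. A search over a small grid of integer scalars is of course not a proof, only a sanity check of the algebra.

```python
def add(u, v):
    """Vector addition in C: add componentwise, then add 1 to each entry."""
    return (u[0] + v[0] + 1, u[1] + v[1] + 1)

def scale(a, u):
    """Scalar multiplication in C: (a*x + a - 1, a*y + a - 1)."""
    return (a * u[0] + a - 1, a * u[1] + a - 1)

ZERO = (-1, -1)  # the zero vector of C

def combo(a1, a2):
    """The linear combination a1 (1, 0) + a2 (6, 3), computed in C."""
    return add(scale(a1, (1, 0)), scale(a2, (6, 3)))

# Compare with the closed form (2a1 + 7a2 - 1, a1 + 4a2 - 1) and record
# which pairs of scalars on the grid produce the zero vector.
hits = []
for a1 in range(-5, 6):
    for a2 in range(-5, 6):
        assert combo(a1, a2) == (2 * a1 + 7 * a2 - 1, a1 + 4 * a2 - 1)
        if combo(a1, a2) == ZERO:
            hits.append((a1, a2))
print(hits)
```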

Subsection SS
Spanning Sets
In a vector space V , suppose we are given a set of vectors S ⊆ V . Then we can
immediately construct a subspace, ⟨S⟩, using Definition SS and then be assured
by Theorem SSS that the construction does provide a subspace. We now turn the
situation upside-down. Suppose we are first given a subspace W ⊆ V . Can we find a
set S so that ⟨S⟩ = W ? Typically W is infinite and we are searching for a finite set
of vectors S that we can combine in linear combinations and “build” all of W .
    I like to think of S as the raw materials that are sufficient for the construction of
W . If you have nails, lumber, wire, copper pipe, drywall, plywood, carpet, shingles,
paint (and a few other things), then you can combine them in many different ways
to create a house (or infinitely many different houses for that matter). A fast-food
restaurant may have beef, chicken, beans, cheese, tortillas, taco shells and hot sauce
and from this small list of ingredients build a wide variety of items for sale. Or
maybe a better analogy comes from Ben Cordes — the additive primary colors (red,
green and blue) can be combined to create many different colors by varying the
intensity of each. The intensity is like a scalar multiple, and the combination of the
three intensities is like vector addition. The three individual colors, red, green and
blue, are the elements of the spanning set.
    Because we will use terms like “spanned by” and “spanning set,” there is the
potential for confusion with “the span.” Come back and reread the first paragraph of
this subsection whenever you are uncertain about the difference. Here is the working
definition.
Definition SSVS Spanning Set of a Vector Space
Suppose V is a vector space. A subset S of V is a spanning set of V if ⟨S⟩ = V .
In this case, we also frequently say S spans V .                             
    The definition of a spanning set requires that two sets (subspaces actually) be
equal. If S is a subset of V , then ⟨S⟩ ⊆ V , always. Thus it is usually only necessary
to prove that V ⊆ ⟨S⟩. Now would be a good time to review Definition SE.
Example SSP4 Spanning set in P4
In Example SP4 we showed that
                             W = { p(x)| p ∈ P4 , p(2) = 0}
is a subspace of P4 , the vector space of polynomials with degree at most 4 (Example
VSP). In this example, we will show that the set
\[
S = \left\{ x - 2,\ x^2 - 4x + 4,\ x^3 - 6x^2 + 12x - 8,\ x^4 - 8x^3 + 24x^2 - 32x + 16 \right\}
\]
is a spanning set for W . To do this, we require that W = ⟨S⟩. This is an equality of
sets. We can check that every polynomial in S has x = 2 as a root and therefore
S ⊆ W . Since W is closed under addition and scalar multiplication, ⟨S⟩ ⊆ W also.
    So it remains to show that W ⊆ ⟨S⟩ (Definition SE). To do this, begin by
choosing an arbitrary polynomial in W , say r(x) = ax^4 + bx^3 + cx^2 + dx + e ∈ W .
This polynomial is not as arbitrary as it would appear, since we also know it must
have x = 2 as a root. This translates to
\[
0 = a(2)^4 + b(2)^3 + c(2)^2 + d(2) + e = 16a + 8b + 4c + 2d + e
\]
as a condition on r.
    We wish to show that r is a polynomial in ⟨S⟩, that is, we want to show that r
can be written as a linear combination of the vectors (polynomials) in S. So let us
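try.
\[
\begin{aligned}
r(x) &= ax^4 + bx^3 + cx^2 + dx + e \\
&= \alpha_1\left(x - 2\right) + \alpha_2\left(x^2 - 4x + 4\right) + \alpha_3\left(x^3 - 6x^2 + 12x - 8\right) \\
&\quad + \alpha_4\left(x^4 - 8x^3 + 24x^2 - 32x + 16\right) \\
&= \alpha_4 x^4 + (\alpha_3 - 8\alpha_4)\,x^3 + (\alpha_2 - 6\alpha_3 + 24\alpha_4)\,x^2 \\
&\quad + (\alpha_1 - 4\alpha_2 + 12\alpha_3 - 32\alpha_4)\,x + (-2\alpha_1 + 4\alpha_2 - 8\alpha_3 + 16\alpha_4)
\end{aligned}
\]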
   Equating coefficients (vector equality in P4 ) gives the system of five equations in
four variables,
                                                       α4 = a
                                             α3 − 8α4 = b
                                     α2 − 6α3 + 24α4 = c
                             α1 − 4α2 + 12α3 − 32α4 = d
                            −2α1 + 4α2 − 8α3 + 16α4 = e


    Any solution to this system of equations will provide the linear combination we
need to determine if r ∈ ⟨S⟩, but we need to be convinced there is a solution for any
values of a, b, c, d, e that qualify r to be a member of W . So the question is: is this
system of equations consistent? We will form the augmented matrix, and row-reduce.
(We probably need to do this by hand, since the matrix is symbolic — reversing the
order of the first four rows is the best way to start). We obtain a matrix in reduced
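row-echelon form
\[
\begin{aligned}
&\left[\begin{array}{cccc|c}
1 & 0 & 0 & 0 & 32a + 12b + 4c + d \\
0 & 1 & 0 & 0 & 24a + 6b + c \\
0 & 0 & 1 & 0 & 8a + b \\
0 & 0 & 0 & 1 & a \\
0 & 0 & 0 & 0 & 16a + 8b + 4c + 2d + e
\end{array}\right] \\
={}&\left[\begin{array}{cccc|c}
1 & 0 & 0 & 0 & 32a + 12b + 4c + d \\
0 & 1 & 0 & 0 & 24a + 6b + c \\
0 & 0 & 1 & 0 & 8a + b \\
0 & 0 & 0 & 1 & a \\
0 & 0 & 0 & 0 & 0
\end{array}\right]
\end{aligned}
\]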
    For your results to match our first matrix, you may find it necessary to multiply
the final row of your row-reduced matrix by the appropriate scalar, and/or add
multiples of this row to some of the other rows. To obtain the second version of the
matrix, the last entry of the last column has been simplified to zero according to the
one condition we were able to impose on an arbitrary polynomial from W . Since the
last column is not a pivot column, Theorem RCLS tells us this system is consistent.
Therefore, any polynomial from W can be written as a linear combination of the
polynomials in S, so W ⊆ ⟨S⟩. Therefore, W = ⟨S⟩ and S is a spanning set for W
by Definition SSVS.
    Notice that an alternative to row-reducing the augmented matrix by hand would
be to appeal to Theorem FS by expressing the column space of the coefficient matrix
as a null space, and then verifying that the condition on r guarantees that r is in
the column space, thus implying that the system is always consistent. Give it a try,
we will wait. This has been a complicated example, but worth studying carefully.
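The scalars in the last column of the reduced matrix can be tested on concrete members of W. The sketch below, in Python, stores each polynomial as a list of coefficients from the constant term up to x^4; the helper names `member_of_W`, `scalars` and `combine` are our own inventions for this check and the sample coefficients are arbitrary.

```python
# Polynomials as coefficient lists [constant, x, x^2, x^3, x^4].
S = [
    [-2, 1, 0, 0, 0],        # x - 2
    [4, -4, 1, 0, 0],        # x^2 - 4x + 4
    [-8, 12, -6, 1, 0],      # x^3 - 6x^2 + 12x - 8
    [16, -32, 24, -8, 1],    # x^4 - 8x^3 + 24x^2 - 32x + 16
]

def member_of_W(a, b, c, d):
    """Build r = a x^4 + b x^3 + c x^2 + d x + e, with e forced by r(2) = 0."""
    e = -(16 * a + 8 * b + 4 * c + 2 * d)
    return [e, d, c, b, a]

def scalars(a, b, c, d):
    """alpha_1, ..., alpha_4 as read from the reduced row-echelon form."""
    return [32 * a + 12 * b + 4 * c + d, 24 * a + 6 * b + c, 8 * a + b, a]

def combine(alphas):
    """Coefficients of alpha_1 p_1 + ... + alpha_4 p_4 for the p_i in S."""
    return [sum(al * p[k] for al, p in zip(alphas, S)) for k in range(5)]

r = member_of_W(1, -2, 0, 5)
print(r, scalars(1, -2, 0, 5), combine(scalars(1, -2, 0, 5)) == r)
```

For this sample the polynomial is r(x) = x^4 − 2x^3 + 5x − 10 and the scalars come out as 13, 12, 6, 1. Trying other values of a, b, c, d exercises the same formulas.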
    Given a subspace and a set of vectors, as in Example SSP4, it can take some
work to determine that the set actually is a spanning set. An even harder problem is
to be confronted with a subspace and required to construct a spanning set with no
guidance. We will now work an example of this flavor, but some of the steps will be
unmotivated. Fortunately, we will have some better tools for this type of problem
later on.
Example SSM22 Spanning set in M22
In the space of all 2 × 2 matrices, M22 , consider the subspace
\[
Z = \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \,\middle|\, a + 3b - c - 5d = 0,\ -2a - 6b + 3c + 14d = 0 \right\}
\]
and find a spanning set for Z.
   We need to construct a limited number of matrices in Z so that every matrix
in Z can be expressed as a linear combination of this limited number of matrices.
Suppose that
\[
B = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\]
is a matrix in Z. Then we can form a column vector with the entries of B and write
\[
\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} \in \mathcal{N}\!\left( \begin{bmatrix} 1 & 3 & -1 & -5 \\ -2 & -6 & 3 & 14 \end{bmatrix} \right)
\]
    Row-reducing this matrix and applying Theorem REMES we obtain the equivalent
statement,
\[
\begin{bmatrix} a \\ b \\ c \\ d \end{bmatrix} \in \mathcal{N}\!\left( \begin{bmatrix} 1 & 3 & 0 & -1 \\ 0 & 0 & 1 & 4 \end{bmatrix} \right)
\]
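   We can then express the subspace Z in the following equal forms,
\[
\begin{aligned}
Z &= \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \,\middle|\, a + 3b - c - 5d = 0,\ -2a - 6b + 3c + 14d = 0 \right\} \\
&= \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \,\middle|\, a + 3b - d = 0,\ c + 4d = 0 \right\} \\
&= \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \,\middle|\, a = -3b + d,\ c = -4d \right\} \\
&= \left\{ \begin{bmatrix} -3b + d & b \\ -4d & d \end{bmatrix} \,\middle|\, b, d \in \mathbb{C} \right\} \\
&= \left\{ \begin{bmatrix} -3b & b \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} d & 0 \\ -4d & d \end{bmatrix} \,\middle|\, b, d \in \mathbb{C} \right\} \\
&= \left\{ b \begin{bmatrix} -3 & 1 \\ 0 & 0 \end{bmatrix} + d \begin{bmatrix} 1 & 0 \\ -4 & 1 \end{bmatrix} \,\middle|\, b, d \in \mathbb{C} \right\} \\
&= \left\langle \left\{ \begin{bmatrix} -3 & 1 \\ 0 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & 0 \\ -4 & 1 \end{bmatrix} \right\} \right\rangle
\end{aligned}
\]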
   So the set
\[
Q = \left\{ \begin{bmatrix} -3 & 1 \\ 0 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & 0 \\ -4 & 1 \end{bmatrix} \right\}
\]
spans Z by Definition SSVS.
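Both inclusions behind this conclusion can be spot-checked numerically. The Python sketch below uses integer entries only, which is an assumption made for convenience (the example allows any complex b and d), and the names `in_Z` and `combo` are hypothetical helpers. It confirms on a grid that combinations of the two matrices in Q satisfy the defining equations of Z, and that each member of Z found on the grid is rebuilt from its own entries b and d.

```python
from itertools import product

# The two matrices of Q, each stored as a list of rows.
Q = [[[-3, 1], [0, 0]], [[1, 0], [-4, 1]]]

def in_Z(M):
    """Test the two defining equations of Z on a 2x2 matrix."""
    (a, b), (c, d) = M
    return a + 3 * b - c - 5 * d == 0 and -2 * a - 6 * b + 3 * c + 14 * d == 0

def combo(b, d):
    """The linear combination b*Q[0] + d*Q[1]."""
    return [[b * Q[0][i][j] + d * Q[1][i][j] for j in range(2)] for i in range(2)]

# Every combination of the matrices in Q lies in Z ...
forward = all(in_Z(combo(b, d)) for b, d in product(range(-4, 5), repeat=2))

# ... and every member of Z on a small integer grid equals b*Q[0] + d*Q[1].
grid = range(-6, 7)
backward = all(
    combo(b, d) == [[a, b], [c, d]]
    for a, b, c, d in product(grid, repeat=4)
    if in_Z([[a, b], [c, d]])
)
print(forward, backward)
```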
Example SSC Spanning set in the crazy vector space
In Example LIC we determined that the set R = {(1, 0), (6, 3)} is linearly
independent in the crazy vector space C (Example CVS). We now show that R is a spanning
set for C.
    Given an arbitrary vector (x, y) ∈ C we desire to show that it can be written as
a linear combination of the elements of R. In other words, are there scalars a1 and
a2 so that
                                 (x, y) = a1 (1, 0) + a2 (6, 3)
   We will act as if this equation is true and try to determine just what a1 and
a2 would be (as functions of x and y). Recall that our vector space operations are
unconventional and are defined in Example CVS.
           (x, y) = a1 (1, 0) + a2 (6, 3)
                  = (1a1 + a1 − 1, 0a1 + a1 − 1) + (6a2 + a2 − 1, 3a2 + a2 − 1)
                  = (2a1 − 1, a1 − 1) + (7a2 − 1, 4a2 − 1)
                  = (2a1 − 1 + 7a2 − 1 + 1, a1 − 1 + 4a2 − 1 + 1)
                  = (2a1 + 7a2 − 1, a1 + 4a2 − 1)
   Equality in C then yields the two equations,
                                      2a1 + 7a2 − 1 = x
                                       a1 + 4a2 − 1 = y
which becomes the linear system with a matrix representation
\[
\begin{bmatrix} 2 & 7 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \end{bmatrix} = \begin{bmatrix} x + 1 \\ y + 1 \end{bmatrix}
\]
   The coefficient matrix of this system is nonsingular, hence invertible (Theorem
NI), and we can employ its inverse to find a solution (Theorem TTMI, Theorem
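SNCM),
\[
\begin{bmatrix} a_1 \\ a_2 \end{bmatrix}
= \begin{bmatrix} 2 & 7 \\ 1 & 4 \end{bmatrix}^{-1} \begin{bmatrix} x + 1 \\ y + 1 \end{bmatrix}
= \begin{bmatrix} 4 & -7 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} x + 1 \\ y + 1 \end{bmatrix}
= \begin{bmatrix} 4x - 7y - 3 \\ -x + 2y + 1 \end{bmatrix}
\]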
    We could chase through the above implications backwards and take the existence
of these solutions as sufficient evidence for R being a spanning set for C. Instead, let
us view the above as simply scratchwork and now get serious with a simple direct
proof that R is a spanning set. Ready? Suppose (x, y) is any vector from C, then
compute the following linear combination using the definitions of the operations in
C,
(4x − 7y − 3)(1, 0) + (−x + 2y + 1)(6, 3)
      = (1(4x − 7y − 3) + (4x − 7y − 3) − 1, 0(4x − 7y − 3) + (4x − 7y − 3) − 1) +
        (6(−x + 2y + 1) + (−x + 2y + 1) − 1, 3(−x + 2y + 1) + (−x + 2y + 1) − 1)
      = (8x − 14y − 7, 4x − 7y − 4) + (−7x + 14y + 6, −4x + 8y + 3)
      = ((8x − 14y − 7) + (−7x + 14y + 6) + 1, (4x − 7y − 4) + (−4x + 8y + 3) + 1)
      = (x, y)
    This final sequence of computations in C is sufficient to demonstrate that any
element of C can be written (or expressed) as a linear combination of the two vectors
in R, so C ⊆ ⟨R⟩. Since the reverse inclusion ⟨R⟩ ⊆ C is trivially true, C = ⟨R⟩ and
we say R spans C (Definition SSVS). Notice that this demonstration is no more or
less valid if we hide from the reader our scratchwork that suggested a1 = 4x − 7y − 3
and a2 = −x + 2y + 1.
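The direct proof above is pure symbol-pushing, so it is easy to test on specific vectors. In the Python sketch below the operations of C are written out as they are used in the computation (an assumption about Example CVS), and `represent` and `rebuild` are names of our own choosing. A grid of test vectors is only a check on the algebra, not a replacement for the proof.

```python
from fractions import Fraction

def add(u, v):
    """Vector addition in C: add componentwise, then add 1 to each entry."""
    return (u[0] + v[0] + 1, u[1] + v[1] + 1)

def scale(a, u):
    """Scalar multiplication in C: (a*x + a - 1, a*y + a - 1)."""
    return (a * u[0] + a - 1, a * u[1] + a - 1)

def represent(x, y):
    """Scalars a1, a2 suggested by the scratchwork."""
    return (4 * x - 7 * y - 3, -x + 2 * y + 1)

def rebuild(x, y):
    """Compute a1 (1, 0) + a2 (6, 3) in C for the scalars of (x, y)."""
    a1, a2 = represent(x, y)
    return add(scale(a1, (1, 0)), scale(a2, (6, 3)))

# Integer test vectors, plus one with fractional entries.
ok = all(rebuild(x, y) == (x, y) for x in range(-5, 6) for y in range(-5, 6))
half = (Fraction(1, 2), Fraction(-3, 4))
print(ok, rebuild(*half) == half)
```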


Subsection VR
Vector Representation
In Chapter R we will take up the matter of representations fully, where Theorem
VRRB will be critical for Definition VR. We will now motivate and prove a critical
theorem that tells us how to “represent” a vector. This theorem could wait, but
working with it now will provide some extra insight into the nature of linearly
independent spanning sets. First an example, then the theorem.
Example AVR A vector representation
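Consider the set
\[
S = \left\{ \begin{bmatrix} -7 \\ 5 \\ 1 \end{bmatrix},\ \begin{bmatrix} -6 \\ 5 \\ 0 \end{bmatrix},\ \begin{bmatrix} -12 \\ 7 \\ 4 \end{bmatrix} \right\}
\]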
from the vector space C^3 . Let A be the matrix whose columns are the set S, and
verify that A is nonsingular. By Theorem NMLIC the elements of S form a linearly
independent set. Suppose that b ∈ C^3 . Then LS(A, b) has a (unique) solution
(Theorem NMUS) and hence is consistent. By Theorem SLSLC, b ∈ ⟨S⟩. Since b
is arbitrary, this is enough to show that ⟨S⟩ = C^3 , and therefore S is a spanning
set for C^3 (Definition SSVS). (This set comes from the columns of the coefficient
matrix of Archetype B.)
    Now examine the situation for a particular choice of b, say
\[
\mathbf{b} = \begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}
\]
Because S is a spanning set for C^3 , we know we can write b as a linear combination of the
vectors in S,
\[
\begin{bmatrix} -33 \\ 24 \\ 5 \end{bmatrix}
= (-3) \begin{bmatrix} -7 \\ 5 \\ 1 \end{bmatrix}
+ (5) \begin{bmatrix} -6 \\ 5 \\ 0 \end{bmatrix}
+ (2) \begin{bmatrix} -12 \\ 7 \\ 4 \end{bmatrix}.
\]
    The nonsingularity of the matrix A tells us that the scalars in this linear combination
are unique. More precisely, it is the linear independence of S that provides the
uniqueness. We will refer to the scalars a1 = −3, a2 = 5, a3 = 2 as a “representation
of b relative to S.” In other words, once we settle on S as a linearly independent
set that spans C^3 , the vector b is recoverable just by knowing the scalars a1 = −3,
a2 = 5, a3 = 2 (use these scalars in a linear combination of the vectors in S). This is
all an illustration of the following important theorem, which we prove in the setting
of a general vector space.
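The representation of b relative to S can also be computed rather than guessed. The example itself does not say how the scalars were found; the Python sketch below swaps in Cramer's rule, which is our choice for a 3 × 3 system because it needs nothing beyond determinants. The helpers `det3` and `replace_column` are hypothetical names introduced for this illustration.

```python
from fractions import Fraction

def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def replace_column(M, k, col):
    """Copy of M with column k replaced by the vector col."""
    return [[col[i] if j == k else M[i][j] for j in range(3)] for i in range(3)]

S = [(-7, 5, 1), (-6, 5, 0), (-12, 7, 4)]
b = (-33, 24, 5)

# A has the vectors of S as its columns.
A = [[S[j][i] for j in range(3)] for i in range(3)]

d = det3(A)  # nonzero, so A is nonsingular
coords = [Fraction(det3(replace_column(A, k, b)), d) for k in range(3)]

# Rebuild b from the scalars, as a linear combination of the vectors in S.
rebuilt = tuple(sum(a * v[i] for a, v in zip(coords, S)) for i in range(3))
print(d, coords, rebuilt == b)
```

A nonzero determinant is what makes the three quotients well defined, which matches the role nonsingularity plays in the example.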
Theorem VRRB Vector Representation Relative to a Basis
Suppose that V is a vector space and B = {v1 , v2 , v3 , . . . , vm } is a linearly
independent set that spans V . Let w be any vector in V . Then there exist unique scalars
a1 , a2 , a3 , . . . , am such that
                           w = a1 v1 + a2 v2 + a3 v3 + · · · + am vm .

Proof. That w can be written as a linear combination of the vectors in B follows
from the spanning property of the set (Definition SSVS). This is good, but not the
meat of this theorem. We now know that for any choice of the vector w there exist
some scalars that will create w as a linear combination of the basis vectors. The
real question is: Is there more than one way to write w as a linear combination of
{v1 , v2 , v3 , . . . , vm }? Are the scalars a1 , a2 , a3 , . . . , am unique? (Proof Technique
U)
     Assume there are two different linear combinations of {v1 , v2 , v3 , . . . , vm }
that equal the vector w. In other words there exist scalars a1 , a2 , a3 , . . . , am and
b1 , b2 , b3 , . . . , bm so that
                           w = a1 v1 + a2 v2 + a3 v3 + · · · + am vm
                           w = b1 v1 + b2 v2 + b3 v3 + · · · + bm vm .
    Then notice that
    0 = w + (−w)                                               Property AI
      = w + (−1)w                                              Theorem AISM
      = (a1 v1 + a2 v2 + a3 v3 + · · · + am vm )+
           (−1)(b1 v1 + b2 v2 + b3 v3 + · · · + bm vm )
      = (a1 v1 + a2 v2 + a3 v3 + · · · + am vm )+
           (−b1 v1 − b2 v2 − b3 v3 − . . . − bm vm )           Property DVA
      = (a1 − b1 )v1 + (a2 − b2 )v2 + (a3 − b3 )v3 +
           · · · + (am − bm )vm                                Property C, Property DSA
   But this is a relation of linear dependence on a linearly independent set of
vectors (Definition RLD)! Now we are using the other assumption about B, that
{v1 , v2 , v3 , . . . , vm } is a linearly independent set. So by Definition LI it must
happen that the scalars are all zero. That is,
    (a1 − b1 ) = 0         (a2 − b2 ) = 0    (a3 − b3 ) = 0       ...    (am − bm ) = 0
            a1 = b1               a2 = b2           a3 = b3      ...           am = bm .
    And so we find that the scalars are unique.                                               

    The converse of Theorem VRRB is true as well, but is not important enough to
rise beyond an exercise (see Exercise LISS.T51).
    This is a very typical use of the hypothesis that a set is linearly independent —
obtain a relation of linear dependence and then conclude that the scalars must all
be zero. The result of this theorem tells us that we can write any vector in a vector
space as a linear combination of the vectors in a linearly independent spanning set,
but only just. There is only enough raw material in the spanning set to write each
vector one way as a linear combination. So in this sense, we could call a linearly
independent spanning set a “minimal spanning set.” These sets are so important
that we will give them a simpler name (“basis”) and explore their properties further
in the next section.

Reading Questions

1. Is the set of matrices below linearly independent or linearly dependent in the vector
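   space M22 ? Why or why not?
\[
\left\{ \begin{bmatrix} 1 & 3 \\ -2 & 4 \end{bmatrix},\ \begin{bmatrix} -2 & 3 \\ 3 & -5 \end{bmatrix},\ \begin{bmatrix} 0 & 9 \\ -1 & 3 \end{bmatrix} \right\}
\]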

2. Explain the difference between the following two uses of the term “span”:

       1. S is a subset of the vector space V and the span of S is a subspace of V .
       2. W is a subspace of the vector space Y and T spans W .

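3. The set
\[
S = \left\{ \begin{bmatrix} 6 \\ 2 \\ 1 \end{bmatrix},\ \begin{bmatrix} 4 \\ -3 \\ 1 \end{bmatrix},\ \begin{bmatrix} 5 \\ 8 \\ 2 \end{bmatrix} \right\}
\]
   is linearly independent and spans C^3 . Write the vector
\[
\mathbf{x} = \begin{bmatrix} -6 \\ 2 \\ 2 \end{bmatrix}
\]
   as a linear combination of the elements of S. How many ways are there to answer
   this question, and which theorem allows you to say so?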

Exercises
C20† In the vector space of 2 × 2 matrices, M22 , determine if the set S below is linearly
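independent.
\[
S = \left\{ \begin{bmatrix} 2 & -1 \\ 1 & 3 \end{bmatrix},\ \begin{bmatrix} 0 & 4 \\ -1 & 2 \end{bmatrix},\ \begin{bmatrix} 4 & 2 \\ 1 & 3 \end{bmatrix} \right\}
\]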

C21† In the crazy vector space C (Example CVS), is the set S = {(0, 2), (2, 8)} linearly
independent?
C22† In the vector space of polynomials P3 , determine if the set S is linearly independent
or linearly dependent.
\[
S = \left\{ 2 + x - 3x^2 - 8x^3,\ 1 + x + x^2 + 5x^3,\ 3 - 4x^2 - 7x^3 \right\}
\]


C23† Determine if the set S = {(3, 1), (7, 3)} is linearly independent in the crazy vector
space C (Example CVS).
C24† In the vector space of real-valued functions F = { f | f : R → R}, determine if the
following set S is linearly independent.
\[
S = \left\{ \sin^2 x,\ \cos^2 x,\ 2 \right\}
\]


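C25†     Let
\[
S = \left\{ \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix},\ \begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix},\ \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix} \right\}
\]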

   1. Determine if S spans M22 .

   2. Determine if S is linearly independent.

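C26†     Let
\[
S = \left\{ \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix},\ \begin{bmatrix} 2 & 1 \\ -1 & 2 \end{bmatrix},\ \begin{bmatrix} 0 & 1 \\ 1 & 2 \end{bmatrix},\ \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix},\ \begin{bmatrix} 1 & 4 \\ 0 & 3 \end{bmatrix} \right\}
\]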

   1. Determine if S spans M22 .

   2. Determine if S is linearly independent.
C30 In Example LIM32, find another nontrivial relation of linear dependence on the
linearly dependent set of 3 × 2 matrices, S.
C40† Determine if the set T = {x^2 − x + 5, 4x^3 − x^2 + 5x, 3x + 2} spans the vector
space of polynomials with degree 4 or less, P4 .
C41† The set W is a subspace of M22 , the vector space of all 2 × 2 matrices. Prove that
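S is a spanning set for W .
\[
W = \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \,\middle|\, 2a - 3b + 4c - d = 0 \right\}
\qquad
S = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix},\ \begin{bmatrix} 0 & 1 \\ 0 & -3 \end{bmatrix},\ \begin{bmatrix} 0 & 0 \\ 1 & 4 \end{bmatrix} \right\}
\]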

C42† Determine if the set S = {(3, 1), (7, 3)} spans the crazy vector space C (Example
CVS).
M10† Halfway through Example SSP4, we need to show that the system of equations
\[ \mathcal{LS}\left( \begin{bmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & -8 \\ 0 & 1 & -6 & 24 \\ 1 & -4 & 12 & -32 \\ -2 & 4 & -8 & 16 \end{bmatrix},\ \begin{bmatrix} a \\ b \\ c \\ d \\ e \end{bmatrix} \right) \]
is consistent for every choice of the vector of constants satisfying 16a + 8b + 4c + 2d + e = 0.

Express the column space of the coefficient matrix of this system as a null space, us-
ing Theorem FS. From this use Theorem CSCS to establish that the system is always
consistent. Notice that this approach removes from Example SSP4 the need to row-reduce
a symbolic matrix.
T20† Suppose that S is a finite linearly independent set of vectors from the vector space
V . Let T be any subset of S. Prove that T is linearly independent.
T40 Prove the following variant of Theorem EMMVP that has a weaker hypothesis: Sup-
pose that C = {u1 , u2 , u3 , . . . , up } is a linearly independent spanning set for Cn . Suppose
also that A and B are m×n matrices such that Aui = Bui for every 1 ≤ i ≤ n. Then A = B.

Can you weaken the hypothesis even further while still preserving the conclusion?
T50† Suppose that V is a vector space and u, v ∈ V are two vectors in V . Use the
definition of linear independence to prove that S = {u, v} is a linearly dependent set if
and only if one of the two vectors is a scalar multiple of the other. Prove this directly in
the context of an abstract vector space (V ), without simply giving an upgraded version of
Theorem DLDS for the special case of just two vectors.
T51†    Carefully formulate the converse of Theorem VRRB and provide a proof.
Section B
Bases
A basis of a vector space is one of the most useful concepts in linear algebra. It often
provides a concise, finite description of an infinite vector space.


Subsection B
Bases
We now have all the tools in place to define a basis of a vector space.
Definition B Basis
Suppose V is a vector space. Then a subset S ⊆ V is a basis of V if it is linearly
independent and spans V .                                                       
     So, a basis is a linearly independent spanning set for a vector space. The
requirement that the set spans V ensures that S has enough raw material to build V ,
while the linear independence requirement ensures that we do not have any more
raw material than we need. As we shall see soon in Section D, a basis is a minimal
spanning set.
     You may have noticed that we used the term basis for some of the titles of previous
theorems (e.g. Theorem BNS, Theorem BCS, Theorem BRS) and if you review each
of these theorems you will see that their conclusions provide linearly independent
spanning sets for sets that we now recognize as subspaces of Cm . Examples associated
with these theorems include Example NSLIL, Example CSOCD and Example IAS.
As we will see, these three theorems will continue to be powerful tools, even in the
setting of more general vector spaces.
     Furthermore, the archetypes contain an abundance of bases. For each coefficient
matrix of a system of equations, and for each archetype defined simply as a matrix,
there is a basis for the null space, three bases for the column space, and a basis for
the row space. For this reason, our subsequent examples will concentrate on bases
for vector spaces other than Cm .
     Notice that Definition B does not preclude a vector space from having many
bases, and this is the case, as hinted above by the statement that the archetypes
contain three bases for the column space of a matrix. More generally, we can grab
any basis for a vector space, multiply any one basis vector by a nonzero scalar and
create a slightly different set that is still a basis. For “important” vector spaces, it
will be convenient to have a collection of “nice” bases. When a vector space has
a single particularly nice basis, it is sometimes called the standard basis though
there is nothing precise enough about this term to allow us to define it formally —
it is a question of style. Here are some nice bases for important vector spaces.
Theorem SUVB Standard Unit Vectors are a Basis
The set of standard unit vectors for Cm (Definition SUV), B = { ei | 1 ≤ i ≤ m} is a
basis for the vector space Cm .

Proof. We must show that the set B is both linearly independent and a spanning
set for Cm . First, the vectors in B are, by Definition SUV, the columns of the
identity matrix, which we know is nonsingular (since it row-reduces to the identity
matrix, Theorem NMRRI). And the columns of a nonsingular matrix are linearly
independent by Theorem NMLIC.
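    Suppose we grab an arbitrary vector from Cm , say
\[ \mathbf{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_m \end{bmatrix}. \]
   Can we write v as a linear combination of the vectors in B? Yes, and quite
simply.
\begin{align*}
\begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_m \end{bmatrix}
&= v_1 \begin{bmatrix} 1 \\ 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
 + v_2 \begin{bmatrix} 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}
 + v_3 \begin{bmatrix} 0 \\ 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}
 + \cdots
 + v_m \begin{bmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} \\
\mathbf{v} &= v_1 \mathbf{e}_1 + v_2 \mathbf{e}_2 + v_3 \mathbf{e}_3 + \cdots + v_m \mathbf{e}_m
\end{align*}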
    This shows that Cm ⊆ ⟨B⟩, which is sufficient to show that B is a spanning set
for Cm .

Example BP Bases for Pn
The vector space of polynomials with degree at most n, Pn , has the basis
\[ B = \left\{ 1,\ x,\ x^2,\ x^3,\ \ldots,\ x^n \right\}. \]
  Another nice basis for Pn is
\[ C = \left\{ 1,\ 1 + x,\ 1 + x + x^2,\ 1 + x + x^2 + x^3,\ \ldots,\ 1 + x + x^2 + x^3 + \cdots + x^n \right\}. \]
   Checking that each of B and C is a linearly independent spanning set is a good
exercise.
Example BM A basis for the vector space of matrices
In the vector space Mmn of matrices (Example VSM) define the matrices B_{kℓ} ,
1 ≤ k ≤ m, 1 ≤ ℓ ≤ n by
\[ [B_{k\ell}]_{ij} = \begin{cases} 1 & \text{if } k = i,\ \ell = j \\ 0 & \text{otherwise} \end{cases} \]
   So these matrices have entries that are all zeros, with the exception of a lone
entry that is one. The set of all mn of them,
\[ B = \left\{ B_{k\ell} \mid 1 \le k \le m,\ 1 \le \ell \le n \right\} \]
forms a basis for Mmn . See Exercise B.M20.
   The bases described above will often be convenient ones to work with. However,
a basis does not have to look like a basis at first glance.
Example BSP4 A basis for a subspace of P4
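In Example SSP4 we showed that
\[ S = \left\{ x - 2,\ x^2 - 4x + 4,\ x^3 - 6x^2 + 12x - 8,\ x^4 - 8x^3 + 24x^2 - 32x + 16 \right\} \]
is a spanning set for W = { p(x) | p ∈ P4 , p(2) = 0}. We will now show that S is also
linearly independent in W . Begin with a relation of linear dependence,
\begin{align*}
0 + 0x + 0x^2 + 0x^3 + 0x^4
&= \alpha_1 (x - 2) + \alpha_2 \left( x^2 - 4x + 4 \right) + \alpha_3 \left( x^3 - 6x^2 + 12x - 8 \right) \\
&\quad + \alpha_4 \left( x^4 - 8x^3 + 24x^2 - 32x + 16 \right) \\
&= \alpha_4 x^4 + (\alpha_3 - 8\alpha_4) x^3 + (\alpha_2 - 6\alpha_3 + 24\alpha_4) x^2 \\
&\quad + (\alpha_1 - 4\alpha_2 + 12\alpha_3 - 32\alpha_4) x + (-2\alpha_1 + 4\alpha_2 - 8\alpha_3 + 16\alpha_4)
\end{align*}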
    Equating coefficients (vector equality in P4 ) gives the homogeneous system of
five equations in four variables,
                                                          α4 = 0
                                                 α3 − 8α4 = 0
                                         α2 − 6α3 + 24α4 = 0
                                α1 − 4α2 + 12α3 − 32α4 = 0
                              −2α1 + 4α2 − 8α3 + 16α4 = 0


      We form the coefficient matrix, and row-reduce to obtain a matrix in reduced
row-echelon form
\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 \end{bmatrix} \]
    With only the trivial solution to this homogeneous system, we conclude that
the only scalars that will form a relation of linear dependence are the trivial ones, and
therefore the set S is linearly independent (Definition LI). Finally, S has earned the
right to be called a basis for W (Definition B).
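    The row reduction in Example BSP4 can be checked mechanically. What follows is a minimal Python sketch and is not part of the original text; the helper name rref and the choice of exact rational arithmetic are illustrative assumptions. It row-reduces the 5 × 4 coefficient matrix of the homogeneous system and counts the nonzero rows of the result.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form of a matrix (list of rows), in exact fractions.

    Hypothetical helper written for this illustration only.
    """
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        if pivot_row == len(m):
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pr = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        # Clear every other entry in the pivot column.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m


# Rows are the five equations; columns correspond to alpha_1, ..., alpha_4.
coefficients = [
    [0, 0, 0, 1],
    [0, 0, 1, -8],
    [0, 1, -6, 24],
    [1, -4, 12, -32],
    [-2, 4, -8, 16],
]

reduced = rref(coefficients)
rank = sum(1 for row in reduced if any(x != 0 for x in row))
print(rank)
```

    When the number of nonzero rows equals the number of unknowns (four here), every column is a pivot column, so the homogeneous system has only the trivial solution.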
Example BSM22 A basis for a subspace of M22
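In Example SSM22 we discovered that
\[ Q = \left\{ \begin{bmatrix} -3 & 1 \\ 0 & 0 \end{bmatrix},\ \begin{bmatrix} 1 & 0 \\ -4 & 1 \end{bmatrix} \right\} \]
is a spanning set for the subspace
\[ Z = \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \,\middle|\, a + 3b - c - 5d = 0,\ -2a - 6b + 3c + 14d = 0 \right\} \]
of the vector space of all 2 × 2 matrices, M22 . If we can also determine that Q is
linearly independent in Z (or in M22 ), then it will qualify as a basis for Z.
    Let us begin with a relation of linear dependence.
\begin{align*}
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
&= \alpha_1 \begin{bmatrix} -3 & 1 \\ 0 & 0 \end{bmatrix} + \alpha_2 \begin{bmatrix} 1 & 0 \\ -4 & 1 \end{bmatrix} \\
&= \begin{bmatrix} -3\alpha_1 + \alpha_2 & \alpha_1 \\ -4\alpha_2 & \alpha_2 \end{bmatrix}
\end{align*}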
    Using our definition of matrix equality (Definition ME) we equate entries and
get a homogeneous system of four equations in two variables,
                                    −3α1 + α2 = 0
                                            α1 = 0
                                        −4α2 = 0
                                            α2 = 0
    We could row-reduce the coefficient matrix of this homogeneous system, but it is
not necessary. The second and fourth equations tell us that α1 = 0, α2 = 0 is the
only solution to this homogeneous system. This qualifies the set Q as being linearly
independent, since the only relation of linear dependence is trivial (Definition LI).
Therefore Q is a basis for Z (Definition B).
Example BC Basis for the crazy vector space
In Example LIC and Example SSC we determined that the set R = {(1, 0), (6, 3)}
from the crazy vector space, C (Example CVS), is linearly independent and is a
spanning set for C. By Definition B we see that R is a basis for C.
    We have seen that several of the sets associated with a matrix are subspaces
of vector spaces of column vectors. Specifically these are the null space (Theorem
NSMS), column space (Theorem CSMS), row space (Theorem RSMS) and left null
space (Theorem LNSMS). As subspaces they are vector spaces (Definition S) and it
is natural to ask about bases for these vector spaces. Theorem BNS, Theorem BCS,
Theorem BRS each have conclusions that provide linearly independent spanning sets
for (respectively) the null space, column space, and row space. Notice that each of
these theorems contains the word “basis” in its title, even though we did not know
the precise meaning of the word at the time. To find a basis for a left null space we
can use the definition of this subspace as a null space (Definition LNS) and apply
Theorem BNS. Or Theorem FS tells us that the left null space can be expressed as
a row space and we can then use Theorem BRS.
    Theorem BS is another early result that provides a linearly independent spanning
set (i.e. a basis) as its conclusion. If a vector space of column vectors can be
expressed as a span of a set of column vectors, then Theorem BS can be employed
in a straightforward manner to quickly yield a basis.


Subsection BSCV
Bases for Spans of Column Vectors
We have seen several examples of bases in different vector spaces. In this subsection,
and the next (Subsection B.BNM), we will consider building bases for Cm and its
subspaces.
    Suppose we have a subspace of Cm that is expressed as the span of a set of vectors,
S, and S is not necessarily linearly independent, or perhaps not very attractive.
Theorem REMRS says that row-equivalent matrices have identical row spaces, while
Theorem BRS says the nonzero rows of a matrix in reduced row-echelon form are a
basis for the row space. These theorems together give us a great computational tool
for quickly finding a basis for a subspace that is expressed originally as a span.
Example RSB Row space basis
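When we first defined the span of a set of column vectors, in Example SCAD we
looked at the set
\[ W = \left\langle \left\{ \begin{bmatrix} 2 \\ -3 \\ 1 \end{bmatrix},\ \begin{bmatrix} 1 \\ 4 \\ 1 \end{bmatrix},\ \begin{bmatrix} 7 \\ -5 \\ 4 \end{bmatrix},\ \begin{bmatrix} -7 \\ -6 \\ -5 \end{bmatrix} \right\} \right\rangle \]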
with an eye towards realizing W as the span of a smaller set. By building relations
of linear dependence (though we did not know them by that name then) we were
able to remove two vectors and write W as the span of the other two vectors. These
two remaining vectors formed a linearly independent set, even though we did not
know that at the time.
    Now we know that W is a subspace and must have a basis. Consider the matrix,
C, whose rows are the vectors in the spanning set for W ,
\[ C = \begin{bmatrix} 2 & -3 & 1 \\ 1 & 4 & 1 \\ 7 & -5 & 4 \\ -7 & -6 & -5 \end{bmatrix} \]
    Then, by Definition RSM, the row space of C will be W , R(C) = W . Theorem
BRS tells us that if we row-reduce C, the nonzero rows of the row-equivalent matrix
in reduced row-echelon form will be a basis for R(C), and hence a basis for W . Let
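us do it. C row-reduces to
\[ \begin{bmatrix} 1 & 0 & \frac{7}{11} \\ 0 & 1 & \frac{1}{11} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \]
      If we convert the two nonzero rows to column vectors then we have a basis,
\[ B = \left\{ \begin{bmatrix} 1 \\ 0 \\ \frac{7}{11} \end{bmatrix},\ \begin{bmatrix} 0 \\ 1 \\ \frac{1}{11} \end{bmatrix} \right\} \]
and
\[ W = \left\langle \left\{ \begin{bmatrix} 1 \\ 0 \\ \frac{7}{11} \end{bmatrix},\ \begin{bmatrix} 0 \\ 1 \\ \frac{1}{11} \end{bmatrix} \right\} \right\rangle \]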

    For aesthetic reasons, we might wish to multiply each vector in B by 11, which
will not change the spanning or linear independence properties of B as a basis. Then
we can also write
\[ W = \left\langle \left\{ \begin{bmatrix} 11 \\ 0 \\ 7 \end{bmatrix},\ \begin{bmatrix} 0 \\ 11 \\ 1 \end{bmatrix} \right\} \right\rangle \]
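    The computation in Example RSB (place the spanning vectors in the rows of a matrix, row-reduce, keep the nonzero rows) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than anything from the original text: the helper rref is a hypothetical name, and exact fractions are used so that entries such as 7/11 are not rounded.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form of a matrix (list of rows), in exact fractions.

    Hypothetical helper written for this illustration only.
    """
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        if pivot_row == len(m):
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pr = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        # Clear every other entry in the pivot column.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m


# The four spanning vectors of W, written as the rows of C.
C = [
    [2, -3, 1],
    [1, 4, 1],
    [7, -5, 4],
    [-7, -6, -5],
]

# Nonzero rows of the reduced matrix form a basis of the row space (Theorem BRS).
basis = [row for row in rref(C) if any(x != 0 for x in row)]

# Optional cosmetic step: clear denominators by scaling each vector by 11.
scaled = [[11 * x for x in row] for row in basis]

for row in basis:
    print([str(x) for x in row])
print(scaled)
```

    Scaling by a nonzero constant is only cosmetic; the unscaled rows are already a basis.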
   Example IAS provides another example of this flavor, though now we can notice
that X is a subspace, and that the resulting set of three vectors is a basis. This is
such a powerful technique that we should do one more example.
Example RS Reducing a span
In Example RSC5 we began with a set of n = 4 vectors from C5 ,
\[ R = \{ \mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4 \} = \left\{ \begin{bmatrix} 1 \\ 2 \\ -1 \\ 3 \\ 2 \end{bmatrix},\ \begin{bmatrix} 2 \\ 1 \\ 3 \\ 1 \\ 2 \end{bmatrix},\ \begin{bmatrix} 0 \\ -7 \\ 6 \\ -11 \\ -2 \end{bmatrix},\ \begin{bmatrix} 4 \\ 1 \\ 2 \\ 1 \\ 6 \end{bmatrix} \right\} \]
and defined V = ⟨R⟩. Our goal in that problem was to find a relation of linear
dependence on the vectors in R, solve the resulting equation for one of the vectors,
and re-express V as the span of a set of three vectors.
   Here is another way to accomplish something similar. The row space of the matrix
\[ A = \begin{bmatrix} 1 & 2 & -1 & 3 & 2 \\ 2 & 1 & 3 & 1 & 2 \\ 0 & -7 & 6 & -11 & -2 \\ 4 & 1 & 2 & 1 & 6 \end{bmatrix} \]
is equal to ⟨R⟩. By Theorem BRS we can row-reduce this matrix, ignore any zero
rows, and use the nonzero rows as column vectors that are a basis for the row space
of A. Row-reducing A creates the matrix
\[ \begin{bmatrix} 1 & 0 & 0 & -\frac{1}{17} & \frac{30}{17} \\ 0 & 1 & 0 & \frac{25}{17} & -\frac{2}{17} \\ 0 & 0 & 1 & -\frac{2}{17} & -\frac{8}{17} \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} \]
     So
\[ \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \\ -\frac{1}{17} \\ \frac{30}{17} \end{bmatrix},\ \begin{bmatrix} 0 \\ 1 \\ 0 \\ \frac{25}{17} \\ -\frac{2}{17} \end{bmatrix},\ \begin{bmatrix} 0 \\ 0 \\ 1 \\ -\frac{2}{17} \\ -\frac{8}{17} \end{bmatrix} \right\} \]
is a basis for V . Our theorem tells us this is a basis; there is no need to verify that
the subspace spanned by three vectors (rather than four) is the identical subspace,
and there is no need to verify that we have reached the limit in reducing the set,
since the set of three vectors is guaranteed to be linearly independent.
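    Although Theorem BRS makes verification unnecessary, a direct check is short. The three basis vectors of Example RS have the pattern 1, 0, 0 / 0, 1, 0 / 0, 0, 1 in their first three entries, so if a vector of R lies in their span, the scalars must be its own first three entries. The Python sketch below (an illustration only; the names basis, R and combine are hypothetical) confirms this for each of the four original vectors using exact fractions.

```python
from fractions import Fraction as F

# Nonzero rows of the reduced row-echelon form of A, as stated in Example RS.
basis = [
    [F(1), F(0), F(0), F(-1, 17), F(30, 17)],
    [F(0), F(1), F(0), F(25, 17), F(-2, 17)],
    [F(0), F(0), F(1), F(-2, 17), F(-8, 17)],
]

# The original spanning set R = {v1, v2, v3, v4}.
R = [
    [1, 2, -1, 3, 2],
    [2, 1, 3, 1, 2],
    [0, -7, 6, -11, -2],
    [4, 1, 2, 1, 6],
]


def combine(scalars, vectors):
    """Linear combination of the given vectors with the given scalars."""
    size = len(vectors[0])
    return [sum(s * v[i] for s, v in zip(scalars, vectors)) for i in range(size)]


# For each v in R, use its first three entries as the scalars.
checks = [combine(v[:3], basis) == v for v in R]
print(checks)
```

    Each entry of checks records whether the corresponding vector of R was rebuilt exactly from the three basis vectors.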


Subsection BNM
Bases and Nonsingular Matrices
A quick source of diverse bases for Cm is the set of columns of a nonsingular matrix.
Theorem CNMB Columns of Nonsingular Matrix are a Basis
Suppose that A is a square matrix of size m. Then the columns of A are a basis of
Cm if and only if A is nonsingular.

Proof. (⇒) Suppose that the columns of A are a basis for Cm . Then Definition B
says the set of columns is linearly independent. Theorem NMLIC then says that A
is nonsingular.
    (⇐) Suppose that A is nonsingular. Then by Theorem NMLIC this set of
columns is linearly independent. Theorem CSNM says that for a nonsingular matrix,
C(A) = Cm . This is equivalent to saying that the columns of A are a spanning set
for the vector space Cm . As a linearly independent spanning set, the columns of A
qualify as a basis for Cm (Definition B).                                        

Example CABAK Columns as Basis, Archetype K
Archetype K is the 5 × 5 matrix
\[ K = \begin{bmatrix} 10 & 18 & 24 & 24 & -12 \\ 12 & -2 & -6 & 0 & -18 \\ -30 & -21 & -23 & -30 & 39 \\ 27 & 30 & 36 & 37 & -30 \\ 18 & 24 & 30 & 30 & -20 \end{bmatrix} \]
which is row-equivalent to the 5 × 5 identity matrix I5 . So by Theorem NMRRI, K
is nonsingular. Then Theorem CNMB says the set
\[ \left\{ \begin{bmatrix} 10 \\ 12 \\ -30 \\ 27 \\ 18 \end{bmatrix},\ \begin{bmatrix} 18 \\ -2 \\ -21 \\ 30 \\ 24 \end{bmatrix},\ \begin{bmatrix} 24 \\ -6 \\ -23 \\ 36 \\ 30 \end{bmatrix},\ \begin{bmatrix} 24 \\ 0 \\ -30 \\ 37 \\ 30 \end{bmatrix},\ \begin{bmatrix} -12 \\ -18 \\ 39 \\ -30 \\ -20 \end{bmatrix} \right\} \]
is a (novel) basis of C5 .
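    The claim that K is row-equivalent to the identity matrix is the only computational step in Example CABAK, and it can be confirmed with exact arithmetic. The following Python sketch is an illustration, not part of the original text; the helper rref is a hypothetical name.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row-echelon form of a matrix (list of rows), in exact fractions.

    Hypothetical helper written for this illustration only.
    """
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        if pivot_row == len(m):
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pr = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        # Clear every other entry in the pivot column.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m


K = [
    [10, 18, 24, 24, -12],
    [12, -2, -6, 0, -18],
    [-30, -21, -23, -30, 39],
    [27, 30, 36, 37, -30],
    [18, 24, 30, 30, -20],
]

identity = [[1 if i == j else 0 for j in range(5)] for i in range(5)]

# K is nonsingular exactly when it row-reduces to the identity (Theorem NMRRI).
is_nonsingular = rref(K) == identity
print(is_nonsingular)
```

    By Theorem CNMB, a True result here is equivalent to the columns of K being a basis of C5.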
   Perhaps we should view the fact that the standard unit vectors are a basis
(Theorem SUVB) as just a simple corollary of Theorem CNMB? (See Proof Technique
LC.)
   With a new equivalence for a nonsingular matrix, we can update our list of
equivalences.
Theorem NME5 Nonsingular Matrix Equivalences, Round 5
Suppose that A is a square matrix of size n. The following are equivalent.

   1. A is nonsingular.
   2. A row-reduces to the identity matrix.
   3. The null space of A contains only the zero vector, N (A) = {0}.
   4. The linear system LS(A, b) has a unique solution for every possible choice of
      b.
   5. The columns of A are a linearly independent set.
   6. A is invertible.
   7. The column space of A is Cn , C(A) = Cn .
   8. The columns of A are a basis for Cn .

Proof. With a new equivalence for a nonsingular matrix in Theorem CNMB we can
expand Theorem NME4.                                                        

Subsection OBC
Orthonormal Bases and Coordinates
We learned about orthogonal sets of vectors in Cm back in Section O, and we also
learned that orthogonal sets are automatically linearly independent (Theorem OSLI).
When an orthogonal set also spans a subspace of Cm , then the set is a basis. And
when the set is orthonormal, then the set is an incredibly nice basis. We will back up
this claim with a theorem, but first consider how you might manufacture such a set.
    Suppose that W is a subspace of Cm with basis B. Then B spans W and is
a linearly independent set of nonzero vectors. We can apply the Gram-Schmidt
Procedure (Theorem GSP) and obtain a linearly independent set T such that
⟨T⟩ = ⟨B⟩ = W and T is orthogonal. In other words, T is a basis for W , and is an
orthogonal set. By scaling each vector of T to norm 1, we can convert T into an
orthonormal set, without destroying the properties that make it a basis of W . In
short, we can convert any basis into an orthonormal basis. Example GSTV, followed
by Example ONTV, illustrates this process.
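    The two-stage conversion described above (orthogonalize, then scale to norm 1) can be sketched in Python. This is a minimal illustration and not the book's own code: it uses the classical form of Gram-Schmidt, assumes an inner product that conjugates its first argument, and runs on three real vectors chosen only as sample input. Floating-point arithmetic is used, so results are approximate.

```python
import math


def inner(u, v):
    """Inner product on C^m, conjugating the first argument (assumed convention)."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Classical Gram-Schmidt: return an orthogonal list with the same span.

    Assumes the input vectors are linearly independent.
    """
    orthogonal = []
    for v in vectors:
        u = [complex(x) for x in v]
        for q in orthogonal:
            # Subtract the component of v in the direction of q.
            c = inner(q, v) / inner(q, q)
            u = [a - c * b for a, b in zip(u, q)]
        orthogonal.append(u)
    return orthogonal


def normalize(u):
    """Scale a nonzero vector to norm 1."""
    norm = math.sqrt(inner(u, u).real)
    return [a / norm for a in u]


# Sample basis of C^3 with real entries (illustrative choice).
basis = [[1, 2, 1], [-1, 0, 1], [2, 1, 1]]

orthonormal = [normalize(u) for u in gram_schmidt(basis)]

# Matrix of all pairwise inner products; for an orthonormal set it is the identity.
gram = [[inner(p, q) for q in orthonormal] for p in orthonormal]
for row in gram:
    print([round(abs(z), 12) for z in row])
```

    Checking that the matrix of pairwise inner products is (numerically) the identity is a quick test that the output really is an orthonormal set.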
    Unitary matrices (Definition UM) are another good source of orthonormal bases
(and vice versa). Suppose that Q is a unitary matrix of size n. Then the n columns of
Q form an orthonormal set (Theorem CUMOS) that is therefore linearly independent
(Theorem OSLI). Since Q is invertible (Theorem UMI), we know Q is nonsingular
(Theorem NI), and then the columns of Q span Cn (Theorem CSNM). So the columns
of a unitary matrix of size n are an orthonormal basis for Cn .
    Why all the fuss about orthonormal bases? Theorem VRRB told us that any
vector in a vector space could be written, uniquely, as a linear combination of basis
vectors. For an orthonormal basis, finding the scalars for this linear combination
is extremely easy, and this is the content of the next theorem. Furthermore, with
vectors written this way (as linear combinations of the elements of an orthonormal
set) certain computations and analysis become much easier. Here is the promised
theorem.
Theorem COB Coordinates and Orthonormal Bases
Suppose that B = {v1 , v2 , v3 , . . . , vp } is an orthonormal basis of the subspace W
of Cm . For any w ∈ W ,
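\[ \mathbf{w} = \langle \mathbf{v}_1, \mathbf{w} \rangle \mathbf{v}_1 + \langle \mathbf{v}_2, \mathbf{w} \rangle \mathbf{v}_2 + \langle \mathbf{v}_3, \mathbf{w} \rangle \mathbf{v}_3 + \cdots + \langle \mathbf{v}_p, \mathbf{w} \rangle \mathbf{v}_p \]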

Proof. Because B is a basis of W , Theorem VRRB tells us that we can write w
uniquely as a linear combination of the vectors in B. So it is not this aspect of
the conclusion that makes this theorem interesting. What is interesting is that the
particular scalars are so easy to compute. No need to solve big systems of equations
— just do an inner product of w with vi to arrive at the coefficient of vi in the linear
combination.
   So begin the proof by writing w as a linear combination of the vectors in B,
using unknown scalars,
                           w = a1 v1 + a2 v2 + a3 v3 + · · · + ap vp
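and compute,
\begin{align*}
\langle \mathbf{v}_i, \mathbf{w} \rangle
&= \left\langle \mathbf{v}_i, \sum_{k=1}^{p} a_k \mathbf{v}_k \right\rangle && \text{Theorem VRRB} \\
&= \sum_{k=1}^{p} \langle \mathbf{v}_i, a_k \mathbf{v}_k \rangle && \text{Theorem IPVA} \\
&= \sum_{k=1}^{p} a_k \langle \mathbf{v}_i, \mathbf{v}_k \rangle && \text{Theorem IPSM} \\
&= a_i \langle \mathbf{v}_i, \mathbf{v}_i \rangle + \sum_{\substack{k=1 \\ k \neq i}}^{p} a_k \langle \mathbf{v}_i, \mathbf{v}_k \rangle && \text{Property C} \\
&= a_i (1) + \sum_{\substack{k=1 \\ k \neq i}}^{p} a_k (0) && \text{Definition ONS} \\
&= a_i
\end{align*}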
   So the (unique) scalars for the linear combination are indeed the inner products
advertised in the conclusion of the theorem’s statement.                         

Example CROB4 Coordinatization relative to an orthonormal basis, C4
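The set
\[ \{ \mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4 \} = \left\{ \begin{bmatrix} 1 + i \\ 1 \\ 1 - i \\ i \end{bmatrix},\ \begin{bmatrix} 1 + 5i \\ 6 + 5i \\ -7 - i \\ 1 - 6i \end{bmatrix},\ \begin{bmatrix} -7 + 34i \\ -8 - 23i \\ -10 + 22i \\ 30 + 13i \end{bmatrix},\ \begin{bmatrix} -2 - 4i \\ 6 + i \\ 4 + 3i \\ 6 - i \end{bmatrix} \right\} \]
was proposed, and partially verified, as an orthogonal set in Example AOS. Let
us scale each vector to norm 1, so as to form an orthonormal set in C4 . Then by
Theorem OSLI the set will be linearly independent, and by Theorem NME5 the
set will be a basis for C4 . So, once scaled to norm 1, the adjusted set will be an
orthonormal basis of C4 . The norms are,
\[ \| \mathbf{x}_1 \| = \sqrt{6} \qquad \| \mathbf{x}_2 \| = \sqrt{174} \qquad \| \mathbf{x}_3 \| = \sqrt{3451} \qquad \| \mathbf{x}_4 \| = \sqrt{119} \]
      So an orthonormal basis is
\begin{align*}
B &= \{ \mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \mathbf{v}_4 \} \\
&= \left\{ \frac{1}{\sqrt{6}} \begin{bmatrix} 1 + i \\ 1 \\ 1 - i \\ i \end{bmatrix},\ \frac{1}{\sqrt{174}} \begin{bmatrix} 1 + 5i \\ 6 + 5i \\ -7 - i \\ 1 - 6i \end{bmatrix},\ \frac{1}{\sqrt{3451}} \begin{bmatrix} -7 + 34i \\ -8 - 23i \\ -10 + 22i \\ 30 + 13i \end{bmatrix},\ \frac{1}{\sqrt{119}} \begin{bmatrix} -2 - 4i \\ 6 + i \\ 4 + 3i \\ 6 - i \end{bmatrix} \right\}
\end{align*}
   Now, to illustrate Theorem COB, choose any vector from C4 , say
\[ \mathbf{w} = \begin{bmatrix} 2 \\ -3 \\ 1 \\ 4 \end{bmatrix}, \]
and compute
\begin{align*}
\langle \mathbf{v}_1, \mathbf{w} \rangle &= \frac{-5i}{\sqrt{6}} & \langle \mathbf{v}_2, \mathbf{w} \rangle &= \frac{-19 + 30i}{\sqrt{174}} \\
\langle \mathbf{v}_3, \mathbf{w} \rangle &= \frac{120 - 211i}{\sqrt{3451}} & \langle \mathbf{v}_4, \mathbf{w} \rangle &= \frac{6 + 12i}{\sqrt{119}}
\end{align*}
   Then Theorem COB guarantees that
\begin{align*}
\begin{bmatrix} 2 \\ -3 \\ 1 \\ 4 \end{bmatrix}
&= \frac{-5i}{\sqrt{6}} \left( \frac{1}{\sqrt{6}} \begin{bmatrix} 1 + i \\ 1 \\ 1 - i \\ i \end{bmatrix} \right)
 + \frac{-19 + 30i}{\sqrt{174}} \left( \frac{1}{\sqrt{174}} \begin{bmatrix} 1 + 5i \\ 6 + 5i \\ -7 - i \\ 1 - 6i \end{bmatrix} \right) \\
&\quad + \frac{120 - 211i}{\sqrt{3451}} \left( \frac{1}{\sqrt{3451}} \begin{bmatrix} -7 + 34i \\ -8 - 23i \\ -10 + 22i \\ 30 + 13i \end{bmatrix} \right)
 + \frac{6 + 12i}{\sqrt{119}} \left( \frac{1}{\sqrt{119}} \begin{bmatrix} -2 - 4i \\ 6 + i \\ 4 + 3i \\ 6 - i \end{bmatrix} \right)
\end{align*}
as you might want to check (if you have unlimited patience).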
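    For readers without unlimited patience, the check in Example CROB4 can be delegated to a few lines of Python. The sketch below is an illustration only (the names inner, coords and rebuilt are hypothetical). It assumes the inner product conjugates its first argument, and it works in floating point, so equality is tested only up to rounding error.

```python
import math


def inner(u, v):
    """Inner product on C^4, conjugating the first argument (assumed convention)."""
    return sum(complex(a).conjugate() * complex(b) for a, b in zip(u, v))


# The orthogonal set x1, ..., x4 from Example CROB4.
x = [
    [1 + 1j, 1, 1 - 1j, 1j],
    [1 + 5j, 6 + 5j, -7 - 1j, 1 - 6j],
    [-7 + 34j, -8 - 23j, -10 + 22j, 30 + 13j],
    [-2 - 4j, 6 + 1j, 4 + 3j, 6 - 1j],
]
w = [2, -3, 1, 4]

# Scale each x_k to norm 1 to obtain the orthonormal basis v1, ..., v4.
v = []
for xk in x:
    norm = math.sqrt(inner(xk, xk).real)
    v.append([a / norm for a in xk])

# Theorem COB: the scalar on v_k is the inner product <v_k, w>.
coords = [inner(vk, w) for vk in v]

# Rebuild w from its coordinates and measure the largest entrywise error.
rebuilt = [sum(c * vk[i] for c, vk in zip(coords, v)) for i in range(4)]
max_error = max(abs(a - b) for a, b in zip(rebuilt, w))
print(max_error)
```

    Multiplying coords[0] by the norm of x1 should reproduce the numerator −5i from the example, and similarly for the other three coordinates.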

    A slightly less intimidating example follows, in three dimensions and with just
real numbers.

Example CROB3 Coordinatization relative to an orthonormal basis, C3
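The set
\[ \{ \mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3 \} = \left\{ \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix},\ \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix},\ \begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix} \right\} \]
is a linearly independent set, which the Gram-Schmidt Process (Theorem GSP)
converts to an orthogonal set, and which can then be converted to the orthonormal
set,
\[ B = \{ \mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3 \} = \left\{ \frac{1}{\sqrt{6}} \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix},\ \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix},\ \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} \right\} \]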

which is therefore an orthonormal basis of $\mathbb{C}^3$. With three vectors in $\mathbb{C}^3$, all with
real number entries, the inner product (Definition IP) reduces to the usual “dot
product” (or scalar product) and the orthogonal pairs of vectors can be interpreted
as perpendicular pairs of directions. So the vectors in $B$ serve as replacements for our
usual 3-D axes, or the usual 3-D unit vectors $\vec{i}$, $\vec{j}$ and $\vec{k}$. We would like to decompose
arbitrary vectors into “components” in the directions of each of these basis vectors.
It is Theorem COB that tells us how to do this.
    Suppose that we choose $\mathbf{w} = \begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix}$. Compute
\[
\langle \mathbf{v}_1, \mathbf{w}\rangle = \frac{5}{\sqrt{6}} \qquad
\langle \mathbf{v}_2, \mathbf{w}\rangle = \frac{3}{\sqrt{2}} \qquad
\langle \mathbf{v}_3, \mathbf{w}\rangle = \frac{8}{\sqrt{3}}
\]
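then Theorem COB guarantees that
\[
\begin{bmatrix} 2 \\ -1 \\ 5 \end{bmatrix}
= \frac{5}{\sqrt{6}} \left( \frac{1}{\sqrt{6}} \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \right)
+ \frac{3}{\sqrt{2}} \left( \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \right)
+ \frac{8}{\sqrt{3}} \left( \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix} \right)
\]
which you should be able to check easily, even if you do not have much patience. △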
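
The check in Example CROB3 can also be replayed numerically. The following is a minimal sketch, not part of the original text: it assumes ordinary Python floats, and the helper names `dot` and `scale` are invented for illustration. It computes the three coefficients as inner products with the basis vectors and then rebuilds the vector from them, in the manner Theorem COB describes.

```python
import math

def dot(u, v):
    """Dot product; for real entries this agrees with the inner product."""
    return sum(a * b for a, b in zip(u, v))

def scale(c, v):
    """Scalar multiple c*v of a vector given as a list."""
    return [c * a for a in v]

# Orthonormal basis B and the vector w from Example CROB3
v1 = scale(1 / math.sqrt(6), [1, 2, 1])
v2 = scale(1 / math.sqrt(2), [-1, 0, 1])
v3 = scale(1 / math.sqrt(3), [1, -1, 1])
basis = [v1, v2, v3]
w = [2, -1, 5]

# Theorem COB: the coefficient on v_i is the inner product <v_i, w>
coeffs = [dot(v, w) for v in basis]

# Rebuild w entry by entry from its components along v1, v2, v3
rebuilt = [sum(c * v[k] for c, v in zip(coeffs, basis)) for k in range(3)]

print(coeffs)   # compare with 5/sqrt(6), 3/sqrt(2), 8/sqrt(3)
print(rebuilt)  # compare with w, up to rounding
```

No system of equations is solved here; with an orthonormal basis, each coefficient costs one inner product.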
    Not only do the columns of a unitary matrix form an orthonormal basis, but there
is a deeper connection between orthonormal bases and unitary matrices. Informally,
the next theorem says that if we transform each vector of an orthonormal basis by
multiplying it by a unitary matrix, then the resulting set will be another orthonormal
basis. And more remarkably, any matrix with this property must be unitary! As an
equivalence (Proof Technique E) we could take this as our defining property of a
unitary matrix, though it might not have the same utility as Definition UM.
Theorem UMCOB Unitary Matrices Convert Orthonormal Bases
Let $A$ be an $n \times n$ matrix and $B = \{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_n\}$ be an orthonormal basis of
$\mathbb{C}^n$. Define
\[
C = \{A\mathbf{x}_1, A\mathbf{x}_2, A\mathbf{x}_3, \ldots, A\mathbf{x}_n\}
\]
     Then $A$ is a unitary matrix if and only if $C$ is an orthonormal basis of $\mathbb{C}^n$.


Proof. (⇒) Assume $A$ is a unitary matrix and establish several facts about $C$. First
we check that $C$ is an orthonormal set (Definition ONS). By Theorem UMPIP, for
$i \neq j$,
\[
\langle A\mathbf{x}_i, A\mathbf{x}_j\rangle = \langle \mathbf{x}_i, \mathbf{x}_j\rangle = 0
\]
     Similarly, Theorem UMPIP also gives, for $1 \le i \le n$,
\[
\|A\mathbf{x}_i\| = \|\mathbf{x}_i\| = 1
\]
   As $C$ is an orthogonal set (Definition OSV), Theorem OSLI yields the linear
independence of $C$. Having established that the column vectors in $C$ form a linearly
independent set, a matrix whose columns are the vectors of $C$ is nonsingular (Theorem
NMLIC), and hence these vectors form a basis of $\mathbb{C}^n$ by Theorem CNMB.
   (⇐) Now assume that $C$ is an orthonormal set. Let $\mathbf{y}$ be an arbitrary vector
from $\mathbb{C}^n$. Since $B$ spans $\mathbb{C}^n$, there are scalars, $a_1, a_2, a_3, \ldots, a_n$, such that
\[
\mathbf{y} = a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + a_3\mathbf{x}_3 + \cdots + a_n\mathbf{x}_n
\]
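     Now
\begin{align*}
A^{*}A\mathbf{y}
&= \sum_{i=1}^{n} \left\langle \mathbf{x}_i, A^{*}A\mathbf{y} \right\rangle \mathbf{x}_i && \text{Theorem COB} \\
&= \sum_{i=1}^{n} \left\langle \mathbf{x}_i, A^{*}A \sum_{j=1}^{n} a_j\mathbf{x}_j \right\rangle \mathbf{x}_i && \text{Definition SSVS} \\
&= \sum_{i=1}^{n} \left\langle \mathbf{x}_i, \sum_{j=1}^{n} A^{*}A a_j\mathbf{x}_j \right\rangle \mathbf{x}_i && \text{Theorem MMDAA} \\
&= \sum_{i=1}^{n} \left\langle \mathbf{x}_i, \sum_{j=1}^{n} a_j A^{*}A\mathbf{x}_j \right\rangle \mathbf{x}_i && \text{Theorem MMSMM} \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} \left\langle \mathbf{x}_i, a_j A^{*}A\mathbf{x}_j \right\rangle \mathbf{x}_i && \text{Theorem IPVA} \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} a_j \left\langle \mathbf{x}_i, A^{*}A\mathbf{x}_j \right\rangle \mathbf{x}_i && \text{Theorem IPSM} \\
&= \sum_{i=1}^{n} \sum_{j=1}^{n} a_j \left\langle A\mathbf{x}_i, A\mathbf{x}_j \right\rangle \mathbf{x}_i && \text{Theorem AIP} \\
&= \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} a_j \left\langle A\mathbf{x}_i, A\mathbf{x}_j \right\rangle \mathbf{x}_i
 + \sum_{\ell=1}^{n} a_\ell \left\langle A\mathbf{x}_\ell, A\mathbf{x}_\ell \right\rangle \mathbf{x}_\ell && \text{Property C} \\
&= \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} a_j (0)\mathbf{x}_i
 + \sum_{\ell=1}^{n} a_\ell (1)\mathbf{x}_\ell && \text{Definition ONS} \\
&= \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} \mathbf{0}
 + \sum_{\ell=1}^{n} a_\ell \mathbf{x}_\ell && \text{Theorem ZSSM} \\
&= \sum_{\ell=1}^{n} a_\ell \mathbf{x}_\ell && \text{Property Z} \\
&= \mathbf{y} \\
&= I_n \mathbf{y} && \text{Theorem MMIM}
\end{align*}
   Since the choice of $\mathbf{y}$ was arbitrary, Theorem EMMVP tells us that $A^{*}A = I_n$,
so $A$ is unitary (Definition UM). ∎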
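
Theorem UMCOB can be tried out on a small case. The sketch below is an illustration only and does not come from the text: the basis `B` of $\mathbb{C}^2$, the unitary matrix `U`, and the nonsingular but non-unitary matrix `N` are all hypothetical choices, and the helper names are made up. The inner product is assumed to conjugate its first argument, in line with the way scalars are pulled out of the second argument in the proof above.

```python
import math

def inner(u, v):
    """Inner product with the conjugate on the first argument (assumed convention)."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def matvec(A, x):
    """Matrix-vector product A x, with A given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def is_orthonormal(vectors, tol=1e-12):
    """True when distinct vectors have inner product 0 and each vector has norm 1."""
    for i, u in enumerate(vectors):
        for j, v in enumerate(vectors):
            target = 1 if i == j else 0
            if abs(inner(u, v) - target) > tol:
                return False
    return True

s = 1 / math.sqrt(2)
B = [[s, s], [s, -s]]            # an orthonormal basis of C^2 (illustrative choice)
U = [[s, s * 1j], [s * 1j, s]]   # a unitary matrix: U* U = I_2
N = [[1, 1], [0, 1]]             # nonsingular, but not unitary

CU = [matvec(U, x) for x in B]   # images of the basis under U
CN = [matvec(N, x) for x in B]   # images of the basis under N

print(is_orthonormal(B), is_orthonormal(CU), is_orthonormal(CN))
```

The images under `U` stay orthonormal, while the images under `N` still form a basis but are no longer orthonormal, which is consistent with both directions of the theorem.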

Reading Questions
1. The matrix below is nonsingular. What can you now say about its columns?
\[
A = \begin{bmatrix} -3 & 0 & 1 \\ 1 & 2 & 1 \\ 5 & 1 & 6 \end{bmatrix}
\]
2. Write the vector $\mathbf{w} = \begin{bmatrix} 6 \\ 6 \\ 15 \end{bmatrix}$ as a linear combination of the columns of the matrix $A$
   above. How many ways are there to answer this question?
3. Why is an orthonormal basis desirable?

Exercises
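C10†      Find a basis for $\langle S\rangle$, where
\[
S = \left\{
\begin{bmatrix} 1 \\ 3 \\ 2 \\ 1 \end{bmatrix},\
\begin{bmatrix} 1 \\ 2 \\ 1 \\ 1 \end{bmatrix},\
\begin{bmatrix} 1 \\ 1 \\ 0 \\ 1 \end{bmatrix},\
\begin{bmatrix} 1 \\ 2 \\ 2 \\ 1 \end{bmatrix},\
\begin{bmatrix} 3 \\ 4 \\ 1 \\ 3 \end{bmatrix}
\right\}.
\]

C11†      Find a basis for the subspace $W$ of $\mathbb{C}^4$,
\[
W = \left\{ \begin{bmatrix} a + b - 2c \\ a + b - 2c + d \\ -2a + 2b + 4c - d \\ b + d \end{bmatrix} \;\middle|\; a, b, c, d \in \mathbb{C} \right\}
\]

C12†   Find a basis for the vector space $T$ of lower triangular $3 \times 3$ matrices; that is,
matrices of the form
\[
\begin{bmatrix} * & 0 & 0 \\ * & * & 0 \\ * & * & * \end{bmatrix}
\]
where an asterisk represents any complex number.
C13† Find a basis for the subspace $Q$ of $P_2$, $Q = \{\, p(x) = a + bx + cx^2 \mid p(0) = 0 \,\}$.

C14† Find a basis for the subspace $R$ of $P_2$, $R = \{\, p(x) = a + bx + cx^2 \mid p'(0) = 0 \,\}$, where
$p'$ denotes the derivative.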
C40† From Example RSB, form an arbitrary (and nontrivial) linear combination of the
four vectors in the original spanning set for W . So the result of this computation is of
course an element of W . As such, this vector should be a linear combination of the basis
vectors in B. Find the (unique) scalars that provide this linear combination. Repeat with
another linear combination of the original four vectors.
C80       Prove that {(1, 2), (2, 3)} is a basis for the crazy vector space C (Example CVS).
      †
M20 In Example BM provide the verifications (linear independence and spanning) to
show that $B$ is a basis of $M_{mn}$.
T50† Theorem UMCOB says that unitary matrices are characterized as those matrices
that “carry” orthonormal bases to orthonormal bases. This problem asks you to prove a
similar result: nonsingular matrices are characterized as those matrices that “carry” bases
to bases.

More precisely, suppose that $A$ is a square matrix of size $n$ and $B = \{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots, \mathbf{x}_n\}$
is a basis of $\mathbb{C}^n$. Prove that $A$ is nonsingular if and only if $C = \{A\mathbf{x}_1, A\mathbf{x}_2, A\mathbf{x}_3, \ldots, A\mathbf{x}_n\}$
is a basis of $\mathbb{C}^n$. (See also Exercise PD.T33, Exercise MR.T20.)
T51† Use the result of Exercise B.T50 to build a very concise proof of Theorem CNMB.
(Hint: make a judicious choice for the basis $B$.)
Section D
Dimension
Almost every vector space we have encountered has been infinite in size (an exception
is Example VSS). But some are bigger and richer than others. Dimension, once
suitably defined, will be a measure of the size of a vector space, and a useful tool
for studying its properties. You probably already have a rough notion of what a
mathematical definition of dimension might be — try to forget these imprecise ideas
and go with the new ones given here.

Subsection D
Dimension
Definition D Dimension
Suppose that $V$ is a vector space and $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_t\}$ is a basis of $V$. Then the
dimension of $V$ is defined by $\dim(V) = t$. If $V$ has no finite bases, we say $V$ has
infinite dimension. □
    This is a very simple definition, which belies its power. Grab a basis, any basis,
and count up the number of vectors it contains. That is the dimension. However, this
simplicity causes a problem. Given a vector space, you and I could each construct
different bases — remember that a vector space might have many bases. And what if
your basis and my basis had different sizes? Applying Definition D we would arrive
at different numbers! With our current knowledge about vector spaces, we would
have to say that dimension is not “well-defined.” Fortunately, there is a theorem
that will correct this problem.
    In a strictly logical progression, the next two theorems would precede the definition
of dimension. Many subsequent theorems will trace their lineage back to the following
fundamental result.
Theorem SSLD Spanning Sets and Linear Dependence
Suppose that $S = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_t\}$ is a finite set of vectors which spans the
vector space $V$. Then any set of $t + 1$ or more vectors from $V$ is linearly dependent.
Proof. We want to prove that any set of $t + 1$ or more vectors from $V$ is linearly
dependent. So we will begin with a totally arbitrary set of vectors from $V$,
$R = \{\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_m\}$, where $m > t$. We will now construct a nontrivial relation of
linear dependence on $R$.
    Each vector $\mathbf{u}_1, \mathbf{u}_2, \mathbf{u}_3, \ldots, \mathbf{u}_m$ can be written as a linear combination of the
vectors $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots, \mathbf{v}_t$ since $S$ is a spanning set of $V$. This means there exist
scalars $a_{ij}$, $1 \le i \le t$, $1 \le j \le m$, so that
\begin{align*}
\mathbf{u}_1 &= a_{11}\mathbf{v}_1 + a_{21}\mathbf{v}_2 + a_{31}\mathbf{v}_3 + \cdots + a_{t1}\mathbf{v}_t \\
\mathbf{u}_2 &= a_{12}\mathbf{v}_1 + a_{22}\mathbf{v}_2 + a_{32}\mathbf{v}_3 + \cdots + a_{t2}\mathbf{v}_t \\
\mathbf{u}_3 &= a_{13}\mathbf{v}_1 + a_{23}\mathbf{v}_2 + a_{33}\mathbf{v}_3 + \cdots + a_{t3}\mathbf{v}_t \\
&\;\;\vdots \\
\mathbf{u}_m &= a_{1m}\mathbf{v}_1 + a_{2m}\mathbf{v}_2 + a_{3m}\mathbf{v}_3 + \cdots + a_{tm}\mathbf{v}_t
\end{align*}
     Now we form, unmotivated, the homogeneous system of $t$ equations in the $m$
variables, $x_1, x_2, x_3, \ldots, x_m$, where the coefficients are the just-discovered scalars
$a_{ij}$,
\begin{align*}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1m}x_m &= 0 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2m}x_m &= 0 \\
a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + \cdots + a_{3m}x_m &= 0 \\
&\;\;\vdots \\
a_{t1}x_1 + a_{t2}x_2 + a_{t3}x_3 + \cdots + a_{tm}x_m &= 0
\end{align*}




    This is a homogeneous system with more variables than equations (our hypothesis
is expressed as $m > t$), so by Theorem HMVEI there are infinitely many solutions.
Choose a nontrivial solution and denote it by $x_1 = c_1$, $x_2 = c_2$, $x_3 = c_3$, \ldots, $x_m = c_m$.
As a solution to the homogeneous system, we then have
\begin{align*}
a_{11}c_1 + a_{12}c_2 + a_{13}c_3 + \cdots + a_{1m}c_m &= 0 \\
a_{21}c_1 + a_{22}c_2 + a_{23}c_3 + \cdots + a_{2m}c_m &= 0 \\
a_{31}c_1 + a_{32}c_2 + a_{33}c_3 + \cdots + a_{3m}c_m &= 0 \\
&\;\;\vdots \\
a_{t1}c_1 + a_{t2}c_2 + a_{t3}c_3 + \cdots + a_{tm}c_m &= 0
\end{align*}


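    As a collection of nontrivial scalars, $c_1, c_2, c_3, \ldots, c_m$ will provide the nontrivial
relation of linear dependence we desire,
\begin{align*}
c_1\mathbf{u}_1 &+ c_2\mathbf{u}_2 + c_3\mathbf{u}_3 + \cdots + c_m\mathbf{u}_m \\
&= c_1\left(a_{11}\mathbf{v}_1 + a_{21}\mathbf{v}_2 + a_{31}\mathbf{v}_3 + \cdots + a_{t1}\mathbf{v}_t\right) && \text{Definition SSVS} \\
&\quad + c_2\left(a_{12}\mathbf{v}_1 + a_{22}\mathbf{v}_2 + a_{32}\mathbf{v}_3 + \cdots + a_{t2}\mathbf{v}_t\right) \\
&\quad + c_3\left(a_{13}\mathbf{v}_1 + a_{23}\mathbf{v}_2 + a_{33}\mathbf{v}_3 + \cdots + a_{t3}\mathbf{v}_t\right) \\
&\quad\;\;\vdots \\
&\quad + c_m\left(a_{1m}\mathbf{v}_1 + a_{2m}\mathbf{v}_2 + a_{3m}\mathbf{v}_3 + \cdots + a_{tm}\mathbf{v}_t\right) \\
&= c_1a_{11}\mathbf{v}_1 + c_1a_{21}\mathbf{v}_2 + c_1a_{31}\mathbf{v}_3 + \cdots + c_1a_{t1}\mathbf{v}_t && \text{Property DVA} \\
&\quad + c_2a_{12}\mathbf{v}_1 + c_2a_{22}\mathbf{v}_2 + c_2a_{32}\mathbf{v}_3 + \cdots + c_2a_{t2}\mathbf{v}_t \\
&\quad + c_3a_{13}\mathbf{v}_1 + c_3a_{23}\mathbf{v}_2 + c_3a_{33}\mathbf{v}_3 + \cdots + c_3a_{t3}\mathbf{v}_t \\
&\quad\;\;\vdots \\
&\quad + c_ma_{1m}\mathbf{v}_1 + c_ma_{2m}\mathbf{v}_2 + c_ma_{3m}\mathbf{v}_3 + \cdots + c_ma_{tm}\mathbf{v}_t \\
&= \left(c_1a_{11} + c_2a_{12} + c_3a_{13} + \cdots + c_ma_{1m}\right)\mathbf{v}_1 && \text{Property DSA} \\
&\quad + \left(c_1a_{21} + c_2a_{22} + c_3a_{23} + \cdots + c_ma_{2m}\right)\mathbf{v}_2 \\
&\quad + \left(c_1a_{31} + c_2a_{32} + c_3a_{33} + \cdots + c_ma_{3m}\right)\mathbf{v}_3 \\
&\quad\;\;\vdots \\
&\quad + \left(c_1a_{t1} + c_2a_{t2} + c_3a_{t3} + \cdots + c_ma_{tm}\right)\mathbf{v}_t \\
&= \left(a_{11}c_1 + a_{12}c_2 + a_{13}c_3 + \cdots + a_{1m}c_m\right)\mathbf{v}_1 && \text{Property CMCN} \\
&\quad + \left(a_{21}c_1 + a_{22}c_2 + a_{23}c_3 + \cdots + a_{2m}c_m\right)\mathbf{v}_2 \\
&\quad + \left(a_{31}c_1 + a_{32}c_2 + a_{33}c_3 + \cdots + a_{3m}c_m\right)\mathbf{v}_3 \\
&\quad\;\;\vdots \\
&\quad + \left(a_{t1}c_1 + a_{t2}c_2 + a_{t3}c_3 + \cdots + a_{tm}c_m\right)\mathbf{v}_t \\
&= 0\mathbf{v}_1 + 0\mathbf{v}_2 + 0\mathbf{v}_3 + \cdots + 0\mathbf{v}_t && c_j \text{ as solution} \\
&= \mathbf{0} + \mathbf{0} + \mathbf{0} + \cdots + \mathbf{0} && \text{Theorem ZSSM} \\
&= \mathbf{0} && \text{Property Z}
\end{align*}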
    That does it. R has been undeniably shown to be a linearly dependent set.
    The proof just given has some monstrous expressions in it, mostly owing to
the double subscripts present. Now is a great opportunity to show the value of a
more compact notation. We will rewrite the key steps of the previous proof using
summation notation, resulting in a more economical presentation, and even greater
insight into the key aspects of the proof. So here is an alternate proof — study it
carefully.
    Alternate Proof: We want to prove that any set of $t + 1$ or more vectors from
$V$ is linearly dependent. So we will begin with a totally arbitrary set of vectors from
$V$, $R = \{\, \mathbf{u}_j \mid 1 \le j \le m \,\}$, where $m > t$. We will now construct a nontrivial relation
of linear dependence on $R$.
    Each vector $\mathbf{u}_j$, $1 \le j \le m$ can be written as a linear combination of $\mathbf{v}_i$, $1 \le i \le t$
since $S$ is a spanning set of $V$. This means there are scalars $a_{ij}$, $1 \le i \le t$, $1 \le j \le m$,
so that
\[
\mathbf{u}_j = \sum_{i=1}^{t} a_{ij}\mathbf{v}_i \qquad 1 \le j \le m
\]

    Now we form, unmotivated, the homogeneous system of $t$ equations in the $m$
variables, $x_j$, $1 \le j \le m$, where the coefficients are the just-discovered scalars $a_{ij}$,
\[
\sum_{j=1}^{m} a_{ij}x_j = 0 \qquad 1 \le i \le t
\]

    This is a homogeneous system with more variables than equations (our hypothesis
is expressed as $m > t$), so by Theorem HMVEI there are infinitely many solutions.
Choose one of these solutions that is not trivial and denote it by $x_j = c_j$, $1 \le j \le m$.
As a solution to the homogeneous system, we then have $\sum_{j=1}^{m} a_{ij}c_j = 0$ for $1 \le i \le t$.
As a collection of nontrivial scalars, $c_j$, $1 \le j \le m$, will provide the nontrivial relation
of linear dependence we desire,
\begin{align*}
\sum_{j=1}^{m} c_j\mathbf{u}_j
&= \sum_{j=1}^{m} c_j \left( \sum_{i=1}^{t} a_{ij}\mathbf{v}_i \right) && \text{Definition SSVS} \\
&= \sum_{j=1}^{m} \sum_{i=1}^{t} c_j a_{ij}\mathbf{v}_i && \text{Property DVA} \\
&= \sum_{i=1}^{t} \sum_{j=1}^{m} c_j a_{ij}\mathbf{v}_i && \text{Property C} \\
&= \sum_{i=1}^{t} \sum_{j=1}^{m} a_{ij} c_j\mathbf{v}_i && \text{Property CMCN} \\
&= \sum_{i=1}^{t} \left( \sum_{j=1}^{m} a_{ij} c_j \right) \mathbf{v}_i && \text{Property DSA} \\
&= \sum_{i=1}^{t} 0\mathbf{v}_i && c_j \text{ as solution} \\
&= \sum_{i=1}^{t} \mathbf{0} && \text{Theorem ZSSM} \\
&= \mathbf{0} && \text{Property Z}
\end{align*}
     That does it. $R$ has been undeniably shown to be a linearly dependent set. ∎

    Notice how the swap of the two summations is so much easier in the third step
above, as opposed to all the rearranging and regrouping that takes place in the
previous proof. It also uses only about half the space, and there are no ellipses (...).
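
The construction in the proof of Theorem SSLD can be carried out mechanically on a small case. What follows is a sketch under stated assumptions rather than anything from the text: the spanning vectors `v`, the coefficient array `a` (with $t = 2$ and $m = 3$), and the function names `rref` and `nontrivial_solution` are all hypothetical. Exact rational arithmetic is used so that the final relation comes out as an exact zero vector.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row-echelon form and 0-based pivot columns."""
    M = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(n_cols):
        p = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        M[r], M[p] = M[p], M[r]
        lead = M[r][c]
        M[r] = [x / lead for x in M[r]]   # make the leading entry 1
        for i in range(n_rows):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return M, pivots

def nontrivial_solution(A):
    """A nonzero solution of A x = 0, assuming A has more columns than rows."""
    R, pivots = rref(A)
    m = len(A[0])
    free = next(j for j in range(m) if j not in pivots)  # exists since m > t
    c = [Fraction(0)] * m
    c[free] = Fraction(1)                 # set one free variable to 1
    for row, p in zip(R, pivots):
        c[p] = -row[free]                 # solve for each pivot variable
    return c

# Hypothetical data: S = {v_1, v_2} spans Q^2, and u_j = a[0][j] v_1 + a[1][j] v_2
v = [[1, 1], [1, -1]]
a = [[1, 2, 3],
     [0, 1, 1]]
u = [[a[0][j] * v[0][k] + a[1][j] * v[1][k] for k in range(2)] for j in range(3)]

c = nontrivial_solution(a)
relation = [sum(c[j] * u[j][k] for j in range(3)) for k in range(2)]
print(c, relation)
```

The scalars `c` are found from the coefficients $a_{ij}$ alone, without looking at the entries of the vectors $\mathbf{u}_j$, which mirrors the way the proof works in an arbitrary vector space.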
    Theorem SSLD can be viewed as a generalization of Theorem MVSLD. We know
that $\mathbb{C}^m$ has a basis with $m$ vectors in it (Theorem SUVB), so it is a set of $m$ vectors
that spans $\mathbb{C}^m$. By Theorem SSLD, any set of more than $m$ vectors from $\mathbb{C}^m$ will be
linearly dependent. But this is exactly the conclusion we have in Theorem MVSLD.
Maybe this is not a total shock, as the proofs of both theorems rely heavily on
Theorem HMVEI. The beauty of Theorem SSLD is that it applies in any vector
space. We illustrate the generality of this theorem, and hint at its power, in the next
example.
Example LDP4 Linearly dependent set in $P_4$
In Example SSP4 we showed that
\[
S = \left\{ x - 2,\ x^2 - 4x + 4,\ x^3 - 6x^2 + 12x - 8,\ x^4 - 8x^3 + 24x^2 - 32x + 16 \right\}
\]
is a spanning set for $W = \{\, p(x) \mid p \in P_4,\ p(2) = 0 \,\}$. So we can apply Theorem SSLD
to $W$ with $t = 4$. Here is a set of five vectors from $W$, as you may check by verifying
that each is a polynomial of degree 4 or less and has $x = 2$ as a root,
\[
T = \{p_1, p_2, p_3, p_4, p_5\} \subseteq W
\]
\begin{align*}
p_1 &= x^4 - 2x^3 + 2x^2 - 8x + 8 \\
p_2 &= -x^3 + 6x^2 - 5x - 6 \\
p_3 &= 2x^4 - 5x^3 + 5x^2 - 7x + 2 \\
p_4 &= -x^4 + 4x^3 - 7x^2 + 6x \\
p_5 &= 4x^3 - 9x^2 + 5x - 6
\end{align*}
   By Theorem SSLD we conclude that $T$ is linearly dependent, with no further
computations. △
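
Theorem SSLD makes computation unnecessary here, but a skeptical reader can still confirm the claims of Example LDP4 directly. The sketch below is an independent check and not the method of the example: each polynomial is stored as a list of coefficients (an assumed representation, constant term first), and `evaluate` and `rank` are helper names invented for this purpose. It checks that $x = 2$ is a root of each polynomial and that the five coefficient vectors have rank less than five.

```python
from fractions import Fraction

# Coefficients from the constant term up to x^4, copied from the example
T = [
    [8, -8, 2, -2, 1],    # p1
    [-6, -5, 6, -1, 0],   # p2
    [2, -7, 5, -5, 2],    # p3
    [0, 6, -7, 4, -1],    # p4
    [-6, 5, -9, 4, 0],    # p5
]

def evaluate(coeffs, x):
    """Value at x of the polynomial with the given coefficients."""
    return sum(c * x**k for k, c in enumerate(coeffs))

def rank(rows):
    """Rank by forward elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue                      # nothing to eliminate in this column
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

roots_ok = all(evaluate(p, 2) == 0 for p in T)
dependent = rank(T) < len(T)   # fewer independent rows than polynomials
print(roots_ok, dependent)
```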
   Theorem SSLD is indeed powerful, but our main purpose in proving it right now
was to make sure that our definition of dimension (Definition D) is well-defined.
Here is the theorem.
Theorem BIS Bases have Identical Sizes
Suppose that V is a vector space with a finite basis B and a second basis C. Then B
and C have the same size.

Proof. Suppose that C has more vectors than B. (Allowing for the possibility that
C is infinite, we can replace C by a subset that has more vectors than B.) As a
basis, B is a spanning set for V (Definition B), so Theorem SSLD says that C is
linearly dependent. However, this contradicts the fact that as a basis C is linearly
independent (Definition B). So C must also be a finite set, with size less than, or
equal to, that of B.
    Suppose that B has more vectors than C. As a basis, C is a spanning set for V
(Definition B), so Theorem SSLD says that B is linearly dependent. However, this
contradicts the fact that as a basis B is linearly independent (Definition B). So C
cannot be strictly smaller than B.
    The only possibility left for the sizes of B and C is for them to be equal. ∎

    Theorem BIS tells us that if we find one finite basis in a vector space, then every
basis of that space has the same size. This (finally) makes Definition D unambiguous.

Subsection DVS
Dimension of Vector Spaces
We can now collect the dimension of some common, and not so common, vector
spaces.
Theorem DCM Dimension of $\mathbb{C}^m$
The dimension of $\mathbb{C}^m$ (Example VSCV) is $m$.

Proof. Theorem SUVB provides a basis with $m$ vectors. ∎

Theorem DP Dimension of $P_n$
The dimension of $P_n$ (Example VSP) is $n + 1$.

Proof. Example BP provides two bases with $n + 1$ vectors. Take your pick. ∎

Theorem DM Dimension of $M_{mn}$
The dimension of $M_{mn}$ (Example VSM) is $mn$.

Proof. Example BM provides a basis with $mn$ vectors. ∎

Example DSM22 Dimension of a subspace of $M_{22}$
It should now be plausible that
\[
Z = \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \;\middle|\; 2a + b + 3c + 4d = 0,\ -a + 3b - 5c - 2d = 0 \right\}
\]
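is a subspace of the vector space $M_{22}$ (Example VSM). (It is.) To find the dimension
of $Z$ we must first find a basis, though any old basis will do.
    First concentrate on the conditions relating $a$, $b$, $c$ and $d$. They form a homogeneous
system of two equations in four variables with coefficient matrix
\[
\begin{bmatrix} 2 & 1 & 3 & 4 \\ -1 & 3 & -5 & -2 \end{bmatrix}
\]
     We can row-reduce this matrix to obtain
\[
\begin{bmatrix} 1 & 0 & 2 & 2 \\ 0 & 1 & -1 & 0 \end{bmatrix}
\]
   Rewrite the two equations represented by each row of this matrix, expressing the
dependent variables ($a$ and $b$) in terms of the free variables ($c$ and $d$), and we obtain,
\begin{align*}
a &= -2c - 2d \\
b &= c
\end{align*}
    We can now write a typical entry of $Z$ strictly in terms of $c$ and $d$, and we can
decompose the result,
\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix}
= \begin{bmatrix} -2c - 2d & c \\ c & d \end{bmatrix}
= \begin{bmatrix} -2c & c \\ c & 0 \end{bmatrix} + \begin{bmatrix} -2d & 0 \\ 0 & d \end{bmatrix}
= c \begin{bmatrix} -2 & 1 \\ 1 & 0 \end{bmatrix} + d \begin{bmatrix} -2 & 0 \\ 0 & 1 \end{bmatrix}
\]
   This equation says that an arbitrary matrix in $Z$ can be written as a linear
combination of the two vectors in
\[
S = \left\{ \begin{bmatrix} -2 & 1 \\ 1 & 0 \end{bmatrix},\ \begin{bmatrix} -2 & 0 \\ 0 & 1 \end{bmatrix} \right\}
\]
so we know that
\[
Z = \langle S \rangle = \left\langle \left\{ \begin{bmatrix} -2 & 1 \\ 1 & 0 \end{bmatrix},\ \begin{bmatrix} -2 & 0 \\ 0 & 1 \end{bmatrix} \right\} \right\rangle
\]
    Are these two matrices (vectors) also linearly independent? Begin with a relation
of linear dependence on $S$,
\begin{align*}
a_1 \begin{bmatrix} -2 & 1 \\ 1 & 0 \end{bmatrix} + a_2 \begin{bmatrix} -2 & 0 \\ 0 & 1 \end{bmatrix} &= \mathcal{O} \\
\begin{bmatrix} -2a_1 - 2a_2 & a_1 \\ a_1 & a_2 \end{bmatrix} &= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\end{align*}
   From the equality of the two entries in the last row, we conclude that $a_1 = 0$,
$a_2 = 0$. Thus the only possible relation of linear dependence is the trivial one, and
therefore $S$ is linearly independent (Definition LI). So $S$ is a basis for $Z$ (Definition B).
Finally, we can conclude that $\dim(Z) = 2$ (Definition D) since $S$ has two elements. △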
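
The parametrization found in Example DSM22 is easy to spot-check. This is only a sketch: a $2 \times 2$ matrix is stored as a tuple `(a, b, c, d)`, which is an assumed representation, and `in_Z` and `element` are hypothetical helper names. It confirms that both matrices of $S$ satisfy the two defining equations of $Z$, and that the same holds for a grid of integer choices of the free variables $c$ and $d$.

```python
def in_Z(a, b, c, d):
    """Check the two defining equations of Z for the matrix [[a, b], [c, d]]."""
    return 2*a + b + 3*c + 4*d == 0 and -a + 3*b - 5*c - 2*d == 0

def element(c, d):
    """Entries (a, b, c, d) of c*[[-2, 1], [1, 0]] + d*[[-2, 0], [0, 1]]."""
    return (-2*c - 2*d, c, c, d)

# The two matrices of S correspond to (c, d) = (1, 0) and (0, 1)
basis = [element(1, 0), element(0, 1)]

# A grid of sample elements built from the free variables c and d
samples = [element(c, d) for c in range(-3, 4) for d in range(-3, 4)]

print(basis)
print(all(in_Z(*m) for m in samples))
```

The last two entries of `element(c, d)` are just `c` and `d`, so the combination is the zero matrix only when both scalars are zero. This is the independence argument of the example.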
Example DSP4 Dimension of a subspace of $P_4$
In Example BSP4 we showed that
\[
S = \left\{ x - 2,\ x^2 - 4x + 4,\ x^3 - 6x^2 + 12x - 8,\ x^4 - 8x^3 + 24x^2 - 32x + 16 \right\}
\]
is a basis for $W = \{\, p(x) \mid p \in P_4,\ p(2) = 0 \,\}$. Thus, the dimension of $W$ is four,
$\dim(W) = 4$.
    Note that $\dim(P_4) = 5$ by Theorem DP, so $W$ is a subspace of dimension 4
within the vector space $P_4$ of dimension 5, illustrating the upcoming Theorem PSSD. △
Example DC Dimension of the crazy vector space
In Example BC we determined that the set $R = \{(1, 0), (6, 3)\}$ from the crazy
vector space, $C$ (Example CVS), is a basis for $C$. By Definition D we see that $C$ has
dimension 2, $\dim(C) = 2$. △
    It is possible for a vector space to have no finite bases, in which case we say
it has infinite dimension. Many of the best examples of this are vector spaces of
functions, which lead to constructions like Hilbert spaces. We will focus exclusively
on finite-dimensional vector spaces. OK, one infinite-dimensional example, and then
we will focus exclusively on finite-dimensional vector spaces.
Example VSPUD Vector space of polynomials with unbounded degree
Define the set $P$ by
\[
P = \{\, p \mid p(x) \text{ is a polynomial in } x \,\}
\]
     Our operations will be the same as those defined for $P_n$ (Example VSP).
     With no restrictions on the possible degrees of our polynomials, any finite set
that is a candidate for spanning $P$ will come up short. We will give a proof by
contradiction (Proof Technique CD). To this end, suppose that the dimension of $P$
is finite, say $\dim(P) = n$.
     The set $T = \{1, x, x^2, \ldots, x^n\}$ is a linearly independent set (check this!) containing
$n + 1$ polynomials from $P$. However, a basis of $P$ will be a spanning set of
$P$ containing $n$ vectors. This situation is a contradiction of Theorem SSLD, so our
assumption that $P$ has finite dimension is false. Thus, we say $\dim(P) = \infty$. △


Subsection RNM
Rank and Nullity of a Matrix
For any matrix, we have seen that we can associate several subspaces — the null
space (Theorem NSMS), the column space (Theorem CSMS), row space (Theorem
RSMS) and the left null space (Theorem LNSMS). As vector spaces, each of these
has a dimension, and for the null space and column space, they are important enough
to warrant names.
Definition NOM Nullity Of a Matrix
Suppose that $A$ is an $m \times n$ matrix. Then the nullity of $A$ is the dimension of the
null space of $A$, $n(A) = \dim(\mathcal{N}(A))$. □
Definition ROM Rank Of a Matrix
Suppose that $A$ is an $m \times n$ matrix. Then the rank of $A$ is the dimension of the
column space of $A$, $r(A) = \dim(\mathcal{C}(A))$. □
Example RNM Rank and nullity of a matrix
Let us compute the rank and nullity of
\[
A = \begin{bmatrix}
2 & -4 & -1 & 3 & 2 & 1 & -4 \\
1 & -2 & 0 & 0 & 4 & 0 & 1 \\
-2 & 4 & 1 & 0 & -5 & -4 & -8 \\
1 & -2 & 1 & 1 & 6 & 1 & -3 \\
2 & -4 & -1 & 1 & 4 & -2 & -1 \\
-1 & 2 & 3 & -1 & 6 & 3 & -1
\end{bmatrix}
\]
   To do this, we will first row-reduce the matrix since that will help us determine
bases for the null space and column space.
\[
\begin{bmatrix}
1 & -2 & 0 & 0 & 4 & 0 & 1 \\
0 & 0 & 1 & 0 & 3 & 0 & -2 \\
0 & 0 & 0 & 1 & -1 & 0 & -3 \\
0 & 0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]
    From this row-equivalent matrix in reduced row-echelon form we record $D = \{1, 3, 4, 6\}$
and $F = \{2, 5, 7\}$.
    For each index in $D$, Theorem BCS creates a single basis vector. In total the
basis will have 4 vectors, so the column space of $A$ will have dimension 4 and we
write $r(A) = 4$.
    For each index in $F$, Theorem BNS creates a single basis vector. In total the
basis will have 3 vectors, so the null space of $A$ will have dimension 3 and we write
$n(A) = 3$. △
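
The row reduction in Example RNM is tedious by hand, so here is a sketch that repeats it with exact rational arithmetic. The function `rref` is written for this illustration (it is not a routine from the text or from any particular library), and its pivot columns are 0-based, so they are shifted by one to compare with the set $D$ above.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): reduced row-echelon form and 0-based pivot columns."""
    M = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(n_cols):
        p = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if p is None:
            continue                      # column c has no pivot
        M[r], M[p] = M[p], M[r]
        lead = M[r][c]
        M[r] = [x / lead for x in M[r]]   # scale so the leading entry is 1
        for i in range(n_rows):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [x - f * y for x, y in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return M, pivots

A = [
    [2, -4, -1, 3, 2, 1, -4],
    [1, -2, 0, 0, 4, 0, 1],
    [-2, 4, 1, 0, -5, -4, -8],
    [1, -2, 1, 1, 6, 1, -3],
    [2, -4, -1, 1, 4, -2, -1],
    [-1, 2, 3, -1, 6, 3, -1],
]

R, pivots = rref(A)
D = [p + 1 for p in pivots]                                 # pivot columns, 1-based
F = [j + 1 for j in range(len(A[0])) if j not in pivots]    # remaining columns
rank, nullity = len(D), len(F)
print(D, F, rank, nullity)
```

Counting pivot columns gives the rank and counting the remaining columns gives the nullity, so the two numbers always add up to the number of columns.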
   There were no accidents or coincidences in the previous example — with the
row-reduced version of a matrix in hand, the rank and nullity are easy to compute.
Theorem CRN Computing Rank and Nullity
Suppose that $A$ is an $m \times n$ matrix and $B$ is a row-equivalent matrix in reduced
row-echelon form. Let $r$ denote the number of pivot columns (or the number of
nonzero rows). Then $r(A) = r$ and $n(A) = n - r$.
Proof. Theorem BCS provides a basis for the column space by choosing columns
of $A$ that have the same indices as the pivot columns of $B$. In the analysis of
$B$, each leading 1 provides one nonzero row and one pivot column. So there are $r$
column vectors in a basis for $\mathcal{C}(A)$.
    Theorem BNS provides a basis for the null space by creating basis vectors of the
null space of $A$ from entries of $B$, one basis vector for each column that is not a
pivot column. So there are $n - r$ column vectors in a basis for $\mathcal{N}(A)$. ∎
    Every archetype (Archetypes) that involves a matrix lists its rank and nullity.
You may have noticed as you studied the archetypes that the larger the column space
is the smaller the null space is. A simple corollary states this trade-off succinctly.
(See Proof Technique LC.)
Theorem RPNC Rank Plus Nullity is Columns
Suppose that A is an m × n matrix. Then r (A) + n (A) = n.
Proof. Let r be the number of nonzero rows in a row-equivalent matrix in reduced
row-echelon form. By Theorem CRN,
                          r (A) + n (A) = r + (n − r) = n
    When we first introduced r as our standard notation for the number of nonzero
 rows in a matrix in reduced row-echelon form you might have thought r stood for
“rows.” Not really — it stands for “rank”!
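    As a quick check of Theorem RPNC, consider the example that precedes Theorem
CRN, where D = {1, 3, 4, 6} and F = {2, 5, 7} were recorded for a matrix with 7
columns. Written out (a restatement of numbers already given, not a new result):

```latex
\[
r(A) + n(A) = |D| + |F| = 4 + 3 = 7 = n .
\]
```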

Subsection RNNM
Rank and Nullity of a Nonsingular Matrix
Let us take a look at the rank and nullity of a square matrix.
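Example RNSM Rank and nullity of a square matrix
The matrix
\[
E = \begin{bmatrix}
 0 &  4 & -1 &  2 &  2 &  3 &  1 \\
 2 & -2 &  1 & -1 &  0 & -4 & -3 \\
-2 & -3 &  9 & -3 &  9 & -1 &  9 \\
-3 & -4 &  9 &  4 & -1 &  6 & -2 \\
-3 & -4 &  6 & -2 &  5 &  9 & -4 \\
 9 & -3 &  8 & -2 & -4 &  2 &  4 \\
 8 &  2 &  2 &  9 &  3 &  0 &  9
\end{bmatrix}
\]
is row-equivalent to the matrix in reduced row-echelon form,
\[
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\]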
    With n = 7 columns and r = 7 nonzero rows Theorem CRN tells us the rank is
r (E) = 7 and the nullity is n (E) = 7 − 7 = 0.
  The value of either the nullity or the rank is enough to characterize a nonsingular
matrix.
Theorem RNNM Rank and Nullity of a Nonsingular Matrix
Suppose that A is a square matrix of size n. The following are equivalent.

   1. A is nonsingular.

   2. The rank of A is n, r (A) = n.

   3. The nullity of A is zero, n (A) = 0.

Proof. (1 ⇒ 2) Theorem CSNM says that if A is nonsingular then C(A) = C^n . If
C(A) = C^n , then the column space has dimension n by Theorem DCM, so the rank
of A is n.
   (2 ⇒ 3) Suppose r (A) = n. Then Theorem RPNC gives
                 n (A) = n − r (A)                     Theorem RPNC
                         =n−n                          Hypothesis
                         =0
   (3 ⇒ 1) Suppose n (A) = 0, so a basis for the null space of A is the empty set.
This implies that N (A) = {0} and Theorem NMTNS says A is nonsingular.         
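
   A small illustration of Theorem RNNM may help. The two 2 × 2 matrices below are
hypothetical examples chosen for easy arithmetic (they do not appear elsewhere in
the text). The first row-reduces to the identity, the second does not.

```latex
\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},
\qquad r(A) = 2,\quad n(A) = 2 - 2 = 0 \quad\text{(nonsingular)}
\]
\[
B = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}
\xrightarrow{\text{RREF}}
\begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix},
\qquad r(B) = 1,\quad n(B) = 2 - 1 = 1 \quad\text{(singular)}
\]
```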

   With a new equivalence for a nonsingular matrix, we can update our list of
equivalences (Theorem NME5) which now becomes a list requiring double digits to
number.
Theorem NME6 Nonsingular Matrix Equivalences, Round 6
Suppose that A is a square matrix of size n. The following are equivalent.

   1. A is nonsingular.

   2. A row-reduces to the identity matrix.

   3. The null space of A contains only the zero vector, N (A) = {0}.

   4. The linear system LS(A, b) has a unique solution for every possible choice of
      b.

   5. The columns of A are a linearly independent set.

   6. A is invertible.

   7. The column space of A is C^n , C(A) = C^n .

   8. The columns of A are a basis for C^n .

   9. The rank of A is n, r (A) = n.

 10. The nullity of A is zero, n (A) = 0.

Proof. Building on Theorem NME5 we can add two of the statements from Theorem
RNNM.                                                                       

Reading Questions

1. What is the dimension of the vector space P6 , the set of all polynomials of degree 6 or
   less?
2. How are the rank and nullity of a matrix related?
3. Explain why we might say that a nonsingular matrix has “full rank.”


Exercises
C20 The archetypes listed below are matrices, or systems of equations with coefficient
matrices. For each, compute the nullity and rank of the matrix. This information is listed
for each archetype (along with the number of columns in the matrix, so as to illustrate
Theorem RPNC), and notice how it could have been computed immediately after the
determination of the sets D and F associated with the reduced row-echelon form of the
matrix.
Archetype A, Archetype B, Archetype C, Archetype D/Archetype E, Archetype F, Archetype
G/Archetype H, Archetype I, Archetype J, Archetype K, Archetype L

C21†       Find the dimension of the subspace
\[
W = \left\{ \begin{bmatrix} a+b \\ a+c \\ a+d \\ d \end{bmatrix} \;\middle|\; a, b, c, d \in \mathbb{C} \right\}
\]
of C^4 .
C22†       Find the dimension of the subspace W = { a + bx + cx^2 + dx^3 | a + b + c + d = 0 }
of P3 .
C23†       Find the dimension of the subspace
\[
W = \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \;\middle|\; a + b = c,\ b + c = d,\ c + d = a \right\}
\]
of M22 .
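C30†       For the matrix A below, compute the dimension of the null space of A, dim (N (A)).
\[
A = \begin{bmatrix}
2 & -1 & -3 & 11 &  9 \\
1 &  2 &  1 & -7 & -3 \\
3 &  1 & -3 &  6 &  8 \\
2 &  1 &  2 & -5 & -3
\end{bmatrix}
\]

C31†       The set W below is a subspace of C^4 . Find the dimension of W .
\[
W = \left\langle \left\{
\begin{bmatrix} 2 \\ -3 \\ 4 \\ 1 \end{bmatrix},\
\begin{bmatrix} 3 \\ 0 \\ 1 \\ -2 \end{bmatrix},\
\begin{bmatrix} -4 \\ -3 \\ 2 \\ 5 \end{bmatrix}
\right\} \right\rangle
\]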

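C35†       Find the rank and nullity of the matrix
\[
A = \begin{bmatrix}
 1 & 0 & 1 \\
 1 & 2 & 2 \\
 2 & 1 & 1 \\
-1 & 0 & 1 \\
 1 & 1 & 2
\end{bmatrix}.
\]
C36†       Find the rank and nullity of the matrix
\[
A = \begin{bmatrix}
1 & 2 & 1 & 1 & 1 \\
1 & 3 & 2 & 0 & 4 \\
1 & 2 & 1 & 1 & 1
\end{bmatrix}.
\]
C37†       Find the rank and nullity of the matrix
\[
A = \begin{bmatrix}
 3 & 2 & 1 & 1 &  1 \\
 2 & 3 & 0 & 1 &  1 \\
-1 & 1 & 2 & 1 &  0 \\
 1 & 1 & 0 & 1 &  1 \\
 0 & 1 & 1 & 2 & -1
\end{bmatrix}.
\]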
C40 In Example LDP4 we determined that the set of five polynomials, T , is linearly
dependent by a simple invocation of Theorem SSLD. Prove that T is linearly dependent
from scratch, beginning with Definition LI.
M20† M22 is the vector space of 2 × 2 matrices. Let S22 denote the set of all 2 × 2
symmetric matrices. That is
                            S22 = { A ∈ M22 | A^t = A }


   1. Show that S22 is a subspace of M22 .

   2. Exhibit a basis for S22 and prove that it has the required properties.

   3. What is the dimension of S22 ?

M21† A 2 × 2 matrix B is upper triangular if [B]_{21} = 0. Let UT_2 be the set of all 2 × 2
upper triangular matrices. Then UT_2 is a subspace of the vector space of all 2 × 2 matrices,
M22 (you may assume this). Determine the dimension of UT_2 , providing all of the necessary
justifications for your answer.
Section PD
Properties of Dimension
Once the dimension of a vector space is known, then the determination of whether
or not a set of vectors is linearly independent, or if it spans the vector space, can
often be much easier. In this section we will state a workhorse theorem and then
apply it to the column space and row space of a matrix. It will also help us describe
a super-basis for C^m .

Subsection GT
Goldilocks’ Theorem
We begin with a useful theorem that we will need later, and in the proof of the
main theorem in this subsection. This theorem says that we can extend linearly
independent sets, one vector at a time, by adding vectors from outside the span of
the linearly independent set, all the while preserving the linear independence of the
set.
Theorem ELIS Extending Linearly Independent Sets
Suppose V is a vector space and S is a linearly independent set of vectors from V .
Suppose w is a vector such that w ∉ ⟨S⟩. Then the set S′ = S ∪ {w} is linearly
independent.

Proof. Suppose S = {v_1 , v_2 , v_3 , . . . , v_m } and begin with a relation of linear
dependence on S′,
          a_1 v_1 + a_2 v_2 + a_3 v_3 + · · · + a_m v_m + a_{m+1} w = 0.
    There are two cases to consider. First suppose that a_{m+1} = 0. Then the relation
of linear dependence on S′ becomes
          a_1 v_1 + a_2 v_2 + a_3 v_3 + · · · + a_m v_m = 0
and by the linear independence of the set S, we conclude that a_1 = a_2 = a_3 = · · · =
a_m = 0. So all of the scalars in the relation of linear dependence on S′ are zero.
   In the second case, suppose that a_{m+1} ≠ 0. Then the relation of linear dependence
on S′ becomes
          a_{m+1} w = −a_1 v_1 − a_2 v_2 − a_3 v_3 − · · · − a_m v_m
                  w = −(a_1 / a_{m+1}) v_1 − (a_2 / a_{m+1}) v_2 − (a_3 / a_{m+1}) v_3 − · · · − (a_m / a_{m+1}) v_m
    This equation expresses w as a linear combination of the vectors in S, contrary
to the assumption that w ∉ ⟨S⟩, so this case leads to a contradiction.
    The first case yielded only a trivial relation of linear dependence on S′ and the
second case led to a contradiction. So S′ is a linearly independent set since any
relation of linear dependence is trivial.
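
    Theorem ELIS can be watched in action on small vectors. The sketch below is an
illustration only: it assumes vectors with rational entries, tests membership in a
span by comparing ranks (a vector w lies in the span of S exactly when adding it
does not raise the rank), and uses hypothetical helper names (rref, rank, in_span,
is_linearly_independent) and hypothetical sample vectors.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row-echelon form, pivot column indices), computed exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def rank(vectors):
    """Dimension of the span of the given vectors (stored as rows)."""
    return len(rref(vectors)[1])


def is_linearly_independent(vectors):
    return rank(vectors) == len(vectors)


def in_span(S, w):
    """w is in the span of S exactly when appending w leaves the rank unchanged."""
    return rank(S + [w]) == rank(S)


S = [[1, 0, 0], [0, 1, 0]]   # a linearly independent set
w_outside = [1, 1, 1]        # not a combination of the vectors in S
w_inside = [2, 3, 0]         # equals 2*S[0] + 3*S[1]

print(is_linearly_independent(S + [w_outside]))  # extension allowed by Theorem ELIS
print(is_linearly_independent(S + [w_inside]))   # hypothesis fails, independence is lost
```

The second call shows why the hypothesis w ∉ ⟨S⟩ cannot be dropped.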

    In the story Goldilocks and the Three Bears, the young girl Goldilocks visits the
empty house of the three bears while out walking in the woods. One bowl of porridge
is too hot, the other too cold, the third is just right. One chair is too hard, one too
soft, the third is just right. So it is with sets of vectors — some are too big (linearly
dependent), some are too small (they do not span), and some are just right (bases).
Here is Goldilocks’ Theorem.
Theorem G Goldilocks
Suppose that V is a vector space of dimension t. Let S = {v1 , v2 , v3 , . . . , vm } be a
set of vectors from V . Then

   1. If m > t, then S is linearly dependent.

   2. If m < t, then S does not span V .

   3. If m = t and S is linearly independent, then S spans V .

  4. If m = t and S spans V , then S is linearly independent.

Proof. Let B be a basis of V . Since dim (V ) = t, Definition B and Theorem BIS
imply that B is a linearly independent set of t vectors that spans V .

   1. Suppose to the contrary that S is linearly independent. Then B is a smaller
      set of vectors that spans V . This contradicts Theorem SSLD.

   2. Suppose to the contrary that S does span V . Then B is a larger set of vectors
      that is linearly independent. This contradicts Theorem SSLD.

   3. Suppose to the contrary that S does not span V . Then we can choose a vector
      w such that w ∈ V and w ∉ ⟨S⟩. By Theorem ELIS, the set S′ = S ∪ {w} is
      again linearly independent. Then S′ is a set of m + 1 = t + 1 vectors that are
      linearly independent, while B is a set of t vectors that span V . This contradicts
      Theorem SSLD.

   4. Suppose to the contrary that S is linearly dependent. Then by Theorem DLDS
      (which can be upgraded, with no changes in the proof, to the setting of a
      general vector space), there is a vector in S, say v_k , that is equal to a linear
      combination of the other vectors in S. Let S′ = S \ {v_k }, the set of “other”
      vectors in S. Then it is easy to show that V = ⟨S⟩ = ⟨S′⟩. So S′ is a set of
      m − 1 = t − 1 vectors that spans V , while B is a set of t linearly independent
      vectors in V . This contradicts Theorem SSLD.

                                                                                         

    There is a tension in the construction of a basis. Make a set too big and you will
end up with relations of linear dependence among the vectors. Make a set too small
and you will not have enough raw material to span the entire vector space. Make a
set just the right size (the dimension) and you only need to have linear independence
or spanning, and you get the other property for free. These roughly-stated ideas are
made precise by Theorem G.
    The structure and proof of this theorem also deserve comment. The hypotheses
seem innocuous. We presume we know the dimension of the vector space in hand,
then we mostly just look at the size of the set S. From this we get big conclusions
about spanning and linear independence. Each of the four proofs relies on ultimately
contradicting Theorem SSLD, so in a way we could think of this entire theorem as a
corollary of Theorem SSLD. (See Proof Technique LC.) The proofs of the third and
fourth parts parallel each other in style: introduce w using Theorem ELIS or toss
vk using Theorem DLDS. Then obtain a contradiction to Theorem SSLD.
    Theorem G is useful in both concrete examples and as a tool in other proofs. We
will use it often to bypass verifying linear independence or spanning.
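    The way Theorem G short-circuits work can be sketched in code. The example below
is a hedged illustration, not a construction from the text: it takes V to be the
space of rational column vectors with t entries (so dim(V) = t), and the function
name goldilocks and its return strings are hypothetical. Only when m = t is any
row-reduction needed, and then one rank computation settles both properties.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row-echelon form, pivot column indices), computed exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def goldilocks(S, t):
    """Classify a list S of vectors, each with t entries, using Theorem G."""
    m = len(S)
    if m > t:
        return "too big: linearly dependent"      # part 1, no computation needed
    if m < t:
        return "too small: does not span"         # part 2, no computation needed
    # m == t: by parts 3 and 4, independence and spanning stand or fall together.
    if len(rref(S)[1]) == t:
        return "just right: a basis"
    return "right size, but neither independent nor spanning"


print(goldilocks([[1, 0], [0, 1], [1, 1]], 2))
print(goldilocks([[1, 0, 0], [0, 1, 0]], 3))
print(goldilocks([[1, 1], [1, -1]], 2))
print(goldilocks([[1, 2], [2, 4]], 2))
```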
Example BPR Bases for Pn , reprised
In Example BP we claimed that
   B = { 1, x, x^2 , x^3 , . . . , x^n }
   C = { 1, 1 + x, 1 + x + x^2 , 1 + x + x^2 + x^3 , . . . , 1 + x + x^2 + x^3 + · · · + x^n }
were both bases for Pn (Example VSP). Suppose we had first verified that B was
a basis, so we would then know that dim (Pn ) = n + 1. The size of C is n + 1, the
right size to be a basis. We could then verify that C is linearly independent. We
would not have to make any special efforts to prove that C spans Pn , since Theorem
G would allow us to conclude this property of C directly. Then we would be able to
say that C is a basis of Pn also.
Example BDM22 Basis by dimension in M22
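In Example DSM22 we showed that
\[
B = \left\{ \begin{bmatrix} -2 & 1 \\ 1 & 0 \end{bmatrix},\ \begin{bmatrix} -2 & 0 \\ 0 & 1 \end{bmatrix} \right\}
\]
is a basis for the subspace Z of M22 (Example VSM) given by
\[
Z = \left\{ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \;\middle|\; 2a + b + 3c + 4d = 0,\ -a + 3b - 5c - 2d = 0 \right\}
\]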
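   This tells us that dim (Z) = 2. In this example we will find another basis. We
can construct two new matrices in Z by forming linear combinations of the matrices
in B.
\[
2 \begin{bmatrix} -2 & 1 \\ 1 & 0 \end{bmatrix} + (-3) \begin{bmatrix} -2 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} 2 & 2 \\ 2 & -3 \end{bmatrix}
\]
\[
3 \begin{bmatrix} -2 & 1 \\ 1 & 0 \end{bmatrix} + 1 \begin{bmatrix} -2 & 0 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} -8 & 3 \\ 3 & 1 \end{bmatrix}
\]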
   Then the set
\[
C = \left\{ \begin{bmatrix} 2 & 2 \\ 2 & -3 \end{bmatrix},\ \begin{bmatrix} -8 & 3 \\ 3 & 1 \end{bmatrix} \right\}
\]
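has the right size to be a basis of Z. Let us see if it is a linearly independent set.
The relation of linear dependence
\[
a_1 \begin{bmatrix} 2 & 2 \\ 2 & -3 \end{bmatrix} + a_2 \begin{bmatrix} -8 & 3 \\ 3 & 1 \end{bmatrix} = \mathcal{O}
\]
\[
\begin{bmatrix} 2a_1 - 8a_2 & 2a_1 + 3a_2 \\ 2a_1 + 3a_2 & -3a_1 + a_2 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\]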
leads to the homogeneous system of equations whose coefficient matrix
\[
\begin{bmatrix} 2 & -8 \\ 2 & 3 \\ 2 & 3 \\ -3 & 1 \end{bmatrix}
\]
row-reduces to
\[
\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}
\]
    So with a_1 = a_2 = 0 as the only solution, the set is linearly independent. Now
we can apply Theorem G to see that C also spans Z and therefore is a second basis
for Z.
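    The arithmetic in Example BDM22 can be replayed mechanically. In this sketch
(an illustration under stated assumptions, not the text's own method) each 2 × 2
matrix is flattened into the list (a, b, c, d) of its entries read row by row, and the
helper names rref and combine are hypothetical. The code rebuilds the two matrices
of C from B, forms the 4 × 2 coefficient matrix of the relation of linear dependence,
and row-reduces it.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row-echelon form, pivot column indices), computed exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def combine(s, X, t, Y):
    """Entrywise linear combination s*X + t*Y of two flattened matrices."""
    return [s * x + t * y for x, y in zip(X, Y)]


# The two basis matrices of B, flattened as (a, b, c, d).
B1 = [-2, 1, 1, 0]
B2 = [-2, 0, 0, 1]

C1 = combine(2, B1, -3, B2)
C2 = combine(3, B1, 1, B2)

# Each entry position contributes one equation in the unknowns a_1, a_2.
coefficient_matrix = [list(pair) for pair in zip(C1, C2)]
reduced, pivots = rref(coefficient_matrix)

# Two pivot columns means a_1 = a_2 = 0 is the only solution.
independent = len(pivots) == 2
print(C1, C2, independent)
```

With independence confirmed and the size of C equal to dim(Z) = 2, Theorem G
supplies the spanning property without further computation.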
Example SVP4 Sets of vectors in P4
In Example BSP4 we showed that
    B = { x − 2, x^2 − 4x + 4, x^3 − 6x^2 + 12x − 8, x^4 − 8x^3 + 24x^2 − 32x + 16 }
is a basis for W = { p(x) | p ∈ P4 , p(2) = 0 }. So dim (W ) = 4.
    The set
    { 3x^2 − 5x − 2, 2x^2 − 7x + 6, x^3 − 2x^2 + x − 2 }
is a subset of W (check this) and it happens to be linearly independent (check this,
too). However, by Theorem G it cannot span W .
    The set
    { 3x^2 − 5x − 2, 2x^2 − 7x + 6, x^3 − 2x^2 + x − 2, −x^4 + 2x^3 + 5x^2 − 10x, x^4 − 16 }
is another subset of W (check this) and Theorem G tells us that it must be linearly
dependent.
    The set
    { x − 2, x^2 − 2x, x^3 − 2x^2 , x^4 − 2x^3 }
is a third subset of W (check this) and is linearly independent (check this). Since it
has the right size to be a basis, and is linearly independent, Theorem G tells us that
it also spans W , and therefore is a basis of W .
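    The "check this" requests in Example SVP4 are routine enough to script. The
following sketch is one possible way to do so and is not the text's own solution: it
assumes each polynomial is stored as its coefficient list [c0, c1, c2, c3, c4]
(constant term first), checks membership in W by evaluating at 2, and measures
linear independence by the rank of the coefficient rows. The names evaluate, rank
and the three set variables are hypothetical.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row-echelon form, pivot column indices), computed exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def evaluate(coeffs, x):
    """Value of c0 + c1*x + c2*x^2 + ... at the point x."""
    return sum(c * x**k for k, c in enumerate(coeffs))


def rank(polys):
    return len(rref(polys)[1])


first = [
    [-2, -5, 3, 0, 0],    # 3x^2 - 5x - 2
    [6, -7, 2, 0, 0],     # 2x^2 - 7x + 6
    [-2, 1, -2, 1, 0],    # x^3 - 2x^2 + x - 2
]
second = first + [
    [0, -10, 5, 2, -1],   # -x^4 + 2x^3 + 5x^2 - 10x
    [-16, 0, 0, 0, 1],    # x^4 - 16
]
third = [
    [-2, 1, 0, 0, 0],     # x - 2
    [0, -2, 1, 0, 0],     # x^2 - 2x
    [0, 0, -2, 1, 0],     # x^3 - 2x^2
    [0, 0, 0, -2, 1],     # x^4 - 2x^3
]

for name, polys in [("first", first), ("second", second), ("third", third)]:
    in_W = all(evaluate(p, 2) == 0 for p in polys)
    independent = rank(polys) == len(polys)
    print(name, "subset of W:", in_W, "linearly independent:", independent)
```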
   A simple consequence of Theorem G is the observation that a proper subspace
has strictly smaller dimension than its parent vector space. This may seem
intuitively obvious, but it still requires proof, and we will cite this result later.
Theorem PSSD Proper Subspaces have Smaller Dimension
Suppose that U and V are subspaces of the vector space W , such that U ⊊ V . Then
dim (U ) < dim (V ).

Proof. Suppose that dim (U ) = m and dim (V ) = t. Then U has a basis B of size
m. If m > t, then by Theorem G, B is linearly dependent, which is a contradiction.
If m = t, then by Theorem G, B spans V . Then U = ⟨B⟩ = V , also a contradiction.
All that remains is that m < t, which is the desired conclusion.

    The final theorem of this subsection is an extremely powerful tool for establishing
the equality of two sets that are subspaces. Notice that the hypotheses include the
equality of two integers (dimensions) while the conclusion is the equality of two
sets (subspaces). It is the extra “structure” of a vector space and its dimension that
makes possible this huge leap from an integer equality to a set equality.
Theorem EDYES Equal Dimensions Yields Equal Subspaces
Suppose that U and V are subspaces of the vector space W , such that U ⊆ V and
dim (U ) = dim (V ). Then U = V .

Proof. We give a proof by contradiction (Proof Technique CD). Suppose to the
contrary that U ≠ V . Since U ⊆ V , there must be a vector v such that v ∈ V and
v ∉ U . Let B = {u_1 , u_2 , u_3 , . . . , u_t } be a basis for U . Since v ∉ U = ⟨B⟩, Theorem
ELIS says the set C = B ∪ {v} = {u_1 , u_2 , u_3 , . . . , u_t , v} is a linearly independent set
of t + 1 vectors in V . However, by hypothesis, V has the same dimension as U (namely t)
and therefore Theorem G says that C is too big to be linearly independent. This
contradiction shows that U = V .

Subsection RT
Ranks and Transposes
We now prove one of the most surprising theorems about matrices. Notice the paucity
of hypotheses compared to the precision of the conclusion.
Theorem RMRT Rank of a Matrix is the Rank of the Transpose
Suppose A is an m × n matrix. Then r (A) = r (A^t ).

Proof. Suppose we row-reduce A to the matrix B in reduced row-echelon form, and
B has r nonzero rows. The quantity r tells us three things about B: the number of
leading 1’s, the number of nonzero rows and the number of pivot columns. For this
proof we will be interested in the latter two.
    Theorem BRS and Theorem BCS each has a conclusion that provides a basis, for
the row space and the column space, respectively. In each case, these bases contain
r vectors. This observation makes the following go.

                 r (A) = dim (C(A))                    Definition ROM
                      = r                               Theorem BCS
                      = dim (R(A))                     Theorem BRS
                      = dim (C(A^t ))                  Theorem CSRST
                      = r (A^t )                       Definition ROM
      Jacob Linenthal helped with this proof.                                            

   This says that the row space and the column space of a matrix have the same
dimension, which should be very surprising. It does not say that the column space
and the row space are identical. Indeed, if the matrix is not square, then the sizes
(number of slots) of the vectors in each space are different, so the sets are not even
comparable.
    It is not hard to construct by yourself examples of matrices that illustrate Theorem
RMRT, since it applies equally well to any matrix. Grab a matrix, row-reduce it,
count the nonzero rows or the number of pivot columns. That is the rank. Transpose
the matrix, row-reduce that, count the nonzero rows or the pivot columns. That is
the rank of the transpose. The theorem says the two will be equal. Every time. Here
is an example anyway.
Example RRTI Rank, rank of transpose, Archetype I
Archetype I has a 4 × 7 coefficient matrix which row-reduces to
\[
\begin{bmatrix}
1 & 4 & 0 & 0 & 2 &  1 & -3 \\
0 & 0 & 1 & 0 & 1 & -3 &  5 \\
0 & 0 & 0 & 1 & 2 & -6 &  6 \\
0 & 0 & 0 & 0 & 0 &  0 &  0
\end{bmatrix}
\]
so the rank is 3. Row-reducing the transpose yields
\[
\begin{bmatrix}
1 & 0 & 0 & -\frac{31}{7} \\
0 & 1 & 0 & \frac{12}{7} \\
0 & 0 & 1 & \frac{13}{7} \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0
\end{bmatrix}
\]
demonstrating that the rank of the transpose is also 3.
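
   The recipe "row-reduce, count pivots, transpose, repeat" is easy to automate.
The sketch below is an illustration under stated assumptions: it uses exact rational
arithmetic, hypothetical helper names (rref, rank, transpose), and as input the
row-reduced 4 × 7 matrix displayed in Example RRTI rather than the original
coefficient matrix of Archetype I. Row-equivalent matrices have the same rank, so the
first count is the rank from the example, while the second call is a fresh instance of
Theorem RMRT applied to the row-reduced matrix itself.

```python
from fractions import Fraction


def rref(rows):
    """Return (reduced row-echelon form, pivot column indices), computed exactly."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots


def rank(rows):
    return len(rref(rows)[1])


def transpose(rows):
    return [list(col) for col in zip(*rows)]


# Reduced row-echelon form shown in Example RRTI.
B = [
    [1, 4, 0, 0, 2, 1, -3],
    [0, 0, 1, 0, 1, -3, 5],
    [0, 0, 0, 1, 2, -6, 6],
    [0, 0, 0, 0, 0, 0, 0],
]

print(rank(B), rank(transpose(B)))
```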

Subsection DFS
Dimension of Four Subspaces
That the rank of a matrix equals the rank of its transpose is a fundamental and
surprising result. However, applying Theorem FS we can easily determine the
dimension of all four fundamental subspaces associated with a matrix.
Theorem DFS Dimensions of Four Subspaces
Suppose that A is an m × n matrix, and B is a row-equivalent matrix in reduced
row-echelon form with r nonzero rows. Then

  1. dim (N (A)) = n − r

  2. dim (C(A)) = r

  3. dim (R(A)) = r

  4. dim (L(A)) = m − r

Proof. If A row-reduces to a matrix in reduced row-echelon form with r nonzero
rows, then the matrix C of extended echelon form (Definition EEF) will be an r × n
matrix in reduced row-echelon form with no zero rows and r pivot columns (Theorem
PEEF). Similarly, the matrix L of extended echelon form (Definition EEF) will be
an (m − r) × m matrix in reduced row-echelon form with no zero rows and m − r
pivot columns (Theorem PEEF).

              dim (N (A)) = dim (N (C))                  Theorem FS
                           =n−r                          Theorem BNS


               dim (C(A)) = dim (N (L))                  Theorem FS
                           = m − (m − r)                 Theorem BNS
                           =r


              dim (R(A)) = dim (R(C))                    Theorem FS
                             =r                              Theorem BRS


                dim (L(A)) = dim (R(L))                      Theorem FS
                             =m−r                            Theorem BRS


                                                                                          
    There are many different ways to state and prove this result, and indeed, the
equality of the dimensions of the column space and row space is just a slight expansion
of Theorem RMRT. However, we have restricted our techniques to applying Theorem
FS and then determining dimensions with bases provided by Theorem BNS and
Theorem BRS. This provides an appealing symmetry to the results and the proof.
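    For a concrete instance of Theorem DFS, take the coefficient matrix of Archetype
I from Example RRTI, with m = 4 rows, n = 7 columns and r = 3 nonzero rows in its
reduced row-echelon form. Substituting these values (a worked restatement, using
only numbers already given in that example):

```latex
\begin{align*}
\dim(\mathcal{N}(A)) &= n - r = 7 - 3 = 4, &
\dim(\mathcal{C}(A)) &= r = 3, \\
\dim(\mathcal{R}(A)) &= r = 3, &
\dim(\mathcal{L}(A)) &= m - r = 4 - 3 = 1.
\end{align*}
```

The first two dimensions sum to n = 7 and the last two sum to m = 4, as the formulas
of the theorem require.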

Reading Questions

1. Why does Theorem G have the title it does?
2. Why is Theorem RMRT so surprising?
3. Row-reduce the matrix A to reduced row-echelon form. Without any further
   computations, compute the dimensions of the four subspaces, (a) N (A), (b) C(A),
   (c) R(A) and (d) L(A).
\[
A = \begin{bmatrix}
1 & -1 &  2 &  8 &  5 \\
1 &  1 &  1 &  4 & -1 \\
0 &  2 & -3 & -8 & -6 \\
2 &  0 &  1 &  8 &  4
\end{bmatrix}
\]

Exercises
C10    Example SVP4 leaves several details for the reader to check. Verify these five claims.
C40† Determine if the set T = { x^2 − x + 5, 4x^3 − x^2 + 5x, 3x + 2 } spans the vector
space of polynomials with degree 4 or less, P4 . (Compare the solution to this exercise with
Solution LISS.C40.)
T05 Trivially, if U and V are two subspaces of W with U = V , then dim (U ) = dim (V ).
Combine this fact, Theorem PSSD, and Theorem EDYES all into one grand combined
theorem. You might look to Theorem PIP for stylistic inspiration. (Notice this problem
does not ask you to prove anything. It just asks you to roll up three theorems into one
compact, logically equivalent statement.)
T10 Prove the following theorem, which could be viewed as a reformulation of parts (3) and
(4) of Theorem G, or more appropriately as a corollary of Theorem G (Proof Technique LC).

Suppose V is a vector space and S is a subset of V such that the number of vectors
in S equals the dimension of V . Then S is linearly independent if and only if S spans V .
T15 Suppose that A is an m × n matrix and let min(m, n) denote the minimum of m
and n. Prove that r (A) ≤ min(m, n). (If m and n are two numbers, then min(m, n) stands
for the number that is the smaller of the two. For example min(4, 6) = 4.)
T20† Suppose that A is an m × n matrix and b ∈ C^m . Prove that the linear system
LS(A, b) is consistent if and only if r (A) = r ([ A | b]).
T25 Suppose that V is a vector space with finite dimension. Let W be any subspace of
V . Prove that W has finite dimension.
T33† Part of Exercise B.T50 is the half of the proof where we assume the matrix A is
nonsingular and prove that a set is a basis. In Solution B.T50 we proved directly that the
set was both linearly independent and a spanning set. Shorten this part of the proof by
applying Theorem G. Be careful, there is one subtlety.
T60† Suppose that W is a vector space with dimension 5, and U and V are subspaces of
W , each of dimension 3. Prove that U ∩ V contains a nonzero vector. State a more general
result.
Chapter D
Determinants

The determinant is a function that takes a square matrix as an input and produces
a scalar as an output. So unlike a vector space, it is not an algebraic structure.
However, it has many beneficial properties for studying vector spaces, matrices and
systems of equations, so it is hard to ignore (though some have tried). While the
properties of a determinant can be very useful, they are also complicated to prove.



Section DM
Determinant of a Matrix
Before we define the determinant of a matrix, we take a slight detour to introduce
elementary matrices. These will bring us back to the beginning of the course and
our old friend, row operations.



Subsection EM
Elementary Matrices
Elementary matrices are very simple, as you might have suspected from their name.
Their purpose is to effect row operations (Definition RO) on a matrix through matrix
multiplication (Definition MM). Their definitions look much more complicated than
they really are, so be sure to skip over them on your first reading and head right for
the explanation that follows and the first example.
Definition ELEM Elementary Matrices



  1. For i ≠ j, E_{i,j} is the square matrix of size n with
\[
[E_{i,j}]_{k\ell} =
\begin{cases}
0 & k \neq i,\ k \neq j,\ \ell \neq k \\
1 & k \neq i,\ k \neq j,\ \ell = k \\
0 & k = i,\ \ell \neq j \\
1 & k = i,\ \ell = j \\
0 & k = j,\ \ell \neq i \\
1 & k = j,\ \ell = i
\end{cases}
\]

  2. For α ≠ 0, E_i (α) is the square matrix of size n with
\[
[E_i(\alpha)]_{k\ell} =
\begin{cases}
0 & \ell \neq k \\
1 & k \neq i,\ \ell = k \\
\alpha & k = i,\ \ell = i
\end{cases}
\]

  3. For i ≠ j, Ei,j (α) is the square matrix of size n with
\[
[E_{i,j}(\alpha)]_{k\ell} =
\begin{cases}
0 & k \neq j,\ \ell \neq k \\
1 & k \neq j,\ \ell = k \\
0 & k = j,\ \ell \neq i,\ \ell \neq j \\
1 & k = j,\ \ell = j \\
\alpha & k = j,\ \ell = i
\end{cases}
\]
□
    Again, these matrices are not as complicated as their definitions suggest, since
they are just small perturbations of the n × n identity matrix (Definition IM). Ei,j
is the identity matrix with rows (or columns) i and j trading places, Ei (α) is the
identity matrix where the diagonal entry in row i and column i has been replaced
by α, and Ei,j (α) is the identity matrix where the entry in row j and column i
has been replaced by α. (Yes, those subscripts look backwards in the description of
Ei,j (α)). Notice that our notation makes no reference to the size of the elementary
matrix, since this will always be apparent from the context, or unimportant.
    The raison d’être for elementary matrices is to “do” row operations on matrices
with matrix multiplication. So here is an example where we will both see some
elementary matrices and see how they accomplish row operations when used with
matrix multiplication.
Example EMRO Elementary matrices and row operations
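We will perform a sequence of row operations (Definition RO) on the 3 × 4 matrix
A, while also multiplying the matrix on the left by the appropriate 3 × 3 elementary
matrix.
\[
A = \begin{bmatrix} 2 & 1 & 3 & 1 \\ 1 & 3 & 2 & 4 \\ 5 & 0 & 3 & 1 \end{bmatrix}
\]
\begin{align*}
R_1 \leftrightarrow R_3 &: \begin{bmatrix} 5 & 0 & 3 & 1 \\ 1 & 3 & 2 & 4 \\ 2 & 1 & 3 & 1 \end{bmatrix}
& E_{1,3} &: \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 2 & 1 & 3 & 1 \\ 1 & 3 & 2 & 4 \\ 5 & 0 & 3 & 1 \end{bmatrix}
= \begin{bmatrix} 5 & 0 & 3 & 1 \\ 1 & 3 & 2 & 4 \\ 2 & 1 & 3 & 1 \end{bmatrix} \\
2R_2 &: \begin{bmatrix} 5 & 0 & 3 & 1 \\ 2 & 6 & 4 & 8 \\ 2 & 1 & 3 & 1 \end{bmatrix}
& E_{2}(2) &: \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 5 & 0 & 3 & 1 \\ 1 & 3 & 2 & 4 \\ 2 & 1 & 3 & 1 \end{bmatrix}
= \begin{bmatrix} 5 & 0 & 3 & 1 \\ 2 & 6 & 4 & 8 \\ 2 & 1 & 3 & 1 \end{bmatrix} \\
2R_3 + R_1 &: \begin{bmatrix} 9 & 2 & 9 & 3 \\ 2 & 6 & 4 & 8 \\ 2 & 1 & 3 & 1 \end{bmatrix}
& E_{3,1}(2) &: \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 5 & 0 & 3 & 1 \\ 2 & 6 & 4 & 8 \\ 2 & 1 & 3 & 1 \end{bmatrix}
= \begin{bmatrix} 9 & 2 & 9 & 3 \\ 2 & 6 & 4 & 8 \\ 2 & 1 & 3 & 1 \end{bmatrix}
\end{align*}
△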
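The computations of Example EMRO can be replayed in a few lines of Python. This is only a sketch: the helper names (`identity`, `elem_swap`, `elem_scale`, `elem_add`, `mat_mul`) are our own choices for illustration, matrices are stored as lists of rows, and the indices i and j are 1-based to match Definition ELEM.

```python
def identity(n):
    """The n x n identity matrix as a list of rows."""
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def elem_swap(n, i, j):
    """E_{i,j}: the identity with rows i and j trading places (1-based)."""
    E = identity(n)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E

def elem_scale(n, i, alpha):
    """E_i(alpha): the identity with entry (i, i) replaced by alpha != 0."""
    if alpha == 0:
        raise ValueError("alpha must be nonzero")
    E = identity(n)
    E[i - 1][i - 1] = alpha
    return E

def elem_add(n, i, j, alpha):
    """E_{i,j}(alpha): the identity with the entry in row j, column i set to alpha."""
    if i == j:
        raise ValueError("i and j must be different")
    E = identity(n)
    E[j - 1][i - 1] = alpha
    return E

def mat_mul(X, Y):
    """Matrix product XY, entry by entry."""
    return [[sum(X[r][p] * Y[p][c] for p in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]

A = [[2, 1, 3, 1], [1, 3, 2, 4], [5, 0, 3, 1]]
step1 = mat_mul(elem_swap(3, 1, 3), A)          # R1 <-> R3
step2 = mat_mul(elem_scale(3, 2, 2), step1)     # 2 R2
step3 = mat_mul(elem_add(3, 3, 1, 2), step2)    # 2 R3 + R1
print(step3)  # [[9, 2, 9, 3], [2, 6, 4, 8], [2, 1, 3, 1]]
```

Each product is taken with the elementary matrix on the left, which is what makes it act on rows rather than columns.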
   The next three theorems establish that each elementary matrix effects a row
operation via matrix multiplication.
Theorem EMDRO Elementary Matrices Do Row Operations
Suppose that A is an m×n matrix, and B is a matrix of the same size that is obtained
from A by a single row operation (Definition RO). Then there is an elementary
matrix of size m that will convert A to B via matrix multiplication on the left. More
precisely,

  1. If the row operation swaps rows i and j, then B = Ei,j A.

  2. If the row operation multiplies row i by α, then B = Ei (α) A.

  3. If the row operation multiplies row i by α and adds the result to row j, then
     B = Ei,j (α) A.

Proof. In each of the three conclusions, performing the row operation on A will
create the matrix B where only one or two rows will have changed. So we will
establish the equality of the matrix entries row by row, first for the unchanged rows,
then for the changed rows, showing in each case that the result of the matrix product
is the same as the result of the row operation. Here we go.
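    Row k of the product Ei,j A, where k ≠ i, k ≠ j, is unchanged from A,
\begin{align*}
[E_{i,j}A]_{k\ell} &= \sum_{p=1}^{m} [E_{i,j}]_{kp}[A]_{p\ell} && \text{Theorem EMP} \\
&= [E_{i,j}]_{kk}[A]_{k\ell} + \sum_{\substack{p=1 \\ p \neq k}}^{m} [E_{i,j}]_{kp}[A]_{p\ell} && \text{Property CACN} \\
&= 1\,[A]_{k\ell} + \sum_{\substack{p=1 \\ p \neq k}}^{m} 0\,[A]_{p\ell} && \text{Definition ELEM} \\
&= [A]_{k\ell}
\end{align*}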
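   Row i of the product Ei,j A is row j of A,
\begin{align*}
[E_{i,j}A]_{i\ell} &= \sum_{p=1}^{m} [E_{i,j}]_{ip}[A]_{p\ell} && \text{Theorem EMP} \\
&= [E_{i,j}]_{ij}[A]_{j\ell} + \sum_{\substack{p=1 \\ p \neq j}}^{m} [E_{i,j}]_{ip}[A]_{p\ell} && \text{Property CACN} \\
&= 1\,[A]_{j\ell} + \sum_{\substack{p=1 \\ p \neq j}}^{m} 0\,[A]_{p\ell} && \text{Definition ELEM} \\
&= [A]_{j\ell}
\end{align*}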
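   Row j of the product Ei,j A is row i of A,
\begin{align*}
[E_{i,j}A]_{j\ell} &= \sum_{p=1}^{m} [E_{i,j}]_{jp}[A]_{p\ell} && \text{Theorem EMP} \\
&= [E_{i,j}]_{ji}[A]_{i\ell} + \sum_{\substack{p=1 \\ p \neq i}}^{m} [E_{i,j}]_{jp}[A]_{p\ell} && \text{Property CACN} \\
&= 1\,[A]_{i\ell} + \sum_{\substack{p=1 \\ p \neq i}}^{m} 0\,[A]_{p\ell} && \text{Definition ELEM} \\
&= [A]_{i\ell}
\end{align*}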
   So the matrix product Ei,j A is the same as the row operation that swaps rows i
and j.
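   Row k of the product Ei (α) A, where k ≠ i, is unchanged from A,
\begin{align*}
[E_{i}(\alpha)A]_{k\ell} &= \sum_{p=1}^{m} [E_{i}(\alpha)]_{kp}[A]_{p\ell} && \text{Theorem EMP} \\
&= [E_{i}(\alpha)]_{kk}[A]_{k\ell} + \sum_{\substack{p=1 \\ p \neq k}}^{m} [E_{i}(\alpha)]_{kp}[A]_{p\ell} && \text{Property CACN} \\
&= 1\,[A]_{k\ell} + \sum_{\substack{p=1 \\ p \neq k}}^{m} 0\,[A]_{p\ell} && \text{Definition ELEM} \\
&= [A]_{k\ell}
\end{align*}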
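   Row i of the product Ei (α) A is α times row i of A,
\begin{align*}
[E_{i}(\alpha)A]_{i\ell} &= \sum_{p=1}^{m} [E_{i}(\alpha)]_{ip}[A]_{p\ell} && \text{Theorem EMP} \\
&= [E_{i}(\alpha)]_{ii}[A]_{i\ell} + \sum_{\substack{p=1 \\ p \neq i}}^{m} [E_{i}(\alpha)]_{ip}[A]_{p\ell} && \text{Property CACN} \\
&= \alpha\,[A]_{i\ell} + \sum_{\substack{p=1 \\ p \neq i}}^{m} 0\,[A]_{p\ell} && \text{Definition ELEM} \\
&= \alpha\,[A]_{i\ell}
\end{align*}
   So the matrix product Ei (α) A is the same as the row operation that
multiplies row i by α.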
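   Row k of the product Ei,j (α) A, where k ≠ j, is unchanged from A,
\begin{align*}
[E_{i,j}(\alpha)A]_{k\ell} &= \sum_{p=1}^{m} [E_{i,j}(\alpha)]_{kp}[A]_{p\ell} && \text{Theorem EMP} \\
&= [E_{i,j}(\alpha)]_{kk}[A]_{k\ell} + \sum_{\substack{p=1 \\ p \neq k}}^{m} [E_{i,j}(\alpha)]_{kp}[A]_{p\ell} && \text{Property CACN} \\
&= 1\,[A]_{k\ell} + \sum_{\substack{p=1 \\ p \neq k}}^{m} 0\,[A]_{p\ell} && \text{Definition ELEM} \\
&= [A]_{k\ell}
\end{align*}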
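    Row j of the product Ei,j (α) A is row j of A plus α times row i of A,
\begin{align*}
[E_{i,j}(\alpha)A]_{j\ell} &= \sum_{p=1}^{m} [E_{i,j}(\alpha)]_{jp}[A]_{p\ell} && \text{Theorem EMP} \\
&= [E_{i,j}(\alpha)]_{jj}[A]_{j\ell} + [E_{i,j}(\alpha)]_{ji}[A]_{i\ell} + \sum_{\substack{p=1 \\ p \neq j,i}}^{m} [E_{i,j}(\alpha)]_{jp}[A]_{p\ell} && \text{Property CACN} \\
&= 1\,[A]_{j\ell} + \alpha\,[A]_{i\ell} + \sum_{\substack{p=1 \\ p \neq j,i}}^{m} 0\,[A]_{p\ell} && \text{Definition ELEM} \\
&= [A]_{j\ell} + \alpha\,[A]_{i\ell}
\end{align*}
   So the matrix product Ei,j (α) A is the same as the row operation that multiplies
row i by α and adds the result to row j. ■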
      Later in this section we will need two facts about elementary matrices.
Theorem EMN Elementary Matrices are Nonsingular
If E is an elementary matrix, then E is nonsingular.
Proof. We show that we can row-reduce each elementary matrix to the identity
matrix. Given an elementary matrix of the form Ei,j , perform the row operation
that swaps row j with row i. Given an elementary matrix of the form Ei (α), with
α ≠ 0, perform the row operation that multiplies row i by 1/α. Given an elementary
matrix of the form Ei,j (α), with α ≠ 0, perform the row operation that multiplies
row i by −α and adds it to row j. In each case, the result of the single row operation
is the identity matrix. So each elementary matrix is row-equivalent to the identity
matrix, and by Theorem NMRRI is nonsingular. ■
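The single row operations used in this proof can be checked numerically: each one is itself carried out by an elementary matrix, so every elementary matrix has an elementary inverse. The sketch below assumes our own helper names and uses Python's `fractions.Fraction` so that 1/α is exact; the size n = 4 and the value α = −6 are arbitrary choices for illustration.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]

def mat_mul(X, Y):
    return [[sum(X[r][p] * Y[p][c] for p in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]

def elem_swap(n, i, j):
    """E_{i,j} with 1-based i, j."""
    E = identity(n)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E

def elem_scale(n, i, alpha):
    """E_i(alpha), alpha nonzero."""
    E = identity(n)
    E[i - 1][i - 1] = Fraction(alpha)
    return E

def elem_add(n, i, j, alpha):
    """E_{i,j}(alpha): alpha sits in row j, column i."""
    E = identity(n)
    E[j - 1][i - 1] = Fraction(alpha)
    return E

n, alpha = 4, Fraction(-6)
I = identity(n)

# Swapping rows i and j a second time restores the identity.
undo_swap = mat_mul(elem_swap(n, 1, 3), elem_swap(n, 1, 3))
# Multiplying row i by 1/alpha undoes multiplying it by alpha.
undo_scale = mat_mul(elem_scale(n, 2, 1 / alpha), elem_scale(n, 2, alpha))
# Adding -alpha times row i to row j undoes adding alpha times row i.
undo_add = mat_mul(elem_add(n, 2, 3, -alpha), elem_add(n, 2, 3, alpha))

print(undo_swap == I, undo_scale == I, undo_add == I)  # True True True
```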
   Notice that we have now made use of the nonzero restriction on α in the definition
of Ei (α). One more key property of elementary matrices.
Theorem NMPEM Nonsingular Matrices are Products of Elementary Matrices
Suppose that A is a nonsingular matrix. Then there exist elementary matrices
E1 , E2 , E3 , . . . , Et so that A = E1 E2 E3 . . . Et .
Proof. Since A is nonsingular, it is row-equivalent to the identity matrix by Theorem
NMRRI, so there is a sequence of t row operations that converts I to A. For each of
these row operations, form the associated elementary matrix from Theorem EMDRO
and denote these matrices by E1 , E2 , E3 , . . . , Et . Applying the first row operation
to I yields the matrix E1 I. The second row operation yields E2 (E1 I), and the third
row operation creates E3 E2 E1 I. The result of the full sequence of t row operations
will yield A, so
                        A = Et . . . E3 E2 E1 I = Et . . . E3 E2 E1
   Other than the cosmetic matter of re-indexing these elementary matrices in the
opposite order, this is the desired result. ■
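Theorem NMPEM can be made concrete with a short program. The sketch below takes a slightly different route from the proof, and that is an assumption of ours: instead of tracking operations that convert I to A, it row-reduces A to I and records the inverse of each row operation as an elementary matrix, so the recorded matrices multiply, in order, to A. The function names are hypothetical and indices inside the code are 0-based.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]

def mat_mul(X, Y):
    return [[sum(X[r][p] * Y[p][c] for p in range(len(Y)))
             for c in range(len(Y[0]))] for r in range(len(X))]

def elementary_factors(A):
    """Return elementary matrices [E1, ..., Et] with A = E1 E2 ... Et.

    Raises ValueError if A is singular.
    """
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    factors = []
    for c in range(n):
        # Find a row at or below row c with a nonzero entry in column c.
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular")
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            E = identity(n)
            E[c], E[pivot] = E[pivot], E[c]   # a swap is its own inverse
            factors.append(E)
        if M[c][c] != 1:
            alpha = M[c][c]
            M[c] = [x / alpha for x in M[c]]  # applied: scale row c by 1/alpha
            E = identity(n)
            E[c][c] = alpha                   # recorded inverse: scale by alpha
            factors.append(E)
        for r in range(n):
            if r != c and M[r][c] != 0:
                alpha = M[r][c]
                # applied: add -alpha times row c to row r
                M[r] = [x - alpha * y for x, y in zip(M[r], M[c])]
                E = identity(n)
                E[r][c] = alpha               # recorded inverse: add +alpha times row c
                factors.append(E)
    return factors

def product(mats, n):
    P = identity(n)
    for E in mats:
        P = mat_mul(P, E)
    return P

A = [[3, 2, -1], [4, 1, 6], [-3, -1, 2]]
factors = elementary_factors(A)
print(product(factors, 3) == A)  # True
```

Recording inverses as we go avoids the re-indexing step mentioned at the end of the proof, since the list already comes out in the order E1, E2, ..., Et.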


Subsection DD
Definition of the Determinant
We will now turn to the definition of a determinant and do some sample computations.
The definition of the determinant function is recursive, that is, the determinant of
a large matrix is defined in terms of the determinant of smaller matrices. To this
end, we will make a few definitions.
Definition SM SubMatrix
Suppose that A is an m × n matrix. Then the submatrix A (i|j) is the (m − 1) × (n − 1)
matrix obtained from A by removing row i and column j. □
Example SS Some submatrices
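For the matrix
\[
A = \begin{bmatrix} 1 & -2 & 3 & 9 \\ 4 & -2 & 0 & 1 \\ 3 & 5 & 2 & 1 \end{bmatrix}
\]
we have the submatrices
\[
A(2|3) = \begin{bmatrix} 1 & -2 & 9 \\ 3 & 5 & 1 \end{bmatrix}
\qquad
A(3|1) = \begin{bmatrix} -2 & 3 & 9 \\ -2 & 0 & 1 \end{bmatrix}
\]
△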
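Forming a submatrix is mechanical enough to express in one line of Python. In this sketch the function name `submatrix` is our own, the matrix is a list of rows, and i and j are 1-based as in Definition SM.

```python
def submatrix(A, i, j):
    """A(i|j): the matrix A with row i and column j removed (1-based)."""
    return [row[:j - 1] + row[j:]
            for r, row in enumerate(A, start=1) if r != i]

A = [[1, -2, 3, 9],
     [4, -2, 0, 1],
     [3, 5, 2, 1]]

print(submatrix(A, 2, 3))  # [[1, -2, 9], [3, 5, 1]]
print(submatrix(A, 3, 1))  # [[-2, 3, 9], [-2, 0, 1]]
```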
Definition DM Determinant of a Matrix
Suppose A is a square matrix. Then its determinant, det (A) = |A|, is an element
of C defined recursively by:

   1. If A is a 1 × 1 matrix, then det (A) = [A]11 .

   2. If A is a matrix of size n with n ≥ 2, then
\begin{align*}
\det(A) = {} & [A]_{11}\det(A(1|1)) - [A]_{12}\det(A(1|2)) + [A]_{13}\det(A(1|3)) - {} \\
& [A]_{14}\det(A(1|4)) + \cdots + (-1)^{n+1}[A]_{1n}\det(A(1|n))
\end{align*}
□
    So to compute the determinant of a 5 × 5 matrix we must build 5 submatrices,
each of size 4. To compute the determinant of each of these 4 × 4 matrices we need
to create 4 submatrices each, these now of size 3, and so on. To compute the
determinant of a 10 × 10 matrix would require computing the determinant of
10! = 10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 = 3,628,800 matrices of size 1 × 1. Fortunately
there are better ways. However, this does suggest an excellent computer programming
exercise: write a recursive procedure to compute a determinant.
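One possible response to that exercise, offered as a minimal sketch rather than a recommended algorithm, follows Definition DM literally. The name `det` and the list-of-rows representation are our assumptions, and the running time grows like n!, exactly as the count above warns.

```python
def det(A):
    """Determinant by Definition DM: recursive expansion about the first row."""
    n = len(A)
    if n == 0 or any(len(row) != n for row in A):
        raise ValueError("matrix must be square and nonempty")
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        # Column index j is 0-based here, so the sign (-1)^(1 + (j+1))
        # from the definition simplifies to (-1)^j.
        sub = [row[:j] + row[j + 1:] for row in A[1:]]   # A(1 | j+1)
        total += (-1) ** j * A[0][j] * det(sub)
    return total

print(det([[7]]))                               # 7
print(det([[1, 2], [3, 4]]))                    # -2
print(det([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))   # 18
```

The three small matrices are chosen here only for illustration; the last value can be confirmed by hand as 2(12 − 2) − 0 + 1(1 − 3) = 18.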
    Let us compute the determinant of a reasonably sized matrix by hand.
Example D33M Determinant of a 3 × 3 matrix
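Suppose that we have the 3 × 3 matrix
\[
A = \begin{bmatrix} 3 & 2 & -1 \\ 4 & 1 & 6 \\ -3 & -1 & 2 \end{bmatrix}
\]
Then
\begin{align*}
\det(A) = |A| &= \begin{vmatrix} 3 & 2 & -1 \\ 4 & 1 & 6 \\ -3 & -1 & 2 \end{vmatrix} \\
&= 3 \begin{vmatrix} 1 & 6 \\ -1 & 2 \end{vmatrix}
 - 2 \begin{vmatrix} 4 & 6 \\ -3 & 2 \end{vmatrix}
 + (-1) \begin{vmatrix} 4 & 1 \\ -3 & -1 \end{vmatrix} \\
&= 3\left(1\,|2| - 6\,|-1|\right) - 2\left(4\,|2| - 6\,|-3|\right) - \left(4\,|-1| - 1\,|-3|\right) \\
&= 3\left(1(2) - 6(-1)\right) - 2\left(4(2) - 6(-3)\right) - \left(4(-1) - 1(-3)\right) \\
&= 24 - 52 + 1 \\
&= -27
\end{align*}
△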
   In practice it is a bit silly to decompose a 2 × 2 matrix down into a couple of
1 × 1 matrices and then compute the exceedingly easy determinant of these puny
matrices. So here is a simple theorem.
Theorem DMST Determinant of Matrices of Size Two
Suppose that $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Then det (A) = ad − bc.

Proof. Applying Definition DM,
\[
\begin{vmatrix} a & b \\ c & d \end{vmatrix} = a\,|d| - b\,|c| = ad - bc
\]
■

      Do you recall seeing the expression ad − bc before? (Hint: Theorem TTMI)

Subsection CD
Computing Determinants
There are a variety of ways to compute the determinant. We will establish first
that we can choose to mimic our definition of the determinant, but by using matrix
entries and submatrices based on a row other than the first one.
Theorem DER Determinant Expansion about Rows
Suppose that A is a square matrix of size n. Then for 1 ≤ i ≤ n
\begin{align*}
\det(A) = {} & (-1)^{i+1}[A]_{i1}\det(A(i|1)) + (-1)^{i+2}[A]_{i2}\det(A(i|2)) \\
& + (-1)^{i+3}[A]_{i3}\det(A(i|3)) + \cdots + (-1)^{i+n}[A]_{in}\det(A(i|n))
\end{align*}
which is known as expansion about row i.

Proof. First, the statement of the theorem coincides with Definition DM when i = 1,
so throughout, we need only consider i > 1.
    Given the recursive definition of the determinant, it should be no surprise that we
will use induction for this proof (Proof Technique I). When n = 1, there is nothing
to prove since there is but one row. When n = 2, we just examine expansion about
the second row,
\begin{align*}
(-1)^{2+1}[A]_{21}\det(A(2|1)) + (-1)^{2+2}[A]_{22}\det(A(2|2))
&= -[A]_{21}[A]_{12} + [A]_{22}[A]_{11} && \text{Definition DM} \\
&= [A]_{11}[A]_{22} - [A]_{12}[A]_{21} \\
&= \det(A) && \text{Theorem DMST}
\end{align*}


   So the theorem is true for matrices of size n = 1 and n = 2. Now assume the
result is true for all matrices of size n − 1 as we derive an expression for expansion
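about row i for a matrix of size n. We will abuse our notation for a submatrix slightly,
so $A(i_1, i_2 | j_1, j_2)$ will denote the matrix formed by removing rows $i_1$ and $i_2$, along
with removing columns $j_1$ and $j_2$. Also, as we take a determinant of a submatrix,
we will need to “jump up” the index of summation partway through as we “skip
over” a missing column. To do this smoothly we will set
\[
\epsilon_{\ell j} =
\begin{cases}
0 & \ell < j \\
1 & \ell > j
\end{cases}
\]
   Now,
\begin{align*}
\det(A)
&= \sum_{j=1}^{n} (-1)^{1+j}[A]_{1j}\det(A(1|j)) && \text{Definition DM} \\
&= \sum_{j=1}^{n} (-1)^{1+j}[A]_{1j} \sum_{\substack{1 \le \ell \le n \\ \ell \neq j}} (-1)^{i-1+\ell-\epsilon_{\ell j}}[A]_{i\ell}\det(A(1,i|j,\ell)) && \text{Induction} \\
&= \sum_{j=1}^{n} \sum_{\substack{1 \le \ell \le n \\ \ell \neq j}} (-1)^{j+i+\ell-\epsilon_{\ell j}}[A]_{1j}[A]_{i\ell}\det(A(1,i|j,\ell)) && \text{Property DCN} \\
&= \sum_{\ell=1}^{n} \sum_{\substack{1 \le j \le n \\ j \neq \ell}} (-1)^{j+i+\ell-\epsilon_{\ell j}}[A]_{1j}[A]_{i\ell}\det(A(1,i|j,\ell)) && \text{Property CACN} \\
&= \sum_{\ell=1}^{n} (-1)^{i+\ell}[A]_{i\ell} \sum_{\substack{1 \le j \le n \\ j \neq \ell}} (-1)^{j-\epsilon_{\ell j}}[A]_{1j}\det(A(1,i|j,\ell)) && \text{Property DCN} \\
&= \sum_{\ell=1}^{n} (-1)^{i+\ell}[A]_{i\ell} \sum_{\substack{1 \le j \le n \\ j \neq \ell}} (-1)^{\epsilon_{\ell j}+j}[A]_{1j}\det(A(i,1|\ell,j)) && 2\epsilon_{\ell j}\text{ is even} \\
&= \sum_{\ell=1}^{n} (-1)^{i+\ell}[A]_{i\ell}\det(A(i|\ell)) && \text{Definition DM}
\end{align*}
■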

    We can also obtain a formula that computes a determinant by expansion about
a column, but this will be simpler if we first prove a result about the interplay of
determinants and transposes. Notice how the following proof makes use of the ability
to compute a determinant by expanding about any row.
Theorem DT Determinant of the Transpose
Suppose that A is a square matrix. Then $\det(A^t) = \det(A)$.

Proof. With our definition of the determinant (Definition DM) and theorems like
Theorem DER, using induction (Proof Technique I) is a natural approach to proving
properties of determinants. And so it is here. Let n be the size of the matrix A, and
we will use induction on n.
   For n = 1, the transpose of a matrix is identical to the original matrix, so
vacuously, the determinants are equal.
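   Now assume the result is true for matrices of size n − 1. Then,
\begin{align*}
\det(A^t) &= \frac{1}{n}\sum_{i=1}^{n} \det(A^t) \\
&= \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} (-1)^{i+j}[A^t]_{ij}\det(A^t(i|j)) && \text{Theorem DER} \\
&= \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} (-1)^{i+j}[A]_{ji}\det(A^t(i|j)) && \text{Definition TM} \\
&= \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} (-1)^{i+j}[A]_{ji}\det\left((A(j|i))^t\right) && \text{Definition TM} \\
&= \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} (-1)^{i+j}[A]_{ji}\det(A(j|i)) && \text{Induction Hypothesis} \\
&= \frac{1}{n}\sum_{j=1}^{n}\sum_{i=1}^{n} (-1)^{j+i}[A]_{ji}\det(A(j|i)) && \text{Property CACN} \\
&= \frac{1}{n}\sum_{j=1}^{n} \det(A) && \text{Theorem DER} \\
&= \det(A)
\end{align*}
■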
   Now we can easily get the result that a determinant can be computed by expansion
about any column as well.
Theorem DEC Determinant Expansion about Columns
Suppose that A is a square matrix of size n. Then for 1 ≤ j ≤ n
\begin{align*}
\det(A) = {} & (-1)^{1+j}[A]_{1j}\det(A(1|j)) + (-1)^{2+j}[A]_{2j}\det(A(2|j)) \\
& + (-1)^{3+j}[A]_{3j}\det(A(3|j)) + \cdots + (-1)^{n+j}[A]_{nj}\det(A(n|j))
\end{align*}
which is known as expansion about column j.
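Proof.
\begin{align*}
\det(A) &= \det(A^t) && \text{Theorem DT} \\
&= \sum_{i=1}^{n} (-1)^{j+i}[A^t]_{ji}\det(A^t(j|i)) && \text{Theorem DER} \\
&= \sum_{i=1}^{n} (-1)^{j+i}[A^t]_{ji}\det\left((A(i|j))^t\right) && \text{Definition TM} \\
&= \sum_{i=1}^{n} (-1)^{j+i}[A^t]_{ji}\det(A(i|j)) && \text{Theorem DT} \\
&= \sum_{i=1}^{n} (-1)^{i+j}[A]_{ij}\det(A(i|j)) && \text{Definition TM}
\end{align*}
■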
   That the determinant of an n × n matrix can be computed in 2n different (albeit
similar) ways is nothing short of remarkable. For the doubters among us, we will do
an example, computing the determinant of a 4 × 4 matrix in two different ways.
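Example TCSD Two computations, same determinant
Let
\[
A = \begin{bmatrix} -2 & 3 & 0 & 1 \\ 9 & -2 & 0 & 1 \\ 1 & 3 & -2 & -1 \\ 4 & 1 & 2 & 6 \end{bmatrix}
\]
      Then expanding about the fourth row (Theorem DER with i = 4) yields,
\begin{align*}
|A| &= (4)(-1)^{4+1} \begin{vmatrix} 3 & 0 & 1 \\ -2 & 0 & 1 \\ 3 & -2 & -1 \end{vmatrix}
 + (1)(-1)^{4+2} \begin{vmatrix} -2 & 0 & 1 \\ 9 & 0 & 1 \\ 1 & -2 & -1 \end{vmatrix} \\
&\quad + (2)(-1)^{4+3} \begin{vmatrix} -2 & 3 & 1 \\ 9 & -2 & 1 \\ 1 & 3 & -1 \end{vmatrix}
 + (6)(-1)^{4+4} \begin{vmatrix} -2 & 3 & 0 \\ 9 & -2 & 0 \\ 1 & 3 & -2 \end{vmatrix} \\
&= (-4)(10) + (1)(-22) + (-2)(61) + 6(46) = 92
\end{align*}
      Expanding about column 3 (Theorem DEC with j = 3) gives
\begin{align*}
|A| &= (0)(-1)^{1+3} \begin{vmatrix} 9 & -2 & 1 \\ 1 & 3 & -1 \\ 4 & 1 & 6 \end{vmatrix}
 + (0)(-1)^{2+3} \begin{vmatrix} -2 & 3 & 1 \\ 1 & 3 & -1 \\ 4 & 1 & 6 \end{vmatrix} \\
&\quad + (-2)(-1)^{3+3} \begin{vmatrix} -2 & 3 & 1 \\ 9 & -2 & 1 \\ 4 & 1 & 6 \end{vmatrix}
 + (2)(-1)^{4+3} \begin{vmatrix} -2 & 3 & 1 \\ 9 & -2 & 1 \\ 1 & 3 & -1 \end{vmatrix} \\
&= 0 + 0 + (-2)(-107) + (-2)(61) = 92
\end{align*}
   Notice how much easier the second computation was. By choosing to expand
about the third column, we have two entries that are zero, so two 3 × 3 determinants
need not be computed at all! △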
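For readers who would rather let a machine do the doubting, the following sketch expands the matrix of Example TCSD about every row and every column. The function names are our own, `det` is a direct transcription of Definition DM, and i and j are 1-based as in Theorem DER and Theorem DEC.

```python
def submatrix(A, i, j):
    """A(i|j) with 1-based i and j."""
    return [row[:j - 1] + row[j:]
            for r, row in enumerate(A, start=1) if r != i]

def det(A):
    """Definition DM: expansion about the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (1 + j) * A[0][j - 1] * det(submatrix(A, 1, j))
               for j in range(1, n + 1))

def expand_row(A, i):
    """Theorem DER: expansion about row i."""
    n = len(A)
    return sum((-1) ** (i + j) * A[i - 1][j - 1] * det(submatrix(A, i, j))
               for j in range(1, n + 1))

def expand_column(A, j):
    """Theorem DEC: expansion about column j."""
    n = len(A)
    return sum((-1) ** (i + j) * A[i - 1][j - 1] * det(submatrix(A, i, j))
               for i in range(1, n + 1))

A = [[-2, 3, 0, 1],
     [9, -2, 0, 1],
     [1, 3, -2, -1],
     [4, 1, 2, 6]]

print([expand_row(A, i) for i in range(1, 5)])      # [92, 92, 92, 92]
print([expand_column(A, j) for j in range(1, 5)])   # [92, 92, 92, 92]
```

This version does not skip zero entries, so it does not capture the saving noticed in the example; a hand computation should still favor the row or column with the most zeros.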
    When a matrix has all zeros above (or below) the diagonal, exploiting the zeros
by expanding about the proper row or column makes computing a determinant
insanely easy.
Example DUTM Determinant of an upper triangular matrix
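Suppose that
\[
T = \begin{bmatrix}
2 & 3 & -1 & 3 & 3 \\
0 & -1 & 5 & 2 & -1 \\
0 & 0 & 3 & 9 & 2 \\
0 & 0 & 0 & -1 & 3 \\
0 & 0 & 0 & 0 & 5
\end{bmatrix}
\]
   We will compute the determinant of this 5 × 5 matrix by consistently expanding
about the first column for each submatrix that arises and does not have a zero entry
multiplying it.
\begin{align*}
\det(T) &= \begin{vmatrix}
2 & 3 & -1 & 3 & 3 \\
0 & -1 & 5 & 2 & -1 \\
0 & 0 & 3 & 9 & 2 \\
0 & 0 & 0 & -1 & 3 \\
0 & 0 & 0 & 0 & 5
\end{vmatrix} \\
&= 2(-1)^{1+1} \begin{vmatrix}
-1 & 5 & 2 & -1 \\
0 & 3 & 9 & 2 \\
0 & 0 & -1 & 3 \\
0 & 0 & 0 & 5
\end{vmatrix} \\
&= 2(-1)(-1)^{1+1} \begin{vmatrix}
3 & 9 & 2 \\
0 & -1 & 3 \\
0 & 0 & 5
\end{vmatrix} \\
&= 2(-1)(3)(-1)^{1+1} \begin{vmatrix} -1 & 3 \\ 0 & 5 \end{vmatrix} \\
&= 2(-1)(3)(-1)(-1)^{1+1}\,|5| \\
&= 2(-1)(3)(-1)(5) = 30
\end{align*}
△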
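The pattern in Example DUTM, where every step contributes just one diagonal entry, can be checked with a short sketch. The function name `det_first_column` is hypothetical; it expands about the first column (Theorem DEC with j = 1) and skips any term whose entry is zero, so for this upper triangular T only one term survives at each level and the result is the product of the diagonal entries.

```python
from math import prod

def det_first_column(A):
    """Expand about the first column, skipping terms with a zero entry."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for i in range(n):
        if A[i][0] != 0:
            # Remove row i and the first column; with 0-based i the sign
            # (-1)^((i+1)+1) simplifies to (-1)^i.
            sub = [row[1:] for k, row in enumerate(A) if k != i]
            total += (-1) ** i * A[i][0] * det_first_column(sub)
    return total

T = [[2, 3, -1, 3, 3],
     [0, -1, 5, 2, -1],
     [0, 0, 3, 9, 2],
     [0, 0, 0, -1, 3],
     [0, 0, 0, 0, 5]]

print(det_first_column(T))                 # 30
print(prod(T[k][k] for k in range(5)))     # 30
```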
    When you consult other texts in your study of determinants, you may run into
the terms “minor” and “cofactor,” especially in a discussion centered on expansion
about rows and columns. We have chosen not to make these definitions formally
since we have been able to get along without them. However, informally, a minor is
a determinant of a submatrix, specifically det (A (i|j)) and is usually referenced as
the minor of [A]ij . A cofactor is a signed minor, specifically the cofactor of [A]ij is
$(-1)^{i+j}\det(A(i|j))$.
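Although we do not use these terms formally, the informal descriptions translate directly into code. In this sketch the names `minor` and `cofactor` are ours, indices are 1-based, and the matrix is the one from Example D33M; expansion about row 1 is then a sum of entries times their cofactors.

```python
def submatrix(A, i, j):
    """A(i|j) with 1-based i and j."""
    return [row[:j - 1] + row[j:]
            for r, row in enumerate(A, start=1) if r != i]

def det(A):
    """Definition DM: expansion about the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** (1 + j) * A[0][j - 1] * det(submatrix(A, 1, j))
               for j in range(1, len(A) + 1))

def minor(A, i, j):
    """The minor of [A]_ij: the determinant of A(i|j)."""
    return det(submatrix(A, i, j))

def cofactor(A, i, j):
    """The cofactor of [A]_ij: the minor with the sign (-1)^(i+j) attached."""
    return (-1) ** (i + j) * minor(A, i, j)

A = [[3, 2, -1],
     [4, 1, 6],
     [-3, -1, 2]]

print(minor(A, 1, 2), cofactor(A, 1, 2))                    # 26 -26
print(sum(A[0][j - 1] * cofactor(A, 1, j) for j in (1, 2, 3)))  # -27
```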


Reading Questions

1. Construct the elementary matrix that will effect the row operation −6R2 + R3 on a
   4 × 7 matrix.
2. Compute the determinant of the matrix
\[
\begin{bmatrix} 2 & 3 & -1 \\ 3 & 8 & 2 \\ 4 & -1 & -3 \end{bmatrix}
\]

3. Compute the determinant of the matrix
\[
\begin{bmatrix}
3 & 9 & -2 & 4 & 2 \\
0 & 1 & 4 & -2 & 7 \\
0 & 0 & -2 & 5 & 2 \\
0 & 0 & 0 & -1 & 6 \\
0 & 0 & 0 & 0 & 4
\end{bmatrix}
\]


Exercises
C21†   Doing the computations by hand, find the determinant of the matrix below.
\[
\begin{bmatrix} 1 & 3 \\ 6 & 2 \end{bmatrix}
\]

C22†   Doing the computations by hand, find the determinant of the matrix below.
\[
\begin{bmatrix} 1 & 3 \\ 2 & 6 \end{bmatrix}
\]

C23†   Doing the computations by hand, find the determinant of the matrix below.
\[
\begin{bmatrix} 1 & 3 & 2 \\ 4 & 1 & 3 \\ 1 & 0 & 1 \end{bmatrix}
\]

C24†   Doing the computations by hand, find the determinant of the matrix below.
\[
\begin{bmatrix} -2 & 3 & -2 \\ -4 & -2 & 1 \\ 2 & 4 & 2 \end{bmatrix}
\]

C25†   Doing the computations by hand, find the determinant of the matrix below.
\[
\begin{bmatrix} 3 & -1 & 4 \\ 2 & 5 & 1 \\ 2 & 0 & 6 \end{bmatrix}
\]

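C26†   Doing the computations by hand, find the determinant of the matrix A.
\[
A = \begin{bmatrix} 2 & 0 & 3 & 2 \\ 5 & 1 & 2 & 4 \\ 3 & 0 & 1 & 2 \\ 5 & 3 & 2 & 1 \end{bmatrix}
\]

C27†   Doing the computations by hand, find the determinant of the matrix A.
\[
A = \begin{bmatrix} 1 & 0 & 1 & 1 \\ 2 & 2 & -1 & 1 \\ 2 & 1 & 3 & 0 \\ 1 & 1 & 0 & 1 \end{bmatrix}
\]

C28†   Doing the computations by hand, find the determinant of the matrix A.
\[
A = \begin{bmatrix} 1 & 0 & 1 & 1 \\ 2 & -1 & -1 & 1 \\ 2 & 5 & 3 & 0 \\ 1 & -1 & 0 & 1 \end{bmatrix}
\]

C29†   Doing the computations by hand, find the determinant of the matrix A.
\[
A = \begin{bmatrix}
2 & 3 & 0 & 2 & 1 \\
0 & 1 & 1 & 1 & 2 \\
0 & 0 & 1 & 2 & 3 \\
0 & 1 & 2 & 1 & 0 \\
0 & 0 & 0 & 1 & 2
\end{bmatrix}
\]

C30†   Doing the computations by hand, find the determinant of the matrix A.
\[
A = \begin{bmatrix}
2 & 1 & 1 & 0 & 1 \\
2 & 1 & 2 & -1 & 1 \\
0 & 0 & 1 & 2 & 0 \\
1 & 0 & 3 & 1 & 1 \\
2 & 1 & 1 & 2 & 1
\end{bmatrix}
\]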
M10†   Find a value of k so that the matrix $A = \begin{bmatrix} 2 & 4 \\ 3 & k \end{bmatrix}$ has $\det(A) = 0$, or explain why
it is not possible.
M11†   Find a value of k so that the matrix $A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 0 & 1 \\ 2 & 3 & k \end{bmatrix}$ has $\det(A) = 0$, or explain
why it is not possible.
M15†   Given the matrix $B = \begin{bmatrix} 2-x & 1 \\ 4 & 2-x \end{bmatrix}$, find all values of x that are solutions of
$\det(B) = 0$.
M16†   Given the matrix $B = \begin{bmatrix} 4-x & -4 & -4 \\ 2 & -2-x & -4 \\ 3 & -3 & -4-x \end{bmatrix}$, find all values of x that are
solutions of $\det(B) = 0$.
M30 The two matrices below are row-equivalent. How would you confirm this? Since the
matrices are row-equivalent, there is a sequence of row operations that converts X into Y ,
which would be a product of elementary matrices, M , such that M X = Y . Find M . (This
approach could be used to find the “9 scalars” of the very early Exercise RREF.M40.)
Hint: Compute the extended echelon form for both matrices, and then use the property
from Theorem PEEF that reads B = JA, where A is the original matrix, B is the echelon
form of the matrix and J is a nonsingular matrix obtained from extended echelon form.
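Combine the two square matrices in the right way to obtain M .
\[
X = \begin{bmatrix}
-1 & 3 & 1 & -2 & 8 \\
-1 & 3 & 2 & -1 & 4 \\
2 & -4 & -3 & 2 & -7 \\
-2 & 5 & 3 & -2 & 8
\end{bmatrix}
\qquad
Y = \begin{bmatrix}
-1 & 2 & 2 & 0 & 0 \\
-3 & 6 & 8 & -1 & 1 \\
0 & 1 & -2 & -2 & 9 \\
-1 & 4 & -3 & -3 & 16
\end{bmatrix}
\]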
Section PDM
Properties of Determinants of Matrices
We have seen how to compute the determinant of a matrix, and the incredible fact
that we can perform expansion about any row or column to make this computation.
In this largely theoretical section, we will state and prove several more intriguing
properties about determinants. Our main goal will be the two results in Theorem
SMZD and Theorem DRMM, but more specifically, we will see how the value of
a determinant will allow us to gain insight into the various properties of a square
matrix.

Subsection DRO
Determinants and Row Operations
We start easy with a straightforward theorem whose proof presages the style of
subsequent proofs in this subsection.
Theorem DZRC Determinant with Zero Row or Column
Suppose that A is a square matrix with a row where every entry is zero, or a column
where every entry is zero. Then det (A) = 0.
Proof. Suppose that A is a square matrix of size n and row i has every entry equal
to zero. We compute det (A) via expansion about row i.
\begin{align*}
\det(A) &= \sum_{j=1}^{n} (-1)^{i+j} [A]_{ij} \det(A(i|j)) && \text{Theorem DER} \\
&= \sum_{j=1}^{n} (-1)^{i+j}\, 0\, \det(A(i|j)) && \text{Row $i$ is zeros} \\
&= \sum_{j=1}^{n} 0 = 0
\end{align*}

   The proof for the case of a zero column is entirely similar, or could be derived
from an application of Theorem DT employing the transpose of the matrix.          
Theorem DRCS Determinant for Row or Column Swap
Suppose that A is a square matrix. Let B be the square matrix obtained from A by
interchanging the location of two rows, or interchanging the location of two columns.
Then det (B) = − det (A).
Proof. Begin with the special case where A is a square matrix of size n and we form
B by swapping adjacent rows i and i + 1 for some 1 ≤ i ≤ n − 1. Notice that the
assumption about swapping adjacent rows means that $B(i+1|j) = A(i|j)$ for all
1 ≤ j ≤ n, and $[B]_{i+1,j} = [A]_{ij}$ for all 1 ≤ j ≤ n. We compute det (B) via expansion
about row i + 1.
\begin{align*}
\det(B) &= \sum_{j=1}^{n} (-1)^{(i+1)+j} [B]_{i+1,j} \det(B(i+1|j)) && \text{Theorem DER} \\
&= \sum_{j=1}^{n} (-1)^{(i+1)+j} [A]_{ij} \det(A(i|j)) && \text{Hypothesis} \\
&= \sum_{j=1}^{n} (-1)^{1} (-1)^{i+j} [A]_{ij} \det(A(i|j)) \\
&= (-1) \sum_{j=1}^{n} (-1)^{i+j} [A]_{ij} \det(A(i|j)) \\
&= -\det(A) && \text{Theorem DER}
\end{align*}
  So the result holds for the special case where we swap adjacent rows of the
matrix. As any computer scientist knows, we can accomplish any rearrangement of
an ordered list by swapping adjacent elements. This principle can be demonstrated
by naïve sorting algorithms such as “bubble sort.” In any event, we do not need to
discuss every possible reordering; we just need to consider a swap of two rows, say
rows s and t with 1 ≤ s < t ≤ n.
    Begin with row s, and repeatedly swap it with each row just below it, including
row t and stopping there. This will total t − s swaps. Now swap the former row t,
which currently lives in row t − 1, with each row above it, stopping when it becomes
row s. This will total another t − s − 1 swaps. In this way, we create B through a
sequence of 2(t − s) − 1 swaps of adjacent rows, each of which adjusts det (A) by a
multiplicative factor of −1. So
\[
\det(B) = (-1)^{2(t-s)-1} \det(A) = \left((-1)^{2}\right)^{t-s} (-1)^{-1} \det(A) = -\det(A)
\]
as desired.
   The proof for the case of swapping two columns is entirely similar, or could be
derived from an application of Theorem DT employing the transpose of the matrix.
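
The counting argument in the proof above (a swap of rows s and t realized as 2(t − s) − 1 swaps of adjacent rows) can be checked with a short Python sketch. The function name `swap_via_adjacent` and the sample row labels are hypothetical, introduced only for this illustration.

```python
def swap_via_adjacent(rows, s, t):
    """Swap rows s and t (1-based, s < t) using only swaps of adjacent rows.

    Returns the rearranged list together with the number of adjacent swaps used.
    """
    rows = list(rows)
    count = 0
    # Move the original row s down, one position at a time, until it is row t.
    for k in range(s, t):
        rows[k - 1], rows[k] = rows[k], rows[k - 1]
        count += 1
    # The original row t now sits in position t - 1; move it up to position s.
    for k in range(t - 1, s, -1):
        rows[k - 2], rows[k - 1] = rows[k - 1], rows[k - 2]
        count += 1
    return rows, count

labels = ["r1", "r2", "r3", "r4", "r5"]
swapped, count = swap_via_adjacent(labels, 2, 5)
print(swapped)                  # rows 2 and 5 exchanged, all others in place
print(count, (-1) ** count)     # number of adjacent swaps and the resulting sign
```

Since the number of adjacent swaps is always odd, the accumulated sign is −1 regardless of how far apart the two rows are.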


   So Theorem DRCS tells us the effect of the first row operation (Definition RO)
on the determinant of a matrix. Here is the effect of the second row operation.
Theorem DRCM Determinant for Row or Column Multiples
Suppose that A is a square matrix. Let B be the square matrix obtained from A by
multiplying a single row by the scalar α, or by multiplying a single column by the
scalar α. Then det (B) = α det (A).

Proof. Suppose that A is a square matrix of size n and we form the square matrix
B by multiplying each entry of row i of A by α. Notice that the other rows of A
and B are equal, so $A(i|j) = B(i|j)$, for all 1 ≤ j ≤ n. We compute det (B) via
expansion about row i.
\begin{align*}
\det(B) &= \sum_{j=1}^{n} (-1)^{i+j} [B]_{ij} \det(B(i|j)) && \text{Theorem DER} \\
&= \sum_{j=1}^{n} (-1)^{i+j} [B]_{ij} \det(A(i|j)) && \text{Hypothesis} \\
&= \sum_{j=1}^{n} (-1)^{i+j} \alpha [A]_{ij} \det(A(i|j)) && \text{Hypothesis} \\
&= \alpha \sum_{j=1}^{n} (-1)^{i+j} [A]_{ij} \det(A(i|j)) \\
&= \alpha \det(A) && \text{Theorem DER}
\end{align*}


   The proof for the case of a multiple of a column is entirely similar, or could be
derived from an application of Theorem DT employing the transpose of the matrix.


   Let us work toward understanding the effect of all three row operations. But first we
need an intermediate result, though it is an easy one.
Theorem DERC Determinant with Equal Rows or Columns
Suppose that A is a square matrix with two equal rows, or two equal columns. Then
det (A) = 0.

Proof. Suppose that A is a square matrix of size n where the two rows s and t are
equal. Form the matrix B by swapping rows s and t. Notice that as a consequence
of our hypothesis, A = B. Then
\begin{align*}
\det(A) &= \frac{1}{2}\left(\det(A) + \det(A)\right) \\
&= \frac{1}{2}\left(\det(A) - \det(B)\right) && \text{Theorem DRCS} \\
&= \frac{1}{2}\left(\det(A) - \det(A)\right) && \text{Hypothesis, $A = B$} \\
&= \frac{1}{2}(0) = 0
\end{align*}
   The proof for the case of two equal columns is entirely similar, or could be derived
from an application of Theorem DT employing the transpose of the matrix.             

   Now we can explain the effect of the third row operation. Here we go.
Theorem DRCMA Determinant for Row or Column Multiples and Addition
Suppose that A is a square matrix. Let B be the square matrix obtained from A
by multiplying a row by the scalar α and then adding it to another row, or by
multiplying a column by the scalar α and then adding it to another column. Then
det (B) = det (A).

Proof. Suppose that A is a square matrix of size n. Form the matrix B by multiplying
row s by α and adding it to row t. Let C be the auxiliary matrix where we replace
row t of A by row s of A. Notice that $A(t|j) = B(t|j) = C(t|j)$ for all 1 ≤ j ≤ n.
We compute the determinant of B by expansion about row t.
\begin{align*}
\det(B) &= \sum_{j=1}^{n} (-1)^{t+j} [B]_{tj} \det(B(t|j)) && \text{Theorem DER} \\
&= \sum_{j=1}^{n} (-1)^{t+j} \left(\alpha [A]_{sj} + [A]_{tj}\right) \det(B(t|j)) && \text{Hypothesis} \\
&= \sum_{j=1}^{n} (-1)^{t+j} \alpha [A]_{sj} \det(B(t|j))
   + \sum_{j=1}^{n} (-1)^{t+j} [A]_{tj} \det(B(t|j)) \\
&= \alpha \sum_{j=1}^{n} (-1)^{t+j} [A]_{sj} \det(B(t|j))
   + \sum_{j=1}^{n} (-1)^{t+j} [A]_{tj} \det(B(t|j)) \\
&= \alpha \sum_{j=1}^{n} (-1)^{t+j} [C]_{tj} \det(C(t|j))
   + \sum_{j=1}^{n} (-1)^{t+j} [A]_{tj} \det(A(t|j)) \\
&= \alpha \det(C) + \det(A) && \text{Theorem DER} \\
&= \alpha\, 0 + \det(A) = \det(A) && \text{Theorem DERC}
\end{align*}
   The proof for the case of adding a multiple of a column is entirely similar, or
could be derived from an application of Theorem DT employing the transpose of
the matrix.                                                                     

    Is this what you expected? We could argue that the third row operation is the
most popular, and yet it has no effect whatsoever on the determinant of a matrix!
We can exploit this, along with our understanding of the other two row operations,
to provide another approach to computing a determinant. We’ll explain this in the
context of an example.
Example DRO Determinant by row operations
Suppose we desire the determinant of the 4 × 4 matrix
\[
A = \begin{bmatrix} 2 & 0 & 2 & 3 \\ 1 & 3 & -1 & 1 \\ -1 & 1 & -1 & 2 \\ 3 & 5 & 4 & 0 \end{bmatrix}
\]

    We will perform a sequence of row operations on this matrix, shooting for an upper
triangular matrix, whose determinant will be simply the product of its diagonal
entries. For each row operation, we will track the effect on the determinant via
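Theorem DRCS, Theorem DRCM, Theorem DRCMA.
\begin{align*}
A &= \begin{bmatrix} 2 & 0 & 2 & 3 \\ 1 & 3 & -1 & 1 \\ -1 & 1 & -1 & 2 \\ 3 & 5 & 4 & 0 \end{bmatrix}
  & &\phantom{=}\ \det(A) \\
\xrightarrow{R_1 \leftrightarrow R_2}\quad A_1 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 2 & 0 & 2 & 3 \\ -1 & 1 & -1 & 2 \\ 3 & 5 & 4 & 0 \end{bmatrix}
  & &= -\det(A_1) & &\text{Theorem DRCS} \\
\xrightarrow{-2R_1 + R_2}\quad A_2 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & -6 & 4 & 1 \\ -1 & 1 & -1 & 2 \\ 3 & 5 & 4 & 0 \end{bmatrix}
  & &= -\det(A_2) & &\text{Theorem DRCMA} \\
\xrightarrow{1R_1 + R_3}\quad A_3 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & -6 & 4 & 1 \\ 0 & 4 & -2 & 3 \\ 3 & 5 & 4 & 0 \end{bmatrix}
  & &= -\det(A_3) & &\text{Theorem DRCMA} \\
\xrightarrow{-3R_1 + R_4}\quad A_4 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & -6 & 4 & 1 \\ 0 & 4 & -2 & 3 \\ 0 & -4 & 7 & -3 \end{bmatrix}
  & &= -\det(A_4) & &\text{Theorem DRCMA} \\
\xrightarrow{1R_3 + R_2}\quad A_5 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & -2 & 2 & 4 \\ 0 & 4 & -2 & 3 \\ 0 & -4 & 7 & -3 \end{bmatrix}
  & &= -\det(A_5) & &\text{Theorem DRCMA} \\
\xrightarrow{-\frac{1}{2}R_2}\quad A_6 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & 1 & -1 & -2 \\ 0 & 4 & -2 & 3 \\ 0 & -4 & 7 & -3 \end{bmatrix}
  & &= 2\det(A_6) & &\text{Theorem DRCM} \\
\xrightarrow{-4R_2 + R_3}\quad A_7 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & 1 & -1 & -2 \\ 0 & 0 & 2 & 11 \\ 0 & -4 & 7 & -3 \end{bmatrix}
  & &= 2\det(A_7) & &\text{Theorem DRCMA} \\
\xrightarrow{4R_2 + R_4}\quad A_8 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & 1 & -1 & -2 \\ 0 & 0 & 2 & 11 \\ 0 & 0 & 3 & -11 \end{bmatrix}
  & &= 2\det(A_8) & &\text{Theorem DRCMA} \\
\xrightarrow{-1R_3 + R_4}\quad A_9 &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & 1 & -1 & -2 \\ 0 & 0 & 2 & 11 \\ 0 & 0 & 1 & -22 \end{bmatrix}
  & &= 2\det(A_9) & &\text{Theorem DRCMA} \\
\xrightarrow{-2R_4 + R_3}\quad A_{10} &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & 1 & -1 & -2 \\ 0 & 0 & 0 & 55 \\ 0 & 0 & 1 & -22 \end{bmatrix}
  & &= 2\det(A_{10}) & &\text{Theorem DRCMA} \\
\xrightarrow{R_3 \leftrightarrow R_4}\quad A_{11} &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & 1 & -1 & -2 \\ 0 & 0 & 1 & -22 \\ 0 & 0 & 0 & 55 \end{bmatrix}
  & &= -2\det(A_{11}) & &\text{Theorem DRCS} \\
\xrightarrow{\frac{1}{55}R_4}\quad A_{12} &= \begin{bmatrix} 1 & 3 & -1 & 1 \\ 0 & 1 & -1 & -2 \\ 0 & 0 & 1 & -22 \\ 0 & 0 & 0 & 1 \end{bmatrix}
  & &= -110\det(A_{12}) & &\text{Theorem DRCM}
\end{align*}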

    The matrix $A_{12}$ is upper triangular, so expansion about the first column (repeatedly)
will result in $\det(A_{12}) = (1)(1)(1)(1) = 1$ (see Example DUTM) and thus,
$\det(A) = -110(1) = -110$.
    Notice that our sequence of row operations was somewhat ad hoc, such as the
transformation to A5 . We could have been even more methodical, and strictly
followed the process that converts a matrix to reduced row-echelon form (Theorem
REMEF), eventually achieving the same numerical result with a final matrix that
equaled the 4 × 4 identity matrix. Notice too that we could have stopped with A8 ,
since at this point we could compute det (A8 ) by two expansions about first columns,
followed by a simple determinant of a 2 × 2 matrix (Theorem DMST).
    The beauty of this approach is that computationally we should already have
written a procedure to convert matrices to reduced row-echelon form, so all we
need to do is track the multiplicative changes to the determinant as the algorithm
proceeds. Further, for a square matrix of size n this approach requires on the order
of $n^3$ multiplications, while a recursive application of expansion about a row or
column (Theorem DER, Theorem DEC) will require in the vicinity of $(n-1)(n!)$
multiplications. So even for very small matrices, a computational approach utilizing
row operations will have superior run-time. Tracking, and controlling, the effects of
round-off errors is another story, best saved for a numerical linear algebra course. △
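
The bookkeeping of Example DRO can be sketched in a few lines of Python. This is only one possible arrangement, under stated assumptions: the function name `det_by_row_ops` is hypothetical, the matrix is reduced to an upper triangular form with ones on the diagonal (rather than following the exact sequence of operations in the example), and exact rational arithmetic from the standard `fractions` module is used so that round-off does not enter the picture.

```python
from fractions import Fraction

def det_by_row_ops(A):
    """Determinant via row operations, tracking the effect of each operation.

    Throughout, det(A) equals `factor` times the determinant of the current M.
    """
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    factor = Fraction(1)
    for c in range(n):
        # Look for a row at or below the diagonal with a nonzero entry in column c.
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            # The triangular form would have a zero on its diagonal.
            return Fraction(0)
        if p != c:
            M[c], M[p] = M[p], M[c]
            factor = -factor                 # row swap, Theorem DRCS
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        factor *= pivot                      # row scaled by 1/pivot, Theorem DRCM
        for r in range(c + 1, n):
            m = M[r][c]
            # Add -m times row c to row r: no change, Theorem DRCMA.
            M[r] = [x - m * y for x, y in zip(M[r], M[c])]
    # M is now upper triangular with ones on the diagonal, so det(M) = 1.
    return factor

A = [[2, 0, 2, 3],
     [1, 3, -1, 1],
     [-1, 1, -1, 2],
     [3, 5, 4, 0]]
print(det_by_row_ops(A))   # -110, agreeing with Example DRO
```

The three nested levels of looping (columns, rows below the pivot, entries within a row) are where the "order of $n^3$" operation count comes from.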


Subsection DROEM
Determinants, Row Operations, Elementary Matrices
As a final preparation for our two most important theorems about determinants,
we prove a handful of facts about the interplay of row operations and matrix
multiplication with elementary matrices with regard to the determinant. But first, a
simple, but crucial, fact about the identity matrix.
Theorem DIM Determinant of the Identity Matrix
For every n ≥ 1, det (In ) = 1.

Proof. It may be overkill, but this is a good situation to run through a proof by
induction on n (Proof Technique I). Is the result true when n = 1? Yes,
\begin{align*}
\det(I_1) &= [I_1]_{11} && \text{Definition DM} \\
&= 1 && \text{Definition IM}
\end{align*}
   Now assume the theorem is true for the identity matrix of size n − 1 and investigate
the determinant of the identity matrix of size n with expansion about row 1,
\begin{align*}
\det(I_n) &= \sum_{j=1}^{n} (-1)^{1+j} [I_n]_{1j} \det(I_n(1|j)) && \text{Definition DM} \\
&= (-1)^{1+1} [I_n]_{11} \det(I_n(1|1)) + \sum_{j=2}^{n} (-1)^{1+j} [I_n]_{1j} \det(I_n(1|j)) \\
&= 1 \det(I_{n-1}) + \sum_{j=2}^{n} (-1)^{1+j}\, 0\, \det(I_n(1|j)) && \text{Definition IM} \\
&= 1(1) + \sum_{j=2}^{n} 0 = 1 && \text{Induction Hypothesis}
\end{align*}



                                                                                     

Theorem DEM Determinants of Elementary Matrices
For the three possible versions of an elementary matrix (Definition ELEM) we have
the determinants,

  1. $\det(E_{i,j}) = -1$

  2. $\det(E_i(\alpha)) = \alpha$

  3. $\det(E_{i,j}(\alpha)) = 1$

Proof. Swapping rows i and j of the identity matrix will create $E_{i,j}$ (Definition
ELEM), so
\begin{align*}
\det(E_{i,j}) &= -\det(I_n) && \text{Theorem DRCS} \\
&= -1 && \text{Theorem DIM}
\end{align*}
  Multiplying row i of the identity matrix by α will create $E_i(\alpha)$ (Definition
ELEM), so
\begin{align*}
\det(E_i(\alpha)) &= \alpha \det(I_n) && \text{Theorem DRCM} \\
&= \alpha(1) = \alpha && \text{Theorem DIM}
\end{align*}
    Multiplying row i of the identity matrix by α and adding to row j will create
$E_{i,j}(\alpha)$ (Definition ELEM), so
\begin{align*}
\det(E_{i,j}(\alpha)) &= \det(I_n) && \text{Theorem DRCMA} \\
&= 1 && \text{Theorem DIM}
\end{align*}


                                                                                    

Theorem DEMMM Determinants, Elementary Matrices, Matrix Multiplication
Suppose that A is a square matrix of size n and E is any elementary matrix of size
n. Then
                                det (EA) = det (E) det (A)

Proof. The proof proceeds in three parts, one for each type of elementary matrix,
with each part very similar to the other two.
   First, let B be the matrix obtained from A by swapping rows i and j,
\begin{align*}
\det(E_{i,j} A) &= \det(B) && \text{Theorem EMDRO} \\
&= -\det(A) && \text{Theorem DRCS} \\
&= \det(E_{i,j}) \det(A) && \text{Theorem DEM}
\end{align*}
   Second, let B be the matrix obtained from A by multiplying row i by α,
\begin{align*}
\det(E_i(\alpha) A) &= \det(B) && \text{Theorem EMDRO} \\
&= \alpha \det(A) && \text{Theorem DRCM} \\
&= \det(E_i(\alpha)) \det(A) && \text{Theorem DEM}
\end{align*}
   Third, let B be the matrix obtained from A by multiplying row i by α and adding
to row j,
\begin{align*}
\det(E_{i,j}(\alpha) A) &= \det(B) && \text{Theorem EMDRO} \\
&= \det(A) && \text{Theorem DRCMA} \\
&= \det(E_{i,j}(\alpha)) \det(A) && \text{Theorem DEM}
\end{align*}
   Since the desired result holds for each variety of elementary matrix individually,
we are done.                                                                      
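
For readers who like to see a theorem in action, the following Python sketch builds one elementary matrix of each type and compares det (EA) with det (E) det (A). The helper names (`E_swap`, `E_scale`, `E_add`, `matmul`, `det`) are hypothetical names for this illustration; rows are indexed from 1 as in Definition ELEM, and the determinant is computed by expansion about the first row, which is adequate only for small matrices.

```python
from fractions import Fraction

def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def E_swap(n, i, j):
    """E_{i,j}: the identity matrix with rows i and j swapped."""
    E = identity(n)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E

def E_scale(n, i, alpha):
    """E_i(alpha): the identity matrix with row i multiplied by alpha (nonzero)."""
    E = identity(n)
    E[i - 1][i - 1] = alpha
    return E

def E_add(n, i, j, alpha):
    """E_{i,j}(alpha): the identity matrix with alpha times row i added to row j."""
    E = identity(n)
    E[j - 1][i - 1] = alpha
    return E

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def det(A):
    """Determinant by expansion about the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

# The matrix of Example DRO, used here only as a convenient test case.
A = [[2, 0, 2, 3], [1, 3, -1, 1], [-1, 1, -1, 2], [3, 5, 4, 0]]
elementary = [E_swap(4, 1, 3), E_scale(4, 2, Fraction(-1, 2)), E_add(4, 1, 4, -3)]

for E in elementary:
    print(det(E), det(matmul(E, A)) == det(E) * det(A))
```

A check on one matrix is of course no substitute for the proof; it simply makes the three cases of the argument concrete.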

Subsection DNMMM
Determinants, Nonsingular Matrices, Matrix Multiplication
If you asked someone with substantial experience working with matrices about the
value of the determinant, they’d be likely to quote the following theorem as the first
thing to come to mind.

Theorem SMZD Singular Matrices have Zero Determinants
Let A be a square matrix. Then A is singular if and only if det (A) = 0.

Proof. Rather than jumping into the two halves of the equivalence, we first establish
a few items. Let B be the unique square matrix that is row-equivalent to A and
in reduced row-echelon form (Theorem REMEF, Theorem RREFU). For each of
the row operations that converts B into A, there is an elementary matrix $E_i$ which
effects the row operation by matrix multiplication (Theorem EMDRO). Repeated
applications of Theorem EMDRO allow us to write
\[
A = E_s E_{s-1} \dots E_2 E_1 B
\]
   Then
\begin{align*}
\det(A) &= \det(E_s E_{s-1} \dots E_2 E_1 B) \\
&= \det(E_s) \det(E_{s-1}) \dots \det(E_2) \det(E_1) \det(B) && \text{Theorem DEMMM}
\end{align*}
    From Theorem DEM we can infer that the determinant of an elementary matrix
is never zero (note the ban on α = 0 for $E_i(\alpha)$ in Definition ELEM). So the product
on the right is composed of nonzero scalars, with the possible exception of det (B).
More precisely, we can argue that det (A) = 0 if and only if det (B) = 0. With this
established, we can take up the two halves of the equivalence.
    (⇒) If A is singular, then by Theorem NMRRI, B cannot be the identity matrix.
Because (1) the number of pivot columns is equal to the number of nonzero rows, (2)
not every column is a pivot column, and (3) B is square, we see that B must have a
zero row. By Theorem DZRC the determinant of B is zero, and by the above, we
conclude that the determinant of A is zero.
    (⇐) We will prove the contrapositive (Proof Technique CP). So assume A is
nonsingular, then by Theorem NMRRI, B is the identity matrix and Theorem DIM
tells us that det (B) = 1 ≠ 0. With the argument above, we conclude that the
determinant of A is nonzero as well.                                              

  For the case of 2 × 2 matrices you might compare the application of Theorem
SMZD with the combination of the results stated in Theorem DMST and Theorem
TTMI.
Example ZNDAB Zero and nonzero determinant, Archetypes A and B
The coefficient matrix in Archetype A has a zero determinant (check this!) while the
coefficient matrix in Archetype B has a nonzero determinant (check this, too). These
matrices are singular and nonsingular, respectively. This is exactly what Theorem
SMZD says, and continues our list of contrasts between these two archetypes. △
    In Section MINM we said “singular matrices are a distinct minority.” If you built
a random matrix and took its determinant, how likely would it be that you got zero?
    Since Theorem SMZD is an equivalence (Proof Technique E) we can expand on
our growing list of equivalences about nonsingular matrices. The addition of the
condition det (A) ≠ 0 is one of the best motivations for learning about determinants.
Theorem NME7 Nonsingular Matrix Equivalences, Round 7
Suppose that A is a square matrix of size n. The following are equivalent.

  1. A is nonsingular.

  2. A row-reduces to the identity matrix.

  3. The null space of A contains only the zero vector, N (A) = {0}.

  4. The linear system LS(A, b) has a unique solution for every possible choice of
     b.

  5. The columns of A are a linearly independent set.

  6. A is invertible.

  7. The column space of A is $\mathbb{C}^n$, $\mathcal{C}(A) = \mathbb{C}^n$.

  8. The columns of A are a basis for $\mathbb{C}^n$.

  9. The rank of A is n, r (A) = n.

 10. The nullity of A is zero, n (A) = 0.

 11. The determinant of A is nonzero, det (A) ≠ 0.

Proof. Theorem SMZD says A is singular if and only if det (A) = 0. If we negate
each of these statements, we arrive at two contrapositives that we can combine as
the equivalence, A is nonsingular if and only if det (A) ≠ 0. This allows us to add a
new statement to the list found in Theorem NME6.                                    
    Computationally, row-reducing a matrix is the most efficient way to determine
if a matrix is nonsingular, though the effect of using division in a computer can
lead to round-off errors that confuse small quantities with critical zero quantities.
Conceptually, the determinant may seem the most efficient way to determine if a
matrix is nonsingular. The definition of a determinant uses just addition, subtraction
and multiplication, so division is never a problem. And the final test is easy: is the
determinant zero or not? However, the number of operations involved in computing a
determinant by the definition very quickly becomes so excessive as to be impractical.
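
To get a feel for how quickly the work grows, the sketch below counts multiplications in a recursive expansion about a row. The counting convention is an assumption of this sketch, not something fixed by the text: one multiplication is charged for each product of an entry with the determinant of a submatrix, signs are treated as free, and a 1 × 1 determinant costs nothing. Other bookkeeping gives different totals, but the factorial growth is the same. The column headed `n ** 3` is only the order of growth for row reduction, not an exact count, and the function name is hypothetical.

```python
def expansion_multiplications(n):
    """Multiplications used by recursive expansion about a row, size n.

    Each of the n terms needs the determinant of a size n - 1 submatrix,
    plus one multiplication by the corresponding entry.
    """
    if n == 1:
        return 0
    return n * (expansion_multiplications(n - 1) + 1)

print("n", "expansion", "n ** 3")
for n in (2, 3, 4, 5, 10):
    print(n, expansion_multiplications(n), n ** 3)
```

Already at n = 10 the expansion count under this convention is in the millions, while $n^3$ is only 1000.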
    Now for the coup de grace. We will generalize Theorem DEMMM to the case of any
two square matrices. You may recall thinking that matrix multiplication was defined
in a needlessly complicated manner. For sure, the definition of a determinant seems
even stranger. (Though Theorem SMZD might be forcing you to reconsider.) Read
the statement of the next theorem and contemplate how nicely matrix multiplication
and determinants play with each other.
Theorem DRMM Determinant Respects Matrix Multiplication
Suppose that A and B are square matrices of the same size. Then det (AB) =
det (A) det (B).
Proof. This proof is constructed in two cases. First, suppose that A is singular.
Then det (A) = 0 by Theorem SMZD. By the contrapositive of Theorem NPNT,
AB is singular as well. So by a second application of Theorem SMZD, det (AB) = 0.
Putting it all together
                     det (AB) = 0 = 0 det (B) = det (A) det (B)
as desired.
   For the second case, suppose that A is nonsingular. By Theorem NMPEM there
are elementary matrices $E_1, E_2, E_3, \dots, E_s$ such that $A = E_1 E_2 E_3 \dots E_s$. Then
\begin{align*}
\det(AB) &= \det(E_1 E_2 E_3 \dots E_s B) \\
&= \det(E_1) \det(E_2) \det(E_3) \dots \det(E_s) \det(B) && \text{Theorem DEMMM} \\
&= \det(E_1 E_2 E_3 \dots E_s) \det(B) && \text{Theorem DEMMM} \\
&= \det(A) \det(B)
\end{align*}
                                                                                      
   It is amazing that matrix multiplication and the determinant interact this way.
Might it also be true that det (A + B) = det (A) + det (B)? (Exercise PDM.M30)
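
Theorem DRMM is easy to test experimentally. The Python sketch below, using exact integer arithmetic, compares det (AB) with det (A) det (B) on a batch of randomly generated 3 × 3 integer matrices. The helper names, the entry range, and the seed are arbitrary choices for this illustration. The question about det (A + B) is left for the exercise.

```python
import random

def det(A):
    """Determinant by expansion about the first row (fine for tiny matrices)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def random_matrix(n, rng):
    """An n x n matrix with integer entries between -5 and 5."""
    return [[rng.randint(-5, 5) for _ in range(n)] for _ in range(n)]

rng = random.Random(0)
trials = [(random_matrix(3, rng), random_matrix(3, rng)) for _ in range(100)]
all_agree = all(det(matmul(A, B)) == det(A) * det(B) for A, B in trials)
print(all_agree)   # True, as Theorem DRMM guarantees
```

Because the entries are integers, every comparison is exact, so agreement in all trials is exactly what the theorem predicts rather than a numerical coincidence.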

Reading Questions

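1. Consider the two matrices below, and suppose you already have computed $\det(A) = -120$.
   What is $\det(B)$? Why?
   \[
   A = \begin{bmatrix} 0 & 8 & 3 & -4 \\ -1 & 2 & -2 & 5 \\ -2 & 8 & 4 & 3 \\ 0 & -4 & 2 & -3 \end{bmatrix}
   \qquad
   B = \begin{bmatrix} 0 & 8 & 3 & -4 \\ 0 & -4 & 2 & -3 \\ -2 & 8 & 4 & 3 \\ -1 & 2 & -2 & 5 \end{bmatrix}
   \]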

2. State the theorem that allows us to make yet another extension to our NMEx series of
   theorems.
3. What is amazing about the interaction between matrix multiplication and the
   determinant?

Exercises
C30 Each of the archetypes below is a system of equations with a square coefficient
matrix, or is a square matrix itself. Compute the determinant of each matrix, noting how
Theorem SMZD indicates when the matrix is singular or nonsingular.

Archetype A, Archetype B, Archetype F, Archetype K, Archetype L
M20† Construct a 3 × 3 nonsingular matrix and call it A. Then, for each entry of the
matrix, compute the corresponding cofactor, and create a new 3 × 3 matrix full of these
cofactors by placing the cofactor of an entry in the same location as the entry it was based
on. Once complete, call this matrix C. Compute $AC^{t}$. Any observations? Repeat with a
new matrix, or perhaps with a 4 × 4 matrix.
M30 Construct an example to show that the following statement is not true for all square
matrices A and B of the same size: det (A + B) = det (A) + det (B).
T10 Theorem NPNT says that if the product of square matrices AB is nonsingular, then
the individual matrices A and B are nonsingular also. Construct a new proof of this result
making use of theorems about determinants of matrices.
T15 Use Theorem DRCM to prove Theorem DZRC as a corollary. (See Proof Technique
LC.)
T20 Suppose that A is a square matrix of size n and $\alpha \in \mathbb{C}$ is a scalar. Prove that
$\det(\alpha A) = \alpha^{n}\det(A)$.
T25 Employ Theorem DT to construct the second half of the proof of Theorem DRCM
(the portion about a multiple of a column).
Chapter E
Eigenvalues

When we have a square matrix A of size n, and we multiply it by a vector x from $\mathbb{C}^n$
to form the matrix-vector product (Definition MVP), the result is another vector in
$\mathbb{C}^n$. So we can adopt a functional view of this computation: the act of multiplying
by a square matrix is a function that converts one vector (x) into another one (Ax)
of the same size. For some vectors, this seemingly complicated computation is really
no more complicated than scalar multiplication. The vectors vary according to the
choice of A, so the question is to determine, for an individual choice of A, if there
are any such vectors, and if so, which ones. It happens in a variety of situations that
these vectors (and the scalars that go along with them) are of special interest.
    We will be solving polynomial equations in this chapter, which raises the specter of
complex numbers as roots. This distinct possibility is our main reason for entertaining
the complex numbers throughout the course. You might be moved to revisit Section
CNO and Section O.



Section EE
Eigenvalues and Eigenvectors
In this section, we will define the eigenvalues and eigenvectors of a matrix, and see
how to compute them. More theoretical properties will be taken up in the next
section.


Subsection EEM
Eigenvalues and Eigenvectors of a Matrix
We start with the principal definition for this chapter.
Definition EEM Eigenvalues and Eigenvectors of a Matrix
Suppose that A is a square matrix of size n, $x \neq 0$ is a vector in $\mathbb{C}^n$, and λ is a
scalar in $\mathbb{C}$. Then we say x is an eigenvector of A with eigenvalue λ if
                                       Ax = λx
                                                                                     
   Before going any further, perhaps we should convince you that such things ever
happen at all. Understand the next example, but do not concern yourself with where
the pieces come from. We will have methods soon enough to be able to discover
these eigenvectors ourselves.
Example SEE Some eigenvalues and eigenvectors
Consider the matrix
\[
A = \begin{bmatrix} 204 & 98 & -26 & -10 \\ -280 & -134 & 36 & 14 \\ 716 & 348 & -90 & -36 \\ -472 & -232 & 60 & 28 \end{bmatrix}
\]


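and the vectors
\[
x = \begin{bmatrix} 1 \\ -1 \\ 2 \\ 5 \end{bmatrix}
\qquad
y = \begin{bmatrix} -3 \\ 4 \\ -10 \\ 4 \end{bmatrix}
\qquad
z = \begin{bmatrix} -3 \\ 7 \\ 0 \\ 8 \end{bmatrix}
\qquad
w = \begin{bmatrix} 1 \\ -1 \\ 4 \\ 0 \end{bmatrix}
\]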
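      Then
\[
Ax = \begin{bmatrix} 204 & 98 & -26 & -10 \\ -280 & -134 & 36 & 14 \\ 716 & 348 & -90 & -36 \\ -472 & -232 & 60 & 28 \end{bmatrix}
\begin{bmatrix} 1 \\ -1 \\ 2 \\ 5 \end{bmatrix}
= \begin{bmatrix} 4 \\ -4 \\ 8 \\ 20 \end{bmatrix}
= 4 \begin{bmatrix} 1 \\ -1 \\ 2 \\ 5 \end{bmatrix}
= 4x
\]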
so x is an eigenvector of A with eigenvalue λ = 4.
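   Also,
\[
Ay = \begin{bmatrix} 204 & 98 & -26 & -10 \\ -280 & -134 & 36 & 14 \\ 716 & 348 & -90 & -36 \\ -472 & -232 & 60 & 28 \end{bmatrix}
\begin{bmatrix} -3 \\ 4 \\ -10 \\ 4 \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}
= 0 \begin{bmatrix} -3 \\ 4 \\ -10 \\ 4 \end{bmatrix}
= 0y
\]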
so y is an eigenvector of A with eigenvalue λ = 0.
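   Also,
\[
Az = \begin{bmatrix} 204 & 98 & -26 & -10 \\ -280 & -134 & 36 & 14 \\ 716 & 348 & -90 & -36 \\ -472 & -232 & 60 & 28 \end{bmatrix}
\begin{bmatrix} -3 \\ 7 \\ 0 \\ 8 \end{bmatrix}
= \begin{bmatrix} -6 \\ 14 \\ 0 \\ 16 \end{bmatrix}
= 2 \begin{bmatrix} -3 \\ 7 \\ 0 \\ 8 \end{bmatrix}
= 2z
\]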
so z is an eigenvector of A with eigenvalue λ = 2.
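    Also,
\[
Aw = \begin{bmatrix} 204 & 98 & -26 & -10 \\ -280 & -134 & 36 & 14 \\ 716 & 348 & -90 & -36 \\ -472 & -232 & 60 & 28 \end{bmatrix}
\begin{bmatrix} 1 \\ -1 \\ 4 \\ 0 \end{bmatrix}
= \begin{bmatrix} 2 \\ -2 \\ 8 \\ 0 \end{bmatrix}
= 2 \begin{bmatrix} 1 \\ -1 \\ 4 \\ 0 \end{bmatrix}
= 2w
\]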
so w is an eigenvector of A with eigenvalue λ = 2.
    So we have demonstrated four eigenvectors of A. Are there more? Yes, any
nonzero scalar multiple of an eigenvector is again an eigenvector. In this example,
set u = 30x. Then
                   Au = A(30x)
                      = 30Ax                      Theorem MMSMM
                      = 30(4x)                    Definition EEM
                      = 4(30x)                    Property SMAM
                      = 4u
so that u is also an eigenvector of A for the same eigenvalue, λ = 4.
    The vectors z and w are both eigenvectors of A for the same eigenvalue λ = 2,
yet this is not as simple as the two vectors just being scalar multiples of each other
(they are not). Look what happens when we add them together, to form v = z + w,
and multiply by A,
                   Av = A(z + w)
                      = Az + Aw                    Theorem MMDAA
                      = 2z + 2w                    Definition EEM
                      = 2(z + w)                   Property DVAC
                      = 2v
so that v is also an eigenvector of A for the eigenvalue λ = 2. So it would appear
that the set of eigenvectors that are associated with a fixed eigenvalue is closed under
the vector space operations of $\mathbb{C}^n$. Hmmm.
    The vector y is an eigenvector of A for the eigenvalue λ = 0, so we can use
Theorem ZSSM to write Ay = 0y = 0. But this also means that y ∈ N (A). There
would appear to be a connection here also.
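
   The computations of Example SEE are easy to confirm by machine. The sketch
below uses plain Python lists and exact integer arithmetic; the helper names matvec,
scale and is_eigenpair are assumptions introduced here for illustration, and the
test inside is_eigenpair is nothing more than Definition EEM.

```python
def matvec(A, v):
    """Matrix-vector product, with A stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def scale(c, v):
    """Scalar multiple c*v of a vector."""
    return [c * a for a in v]


def is_eigenpair(A, v, lam):
    """True when v is nonzero and A v equals lam * v (Definition EEM)."""
    return any(v) and matvec(A, v) == scale(lam, v)


A = [[204, 98, -26, -10],
     [-280, -134, 36, 14],
     [716, 348, -90, -36],
     [-472, -232, 60, 28]]
x = [1, -1, 2, 5]
y = [-3, 4, -10, 4]
z = [-3, 7, 0, 8]
w = [1, -1, 4, 0]

# The four eigenvector/eigenvalue pairs claimed in the example.
for name, vec, lam in [("x", x, 4), ("y", y, 0), ("z", z, 2), ("w", w, 2)]:
    print(name, lam, is_eigenpair(A, vec, lam))

u = scale(30, x)                     # a nonzero scalar multiple of x
v = [a + b for a, b in zip(z, w)]    # the sum z + w
print(is_eigenpair(A, u, 4), is_eigenpair(A, v, 2))
print(matvec(A, y))                  # all zeros, so y lies in the null space of A
```

The nonzero requirement matters: the zero vector satisfies Ax = λx for every
scalar λ, which is why is_eigenpair rejects it.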

   Example SEE hints at a number of intriguing properties, and there are many
more. We will explore the general properties of eigenvalues and eigenvectors in
Section PEE, but in this section we will concern ourselves with the question of
actually computing eigenvalues and eigenvectors. First we need a bit of background
material on polynomials and matrices.


Subsection PM
Polynomials and Matrices
A polynomial is a combination of powers, multiplication by scalar coefficients, and
addition (with subtraction just being the inverse of addition). We never have occasion
to divide when computing the value of a polynomial. So it is with matrices. We can
add and subtract matrices, we can multiply matrices by scalars, and we can form
powers of square matrices by repeated applications of matrix multiplication. We do
not normally divide matrices (though sometimes we can multiply by an inverse). If a
matrix is square, all the operations constituting a polynomial will preserve the size
of the matrix. So it is natural to consider evaluating a polynomial with a matrix,
effectively replacing the variable of the polynomial by a matrix. We will demonstrate
with an example.
Example PM Polynomial of a matrix
Let
\[
p(x) = 14 + 19x - 3x^2 - 7x^3 + x^4
\qquad
D = \begin{bmatrix} -1 & 3 & 2 \\ 1 & 0 & -2 \\ -3 & 1 & 1 \end{bmatrix}
\]
and we will compute p(D).
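   First, the necessary powers of D. Notice that $D^0$ is defined to be the multiplicative
identity, $I_3$, as will be the case in general.
\begin{align*}
D^0 &= I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \\
D^1 &= D = \begin{bmatrix} -1 & 3 & 2 \\ 1 & 0 & -2 \\ -3 & 1 & 1 \end{bmatrix} \\
D^2 &= DD^1 = \begin{bmatrix} -1 & 3 & 2 \\ 1 & 0 & -2 \\ -3 & 1 & 1 \end{bmatrix}
\begin{bmatrix} -1 & 3 & 2 \\ 1 & 0 & -2 \\ -3 & 1 & 1 \end{bmatrix}
= \begin{bmatrix} -2 & -1 & -6 \\ 5 & 1 & 0 \\ 1 & -8 & -7 \end{bmatrix} \\
D^3 &= DD^2 = \begin{bmatrix} -1 & 3 & 2 \\ 1 & 0 & -2 \\ -3 & 1 & 1 \end{bmatrix}
\begin{bmatrix} -2 & -1 & -6 \\ 5 & 1 & 0 \\ 1 & -8 & -7 \end{bmatrix}
= \begin{bmatrix} 19 & -12 & -8 \\ -4 & 15 & 8 \\ 12 & -4 & 11 \end{bmatrix} \\
D^4 &= DD^3 = \begin{bmatrix} -1 & 3 & 2 \\ 1 & 0 & -2 \\ -3 & 1 & 1 \end{bmatrix}
\begin{bmatrix} 19 & -12 & -8 \\ -4 & 15 & 8 \\ 12 & -4 & 11 \end{bmatrix}
= \begin{bmatrix} -7 & 49 & 54 \\ -5 & -4 & -30 \\ -49 & 47 & 43 \end{bmatrix}
\end{align*}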


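   Then
\begin{align*}
p(D) &= 14 + 19D - 3D^2 - 7D^3 + D^4 \\
&= 14 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
+ 19 \begin{bmatrix} -1 & 3 & 2 \\ 1 & 0 & -2 \\ -3 & 1 & 1 \end{bmatrix}
- 3 \begin{bmatrix} -2 & -1 & -6 \\ 5 & 1 & 0 \\ 1 & -8 & -7 \end{bmatrix} \\
&\quad - 7 \begin{bmatrix} 19 & -12 & -8 \\ -4 & 15 & 8 \\ 12 & -4 & 11 \end{bmatrix}
+ \begin{bmatrix} -7 & 49 & 54 \\ -5 & -4 & -30 \\ -49 & 47 & 43 \end{bmatrix} \\
&= \begin{bmatrix} -139 & 193 & 166 \\ 27 & -98 & -124 \\ -193 & 118 & 20 \end{bmatrix}
\end{align*}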

      Notice that p(x) factors as
\[ p(x) = 14 + 19x - 3x^2 - 7x^3 + x^4 = (x - 2)(x - 7)(x + 1)^2 \]
   Because D commutes with itself (DD = DD), we can use distributivity of matrix
multiplication across matrix addition (Theorem MMDAA) without being careful
with any of the matrix products, and just as easily evaluate p(D) using the factored
form of p(x),
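\begin{align*}
p(D) &= 14 + 19D - 3D^2 - 7D^3 + D^4 = (D - 2I_3)(D - 7I_3)(D + I_3)^2 \\
&= \begin{bmatrix} -3 & 3 & 2 \\ 1 & -2 & -2 \\ -3 & 1 & -1 \end{bmatrix}
\begin{bmatrix} -8 & 3 & 2 \\ 1 & -7 & -2 \\ -3 & 1 & -6 \end{bmatrix}
\begin{bmatrix} 0 & 3 & 2 \\ 1 & 1 & -2 \\ -3 & 1 & 2 \end{bmatrix}^2 \\
&= \begin{bmatrix} -139 & 193 & 166 \\ 27 & -98 & -124 \\ -193 & 118 & 20 \end{bmatrix}
\end{align*}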
    This example is not meant to be too profound. It is meant to show you that it is
natural to evaluate a polynomial with a matrix, and that the factored form of the
polynomial is as good as (or maybe better than) the expanded form. And do not
forget that constant terms in polynomials are really multiples of the identity matrix
when we are evaluating the polynomial with a matrix.
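
   For readers who would like to reproduce Example PM by machine, here is a
minimal sketch in plain Python that evaluates p(D) both from the expanded form and
from the factored form. The helper names (matmul, identity, add, scale, power,
minus_scalar) are assumptions made for this illustration; only the matrix D and the
polynomial come from the example.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def identity(n):
    """The n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def add(A, B):
    """Entrywise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scale(c, A):
    """Scalar multiple c*A."""
    return [[c * a for a in row] for row in A]


def power(A, k):
    """A^k by repeated multiplication, with A^0 defined as the identity."""
    result = identity(len(A))
    for _ in range(k):
        result = matmul(A, result)
    return result


def minus_scalar(A, b):
    """The matrix A - b*I, the matrix version of the factor (x - b)."""
    return add(A, scale(-b, identity(len(A))))


D = [[-1, 3, 2], [1, 0, -2], [-3, 1, 1]]

# Expanded form: coefficients of 14 + 19x - 3x^2 - 7x^3 + x^4, lowest power first.
coefficients = [14, 19, -3, -7, 1]
expanded = scale(0, D)               # start from the zero matrix
for k, c in enumerate(coefficients):
    expanded = add(expanded, scale(c, power(D, k)))

# Factored form: (D - 2I)(D - 7I)(D + I)^2.
factored = matmul(matmul(minus_scalar(D, 2), minus_scalar(D, 7)),
                  power(minus_scalar(D, -1), 2))

print(expanded)
print(factored)
print(expanded == factored)
```

Note that the constant term 14 enters as 14 times power(D, 0), that is, as a multiple
of the identity matrix, exactly as the example cautions.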


Subsection EEE
Existence of Eigenvalues and Eigenvectors
Before we embark on computing eigenvalues and eigenvectors, we will prove that
every matrix has at least one eigenvalue (and an eigenvector to go with it). Later, in
Theorem MNEM, we will determine the maximum number of eigenvalues a matrix
may have.
    The determinant (Definition DM) will be a powerful tool in Subsection EE.CEE
when it comes time to compute eigenvalues. However, it is possible, with some
more advanced machinery, to compute eigenvalues without ever making use of the
determinant. Sheldon Axler does just that in his book, Linear Algebra Done Right.
Here and now, we give Axler’s “determinant-free” proof that every matrix has an
eigenvalue. The result is not too startling, but the proof is most enjoyable.
Theorem EMHE Every Matrix Has an Eigenvalue
Suppose A is a square matrix. Then A has at least one eigenvalue.

Proof. Suppose that A has size n, and choose x as any nonzero vector from $\mathbb{C}^n$.
(Notice how much latitude we have in our choice of x. Only the zero vector is
off-limits.) Consider the set
\[ S = \left\{ x,\, Ax,\, A^2 x,\, A^3 x,\, \ldots,\, A^n x \right\} \]
   This is a set of n + 1 vectors from $\mathbb{C}^n$, so by Theorem MVSLD, S is linearly
dependent. Let $a_0, a_1, a_2, \ldots, a_n$ be a collection of n + 1 scalars from $\mathbb{C}$, not all
zero, that provide a relation of linear dependence on S. In other words,
\[ a_0 x + a_1 Ax + a_2 A^2 x + a_3 A^3 x + \cdots + a_n A^n x = 0 \]
    Some of the $a_i$ are nonzero. Suppose that just $a_0 \neq 0$, and $a_1 = a_2 = a_3 = \cdots = a_n = 0$.
Then $a_0 x = 0$ and by Theorem SMEZV, either $a_0 = 0$ or $x = 0$, which are
both contradictions. So $a_i \neq 0$ for some $i \geq 1$. Let m be the largest integer such
that $a_m \neq 0$. From this discussion we know that $m \geq 1$. We can also assume that
$a_m = 1$, for if not, replace each $a_i$ by $a_i / a_m$ to obtain scalars that serve equally well
in providing a relation of linear dependence on S.
    Define the polynomial
\[ p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \cdots + a_m x^m \]
   Because we have consistently used $\mathbb{C}$ as our set of scalars (rather than $\mathbb{R}$), we
know that we can factor p(x) into linear factors of the form $(x - b_i)$, where $b_i \in \mathbb{C}$.
So there are scalars, $b_1, b_2, b_3, \ldots, b_m$, from $\mathbb{C}$ so that,
\[ p(x) = (x - b_m)(x - b_{m-1}) \cdots (x - b_3)(x - b_2)(x - b_1) \]
    Put it all together and
   $0 = a_0 x + a_1 Ax + a_2 A^2 x + \cdots + a_n A^n x$
       $= a_0 x + a_1 Ax + a_2 A^2 x + \cdots + a_m A^m x$                      $a_i = 0$ for $i > m$