The mathematics of Google
Minicourse for AGORA days
Course Description
How does Google sort through the roughly 5,000,000,000 webpages to find
relevant sites? It turns out that the answer involves the mathematics
of linear algebra, which, roughly speaking, involves solving systems of
linear equations. Such systems are ubiquitous in science and engineering.
In this whirlwind class, we'll cover about 2/3 of the syllabus of a
class usually taken by junior engineering majors, learning about
Gaussian elimination, linear transformations and change of basis,
eigenvalues and eigenvectors, culminating with the page rank algorithm
developed by Page and Brin as grad students at Stanford.
Lecture 1 Introduction to systems of linear equations: this is nothing more than
solving a system like 2x+3y=6, x-2y=2 (run amok). The algorithm which
produces solutions is known as Gaussian elimination. We'll also study
the basics of matrix multiplication, and look at a simple example of
a Markov process: suppose in Smallville, every year 30% of nonsmokers
begin smoking, and 20% of smokers quit. If initially there are 8000
nonsmokers and 2000 smokers, what are the numbers at the end of 100
years? There is a beautiful and elegant solution to this problem,
which we begin to tackle in the next class.
Lecture 2 The problem introduced in the previous class can be solved by
successive matrix multiplications A*v(i)= v(i+1), where v(i) is
the vector representing the # of nonsmokers and smokers at time i
(hence v(0) = (8000,2000)), and A is the transition matrix
.7 .2
.3 .8
so that A*v(0) = v(1) = (6000,4000). Now iterate this process.
Of course, with large matrices this is very costly. The solution
is to find a better way to represent the matrix A. This leads us
to the notion of linear transformation and change of coordinates;
a change of coordinates allows us to replace A with a simpler matrix,
without losing information.
Lecture 3 With the notion of change of coordinates and linear transformation
under our belt, we introduce the notions of eigenvalue and eigenvector.
Formally, an eigenvector for a matrix A is a vector v such that Av=cv,
for some constant c. Despite the fancy German names, these are very
concrete objects and there is a simple algorithm to compute them.
In particular, eigenvectors give us the desired simple change of
coordinates that allows us to solve systems analogous to the one above
easily and by hand! This will be the key to the page rank algorithm.
Lecture 4 Today is the capstone of the course, where we use the tools developed
to understand the page rank algorithm. The key idea is
to rank a page by the number of pages which point to it, so essentially
we are representing the web as a directed graph (a set of nodes and
edges between nodes, with edges having direction). There are some
problems to overcome, because someone surfing the web may not follow
a link from the current page, but instead jump to another page. The
solution to this problem is simple and elegant, and involves perturbing
the transition matrix A (which encodes the directed graph modelling
the web) by a matrix representing the possibility the user makes a jump.
Other information:
While the class really requires no mathematical prerequisites,
you should bring an enjoyment of mathematics and willingness to learn.
Instructor
Dr. Hal Schenck
Mathematics Department, University of Illinois
schenck@math.uiuc.edu