Spectral Graph Theory Book

Spectral Graph Theory: a book by Fan Chung focused on the definition and development of the normalized Laplacian; the first four chapters of the revised version are available online. This is the classic book for the normalized Laplacian. Spectral Graph Theory and its Applications: a class website for a course taught at Yale by Dan. Graph Theory with Applications: a book whose primary aim is to present a coherent introduction to graph theory, suitable as a textbook for advanced undergraduate and beginning graduate students in mathematics and computer science.



Author: Fan R. K. Chung

Publisher: American Mathematical Soc.

ISBN: 9780821803158

Category: Mathematics

Page: 207


Beautifully written and elegantly presented, this book is based on 10 lectures given at the CBMS workshop on spectral graph theory in June 1994 at Fresno State University. Chung's well-written exposition can be likened to a conversation with a good teacher--one who not only gives you the facts, but tells you what is really going on, why it is worth doing, and how it is related to familiar ideas in other areas. The monograph is accessible to the nonexpert who is interested in reading about this evolving area of mathematics.

Spectral Graph Theory studies graphs using associated matrices such as the adjacency matrix and graph Laplacian. Let \(G(V, E)\) be a graph. We’ll let \(n = |V|\) denote the number of vertices/nodes, and \(m = |E|\) denote the number of edges. We’ll assume that vertices are indexed by \(0,\dots,n-1\), and edges are indexed by \(0,\dots,m-1\).

The adjacency matrix \(A\) is an \(n \times n\) matrix with \(A_{i,j} = 1\) if \((i,j) \in E\) is an edge, and \(A_{i,j} = 0\) if \((i,j) \notin E\). If \(G\) is an undirected graph, then \(A\) is symmetric. If \(G\) is directed, then \(A\) need not be symmetric.

The degree of a node \(i\), \(\deg(i)\), is the number of neighbors of \(i\), meaning the number of edges in which \(i\) participates. You can calculate the vector of degrees (a vector \(d\) of length \(n\), where \(d_i = \deg(i)\)) using matrix-vector multiplication:
\begin{equation}
d = A 1
\end{equation}
where \(1\) is the vector of length \(n\) containing all 1s. Equivalently, you can sum the entries in each row of \(A\). We will also use \(D = \operatorname{diag}(d)\), a diagonal matrix with \(D_{i,i} = d_i\).
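As a small illustration of these definitions, the sketch below builds \(A\), \(d\), and \(D\) with NumPy. The four-vertex graph and the variable names are made up for the example.

```python
import numpy as np

# Hypothetical example: a triangle {0, 1, 2} plus a pendant edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4

A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1  # undirected: store both directions so A is symmetric

ones = np.ones(n, dtype=int)
d = A @ ones      # degree vector d = A 1
D = np.diag(d)    # diagonal degree matrix

print(d)  # [2 2 3 1]
```

Summing the rows with `A.sum(axis=1)` gives the same vector.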

The incidence matrix \(B\) is an \(n \times m\) matrix which encodes how edges and vertices are related. Let \(e_k = (i,j)\) be an edge. Then the \(k\)-th column of \(B\) is all zeros except \(B_{i,k} = -1\) and \(B_{j,k} = +1\) (for undirected graphs, it doesn’t matter which of \(B_{i,k}\) and \(B_{j,k}\) is \(+1\) and which is \(-1\) as long as they have opposite signs). Note that \(B^T\) acts as a sort of difference operator on functions of vertices, meaning \(B^T f\) is a vector of length \(m\) which encodes the difference in function value over each edge: for \(e_k = (i,j)\), \((B^T f)_k = f_j - f_i\).

You can check that \(B^T 1_C = 0\), where \(1_C\) is a connected component indicator (\(1_C[i] = 1\) if \(i \in C\), and \(1_C[i] = 0\) otherwise). \(C \subseteq V\) is a connected component of the graph if all vertices in \(C\) have a path between them, and there are no vertices in \(V\) that are connected to \(C\) which are not in \(C\). Since the all-ones vector \(1\) is the sum of the indicators of all connected components, this implies \(B^T 1 = 0\).

The graph Laplacian \(L\) is an \(n \times n\) matrix \(L = D - A = B B^T\). If the graph lies on a regular grid, then \(L = -\Delta\) up to scaling by a finite difference width \(h^2\), but the graph Laplacian is defined for all graphs.

Note that the nullspace of \(L\) is the same as the nullspace of \(B^T\) (the span of indicators on connected components).
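The identities above can be checked numerically. The following is a minimal sketch using dense NumPy arrays on a made-up graph with two connected components; the graph and the tolerance `1e-10` are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical example graph with two connected components:
# a triangle {0, 1, 2} and a single edge {3, 4}.
edges = [(0, 1), (0, 2), (1, 2), (3, 4)]
n, m = 5, len(edges)

A = np.zeros((n, n))
B = np.zeros((n, m))
for k, (i, j) in enumerate(edges):
    A[i, j] = A[j, i] = 1
    B[i, k] = -1   # one endpoint of edge k
    B[j, k] = +1   # the other endpoint of edge k

D = np.diag(A.sum(axis=1))
L = D - A

# L = D - A agrees with B B^T
assert np.allclose(L, B @ B.T)

# B^T maps a component indicator, and hence the all-ones vector, to zero
ind_C = np.array([1.0, 1.0, 1.0, 0.0, 0.0])
assert np.allclose(B.T @ ind_C, 0)
assert np.allclose(B.T @ np.ones(n), 0)

# The nullspace dimension of L equals the number of connected components
eigvals = np.linalg.eigvalsh(L)
num_zero = int(np.sum(np.abs(eigvals) < 1e-10))
print(num_zero)  # 2
```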

In most cases, it makes sense to store all these matrices in sparse format.

Exercise

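For an undirected graph \(G(V, E)\), let \(n = |V|\) and \(m = |E|\). Give an expression for the number of non-zeros in each of \(A\), \(B\), and \(L\) in terms of \(n\) and \(m\).

\(A\) and \(B\) both have \(2m\) non-zeros: each edge contributes two entries to \(A\) (one per direction) and two entries to its column of \(B\). \(L\) has \(n + 2m\) non-zeros, the \(n\) diagonal entries of \(D\) plus the \(2m\) entries of \(A\), assuming there are no self-loops and every vertex has at least one neighbor.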
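These counts can be confirmed by building the matrices in sparse format. The sketch below uses `scipy.sparse` on a made-up example (a cycle on 6 vertices); the construction via COO triplets is one reasonable choice, not the only one.

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical example: a cycle on 6 vertices, so n = 6 and m = 6.
n = 6
edges = [(i, (i + 1) % n) for i in range(n)]
m = len(edges)

rows = np.array([i for i, j in edges])
cols = np.array([j for i, j in edges])
vals = np.ones(m)

# Symmetric adjacency matrix: one stored entry per direction of each edge
A = sp.coo_matrix((vals, (rows, cols)), shape=(n, n))
A = (A + A.T).tocsr()

# Incidence matrix: -1 and +1 in column k for edge k
k = np.arange(m)
B = sp.coo_matrix(
    (np.concatenate([-vals, vals]),
     (np.concatenate([rows, cols]), np.concatenate([k, k]))),
    shape=(n, m),
).tocsr()

# Laplacian L = D - A
d = np.asarray(A.sum(axis=1)).ravel()
L = (sp.diags(d) - A).tocsr()

print(A.nnz, B.nnz, L.nnz)  # 12 12 18
```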

Random Walks on Graphs

In a random walk on a graph, we consider an agent who starts at a vertex \(i\), then chooses a random neighbor of \(i\) and “walks” along the connecting edge. Typically, we consider a walk where the neighbor is chosen uniformly at random (i.e. with probability \(1/d_i\)). We’ll assume that every vertex of the graph has at least one neighbor so \(D^{-1}\) makes sense.

This defines a Markov chain with transition matrix \(P = A D^{-1}\) (each column is scaled to sum to 1). Note that even if \(A\) is symmetric (as it is for undirected graphs), \(P\) need not be symmetric because of the scaling by \(D^{-1}\).

The stationary distribution \(x\) of the random walk is the top eigenvector of \(P\), which is guaranteed to have eigenvalue \(1\) and non-negative entries. If we scale \(x\) so that \(\|x\|_1 = 1\), the entry \(x_i\) can be interpreted as the probability that a random walker which has walked for a very large number of steps is at vertex \(i\).
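For an undirected graph, \(P d = A D^{-1} d = A 1 = d\), so the degree vector scaled to sum to 1 is a stationary distribution. The sketch below checks this on a made-up connected, non-bipartite graph by repeatedly applying \(P\) to a starting distribution; the graph and the iteration count of 200 are assumptions for illustration.

```python
import numpy as np

# Hypothetical example: a triangle {0, 1, 2} with a pendant vertex 3 attached to vertex 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=0)
P = A / d   # broadcasting divides column j by d_j, i.e. P = A D^{-1}

assert np.allclose(P.sum(axis=0), 1.0)   # every column sums to 1
assert not np.allclose(P, P.T)           # P is not symmetric here

# Distribution of a walker that starts at vertex 3 and takes 200 steps
x = np.array([0.0, 0.0, 0.0, 1.0])
for _ in range(200):
    x = P @ x

print(np.round(x, 4))
print(d / d.sum())
```

Both printed vectors should agree: the walker's long-run distribution is proportional to the vertex degrees.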

PageRank

PageRank is an early algorithm that was used to rank websites for search engines. The internet can be viewed as a directed graph of websites where there is a directed edge \((i, j)\) if webpage \(j\) links to webpage \(i\). In this case, we compute the degree vector \(d\) using the out-degree (counting the number of links out of a webpage). Then the transition matrix \(P = A D^{-1}\) on the directed adjacency matrix defines a random walk on webpages where a user randomly clicks links to get from webpage to webpage. The idea is that more authoritative websites will have more links to them, so a random web surfer will be more likely to end up at them.

One of the issues with this model is that it is easy for a random walker to get “stuck” at a webpage with no out-going links. The idea of PageRank is to add a probability \(\alpha\) that a web surfer will randomly go to another webpage which is not linked to by their current page. In this case, we can write the transition matrix
\begin{equation}
P = (1-\alpha) A D^{-1} + \frac{\alpha}{n} 1 1^T
\end{equation}
We then calculate the stationary vector \(x\) of this matrix. Websites with a larger entry \(x_i\) are deemed more authoritative.

Note that \(A\) is sparse but \(\frac{1}{n} 1 1^T\) is dense, so you’ll typically want to encode \(\frac{1}{n} 1 1^T\) as a linear operator rather than forming it explicitly (it takes the average of a vector and broadcasts it to the appropriate shape). For internet-sized graphs this is a necessity.
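One way to apply the PageRank matrix without forming the dense term is a power iteration that treats \(\frac{1}{n} 1 1^T x\) as the mean of \(x\). The sketch below is illustrative only: the function name `pagerank`, the default \(\alpha = 0.15\), the stopping tolerance, and the four-page link graph are assumptions, and it relies on every page having at least one outgoing link.

```python
import numpy as np
import scipy.sparse as sp

def pagerank(A, alpha=0.15, tol=1e-12, max_iter=1000):
    """Power iteration for P = (1 - alpha) A D^{-1} + (alpha / n) 1 1^T.

    A[i, j] = 1 if page j links to page i. Assumes every page has at
    least one outgoing link, so all column sums are positive.
    """
    n = A.shape[0]
    d = np.asarray(A.sum(axis=0)).ravel()   # out-degrees are column sums
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # (1/n) 1 1^T x equals mean(x) in every entry; no dense matrix needed
        x_new = (1 - alpha) * (A @ (x / d)) + alpha * x.mean()
        x_new /= x_new.sum()
        if np.abs(x_new - x).sum() < tol:
            return x_new
        x = x_new
    return x

# Hypothetical link graph given as (source, target) pairs
links = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 2)]
src = np.array([s for s, t in links])
dst = np.array([t for s, t in links])
A = sp.csr_matrix((np.ones(len(links)), (dst, src)), shape=(4, 4))

x = pagerank(A)
print(np.round(x, 4))
print(int(x.argmax()))  # 2
```

Page 2 receives links from three other pages and ends up with the largest entry, while page 3 has no incoming links and keeps only the \(\alpha / n\) contribution from random jumps.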

Let’s look at the Les Miserables graph, which encodes interactions between characters in the novel Les Miserables by Victor Hugo.

Let’s now construct the PageRank matrix and compute the top eigenpairs

The Graph Laplacian

Spectral Embeddings

Spectral embeddings are one way of obtaining locations of vertices of a graph for visualization. The idea is to pretend that all edges are Hooke’s law springs, and to minimize the potential energy of a configuration of vertex locations subject to the constraint that we can’t have all points in the same location.

In one dimension:
\begin{equation}
\begin{split}
\mathop{\mathsf{minimize}}_x \quad & \sum_{(i,j) \in E} (x_i - x_j)^2 \\
\text{subject to} \quad & x^T 1 = 0, \ \|x\|_2 = 1
\end{split}
\end{equation}

Note that the objective function is a quadratic form on the embedding vector \(x\):
\begin{equation}
\sum_{(i,j) \in E} (x_i - x_j)^2 = x^T B B^T x = x^T L x
\end{equation}

Because the vector \(1\) is in the nullspace of \(L\), the constraint \(x^T 1 = 0\) rules out the constant eigenvector with eigenvalue \(0\), so this is equivalent to finding the eigenvector with the second-smallest eigenvalue.
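A minimal sketch of the one-dimensional embedding, assuming a small path graph as the example and using dense `numpy.linalg.eigh`, which returns eigenvalues in ascending order:

```python
import numpy as np

# Hypothetical example: a path graph 0 - 1 - 2 - 3 - 4 - 5.
n = 6
edges = [(i, i + 1) for i in range(n - 1)]

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

# eigh returns ascending eigenvalues and orthonormal eigenvectors (as columns)
eigvals, eigvecs = np.linalg.eigh(L)
x = eigvecs[:, 1]   # eigenvector for the second-smallest eigenvalue

# x is feasible: orthogonal to the all-ones vector and of unit length
assert abs(x.sum()) < 1e-10
assert abs(np.linalg.norm(x) - 1) < 1e-10

# The spring energy equals the quadratic form, which equals the eigenvalue
energy = sum((x[i] - x[j]) ** 2 for i, j in edges)
assert np.isclose(energy, x @ L @ x)
assert np.isclose(energy, eigvals[1])

# Sorting vertices by their coordinate recovers the order along the path
order = np.argsort(x)
print(order)
```

The sign of an eigenvector is arbitrary, so the printed order may run from either end of the path.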


For a higher-dimensional embedding, we can use the eigenvectors for the next eigenvalues in increasing order (the third-smallest, fourth-smallest, and so on), one eigenvector per coordinate.


Spectral Clustering


Spectral clustering refers to using a spectral embedding to cluster nodes in a graph. Let \(A, B \subset V\) with \(A \cap B = \emptyset\). We will denote
\begin{equation}
E(A, B) = \{(i,j) \in E \mid i \in A, j \in B\}
\end{equation}

One way to try to find clusters is to attempt to find a set of nodes \(S \subset V\) with \(\bar{S} = V \setminus S\), so that we minimize the cut objective
\begin{equation}
C(S) = \frac{|E(S, \bar{S})|}{\min \{|S|, |\bar{S}|\}}
\end{equation}

The Cheeger inequality bounds the second-smallest eigenvalue of \(L\) in terms of the optimal value of \(C(S)\). In fact, one way to construct a partition of the graph which is close to the optimal clustering minimizing \(C(S)\) is to look at the eigenvector \(x\) associated with the second-smallest eigenvalue, and let \(S = \{i \in V \mid x_i < 0\}\).
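The sign-based partition and the cut objective can be sketched as follows. The helper name `cut_objective` and the example graph (two triangles joined by a single edge) are assumptions for illustration.

```python
import numpy as np

def cut_objective(A, S):
    """C(S) = |E(S, S_bar)| / min(|S|, |S_bar|) for a boolean mask S."""
    S = np.asarray(S, dtype=bool)
    crossing = A[np.ix_(S, ~S)].sum()   # edges with one endpoint on each side
    return crossing / min(S.sum(), (~S).sum())

# Hypothetical example: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

eigvals, eigvecs = np.linalg.eigh(L)
x = eigvecs[:, 1]   # eigenvector for the second-smallest eigenvalue
S = x < 0           # S = {i : x_i < 0}

print(np.flatnonzero(S).tolist())
print(cut_objective(A, S))
```

On this graph the sign pattern separates the two triangles, cutting only the bridging edge, so \(C(S) = 1/3\). Which triangle ends up as \(S\) depends on the arbitrary sign of the eigenvector.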

As an example, let’s look at a graph generated by a stochastic block model with two clusters. The “ground-truth” clusters are the ground-truth communities in the model.

Now, let’s use spectral clustering to partition into two clusters

We’ll use the adjusted Rand index to measure the quality of the clustering we obtained. A value of 1 means that we found the true clusters.
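The two-cluster experiment can be sketched in plain NumPy. Everything specific here is an assumption made for illustration: the helper names `sample_sbm` and `adjusted_rand_index`, the cluster sizes, the probabilities \(p = 0.5\) and \(q = 0.05\), and the random seed. The adjusted Rand index is computed directly from the contingency table of the two labelings; a library routine such as `sklearn.metrics.adjusted_rand_score` could be used instead.

```python
import numpy as np

def sample_sbm(sizes, p, q, rng):
    """Sample a symmetric 0/1 adjacency matrix from a stochastic block model."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p, q)
    upper = np.triu(rng.random(probs.shape) < probs, k=1)  # strict upper triangle
    A = (upper | upper.T).astype(float)                    # symmetric, no self-loops
    return A, labels

def adjusted_rand_index(a, b):
    """Adjusted Rand index of two labelings, from their contingency table.

    Undefined (division by zero) if both labelings put everything in one cluster.
    """
    a_ids = np.unique(a, return_inverse=True)[1]
    b_ids = np.unique(b, return_inverse=True)[1]
    table = np.zeros((a_ids.max() + 1, b_ids.max() + 1))
    np.add.at(table, (a_ids, b_ids), 1)

    def pairs(c):
        return c * (c - 1) / 2   # "c choose 2"

    index = pairs(table).sum()
    sum_a = pairs(table.sum(axis=1)).sum()
    sum_b = pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(len(a))
    maximum = (sum_a + sum_b) / 2
    return (index - expected) / (maximum - expected)

rng = np.random.default_rng(0)
A, truth = sample_sbm([50, 50], p=0.5, q=0.05, rng=rng)

L = np.diag(A.sum(axis=1)) - A
x = np.linalg.eigh(L)[1][:, 1]   # eigenvector for the second-smallest eigenvalue
pred = (x < 0).astype(int)       # split vertices by sign

print(adjusted_rand_index(truth, pred))
```

With within-cluster edges this much denser than between-cluster edges, the score should be at or very near 1.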

In general, you should use a dimension \(d\) embedding when looking for \(d+1\) clusters (so we used a dimension 1 embedding for 2 clusters). Let’s look at 4 clusters in an SBM.

We’ll use K-means clustering in scikit-learn to assign clusters.
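A sketch of the four-cluster case follows. To keep it dependency-free and deterministic, it swaps in a small hand-written Lloyd iteration with farthest-point initialization in place of scikit-learn's `KMeans`; the helper names, the SBM parameters (\(p = 0.9\), \(q = 0.02\), 25 nodes per cluster), and the seed are all assumptions for illustration.

```python
import numpy as np

def sample_sbm(sizes, p, q, rng):
    """Sample a symmetric 0/1 adjacency matrix from a stochastic block model."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    probs = np.where(labels[:, None] == labels[None, :], p, q)
    upper = np.triu(rng.random(probs.shape) < probs, k=1)
    return (upper | upper.T).astype(float), labels

def spectral_embedding(A, dim):
    """Vertex coordinates from the eigenvectors of L = D - A for the
    second-smallest through (dim + 1)-th smallest eigenvalues."""
    L = np.diag(A.sum(axis=1)) - A
    eigvecs = np.linalg.eigh(L)[1]
    return eigvecs[:, 1:dim + 1]

def kmeans(X, k, n_iter=100):
    """Plain Lloyd iteration with farthest-point initialization (a stand-in
    for a library K-means, not scikit-learn's implementation)."""
    centers = [X[0]]
    for _ in range(k - 1):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

rng = np.random.default_rng(0)
k, size = 4, 25
A, truth = sample_sbm([size] * k, p=0.9, q=0.02, rng=rng)

X = spectral_embedding(A, dim=k - 1)   # 3-dimensional embedding for 4 clusters
pred = kmeans(X, k)

# Contingency table: rows are true clusters, columns are assigned labels
table = np.zeros((k, k), dtype=int)
np.add.at(table, (truth, pred), 1)
purity = table.max(axis=0).sum() / len(truth)
print(table)
print(purity)
```

When the clustering is good, each column of the table has a single dominant entry, and the purity (the fraction of nodes that fall in the majority true cluster of their assigned label) is close to 1.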

Exercise

We’ll consider the stochastic block model with \(k=5\) clusters and \(n=25\) nodes per cluster.


Let \(p\) denote the probability of an edge between nodes in the same cluster, and \(q\) denote the probability of an edge between nodes in different clusters.


Plot a phase diagram of the adjusted Rand index (ARI) score as \(p\) and \(q\) both vary in the range \([0,1]\).