# Laplacianfaces

Post-publication activity

Curator: Xiaofei He

Figure 1: (a) Eigenfaces, (b) Fisherfaces, and (c) Laplacianfaces calculated from the face images in the YALE database.

Laplacianfaces refer to an appearance-based approach to human face representation and recognition. The approach uses Locality Preserving Projection(LPP) to learn a locality preserving subspace which seeks to capture the intrinsic geometry of the data and the local structure. When the projection is obtained, each face image in the image space is mapped to the low-dimensional face subspace, which is characterized by a set of feature images, they are called Laplacianfaces. Specifically, Laplacianfaces are the optimal linear approximations to the eigenfunctions of the Laplace Beltrami operator on the face manifold. The approach of using Laplacianfaces for recognition was developed by Xiaofei He et al. (Xiaofei He et al.2005). This is the first devoted work on face representation and recognition which explicitly considers the manifold structure.

The motivation of Laplacianfaces is:

• Recently, a number of research efforts have shown that the face images possibly reside on a nonlinear submanifold and some nonlinear method have yield impressive performance on some benchmark artificial data sets. However, for most of them, it is still unclear to evaluate the maps on novel test data points. So they might not be suitable for some computer vision tasks, such as face recognition. On the other hand, low-dimensional representations through kernel-based techniques have been developed for face recognition and can discover the nonlinear structure of the face images. But they are computationally expensive. The Laplacianfaces is proposed against this background.
• Laplacianfaces method aims to preserve the local structure of the image space. It considers the manifold structure which is modeled by an adjacency graph. In many real-world classification problems, the local manifold structure is more important than the global Euclidean structure, especially when nearest-neighbor like classifiers are used for classification.
• LPP (the algorithm generates Laplacianfaces) shares some similar properties to LLE, such as a locality preserving character. Moreover, it is linear and defined everywhere, may be simply applied to any new data point to locate it in the reduced representation space. In many real-world classification problems, the local manifold structure is more important than the global Euclidean structure, especially when nearest-neighbor like classifiers are used for classification. LPP seems to have discriminating power although it is unsupervised.

So Laplacianfaces are expected to be a natural alternative to Eigenfaces in face representation and recognition.

## Laplacianfaces generation

Figure 2: The original face image and the cropped image.

Before generating Laplacianfaces, original images are normalized (in scale and orientation) such that the two eyes are aligned at the same position. Then all resampled at the same pixel resolution. Give M face images with the size of $$h \times w \ ,$$ A face image can be represented as a high dimension vector of size $$D (D=h \times w)$$ in image space and placed into the set: {$$x_1,x_2,...,x_m$$}. Then we need to project the image set {$$x_i$$} into the PCA subspace by throwing away the smallest principle components as processing for noise reduction. For the sake of simplicity, we still use $$x$$ to denote the images in the PCA subspace in the following step. The transformation matrix of PCA is denoted by $$W_{pca}\ .$$ Laplacianfaces are extracted out of the image data space by means of Locality Preserving Projection (LPP) in the following manner:

Figure 3: Two-dimensional linear embedding of face images by Laplacianfaces. As can be seen, the face images are divided into two parts, the faces with open mouth and the faces with closed mouth. Moreover, it can be clearly seen that the pose and expression of human faces change continuously and smoothly, from top to bottom, from left to right. The bottom images correspond to points along the right path (linked by solid line), illustrating one particular mode of variability in pose.
Figure 4: Distribution of the 10 testing samples in the reduced representation subspace. As can be seen, these testing samples optimally find their coordinates which reflect their intrinsic properties, i.e., pose and expression.
1. Constructing the nearest-neighbor graph. Let G denote a graph with n nodes. The $$i$$th node corresponds to the face image $$x_i\ .$$ Put an edge between nodes $$i$$ and $$j$$ if $$x_i$$ and $$x_j$$ are "close". i.e., $$x_j$$ is among k nearest neighbors of $$x_i\ ,$$ or $$x_i$$ is among k nearest neighbors of $$x_j\ .$$ The constructed nearest-neighbor graph is an approximation of the local manifold structure.
2. Choosing the weights. If node$$i$$ and $$j$$ are connected, put
$$S_{ij} = e^{ - \frac{{\left\| {x_i - x_j } \right\|^2 } } {t} }$$
where $$t$$ is a suitable constant. Otherwise, put $$S_{ij}=0.$$ The weight matrix $$S$$ of graph $$G$$ models the face manifold structure by preserving local structure.
3. Eigenmap. Computer the eigenvectors and eigenvalues for the generalized eigenvector problem:
$$XLX^T w = \lambda XDX^T w$$
where $$D$$ is a diagonal matrix whose entries are column(or row, since $$S$$ is symmetric) sums of $$S\ ,$$ $$D_{ii} = \sum\nolimits_j {S_{ij} .} L=D-S$$ is the Laplacian matrix. The $$ith$$ row of matrix $$X$$ is $$x_i\ .$$
Let $$w_0, w_1, ..., w_{k-1}$$ be the solution of the equation above, ordered according to their eigenvalues, $$0 \leqslant \lambda _0 \leqslant \lambda _1 \leqslant ... \leqslant \lambda _{k - 1}\ ,$$ These eigenvalues are equal to or greater than zero because the matrices $$XLX^T$$ and $$XDX^T$$ are both symmetric and positive semidefinite. Thus, the embedding is as follow
$$x \to y = W^T x$$
$$W = W_{PCA} W_{LPP}$$
$$W_{LPP} = [w_0, w_1, ..., w_{k - 1} ]$$
where $$y$$ is a k-dimensional vector. $$W$$ is the transformation matrix. This linear mapping best preserves the manifold's estimated intrinsic geometry in a linear sense. The column vectors of $$W$$ are the so-called Laplacianfaces.

With its neighborhood preserving character, the Laplacianfaces seem to be able to capture the intrinsic face manifold structure to a larger extent. Figure 3 and Figure 4 shows examples that the face images with various pose and expression of a person are mapped into two-dimensional subspace.

## Locality Preserving Projection(LPP)

This section discusses the Locality Preserving Projection which is used to generate Laplacianfaces. The algorithm are linear projective maps that arise by solving a variational problem that optimally preserves the neighborhood structure of the data set.

1. The linear dimensionality reduction problem
The generic problem of linear dimensionality reduction is: Given a set $$x_1, x_2, ..., x_m$$ in $$R_l$$ ( $$l \ll n$$), such that $$y_i$$ "represents" $$x_I\ ,$$ where $$y_i=A^T x_i\ .$$ Our method is of particular applicability in the special case where $$x_1, x_2, ..., x_m \in\mathcal {M}$$ and $$\mathcal{M}$$ is a nonlinear manifold embedded in $$R_n\ .$$
2. The algorithm
Locality Preserving Projection is a linear approximation of the nonlinear Laplacian Eigenmap. The algorithmic procedure is formally stated below:
1. Constructing the adjacency graph: Let $$G$$ denote a graph with m nodes. We put an edge between nodes $$i$$ and $$j$$ if $$x_i$$ and $$x_j$$ are "close". There are two variations:
(a) $$\varepsilon$$-neighborhoods. [ parameter $$\varepsilon \in R$$ ] nodes $$i$$ and $$j$$ are connected by an edge if $$\left\| {x_i - x_j } \right\|^2 < \varepsilon$$ where the norm is the usual Euclidean norm in $$R^n\ .$$
(b) k nearest neighbors. [ parameter $$k \in R$$ ] nodes $$i$$ and $$j$$ are connected by an edge if $$i$$ is among $$k$$ nearest neighbors of $$j$$ or $$j$$ is among $$k$$ nearest neighbors of $$i\ .$$
2. Choosing the weights: Two variations for weighting the edges. $$W$$ is sparse symmetric $$m \times m$$ matrix with $$W_{ij}$$ having the weight of the edge joining vertices $$i$$ and $$j\ ,$$ and 0 if there is no such edge.
(a)Heat kernel. [parameter $$t \in R$$]. If nodes $$i$$ and $$j$$ are connected. put
$$W_{ij} = e^{ - \frac{{\left\| {x_i - x_j } \right\|^2 } } {t} }$$
(b) Simple-minded. [No parameter]. $$W_{ij}=1$$ if and only of vertices $$i$$ and $$j$$ are connected by an edge.
3. Eigenmaps: Computer the eigenvectors and eigenvalues for the generalized eigenvector problem:
$$XLX^T a = \lambda XDX^T a$$
where$$D$$ is a diagonal matrix whose entries are column(or row, since $$W$$ is symmetric) sums of $$W\ ,$$ $$D_{ii} = \sum\nolimits_j {W_{ji} }$$ is the Laplacian matrix. The $$i_{th}$$ column of matrix $$X$$ is $$x_i\ .$$
Let the column vectors $$a_0, ..., a_{l-1}$$ be the solutions of the generalized eigenvector problem, ordered according to their eigenvalues, $$\lambda _0 \leqslant \lambda _1 \leqslant ... \leqslant \lambda _{l - 1} \ .$$ Thus the embedding is as follows:
$$x \to y = A^T x_i,$$ $$A = (a_0 , a_1, ..., a_{l - 1} )$$
where $$y_i$$ is a $$l$$dimensional vector, and $$A$$ is a $$n \times l$$ matrix.

## The relation between LPP and PCA

This section discusses the close relationship between LPP and PCA. It is worthwhile to point out that if the Laplacian matrix $$L$$ is $$\frac{1} {n}I - \frac{1} {{n^2 } }ee^T \ ,$$ the matrix $$XLX^T$$ is the data covariance matrix. where $$n$$ is the number of data points, $$I$$ is the identity matrix, and $$e$$ is a column vector taking one at each entry. The Laplacian matrix here has the effect of removing the sample mean from the sample vectors. In this case, the weight matrix $$S$$ takes $$\frac{1} {{n^2 } }$$ at each entry, i.e., $$S_{ij} = \frac{1} {{n^2 } }\ .$$ $$D_{ii} = \sum\nolimits_j {S_{ij} = } \frac{1} {n} \ .$$ Hence, the Laplacian matrix is $$L = D - S = \frac{1} {n}I - \frac{1} {{n^2 } }ee^T \ .$$ Let $$m$$ denote the sample mean, i.e., $$m = \frac{1} {n}\sum\nolimits_i {x_i } \ .$$ The proof is follow: $XLX^T = \frac{1} {n}X(I - \frac{1} {n}ee^T )X^T \ :$ $= \frac{1} {n}XX^T - \frac{1} {{n^2 } }(Xe)(Xe)^T \ :$ $= \frac{1} {n}\sum\limits_i {x_i x_i ^T - \frac{1} {{n_2 } }(nm)(nm)^T }$

$= E[(x - m)(x - m)^T ] + 2mm^T - 2mm^T \ :$

$= E[(x - m)(x - m)^T ]$ where $$E[(x - m)(x - m)^T ]$$ is just the covariance matrix of the data set. The above analysis shows that the weight matrix $$S$$ play a key role in the LPP algorithm. When we aim at preserving the global structure (PCA), we take $$\varepsilon$$ (or $$k$$) to be infinity and choose the eigenvectors of
$$XLX^Tw = \lambda w$$
associated with the largest eigenvalues. Hence, the data points are projected along the directions of maximal variance. When we aim at preserving the local structure, we take $$\varepsilon$$ to be sufficiently small and choose the eigenvectors (of the matrix $$XLX^T$$) associated with the smallest eigenvalues. Hence, the data points are projected along the directions preserving locality. It is important to note that, when $$\varepsilon$$ (or $$k$$) is sufficiently small, the Laplacian matrix is no longer the data covariance matrix and, hence, the directions preserving locality are not the directions of minimal variance. In fact, the directions preserving locality are those minimizing local variance.

## The relation between LPP and LDA

This section presents a theoretical analysis of LPP and its connection to LDA. LDA seeks directions that efficient for discrimination. The projection is found by solving the generalized eigenvalue problem:
$$S_B w = \lambda S_W w$$
where $$S_W$$ and $$S_B$$ are defined as:
$$S_W = \sum\limits_{i = 1}^l {(\sum\limits_{j = 1}^{n_i } {(x_j ^{(i)} - m^{(i)} )(x_j ^{(i)} - m^{(i)} )^T } )}$$
$$S_B = \sum\limits_{i = 1}^l {n_i (m^{(i)} - m)(m^{(i)} - m)^T }$$
suppose there $$l$$ classes. The $$i$$th class contains $$n_i$$ sample points. Let $$m^{(i)}$$ denote the average vector of the $$x_{(i)}$$ denote the random vector associated to the $$i$$th class and $${x_j ^{(i)} }$$ denote the $$j$$th sample point in the $$i$$th class. We can rewrite the matrix $$S_W$$ as: $S_W = \sum\limits_{i = 1}^l {(\sum\limits_{j = 1}^{n_i } {(x_j ^{(i)} - m^{(i)} )(x_j ^{(i)} - m^{(i)} )^T } )} \ :$ $= \sum\limits_{i = 1}^l {(\sum\limits_{j = 1}^{n_i } {(x_j ^{(i)} (x_j ^{(i)} )^T - m^{(i)} (x_j ^{(i)} )^T - x_j ^{(i)} (m)^{(i)} } )^T + m^{(i)} (m^{(i)} )^T ))} \ :$ $= \sum\limits_{i = 1}^l {(\sum\limits_{j = 1}^{n_i } {(x_j ^{(i)} (x_j ^{(i)} )^T - m^{(i)} (x_j ^{(i)} )^T - X_j ^{(i)} (m^{(i)} )^T } + m^{(i)} (m^{(i)} )^T ))} \ :$ $= \sum\limits_{i = 1}^l {(\sum\limits_{j = 1}^{n_i } {x_j ^{(i)} (x_j ^{(i)} )^T } - n_i m^{(i)} (m^{(i)} )^T )} \ :$ $= \sum\limits_{i = 1}^l {(X_i X_i ^T - \frac{1} {{n_i } }(x_1 ^{(i)} + \cdot \cdot \cdot + x_{n_i } ^{(i)} )(x_1 ^{(i)} + \cdot \cdot \cdot + x_{n_i } ^{(i)} )^T )} \ :$ $= \sum\limits_{i = 1}^l {(X_i X_i ^T - \frac{1} {{n_i } }X_i (e_i e_i ^T )X_i ^T )} \ :$ $= \sum\limits_{i = 1}^l {(X_i L_i X_i ^T )}$
where $$\sum\limits_{i = 1}^l {(X_i L_i X_i ^T )}$$ is the data covariance matrix of the $$i$$th class and $$X_i = [x_1 ^{(i)} ,x_2 ^{(i)} , \cdot \cdot \cdot ,x_{n_i } ^{(i)} ]$$ is a $$d \times n_i$$ matrix. $$L_i = I - \frac{1} {{n_i } }e_i e_i ^T$$ is a $$n_i \times n_i$$ matrix where $$I$$ is the identity matrix and $$e_i=(1, 1, ..., 1)^T$$ is an $$n_i$$-dimensional vector. To further simplify the above equation, define$X = (x_i, x_2, \cdot \cdot \cdot, x_n )$
if $$x_i$$ and $$x_j$$ both belong to the $$k$$th class $$W_{ij}=\frac{1} {{n_k } }\ ,$$ otherwise, $$W_{ij}=0\ .$$ $$L = I - W$$
Thus, we get$S_W = XLX^T$
we could regard the matrix $$W$$ as the weight matrix of a graph with data points as its nodes. Specially, $$W_{ij}$$ is the weight of the edge ($$x_i, x_j$$). $$W$$ reflects the class relationships of the data points. The matrix $$L$$ is thus called graph Laplacian, which plays key role in LPP.
Similarly, we can computer the matrix $$S_B$$ as follow: $S_B = \sum\limits_{i = 1}^l {n_i (m^{(i)} - m)(m^{(i)} - m)^T } \ :$ $= (\sum\limits_{i = 1}^l {n_i m^{(i)} (m^{(i)} )^T ) - m(\sum\limits_{i = 1}^l {n_i (m^{(i)} )^T } )} - (\sum\limits_{i = 1}^l {n_i m^{(i)} } )m^T {\text{ + (}}\sum\limits_{i = 1}^l {n_i } {\text{)mm}}^{\text{T}} \ :$ $= (\sum\limits_{i = 1}^l {\frac{1} {{n_i } } } (x_1 ^{(i)} + \cdots + x_{n_i } ^{(i)} )(x_1 ^{(i)} + \cdots + x_{n_i } ^{(i)} )^T ) - 2nmm^T + nmm^T \ :$ $= XWX^T - X(\frac{1} {n}ee^T )X^T \ :$ $= X(W - \frac{1} {n}ee^T )X^T \ :$ $= X(W - I + I - \frac{1} {n}ee^T )X^T \ :$ $= - XLX^T + X(I - \frac{1} {n}ee^T )X^T \ :$ $= - XLX^T + C$
where $$e = (1, 1, ..., 1)^T$$ is a $$n$$-dimensional vector and $$C = X(I - \frac{1} {n}ee^T )^T$$ is the data covariance matrix. Thus, the generalized eigenvector problem of LDA can be written as follow: $S_B w = \lambda S_W w$ $\Rightarrow (C - XLX^T )w = \lambda XLX^T w$ $\Rightarrow Cw = (1 + \lambda )XLX^T w$ $\Rightarrow XLX^T w = \frac{1} {{1 + \lambda } }Cw$
Thus, the projection of LDA can be obtained by solving the following generalized eigenvalue problem,
$$XLX^T w = \lambda Cw$$
The optimal projections correspond to the eigenvectors associated with the smallest eigenvalues. If the sample mean of the data set is zero, the covariance matrix is simply $$XX^T$$ which is exactly the matrix $$XDX^T$$ in the LPP algorithm. The analysis above shows that LDA actually aims to preserve discriminating information and global geometrical structure. Moreover, LDA can be induced in the LPP framework. However, LDA is supervised while LPP can be performed in either supervised or unsupervised manner.

## Using Laplacianfaces in face representation and recognition

The locality preserving face subspace is spanned by a set of eigenvectors of $$\hat W_{lpp} = \{ w_0 ,w_1 ,...,w_{k - 1} \}\ .$$ Each eigenvector can be displayed by an image. These images are called Laplacianfaces. A face image can be mapped into the locality preserving subspace by using the Laplacianfaces.
When the Laplacianfaces are created, face recognition become a pattern classification task.
The recognition process has three steps:

• Calculate the Laplacianfaces from the training set of face images. $$\hat W_{lpp} = \{ w_0, w_1, ..., w_{k - 1} \}\ .$$ each column vector of $$\hat W_{lpp}$$ is a Laplacianfaces.
• A new face$$x_i$$ is projected into the face space by $$y_i = \hat W^T _{lpp} x_i\ .$$ Where $$\hat W_{lpp}$$ is the set of eigenvectors. The vector $$y_i$$ is the representation of the new face in face space.
• To determine which face class $$x_i$$ belongs to is find the minimum value of
$$d_k = \left\| {y_i - y_k } \right\|$$
where $$y_k$$ is the vector representing the $$k$$th face class. The face $$x_i$$ is considered as belonging to class $$k$$ if the minimum $$d_k$$ is smaller than some predefined threshold $$\theta _d\ ;$$ Otherwise, it is classified as unknown.

Different pattern classifiers, including nearest-neighbor, Bayesian, Support Vector Machine can be applied for face recognition.

## References

• He, Xiaofe; Yan, Shuicheng; Niyogi, Partha and Zhang, HongJiang (2005). Face Recognition Using Laplacianfaces. IEEE Trans. Pattern Anal. Mach. Intell. 27(3): 328-340.
• He, Xiaofei and Niyogi, Partha (2003). Locality Preserving Projections. NIPS, 2003.
• P, Belhumeur and J, Hespanha (1997). Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7): 711-720: 1997.
• Belkin, Mikhail and Niyogi, Partha (2001). Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering. NIPS, 2001.
• Belkin, Mikhail and Niyogi, Partha (2002). Using Manifold Structure for Partially Labeled Classification. NIPS, 2002.
• M. , Brand (2002). Charting a Manifold. NIPS, 2002.
• Chung, F.R.K (1997). Spectral Graph Theory. Regional Conf.Series in Math, 1997.
• Tenenbaum, J.B.; de Silva, V. and Langford, J.C. (2000). A Global Geometric Framework for Nonlinear Dimensionality Reduction. Science, 2000.
• Y., Chang; Hu, C. and Turk, M. (2003). Manifold of Facial Expression. IEEE Int'l Workshop Analysis and Modeling of Faces and Gestures, 2003.

Internal references

• Sheng Zhang and Matthew Turk (2008) Eigenfaces. Scholarpedia, 3(9):4244.