CPSC 536F Page, Fall 2025: PSD (Positive Semidefinite) Matrices and Some Theoretical Foundations of Machine Learning

This page concerns CPSC 536F, Winter 2025-26, Term 1. This is a topics course in the applications of symmetric matrices -- most often PSD (positive semidefinite) matrices -- to a number of areas. Roughly 60% of the course will give the theory of PSD matrices and some basic applications. Roughly 40% of the course will focus on kernel functions and methods.

This page has the most up-to-date information on the course(s) and supersedes any other document.

Materials below may have errors! Errors will be corrected either here or in class. If you are not attending my classes: use these materials at your own risk!

This entire webpage is currently (August 2025) UNDER CONSTRUCTION.

News
Sept 17, 2025: I'm adding some exercises to the course handout on positive definite matrices and kernels.
Sept 9, 2025: First office hours this week: 4-5:30pm, Wednesday, Sept 10.
Aug 27, 2025: Class meets: MWF, 1-2pm, Room 519, FORW (Frank Forward Building).
Symmetric and PSD matrices in Machine Learning
This course will focus on symmetric matrices, usually positive semidefinite (PSD) matrices, and show how they are applied to machine learning and other algorithms.

The main point of this course is to develop the conceptual foundations behind PSD matrices and kernels, and give broad classes of examples of these objects. This course is theoretical and will complement more practically oriented courses.

There are a number of relatively simple applications of symmetric matrices, including the SVD (singular value decomposition), PCA (principal component analysis), regression and ridge regression.

Graph Laplacians illustrate many properties that also hold for continuous Laplacians (e.g., the maximum principle); this topic includes expander graphs, which exhibit a large spectral gap and therefore have many algorithmically desirable properties. Reversible Markov matrices are (essentially) symmetric matrices.
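To make the last claim concrete: if P is a Markov matrix that is reversible with respect to a stationary distribution π (detailed balance: π_i P_ij = π_j P_ji), then D^{1/2} P D^{-1/2} is symmetric, where D = diag(π); in particular P has real eigenvalues. Here is a minimal NumPy sketch (with made-up weights, just for illustration):

    # A reversible Markov chain: random walk on a weighted undirected graph.
    import numpy as np

    W = np.array([[0., 2., 1.],
                  [2., 0., 3.],
                  [1., 3., 0.]])           # symmetric edge weights (made up)
    P = W / W.sum(axis=1, keepdims=True)    # transition matrix P_ij = W_ij / deg(i)
    pi = W.sum(axis=1) / W.sum()            # stationary distribution, proportional to degrees

    D_half = np.diag(np.sqrt(pi))
    D_half_inv = np.diag(1.0 / np.sqrt(pi))
    S = D_half @ P @ D_half_inv             # similar to P, hence the same eigenvalues

    print(np.allclose(S, S.T))              # True: S is symmetric
    print(np.linalg.eigvalsh(S))            # real eigenvalues of P (the largest is 1)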

One main goal of this course is to discuss kernel functions and methods, which have fascinating foundations. Important examples of kernels come from the classical kernel functions for Laplacians and the heat equation. We may explain why each PD (positive definite) kernel function has an RKHS (reproducing kernel Hilbert space), or we may simply give examples.
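As a small preview (a sketch only, with arbitrary sample points), the heat kernel on ℝ at time t, k(x,x') = exp(-(x-x')^2/(4t)) up to its usual normalizing constant, restricted to any finite set of points gives a positive semidefinite Gram matrix:

    import numpy as np

    t = 0.5
    x = np.linspace(-2.0, 2.0, 20)                        # arbitrary sample points in R
    K = np.exp(-(x[:, None] - x[None, :])**2 / (4 * t))   # Gram matrix K_ij = k(x_i, x_j)

    print(np.linalg.eigvalsh(K).min())   # >= 0 up to roundoff: K is positive semidefinite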
Prerequisites
You should have at least one term of undergraduate linear algebra; ideally you would have seen some linear algebra beyond this, either theoretical or practical. You should know (or be willing to quickly teach yourself) the following topics: bases of subspaces of ℝ^n, and how eigenvectors can be used to diagonalize (diagonalizable) matrices.
You should be willing to believe that any continuous function on a sphere in ℝ^n attains a maximum and minimum value somewhere.
Possible Topics
Here is a list of tentative topics. On the first day of class I will poll the students to get suggestions for more topics and which topics to emphasize.
  • Symmetric matrices: the existence of an orthonormal eigenbasis; extension of partial basis; Rayleigh quotient, min-max and max-min principles; examples. Positive semidefinite (PSD) matrices.
  • Simple applications: SVD (singular value decomposition); PCA (principal component analysis); regression and ridge regression.
  • Discrete Laplacians and related examples: graph adjacency matrices and Laplacians; expander graphs and some of their properties; most regular graphs are expander graphs. Reversible Markov chains; basic examples. Graph edge Laplacians (and grad, div, curl).
  • Kernel functions/methods: basic definitions of PD (positive definite) kernel functions. Examples from Laplace's equation and the heat equation on a real interval. Other kernel functions. Applications of kernel functions.
  • The Moore-Aronszajn characterization of PSD kernel functions as reproducing kernels of a Hilbert space. Examples (without Hilbert spaces); we may prove this theorem.
Primary References
I'm working on a set of notes for this course; this is a work in progress, and parts may be quite skeletal (especially some parts that are mainly notes for myself and won't be covered in class).
This course will have some overlap with a course I taught in Spring 2021, namely CPSC 531F, Spring 2021. In particular, I may assign homework directly from the Supplemental Notes and Homework for that class (or problems that are very similar to these); you may be interested in the homework problems assigned then.
Supplemental References
UNDER CONSTRUCTION!! Here we will list some articles/books related to course material, to supplement class discussion. I believe that all these references are free of charge, at least if you have a UBC library card or a CWL with library privileges.
Board Scans
  • Introduction to the Kernel Trick, Spectral Theory of Symmetric and PSD Matrices: Theory and Examples.
    Board scans:
    09_03: course overview, course policy, general remarks on heat equation kernels, begin the "kernel trick."
    09_05: Definition of a kernel. A kernel on [n] is really an n x n matrix. Symmetric, positive (semi)definite matrices and kernels. The kernel trick -- more details on the calculations regarding a toy example. For Monday: cleaning up the notation, and many examples of kernels, including those that do and don't satisfy the "kernel trick."
    09_08: The kernel trick: separating concentric circles of cows and goldfish in ℝ^2, and measuring how well this works.
    09_10: The kernel trick: correcting the embedding Φ : ℝ^2 -> ℝ^6. Reviewing the computation from last time. The kernel trick, part 1: the kernel Φ(x) ⋅ Φ(x') can be expressed as a simple function of x ⋅ x' in ℝ^2 (a sketch of one such Φ appears after the board-scan lists below). Part 2 (start): claim: the difference between the sample points representing cows and a test point depends only on k(x,x').
    09_12: Some possible reasons for using the "kernel trick." The first homework problem: test point, cows, and ghosts. Introduction to symmetric matrices, positive (semi)definite matrices. If K is any symmetric matrix, then e^{tK} is positive definite (not merely positive semidefinite) for any real t (a small numeric check appears after the board-scan lists below).
    09_15: Some remarks on ellipses, elliptic functions in arc length, and nice functions in "shell" probability. Generalizing circular animal plots to animal plots lying on ellipses centred at (0,0). Theorem: a kernel function on X given by a dot product into ℝ^n cannot be positive definite if |X| > n. Theorems to be proven in the next section. Examples of matrices and their eigenvalues: all 1's matrix, matrices with all but one eigenvalue equal to 0 (or much smaller in absolute value than the first eigenvalue), 2 x 2 matrix [a b; b a], circulant matrices.
    09_17: Circulant matrices, symmetric and antisymmetric matrices (and any square matrix is a sum thereof). Circulant matrix eigenpairs. The need for the complex dot product. Revisiting the all 1's matrix.
    09_19: Admin stuff regarding homework problems. Standard 2 x 2 Hadamard matrix; tensor (Kronecker) product of two matrices. Definition of the tensor product of matrices from the tensor product of vector spaces, viewing matrices as acting on vector spaces. Eigenpairs of tensor products of two matrices (a numeric check appears after the board-scan lists below). Begin: connection to kernel maps and the product of two kernels.
    09_22: Tensor product of two abstract vector spaces: (1) an initial object in a certain category, or (2) a vector space formed by pairs. Graph adjacency matrices and ... tensor product ?
    09_24: Preview of next few classes; admin stuff. Regular graphs, adjacency matrices and their powers. Interpretation of d-regular graph adjacency eigenvalues (statements, not proofs): λ_1 = d; multiplicity of d = # connected components; λ_n(G) = -d iff G is bipartite. 𝔹_2 eigenvectors as the tensor product of two 𝔹_1 eigenvectors. Example of A_{G × H}.
    09_26: More on the × product (Cartesian product) of graphs, tensor product, and eigenpairs. Start: expansion.
    09_29: "Geometric" view of the adjacency matrix. Begin spectral gap in adjacency matrices and implication: local versus global, multiplicity of eigenvalue d in a d-regular graph. Cycles and their products.
    10_01: Review of the interpretation of the d-regular graph spectrum, including the situation λ_2 < λ_1 versus λ_2 = λ_1. Admin: we have some overlap with CPSC 531F, Spring 2021. "Geometric" view of the adjacency matrix and cycle eigenpairs; the spectral gap of the n-cycle is C/n^2 + O(1/n^4) (a numeric check appears after the board-scan lists below). Spectral gap of the N_1 by N_2 cyclic grid (4-regular) is C/max(N_1,N_2)^2, so C/n for N_1 = N_2 and n = N_1 N_2.
    10_03: Broad strokes about d-regular graphs: grids with N_1 = N_2 and with N_1 = 3. The "double cycle" (4-regular, each edge occurs twice). Most 4-regular graphs have spectral gap roughly 4 - 2√3. Graph Laplacian = dI - (adjacency matrix). Return to kernel functions.
    10_06: Kernels k: 𝓧 × 𝓧 → ℝ of the special form k(x,x') = Φ(x) ⋅ Φ(x'), where Φ: 𝓧 → ℝ^f (Φ = the "feature map", f = the number of "features"). Theorems: k is always positive semidefinite, and k is positive definite iff Φ is injective and Φ(𝓧) is a set of linearly independent vectors (a one-line proof sketch appears after the board-scan lists below). Examples.
    10_08: Review kernels on finite sets. Explain that this implies that the product of any two positive semidefinite kernels 𝓧 × 𝓧 → ℝ (for arbitrary 𝓧) is positive semidefinite. Start: positive definite kernels on 𝓧 = ℝ: solving the ODE w''(x) = f(x) on [0,L] subject to the Dirichlet condition, and solving w''(x) = δ_y(x) there (the solution is the Green's function Green(x,y)). w(x) = ReLU(x) satisfies w'(x) = Heaviside(x) and w''(x) = δ_0(x). What does δ_0(x) really mean? Intuition?
    10_10: Solving Poisson's equation w'' = x^2 subject to the Dirichlet condition. Satisfying an ODE in the "weak" sense (this is actually a very powerful method). Explaining why ReLU' = Heaviside, even though ReLU is not differentiable at x=0 in the classical sense -- it is a weak solution to an ODE.
    10_15: Discussion of generalized functions and references. Discussion of the continuum hypothesis as it relates to kernel functions with 𝓧 = ℝ. A better organized explanation of why we need weak solutions of ODEs to differentiate ReLU(x). The formalism of test functions as discussed in Generalized Functions and ODEs (A. Friedman). Motivation for weak solutions: by formally integrating, one can make sense of everything using the set of test functions Ψ = C_0(ℝ). Claim: (w,f) = (ReLU, Heaviside) solves the ODE w' = f in the weak sense.
    10_17: Outline of the next few lectures, and how generalized functions arise as a natural consequence of weak solutions of ODEs/PDEs. The idea of how to define generalized functions over an arbitrary vector space of "test functions," Ψ. Verification that ReLU' = Heaviside in the sense of generalized functions (a short verification appears after the board-scan lists below); we don't care how Heaviside(0) is defined (even though 1/2 is probably the best choice), because for generalized functions the value at a single point doesn't matter. The identity ∫ (w'')w = - ∫ (w')(w') is morally the reason that the operator w ↦ w'' is negative semidefinite (under Neumann/Dirichlet/etc. conditions).
  • The SVD (Singular-value decomposition) and Applications.
    Board scans:
    SCANS TO GO HERE
  • Graph Laplacians and Related Topics.
    Board scans:
    SCANS TO GO HERE
  • Laplace and Heat Kernel Functions.
    Board scans:
    SCANS TO GO HERE
  • Kernel Methods:
    Board scans:
    SCANS TO GO HERE
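Worked Sketches Related to the Board Scans
These are small illustrative sketches (in NumPy, plus two short derivations), not the exact computations from the board; use them at your own risk. Related to the 09_08 and 09_10 scans: one standard choice of feature map Φ : ℝ^2 -> ℝ^6 (not necessarily the exact Φ used in class) for which Φ(x) ⋅ Φ(x') is a simple function of x ⋅ x', namely (1 + x ⋅ x')^2:

    import numpy as np

    def phi(x):
        # Phi : R^2 -> R^6; the sqrt(2) factors make the dot product come out nicely.
        x1, x2 = x
        return np.array([x1*x1, x2*x2, np.sqrt(2)*x1*x2,
                         np.sqrt(2)*x1, np.sqrt(2)*x2, 1.0])

    def k(x, xp):
        return (1.0 + np.dot(x, xp))**2

    rng = np.random.default_rng(0)
    x, xp = rng.normal(size=2), rng.normal(size=2)
    print(np.isclose(np.dot(phi(x), phi(xp)), k(x, xp)))   # True: Phi(x).Phi(x') = (1 + x.x')^2

Since (1 + x ⋅ x')^2 depends on x and x' only through x ⋅ x', the kernel-trick computations can be done without ever forming Φ(x) explicitly.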
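Related to the 09_12 scan: a small numeric check (with a random symmetric K) that e^{tK} is positive definite for any real t. If K = QΛQ^T with Q orthogonal, then e^{tK} = Q e^{tΛ} Q^T, whose eigenvalues e^{tλ_i} are all strictly positive:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(5, 5))
    K = (A + A.T) / 2                       # an arbitrary symmetric matrix

    t = -3.0                                # any real t, including negative values
    lam, Q = np.linalg.eigh(K)              # K = Q diag(lam) Q^T
    expm_tK = Q @ np.diag(np.exp(t * lam)) @ Q.T

    print(np.linalg.eigvalsh(expm_tK).min() > 0)   # True: e^{tK} is positive definite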
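Related to the 09_19 scan: a quick check (with small random symmetric matrices) that the eigenvalues of a tensor (Kronecker) product A ⊗ B are the products λ_i μ_j of the eigenvalues of A and B; the corresponding eigenvectors are the Kronecker products of the factor eigenvectors.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.normal(size=(3, 3)); A = (A + A.T) / 2    # arbitrary symmetric matrices
    B = rng.normal(size=(2, 2)); B = (B + B.T) / 2

    eig_products = np.sort(np.outer(np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)).ravel())
    eig_kron = np.sort(np.linalg.eigvalsh(np.kron(A, B)))

    print(np.allclose(eig_products, eig_kron))   # True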
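Related to the 10_01 scan: a numeric check that the adjacency spectral gap of the n-cycle is C/n^2 + O(1/n^4); here λ_1 = 2 and λ_2 = 2cos(2π/n), so the gap is roughly 4π²/n²:

    import numpy as np

    for n in [10, 20, 40, 80]:
        A = np.zeros((n, n))
        for i in range(n):
            A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1.0   # adjacency matrix of the n-cycle
        eigs = np.sort(np.linalg.eigvalsh(A))[::-1]
        gap = eigs[0] - eigs[1]
        print(n, gap, gap * n**2)    # gap * n^2 approaches 4*pi^2, roughly 39.48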
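Related to the 10_06 scan, the one-line reason (a sketch of the argument; the full statement, including the positive definite case, will be in the notes) that k(x,x') = Φ(x) ⋅ Φ(x') is always positive semidefinite: for any points x_1, ..., x_m in 𝓧 and any real c_1, ..., c_m,
\[
  \sum_{i,j=1}^{m} c_i c_j\, k(x_i, x_j)
    = \sum_{i,j=1}^{m} c_i c_j\, \Phi(x_i)\cdot\Phi(x_j)
    = \Bigl\| \sum_{i=1}^{m} c_i\, \Phi(x_i) \Bigr\|^2 \;\ge\; 0 ,
\]
and the sum vanishes for some nonzero c exactly when Φ(x_1), ..., Φ(x_m) are linearly dependent, which is where the injectivity and linear independence conditions enter.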
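Related to the 10_15 and 10_17 scans: a sketch of the verification that ReLU' = Heaviside in the weak sense, for any test function φ ∈ Ψ (using only integration by parts and the fact that φ has compact support):
\[
  \int_{-\infty}^{\infty} \mathrm{ReLU}(x)\, \varphi'(x)\, dx
    = \int_{0}^{\infty} x\, \varphi'(x)\, dx
    = \bigl[\, x\, \varphi(x) \,\bigr]_{0}^{\infty} - \int_{0}^{\infty} \varphi(x)\, dx
    = - \int_{-\infty}^{\infty} \mathrm{Heaviside}(x)\, \varphi(x)\, dx ,
\]
which is exactly the statement that ReLU' = Heaviside as generalized functions, independent of how Heaviside(0) is defined.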
Office Hours
Office hours starting Sept 22: Wednesdays, 2-3pm, and Thursdays, 11am-12pm, both in ICCS Room X561. If you can't make this week's office hour(s) and would like to meet, please email me.
Office hour on Thursday, Oct 2, is cancelled.
Homework and Grades
Homework will be assigned during lectures and organized into two or three sets, to be collected at regular interval(s) during the term. The third homework set may essentially be a small research project.

In my topics courses, grades have the following meaning (this is based on UBC Math guidelines):
  • 95% or higher (A+): student is strongly encouraged to pursue research in the area; the instructor will write a very strong letter of recommendation for the student.
  • 90% to 94% (A+): the student is encouraged to pursue research in the area or a related one; the instructor will write an enthusiastic letter of recommendation for the student.
  • 85% to 89% (A): the student will be able to use some of the course material in another field of research; research in the area is not encouraged without significantly more background and study.
  • 80% to 84% (A-): the student has fulfilled department expectations of a graduate student.
  • 79% or below (B+ or below): the student has not fulfilled department expectations of a graduate student.
Diversity and Equity
UBC is trying in earnest to encourage diversity and alleviate biases and inequities to which some members of its community are subjected; this includes, for example, UBC's Indian Residential School History and Dialogue Centre, which is privileged to work with the Musqueam First Nation; it also includes the Computer Science Department's various programs described on its Diversity in CS webpage.
I try to act reasonably free of bias; for example, I do not view sexual orientation or gender as set in stone from birth or as classified by some fixed, finite set of choices; I try to use language accordingly. I will undoubtedly goof upon occasion, and I welcome feedback on these and related matters.
Relevant Dates
Class starts: Wednesday, Sept 3. Drop/Withdrawal deadline: Monday, Sept 15. National Day for Truth and Reconciliation: Tuesday, Sept 30. Thanksgiving: Monday, Oct 13. Midterm Break: Monday, Nov 10 - Wednesday, Nov 12 (Remembrance Day: Tuesday, Nov 11). Last day of class: Friday, Dec 5. [Finals (we won't have one): Tuesday, Dec 9 - Saturday, Dec 20.]
All homework, etc. is due on the last day of classes, Friday, Dec 5.
