CPSC 536F Page, Fall 2025: PSD (Positive Semidefinite) Matrices and Some Theoretical Foundations of Machine Learning

This page concerns CPSC 536F, Winter 2025-26, Term 1. This is a topics course in the applications of symmetric matrices -- most often PSD (positive semidefinite) matrices -- to a number of areas. Roughly 60% of the course will give the theory of PSD matrices and some basic applications. Roughly 40% of the course will focus on kernel functions and methods.

This page has the most up-to-date information on the course(s) and supersedes any other document.

Materials below may have errors! Errors will be corrected either here or in class. If you are not attending my classes: use these materials at your own risk!

This entire webpage is currently (August 2025) UNDER CONSTRUCTION.

News
Sept 17, 2025: I'm adding some exercises to the course handout on positive definite matrices and kernels.
Sept 9, 2025: First office hours this week: 4-5:30pm, Wednesday, Sept 10.
Aug 27, 2025: Class meets: MWF, 1-2pm, Room 519, FORW (Frank Forward Building).
Symmetric and PSD matrices in Machine Learning
This course will focus on symmetric matrices, usually positive semidefinite (PSD) matrices, and show how they are applied to machine learning and other algorithms.

The main point of this course is to develop the conceptual foundations behind PSD matrices and kernels, and give broad classes of examples of these objects. This course is theoretical and will complement more practically oriented courses.

There are a number of relatively simple applications of symmetric matrices, including the SVD (singular value decomposition), PCA (principal component analysis), regression and ridge regression.

Graph Laplacians illustrate many properties that also hold for continuous Laplacians (e.g., the maximum principle); this topic includes expander graphs, which exhibit a large spectral gap and therefore have many algorithmically desirable properties. Reversible Markov matrices are (essentially) symmetric matrices.
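To make the last claim concrete: if P is a Markov matrix that is reversible with respect to a stationary distribution π (detailed balance: π_i P_ij = π_j P_ji), then D^{1/2} P D^{-1/2} is symmetric, where D = diag(π); in particular P has real eigenvalues. Here is a minimal NumPy sketch (with made-up weights, just for illustration):

    # A reversible Markov chain: random walk on a weighted undirected graph.
    import numpy as np

    W = np.array([[0., 2., 1.],
                  [2., 0., 3.],
                  [1., 3., 0.]])           # symmetric edge weights (made up)
    P = W / W.sum(axis=1, keepdims=True)    # transition matrix P_ij = W_ij / deg(i)
    pi = W.sum(axis=1) / W.sum()            # stationary distribution, proportional to degrees

    D_half = np.diag(np.sqrt(pi))
    D_half_inv = np.diag(1.0 / np.sqrt(pi))
    S = D_half @ P @ D_half_inv             # similar to P, hence the same eigenvalues

    print(np.allclose(S, S.T))              # True: S is symmetric
    print(np.linalg.eigvalsh(S))            # real eigenvalues of P (the largest is 1)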

One main goal of this course is to discuss kernel functions and methods, which have fascinating foundations. Important examples of kernels come from the classical kernel functions for Laplacians and the heat equation. We may explain why each PD (positive definite) kernel function has an RKHS (reproducing kernel Hilbert space), or we may simply give examples.
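As a small preview (a sketch only, with arbitrary sample points), the heat kernel on ℝ at time t, k(x,x') = exp(-(x-x')^2/(4t)) up to its usual normalizing constant, restricted to any finite set of points gives a positive semidefinite Gram matrix:

    import numpy as np

    t = 0.5
    x = np.linspace(-2.0, 2.0, 20)                        # arbitrary sample points in R
    K = np.exp(-(x[:, None] - x[None, :])**2 / (4 * t))   # Gram matrix K_ij = k(x_i, x_j)

    print(np.linalg.eigvalsh(K).min())   # >= 0 up to roundoff: K is positive semidefinite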
Prerequisites
You should have at least one term of undergraduate linear algebra; ideally you would have seen some linear algebra beyond this, either theoretical or practical. You should know (or be willing to quickly teach yourself) the following topics: bases of subspaces of ℝ^n, and how eigenvectors can be used to diagonalize (diagonalizable) matrices.
You should be willing to believe that any continuous function on a sphere in ℝ^n attains a maximum and minimum value somewhere.
Possible Topics
Here is a list of tentative topics. On the first day of class I will poll the students to get suggestions for more topics and which topics to emphasize.
  • Symmetric matrices: the existence of an orthonormal eigenbasis; extension of partial basis; Rayleigh quotient, min-max and max-min principles; examples. Positive semidefinite (PSD) matrices.
  • Simple applications: SVD (singular value decomposition); PCA (principal component analysis); regression and ridge regression.
  • Discrete Laplacians and related examples: graph adjacency matrices and Laplacians; expander graphs and some of their properties; most regular graphs are expander graphs. Reversible Markov chains; basic examples. Graph edge Laplacians (and grad, div, curl).
  • Kernel functions/methods: basic definitions of PD (positive definite) kernel functions. Examples from Laplace's equation and the heat equation on a real interval. Other kernel functions. Applications of kernel functions.
  • The Moore-Aronszajn characterization of PSD kernel functions as reproducing kernels of a Hilbert space. Examples (without Hilbert spaces); we may prove this theorem.
Primary References
I'm working on a set of notes for this course; this is a work in progress, and parts may be quite skeletal (especially some parts that are mainly notes for myself and won't be covered in class).
This course will have some overlap with a course I taught in Spring 2021, namely CPSC 531F, Spring 2021. In particular, I may assign homework directly from the Supplemental Notes and Homework for that class (or problems that are very similar to these); you may be interested in the homework problems assigned then.
Supplemental References
UNDER CONSTRUCTION!! Here we will list some articles/books related to course material, to supplement class discussion. I believe that all these references are free of charge, at least if you have a UBC library card or a CWL with library privileges.
Board Scans
  • Introduction to the Kernel Trick, Spectral Theory of Symmetric and PSD Matrices: Theory and Examples.
    Board scans:
    09_03: course overview, course policy, general remarks on heat equation kernels, begin the "kernel trick."
    09_05: Definition of a kernel. A kernel on [n] is really an n x n matrix. Symmetric, positive (semi)definite matrices and kernels. The kernel trick -- more details on the calculations regarding a toy example. For Monday: cleaning up the notation, and many examples of kernels, including those that do and don't satisfy the "kernel trick."
    09_08: The kernel trick: separating concentric circles of cows and goldfish in ℝ^2, and measuring how well this works.
    09_10: The kernel trick: correcting the embedding Φ : ℝ^2 -> ℝ^6. Reviewing the computation from last time. The kernel trick, part 1: the kernel Φ(x) ⋅ Φ(x') can be expressed as a simple function of x ⋅ x' in ℝ^2 (a sketch of one such Φ appears after the board-scan lists below). Part 2 (start): claim: the difference between the sample points representing cows and a test point depends only on k(x,x').
    09_12: Some possible reasons for using the "kernel trick." The first homework problem: test point, cows, and ghosts. Introduction to symmetric matrices, positive (semi)definite matrices. If K is any symmetric matrix, then e^{tK} is positive definite (not merely positive semidefinite) for any real t (a small numeric check appears after the board-scan lists below).
    09_15: Some remarks on ellipses, elliptic functions in arc length, and nice functions in "shell" probability. Generalizing circular animal plots to animal plots lying on ellipses centred at (0,0). Theorem: a kernel function on X given by a dot product into ℝ^n cannot be positive definite if |X| > n. Theorems to be proven in the next section. Examples of matrices and their eigenvalues: all 1's matrix, matrices with all but one eigenvalue equal to 0 (or much smaller in absolute value than the first eigenvalue), 2 x 2 matrix [a b; b a], circulant matrices.
    09_17: Circulant matrices, symmetric and antisymmetric matrices (and any square matrix is a sum thereof). Circulant matrix eigenpairs. The need for the complex dot product. Revisiting the all 1's matrix.
    09_19: Admin stuff regarding homework problems. Standard 2 x 2 Hadamard matrix; tensor (Kronecker) product of two matrices. Definition of the tensor product of matrices from the tensor product of vector spaces, viewing matrices as acting on vector spaces. Eigenpairs of tensor products of two matrices (a numeric check appears after the board-scan lists below). Begin: connection to kernel maps and the product of two kernels.
    09_22: Tensor product of two abstract vector spaces: (1) an initial object in a certain category, or (2) a vector space formed by pairs. Graph adjacency matrices and ... tensor product ?
    09_24: Preview of next few classes; admin stuff. Regular graphs, adjacency matrices and their powers. Interpretation of d-regular graph adjacency eigenvalues (statements, not proofs): λ_1 = d; multiplicity of d = # connected components; λ_n(G) = -d iff G is bipartite. 𝔹_2 eigenvectors as the tensor product of two 𝔹_1 eigenvectors. Example of A_{G × H}.
    09_26: More on the × product (Cartesian product) of graphs, tensor product, and eigenpairs. Start: expansion.
    09_29: "Geometric" view of the adjacency matrix. Begin spectral gap in adjacency matrices and implication: local versus global, multiplicity of eigenvalue d in a d-regular graph. Cycles and their products.
    10_01: Review of the interpretation of the d-regular graph spectrum, including the situation λ_2 < λ_1 versus λ_2 = λ_1. Admin: we have some overlap with CPSC 531F, Spring 2021. "Geometric" view of the adjacency matrix and cycle eigenpairs; the spectral gap of the n-cycle is C/n^2 + O(1/n^4) (a numeric check appears after the board-scan lists below). Spectral gap of the N_1 by N_2 cyclic grid (4-regular) is C/max(N_1,N_2)^2, so C/n for N_1 = N_2 and n = N_1 N_2.
    10_03: Broad strokes about d-regular graphs: grids with N_1 = N_2 and with N_1 = 3. The "double cycle" (4-regular, each edge occurs twice). Most 4-regular graphs have spectral gap roughly 4 - 2√3. Graph Laplacian = dI - (adjacency matrix). Return to kernel functions.
    10_06: Kernels k: 𝓧 × 𝓧 → ℝ of the special form k(x,x') = Φ(x) ⋅ Φ(x'), where Φ: 𝓧 → ℝ^f (Φ = the "feature map", f = the number of "features"). Theorems: k is always positive semidefinite, and k is positive definite iff Φ is injective and Φ(𝓧) is a set of linearly independent vectors (a one-line proof sketch appears after the board-scan lists below). Examples.
    10_08: Review kernels on finite sets. Explain that this implies that the product of any two positive semidefinite kernels 𝓧 × 𝓧 → ℝ (for arbitrary 𝓧) is positive semidefinite. Start: positive definite kernels on 𝓧 = ℝ: solving the ODE w''(x) = f(x) on [0,L] subject to the Dirichlet condition, and solving w''(x) = δ_y(x) there (the solution is the Green's function Green(x,y)). w(x) = ReLU(x) satisfies w'(x) = Heaviside(x) and w''(x) = δ_0(x). What does δ_0(x) really mean? Intuition?
    10_10: Solving Poisson's equation w'' = x^2 subject to the Dirichlet condition. Satisfying an ODE in the "weak" sense (this is actually a very powerful method). Explaining why ReLU' = Heaviside, even though ReLU is not differentiable at x=0 in the classical sense -- it is a weak solution to an ODE.
    10_15: Discussion of generalized functions and references. Discussion of the continuum hypothesis as it relates to kernel functions with 𝓧 = ℝ. A better organized explanation of why we need weak solutions of ODEs to differentiate ReLU(x). The formalism of test functions as discussed in Generalized Functions and ODEs (A. Friedman). Motivation for weak solutions: by formally integrating, one can make sense of everything using the set of test functions Ψ = C_0(ℝ). Claim: (w,f) = (ReLU, Heaviside) solves the ODE w' = f in the weak sense.
    10_17: Outline of the next few lectures, and how generalized functions arise as a natural consequence of weak solutions of ODEs/PDEs. The idea of how to define generalized functions over an arbitrary vector space of "test functions," Ψ. Verification that ReLU' = Heaviside in the sense of generalized functions (a short verification appears after the board-scan lists below); we don't care how Heaviside(0) is defined (even though 1/2 is probably the best choice), because for generalized functions the value at a single point doesn't matter. The identity ∫ (w'')w = - ∫ (w')(w') is morally the reason that the operator w ↦ w'' is negative semidefinite (under Neumann/Dirichlet/etc. conditions).
  • The SVD (Singular-value decomposition) and Applications.
    Board scans:
    SCANS TO GO HERE
  • Graph Laplacians and Related Topics.
    Board scans:
    SCANS TO GO HERE
  • Laplace and Heat Kernel Functions.
    Board scans:
    SCANS TO GO HERE
  • Kernel Methods:
    Board scans:
    SCANS TO GO HERE
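Worked Sketches Related to the Board Scans
These are small illustrative sketches (in NumPy, plus two short derivations), not the exact computations from the board; use them at your own risk. Related to the 09_08 and 09_10 scans: one standard choice of feature map Φ : ℝ^2 -> ℝ^6 (not necessarily the exact Φ used in class) for which Φ(x) ⋅ Φ(x') is a simple function of x ⋅ x', namely (1 + x ⋅ x')^2:

    import numpy as np

    def phi(x):
        # Phi : R^2 -> R^6; the sqrt(2) factors make the dot product come out nicely.
        x1, x2 = x
        return np.array([x1*x1, x2*x2, np.sqrt(2)*x1*x2,
                         np.sqrt(2)*x1, np.sqrt(2)*x2, 1.0])

    def k(x, xp):
        return (1.0 + np.dot(x, xp))**2

    rng = np.random.default_rng(0)
    x, xp = rng.normal(size=2), rng.normal(size=2)
    print(np.isclose(np.dot(phi(x), phi(xp)), k(x, xp)))   # True: Phi(x).Phi(x') = (1 + x.x')^2

Since (1 + x ⋅ x')^2 depends on x and x' only through x ⋅ x', the kernel-trick computations can be done without ever forming Φ(x) explicitly.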
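Related to the 09_12 scan: a small numeric check (with a random symmetric K) that e^{tK} is positive definite for any real t. If K = QΛQ^T with Q orthogonal, then e^{tK} = Q e^{tΛ} Q^T, whose eigenvalues e^{tλ_i} are all strictly positive:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.normal(size=(5, 5))
    K = (A + A.T) / 2                       # an arbitrary symmetric matrix

    t = -3.0                                # any real t, including negative values
    lam, Q = np.linalg.eigh(K)              # K = Q diag(lam) Q^T
    expm_tK = Q @ np.diag(np.exp(t * lam)) @ Q.T

    print(np.linalg.eigvalsh(expm_tK).min() > 0)   # True: e^{tK} is positive definite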
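Related to the 09_19 scan: a quick check (with small random symmetric matrices) that the eigenvalues of a tensor (Kronecker) product A ⊗ B are the products λ_i μ_j of the eigenvalues of A and B; the corresponding eigenvectors are the Kronecker products of the factor eigenvectors.

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.normal(size=(3, 3)); A = (A + A.T) / 2    # arbitrary symmetric matrices
    B = rng.normal(size=(2, 2)); B = (B + B.T) / 2

    eig_products = np.sort(np.outer(np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)).ravel())
    eig_kron = np.sort(np.linalg.eigvalsh(np.kron(A, B)))

    print(np.allclose(eig_products, eig_kron))   # True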
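Related to the 10_01 scan: a numeric check that the adjacency spectral gap of the n-cycle is C/n^2 + O(1/n^4); here λ_1 = 2 and λ_2 = 2cos(2π/n), so the gap is roughly 4π²/n²:

    import numpy as np

    for n in [10, 20, 40, 80]:
        A = np.zeros((n, n))
        for i in range(n):
            A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1.0   # adjacency matrix of the n-cycle
        eigs = np.sort(np.linalg.eigvalsh(A))[::-1]
        gap = eigs[0] - eigs[1]
        print(n, gap, gap * n**2)    # gap * n^2 approaches 4*pi^2, roughly 39.48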
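Related to the 10_06 scan, the one-line reason (a sketch of the argument; the full statement, including the positive definite case, will be in the notes) that k(x,x') = Φ(x) ⋅ Φ(x') is always positive semidefinite: for any points x_1, ..., x_m in 𝓧 and any real c_1, ..., c_m,
\[
  \sum_{i,j=1}^{m} c_i c_j\, k(x_i, x_j)
    = \sum_{i,j=1}^{m} c_i c_j\, \Phi(x_i)\cdot\Phi(x_j)
    = \Bigl\| \sum_{i=1}^{m} c_i\, \Phi(x_i) \Bigr\|^2 \;\ge\; 0 ,
\]
and the sum vanishes for some nonzero c exactly when Φ(x_1), ..., Φ(x_m) are linearly dependent, which is where the injectivity and linear independence conditions enter.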
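Related to the 10_15 and 10_17 scans: a sketch of the verification that ReLU' = Heaviside in the weak sense, for any test function φ ∈ Ψ (using only integration by parts and the fact that φ has compact support):
\[
  \int_{-\infty}^{\infty} \mathrm{ReLU}(x)\, \varphi'(x)\, dx
    = \int_{0}^{\infty} x\, \varphi'(x)\, dx
    = \bigl[\, x\, \varphi(x) \,\bigr]_{0}^{\infty} - \int_{0}^{\infty} \varphi(x)\, dx
    = - \int_{-\infty}^{\infty} \mathrm{Heaviside}(x)\, \varphi(x)\, dx ,
\]
which is exactly the statement that ReLU' = Heaviside as generalized functions, independent of how Heaviside(0) is defined.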
Office Hours
Office hours starting Sept 22: Wednesdays, 2-3pm, and Thursdays, 11am-12pm, both in ICCS Room X561. If you can't make this week's office hour(s) and would like to meet, please email me.
Office hour on Thursday, Oct 2, is cancelled.
Homework and Grades
Homework will be assigned during lectures and organized into two or three sets, to be collected at regular interval(s) during the term. The third homework set may essentially be a small research project.

In my topics courses, grades have the following meaning (this is based on UBC Math guidelines):
  • 95% or higher (A+): student is strongly encouraged to pursue research in the area; the instructor will write a very strong letter of recommendation for the student.
  • 90% to 94% (A+): the student is encouraged to pursue research in the area or a related one; the instructor will write an enthusiastic letter of recommendation for the student.
  • 85% to 89% (A): the student will be able to use some of the course material in another field of research; research in the area is not encouraged without significantly more background and study.
  • 80% to 84% (A-): the student has fulfilled department expectations of a graduate student.
  • 79% or below (B+ or below): the student has not fulfilled department expectations of a graduate student.
Diversity and Equity
UBC is trying in earnest to encourage diversity and alleviate biases and inequities to which some members of its community are subjected; this includes, for example, UBC's Indian Residential School History and Dialogue Centre, which is privileged to work with the Musqueam First Nation; it also includes the Computer Science Department's various programs described on its Diversity in CS webpage.
I try to act reasonably free of bias; for example, I do not view sexual orientation or gender as set in stone from birth or as classified by some fixed, finite set of choices; I try to use language accordingly. I will undoubtedly goof upon occasion, and I welcome feedback on these and related matters.
Relevant Dates
Class starts: Wednesday, Sept 3. Drop/Withdrawal deadline: Monday, Sept 15. National Day for Truth and Reconciliation: Tuesday, Sept 30. Thanksgiving: Monday, Oct 13. Midterm Break: Monday, Nov 10 - Wednesday, Nov 12 (Remembrance Day: Tuesday, Nov 11). Last day of class: Friday, Dec 5. [Finals (we won't have one): Tuesday, Dec 9 - Saturday, Dec 20.]
All homework, etc. is due on the last day of classes, Friday, Dec 5.
