\documentclass{article}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{dsfont}
\usepackage{thmtools}
\usepackage{etoolbox}
\usepackage{enumitem}
\usepackage{fullpage}
\usepackage{parskip}
\usepackage{xcolor}
\usepackage{verbatim}
\usepackage[colorlinks,linkcolor=red!80!black]{hyperref}
\usepackage[noabbrev,capitalize]{cleveref}
% https://tex.stackexchange.com/a/42728/9019
\newcommand{\tagthis}[1]{\stepcounter{equation}\tag{\theequation}\label{#1}}
\definecolor{question}{rgb}{0,0,1}
\newlist{qlist}{enumerate}{1}
\setlist[qlist,1]{leftmargin=*, label=\textbf{(\alph*)}, ref={(\alph*)}}
\crefname{qlisti}{part}{parts}
\AtBeginEnvironment{qlist}{\color{question}}
\newcommand{\ask}[1]{\textcolor{question}{#1}}
\newenvironment{asking}{\begingroup\color{question}}{\endgroup}
\crefname{section}{Question}{Questions}
\definecolor{answer}{rgb}{0,.5,0}
\newcommand{\ans}[1]{\par\textcolor{answer}{Answer: #1}}
\newenvironment{answer}{\par\begingroup\color{answer}Answer: }{\endgroup}
\definecolor{points}{rgb}{0.6,0.3,0}
\newcommand{\pts}[1]{\textcolor{points}{[#1~points]}}
\newcommand{\hint}[1]{\textcolor{black!60!white}{\emph{Hint: #1}}}
\DeclareMathOperator*{\argmax}{arg\,max}
\DeclareMathOperator*{\argmin}{arg\,min}
\DeclareMathOperator{\bigO}{\mathcal{O}}
\DeclareMathOperator*{\E}{\mathbb{E}}
\newcommand{\dist}{\mathcal{D}}
\newcommand{\xdist}{\mathcal{D}_x}
\DeclareMathOperator{\spn}{span}
\DeclareMathOperator{\tr}{tr}
\newcommand{\tp}{^\mathsf{T}}
\newcommand{\op}{\mathit{op}}
\newcommand{\cH}{\mathcal{H}}
\newcommand{\cX}{\mathcal{X}}
\newcommand{\cY}{\mathcal{Y}}
\newcommand{\R}{\mathbb{R}}
\DeclareMathOperator{\VCdim}{VCdim}
\DeclareMathOperator{\emprad}{\hat{\mathfrak{R}}_S}
\DeclareMathOperator{\rad}{\mathfrak{R}_n}
\newcommand{\bsigma}{\boldsymbol{\sigma}}
\makeatletter
% \abs{} uses regular-sized bars, \abs*{} uses auto-sized ones
\newcommand{\abs}{\@ifstar{\@Abs}{\@abs}}
\newcommand{\@abs}[1]{\lvert #1 \rvert}
\newcommand{\@Abs}[1]{\left\lvert #1 \right\rvert}
\newcommand{\norm}{\@ifstar{\@Norm}{\@norm}}
\newcommand{\@norm}[1]{\lVert #1 \rVert}
\newcommand{\@Norm}[1]{\left\lVert #1 \right\rVert}
\newcommand{\ceil}{\@ifstar{\@Ceil}{\@ceil}}
\newcommand{\@ceil}[1]{\lceil #1 \rceil}
\newcommand{\@Ceil}[1]{\left\lceil #1 \right\rceil}
\newcommand{\floor}{\@ifstar{\@Floor}{\@norm}}
\newcommand{\@floor}[1]{\lfloor #1 \rfloor}
\newcommand{\@Floor}[1]{\left\lfloor #1 \right\rfloor}
\makeatother
\declaretheorem[name=Theorem,numberwithin=section]{thm}
\declaretheorem[name=Proposition,sibling=thm]{prop}
\declaretheorem[name=Lemma,sibling=thm]{lemma}
\begin{document}
\begin{center}
\Large
CPSC 532S: Assignment 2 --
due Friday, 18 Feb 2022, 11:59pm
\end{center}
This assignment is split into \textcolor{red}{four} questions (the parts with big section headers; most have sub-parts).
You can solve each question in groups of up to three.
Groups don't need to be consistent between problems;
you can do Q1 and Q2 alone, Q3 with Alice, and Q4 with Bob and Carlos if you want.
Please \textbf{do not} just split the questions up and do the parts separately.
\textbf{\color{red} If your name is on a solution, you are pledging that you contributed significantly to the solution and understand it fully.}
There is a separate Gradescope assignment for each problem;
use the Gradescope groups feature to submit once and associate with each of you,
but also put all of your names on the first page as a backup.
Prepare your answers to these questions \textbf{using \LaTeX}.
Hopefully you're reasonably familiar with it,
but if not, try using Overleaf and looking around for tutorials online.
(Note that free Overleaf accounts can only share with one ``named collaborator,''
but you can collaborate with more people by sending them an edit link.
Make sure you only share the parts of the homework you're handing in together!)
Feel free to ask questions if you get stuck on things on Piazza
(but remove any details about the actual answers\dots
feel free to make a private post if that's tough).
If you look stuff up anywhere other than in one of the two course textbooks,
please \textbf{cite your sources}:
just say in the answer to that question where you looked.
(A link is fine, no need for a formal citation.)
Please do not look at solution manuals or so on.
If you accidentally come across a solution while looking for something related,
still write the argument up in your own words, link to wherever you found it, and be clear about what happened.
If you like, the \texttt{.tex} source for this file is available on the course website,
and you can put your answers in
\verb|\begin{answ|\verb|er} My answer here... \end{answer}|
% that weird formatting so my answer-stripping script doesn't catch it :p
environments to make them stand out if so;
feel free to delete whatever boilerplate you want (or not, I'm not printing them out).
Or answer in a fresh document;
just make it clear which question you're answering where.
If you're using a consistent group and want to write your answers in one document,
you could split the PDF with e.g.\ \texttt{qpdf a2.pdf --pages . 2-3 -- q1.pdf}
or through the GUI of a PDF viewer.
Or you can upload the full file four times and just make sure you assign pages appropriately.
Submit your answers as a PDF on Gradescope:
\href{https://piazza.com/class/ky0u4qahwdn2xj?cid=17}{instructions on Piazza}.
You'll be prompted to mark where each sub-part is in your PDF;
make sure you mark all relevant pages for each part.
(This saves me a surprising amount of time in grading.)
If something goes wrong, you can also email your assignment to me directly (\texttt{dsuth@cs.ubc.ca}).
\clearpage
\section{Threshold functions \pts{20 points + 10 bonus}}
Recall our old friend, the class of threshold functions on $\R$:
\[
\cH = \left\{
x \mapsto \mathds 1(x \le \theta)
: \theta \in \R
\right\}
.\]
Let's also restate the Sauer-Shelah lemma for clarity\footnote{SSBD sets the threshold for the second part to $n \ge d + 2$, but $n \ge d$ is sufficient; see the proof from MRT we saw in class.}:
\begin{lemma}[Sauer-Shelah]
If the VC dimension of a hypothesis class $\cH$ is $d$,
the growth function $\tau_\cH$ satisfies (for any $n \in \mathbb N$)
\[
\tau_\cH(n) \le \sum_{i=0}^d \binom{n}{i}
.\]
Moreover, for all $n \ge d$, we have (where $e = \exp(1) \approx 2.718$ is Euler's constant)
$
\tau_\cH(n) \le \left( \frac{e n}{d} \right)^d
$.
\end{lemma}
We showed in class that the VC dimension of $\cH$ is 1:
it can shatter a set of size one (a single point),
but it cannot shatter any set of size two (since it can't label the left point 0 and the right point 1).
\begin{qlist}
\item \label{q:thresh:sauer} \pts{7}
Use the two parts of the Sauer-Shelah lemma to give two upper bounds on the growth function $\tau_\cH(n)$.
\item \label{q:thresh:tau} \pts{7}
Directly derive the exact value of the growth function $\tau_\cH$ from its definition.
How tight are the upper bounds from \cref{q:thresh:sauer}?
\item \label{q:thresh:rad-ub} \pts{6}
Use the previous parts to give an upper bound on the empirical Rademacher complexity $\emprad(\cH)$.
\item \pts{BONUS: 10}
Give the asymptotic value of $\emprad(\cH)$ for an $S$ with $n$ distinct points.
\textcolor{black}{Your answer might look something like ``$\rad(\cH) = 7n + \bigO(1)$,'' with a justification.}
How does it compare to the bound from \cref{q:thresh:rad-ub}?
\hint{It's not $7n + \bigO(1)$.}
\hint{I couldn't think of a real hint for this one that doesn't basically give you the answer, which is why it's bonus points. But I will say that to get the exact asymptotics, I had to look up a math paper. You'll get almost all the points if you just don't know the constant, and a good chunk of bonus points for an approach that seems fruitful. Remember to cite your sources.}
\end{qlist}
\clearpage
\section{Hyper-rectangles \pts{20}}
The class of axis-aligned hyper-rectangles in $\R^d$ is given by
\[
\cH = \{ x \mapsto \mathds{1}(a_1 \le x_1 \le b_1) \cdots \mathds{1}(a_d \le x_d \le b_d) : a_1 \le b_1, \dots, a_d \le b_d \}
.\]
That is, a hypothesis $h$ returns 1 if $x \in \R^d$ is inside the hyper-rectangle $[a_1, b_1] \times \cdots \times [a_d, b_d]$, and 0 otherwise.
\ask{Prove that $\VCdim(\cH) = 2 d$.}
\hint{SSBD section 6.3.3 shows this for $d = 2$.}
\clearpage
\section{Monotonicity \pts{20}}
\begin{qlist}
\item \pts{6} \label{q:vc-monotone}
Prove that if $\cH \subseteq \cH'$, then $\VCdim(\cH) \le \VCdim(\cH')$.
\item \pts{6} \label{q:rad-monotone}
Prove that if $\cH \subseteq \cH'$, then $\emprad(\cH) \le \emprad(\cH')$ and $\rad(\cH) \le \rad(\cH')$.
\item \pts{8}
Comment on how we should expect \cref{q:vc-monotone,q:rad-monotone} to affect the generalization loss of running ERM in $\cH$ versus $\cH'$. What other factors are at play?
\end{qlist}
\clearpage
\section{Generalization bound for a simple neural network \pts{40}}
\emph{Based on MRT exercise 3.11.}
Here is a class of neural networks mapping $\R^d$ to $\R$,
with one hidden layer of width $m$ and activations $\phi : \R \to \R$ a $1$-Lipschitz function (e.g.\ the ReLU or sigmoid function):
\[
\cH = \left\{
x \mapsto
\sum_{j=1}^m w_j \phi(u_j\tp x)
:
\norm{w}_1 \le \nu, \;
\norm{u_j}_2 \le \Lambda \text{ for each } j \in [m]
\right\}
.\]
$\Lambda$ and $\nu$ are hyperparameters defining how complex the class is allowed to be.
\begin{qlist}
\item \pts{10}
Show that $\displaystyle \emprad(\cH) = \frac{\nu}{n} \E_{\bsigma}\left[ \sup_{\norm{u}_2 \le \Lambda} \abs*{\sum_{i=1}^n \sigma_i \phi(u\tp x_i)} \right]$.
\end{qlist}
The following is a form of Talagrand's contraction lemma that works for any $\cH$ and $L$-Lipschitz function $\Phi$:%
\footnote{The MRT exercise claims that, for any $\cH$ and $L$-Lipschitz $\Phi$,
\begin{equation} \label{eq:tal:wrong}
\frac1n \E_{\bsigma}\left[ \sup_{h \in \cH} \abs*{\sum_{i=1}^n \sigma_i \Phi(h(x_i))} \right]
\le \frac{L}{n} \E_{\bsigma}\left[ \sup_{h \in \cH} \abs*{\sum_{i=1}^n \sigma_i h(x_i)} \right]
\tag{wrong}
.\end{equation}
This would be true if you dropped the absolute value bars, but the version as stated is not true.
For a counterexample,
consider the (not very interesting) hypothesis class $\cH = \{ x \mapsto 0 \}$
and the function $\Phi(x) = C$ which, as it ignores its argument, is $L$-Lipschitz for \emph{any} $L \ge 0$.
The LHS of \eqref{eq:tal:wrong} is $\abs{C}$ times
$\E_{\bsigma} \abs*{\frac1n \sum_{i=1}^n \sigma_i} = \sqrt{2 / (\pi n)} + \bigO(n^{-3/2})$,
as we discussed briefly in class.
The RHS of \eqref{eq:tal:wrong}, though, is exactly $0$,
since $h(x_i) = 0$ for all $x_i$.
This example also shows that assuming a symmetric $\cH$ would not be enough to fix \eqref{eq:tal:wrong}:
you would need $\Phi \circ \cH$ to be symmetric as well.
}
\begin{equation} \label{eq:tal}
\frac1n \E_{\bsigma}\left[ \sup_{h \in \cH} \abs*{\sum_{i=1}^n \sigma_i \Phi(h(x_i))} \right]
\le \frac{2 L}{n} \E_{\bsigma}\left[ \sup_{h \in \cH} \abs*{\sum_{i=1}^n \sigma_i h(x_i)} \right] + \frac{\abs{\Phi(0)}}{\sqrt n}
\tag{*}
.\end{equation}
\begin{qlist}[resume]
\item \pts{10}
Use \eqref{eq:tal} to upper bound $\emprad(\cH)$ in terms of $\emprad(\cH')$, for
\[
\cH' = \big\{ x \mapsto s (u\tp x) : \norm{u}_2 \le \Lambda, \, s \in \{-1, +1\} \big\}
.\]
\item \pts{10}
Bound $\rad(\cH')$, and thereby $\rad(\cH)$.
\textcolor{black}{You'll need an assumption on $\norm x$ to do this;} be clear what you're assuming.
\hint{We essentially did this in class (lecture 8, slide 11; note that I messed up a bit in class, but it's fixed on the posted slides).}
\item \pts{10}
Give an expression for $\varepsilon$
such that
$\displaystyle
\Pr_{S \sim \dist^n}\left(
\sup_{h \in \cH} \left( L_\dist(h) - L_S(h) \right)
\le \varepsilon
\right) \ge 1 - \delta$.
\begingroup\color{black}
Choose some reasonable loss function $\ell$ for inside $L$,
and be clear about any additional assumptions.
There are several valid approaches here.
Try to pick a reasonable set of assumptions and loss function;
something like ``all the $y$s are equal to $0$'' is not reasonable.
Your bound should have $\varepsilon \to 0$ as $n \to \infty$ with all other parameters fixed.
\endgroup
\end{qlist}
\end{document}