Solving Linear Regression Problems Using Fréchet Derivatives
Let \(E(W) = \frac{1}{2}{\left\| WX - Y \right\|}^2\), where \(\|\cdot\|\) denotes the Frobenius norm and \(\langle A,\, B \rangle = \trace(AB^\top)\) the corresponding inner product. If we change \(W\) along \(H\), \(E\) changes as below:

\begin{align}
E(W + H) - E(W) &= \frac{1}{2}{ \left\| (W + H)X - Y \right\| }^2 - \frac{1}{2} { \left\| WX - Y \right\| }^2 \notag \\
&= \frac{1}{2} \langle WX - Y + HX,\, WX - Y + HX \rangle - \frac{1}{2} { \left\| WX - Y \right\| }^2 \notag \\
&= \frac{1}{2} {\| WX - Y \|}^2 + \langle WX - Y,\, HX \rangle + \frac{1}{2} {\| HX \|}^2 - \frac{1}{2} {\| WX - Y \|}^2 \notag \\
&= \trace ( (WX - Y) (HX)^\top ) + \frac{1}{2} {\|HX\|}^2 \notag \\
&= \trace( (WX - Y)X^\top H^\top ) + \frac{1}{2} {\|HX\|}^2 \notag \\
&= \langle (WX - Y)X^\top,\, H \rangle + \frac{1}{2}{\|HX\|}^2 \label{result}
\end{align}

By the Cauchy–Schwarz inequality, \({\|HX\|}^2 \leq {\|H\|}^2 {\|X\|}^2\). Therefore, for any \(\varepsilon > 0\), if we take \(H\) so small that \(\|H\| \leq \displaystyle \frac{\varepsilon}{2 \|X\|^2 + 1}\), then \(\frac{1}{2}{\|HX\|}^2 \leq \varepsilon \|H\|\), and hence

\begin{align*}
\left| E(W + H) - E(W) - \langle ( WX - Y)X^\top,\, H \rangle \right| \leq \varepsilon \|H\|.
\end{align*}

This shows that \(E\) is Fréchet differentiable; its derivative with respect to \(W\) at an arbitrary point \(W\) along \(H\) is the first term of the right-hand side of \((\ref{result})\):

\begin{align*}
D_W E(W)(H) = \langle ( WX - Y )X^\top,\, H \rangle.
\end{align*}

Since \(E\) is convex, it attains its global minimum at \(W\) if and only if the derivative vanishes there:

\begin{gather*}
( WX - Y )X^\top = 0 \\
W XX^\top = YX^\top
\end{gather*}

If \(XX^\top\) is invertible,

\begin{gather*}
W = Y X^\top (XX^\top)^{-1}.
\end{gather*}
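The derivative can be sanity-checked numerically. Below is a minimal sketch (assuming NumPy; the dimensions, random data, and variable names are illustrative choices of mine, not part of the derivation) that compares \(\langle (WX - Y)X^\top,\, H \rangle\) against a finite-difference quotient of \(E\):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: W is (m, n), X is (n, k), Y is (m, k).
m, n, k = 3, 4, 5
W = rng.standard_normal((m, n))
X = rng.standard_normal((n, k))
Y = rng.standard_normal((m, k))

def E(W):
    """Objective E(W) = 1/2 * ||WX - Y||_F^2."""
    return 0.5 * np.linalg.norm(W @ X - Y, "fro") ** 2

# Fréchet derivative derived above: D_W E(W)(H) = <(WX - Y) X^T, H>.
G = (W @ X - Y) @ X.T

# Compare against a finite-difference quotient along a random direction H.
H = rng.standard_normal((m, n))
t = 1e-6
fd = (E(W + t * H) - E(W)) / t   # approximates D_W E(W)(H) as t -> 0
exact = np.sum(G * H)            # Frobenius inner product <G, H>
print(fd, exact)                 # the two values agree up to O(t)
```

The leftover term in \((\ref{result})\) is \(\frac{t^2}{2}{\|HX\|}^2\), so the quotient differs from the exact derivative only by \(O(t)\), which is what the printout shows.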
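For completeness, here is a similar sketch (same assumptions and setup as above) that computes the closed-form minimizer and cross-checks it against a generic least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 3, 4, 5                 # W: (m, n), X: (n, k), Y: (m, k)
X = rng.standard_normal((n, k))
Y = rng.standard_normal((m, k))

# Closed-form minimizer W = Y X^T (X X^T)^{-1}. Solving the transposed
# normal equations (X X^T) W^T = X Y^T avoids forming the inverse explicitly.
W = np.linalg.solve(X @ X.T, X @ Y.T).T

# Cross-check: lstsq minimizes ||X^T B - Y^T||_F over B, and B^T is our W.
B, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)
print(np.allclose(W, B.T))                # True
print(np.allclose((W @ X - Y) @ X.T, 0))  # derivative vanishes: True
```

With random \(X\) of full row rank, \(XX^\top\) is invertible almost surely, so the closed form applies; the final check confirms that \((WX - Y)X^\top = 0\) at the computed minimizer, as the derivation requires.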