# Solving Linear Regression Problems Using Fréchet Derivatives

Posted: 2022-10-07 (Updated: 2022-10-09)
Suppose we have a linear regression problem, where input vectors are in $$\mathbb{R}^n$$ and output vectors are in $$\mathbb{R}^m$$. Let $$N$$ be the size of dataset. For each sample $$(x^i,\,y^i)$$, we write \begin{gather*} x^i = \begin{pmatrix} x^i_1 \\ \vdots \\ x^i_n \end{pmatrix}, \quad y^i = \begin{pmatrix} y^i_1 \\ \vdots \\ y^i_m \end{pmatrix}, \end{gather*} and define $$X$$ and $$Y$$ as \begin{gather*} X = \begin{pmatrix} x^1 & \cdots & x^N \end{pmatrix} = \begin{pmatrix} x^1_1 & \cdots & x^N_1 \\ \vdots & \ddots & \vdots\\ x^1_n & \cdots & x^N_n \\ \end{pmatrix}, \\ Y = \begin{pmatrix} y^1 & \cdots & y^N \end{pmatrix} = \begin{pmatrix} y^1_1 & \cdots & y^N_1 \\ \vdots & \ddots & \vdots \\ y^1_m & \cdots & y^N_m \end{pmatrix}. \end{gather*} Let $$W \in \mathbb{R}^{m \times n}$$ be the matrix of coefficients as follows: \begin{gather*} W = \begin{pmatrix} w_{11} & \cdots & w_{1n} \\ \vdots & \ddots & \vdots \\ w_{m1} & \cdots & w_{mn} \end{pmatrix}. \\ \end{gather*} The squared error $$E$$ is \begin{align*} E(W) = \frac{1}{2} \| WX - Y \|^2, \end{align*} where $$\|\cdot\|$$ is the Frobenius norm.

If we change $$W$$ along $$H$$, $$E$$ changes as below: \begin{align} E(W + H) - E(H) &= \frac{1}{2}{ \left\| (W + H)X - Y \right\| }^2 - \frac{1}{2} { \left\| WX - Y \right\| }^2 \notag \\ &= \frac{1}{2} \langle WX - Y + HX,\, WX - Y + HX \rangle - \frac{1}{2} { \left\| WX - Y \right\| }^2 \notag \\ &= \frac{1}{2} {\| WX - Y \|}^2 + \langle WX - Y,\, HX \rangle + \frac{1}{2} {\| HX \|}^2 - \frac{1}{2} {\| WX - Y \|}^2 \notag \\ &= \trace ( (WX - Y) (HX)^\top ) +\frac{1}{2} {\|HX\|}^2 \notag \\ &= \trace( (WX - Y)X^\top H^\top ) +\frac{1}{2} {\|HX\|}^2 \notag \\ &= \langle (WX - Y)X^\top,\, H \rangle + \frac{1}{2}{\|HX\|}^2 \label{result} \end{align} According to Cauchy–Schwarz inequality, $$\|HX\|^2 \leq {\|H\|}^2 {\|X\|}^2$$. Therefore, for any $$\varepsilon > 0$$, if we take $$H$$ so small that $$\|H\| \leq \displaystyle \frac{\varepsilon}{2 \|X\|^2 + 1},$$ \begin{align*} \|f(W + H) - f(W) - \langle ( WX - Y)X^\top,\, H \rangle\| \leq \varepsilon \|H\| \end{align*} The derivative of $$f$$ with respect to $$W$$ at an arbitrary point $$W$$ along $$H$$ is the first term of the right-hand side of (\ref{result}): \begin{align*} D_W f(W)(H) = \langle ( WX - Y )X^\top,\, H \rangle. \end{align*} Since $$f$$ is convex, $$f$$ has the global minimum if and only if \begin{gather*} ( WX - Y )X^\top = 0 \\ W XX^\top = YX^\top \end{gather*} If $$XX^\top$$ is regular, \begin{gather*} W = Y X^\top (XX^\top)^{-1}. \end{gather*}