# Matrix Differentiation

## Overview

$\mathbf{y} = \mathbf{Ax}$ is a vector. However, its derivative with respect to $\mathbf{x}$ is a matrix $\mathbf{A}$. A vector is expanded to a matrix. The new dimension appeared during the differentiation is the dimension of $\mathbf{x}$. If the length of $\mathbf{x}$ is $n$, the length of $\mathbf{y}$ should also be $n$. The derivative looks like $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$. If we put the derivatives into the original vector shape, it looks like this.

{\frac {\partial \mathbf {y} }{\partial \mathbf x}}={\begin{bmatrix}{\frac {\partial y_{1}}{\partial \mathbf x}}\\{\frac {\partial y_{2}}{\partial \mathbf x}}\\\vdots \\{\frac {\partial y_{m}}{\partial \mathbf x}}\\\end{bmatrix}}

However, each of the $\mathbf x$ here is a vector, so the actual derivative is expanded with a dimension of the length of $\mathbf x$.

{\frac {\partial \mathbf {y} }{\partial \mathbf {x} }}={\begin{bmatrix}{\frac {\partial y_{1}}{\partial x_{1}}}&{\frac {\partial y_{1}}{\partial x_{2}}}&\cdots &{\frac {\partial y_{1}}{\partial x_{n}}}\\{\frac {\partial y_{2}}{\partial x_{1}}}&{\frac {\partial y_{2}}{\partial x_{2}}}&\cdots &{\frac {\partial y_{2}}{\partial x_{n}}}\\\vdots &\vdots &\ddots &\vdots \\{\frac {\partial y_{m}}{\partial x_{1}}}&{\frac {\partial y_{m}}{\partial x_{2}}}&\cdots &{\frac {\partial y_{m}}{\partial x_{n}}}\\\end{bmatrix}}

When we expand the dimension, we need to align that dimension well across different terms of the function. For the following example, $\mathbf x$ and $\mathbf a$ are vectors. So the function is a scalar before differentiation. However, the $\mathbf x$ in one of the terms is transposed but not in the other one. When we expand the scalar to a vector during differentiation, we should either expand according to the transposed $\mathbf x$ or the not transposed $\mathbf x$, but not both. In this way, the two results can be added up together to $2\mathbf a$. Otherwise, they will be a row and a column vector.

\frac{\partial ({\mathbf x^\top \mathbf a + \mathbf a^\top \mathbf x})}{\partial \mathbf x} = 2\mathbf a

## Rules

All the rules are expanded according to $\mathbf x$ as the last dimension not transposed.

\frac{\partial ({\mathbf a^\top \mathbf x})}{\partial \mathbf x} = \mathbf a
\frac{\partial ({\mathbf x^\top \mathbf a})}{\partial \mathbf x} = \mathbf a