Matrix Differentiation
Overview
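\mathbf{y} = \mathbf{Ax} is a vector, but its derivative with respect to \mathbf{x} is the matrix \mathbf{A}: differentiation expands the vector into a matrix. The new dimension that appears during differentiation is the dimension of \mathbf{x}. If \mathbf{A} is m \times n, then \mathbf{x} has length n and \mathbf{y} has length m, so the derivative \frac{\partial \mathbf{y}}{\partial \mathbf{x}} is m \times n. The lengths of \mathbf{x} and \mathbf{y} are equal only when \mathbf{A} is square. If we put the componentwise derivatives into the original vector shape, it looks like this:
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial \mathbf{x}} \\ \vdots \\ \frac{\partial y_m}{\partial \mathbf{x}} \end{bmatrix}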
However, each \mathbf x here is a vector, so each entry expands into one partial derivative per component of \mathbf x, and the full derivative gains an extra dimension whose length is the length of \mathbf x.
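As a quick numerical check that the derivative of \mathbf y = \mathbf{Ax} is \mathbf A, here is a minimal Python sketch using central finite differences (an illustrative technique, not part of the original text). The helper names `matvec` and `jacobian` and the sample values are hypothetical. A is deliberately 2 by 3, so y has length m = 2 while x has length n = 3, and the new dimension (over x) is placed last.

```python
# Sketch: check numerically that the derivative of y = A x is A itself.
# A is 2x3 (m = 2, n = 3), so y has length m while x has length n.


def matvec(A, x):
    """Return y = A x, where A is a list of rows and x is a list."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def jacobian(f, x, h=1e-6):
    """Central-difference estimate of J[i][j] = d f_i / d x_j.

    The index j runs over x and is placed last, so J is m x n.
    """
    n = len(x)
    m = len(f(x))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [0.5, -1.0, 2.0]

y = matvec(A, x)
J = jacobian(lambda v: matvec(A, v), x)
print(len(y), len(x))  # prints "2 3": y has length m, x has length n
for row in J:
    print([round(v, 6) for v in row])  # each row should match the corresponding row of A
```

Because y = Ax is linear, the finite-difference estimate matches A up to floating-point rounding.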
When we expand the dimension, we need to align it consistently across the different terms of the function. In the following example, \mathbf x and \mathbf a are vectors, so the function is a scalar before differentiation:
\frac{\partial}{\partial \mathbf x}\left(\mathbf x^T \mathbf a + \mathbf a^T \mathbf x\right) = 2\mathbf a
However, \mathbf x is transposed in one term but not in the other. When differentiation expands the scalar into a vector, we should expand according to either the transposed \mathbf x or the non-transposed \mathbf x, but not both. Only then do the two results have the same orientation and add up to 2\mathbf a; otherwise, one would be a row vector and the other a column vector, and they could not be added.
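The claim that the two terms add up to 2\mathbf a can be checked numerically. The sketch below is a minimal Python example using central finite differences; the helper names `dot`, `f`, and `gradient` and the sample vectors are hypothetical, chosen only for illustration.

```python
# Sketch: f(x) = x^T a + a^T x is a scalar; its gradient with respect to x should be 2a.


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


def f(x, a):
    # x^T a and a^T x are the same scalar; the terms are kept separate to mirror the text.
    return dot(x, a) + dot(a, x)


def gradient(g, x, h=1e-6):
    """Central-difference estimate of d g / d x_j, one entry per component of x."""
    grad = []
    for j in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += h
        x_minus[j] -= h
        grad.append((g(x_plus) - g(x_minus)) / (2 * h))
    return grad


a = [3.0, -1.0, 0.5]
x = [1.0, 2.0, -4.0]

grad = gradient(lambda v: f(v, a), x)
print([round(gj, 6) for gj in grad])  # should be close to 2a = [6.0, -2.0, 1.0]
```

A plain Python list carries no row or column orientation, so this sketch sidesteps the layout question; the alignment issue described above appears once the terms are written as matrices with explicit shapes.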
Rules
All the rules below expand the new dimension according to \mathbf x, placed as the last dimension and not transposed.
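One possible reading of this convention, assuming it means the index over \mathbf x is appended as the last index of the result, is written out below in index form. For vector-by-vector derivatives this resembles what is often called the numerator (Jacobian) layout, and it reproduces the \mathbf y = \mathbf{Ax} result from the overview.

```latex
\left(\frac{\partial \mathbf y}{\partial \mathbf x}\right)_{ij} = \frac{\partial y_i}{\partial x_j},
\qquad
y_i = \sum_{k=1}^{n} A_{ik} x_k
\;\Longrightarrow\;
\frac{\partial y_i}{\partial x_j} = A_{ij}
\;\Longrightarrow\;
\frac{\partial \mathbf y}{\partial \mathbf x} = \mathbf A
```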
For the quadratic form