Multi-variable derivative
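For people who are not used to matrix derivatives (like me), it is useful to see why, for a symmetric matrix $A \in \mathbb{R}^{n \times n}$ and a vector $x \in \mathbb{R}^{n}$,
$$
\frac{\partial\, x^{T} A x}{\partial x} = 2Ax
$$
First, we note that if you differentiate with respect to a vector or matrix, the output has the same dimensions as that vector or matrix. The notation just means differentiating with respect to every single component independently and then stacking the results, so it is better understood as
$$
\frac{\partial\, x^{T} A x}{\partial x} =
\begin{bmatrix}
\frac{\partial\, x^{T} A x}{\partial x_{1}} \\
\vdots \\
\frac{\partial\, x^{T} A x}{\partial x_{n}}
\end{bmatrix}
$$
So we can prove each component independently; it's just a lot of manual work! We see that $x^{T} A x$ is just a quadratic form, studied in Massimi minimi multi-variabile#Forme quadratiche, so for the first component it is just computing this:
$$
\frac{\partial}{\partial x_{1}} \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_{i} x_{j}
= 2 a_{11} x_{1} + \sum_{j \neq 1} a_{1j} x_{j} + \sum_{i \neq 1} a_{i1} x_{i}
= 2 \sum_{j=1}^{n} a_{1j} x_{j}
$$
The last equality holds because $A$ is a symmetric matrix ($a_{i1} = a_{1i}$), so the result is indeed the first row of $A$ multiplied by $x$ and then by 2, i.e. $2(Ax)_{1}$. The other components follow in the same way.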
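A quick way to convince yourself that the gradient of the quadratic form $x^{T} A x$ is $2Ax$ when $A$ is symmetric is to compare it against finite differences. The sketch below is only a sanity check in plain Python; the names `A`, `x` and the numeric values are made up for illustration.

```python
def quadratic_form(A, x):
    """Return x^T A x for a square matrix A (list of rows) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


def matvec(A, x):
    """Matrix-vector product A x."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def numerical_gradient(f, x, h=1e-6):
    """Central finite differences, one component of x at a time."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * h))
    return grad


# Illustrative symmetric matrix and vector (assumed values).
A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 4.0]]
x = [1.0, -2.0, 0.5]

numeric = numerical_gradient(lambda v: quadratic_form(A, v), x)
analytic = [2.0 * c for c in matvec(A, x)]  # the claimed gradient 2Ax

print(numeric)
print(analytic)
```

The two printed vectors should agree up to floating-point noise. If `A` is made non-symmetric, they stop agreeing, which is exactly where the symmetry assumption is used.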
Known theorems
The Multivariate Chain Rule
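Let $\mathbf{x} = (x_{1}, \dots, x_{n})$ be an $n$-dimensional vector, and let each $x_{i}$ depend on a scalar variable $t$, i.e.,
$$
x_{i} = x_{i}(t), \quad i = 1, \dots, n
$$
Suppose we have a function $f$ that maps $\mathbb{R}^{n} \to \mathbb{R}$, i.e.,
$$
f(\mathbf{x}) = f(x_{1}, \dots, x_{n})
$$
Then, the total derivative of $f$ with respect to $t$ is given by:
$$
\frac{df}{dt} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_{i}} \frac{dx_{i}}{dt}
$$
or, in vector notation:
$$
\frac{df}{dt} = \nabla f \cdot \frac{d\mathbf{x}}{dt}
$$
where:
- $\nabla f = \left( \frac{\partial f}{\partial x_{1}}, \dots, \frac{\partial f}{\partial x_{n}} \right)$ is the gradient of $f$.
- $\frac{d\mathbf{x}}{dt} = \left( \frac{dx_{1}}{dt}, \dots, \frac{dx_{n}}{dt} \right)$ is the time derivative of $\mathbf{x}$.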
Proof
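By definition, the total derivative of $f$ with respect to $t$ measures the rate of change of $f$ as $t$ varies:
$$
\frac{df}{dt} = \lim_{\Delta t \to 0} \frac{f(\mathbf{x}(t + \Delta t)) - f(\mathbf{x}(t))}{\Delta t}
$$
Since $f$ is a function of $\mathbf{x}$, we perform a first-order Taylor expansion (see Hopital, Taylor, Peano) around $\mathbf{x}(t)$, writing $\Delta x_{i} = x_{i}(t + \Delta t) - x_{i}(t)$:
$$
f(\mathbf{x}(t + \Delta t)) = f(\mathbf{x}(t)) + \sum_{i=1}^{n} \frac{\partial f}{\partial x_{i}} \Delta x_{i} + o(\lVert \Delta \mathbf{x} \rVert)
$$
Dividing by $\Delta t$ and taking the limit (the remainder vanishes because $\Delta \mathbf{x} / \Delta t$ stays bounded):
$$
\frac{df}{dt} = \lim_{\Delta t \to 0} \sum_{i=1}^{n} \frac{\partial f}{\partial x_{i}} \frac{\Delta x_{i}}{\Delta t}
$$
Since $\lim_{\Delta t \to 0} \frac{\Delta x_{i}}{\Delta t} = \frac{dx_{i}}{dt}$, we obtain:
$$
\frac{df}{dt} = \sum_{i=1}^{n} \frac{\partial f}{\partial x_{i}} \frac{dx_{i}}{dt}
$$
Using vector notation we have:
$$
\frac{df}{dt} = \nabla f \cdot \frac{d\mathbf{x}}{dt}
$$
This represents the directional derivative of $f$ along the trajectory $\mathbf{x}(t)$, showing how $f$ evolves as $t$ changes.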
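To see the multivariate chain rule at work on a concrete case, here is a small sketch in plain Python. The trajectory `x_of_t` and the function `f` are arbitrary examples chosen for illustration, with their derivatives written out by hand; the chain-rule value is compared with a finite-difference derivative of the composed function.

```python
import math


def x_of_t(t):
    """Example trajectory x(t) = (cos t, t^2). Assumed for illustration."""
    return [math.cos(t), t ** 2]


def dx_dt(t):
    """Hand-computed time derivative of the trajectory."""
    return [-math.sin(t), 2.0 * t]


def f(x):
    """Example scalar function f(x1, x2) = x1^2 * x2 + sin(x2)."""
    return x[0] ** 2 * x[1] + math.sin(x[1])


def grad_f(x):
    """Hand-computed gradient of f."""
    return [2.0 * x[0] * x[1], x[0] ** 2 + math.cos(x[1])]


def chain_rule_derivative(t):
    """df/dt as the dot product of grad f and dx/dt."""
    return sum(g * v for g, v in zip(grad_f(x_of_t(t)), dx_dt(t)))


def finite_difference_derivative(t, h=1e-6):
    """Central difference of the composed map t -> f(x(t))."""
    return (f(x_of_t(t + h)) - f(x_of_t(t - h))) / (2.0 * h)


t0 = 0.7
chain_value = chain_rule_derivative(t0)
numeric_value = finite_difference_derivative(t0)
print(chain_value, numeric_value)
```

The two numbers should match to several decimal places, which is all the theorem claims: differentiating the composition directly or going through the gradient gives the same result.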
Total Derivative Rule
This is a simple extension of the multi-variable chain rule described above:
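Let $f(\mathbf{g}(\theta), \theta)$ be a function of:
- A vector $\mathbf{g}$ which itself depends on $\theta$, i.e., $\mathbf{g} = \mathbf{g}(\theta)$.
- A scalar parameter $\theta$.
The total derivative of $f$ with respect to $\theta$ is given by:
$$
\frac{df}{d\theta} = \frac{\partial f}{\partial \theta} + \frac{\partial f}{\partial \mathbf{g}} \cdot \frac{d\mathbf{g}}{d\theta}
$$
This result follows from the multivariate chain rule. For a function $f(x_{1}, \dots, x_{n}, t)$ where each $x_{i}$ depends on $t$, the total derivative is (treat $t$ itself as an extra variable $x_{n+1} = t$, whose derivative with respect to $t$ is 1):
$$
\frac{df}{dt} = \frac{\partial f}{\partial t} + \sum_{i=1}^{n} \frac{\partial f}{\partial x_{i}} \frac{dx_{i}}{dt}
$$
In our case:
- The variables $x_{i}$ correspond to the components $g_{i}$ of $\mathbf{g}$, and $t$ corresponds to $\theta$.
- $\mathbf{g}$ is a vector, so we sum over its components.
Thus, applying the chain rule:
$$
\frac{df}{d\theta} = \frac{\partial f}{\partial \theta} + \sum_{i=1}^{n} \frac{\partial f}{\partial g_{i}} \frac{dg_{i}}{d\theta}
$$
We can rewrite the above in vector notation. Since $\mathbf{g}$ is an $n$-dimensional vector, we rewrite the sum as a dot product:
$$
\frac{df}{d\theta} = \frac{\partial f}{\partial \theta} + \frac{\partial f}{\partial \mathbf{g}} \cdot \frac{d\mathbf{g}}{d\theta}
$$
where:
- $\frac{\partial f}{\partial \mathbf{g}}$ is the gradient $\left( \frac{\partial f}{\partial g_{1}}, \dots, \frac{\partial f}{\partial g_{n}} \right)$.
- $\frac{d\mathbf{g}}{d\theta}$ is the Jacobian $\left( \frac{dg_{1}}{d\theta}, \dots, \frac{dg_{n}}{d\theta} \right)^{T}$, which for a scalar $\theta$ is just a column vector.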
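The difference between the total derivative and the plain chain rule is the extra direct term. The sketch below, in plain Python, uses a made-up function `f(g_vec, theta)` that depends on a parameter `theta` both directly and through an inner vector `g(theta)`. All names and formulas here are assumptions chosen for illustration. It checks that direct plus indirect contributions match a finite-difference derivative, and that the indirect part alone does not.

```python
import math


def g(theta):
    """Hypothetical inner vector g(theta) = (sin(theta), theta^3)."""
    return [math.sin(theta), theta ** 3]


def dg_dtheta(theta):
    """Hand-computed derivative of g with respect to theta."""
    return [math.cos(theta), 3.0 * theta ** 2]


def f(g_vec, theta):
    """Hypothetical f(g, theta) = theta * g1 + g2^2 + theta^2."""
    return theta * g_vec[0] + g_vec[1] ** 2 + theta ** 2


def partial_f_theta(g_vec, theta):
    """Partial derivative in the explicit theta slot, holding g fixed."""
    return g_vec[0] + 2.0 * theta


def grad_f_g(g_vec, theta):
    """Gradient of f with respect to g, holding theta fixed."""
    return [theta, 2.0 * g_vec[1]]


def total_derivative(theta):
    """Direct term plus the dot product of the gradient and dg/dtheta."""
    g_vec = g(theta)
    direct = partial_f_theta(g_vec, theta)
    indirect = sum(a * b for a, b in zip(grad_f_g(g_vec, theta), dg_dtheta(theta)))
    return direct + indirect


def finite_difference(theta, h=1e-6):
    """Central difference of theta -> f(g(theta), theta)."""
    return (f(g(theta + h), theta + h) - f(g(theta - h), theta - h)) / (2.0 * h)


theta0 = 0.9
total_value = total_derivative(theta0)
numeric_total = finite_difference(theta0)
indirect_only = total_value - partial_f_theta(g(theta0), theta0)
print(total_value, numeric_total, indirect_only)
```

Here `total_value` and `numeric_total` should agree closely, while `indirect_only` is visibly off. Forgetting the direct term is the usual mistake when a parameter appears in both places.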
One application of this formalism is the reparametrization trick in Variational Inference.
Common derivatives
Determinant
See also Wikipedia.
In the special case we have:
Proof:
I don't think I have understood this thing quite well...
Matrix Inverse
Quadratic Form
This should be easy, and quite similar to the case above, where we differentiated a quadratic form with a symmetric matrix component by component.
Quadratic Inverse
You can interpret this as a function composition.
Vector Matrix derivative
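Suppose you have a matrix $A \in \mathbb{R}^{m \times n}$ and a vector $x \in \mathbb{R}^{n}$; then the derivative of $Ax$ with respect to $A$ (with $A$ vectorized column by column) is:
$$
\frac{\partial Ax}{\partial A} = x^{T} \otimes I_{m}
$$
where $A$ is the $m \times n$ matrix, $I_{m}$ is the identity matrix of size $m$, and $\otimes$ is the Kronecker product. So the dimension of the derivative is:
$$
m \times mn
$$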
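One way to make the Kronecker-product formula less mysterious is to build it explicitly. The sketch below, in plain Python, assumes the derivative of $Ax$ with respect to $A$ is $x^{T} \otimes I_{m}$, with $A$ of size $m \times n$ flattened column by column (the helper names `kron`, `vec`, `identity` and the numbers are made up for illustration). Since the map from the entries of $A$ to $Ax$ is linear, the matrix that reproduces $Ax$ from the flattened $A$ is exactly its Jacobian.

```python
def kron(P, Q):
    """Kronecker product of two matrices given as lists of rows."""
    return [[p * q for p in p_row for q in q_row]
            for p_row in P for q_row in Q]


def identity(m):
    """Identity matrix of size m."""
    return [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]


def vec(A):
    """Stack the columns of A into one long vector (column-major order)."""
    rows, cols = len(A), len(A[0])
    return [A[i][j] for j in range(cols) for i in range(rows)]


def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


# Illustrative sizes and values (assumed): A is m x n, x has length n.
m, n = 2, 3
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [1.0, 0.0, -1.0]

jacobian = kron([x], identity(m))   # x^T (a 1 x n row) Kronecker I_m

print(len(jacobian), len(jacobian[0]))  # shape of the derivative
print(matvec(jacobian, vec(A)))         # applying it to the flattened A
print(matvec(A, x))                     # the product A x itself
```

The Jacobian has $m$ rows and $mn$ columns, and applying it to the flattened $A$ gives back $Ax$. With a row-major flattening the factors would appear in the other order, so the convention matters.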