Abstract

This post derives, step by step, the gradients of common neural network modules: the linear transformation, the softmax cross-entropy loss, and several activation functions.

Notations
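
A minimal set of conventions, assumed for the sketches and examples below (the symbols are chosen for illustration):

$$
x \in \mathbb{R}^{n}, \qquad W \in \mathbb{R}^{m \times n}, \qquad b \in \mathbb{R}^{m}, \qquad \ell \in \mathbb{R} \ \text{is the scalar loss},
$$

and the gradient $\frac{\partial \ell}{\partial v}$ is arranged to have the same shape as $v$.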

Techniques for derivation
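
The main technique assumed throughout is the scalar chain rule applied elementwise, after which the terms are collected back into vectors and matrices. A sketch, for a scalar loss $\ell$ that depends on $x$ only through an intermediate vector $y$:

$$
\frac{\partial \ell}{\partial x_j} = \sum_i \frac{\partial \ell}{\partial y_i}\,\frac{\partial y_i}{\partial x_j}.
$$

A useful sanity check on every result is that $\frac{\partial \ell}{\partial v}$ must have the same shape as $v$.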

Linear Transformation

Forward
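
A common form of the forward pass, assuming the column-vector convention from the notation sketch above (the row-vector form $y = xW + b$ is also widely used):

$$
y = Wx + b, \qquad y_i = \sum_{k} W_{ik}\, x_k + b_i .
$$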

Get $\frac{\partial \ell}{\partial W}$ and $\frac{\partial \ell}{\partial x}$
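
A sketch of the backward pass under the notation above. Differentiating $y_i = \sum_k W_{ik} x_k + b_i$ elementwise and applying the chain rule:

$$
\frac{\partial \ell}{\partial W_{ij}} = \frac{\partial \ell}{\partial y_i}\, x_j
\;\Longrightarrow\;
\frac{\partial \ell}{\partial W} = \frac{\partial \ell}{\partial y}\, x^{\top},
\qquad
\frac{\partial \ell}{\partial x_j} = \sum_i \frac{\partial \ell}{\partial y_i}\, W_{ij}
\;\Longrightarrow\;
\frac{\partial \ell}{\partial x} = W^{\top}\,\frac{\partial \ell}{\partial y}.
$$

For the bias, each $b_i$ enters only $y_i$, so $\frac{\partial \ell}{\partial b} = \frac{\partial \ell}{\partial y}$. The shapes match: $\frac{\partial \ell}{\partial W}$ is $m \times n$, $\frac{\partial \ell}{\partial x}$ is $n$-dimensional.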

Softmax and Cross Entropy Loss

Forward Process
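
A sketch of the usual forward definitions, assuming logits $z \in \mathbb{R}^{C}$ and a one-hot target $y$:

$$
s_i = \frac{e^{z_i}}{\sum_{k=1}^{C} e^{z_k}}, \qquad \ell = -\sum_{i=1}^{C} y_i \log s_i .
$$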

Get $\frac{\partial s_i}{\partial z_j}$ when $i = j$ and when $i \neq j$
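
A sketch of the two cases, followed by the combined gradient of the loss under the definitions above (using $\sum_i y_i = 1$):

$$
\frac{\partial s_i}{\partial z_j} =
\begin{cases}
s_i\,(1 - s_i) & i = j, \\[4pt]
-\,s_i\, s_j & i \neq j,
\end{cases}
\qquad
\frac{\partial \ell}{\partial z_j}
= -\sum_i \frac{y_i}{s_i}\,\frac{\partial s_i}{\partial z_j}
= -y_j + s_j \sum_i y_i
= s_j - y_j .
$$

An illustrative numerical check of the combined result $s - y$ against a finite-difference estimate (this snippet is an assumption-based sketch, not code from the original post):

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=5)           # logits
y = np.zeros(5); y[2] = 1.0      # one-hot target

def loss(z):
    # softmax (shifted by the max for numerical stability), then cross entropy
    s = np.exp(z - z.max())
    s /= s.sum()
    return -np.sum(y * np.log(s))

s = np.exp(z - z.max()); s /= s.sum()
analytic = s - y                 # the gradient derived above

eps = 1e-6
numeric = np.array([
    (loss(z + eps * np.eye(5)[j]) - loss(z - eps * np.eye(5)[j])) / (2 * eps)
    for j in range(5)
])
print(np.max(np.abs(analytic - numeric)))   # should be on the order of 1e-9
```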

Activation Functions

Sigmoid
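
A sketch of the standard definition and its derivative, written so that the derivative reuses the forward output:

$$
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\sigma'(x) = \frac{e^{-x}}{(1 + e^{-x})^{2}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr).
$$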

Tanh
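
A sketch via the quotient rule:

$$
\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, \qquad
\tanh'(x) = \frac{(e^{x} + e^{-x})^{2} - (e^{x} - e^{-x})^{2}}{(e^{x} + e^{-x})^{2}} = 1 - \tanh^{2}(x).
$$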

ReLU
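
A sketch of the piecewise derivative:

$$
\mathrm{ReLU}(x) = \max(0, x), \qquad
\mathrm{ReLU}'(x) =
\begin{cases}
1 & x > 0, \\
0 & x < 0.
\end{cases}
$$

At $x = 0$ the derivative is undefined; in practice frameworks typically use $0$ there, which is a valid subgradient.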