Which loss function is the best fit for a categorical (discrete) supervised learning task?
Kullback-Leibler (KL) loss
Binary Crossentropy
Mean Squared Error (MSE)
Any L2 loss