"As Jeremy Howard points out, even academic papers often use softmax for multi-class classification, and I too have already seen it used incorrectly in blogs and papers during my short time studying DL."
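AFAIK softmax should be used with multi-class classification (exactly one class per example), and sigmoid can be used with multi-label classification (each label is predicted independently, so any number of them can be on at once).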
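To make the difference concrete, here is a minimal sketch in plain Python. The logits and the 0.5 threshold are made-up values for illustration, not anything from the course or a particular library. Softmax turns the scores into one distribution that sums to 1, so the classes compete; sigmoid scores each label on its own, so several can pass the threshold together.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

# Hypothetical raw model outputs for three classes/labels.
logits = [2.0, 1.0, -1.0]

# Multi-class: probabilities compete and sum to 1; pick the single best class.
class_probs = softmax(logits)
predicted_class = class_probs.index(max(class_probs))

# Multi-label: each label gets an independent probability; keep all above a threshold.
label_probs = [sigmoid(x) for x in logits]
predicted_labels = [i for i, p in enumerate(label_probs) if p > 0.5]

print("softmax:", class_probs, "-> class", predicted_class)
print("sigmoid:", label_probs, "-> labels", predicted_labels)
```

Note that the sigmoid outputs do not sum to 1, which is exactly why they suit the multi-label case and why softmax does not.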