In the first case, binary cross-entropy should be used and targets should be encoded as one-hot vectors. In the second case, categorical cross-entropy should be used and targets should be encoded as one-hot vectors. In the last case, binary cross-entropy should be used and targets should be encoded as one-hot vectors.

Categorical cross-entropy is defined as

\[ CE = -\sum_{c=1}^{C} t_c \log(s_c) \]

where \(c\) is the index running over the number of classes \(C\), and \(t_c\) and \(s_c\) are the groundtruth and the score for class \(c\).

For binary cross-entropy, each output neuron (or unit) is considered as a separate random binary variable; the likelihood of the entire vector of outputs is the product of the likelihoods of the single binary variables, so the loss (the negative log-likelihood) is the sum of the binary cross-entropies of the single output units:

\[ BCE = -\sum_{c=1}^{C} \left[ t_c \log(s_c) + (1 - t_c) \log(1 - s_c) \right] \]

For a single unit it is assumed that there are two classes, \(C_1\) and \(C_2\): \(t_1\) and \(s_1\) are the groundtruth and the score for \(C_1\), and \(t_2 = 1 - t_1\) and \(s_2 = 1 - s_1\) are the groundtruth and the score for \(C_2\). That is the case when we split a multi-label classification problem into \(C\) binary classification problems.

I also came across an "inverted" issue: I was getting good results with categorical_crossentropy (with 2 classes) and poor results with binary_crossentropy. It seems the problem was a wrong activation function: binary_crossentropy expects a sigmoid output layer, while categorical_crossentropy expects a softmax one.
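To make the two definitions concrete, here is a minimal NumPy sketch (the helper names and example arrays are illustrative, not from the original post) that evaluates both formulas on a one-hot target:

```python
import numpy as np

def categorical_cross_entropy(t, s):
    # CE = -sum_c t_c * log(s_c), with t one-hot and s a probability vector
    return -np.sum(t * np.log(s))

def binary_cross_entropy(t, s):
    # sum of per-unit binary cross-entropies, treating each output
    # unit as an independent binary variable
    return -np.sum(t * np.log(s) + (1 - t) * np.log(1 - s))

t = np.array([0.0, 1.0, 0.0])   # one-hot groundtruth
s = np.array([0.1, 0.7, 0.2])   # scores for the three classes

print(categorical_cross_entropy(t, s))  # ~0.357, only the target class matters
print(binary_cross_entropy(t, s))       # ~0.685, off-target units are penalized too
```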
```python
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```
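Whether this compilation behaves as expected also depends on the output activation, as noted above. A minimal sketch of the two standard pairings (the layer sizes and input shape are placeholders, not from the original post):

```python
from tensorflow import keras
from tensorflow.keras import layers

# binary_crossentropy pairs with sigmoid outputs and 0/1 targets
binary_model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(20,)),
    layers.Dense(1, activation='sigmoid'),
])
binary_model.compile(loss='binary_crossentropy', optimizer='adam',
                     metrics=['accuracy'])

# categorical_crossentropy pairs with a softmax over the C classes
# and one-hot encoded targets
categorical_model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(20,)),
    layers.Dense(10, activation='softmax'),
])
categorical_model.compile(loss='categorical_crossentropy', optimizer='adam',
                          metrics=['accuracy'])
```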
The reason for this apparent performance discrepancy between categorical & binary cross-entropy is what user xtof54 has already reported in his answer below, i.e.: the accuracy computed with the Keras method evaluate is just plain wrong when using binary_crossentropy with more than 2 labels.

I would like to elaborate more on this, demonstrate the actual underlying issue, explain it, and offer a remedy. This behavior is not a bug; the underlying reason is a rather subtle & undocumented issue in how Keras actually guesses which accuracy to use, depending on the loss function you have selected, when you include simply metrics=['accuracy'] in your model compilation.

In other words, while your first compilation option

```python
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
```

is valid, your second one

```python
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```

will not produce what you expect, but the reason is not the use of binary cross-entropy (which, at least in principle, is an absolutely valid loss function).

Why is that? If you check the metrics source code, Keras does not define a single accuracy metric, but several different ones, among them binary_accuracy and categorical_accuracy. What happens under the hood is that, since you have selected binary cross-entropy as your loss function and have not specified a particular accuracy metric, Keras (wrongly...) infers that you are interested in binary_accuracy, and this is what it returns - while in fact you are interested in categorical_accuracy.

Let's verify that this is the case, using the MNIST CNN example in Keras, with the following modification:

```python
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy'])  # WRONG way

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=2,  # only 2 epochs, for demonstration purposes
          verbose=1,
          validation_data=(x_test, y_test))
```

To remedy this, i.e. to use indeed binary cross-entropy as your loss function (as I said, nothing wrong with this, at least in principle) while still getting the categorical accuracy required by the problem at hand, you should ask explicitly for categorical_accuracy in the model compilation as follows:

```python
from keras.metrics import categorical_accuracy
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=[categorical_accuracy])
```
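To see why the implicitly chosen binary_accuracy overstates performance on one-hot targets, here is a small NumPy sketch (with made-up predictions, mimicking how Keras computes the two metrics) comparing the element-wise metric with the per-sample argmax one:

```python
import numpy as np

# One-hot targets for 4 samples over 3 classes, plus model scores where
# only the first sample is classified correctly by argmax
y_true = np.array([[0, 1, 0],
                   [1, 0, 0],
                   [0, 0, 1],
                   [1, 0, 0]], dtype=float)
y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.3, 0.6, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7]], dtype=float)

# binary_accuracy (what Keras picks for binary_crossentropy):
# element-wise comparison after rounding at the 0.5 threshold
binary_acc = np.mean(np.equal(y_true, np.round(y_pred)))

# categorical_accuracy (what the problem actually calls for):
# per-sample comparison of the argmax class
categorical_acc = np.mean(np.equal(np.argmax(y_true, axis=1),
                                   np.argmax(y_pred, axis=1)))

print(binary_acc)       # 0.5  - inflated by all the correctly predicted zeros
print(categorical_acc)  # 0.25 - only 1 of the 4 samples is actually right
```

The gap grows with the number of classes, since a mostly-zero one-hot row lets binary_accuracy score many "correct" zeros even when the predicted class is wrong.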