Problem description:

In TensorFlow, I have a classifier network and unbalanced training classes. For various reasons I cannot use resampling to compensate for the unbalanced data, so I am forced to compensate by other means, specifically multiplying the logits by weights based on the number of examples in each class. I know this is not the preferred approach, but resampling is not an option. My training loss op is `tf.nn.softmax_cross_entropy_with_logits` (I might also try `tf.nn.sparse_softmax_cross_entropy_with_logits`). The TensorFlow docs include the following warning in the description of these ops:

> WARNING: This op expects unscaled logits, since it performs a softmax on logits internally for efficiency. Do not call this op with the output of softmax, as it will produce incorrect results.

*My question*: Is the warning above referring only to scaling done by softmax, or does it mean *any* logit scaling of any type is forbidden? If the latter, then is my class-rebalancing logit scaling causing erroneous results?

Thanks,

Ron

The warning just informs you that `tf.nn.softmax_cross_entropy_with_logits` will apply a `softmax` to the input logits before computing the cross-entropy. The point of the warning is to keep you from applying softmax twice, since the cross-entropy results would then be very different.

Here is a comment in the relevant source code, about the function that implements `tf.nn.softmax_cross_entropy_with_logits`:

```
// NOTE(touts): This duplicates some of the computations in softmax_op
// because we need the intermediate (logits -max(logits)) values to
// avoid a log(exp()) in the computation of the loss.
```
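The `logits - max(logits)` trick the comment mentions is plain numerical stabilization: subtracting the row maximum leaves the softmax (and hence the loss) unchanged, but keeps `exp()` from overflowing on large logits. A NumPy sketch:

```python
import numpy as np

def softmax_naive(x):
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def softmax_stable(x):
    # Shift by the row max first, as the TensorFlow kernel does.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

small = np.array([[2.0, 1.0, -1.0]])
big = np.array([[1000.0, 999.0, 997.0]])

# Same result where both versions are finite...
same = np.allclose(softmax_naive(small), softmax_stable(small))
# ...but the naive version overflows on large logits (inf/inf -> nan),
# while the shifted version stays finite.
naive_broken = np.isnan(softmax_naive(big)).any()
stable_ok = np.isfinite(softmax_stable(big)).all()
```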

As the warning states, this implementation is for improving performance, with the caveat that you should not feed your own `softmax` layer in as input (which is somewhat convenient in practice anyway).

If the forced `softmax` hinders your computation, perhaps another API could help: `tf.nn.sigmoid_cross_entropy_with_logits` or maybe `tf.nn.weighted_cross_entropy_with_logits`.
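A common alternative to scaling the logits themselves is to weight the per-example losses by the true class. This is only a sketch, in NumPy with made-up inverse-frequency weights; with TensorFlow you would apply the same multiplication to the per-example tensor that `tf.nn.softmax_cross_entropy_with_logits` returns:

```python
import numpy as np

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical inverse-frequency weights for a 3-class problem:
# frequent classes get small weights, rare classes large ones.
class_weights = np.array([0.2, 1.0, 5.0])

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 0.3]])
labels = np.array([[1.0, 0.0, 0.0],   # class 0 (frequent, low weight)
                   [0.0, 0.0, 1.0]])  # class 2 (rare, high weight)

# Unweighted per-example cross-entropy, as the TF op would return it.
per_example = -(labels * np.log(softmax(logits))).sum(axis=-1)

# Look up each example's weight from its true class, then average.
weights = (labels * class_weights).sum(axis=-1)
loss = (per_example * weights).mean()
```

This keeps the logits unscaled (so the warning is satisfied) while still letting rare classes contribute more to the gradient.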

The implementation does not suggest, though, that *any* scaling will invalidate the result. A linear scaling of the logits should be fine, as long as it preserves the relative distribution of the original logits, since the scaled values are still valid logits. But whatever is applied to the input logits, `tf.nn.softmax_cross_entropy_with_logits` will still apply a `softmax` before computing the cross-entropy.
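You can check this concretely (NumPy sketch, toy values): adding a constant to every logit leaves the loss untouched, because softmax is shift-invariant, while multiplying the logits by per-class weights does change the resulting probabilities and loss, which is presumably the effect the rebalancing is after:

```python
import numpy as np

def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def loss(labels, logits):
    return -(labels * np.log(softmax(logits))).sum(axis=-1)

logits = np.array([[2.0, 1.0, -1.0]])
labels = np.array([[0.0, 1.0, 0.0]])

base = loss(labels, logits)
# Additive shift: softmax is invariant, so the loss is unchanged.
shifted = loss(labels, logits + 10.0)
# Per-class multiplicative scaling: the loss really does change.
scaled = loss(labels, logits * np.array([0.5, 2.0, 1.0]))
```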