Problem description:

I am following the Udacity tutorial on TensorFlow, where I came across the following two lines for word embedding models:

    # Look up embeddings for inputs.
    embed = tf.nn.embedding_lookup(embeddings, train_dataset)
    # Compute the softmax loss, using a sample of the negative labels each time.
    loss = tf.reduce_mean(tf.nn.sampled_softmax_loss(softmax_weights, softmax_biases,
                                                     embed, train_labels, num_sampled, vocabulary_size))

I understand that the second statement samples negative labels. But how does it know what the negative labels are? All I pass to the second function is the current input and its corresponding labels, along with the number of labels that I want to (negatively) sample. Isn't there a risk of sampling from the input set itself?

This is the full example: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/udacity/5_word2vec.ipynb

You can find the documentation for `tf.nn.sampled_softmax_loss()` in the TensorFlow API reference. TensorFlow also provides a good explanation of **Candidate Sampling** (pdf).

> How does it know what the negative labels are?

TensorFlow will randomly select negative classes among all the possible classes (for you, all the possible words).

> Isn't there the risk of sampling from the input set in itself?

When you compute the softmax probability for your true label, you compute `exp(logits[true_label]) / sum(exp(logits[c]) for c in [true_label] + negative_sampled_labels)`. Since the number of classes is huge (the vocabulary size), the probability of sampling the true label as a negative label is very small.

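Anyway, I think TensorFlow removes this possibility altogether when randomly sampling. (EDIT: @Alex confirms TensorFlow does this by default.) The relevant argument of `tf.nn.sampled_softmax_loss` is `remove_accidental_hits`, which defaults to `True`.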
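To make the computation above concrete, here is a minimal pure-Python sketch of a sampled softmax probability with accidental hits removed. It is not TensorFlow's implementation: the helper name `sampled_softmax_prob`, the uniform sampler, and the random scores are assumptions for illustration, and the correction for the sampling distribution that the real loss applies to the logits is left out.

```python
import math
import random

def sampled_softmax_prob(logits, true_label, num_sampled, rng):
    """Softmax probability of true_label over {true_label} plus sampled negatives."""
    num_classes = len(logits)
    # Assumption: draw negatives uniformly from all classes.
    # (TensorFlow's default sampler is log-uniform instead.)
    sampled = [rng.randrange(num_classes) for _ in range(num_sampled)]
    # Drop "accidental hits": sampled classes that equal the true label.
    negatives = [c for c in sampled if c != true_label]
    candidates = [true_label] + negatives
    denom = sum(math.exp(logits[c]) for c in candidates)
    return math.exp(logits[true_label]) / denom, negatives

rng = random.Random(0)
# Hypothetical scores for a vocabulary of 1000 classes.
logits = [rng.gauss(0.0, 1.0) for _ in range(1000)]
prob, negatives = sampled_softmax_prob(logits, true_label=7, num_sampled=64, rng=rng)
```

With the accidental hits filtered out, the true label appears exactly once in the denominator, so it is never treated as its own negative.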

Candidate sampling explains how the sampled loss function is calculated:

- Compute the loss function on a subset *C* of all classes *L*, where *C = T ⋃ S*, *T* is the set of target classes, and *S* is a set of classes chosen randomly from all classes.

The code you provided uses `tf.nn.embedding_lookup` to get the inputs `embed`, of shape [batch_size, dim].

Then it uses `tf.nn.sampled_softmax_loss` to get the sampled loss function:

- softmax_weights: A Tensor of shape [num_classes, dim].
- softmax_biases: A Tensor of shape [num_classes]. The class biases.
- embed: A Tensor of shape [batch_size, dim].
- train_labels: A Tensor of shape [batch_size, 1]. The target classes *T*.
- num_sampled: An int. The number of classes to randomly sample per batch, i.e. the number of classes in *S*.
- vocabulary_size: The number of possible classes.
- sampled_values: defaults to log_uniform_candidate_sampler
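Since the default sampler is log-uniform, it may help to see how strongly it favors low class IDs. The sketch below evaluates the probability formula given in the TensorFlow documentation for the log-uniform candidate sampler, `P(class) = (log(class + 2) - log(class + 1)) / log(range_max + 1)`. The helper name `log_uniform_prob` and the vocabulary size of 50000 are assumptions for illustration.

```python
import math

def log_uniform_prob(class_id, range_max):
    """Probability that a log-uniform sampler over [0, range_max) draws class_id."""
    return (math.log(class_id + 2) - math.log(class_id + 1)) / math.log(range_max + 1)

vocabulary_size = 50000  # assumed value, for illustration only
probs = [log_uniform_prob(c, vocabulary_size) for c in range(vocabulary_size)]
total = math.fsum(probs)  # the terms telescope, so this is 1 up to rounding

most_likely = probs[0]    # class 0 is the most likely to be sampled
least_likely = probs[-1]  # the last class is the least likely
```

Because low IDs are sampled far more often, this sampler assumes the vocabulary is sorted by decreasing word frequency, so frequent words are the ones most likely to collide with a true label.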

For one batch, the target classes are just `train_labels` (*T*). It randomly chooses `num_sampled` classes (*S*) from the `vocabulary_size` possible classes to serve as negative samples.

The sampled classes are scored against `embed` using the corresponding rows of `softmax_weights` and `softmax_biases`. Here `embed` is embeddings[train_dataset] (of shape [batch_size, embedding_size]). Because the sampler draws from all classes, a sampled class may coincide with train_labels[i] for some example i, in which case it is not a negative label for that example.

According to page 2 of Candidate Sampling, there are different types. For NCE and negative sampling, *NEG = S*, which may contain part of *T*; for sampled logistic and sampled softmax, *NEG = S - T*, which explicitly removes *T*.

So there is indeed a chance of sampling from the train_ set.
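The difference between *NEG = S* and *NEG = S - T* can be sketched with plain Python sets. The class IDs below are made up for illustration, with class 17 deliberately appearing in both *T* and *S*.

```python
# T: hypothetical target classes for one example.
target_classes = {3, 17}
# S: hypothetical sampled classes; 17 collides with a target class.
sampled_classes = {5, 17, 42, 99}

# NCE and negative sampling: NEG = S, so the colliding class stays a negative.
neg_nce = sampled_classes
# Sampled logistic and sampled softmax: NEG = S - T, so the collision is removed.
neg_sampled_softmax = sampled_classes - target_classes
# Candidate set used for the loss: C = T ∪ S.
candidates = target_classes | sampled_classes
```

Here `neg_nce` still contains 17, while `neg_sampled_softmax` does not.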