Problem description:

I am trying to use pixel-wise two-class segmentation (foreground/background) to achieve object detection with a CNN in TensorFlow, using binary object masks as ground truth. As a very simple case I am using a synthetically generated dataset of grayscale images: a black background, 3 lines, and one rectangle. My goal is to detect the rectangle. Examples of my input and output are given below.

Input (320x240):

Output (20x15):
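For context, here is a minimal sketch of how such an image/mask pair could be generated with numpy (the 320x240 and 20x15 sizes are from above; the horizontal lines, the 100x80 rectangle, and the 16x downscale factor are illustrative assumptions, not my exact generator):

import numpy as np

img = np.zeros((240, 320), dtype=np.uint8)           # black background, HxW
for _ in range(3):                                   # 3 lines (drawn as rows here)
    img[np.random.randint(0, 240), :] = 255
x0 = np.random.randint(0, 320 - 100)                 # rectangle top-left corner
y0 = np.random.randint(0, 240 - 80)
img[y0:y0 + 80, x0:x0 + 100] = 255                   # one 100x80 white rectangle
mask = np.zeros((15, 20), dtype=np.int32)            # 20x15 binary ground truth
mask[y0 // 16:(y0 + 80) // 16, x0 // 16:(x0 + 100) // 16] = 1  # rectangle only, no lines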

My network is fully convolutional, with the last layer having size [20x15x2]. Each of these 2 feature maps represents the probability of a pixel belonging to the respective class.
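The architecture details probably don't matter for the question, but to make "fully convolutional" concrete, here is a minimal sketch of the kind of network I mean (TF 1.x style; the layer widths are illustrative, not my exact model). The last layer is linear, since the softmax is applied in the loss:

import tensorflow as tf

def conv(x, out_ch, stride, name, relu=True):
    # 3x3 convolution with 'SAME' padding; stride 2 halves the spatial size
    in_ch = x.get_shape().as_list()[-1]
    with tf.variable_scope(name):
        w = tf.get_variable('w', [3, 3, in_ch, out_ch],
                            initializer=tf.truncated_normal_initializer(stddev=0.1))
        b = tf.get_variable('b', [out_ch], initializer=tf.constant_initializer(0.0))
        h = tf.nn.conv2d(x, w, strides=[1, stride, stride, 1], padding='SAME') + b
        return tf.nn.relu(h) if relu else h

x = tf.placeholder(tf.float32, [None, 240, 320, 1])   # NHWC; 320x240 grayscale input
h = conv(x, 16, 2, 'conv1')                           # -> [?, 120, 160, 16]
h = conv(h, 32, 2, 'conv2')                           # -> [?,  60,  80, 32]
h = conv(h, 64, 2, 'conv3')                           # -> [?,  30,  40, 64]
h = conv(h, 64, 2, 'conv4')                           # -> [?,  15,  20, 64]
pred = conv(h, 2, 1, 'out', relu=False)               # -> [?,  15,  20, 2] raw logits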

This is how I'm computing loss function:

reshaped_logits = tf.reshape(pred, [-1, 2])  # shape [batch_size*20*15, 2]
reshaped_labels = tf.reshape(y, [-1])        # shape [batch_size*20*15]
loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=reshaped_logits,
                                                      labels=reshaped_labels)
cost = tf.reduce_mean(loss)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
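As an aside on this loss: in the ground truth shown further down, only 49 of the 300 output pixels are foreground (~16%), so the classes are imbalanced. A common mitigation, which is not in my code above, is to upweight the rarer class per pixel; a sketch, with fg_weight as a hand-picked hyperparameter (my assumption, not a tuned value):

fg_weight = 5.0  # assumption: hand-tuned upweighting factor for foreground pixels
pixel_weights = 1.0 + (fg_weight - 1.0) * tf.cast(reshaped_labels, tf.float32)
weighted_cost = tf.reduce_mean(loss * pixel_weights)  # fg_weight = 1.0 recovers cost above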

My loss function converges quite fast to a minimum of Minibatch Loss = 0.693147, even though my output is really wrong. This is how I am computing the predicted mask in one iteration:

t = tf.nn.softmax(reshaped_logits)  # from the code snippet above
prob_maps = tf.reshape(t, [-1, out_size[0], out_size[1], 2])  # out_size = (20, 15)
t = tf.cast(tf.argmax(prob_maps, 3), tf.int32)
out = tf.reshape(t, [-1, out_size[0], out_size[1]])
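It is worth noting that 0.693147 ≈ ln 2, which is exactly the cross-entropy of assigning probability 0.5 to both classes at every pixel. To tell whether the network is stuck at that uniform solution or whether my decoding is wrong, I can inspect the probability maps directly (a sketch; sess, batch_x, batch_y and the placeholders x, y come from my training loop and are assumptions here):

p = sess.run(prob_maps, feed_dict={x: batch_x, y: batch_y})
fg = p[0, :, :, 1]  # foreground probability map of the first sample
print('foreground prob min/max: %.3f / %.3f' % (fg.min(), fg.max()))
# Both ~0.5 everywhere: the net outputs uniform logits (loss = ln 2), so
# training, not decoding, is the problem. A visible bright rectangle here
# would instead point at the argmax/reshape decoding above.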

The output I'm getting for this ground truth:

[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0]]

... is this:

[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]

Is it possible that my model does converge properly, but that I am somehow computing the output in the wrong way?
