
Tensorflow: _variable_with_weight_decay(...) Explanation

At the moment I'm looking at the cifar10 example, and I noticed the function _variable_with_weight_decay(...) in the file cifar10.py.
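(The code listing was cut off in this excerpt. Paraphrased from the cifar10 tutorial, and possibly differing slightly between TensorFlow versions, the function looks roughly like this:)

def _variable_with_weight_decay(name, shape, stddev, wd):
    # _variable_on_cpu is a small helper defined elsewhere in cifar10.py
    var = _variable_on_cpu(name, shape,
                           tf.truncated_normal_initializer(stddev=stddev))
    if wd is not None:
        # second-to-last line: add wd * l2_loss(var) to the 'losses' collection
        weight_decay = tf.multiply(tf.nn.l2_loss(var), wd, name='weight_loss')
        tf.add_to_collection('losses', weight_decay)
    return var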

Solution 1:

The code does what it says: you are supposed to sum everything in the 'losses' collection (the weight decay term is added to it on the second-to-last line of the function) to get the loss that you pass to the optimizer. In the loss() function in that example:

tf.add_to_collection('losses', cross_entropy_mean)
[...]
return tf.add_n(tf.get_collection('losses'), name='total_loss')

so what the loss() function returns is the classification loss plus everything that was in the 'losses' collection before.
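Put together, a minimal sketch of how the pieces interact (logits and labels here are hypothetical placeholders, not the tutorial's exact variable names):

# Creating the weights adds wd * l2_loss(weights) to the 'losses'
# collection as a side effect.
weights = _variable_with_weight_decay('weights', shape=[192, 10],
                                      stddev=1/192.0, wd=0.004)

# The classification loss goes into the same collection ...
cross_entropy_mean = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                    logits=logits))
tf.add_to_collection('losses', cross_entropy_mean)

# ... so summing the collection gives the classification loss plus all the
# decay terms, and that sum is what you hand to the optimizer.
total_loss = tf.add_n(tf.get_collection('losses'), name='total_loss')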

As a side note, weight decay does not mean you subtract wd from every element of the tensor as part of the update step; rather, it multiplies each element by (1 - learning_rate * wd) (in plain SGD). To see why this is so, recall that l2_loss computes

output = sum(t_i ** 2) / 2

with t_i being the elements of the tensor. This means that the derivative of l2_loss with respect to each tensor element is the value of that element itself, and since you scaled l2_loss by wd, the derivative is scaled as well.
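A quick way to convince yourself of that (a standalone TF 1.x-style check, not part of the tutorial code):

import tensorflow as tf

w = tf.Variable([1.0, -2.0, 3.0])
wd = 0.1
decay_term = wd * tf.nn.l2_loss(w)       # wd * sum(w_i ** 2) / 2
grad = tf.gradients(decay_term, w)[0]    # should be exactly wd * w

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(grad))                # [ 0.1 -0.2  0.3]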

Since the update step (again, in plain SGD; forgive me for omitting the time-step indices) is

w := w - learning_rate * dL/dw

you get, if you only had the weight decay term

w := w - learning_rate * wd * w

or

w := w * (1 - learning_rate * wd)
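
Plugging in numbers makes the equivalence concrete (plain Python, with made-up values learning_rate = 0.1, wd = 0.004, w = 2.0):

learning_rate = 0.1
wd = 0.004
w = 2.0

grad = wd * w                            # derivative of wd * l2_loss at w
w_sgd = w - learning_rate * grad         # explicit SGD step: 1.9992
w_mult = w * (1 - learning_rate * wd)    # multiplicative form: 1.9992

print(w_sgd, w_mult)                     # same value, up to float rounding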
