Tensorflow: _variable_with_weight_decay(...) Explanation
Solution 1:
The code does what it says. You are supposed to sum everything in the 'losses' collection (which the weight decay term is added to in the second to last line) for the loss that you pass to the optimizer. In the loss() function in that example:
tf.add_to_collection('losses', cross_entropy_mean)
[...]
return tf.add_n(tf.get_collection('losses'), name='total_loss')
so what the loss() function returns is the classification loss plus everything that was in the 'losses' collection before.
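For reference, this is roughly what that pattern looks like in TF1-style code. The function and variable names follow the CIFAR-10 tutorial, but treat this as a sketch rather than the exact tutorial source:

import tensorflow as tf

def _variable_with_weight_decay(name, shape, stddev, wd):
    # Create the variable itself.
    var = tf.get_variable(
        name, shape,
        initializer=tf.truncated_normal_initializer(stddev=stddev))
    # If weight decay is requested, register wd * l2_loss(var) in the
    # 'losses' collection; nothing is subtracted from the variable here.
    if wd is not None:
        weight_decay = tf.multiply(tf.nn.l2_loss(var), wd, name='weight_loss')
        tf.add_to_collection('losses', weight_decay)
    return var

def loss(logits, labels):
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)
    cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
    tf.add_to_collection('losses', cross_entropy_mean)
    # Total loss = cross-entropy + all weight decay terms in 'losses'.
    return tf.add_n(tf.get_collection('losses'), name='total_loss')

The important point is that _variable_with_weight_decay only registers the penalty term; the decay only takes effect because the optimizer minimizes the summed total loss.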
As a side note, weight decay does not mean you subtract the value of wd from every value in the tensor as part of the update step; it multiplies the value by (1 - learning_rate * wd) (in plain SGD). To see why this is so, recall that l2_loss computes

output = sum(t_i ** 2) / 2

with t_i being the elements of the tensor. This means that the derivative of l2_loss with respect to each tensor element is the value of that tensor element itself, and since you scaled l2_loss with wd, the derivative is scaled as well.
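A quick sketch to verify this, again in TF1-style code (the numbers here are made up):

import tensorflow as tf

w = tf.constant([1.0, -2.0, 3.0])
wd = 0.5
weight_decay = wd * tf.nn.l2_loss(w)      # wd * sum(w_i ** 2) / 2
grad = tf.gradients(weight_decay, w)[0]   # derivative with respect to w

with tf.Session() as sess:
    print(sess.run(grad))                 # wd * w = [0.5, -1.0, 1.5]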
Since the update step (again, in plain SGD) is (forgive me for omitting the time step indexes)
w := w - learning_rate * dL/dw
you get, if you only had the weight decay term,
w := w - learning_rate * wd * w
or
w := w * (1 - learning_rate * wd)
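Plugging in some made-up numbers shows that the two forms of the step are the same:

learning_rate, wd, w = 0.5, 0.25, 2.0
grad = wd * w                           # derivative of wd * w ** 2 / 2
print(w - learning_rate * grad)         # 1.75
print(w * (1 - learning_rate * wd))     # 1.75, the same update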