Fix major metric learning loss function bug
This fixes a seemingly small but significant bug in how the metric learning hinge loss is implemented. By default, the hinge loss is linear:
negative_loss = max(0, margin - d).mean()
for the negative-pair loss (the positive-pair loss is similar). However, we have seen that using the squared distance d**2
is more stable (it is cheaper to compute, and it is smooth at the margin). The problem is that we have been implementing it wrong for a long time!
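For context, a minimal sketch of how a linear contrastive hinge loss of this form typically looks in PyTorch. The `contrastive_hinge_loss` name and the exact form of the positive term are illustrative, not the code in this repo:

```python
import torch

def contrastive_hinge_loss(d_pos, d_neg, margin=1.0):
    # d_pos: Euclidean distances between embeddings of positive (matching) pairs
    # d_neg: Euclidean distances between embeddings of negative (non-matching) pairs
    # Positive term pulls matching pairs together (illustrative choice of form).
    positive_loss = d_pos.mean()
    # Linear hinge on negatives: only pairs closer than the margin are penalised.
    negative_loss = torch.clamp(margin - d_neg, min=0).mean()
    return positive_loss + negative_loss
```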
Previously, the squared distance loss was implemented as:
negative_loss = max(0, margin**2 - d**2).mean()
which kind of looks right. But this has a gradient of zero when the distance is zero, which is exactly backwards: the gradient should be largest when the negative pair is closest. The correct implementation should be (and is, in this draft):
negative_loss = max(0, margin - d).pow(2).mean()
Now the gradient is largest in magnitude at d=0 and fades to zero at the margin.
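To see the difference concretely, here is a small standalone check (plain torch ops, not the repo's loss class) of the gradient each variant gives a negative pair at several distances inside the margin:

```python
import torch

margin = 1.0
d = torch.tensor([0.0, 0.25, 0.5, 0.99], requires_grad=True)

# Old (buggy) variant: d/dd of (margin**2 - d**2) is -2*d inside the margin,
# so the gradient vanishes exactly where the negatives are closest (d=0).
old_loss = torch.clamp(margin**2 - d**2, min=0).mean()
old_loss.backward()
print(d.grad)   # zero at d=0, growing towards the margin

d.grad = None

# Fixed variant: d/dd of (margin - d)**2 is -2*(margin - d) inside the margin,
# so the gradient is largest in magnitude at d=0 and shrinks towards the margin.
new_loss = torch.clamp(margin - d, min=0).pow(2).mean()
new_loss.backward()
print(d.grad)   # largest magnitude at d=0, shrinking towards the margin
```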
N.B. I am setting this to draft because I would like to produce some graph-construction performance numbers showing that this improves results before merging it in.