
Fix major metric learning loss function bug

Daniel Thomas Murnane requested to merge dmurnane_fixed_metric_learning_loss into dev

This fixes a seemingly small but significant bug in how the metric learning hinge loss is implemented. By default, the hinge loss is linear in the pair distance d:

negative_loss = torch.relu(margin - d).mean()

in the case of the negative-pair loss (the positive-pair loss is similar). However, we have seen that using the squared distance d**2 is more stable: it is easier to calculate, and the loss is smooth at the margin. The problem is that we have been implementing it wrong for a long time!
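For context, here is a minimal sketch of the full linear hinge loss in PyTorch (the names `hinge_loss`, `embed1`, `embed2`, and `is_positive` are illustrative, not the identifiers used in this repo):

```python
import torch

def hinge_loss(embed1, embed2, is_positive, margin=1.0):
    """Linear pairwise hinge loss; a sketch, not this repo's exact code."""
    # Euclidean distance between the two embeddings of each pair
    d = torch.norm(embed1 - embed2, dim=-1)

    # Positive pairs are pulled together: penalize their distance directly
    positive_loss = d[is_positive].mean()

    # Negative pairs are pushed apart: penalize only those inside the margin
    negative_loss = torch.relu(margin - d[~is_positive]).mean()

    return positive_loss + negative_loss

# Example usage with random embeddings
embed1 = torch.randn(128, 8)
embed2 = torch.randn(128, 8)
is_positive = torch.rand(128) > 0.5
loss = hinge_loss(embed1, embed2, is_positive)
```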

Previously, the squared distance loss was implemented as:

negative_loss = torch.relu(margin**2 - d**2).mean()

which looks plausible at first glance. But the gradient of this loss with respect to d is -2*d, which is zero when the distance is zero. This is very wrong: the gradient should be largest when the negative pair is closest. The correct implementation should be (and, in this draft, is):

negative_loss = torch.relu(margin - d).pow(2).mean()

Now the gradient with respect to d is -2*(margin - d), which is largest in magnitude at d=0 and tapers to zero at the margin, as it should be.
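To make the gradient difference concrete, here is a standalone comparison of the two forms (a sketch; the margin and sample distances are arbitrary):

```python
import torch

margin = 1.0
d = torch.tensor([0.01, 0.50, 0.99], requires_grad=True)

# Old (buggy) form: hinge on the squared quantities
buggy = torch.relu(margin**2 - d**2).sum()
print(torch.autograd.grad(buggy, d)[0])   # tensor([-0.0200, -1.0000, -1.9800])

# Fixed form: square the hinge itself
d2 = d.detach().clone().requires_grad_(True)
fixed = torch.relu(margin - d2).pow(2).sum()
print(torch.autograd.grad(fixed, d2)[0])  # tensor([-1.9800, -1.0000, -0.0200])
```

The buggy form barely pushes on the closest negative pairs (gradient near zero at d near 0), while the fixed form pushes hardest on exactly those pairs.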

N.B. I am setting this to draft, as I would like to produce some graph construction performance numbers that show this change improves performance before merging it in.
