MGM address/port problems affecting HA

This line of code: https://gitlab.cern.ch/eos/eos-charts/-/blob/master/utils/templates/_hostnames.tpl?ref_type=heads#L52

  • incorrectly hard-codes "-0" in the pod name
  • assumes that the MGM StatefulSet name (mgm.fullname) is equal to utils.mgm_hostname (it is hard to tell whether this is guaranteed)
  • is very complex, involving two different functions and three different levels of overrides
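To illustrate the first point, here is a minimal Python sketch (not the chart's actual template logic) of deriving one hostname per MGM replica instead of hard-coding "-0". The function name and its parameters are hypothetical; the example values (StatefulSet and headless service "eos-mgm", namespace "eos", cluster domain "kermes-dev.local") are taken from the output in this issue, and the name layout follows the usual Kubernetes StatefulSet pod DNS pattern.

```python
def mgm_pod_fqdns(sts_name, headless_service, namespace, cluster_domain, replicas):
    """Return one pod FQDN per StatefulSet replica, in ordinal order.

    Hypothetical helper for illustration only; it is not part of eos-charts.
    """
    return [
        f"{sts_name}-{ordinal}.{headless_service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


# Values assumed from the environment shown in this issue.
hosts = mgm_pod_fqdns("eos-mgm", "eos-mgm", "eos", "kermes-dev.local", 2)
for host in hosts:
    print(host)
```

A template doing the equivalent would loop over the replica count rather than emitting a single "-0" name, so that each per-master variable could reference a different pod.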

As a result, in an HA MGM setup with pods eos-mgm-0 and eos-mgm-1, all of these env vars reference only the first pod:

[root@eos-fst-0 /]# env|grep eos-mgm
EOS_MGM_URL=root://eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local
EOS_MGM_ALIAS=eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local
EOS_FUSE_MGM_ALIAS=eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local
EOS_MGM_MASTER2=eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local
EOS_MGM_MASTER1=eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local
[root@eos-mgm-1 /]# env|grep EOS_|grep eos-mgm
EOS_MGM_ALIAS=eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local
EOS_FUSE_MGM_ALIAS=eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local
EOS_MGM_MASTER2=eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local
EOS_MGM_MASTER1=eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local

EOS_MGM_ALIAS needs to be fixed to point to the MGM service.
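As a rough sketch of what "pointing at the MGM service" could mean, the Python below builds the service name and flags values that are pinned to a single pod. Both helper names are hypothetical, and the assumption that the service is called "eos-mgm" in namespace "eos" comes from the hostnames seen elsewhere in this issue.

```python
import re


def service_fqdn(service, namespace, cluster_domain):
    """Build the cluster-internal DNS name of a Kubernetes service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def references_single_pod(value, sts_name):
    """True if the host part of value starts with '<sts_name>-<ordinal>.'.

    Hypothetical check for illustration; accepts a bare hostname or a
    URL such as root://host.
    """
    pattern = rf"(^|//){re.escape(sts_name)}-\d+\."
    return re.search(pattern, value) is not None


# Value currently seen on the pods, per the output in this issue.
current_alias = "eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local"
# Assumed target: the MGM service name rather than a pod name.
wanted_alias = service_fqdn("eos-mgm", "eos", "kermes-dev.local")

print(references_single_pod(current_alias, "eos-mgm"))
print(references_single_pod(wanted_alias, "eos-mgm"))
```

The same check could be run over the output of `env` on each pod to list every variable that is still pinned to one MGM replica.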

I think the FSTs still work because their xrootd config contains this line, which correctly points at the MGM service:

# grep eos-mgm /etc/xrd.cf.fst
fstofs.broker root://eos-mgm.eos.svc.kermes-dev.local:1097//eos/
Edited by Ryan Taylor