MGM address/port problems affecting HA
This line of code: https://gitlab.cern.ch/eos/eos-charts/-/blob/master/utils/templates/_hostnames.tpl?ref_type=heads#L52
- incorrectly hard codes "-0" in the pod name
- makes the assumption that the MGM STS name (mgm.fullname) is equal to utils.mgm_hostname (hard to tell if this is guaranteed to be correct)
- is very complex, involving two different functions and three different levels of overrides
At the end of the day, in an HA MGM setup with pods eos-mgm-0 and eos-mgm-1, all of these env vars reference only the first pod:
[root@eos-fst-0 /]# env|grep eos-mgm
EOS_MGM_URL=root://eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local
EOS_MGM_ALIAS=eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local
EOS_FUSE_MGM_ALIAS=eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local
EOS_MGM_MASTER2=eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local
EOS_MGM_MASTER1=eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local
[root@eos-mgm-1 /]# env|grep EOS_|grep eos-mgm
EOS_MGM_ALIAS=eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local
EOS_FUSE_MGM_ALIAS=eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local
EOS_MGM_MASTER2=eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local
EOS_MGM_MASTER1=eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local
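To make this kind of check repeatable, here is a rough sketch of a filter that lists env vars whose value starts with a StatefulSet-style pod hostname (a name ending in "-<ordinal>" followed by a dot). The function name pinned_to_pod and the EXAMPLE_SERVICE variable are made up for illustration and are not part of eos-charts; the regex is only a heuristic and would also flag a service whose own name ends in a number.

```shell
#!/bin/sh
# Heuristic sketch (not part of eos-charts): print the names of env vars
# whose value begins with a pod-style hostname, i.e. "<name>-<ordinal>.".
pinned_to_pod() {
    # Reads NAME=value lines on stdin; an optional root:// scheme is allowed.
    grep -E '^[A-Za-z0-9_]+=(root://)?[a-z0-9-]+-[0-9]+\.' | cut -d= -f1
}

# Two values copied from the output above, plus a hypothetical variable
# holding a service address for contrast (it should not be listed).
printf '%s\n' \
    'EOS_MGM_URL=root://eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local' \
    'EOS_MGM_ALIAS=eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local' \
    'EXAMPLE_SERVICE=eos-mgm.eos.svc.kermes-dev.local' | pinned_to_pod
```

Inside a pod, the same filter could be fed from the live environment with `env | pinned_to_pod`.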
EOS_MGM_ALIAS needs to be fixed to point to the MGM service.
I think the FSTs work correctly because they have this, correctly pointing at the MGM service:
# grep eos-mgm /etc/xrd.cf.fst
fstofs.broker root://eos-mgm.eos.svc.kermes-dev.local:1097//eos/
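For reference, the service address in that broker line is what you get by dropping the leading pod label from the pod FQDN, since a StatefulSet pod is addressed as <pod>.<service>.<namespace>.svc.<cluster-domain>. A minimal sketch of that relationship follows; the helper name service_fqdn is hypothetical, and this only illustrates the naming, it is not a proposed fix for the template itself.

```shell
#!/bin/sh
# Hypothetical helper: derive the service FQDN from a StatefulSet pod FQDN
# by removing everything up to and including the first dot.
service_fqdn() {
    printf '%s\n' "${1#*.}"
}

# Pod FQDN taken from the env output above.
service_fqdn 'eos-mgm-0.eos-mgm.eos.svc.kermes-dev.local'
# prints: eos-mgm.eos.svc.kermes-dev.local
```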
Edited by Ryan Taylor