Troubleshooting

Arne recently added Alex and Juárez to the access list for Ironic monitoring, which can help to check whether a major outage is going on.

Check that TFTP and HTTP files can be served

TFTP is a nightmare, and we could not get the client side to work on a CS8 machine, so use a CC7 one.

You will need to allow TFTP in the client-side firewall, otherwise packets will be filtered or dropped:

yum install tftp -y
firewall-cmd --zone=public --add-service=tftp --permanent
firewall-cmd --reload

Then check that files can be retrieved:

# Check that UDP port 69 is open, or TFTP won't work
nmap aims.cern.ch -sU -p 69

# Check that the autoregistration part works. If it does, registered images should work as well; adapt the paths if needed.
# UEFI bootloader
tftp -4 -vvv -m binary aimstest02.cern.ch -c get /hwreg/loader/uefi/bootx64.efi.0
# BIOS bootloader (HTTPS)
tftp -4 -vvv -m binary aims.cern.ch -c get /hwreg/loader/bios/lpxelinux.0
# BIOS hwreg image (HTTP)
wget aims.cern.ch/aims/boot/HWREG_AUTOINSTALL/vmlinuz
# UEFI hwreg image (TFTP). Ref. https://its.cern.ch/jira/browse/LOS-763
tftp -4 -vvv -m binary aims.cern.ch -c get /aims/boot/OPENSTACK-IRONIC-IPA/vmlinuz
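
When several files need to be tested, the manual checks above can be wrapped in a small helper. The following is a minimal sketch, not part of AIMS2: check_tftp is a hypothetical function name, and it assumes the tftp client accepts a local file name after the remote path in the get command (as tftp-hpa does). Because a TFTP client may exit with status 0 even when the transfer failed, the sketch judges success by whether a non-empty file was received.

```shell
# Sketch only: check_tftp is a hypothetical helper, not part of AIMS2.
# Usage: check_tftp HOST REMOTE_PATH
check_tftp() (
    host=$1
    remote=$2
    tmpdir=$(mktemp -d) || exit 2
    # Same client options as the manual checks; the extra argument is the
    # local file name (assumes a tftp-hpa style "get remote local").
    tftp -4 -m binary "$host" -c get "$remote" "$tmpdir/out" >/dev/null 2>&1 || true
    # Judge success by what arrived: -s is true for an existing, non-empty file.
    if [ -s "$tmpdir/out" ]; then
        echo "OK   $host:$remote"
        rc=0
    else
        echo "FAIL $host:$remote"
        rc=1
    fi
    rm -rf "$tmpdir"
    exit "$rc"
)

# Example (paths taken from the checks above):
#   check_tftp aims.cern.ch /hwreg/loader/bios/lpxelinux.0
#   check_tftp aims.cern.ch /aims/boot/OPENSTACK-IRONIC-IPA/vmlinuz
```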

Collection of AIMS2 errors

We have gathered some AIMS2 errors in the following link: AIMS2 error collection

This list is not yet complete and depends on contributions from the Procurement, Ironic and Linux Support teams.

Error: No interface data returned from LanDB for XXX.

This is an issue we have seen from time to time. First make sure the node appears in LanDB (https://network.cern.ch) and that there is no related OTG.

It is normally caused by the node lacking some required information in LanDB, even when the web report looks complete.

You can double-check by comparing a working node and a failing node with the sample code at https://network.cern.ch/sc/soap/6/soaplite-example2.pl.txt. Failing nodes normally show no information under NetworkInterfaceCards -> HardwareAddress.

Logging

AIMS2 logs from all its components (dnsmasq, in.tftpd, aims2sync and httpd) are sent to Kibana through Logstash.

Check if a specific machine is contacting AIMS2

  • Go to https://es-linux1.cern.ch and select the custom "Internal" tenant.
  • Filter your search to the linux_private-aims* index pattern.
  • If you want to search for a specific machine you can do it by:
    • searching its MAC address as in "54:ab:3a:79:44:3e" or "54-ab-3a-79-44-3e". You will get logs from any of dnsmasq, tftpd and aims2sync
    • searching its IP address according to https://network.cern.ch
    • searching its host name

If you also want to check whether the PXE config files for SYSLINUX / GRUB2 are being created, you can do the following:

# Normally aims01 would be the master; aimstest01 for the test env
ssh root@aims01
# Search your MAC address (-type f: the logs are regular files, some of them rotated and compressed)
find / -type f -wholename "/var/log/aims2sync.log*" | xargs zgrep "54:ab:3a:79:44:3e"
# ARP-typed MACs can also be used
find / -type f -wholename "/var/log/aims2sync.log*" | xargs zgrep "01-54-ab-3a-79-44-3e"
# Search your host name
find / -type f -wholename "/var/log/aims2sync.log*" | xargs zgrep "IPXETESTNETBOOT"
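
Since a MAC address can show up in the logs with colons, with dashes, or in the ARP-typed form with a 01- prefix, it can be handy to generate every spelling from a single input. A minimal sketch, assuming a POSIX shell; mac_forms is a hypothetical helper name, not an existing AIMS2 tool:

```shell
# Sketch only: mac_forms is a hypothetical helper, not part of AIMS2.
# Prints the colon form, the dash form and the ARP-typed (01-) form, one per line.
mac_forms() {
    # Lowercase the input and drop ':' and '-' separators.
    mf_hex=$(printf '%s' "$1" | tr 'A-F' 'a-f' | tr -d ':-')
    # Accept exactly 12 hexadecimal digits, reject anything else.
    case $mf_hex in
        *[!0-9a-f]*) return 1 ;;
    esac
    [ "${#mf_hex}" -eq 12 ] || return 1
    # Re-insert a separator after every two digits, then drop the trailing one.
    mf_colon=$(printf '%s\n' "$mf_hex" | sed 's/../&:/g; s/:$//')
    mf_dash=$(printf '%s\n' "$mf_colon" | tr ':' '-')
    printf '%s\n%s\n01-%s\n' "$mf_colon" "$mf_dash" "$mf_dash"
}

# Example: search the aims2sync logs for every spelling
#   for m in $(mac_forms 54:AB:3A:79:44:3E); do zgrep -i "$m" /var/log/aims2sync.log*; done
```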

Sample logs and their meaning

aims2server

Entries refer mostly to interface configurations synced to disk, i.e. /tftpboot/aims/config/.../... or to synced images that are ready to use.

Apr 01 17:41:38 aims01.cern.ch server.cgi[3213235]: 188.185.120.186 - ADD pxe conf for 01-a4:bf:01:5e:fb:c1 / MAC a4:bf:01:5e:fb:c1 (RALLY-2225-JCYS) [uefi]
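
To list which interfaces had a PXE configuration written, lines of this shape can be reduced to MAC, host name and boot mode. This is a rough sketch that only assumes the "ADD pxe conf" layout of the sample line above; pxe_adds is a hypothetical helper name, and other operations that may be logged are not covered.

```shell
# Sketch only: pxe_adds is a hypothetical helper, not part of AIMS2.
# Reads log lines on stdin and prints "MAC HOSTNAME MODE" for each "ADD pxe conf" entry.
pxe_adds() {
    sed -n -E 's/.* ADD pxe conf for [^ ]+ \/ MAC ([^ ]+) \(([^)]+)\) \[([^]]+)\].*/\1 \2 \3/p'
}

# Example:
#   journalctl output or a saved log file can be piped in: pxe_adds < some-log-file
```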

Entries like the following come from our monitoring (see https://kojimon.web.cern.ch):

Apr 01 17:41:47 aims01.cern.ch httpd[3095698]: ::1 - - [01/Apr/2022:17:41:47 +0200] "GET /server-status/?auto HTTP/1.1" 200 825 459 "-" "Go-http-client/1.1"

You may see many other log entries, but they are self-explanatory.

Bear in mind that, as of April 2022, the DB debug level is enabled so that the queries being run are logged. It can be disabled if desired, but it has proven useful for debugging past issues.

xinetd

Entries refer to TFTP transactions with the clients. The IP is the client's IP and can be checked on https://network.cern.ch. Note that the in.tftpd entries come from the xinetd unit.

2020-08-24T17:37:31.740790+02:00 aims01 in.tftpd[2939]: Client ::ffff:128.142.33.81 finished /aims/loader/bios/pxelinux.cfg/default
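
To see at a glance which client fetched which file, the "finished" entries can be reduced to two columns. A minimal sketch, assuming the line layout of the sample above; tftp_finished is a hypothetical helper name. The ::ffff: prefix marks an IPv4-mapped IPv6 address, so it is stripped to leave the plain IPv4 address that can be looked up on https://network.cern.ch.

```shell
# Sketch only: tftp_finished is a hypothetical helper, not part of AIMS2.
# Reads log lines on stdin and prints "CLIENT_IP FILE" for each completed transfer.
tftp_finished() {
    awk '/in\.tftpd\[[0-9]+\]: Client / {
        for (i = 1; i < NF; i++) {
            if ($i == "Client" && $(i + 2) == "finished") {
                ip = $(i + 1)
                sub(/^::ffff:/, "", ip)   # drop the IPv4-mapped IPv6 prefix
                print ip, $(i + 3)
                break
            }
        }
    }'
}
```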

dnsmasq

Entries refer to all the DHCP info from the client.

Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 available DHCP subnet: 128.142.0.0/255.255.0.0
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 vendor class: PXEClient:Arch:00000:UNDI:002001
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 PXE(eth0) a4:bf:01:27:75:67 proxy
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 tags: x86PC, eth0
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 bootfile name: /aims/loader/bios/lpxelinux.0
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 server name: 188.184.21.168
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 next server: 188.184.21.168
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 sent size:  1 option: 53 message-type  5
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 sent size:  4 option: 54 server-identifier  188.184.21.168
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 sent size:  9 option: 60 vendor-class  50:58:45:43:6c:69:65:6e:74
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 sent size: 17 option: 97 client-machine-id  00:55:22:04:1e:9b:c3:11:e7:ab:21:a4:bf:01...
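
A single DHCP exchange is spread over many lines, so it helps to collapse them. In the sample above every line carries the same number (19363175) right after the process tag, which appears to identify the exchange. The sketch below relies on that assumption and on the "PXE(...)" and "bootfile name:" line shapes shown above; dhcp_bootfiles is a hypothetical helper name.

```shell
# Sketch only: dhcp_bootfiles is a hypothetical helper, not part of AIMS2.
# Reads dnsmasq-dhcp log lines on stdin and prints "ID MAC BOOTFILE" per exchange.
dhcp_bootfiles() {
    awk '{
        # Locate the process tag so that the timestamp format does not matter.
        for (i = 1; i <= NF; i++) if ($i ~ /^dnsmasq-dhcp\[/) break
        if (i > NF) next
        id = $(i + 1)
        if ($(i + 2) ~ /^PXE\(/) mac[id] = $(i + 3)
        if ($(i + 2) == "bootfile" && $(i + 3) == "name:") boot[id] = $(i + 4)
    }
    END {
        # "-" stands for an exchange where no PXE line with a MAC was seen.
        for (id in boot) print id, ((id in mac) ? mac[id] : "-"), boot[id]
    }'
}
```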

httpd

Entries refer to HTTP calls the server is receiving, basically client operations.

137.138.156.101 - - [23/Aug/2020:03:29:32 +0200] "POST /aims/server HTTP/1.1" 200 445 364974 "-" "SOAP::Lite/Perl/1.1"
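
When looking for failing client operations, it can help to reduce access-log lines to status, client, method and path. A minimal sketch, assuming the field layout of the sample lines in this section (client, two "-" fields, bracketed timestamp, quoted request, then the status code); httpd_requests is a hypothetical helper name. It also copes with lines that carry a syslog prefix in front of the client address.

```shell
# Sketch only: httpd_requests is a hypothetical helper, not part of AIMS2.
# Reads httpd access-log lines on stdin and prints "STATUS CLIENT METHOD PATH".
httpd_requests() {
    awk -F'"' 'NF >= 3 {
        n = split($1, pre, " ")    # ... CLIENT - - [date zone]
        split($2, req, " ")        # METHOD PATH PROTOCOL
        split($3, post, " ")       # STATUS and the numeric fields after it
        if (n >= 5) print post[1], pre[n - 4], req[1], req[2]
    }'
}

# Example: count requests per status and client
#   httpd_requests < access_log | awk '{ print $1, $2 }' | sort | uniq -c
```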