Daniel Juarez authored
Troubleshooting
Arne recently added Alex and Juárez to the access list for Ironic monitoring. This can help to check whether a major outage is ongoing.
Check that TFTP and HTTP files can be served
TFTP is a nightmare: we could not get the client side to work on a CS8 machine, so use a CC7 one.
You will need to allow TFTP in the client-side firewall, otherwise packets will be filtered or dropped:
yum install tftp -y
firewall-cmd --zone=public --add-service=tftp --permanent
firewall-cmd --reload
Then check that files can be retrieved:
# Check that UDP port 69 is open, or TFTP won't work
nmap aims.cern.ch -sU -p 69
# Check that the autoregistration part works. If it does, registered images should work as well; adapt the paths if needed.
# UEFI bootloader
tftp -4 -vvv -m binary aimstest02.cern.ch -c get /hwreg/loader/uefi/bootx64.efi.0
# BIOS bootloader (HTTPS)
tftp -4 -vvv -m binary aims.cern.ch -c get /hwreg/loader/bios/lpxelinux.0
# BIOS hwreg image (HTTP)
wget aims.cern.ch/aims/boot/HWREG_AUTOINSTALL/vmlinuz
# UEFI hwreg image (TFTP). Ref. https://its.cern.ch/jira/browse/LOS-763
tftp -4 -vvv -m binary aims.cern.ch -c get /aims/boot/OPENSTACK-IRONIC-IPA/vmlinuz
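After running the fetches above, it is worth confirming that each file actually arrived, since the exit status of the tftp client may not reflect a failed transfer on every version (an assumption worth verifying on your client). The following is a minimal sketch; the function name check_fetched is hypothetical and not part of AIMS2.

```shell
#!/bin/bash
# Hypothetical helper (not part of AIMS2): report whether each given file
# exists and is non-empty. Returns non-zero if any file fails the check.
check_fetched() {
    local rc=0 file
    for file in "$@"; do
        if [ -s "$file" ]; then
            # wc -c prints the size in bytes; strip the padding some versions add
            printf 'OK %s (%s bytes)\n' "$file" "$(wc -c < "$file" | tr -d ' ')"
        else
            printf 'FAIL %s (missing or empty)\n' "$file"
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `check_fetched bootx64.efi.0 lpxelinux.0`, assuming the client saved the files under their remote base names in the current directory.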
Collection of AIMS2 errors
We have gathered some AIMS2 errors under the following link: AIMS2 error collection
This list is not yet complete and depends on contributions from the Procurement, Ironic and Linux Support teams.
Error: No interface data returned from LanDB for XXX.
We see this issue from time to time. First make sure the node appears in LanDB (https://network.cern.ch) and that there is no related OTG.
It is normally caused by the node lacking some required information in LanDB, even if the web report looks complete.
You can double-check by comparing a working and a non-working node with the sample code at https://network.cern.ch/sc/soap/6/soaplite-example2.pl.txt. Non-working nodes normally show no information under NetworkInterfaceCards -> HardwareAddress.
Logging
Logs from all AIMS2 components (dnsmasq, in.tftpd, aims2sync and httpd) are sent to Kibana through Logstash.
Check if a specific machine is contacting AIMS2
- Go to https://es-linux1.cern.ch and select the custom "Internal" tenant.
- Filter your search to the linux_private-aims* index pattern.
- If you want to search for a specific machine, you can do it by:
  - searching its MAC address as in "54:ab:3a:79:44:3e" or "54-ab-3a-79-44-3e". You will get logs from dnsmasq, tftpd and aims2sync.
  - searching its IP address according to https://network.cern.ch
  - searching its host name
If you also want to check whether the PXE config files for SYSLINUX / GRUB2 are being created, do the following:
# Normally aims01 would be the master; aimstest01 for the test env
ssh root@aims01
# Search your MAC address
find / -type f -wholename "/var/log/aims2sync.log*" | xargs zgrep "54:ab:3a:79:44:3e"
# The MAC prefixed with the ARP hardware type (01-) can be used as well
find / -type f -wholename "/var/log/aims2sync.log*" | xargs zgrep "01-54-ab-3a-79-44-3e"
# Search your host name
find / -type f -wholename "/var/log/aims2sync.log*" | xargs zgrep "IPXETESTNETBOOT"
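The searches above use several spellings of the same MAC address. A small helper can derive them from a single input. This is a minimal sketch; the function name mac_variants is hypothetical and not part of AIMS2, and it expects a plain 6-octet MAC.

```shell
#!/bin/bash
# Hypothetical helper (not part of AIMS2): print the spellings of a MAC
# address used in the log searches, one per line.
# Input: a 6-octet MAC separated by ':' or '-', in any letter case.
mac_variants() {
    local colon dash
    # Normalise to lower case with ':' separators
    colon=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/-/:/g')
    # Same address with '-' separators
    dash=$(printf '%s' "$colon" | sed 's/:/-/g')
    printf '%s\n' "$colon"       # colon-separated form
    printf '%s\n' "$dash"        # dash-separated form
    printf '01-%s\n' "$dash"     # prefixed with the ARP hardware type for Ethernet
}
```

Each printed line can then be passed to zgrep or pasted into the Kibana search box in turn.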
Sample logs and their meaning
aims2server
Entries refer mostly to interface configurations synced to disk, i.e. /tftpboot/aims/config/.../..., or to synced images that are ready to use.
Apr 01 17:41:38 aims01.cern.ch server.cgi[3213235]: 188.185.120.186 - ADD pxe conf for 01-a4:bf:01:5e:fb:c1 / MAC a4:bf:01:5e:fb:c1 (RALLY-2225-JCYS) [uefi]
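To list which hosts had a PXE configuration written, lines like the one above can be reduced to host name, MAC and boot mode. This is a minimal sketch, assuming every such line follows the format of that sample; the function name pxe_adds is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "HOSTNAME MAC MODE" for every
# "ADD pxe conf" line read from the given files or from stdin.
# Assumes the line layout of the sample entry; other layouts are skipped.
pxe_adds() {
    sed -nE 's#^.* ADD pxe conf for [^ ]+ / MAC ([^ ]+) \(([^)]+)\) \[([^]]+)\]$#\2 \1 \3#p' "$@"
}
```

Lines that do not match the pattern produce no output, so the function can be fed a whole log file.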
Entries like the following come from our monitoring; see https://kojimon.web.cern.ch
Apr 01 17:41:47 aims01.cern.ch httpd[3095698]: ::1 - - [01/Apr/2022:17:41:47 +0200] "GET /server-status/?auto HTTP/1.1" 200 825 459 "-" "Go-http-client/1.1"
You may see many other log entries, but they are self-explanatory.
Bear in mind that, as of April 2022, the DB debug level is enabled so that the queries being run are logged. It can be disabled if desired, but it has proven useful for debugging past issues.
xinetd
Entries refer to TFTP transactions with the clients. The IP is the client's IP and can be checked on https://network.cern.ch. Note that in.tftpd comes from the xinetd unit.
2020-08-24T17:37:31.740790+02:00 aims01 in.tftpd[2939]: Client ::ffff:128.142.33.81 finished /aims/loader/bios/pxelinux.cfg/default
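When looking up a client on https://network.cern.ch, the IPv4-mapped prefix (::ffff:) has to be dropped first. The sketch below extracts the client IP and the served file from entries like the one above. It only handles the "finished" form shown here, and the function name tftp_transfers is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "CLIENT_IP FILE" for every in.tftpd
# "Client ... finished ..." line read from the given files or from stdin.
# The optional ::ffff: prefix of IPv4-mapped addresses is removed.
tftp_transfers() {
    sed -nE 's/^.*in\.tftpd\[[0-9]+\]: Client (::ffff:)?([^ ]+) finished (.*)$/\2 \3/p' "$@"
}
```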
dnsmasq
Entries refer to all the DHCP info from the client.
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 available DHCP subnet: 128.142.0.0/255.255.0.0
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 vendor class: PXEClient:Arch:00000:UNDI:002001
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 PXE(eth0) a4:bf:01:27:75:67 proxy
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 tags: x86PC, eth0
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 bootfile name: /aims/loader/bios/lpxelinux.0
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 server name: 188.184.21.168
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 next server: 188.184.21.168
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 sent size: 1 option: 53 message-type 5
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 sent size: 4 option: 54 server-identifier 188.184.21.168
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 sent size: 9 option: 60 vendor-class 50:58:45:43:6c:69:65:6e:74
Aug 23 19:23:18 aims01.cern.ch dnsmasq-dhcp[8015]: 19363175 sent size: 17 option: 97 client-machine-id 00:55:22:04:1e:9b:c3:11:e7:ab:21:a4:bf:01...
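All lines of one exchange share the number that follows the process tag (19363175 above), so a long log can be condensed to one line per exchange. This is a minimal sketch, assuming the line layout of the sample above; the function name dhcp_summary is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: condense dnsmasq-dhcp log lines to one summary line
# per exchange, keyed on the number that follows the process tag.
# Reads the given files or stdin; assumes the sample layout, where
# field 5 is "dnsmasq-dhcp[PID]:" and field 6 is the exchange number.
dhcp_summary() {
    awk '
        $5 ~ /^dnsmasq-dhcp\[/ {
            id = $6
            # Remember the order in which exchanges first appear
            if (!(id in seen)) { seen[id] = 1; order[++n] = id }
            if ($7 ~ /^PXE\(/)                          mac[id] = $8
            else if ($7 == "bootfile" && $8 == "name:") boot[id] = $9
            else if ($7 == "next" && $8 == "server:")   srv[id] = $9
        }
        END {
            for (i = 1; i <= n; i++) {
                id = order[i]
                printf "%s mac=%s bootfile=%s next-server=%s\n", id, mac[id], boot[id], srv[id]
            }
        }
    ' "$@"
}
```

Fields that never appear for an exchange are printed empty, which makes incomplete exchanges easy to spot.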
httpd
Entries refer to HTTP calls the server is receiving, basically client operations.
137.138.156.101 - - [23/Aug/2020:03:29:32 +0200] "POST /aims/server HTTP/1.1" 200 445 364974 "-" "SOAP::Lite/Perl/1.1"
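To separate real client operations from monitoring requests, it helps to reduce each httpd entry to client, method, path and status. The sketch below does that for both layouts shown in this page (with and without the syslog prefix). It assumes the request line is the first quoted field; the function name httpd_requests is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "CLIENT METHOD PATH STATUS" for each httpd
# access-log line read from the given files or from stdin.
# Splits on double quotes: part 1 is everything before the request line,
# part 2 is the request line, part 3 starts with the status code.
httpd_requests() {
    awk -F'"' '{
        n = split($1, pre, " ")    # ... CLIENT - - [date tz]
        split($2, req, " ")        # METHOD PATH PROTOCOL
        split($3, post, " ")       # STATUS and further counters
        # The client is the 5th token from the end of part 1
        print pre[n - 4], req[1], req[2], post[1]
    }' "$@"
}
```

Piping the result through `grep -v '/server-status/'` then hides the monitoring probes.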