CI Test Job Seg Faults
Job #19834173 failed for bdcdfba3:
Reported to @leejr and @enibigir by @fravera. We tried a few times to run gdb directly in the CI, and also tried downloading the binary and core dump, but neither attempt succeeded because packages were still missing.
Then, in a local container using the same image as the CI, I imported the binary and core dump to recreate the CI environment. After installing everything it asked for (which was a lot!), gdb was still unable to give useful information, apparently because of differences in the linked libraries (gdb warns about a possible library or version mismatch in the output below). I am now recompiling in the local container to see if that helps.
In the process, I'm manually installing everything that's needed to run gdb with all the debug info. If that eventually works, I can tag the container as an image for a future set of CI debug jobs.
I looked briefly for a way to spawn a CI job when a previous job fails, but couldn't find one. In the meantime, I can add a manually triggered "catch-all" debug job at the end of the pipeline that looks for any core dumps in the artifacts and runs gdb on them. It might be a little tricky to design elegantly.
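The catch-all job described above could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not a tested CI job: the function names are hypothetical, core files are assumed to be named `core` or `core.<something>`, and the binary that produced them is assumed to be known in advance. The only gdb options used are `--batch` and `-ex`, which run the given commands non-interactively and then exit.

```python
import os
import subprocess


def find_core_dumps(root):
    """Return sorted paths of files under root named 'core' or 'core.<suffix>'.

    The naming convention is an assumption; adjust it to the CI's core pattern.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name == "core" or name.startswith("core."):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


def gdb_batch_command(binary, core):
    """Build a non-interactive gdb command that prints backtraces for all threads."""
    return [
        "gdb", "--batch",
        "-ex", "bt",
        "-ex", "thread apply all bt",
        binary, core,
    ]


def run_debug_job(artifacts_dir, binary):
    """Run gdb on every core dump found and return {core_path: gdb_output}."""
    reports = {}
    for core in find_core_dumps(artifacts_dir):
        result = subprocess.run(
            gdb_batch_command(binary, core),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
        )
        reports[core] = result.stdout
    return reports
```

For the failure in this issue, the call would look something like `run_debug_job(".", "./bin/ot_module_test")`, which would pick up `ci_tools/core`. The backtraces are only as useful as the debug info available in the image, so this depends on the debug image mentioned above.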
Current gdb output
[root@b2e8fdfed4d5 Ph2_ACF]# gdb ./bin/ot_module_test ci_tools/core
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /Users/leejr/cernbox/Work/Ph2_ACF/bin/ot_module_test...done.
[New LWP 43]
[New LWP 66]
[New LWP 65]
[New LWP 69]
[New LWP 71]
[New LWP 72]
[New LWP 68]
[New LWP 67]
warning: .dynamic section for "/lib64/libzstd.so.1" is not at the expected address (wrong library or version mismatch?)
warning: .dynamic section for "/usr/lib64/root/libCling.so" is not at the expected address (wrong library or version mismatch?)
warning: Could not load shared library symbols for 16 libraries, e.g. /usr/lib64/root/libCore.so.6.22.
Use the "info sharedlibrary" command to see the complete listing.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: the debug information found in "/usr/lib/debug//lib64/libfcgi.so.0.0.0.debug" does not match "/lib64/libfcgi.so.0" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug/usr/lib64/libfcgi.so.0.0.0.debug" does not match "/lib64/libfcgi.so.0" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug//usr/lib64/libfcgi.so.0.0.0.debug" does not match "/lib64/libfcgi.so.0" (CRC mismatch).
warning: the debug information found in "/usr/lib/debug/usr/lib64//libfcgi.so.0.0.0.debug" does not match "/lib64/libfcgi.so.0" (CRC mismatch).
Missing separate debuginfo for /lib64/libfcgi.so.0
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/ae/6d9bf706f55071c560ded21cfe1294ebbf595f.debug
Core was generated by `ot_module_test -f CMS2S.xml -t -m -a -b --reconfigure'.
Program terminated with signal 6, Aborted.
#0 0x00007fa5d8b69387 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:55
55 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) quit
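To compare the local container against the CI image, it may help to pull the affected libraries out of a saved gdb log. The sketch below is a hypothetical helper (the function name and dictionary keys are my own) that matches the three kinds of message seen in the output above: the `.dynamic section` address warnings, the debuginfo CRC mismatches, and the `Try: yum ...` hints. It assumes the messages keep the exact wording shown here.

```python
import re

# Patterns copied from the wording of the gdb messages in the log above.
DYNAMIC_RE = re.compile(
    r'warning: \.dynamic section for "([^"]+)" is not at the expected address'
)
CRC_RE = re.compile(r'does not match "([^"]+)" \(CRC mismatch\)')
HINT_RE = re.compile(r"^Try: (yum .+)$", re.MULTILINE)


def _unique(items):
    """Drop duplicates while keeping first-seen order."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def summarize_gdb_warnings(gdb_output):
    """Group library paths and install hints found in gdb's startup output."""
    return {
        "address_mismatch": _unique(DYNAMIC_RE.findall(gdb_output)),
        "debuginfo_crc_mismatch": _unique(CRC_RE.findall(gdb_output)),
        "install_hints": HINT_RE.findall(gdb_output),
    }
```

Libraries listed under `address_mismatch` are the ones gdb flags as a possible wrong library or version, which is where its own suggestion of `set solib-search-path` or `set sysroot` would apply.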