
Apptainer occasionally fails to start containers

Apptainer occasionally fails to start containers, with a rather cryptic message:

ERROR  : Failed to get file information for file descriptor 3: Bad file descriptor
ERROR  : Could not write info to setgroups: Permission denied

This disrupts testing, because we often need to rebuild (depending on which platform fails).

We should probably:

  1. Pass --debug to apptainer to get a sample of debug logs when the problem happens (to understand what this file descriptor is pointing to).
  2. Detect and handle the failure to start the container. (In the second example below, the build of a project fails, yet the install step and the remaining projects happily continue.)
  3. Until the underlying problem is understood, retry running apptainer. We should consider whether to retry on any error: the exit code is 1 both when apptainer fails to start and when the build (ninja) fails for a genuine reason, so the exit code alone cannot tell the two cases apart.
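A minimal sketch of what points 2 and 3 could look like, assuming we capture apptainer's stderr and match it against the two error lines quoted above. The names `STARTUP_FAILURE_MARKERS`, `is_startup_failure`, `run_with_retry` and the `runner` parameter are hypothetical and are not taken from the nightlies code. Matching on the message rather than on the exit code is what lets a genuine ninja failure (also exit code 1) pass through without a retry.

```python
import subprocess
import time

# Substrings copied from the error output quoted in this issue.
STARTUP_FAILURE_MARKERS = (
    "Failed to get file information for file descriptor",
    "Could not write info to setgroups",
)


def is_startup_failure(returncode, stderr):
    """Return True if the result looks like apptainer failed to start the container."""
    return returncode != 0 and any(m in stderr for m in STARTUP_FAILURE_MARKERS)


def run_with_retry(cmd, max_attempts=3, delay=1.0, runner=subprocess.run):
    """Run cmd, retrying only when the known startup error appears in stderr.

    `runner` is injectable for testing (an assumption of this sketch);
    it must accept the same arguments as subprocess.run.
    """
    for attempt in range(1, max_attempts + 1):
        result = runner(cmd, capture_output=True, text=True)
        if not is_startup_failure(result.returncode, result.stderr):
            # Success, or a genuine failure of the wrapped command: do not retry.
            return result
        if attempt < max_attempts:
            time.sleep(delay)
    # Still failing to start after max_attempts: let the caller handle it.
    return result
```

The caller can then use `is_startup_failure(result.returncode, result.stderr)` on the returned result to abort the remaining steps of the project instead of continuing with the install. One trade-off of this sketch is that capturing stderr buffers the output rather than streaming it to the log, so a real implementation would probably need to tee it.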

Some related discussions:

https://github.com/apptainer/apptainer/issues/430
https://github.com/apptainer/singularity/pull/4953
https://github.com/apptainer/singularity/issues/5206

Examples

https://jenkins-lhcb-nightlies.web.cern.ch/job/nightly-builds/job/build/337143/consoleFull

2023-09-23 12:15:56,463:DEBUG   : running cmake --install LHCb/build --prefix LHCb/InstallArea/x86_64_v2-centos7-clang12-opt
2023-09-23 12:15:56,463:DEBUG   : apptainer command: /cvmfs/lhcbdev.cern.ch/nightly-environments/5ced04d9e2e8ad4a44b36e93198a8c2f88c23ebb2313ab50b202d5e241e8cb8d/bin/apptainer exec --contain --bind /cvmfs --bind /home/lblocal/jenkins-build/workspace/nightly-builds/build@2:/workspace --bind /home/lblocal/jenkins-build/workspace/nightly-builds/build@2 --pwd /workspace/build --env PATH=/cvmfs/lhcb.cern.ch/lib/bin/x86_64-centos7:/cvmfs/lhcb.cern.ch/lib/bin/x86_64-centos7:/cvmfs/lhcb.cern.ch/lib/bin/Linux-x86_64:/cvmfs/lhcb.cern.ch/lib/bin:/cvmfs/lhcbdev.cern.ch/nightly-environments/5ced04d9e2e8ad4a44b36e93198a8c2f88c23ebb2313ab50b202d5e241e8cb8d/bin:/cvmfs/lhcbdev.cern.ch/conda/miniconda/linux-64/1622055603/condabin:/usr/sue/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin:/opt/puppetlabs/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin /cvmfs/lhcb.cern.ch/containers/os-base/centos7-devel/prod/amd64 cmake --install LHCb/build --prefix LHCb/InstallArea/x86_64_v2-centos7-clang12-opt
2023-09-23 12:15:56,620:DEBUG   : ERROR  : Failed to get file information for file descriptor 3: Bad file descriptor
2023-09-23 12:15:56,621:DEBUG   : ERROR  : Could not write info to setgroups: Permission denied
2023-09-23 12:15:56,622:DEBUG   : command exited with code 1
2023-09-23 12:15:56,622:DEBUG   : Completed at: 2023-09-23 12:15:56.622240

https://jenkins-lhcb-nightlies.web.cern.ch/job/nightly-builds/job/build/337186/consoleFull

2023-09-23 19:48:31,458:DEBUG   : running cmake --build Detector/build -j 10 -- -k0
2023-09-23 19:48:31,458:DEBUG   : apptainer command: /cvmfs/lhcbdev.cern.ch/nightly-environments/5ced04d9e2e8ad4a44b36e93198a8c2f88c23ebb2313ab50b202d5e241e8cb8d/bin/apptainer exec --contain --bind /cvmfs --bind /home/lblocal/jenkins-build/workspace/nightly-builds/build@2:/workspace --bind /home/lblocal/jenkins-build/workspace/nightly-builds/build@2 --pwd /workspace/build --env PATH=/cvmfs/lhcb.cern.ch/lib/bin/x86_64-el9:/cvmfs/lhcb.cern.ch/lib/bin/x86_64-centos7:/cvmfs/lhcb.cern.ch/lib/bin/Linux-x86_64:/cvmfs/lhcb.cern.ch/lib/bin:/cvmfs/lhcbdev.cern.ch/nightly-environments/5ced04d9e2e8ad4a44b36e93198a8c2f88c23ebb2313ab50b202d5e241e8cb8d/bin:/cvmfs/lhcbdev.cern.ch/conda/miniconda/linux-64/1622055603/condabin:/usr/sue/bin:/usr/lib64/ccache:/usr/local/bin:/usr/bin:/opt/puppetlabs/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin /cvmfs/lhcb.cern.ch/containers/os-base/alma9-devel/prod/amd64 cmake --build Detector/build -j 10 -- -k0
2023-09-23 19:48:31,609:DEBUG   : ERROR  : Failed to get file information for file descriptor 3: Bad file descriptor
2023-09-23 19:48:31,609:DEBUG   : ERROR  : Could not write info to setgroups: Permission denied
2023-09-23 19:48:31,610:DEBUG   : command exited with code 1
2023-09-23 19:48:31,615:DEBUG   : running cmake --install Detector/build --prefix Detector/InstallArea/x86_64_v3-el9-gcc12+cuda12_1-opt+g

/cc @clemenci @cburr @sponce