As per Marco's instructions, this is based on an el7 container from an el8 or el9 host node.
Initially I started this on lbbuildinter02. This was fine until I had to pull external libraries (OpenGL for Geant4, via HEP_OSlibs) inside my container. I then realised that lbbuildinter02 (the host, not the container) has network issues: it cannot ping mirrorlist.centos.org.
Therefore I moved to an openlab node, olarm-203. The rest of what I say below is on that node (unless otherwise stated).
The points I describe below are some issues and how I worked around them to advance. A real fix is in most cases still needed.
First, the part before Gauss.
For Gaudi and GitCondDB, I disabled Intel Amplifier and added GitCondDB to the list of projects.
The test failures in Gaudi require some update of the stdout reference files, which I made in gaudi/Gaudi!1396 (merged) before deciding to use armv8.1_a instead of armv8_a.
I am now able to follow your codimd instructions to the end on lbbuildinter02, using podman in /home/andreav and using your latest container with the new HEP_OSlibs.
About 102b, my bad, sorry: I was still using 101arm_3 which was probably a previous version of codimd...
About the Gaudi tests: thanks, OK, so the failures are expected because the ref files must be updated. I confirm that I get these:
Total Test time (real) = 878.83 sec

The following tests did not run:
    206 - GaudiExamples.google_auditors.heapchecker (Skipped)
    225 - GaudiExamples.jira.gaudi_1174 (Disabled)
    278 - GaudiExamples.skipped_test (Skipped)

The following tests FAILED:
    209 - GaudiExamples.histoex (Failed)
    211 - GaudiExamples.histoex2 (Failed)
    265 - GaudiExamples.root_io.extcoll.read (Failed)
About IntelAmplifier: very good, thanks. With your new hackathon setup I do not need to change it in configuration.mk.
About GitCondDB: I confirm that I am using your modification line in configuration.mk, but this is not enough. I cannot make Detector or make GitCondDB unless/until I do the following
echo PROJECTS += GitCondDB >> configuration.mk
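Since the echo above appends unconditionally, running it twice leaves the line in configuration.mk twice. A minimal sketch of an idempotent variant follows; the helper name append_once is hypothetical and is not part of upgrade-hackathon-setup.

```shell
# Hypothetical helper (not part of upgrade-hackathon-setup): append a line
# to a file only if that exact line is not already present.
append_once() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Usage: append_once configuration.mk 'PROJECTS += GitCondDB'
```

Calling it repeatedly leaves a single copy of the line, which is convenient when the setup steps are re-run from scratch.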
Thanks to you and @sponce for merging the patches to Detector and LHCb. I am now able to build GitCondDB, Detector, LHCb and Run2Support. I also tried the tests, but I am not sure they should work (do I need to specify some geometry tag to use? It seems to create one). The GitCondDB tests succeed, but the Detector, LHCb and Run2Support tests fail. For make Detector/test:
export CMAKE_PREFIX_PATH_original=$CMAKE_PREFIX_PATH
export CMAKE_PREFIX_PATH=$CMAKE_PREFIX_PATH_original:/cvmfs/sft.cern.ch/lcg/releases/Geant4/11.0.3-21a84/aarch64-centos7-gcc11-opt
make Gauss
Geant4 cannot find OpenGL.
My understanding is that for x86 this would normally be pulled by HEP_OSlibs. I therefore installed HEP_OSlibs within the container. (This is the reason why I moved from lbbuildinter02 to olarm-203).
Note: HEP_OSlibs for ARM is going to be regularly supported for el8/el9, but for el7 it is still shaky. The version that was available yesterday for el7 did not build in my container, see linuxsupport/rpms/HEP_OSlibs#8 (closed). I have now created a new version for el7, and I was able to install it
Ah, one important point: if you agree that OpenGL must come from HEP_OSlibs, it would be good to add HEP_OSlibs to the container, that is Marco's gitlab-registry.cern.ch/lhcb-core/lbdocker/aarch64-centos7-build
For the LHCB_7 layer I didn't do anything because we do not have the ARM builds of the generators we need. Your workaround allows you to move a bit further, but we need something proper.
About the container, I confirm that the new container has HEP_OSlibs and thus includes OpenGL. Just a minor note, it seems that podman did not look for the latest version in my first try this year, and used the cache. So maybe better update the codimd to specify "latest" or a specific version? I used this
About LHCb_7, we should have a chat at some point how it can/should be improved. For the moment I just did the same that I had done, and actually it worked (see further comments below about CRMC), so maybe this is enough?
Just one more point with respect to what I said above, somehow it is no longer needed to add Geant4 to the CMAKE_PREFIX_PATH (maybe thanks to me using 102b and not 101arm3...)
I do not know if this changed in LCG102b and/or specifically for ARM, but FindCRMC.cmake was looking for its headers in src, which is bizarre. I changed this to look for them in include/crmc instead
Madgraph LHAPDF62/C++ is not found
I have no idea how much or where Madgraph PDFs are used now in LHCb. It seems that the C++ version was looked for, but in LCG102b/ARM only a fortran version is available. I moved to that - again, a dirty workaround, a real fix is needed.
Photos++ CxxInterface and Fortran are missing
Here again I have no idea how much or which versions of Photos are normally used by LHCb. Here cmake was looking for libPhotosCxxInterface and libPhotosFortran, but LCG102b/ARM has instead libPhotospp and things like libPhotosHEPEVT. Initially I tried to change FindPhotos.cmake to use those, but then I saw these are (not surprisingly) used elsewhere in the code. So my current workaround is to disable Photos altogether in GaussDependencies and in Gen/LbEvt.
Probably the real fix, as for Madgraph, is to get the needed files in the LCG102b/ARM on cvmfs?
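To see quickly which of the expected Photos libraries a given LCG install actually provides, something like the following could be used. This is only a sketch: the helper name missing_libs is hypothetical, and the library directory must be supplied by hand.

```shell
# Hypothetical helper: print each library name that has no matching
# lib<name>.so* file in the given directory.
missing_libs() {
    libdir=$1
    shift
    for name in "$@"; do
        found=no
        for f in "$libdir"/lib"$name".so*; do
            # If the glob matches nothing it stays literal, so test for existence.
            if [ -e "$f" ]; then
                found=yes
                break
            fi
        done
        [ "$found" = yes ] || printf '%s\n' "$name"
    done
}

# Usage: missing_libs <photos-install>/lib PhotosCxxInterface PhotosFortran Photospp
```

Any name printed is a library that cmake would fail to find in that directory.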
After the above workarounds, my make Gauss is still stuck here, in LaMarr and GaussPhysics. I will probably only look at this after the winter break.
CMake Error at /workspace/Gaudi/InstallArea/armv8.1_a-centos7-gcc11-opt/lib/cmake/Gaudi/GaudiToolbox.cmake:399 (add_library):
  Target "GaussPhysics" links to target "Geant4::G4LHCblists" but the target was not found. Perhaps a find_package() call is missing for an IMPORTED target, or an ALIAS target is missing?
Call Stack (most recent call first):
  Sim/GaussPhysics/CMakeLists.txt:16 (gaudi_add_module)

CMake Error at /workspace/Gaudi/InstallArea/armv8.1_a-centos7-gcc11-opt/lib/cmake/Gaudi/GaudiToolbox.cmake:399 (add_library):
  Target "LbLamarr" links to target "LHCb::CaloUtils" but the target was not found. Perhaps a find_package() call is missing for an IMPORTED target, or an ALIAS target is missing?
Call Stack (most recent call first):
  Sim/LbLamarr/CMakeLists.txt:16 (gaudi_add_module)
Thanks Marco. I am not sure if my Lamarr issue above is due to "an incorrect LHCB_7 definition". Actually the "Sim/GaussPhysics" issue no longer appears. As for Lamarr, I fixed it by simply disabling LHCb::CaloUtils in Lamarr, see !914 (54537d09). It seems that git clone LHCb pulls in Calo/CaloKernel but not Calo/CaloUtils, so I assume this is simply not needed anymore?
General status update, after several in-line replies. I have restarted from scratch, with several improvements thanks to Marco.
I now am using lbbuildinter02 because Marco's new container includes HEP_OSlibs already (that I need for OpenGL), so I do not need to pull it (which would fail on lbbuildinter02 as I have no network apparently).
I had a mistake in December in my configuration.mk, now fixed thanks to Marco (use LCG102b there).
I still have to do a few hacks as described above, mainly
Create LHCB_7, by copying as-is the LCG_102b-aarch64-centos7-gcc11-opt.cmake toolchain to LCG_102b_LHCB_7-aarch64-centos7-gcc11-opt.cmake
In addition, I still need all the configuration patches to Gauss in MR !914 (closed). This includes so far the seven commits in December plus two additional ones I just added, one for Lamarr (disable LHCb/Calo/CaloUtils) and one for CRMC (to find the lib directory in cmake).
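The LHCB_7 toolchain copy listed above can be sketched as a small shell helper. The two file names are the ones quoted above; the helper name and the directory argument are assumptions, since the location of the toolchain files is not stated here.

```shell
# Hypothetical helper: create the LHCB_7 toolchain file as a plain copy of
# the stock LCG one. The directory holding the files is an assumption and
# is passed in as the first argument.
make_lhcb7_toolchain() {
    dir=$1
    src="$dir/LCG_102b-aarch64-centos7-gcc11-opt.cmake"
    dst="$dir/LCG_102b_LHCB_7-aarch64-centos7-gcc11-opt.cmake"
    if [ ! -f "$src" ]; then
        echo "missing source toolchain: $src" >&2
        return 1
    fi
    # Keep an already existing LHCB_7 file untouched.
    [ -e "$dst" ] || cp "$src" "$dst"
}

# Usage: make_lhcb7_toolchain <directory-with-toolchain-files>
```

This is only the "copy as-is" hack; a proper LHCB_7 definition would differ from the stock file in the generator versions it selects.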
Once the above things are done, the Gauss configuration is fixed "enough" to start the build. A lot of stuff builds, but then there are a few build errors in the source code that I need to fix.
If I do make Gauss over and over (so that the parallel build manages to do as much as possible even if elsewhere there are errors), the main issues to fix seem to be the following
Build flags -march in AmpGen (this is a real aarch64 issue)
One CaloUtils header is not found. So my patch above must be undone. But where is CaloUtils supposed to come from, is it LHCb/Calo/CaloUtils? If this is the case, why is it not downloaded by git clone LHCb?
(I mean I guess I need sim10-patches for Gauss, I guess LHCb, and which other packages? I have Gaudi, GitCondDB, Detector, LHCb, Run2Support. Geant4, Gauss... plus a few others with tags pre-fixed by Marco, like lcg-toolchains, Catch2, yaml-cpp, DD4HEP).
I think that I can try the above, i.e. essentially fix Geant4 and use the LHCb branch sim10-patches as mentioned by Gloria. However I think I also need an aarch64 patch in Detector, so I will make my branch off v1r2 and add Marco's patch.
In summary this is what I suggest that I should try:
(DBASE: not needed, I have not downloaded this)
LCG: 102b (the aarch version I am using is /cvmfs/sft.cern.ch/lcg/views/LCG_102b/aarch64-centos7-gcc11-opt)
Geant4: v10r6p2t6 (at least, I will try this...)
Gaudi: master
Detector: v1r2 (or better my branch off v1r2, including Marco's patches which so far have only been merged to master)
LHCb: sim10-patches (or better my branch off sim10-patches, including Marco's patches which so far have only been merged to master)
Run2Support: master
Gauss: master (or better my branch off master, including my patches)
target_compile_options(AmpGen PUBLIC -march=x86-64)
This is in a catch-all "else" for no-SIMD. In the latest master this is changed to '-march=native' (however, there are also a few added hooks to disable SIMD; actually native would normally give the best SIMD available...)
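The hard-coded x86 value above is not accepted by an aarch64 compiler, so the flag has to depend on the target machine. As an illustration only, the choice could be expressed as below; the helper name and the mapping are assumptions, not what AmpGen or Gauss actually does.

```shell
# Hypothetical helper: pick a baseline -march flag from the machine name
# reported by `uname -m`. The mapping is an illustrative assumption,
# not what AmpGen or Gauss actually does.
baseline_march_flag() {
    case "$1" in
        x86_64)  echo "-march=x86-64" ;;
        aarch64) echo "-march=armv8-a" ;;
        *)       echo "" ;;   # unknown machine: let the compiler use its default
    esac
}

# Usage: flag=$(baseline_march_flag "$(uname -m)")
```

Unlike '-march=native', a fixed baseline keeps the binaries independent of the node they were built on, at the price of not using the best SIMD available there.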
I made progress on various fronts (and also asked @philten for help on Pythia), but it seems like I am totally stuck on Photos. In summary, it looks like the latest Gauss v56r2, while built on some LCG that probably uses photos 3.64, is using photos 3.56. The CxxInterface and Fortran libraries of Photos, which are needed in Gauss LbEvtGen, are only available in photos 3.56 and not in 3.64. The ARM build of photos 3.56 does not exist, there is only a build of photos 3.64. The options, I imagine, are:
EITHER ask someone (who? SFT?) to build photos 3.56 on ARM, and then percolate this through the various LHCb cvmfs installations/mirrors and cmake lcg-toolchain fragments
OR disable completely photos in the Gauss build on ARM
What I had done so far, just to get the configuration going, was indeed to disable photos lookup in Gauss cmake. But now there are bits of Gauss code LbEvtGen that need photos, so I would need to disable those in an even harder way.
Can someone please suggest which option we need? Thanks!
PS Just for my own reference, this was my reverse engineering
for f in /cvmfs/lhcb.cern.ch/lib/lhcb/GAUSS/GAUSS_v56r2/InstallArea/x86_64_v2-centos7-gcc11-opt/lib/*so; do if grep -i photos $f; then ldd $f |& grep -i photos | grep -v CXX; echo; fi; done

Binary file /cvmfs/lhcb.cern.ch/lib/lhcb/GAUSS/GAUSS_v56r2/InstallArea/x86_64_v2-centos7-gcc11-opt/lib/libEvtGenExternal.so matches
    libPhotosCxxInterface.so.0 => /cvmfs/lhcb.cern.ch/lib/lcg/releases/MCGenerators/photos++/3.56.lhcb1-9cb9b/x86_64-centos7-gcc11-opt/lib/libPhotosCxxInterface.so.0 (0x00007f771e1e2000)
    libPhotosFortran.so.0 => /cvmfs/lhcb.cern.ch/lib/lcg/releases/MCGenerators/photos++/3.56.lhcb1-9cb9b/x86_64-centos7-gcc11-opt/lib/libPhotosFortran.so.0 (0x00007f771d911000)

Binary file /cvmfs/lhcb.cern.ch/lib/lhcb/GAUSS/GAUSS_v56r2/InstallArea/x86_64_v2-centos7-gcc11-opt/lib/libEvtGen.so matches

Binary file /cvmfs/lhcb.cern.ch/lib/lhcb/GAUSS/GAUSS_v56r2/InstallArea/x86_64_v2-centos7-gcc11-opt/lib/libLbEvtGenLib.so matches
    libPhotosCxxInterface.so.0 => not found
    libPhotosFortran.so.0 => not found

Binary file /cvmfs/lhcb.cern.ch/lib/lhcb/GAUSS/GAUSS_v56r2/InstallArea/x86_64_v2-centos7-gcc11-opt/lib/libLbEvtGen.so matches
...
Short update: I managed to get to the end, now make Gauss completes successfully.
With many limitations of course! The main one is that I completely disabled Photos. We need 3.56 and this is missing for ARM. So I simply disabled some build units, or worse I disabled the photos code calls in LbPhotos.cpp and similar.
The last item to debug was also the most bizarre: in my podman environment the USER env is not set... or maybe getenv does not work well. I changed one 'getenv("USER")' in AmpGen to '"AmpGen"' in the 'Generated by' headers.
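If the cause is indeed that USER is simply not exported inside the container (an assumption, it was not verified here), an alternative to patching AmpGen would be to define USER before building or running. A minimal sketch, with a hypothetical helper name:

```shell
# Hypothetical helper: print the current user name even when USER is unset.
current_user() {
    if [ -n "${USER:-}" ]; then
        printf '%s\n' "$USER"
    else
        # id -un asks the system for the name of the effective user;
        # fall back to a fixed string if that lookup fails too.
        id -un 2>/dev/null || echo unknown
    fi
}

# Usage inside the container: export USER=$(current_user)
```

This only helps code that reads the environment at run time; it does not tell whether getenv itself misbehaves.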
Let's iterate the discussion offline. Thanks to all for the help!
As mentioned offline, we have requested to have a build of LCG_102 ARM with all the versions of the generators as needed by Gauss identical to those used in the x86 build.
Hi @gcorti thanks. The ARM stack for LCG102b_LHCB_7 has now been built thanks to EP-SFT, and the configuration has been created thanks to @bcouturi and @clemenci.
I have restarted all of the machinery above. Using the new stack, the Gauss patches that are needed are much simpler, essentially there are only two minor patches for AmpGen:
!920 (merged) (I also reported one AmpGen issue upstream as https://github.com/GooFit/AmpGen/issues/30)
I confirm that make Gauss now builds successfully for ARM. I have not tried any test yet however (and I do not really plan to as I am not sure I would be of much help there).
I am posting below (in the next comment) the full instructions I am using. Note anyway that this is what I use as branches:
cd Gaudi; git checkout master; cd -
cd GitCondDB; git checkout master; cd -
cd Detector; git checkout v1r2; cd -
cd LHCb; git checkout valassi_sim10_aarch64; cd -
cd Run2Support; git checkout master; cd -
cd Geant4; git checkout v10r6p2t6; cd -
cd Gauss; git checkout valassi_sim10_aarch64_lhcb7; cd -
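The same per-project checkouts can be written as a loop that stops at the first failure. This is a sketch only: the helper name checkout_all and the GIT override variable are hypothetical, the latter added so that the loop can be dry-run with GIT=echo.

```shell
# Hypothetical helper: check out one ref per repository.
# Each argument is "directory:ref". The GIT variable is an assumption
# added here so the loop can be dry-run with GIT=echo.
checkout_all() {
    git_cmd=${GIT:-git}
    for pair in "$@"; do
        repo=${pair%%:*}   # text before the first colon
        ref=${pair#*:}     # text after the first colon
        # Run in a subshell so the caller's working directory is unchanged.
        ( cd "$repo" && "$git_cmd" checkout "$ref" ) || return 1
    done
}

# Usage:
# checkout_all Gaudi:master GitCondDB:master Detector:v1r2 \
#     LHCb:valassi_sim10_aarch64 Run2Support:master \
#     Geant4:v10r6p2t6 Gauss:valassi_sim10_aarch64_lhcb7
```

Compared with the one-liners, a missing directory or an unknown branch aborts the loop instead of being silently skipped.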
Essentially, only these two MRs are still missing
In LHCb, merge valassi_sim10_aarch64 into sim10-patches: LHCb!3921 (merged)
In Gauss, merge valassi_sim10_aarch64_lhcb7 into master: !920 (merged)
Login
    ssh andreav@lbbuildinter02
    bash

Check this is ok from .bashrc:
    env | grep XDG
    XDG_DATA_HOME=/home/andreav/.local/share
    XDG_CONFIG_HOME=/home/andreav/.config
    ...

Check this is ok too
    podman login gitlab-registry.cern.ch

Start as per Marco's recipe, WITHOUT SPECIFYING LHCB_7 HERE!
    cd /home/andreav/ARM2022
    git clone https://gitlab.cern.ch/lhcb/upgrade-hackathon-setup.git workspace
    echo platform=armv8.1_a-centos7-gcc11-opt >> workspace/configuration.mk
    echo LCG_VERSION=102b >> workspace/configuration.mk
    echo WITH_GITCONDDB = 1 >> workspace/configuration.mk

And add my own patch
    echo PROJECTS += GitCondDB >> workspace/configuration.mk

To speed up the builds, copy the ccache from a previous build
    cp -dpr workspace.20230117b/.ccache/ workspace

Start the container with the tag! This is a new container that includes the latest HEP_OSlibs
    cd /home/andreav/ARM2022
    podman run -ti --network host -v /cvmfs:/cvmfs:shared -v ${PWD}/workspace:/workspace:z -w /workspace gitlab-registry.cern.ch/lhcb-core/lbdocker/aarch64-centos7-build:2023-01-05

Everything else is within the container.

Then within the container as per Marco's instructions, WITHOUT SPECIFYING LHCB7!
    . /cvmfs/lhcb.cern.ch/lib/LbEnv
    lb-set-platform armv8.1_a-centos7-gcc11-opt
    export LCG_VERSION=102b
    git clone -b add-aarch64 https://gitlab.cern.ch/lhcb-core/lcg-toolchains.git
    git clone https://gitlab.cern.ch/lhcb-core/mirrors/Catch2.git -b v2.13.10
    cmake -S Catch2 -B Catch2/build.$BINARY_TAG -DCMAKE_TOOLCHAIN_FILE=${PWD}/lcg-toolchains/LCG_${LCG_VERSION}/$BINARY_TAG.cmake -DCATCH_ENABLE_WERROR=NO -DCATCH_BUILD_STATIC_LIBRARY=YES -DCMAKE_INSTALL_PREFIX=${PWD}/Catch2/InstallArea/$BINARY_TAG -DBUILD_TESTING=NO -GNinja
    cmake --build Catch2/build.$BINARY_TAG --target install
    git clone https://gitlab.cern.ch/lhcb-core/mirrors/yaml-cpp.git -b yaml-cpp-0.7.0
    cmake -S yaml-cpp -B yaml-cpp/build.$BINARY_TAG -DCMAKE_TOOLCHAIN_FILE=${PWD}/lcg-toolchains/LCG_${LCG_VERSION}/$BINARY_TAG.cmake -DCMAKE_INSTALL_PREFIX=${PWD}/yaml-cpp/InstallArea/$BINARY_TAG -DYAML_BUILD_SHARED_LIBS=ON -DBUILD_TESTING=NO -GNinja
    cmake --build yaml-cpp/build.$BINARY_TAG --target install
    git clone https://gitlab.cern.ch/lhcb-core/mirrors/DD4hep.git -b v01-23
    (
      . /cvmfs/sft.cern.ch/lcg/views/LCG_${LCG_VERSION}/aarch64-centos7-gcc11-opt/setup.sh
      cmake -S DD4hep -B DD4hep/build.$BINARY_TAG -DCMAKE_INSTALL_PREFIX=${PWD}/DD4hep/InstallArea/$BINARY_TAG -DBUILD_TESTING=NO -GNinja -DCMAKE_CXX_STANDARD=17 -DDD4HEP_USE_XERCESC=ON -DDD4HEP_USE_GEANT4=OFF -DDD4HEP_USE_TBB=ON -DDD4HEP_BUILD_PACKAGES="DDRec DDDetectors DDCond DDAlign"
      cmake --build DD4hep/build.$BINARY_TAG --target install
    )

Download the projects
    git clone ssh://git@gitlab.cern.ch:7999/gaudi/Gaudi.git
    git clone ssh://git@gitlab.cern.ch:7999/lhcb/GitCondDB.git
    git clone ssh://git@gitlab.cern.ch:7999/lhcb/Detector.git
    git clone ssh://git@gitlab.cern.ch:7999/lhcb/LHCb.git
    git clone ssh://git@gitlab.cern.ch:7999/lhcb/Run2Support.git
    git clone ssh://git@gitlab.cern.ch:7999/lhcb/Geant4.git
    git clone ssh://git@gitlab.cern.ch:7999/lhcb/Gauss.git

Fix the branches or tags in all packages
    cd Gaudi; git checkout master; cd -
    cd GitCondDB; git checkout master; cd -
    cd Detector; git checkout v1r2; cd -
    cd LHCb; git checkout valassi_sim10_aarch64; cd -
    cd Run2Support; git checkout master; cd -
    cd Geant4; git checkout v10r6p2t6; cd -
    cd Gauss; git checkout valassi_sim10_aarch64_lhcb7; cd -

Build all packages
    make Gaudi
    make GitCondDB
    make Detector
    make LHCb
    make Run2Support
    make Geant4
    make Gauss
Just one question for you and/or @gcorti, is this the config you are testing for a release, or are you using one with LCG101a? How do the instructions differ in that case? Are all LHCB_7 libraries available for ARM?
One more general comment. Using Marco's latest instructions above I have rebuilt the stack for ARM. Then I also tested it using the same HEP-SCORE setup that I used to evaluate the Geant 10.7 patch in Geant4#10 (closed)
cd Gauss
wget https://gitlab.cern.ch/valassi/hep-workloads/-/raw/qa-build-lhcb-sim-run3/lhcb/sim-run3/lhcb-sim-run3/prodConf_Gauss_0bmk2023_00000726_1.py
cat prodConf_Gauss_0bmk2023_00000726_1.py | sed -e "s/NOfEvents=5/NOfEvents=5/" > prodConf005evt.py
PRODCONFROOT=/cvmfs/lhcb.cern.ch/lib/lhcb/DBASE/ProdConf/v3r5 PYTHONPATH=/cvmfs/lhcb.cern.ch/lib/lhcb/DBASE/ProdConf/v3r5/python ./run gaudirun.py -T '$APPCONFIGOPTS/Gauss/Beam6800GeV-mu100-2022-nu3.2.py' '$DECFILESROOT/options/10000000.py' '$LBPYTHIA8ROOT/options/Pythia8.py' '$APPCONFIGOPTS/Gauss/Run3-detector.py' '$APPCONFIGOPTS/Gauss/DataType-Upgrade.py' '$APPCONFIGOPTS/Gauss/G4PL_FTFP_BERT_EmOpt2.py' '$APPCONFIGOPTS/Persistency/Compression-ZLIB-1.py' prodConf005evt.py 2>&1 | tee out005evt_OLD_arm.log
This completes successfully. However the results are not bit by bit the same as for the same instructions on x86. If I compare the two logs with 5 events, this is what I get
In other words the "GiGaGetMainEvent INFO Number of extracted MCParticles/MCVertices" printouts are different. The numbers differ so much that I am tempted to imagine that different sets of random numbers are used somewhere in the algorithms... otherwise there is really something badly wrong/different.
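For reference, a comparison restricted to those printouts can be sketched as follows. The helper names are hypothetical, and the only thing assumed about the log format is that the relevant lines contain the text quoted above.

```shell
# Hypothetical helper: keep only the GiGaGetMainEvent summary lines of a log.
extract_counts() {
    grep -F 'Number of extracted MCParticles/MCVertices' "$1"
}

# Hypothetical helper: show how the summary lines differ between two logs.
# Returns 0 when they are identical, 1 when they differ (as diff does).
compare_counts() {
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    extract_counts "$1" > "$tmp_a"
    extract_counts "$2" > "$tmp_b"
    diff "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return $status
}

# Usage: compare_counts out005evt_OLD_arm.log <x86-log>
```

Filtering first removes timestamps, paths and other lines that differ between the two platforms for uninteresting reasons.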
Is it possible to print out all random numbers used everywhere?
Is it possible that different random numbers are used on x86 and arm, and why? Can I force them to be the same?
Vague idea, could it be that using x86-64_v2 switches on a vectorization of random numbers that results in different sequences? (I will try by rebuilding without the v2 just in case...)
Thanks Andrea
PS I tried x86_64-centos7-gcc11-opt and I get the same results bit by bit in the logs as x86_64_v2-centos7-gcc11-opt. So there is a problem specific to ARM, whether different random numbers or something more serious.
@valassi, just to clarify, are you running on x86 the exact same code - caveat the unavoidable differences for arms?
I am not so surprised that over a run of 5 events you see differences: even if the random number seeds are the same, the sequence is accessing registers in memory. We do sometimes see differences in the event evolution between opt and dbg. The memory layout can be different, and as soon as a different register is accessed because of the randomisation of a physics process, this gets reflected in what happens later, and the particles we decide to keep in MCParticles may change.
To do a fair comparison we need to run a few thousand events and compare them statistically.
@gcorti, yes unless I am doing something wrong (always a possibility!) I am running the exact same git tags of all projects for x86 and ARM. Ok if you are not surprised that there are differences, then this is not necessarily a bad sign. I agree with you that the full validation should be done statistically on thousands of events: @landerli had kindly suggested he might be able to help with this, but not necessarily on a short timescale.
Would you agree that we could still aim for a release on cvmfs (both x86 and ARM) and the inclusion in HEP-SCORE by mid-February of these ARM patches, even if the full physics validation has not been done yet? I agree that the ARM version would not be usable by LHC in production yet, but from the point of view of HEP-SCORE benchmarking I think that it's better to have an ARM-to-be-validated version rather than no ARM version at all.
Thanks to Gloria and Marco, Gauss v56r3 has been released on x86 and ARM. This can be closed. (Then in case a different issue can be opened for the large scale validation on ARM, I guess...)