MooreOnline
Getting started in the online network
To run MooreOnline (e.g. with the Online TestBeam GUI), connect to an appropriate machine in the online network. Then follow the instructions here to build the stack up to MooreOnline.
Note: Use a machine from the swdev cluster or n4050101 to build. Since they have many cores, this will speed up the process compared to plus nodes.
Problems with file permissions
User permissions for "others" might be wrong, which prevents running as the online user (e.g. from the ECS). Fix them with commands such as:
find . -perm -u=r -exec chmod a+r {} +
find . -perm -u=x -exec chmod a+x {} +
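Before changing modes recursively, it can help to see which entries would be affected. The following is a minimal sketch, assuming GNU `find`; the function names are illustrative and not part of MooreOnline.

```shell
# Dry run: print entries that the owner can read (or execute) but that lack
# that permission for group or others. Nothing is modified.
list_missing_read() {
    find "${1:-.}" -perm -u=r ! -perm -a=r -print
}

list_missing_exec() {
    find "${1:-.}" -perm -u=x ! -perm -a=x -print
}
```

Running `list_missing_read .` from the stack folder prints the paths whose mode the first `chmod` command would actually change; `list_missing_exec .` does the same for the second.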
Even better, extend your ~/.bashrc to set a better umask (permissions for new files):
# sane permissions for new files (group and others can read)
umask u=rwx,g=rx,o=rx
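To see the effect of this umask before adding it to `~/.bashrc`, it can be tried in a subshell. This is a minimal sketch, assuming GNU `stat`; the function name is illustrative.

```shell
# Create a file and a directory under the proposed umask and print their
# octal modes. The subshell keeps the calling shell's umask untouched.
show_modes_with_umask() {
    (
        umask u=rwx,g=rx,o=rx          # equivalent to the octal form: umask 022
        tmp=$(mktemp -d) || exit 1
        cd "$tmp" || exit 1
        : > file
        mkdir dir
        stat -c '%a %n' file dir
        cd / && rm -rf "$tmp"
    )
}
```

On a regular filesystem without default ACLs, `show_modes_with_umask` prints `644 file` and `755 dir`, i.e. group and others can read new files and enter new directories.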
Running the TestBeam GUI
It is advisable to run the GUI from a plus node.
If the files /dev/shm/bm_* exist and were created by another user, you need to either delete them or find another plus node to run from (and stick to it).
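The ownership check can be done with `find`. This is a minimal sketch, assuming GNU `find`; the function name and the optional directory argument are illustrative and only there to make it easy to try out.

```shell
# Print bm_* files that belong to a user other than the one running this.
# With no argument, /dev/shm is inspected.
foreign_buffers() {
    find "${1:-/dev/shm}" -maxdepth 1 -name 'bm_*' ! -user "$(id -u)" -print
}
```

If `foreign_buffers` prints nothing, no `bm_*` file from another user is present on the node.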
To start the GUI, from your stack folder, do:
MooreOnline/run MooreOnline/MooreScripts/scripts/TestBeamGUI/start.sh MooreOnline/MooreScripts/tests/options/RecoMon/Arch.xml
The parameter to the script is the dataflow "architecture" file, which defines which tasks will run. The RecoMon/Arch.xml architecture refers to the main task startup script runRecoMon.sh, which further uses reco.py for the actual Gaudi job options. The input data is defined in RecoMon/OnlineEnv.opts. You can use RecoMon/OnlineEnvBase.py to customise the OutputLevel of the Gaudi task.
When running the GUI, check that the architecture and the scripts come from your directory.
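One way to make that check is to resolve the path and compare it with the stack folder. This is a minimal sketch, assuming GNU coreutils `realpath`; the function name is illustrative, and a stack that deliberately uses symlinks to other locations would need a looser test.

```shell
# Succeed only if the path resolves to a location inside the stack directory.
# Symlinks are resolved first, so a link pointing elsewhere is rejected.
# $1: path to check, $2: stack directory (defaults to the current directory)
is_in_stack() {
    stack=$(realpath "${2:-.}") || return 1
    target=$(realpath "$1") || return 1
    case "$target" in
        "$stack"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_in_stack MooreOnline/MooreScripts/tests/options/RecoMon/Arch.xml && echo ok`, run from the stack folder, prints `ok` only if the file really lives under that folder.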
Once the GUI is open, you first have to click on "Apply Parameters" to unlock all buttons and GUI commands.
Then you will be able to start the needed services for monitoring the application.
When starting services on the GUI:
For now, the interesting services to be started are:
- "Start" logSrv
- "Start" tmSrv
- "Start" logViewer
- "Start" mbmmon
⚠️ logSrv (LOGS) must be started before tmSrv!
⚠️ Do not press Ctrl+C in the logViewer (it will close)
Using "Commands" from the Controller:
- To start the application, on the GUI, do, in this order:
  - "launch"
  - "load"
  - "configure"
  - "start"
  - Check the GUI state transitions.
  - Check all transitions with the logViewer.
  - Then check in mbmmon if events are flowing.
  - You can also use AutoStart to do all of the above automatically.
- To finalize the application, on the GUI, do, in this order:
  - "stop"
  - "reset"
  - "unload" (?)
  - "destroy"
  - You can also use AutoStop to do all of the above automatically.
  - The GUI shall end in a DEAD state.
- If the application runs into an error:
  - The GUI will go into an ERROR state.
  - The "recover" option will become available in the GUI Controller.
  - After using "recover" you will be able to finalize the application correctly.
Task outputs
Besides the live log viewer, you can get some outputs from the task in its working directory.
A new working directory is created next to the Arch.xml file every time you run start.sh. A symlink latest is updated for convenience to point to the latest directory. For example
ls -l MooreOnline/MooreScripts/tests/options/RecoMon/output/latest/
shows that you can access the job environment (.env), job options (.opts, .dump), output (.log) and histogram savesets (Savesets):
drwxr-xr-x 3 rmatev hlt 4.0K 5 oct 15:34 Savesets
-rw-r--r-- 1 rmatev hlt 62K 5 oct 15:37 Controller.env
-rw-r--r-- 1 rmatev hlt 62K 5 oct 15:37 MBM_0.env
-rw-r--r-- 1 rmatev hlt 62K 5 oct 15:37 MDFProd_0.env
-rw-r--r-- 1 rmatev hlt 637K 5 oct 15:38 RecoMon_0.dump
-rw-r--r-- 1 rmatev hlt 62K 5 oct 15:37 RecoMon_0.env
-rw-r--r-- 1 rmatev hlt 77K 5 oct 16:06 RecoMon_0.log
-rw-r--r-- 1 rmatev hlt 110K 5 oct 15:37 RecoMon_0.opts
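The logs in the latest working directory can be scanned quickly with `grep`. This is a minimal sketch; the function name is illustrative, and it assumes that problems show up as lines containing the level names `ERROR` or `FATAL`, which may not cover every message format.

```shell
# Print lines mentioning ERROR or FATAL in the logs of the latest run,
# prefixed with file name and line number.
# $1: directory that contains Arch.xml (and therefore output/latest)
# Returns non-zero if no such line is found.
scan_latest_logs() {
    grep -H -n -E 'ERROR|FATAL' "$1"/output/latest/*.log
}
```

For the RecoMon example this would be called as `scan_latest_logs MooreOnline/MooreScripts/tests/options/RecoMon`.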
Inspecting counters and histograms
From the node where you started the GUI, you can inspect counters and histograms while the application is RUNNING, using taskCounters.exe and taskHistos.exe. Both need the dns of the machine from which you are running the GUI and the task whose counters or histograms you want to inspect. In the examples below, the dns is PLUSCC03; be sure to replace PLUSCC03 with the dns of the machine from which you are running the GUI.
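The value to pass as `-dns` can be derived from the host name instead of being typed by hand. This is a minimal sketch that ASSUMES the dns is simply the short host name in upper case, as the `PLUSCC03` example suggests; the function name is illustrative, so verify the result against the task names you see.

```shell
# Derive a PLUSCC03-style name from a host name: drop the domain part and
# upper-case the rest. ASSUMPTION: this matches the dns expected by the tools.
# $1: host name (defaults to the output of `hostname`)
dns_node() {
    host="${1:-$(hostname)}"
    printf '%s\n' "${host%%.*}" | tr '[:lower:]' '[:upper:]'
}
```

With this, the `-dns` option could be written as `-dns="$(dns_node)"`.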
To list the tasks for the counters you can do:
Online/run taskCounters.exe -dns=PLUSCC03 # lists tasks
This will give you a list of the currently running tasks. To inspect the counters for a given task, such as TESTBEAMGUI_PLUSCC03_Moore_0, do:
Online/run taskCounters.exe -dns=PLUSCC03 -task=TESTBEAMGUI_PLUSCC03_Moore_0
To inspect histograms for a given task, one can do:
Online/run taskHistos.exe -dns=PLUSCC03 -task=TESTBEAMGUI_PLUSCC03_Moore_0
Online/run taskHistos.exe -dns=PLUSCC03 -task=TESTBEAMGUI_PLUSCC03_Moore_0 -show
# root [0] gHistos.size()
# (unsigned long) 2
# root [1] gHistos[0]->Draw()
HLT2 throughput test
First, get the input data:
kinit user@CERN.CH
mkdir $XDG_RUNTIME_DIR/UpgradeHLT1FilteredWithGEC
xrdcp root://eoslhcb.cern.ch//eos/lhcb/wg/rta/WP2/Hlt2Throughput/minbias_filtered_gec11000_1.mdf - > $XDG_RUNTIME_DIR/UpgradeHLT1FilteredWithGEC/data.mdf
# \time ./run mdfwriter.py --db-entry UpgradeHLT1FilteredWithGEC -o $XDG_RUNTIME_DIR/UpgradeHLT1FilteredWithGEC/data.mdf
Start the GUI and get it running.
MooreOnline/run MooreOnline/MooreScripts/scripts/TestBeamGUI/start.sh MooreOnline/MooreScripts/tests/options/HLT2/Arch.xml
While it is running, measure the throughput:
MooreOnline/run python MooreOnline/MooreScripts/scripts/TestBeamGUI/measure_throughput.py
See below how to measure throughput in an automated non-interactive way.
Non-interactive test bench
HLT1
Here is an example of how to run HLT1 on two MEPs and write the output to a file.
MooreOnline/run MooreOnline/MooreScripts/scripts/testbench.py --working-dir=hlt1slim MooreOnline/MooreScripts/tests/options/HLT1Slim/Arch.xml --test-file-db-key=2022_mep_253895 --hlt-type=hlt1_pp_no_gec_no_ut --tfdb-nfiles 2 --measure-throughput=0
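The command above is long, so it can be convenient to see which parts are meant to vary. The following is a minimal sketch that only prints the command (it does not run it); the function name is illustrative, and every option is taken verbatim from the example, with the working directory and the number of files turned into arguments.

```shell
# Print (not run) the HLT1 testbench command.
# $1: working directory (default: hlt1slim)
# $2: number of input files (default: 2)
hlt1_testbench_cmd() {
    workdir="${1:-hlt1slim}"
    nfiles="${2:-2}"
    printf '%s ' \
        MooreOnline/run \
        MooreOnline/MooreScripts/scripts/testbench.py \
        "--working-dir=$workdir" \
        MooreOnline/MooreScripts/tests/options/HLT1Slim/Arch.xml \
        --test-file-db-key=2022_mep_253895 \
        --hlt-type=hlt1_pp_no_gec_no_ut \
        --tfdb-nfiles "$nfiles" \
        --measure-throughput=0
    printf '\n'
}
```

Calling `hlt1_testbench_cmd mytest 1` prints the same command with `--working-dir=mytest` and `--tfdb-nfiles 1`, which can then be reviewed and pasted into the shell.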
HLT2
The automatic LHCbPR throughput test of HLT2 uses the testbench. You can run it locally with
MooreOnline/run bash -c '$PRCONFIGROOT/scripts/benchmark-scripts/MooreOnline_hlt2_pp_default.sh'
Online system tips
Temporary files from HLT1 storage
As of May 2023, at most 5 MDFs from the beginning of each run are stored in
/calib/online/tmpHlt1Dumps/LHCb/<run-number>
Those files become available only after the run is stopped and are deleted after one week. If you need different or (slightly) more files from a run, follow the instructions below.
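Whether dumps exist for a given run can be checked with a small helper. This is a minimal sketch; the function name and the optional base-directory argument are illustrative (the latter only makes the helper easy to try outside the online network).

```shell
# List the temporary MDF dumps kept for a run, or report that none are there.
# $1: run number
# $2: base directory (default: /calib/online/tmpHlt1Dumps/LHCb)
list_hlt1_dumps() {
    run="$1"
    base="${2:-/calib/online/tmpHlt1Dumps/LHCb}"
    if [ -d "$base/$run" ]; then
        ls -1 "$base/$run"
    else
        echo "no dumps for run $run (run not stopped yet, or older than a week?)" >&2
        return 1
    fi
}
```

For example, `list_hlt1_dumps 238820` lists the files stored for that run, if any.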
Copy data from HLT1 storage
Only use this with small amounts of data for testing!
MooreScripts/scripts/get_hlt1_data.py 238820 /scratch/rmatev/
The script is standalone so it can be used without the stack.