MooreOnline

Getting started in the online network

To run MooreOnline (e.g. with the Online TestBeam GUI), connect to an appropriate machine in the online network. Then follow the instructions here to build the stack up to MooreOnline.

Note: Use a machine from the swdev cluster or n4050101 to build. These machines have many cores, which speeds up the build compared to the plus nodes.

Problems with file permissions

The permissions for "others" on your files might be wrong, which prevents running as the online user (e.g. from the ECS). Fix them with commands such as:

find . -perm -u=r -exec chmod a+r {} +
find . -perm -u=x -exec chmod a+x {} +

Even better, extend your ~/.bashrc to set a more permissive umask (the permissions given to new files):

# sane permissions for new files (group and others can read)
umask u=rwx,g=rx,o=rx
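Before changing permissions in bulk, it can help to see which files the find commands above would touch. The following is a minimal sketch using only standard find predicates; the function names are hypothetical and not part of MooreOnline.

```shell
# List entries that the owner can read but "others" cannot.
# These are the entries that `chmod a+r` in the commands above would change.
list_unreadable_by_others() {
    local dir="${1:-.}"
    find "$dir" -perm -u=r ! -perm -o=r
}

# Same check for the execute bit (directory search bit for directories).
list_unexecutable_by_others() {
    local dir="${1:-.}"
    find "$dir" -perm -u=x ! -perm -o=x
}
```

Calling list_unreadable_by_others from the stack folder prints one path per line; an empty output suggests that the read permissions are already fine.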

Running the TestBeam GUI

It is advisable to run the GUI from a plus node. If the files /dev/shm/bm_* exist and they were made by another user, you need to either delete them or find another plus node to run from (and stick to it).
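The check for bm_* files created by another user can be scripted. This is a sketch only: the function name is hypothetical, and the directory argument exists mainly so that the function can be tried outside /dev/shm.

```shell
# Print bm_* files in a directory (default: /dev/shm) that are owned by
# a user other than the one running this function.
foreign_bm_files() {
    local dir="${1:-/dev/shm}"
    find "$dir" -maxdepth 1 -name 'bm_*' ! -user "$(id -u)"
}
```

If foreign_bm_files prints anything, either have those files deleted or pick another plus node, as described above.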

To start the GUI, from your stack folder, do:

MooreOnline/run MooreOnline/MooreScripts/scripts/TestBeamGUI/start.sh MooreOnline/MooreScripts/tests/options/RecoMon/Arch.xml

The parameter to the script is the dataflow "architecture" file, which defines which tasks will run. The RecoMon/Arch.xml architecture refers to the main task startup script runRecoMon.sh, which further uses reco.py for the actual Gaudi job options. The input data is defined in RecoMon/OnlineEnv.opts. You can use RecoMon/OnlineEnvBase.py to customise the OutputLevel of the Gaudi task.

When running the GUI, check that the architecture and the scripts come from your directory. Once the GUI is open, first click on "Apply Parameters" to unlock all buttons and GUI commands. You will then be able to start the services needed for monitoring the application.

When starting services on the GUI:

For now, the interesting services to be started are:

  • "Start" logSrv
  • "Start" tmSrv
  • "Start" logViewer
  • "Start" mbmmon

⚠️ logSrv (LOGS) must be started before tmSrv!

⚠️ Do not press Ctrl+C in the logViewer (it will close)

Using "Commands" from the Controller:
  • To start the application, use the following commands in the GUI, in this order:

    • "launch"
    • "load"
    • "configure"
    • "start"
    • Check the GUI state transitions.
    • Check all transitions with the logViewer.
    • Then check in mbmmon if events are flowing.
    • You can also use AutoStart to do all of the above automatically.
  • To finalize the application, use the following commands in the GUI, in this order:

    • "stop"
    • "reset"
    • "unload" (?)
    • "destroy"
    • You can also use AutoStop to do all of the above automatically.
    • The GUI shall end in a DEAD state.
  • If the application runs into an error:

    • The GUI will go into an ERROR state.
    • The "recover" option will become available in the GUI Controller.
    • After using "recover" you will be able to finalize the application correctly.

Task outputs

Besides the live log viewer, you can get some outputs from the task in its working directory. A new working directory is created next to the Arch.xml file every time you run start.sh. For convenience, a symlink named latest is updated to point to the most recent directory. For example

ls -l MooreOnline/MooreScripts/tests/options/RecoMon/output/latest/

shows that you can access the job environment (.env), job options (.opts, .dump), output (.log) and histogram savesets (Savesets):

drwxr-xr-x 3 rmatev hlt 4.0K  5 oct 15:34 Savesets
-rw-r--r-- 1 rmatev hlt  62K  5 oct 15:37 Controller.env
-rw-r--r-- 1 rmatev hlt  62K  5 oct 15:37 MBM_0.env
-rw-r--r-- 1 rmatev hlt  62K  5 oct 15:37 MDFProd_0.env
-rw-r--r-- 1 rmatev hlt 637K  5 oct 15:38 RecoMon_0.dump
-rw-r--r-- 1 rmatev hlt  62K  5 oct 15:37 RecoMon_0.env
-rw-r--r-- 1 rmatev hlt  77K  5 oct 16:06 RecoMon_0.log
-rw-r--r-- 1 rmatev hlt 110K  5 oct 15:37 RecoMon_0.opts
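For a quick look at whether a task reported problems, the .log file can be scanned from the shell. The sketch below assumes that the log contains the usual Gaudi severity words (ERROR, FATAL) as plain text; the function name is hypothetical.

```shell
# Count the lines of a task log that mention ERROR or FATAL.
# Assumption: severities appear as plain words in the log text.
count_log_problems() {
    local log="$1"
    # grep -c exits with status 1 when the count is 0, so mask that status.
    grep -c -E 'ERROR|FATAL' "$log" || true
}
```

For example, count_log_problems MooreOnline/MooreScripts/tests/options/RecoMon/output/latest/RecoMon_0.log prints the number of matching lines for the most recent run.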

Inspecting counters and histograms

From the node where you started the GUI, you can inspect counters and histograms while the application is RUNNING, using taskCounters.exe and taskHistos.exe. You need to pass the dns of the machine from which you are running the GUI (the -dns option) and the task whose counters or histograms you want to inspect (the -task option). In the example below, the dns of the machine is PLUSCC03; replace PLUSCC03 with the dns of the machine from which you are running the GUI.

To list the tasks for the counters, run:

Online/run taskCounters.exe -dns=PLUSCC03  # lists tasks

This will give you a list of the currently running tasks. To inspect the counters for a given task, such as TESTBEAMGUI_PLUSCC03_Moore_0, run:

Online/run taskCounters.exe -dns=PLUSCC03 -task=TESTBEAMGUI_PLUSCC03_Moore_0

To inspect the histograms for a given task, run:

Online/run taskHistos.exe -dns=PLUSCC03 -task=TESTBEAMGUI_PLUSCC03_Moore_0
Online/run taskHistos.exe -dns=PLUSCC03 -task=TESTBEAMGUI_PLUSCC03_Moore_0 -show
# root [0] gHistos.size()
# (unsigned long) 2
# root [1] gHistos[0]->Draw()
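To avoid typing the node name by hand, the -dns and -task values can be derived in the shell. This sketch rests on two assumptions taken only from the example above: that the dns value is the upper-cased short host name, and that task names follow the pattern TESTBEAMGUI_<node>_<task>_<instance>. Both function names are hypothetical; always check against the task list printed by taskCounters.exe.

```shell
# Upper-case short host name, e.g. "pluscc03.example.org" -> "PLUSCC03".
# Assumption: this matches the value expected by the -dns option.
dns_node() {
    local host="${1:-$(hostname)}"
    host="${host%%.*}"    # drop the domain part, if any
    printf '%s\n' "$host" | tr '[:lower:]' '[:upper:]'
}

# Build a task name following the pattern seen in the example above.
gui_task_name() {
    local node="$1" task="$2" instance="${3:-0}"
    printf 'TESTBEAMGUI_%s_%s_%s\n' "$node" "$task" "$instance"
}
```

With these, a call could look like Online/run taskCounters.exe -dns="$(dns_node)" -task="$(gui_task_name "$(dns_node)" Moore)".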

HLT2 throughput test

First get the input data:

kinit user@CERN.CH
mkdir $XDG_RUNTIME_DIR/UpgradeHLT1FilteredWithGEC
xrdcp root://eoslhcb.cern.ch//eos/lhcb/wg/rta/WP2/Hlt2Throughput/minbias_filtered_gec11000_1.mdf - > $XDG_RUNTIME_DIR/UpgradeHLT1FilteredWithGEC/data.mdf
# \time ./run mdfwriter.py --db-entry UpgradeHLT1FilteredWithGEC -o $XDG_RUNTIME_DIR/UpgradeHLT1FilteredWithGEC/data.mdf
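The mkdir step above fails in a confusing way if XDG_RUNTIME_DIR is unset or if the directory already exists. A slightly more defensive variant is sketched below; the function name is hypothetical, and the default directory name is simply the one used in the commands above.

```shell
# Create the data directory under a base directory and print its path.
# Fails early if the base directory is unset or does not exist.
prepare_input_dir() {
    local base="$1" name="${2:-UpgradeHLT1FilteredWithGEC}"
    if [ -z "$base" ] || [ ! -d "$base" ]; then
        echo "base directory '$base' is not set or does not exist" >&2
        return 1
    fi
    # -p makes the call safe to repeat when the directory already exists.
    mkdir -p "$base/$name" && printf '%s\n' "$base/$name"
}
```

For example, prepare_input_dir "$XDG_RUNTIME_DIR" prints the directory into which data.mdf should be copied.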

Start the GUI and get it running.

MooreOnline/run MooreOnline/MooreScripts/scripts/TestBeamGUI/start.sh MooreOnline/MooreScripts/tests/options/HLT2/Arch.xml

While it is running, measure the throughput:

MooreOnline/run python MooreOnline/MooreScripts/scripts/TestBeamGUI/measure_throughput.py

See below how to measure throughput in an automated non-interactive way.

Non-interactive test bench

HLT1

Here is an example of how to run HLT1 on two MEPs and write the output to a file.

MooreOnline/run MooreOnline/MooreScripts/scripts/testbench.py --working-dir=hlt1slim MooreOnline/MooreScripts/tests/options/HLT1Slim/Arch.xml --test-file-db-key=2022_mep_253895 --hlt-type=hlt1_pp_no_gec_no_ut --tfdb-nfiles 2 --measure-throughput=0

HLT2

The automatic LHCbPR throughput test of HLT2 uses the testbench. You can run it locally with

MooreOnline/run bash -c '$PRCONFIGROOT/scripts/benchmark-scripts/MooreOnline_hlt2_pp_default.sh'

Online system tips

Temporary files from HLT1 storage

As of May 2023, at most 5 MDFs from the beginning of each run are stored in

/calib/online/tmpHlt1Dumps/LHCb/<run-number>

Those files become available only after the run is stopped, and are deleted after one week. If you need different or (slightly) more files from a run, follow the instructions below.
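As a small convenience, the directory for a given run can be built with a helper that rejects anything that is not a run number. This is a sketch; the function name is hypothetical, and the base path is the one quoted above.

```shell
# Print the temporary HLT1 dump directory for a run number.
# The optional second argument overrides the base path quoted above.
hlt1_dump_dir() {
    local run="$1" base="${2:-/calib/online/tmpHlt1Dumps/LHCb}"
    case "$run" in
        ''|*[!0-9]*)
            echo "run number must be numeric: '$run'" >&2
            return 1
            ;;
    esac
    printf '%s/%s\n' "$base" "$run"
}
```

The result can be passed to ls, for instance ls -lh "$(hlt1_dump_dir <run-number>)", which only works once the run has stopped and before the files are deleted.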

Copy data from HLT1 storage

Only use this with small amounts of data for testing!

MooreScripts/scripts/get_hlt1_data.py 238820 /scratch/rmatev/

The script is standalone so it can be used without the stack.