Some ideas to address the high memory usage in FunTuple
Following the discussion in DPA-WP3, I am creating this issue.
High memory usage in FunTuple was first observed at Rec#365. When a user creates a very large FunTuple, memory usage can rise to over 7 GB. This was initially suspected to be a memory leak in DTF.
After the memory leak in DTF was fixed, the high memory usage persisted in large FunTuple productions. It was then identified as an issue with functor JIT compilation, as detailed at LHCb#299 (comment 6645917).
Although the idea behind FunTuple is to tuple only the variables used for analysis, in practice users often tuple all possible values. This leads to the production of more than 10 tuples with over 500 branches each in a single DaVinci job (5k branches in total). Since each branch is an independent functor in FunTuple, it is reasonable to expect significant resources to be spent on JIT-compiling all the required functors (on the order of 10^3).
Ideally, during JIT compilation, repeated functors should be compiled only once. This caching mechanism relies on the functor's hash code, computed from the functor's `str` expression: if two functors have different hash codes, they both need to be compiled; otherwise, they are treated as the same functor. Thus, we do not need to compile on the order of 10^3 functors if most of them are repeated.
However, certain functors that contain hard-coded properties, such as `F.TES(<LOCATION>)` or `F.VALUE_FROM_DICT(<KEY>)`, will disrupt this cache. Different `<LOCATION>` and `<KEY>` values lead to different `str` expressions, so the composite functors must be compiled repeatedly.
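To make this concrete, here is a plain-Python toy (not the actual Functor Factory; all names are invented for illustration) of a JIT cache keyed on the functor's `str` expression: repeated expressions compile once, but embedding a different TES location in each expression forces one compilation per location.

```python
# Toy JIT cache keyed on the functor's string expression.
compiled_cache = {}
n_compilations = 0

def jit_compile(expression):
    """Pretend to JIT-compile a functor, caching on its str expression."""
    global n_compilations
    if expression not in compiled_cache:
        n_compilations += 1  # the expensive step in the real factory
        compiled_cache[expression] = f"<compiled {expression}>"
    return compiled_cache[expression]

# 500 branches all using the same expression -> compiled only once
for _ in range(500):
    jit_compile("F.PT")
assert n_compilations == 1

# the same composite functor over 500 different TES locations
# -> 500 distinct expressions, 500 compilations
for i in range(500):
    jit_compile(f"F.SIZE @ F.TES('/Event/Loc_{i}')")
assert n_compilations == 501
```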
One proposed solution to this problem is to create the JIT functor cache before submitting the job and upload it to CVMFS or directly to the grid, so that it can be used during the DaVinci job without recompiling the functors. However, this approach is known to be difficult to implement: not all users have access to upload cache files to CVMFS, and packaging the compiled cache file into a grid job is not trivial.
A more direct solution, proposed in this issue, is to avoid the situation altogether by reducing the number of uncachable functors in DaVinci jobs. The following ideas are proposed:
- Avoid the use of functors that alter the cache hash, such as `F.VALUE_FROM_DICT`, and instead use FunTuple to handle more complex output types like `std::map<string, ...>`. For instance, in the case of TIS/TOS, this would significantly reduce the number of functors to compile: a single functor would serve all selection lines instead of one per line. The same logic can be applied to other functors, such as using `F.P` instead of three different functors (`F.PX`, `F.PY`, `F.PZ`).
- Extend FunTuple to support extra input for configuring a functor. For example, we could create a new functor `F.APPLY_TES` that receives a `str` TES location as input and returns a `F.TES(CORRESPONDING_LOCATION)`. This way, we could do the following:
```python
{
    "ANY_RESULT1": (F.ANYFUNCTOR @ F.APPLY_TES, LOCATION_1),
    "ANY_RESULT2": (F.ANYFUNCTOR @ F.APPLY_TES, LOCATION_2),
}
```
In the FunTuple backend, the functor `Func = F.ANYFUNCTOR @ F.APPLY_TES` would be registered by the Functor Factory as a single functor.
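A minimal sketch of what such backend logic could look like (the names and the tuple convention below are assumptions for illustration, not the current FunTuple API): only the functor expression enters the cache key, while the extra input is kept aside and forwarded at call time.

```python
# Hypothetical registration step: branches are (functor expression,
# extra input) pairs; only the expression is used as the JIT cache key.
cache = {}

def register(branch_spec):
    functor_expr, extra_input = branch_spec
    if functor_expr not in cache:
        cache[functor_expr] = f"<compiled {functor_expr}>"  # compile once
    # the extra input (e.g. a TES location) is applied at call time
    return (cache[functor_expr], extra_input)

branches = {
    "ANY_RESULT1": ("F.ANYFUNCTOR @ F.APPLY_TES", "/Event/LOCATION_1"),
    "ANY_RESULT2": ("F.ANYFUNCTOR @ F.APPLY_TES", "/Event/LOCATION_2"),
}
registered = {name: register(spec) for name, spec in branches.items()}

assert len(cache) == 1  # one compiled functor serves both branches
```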
For more complex situations:

```python
# case 1:
# traditionally
A = FUNC.bind(TES(<location>), FORWARDARGS)
# then in cpp we call A(LHCb::Particle)

# newly
B = FUNC.bind(APPLY_TES, APPLY(FORWARDARGS))
# then in cpp we call B(<location>)(LHCb::Particle)

# case 2:
# traditionally
C = FUNC.bind(TES(<location2>), FUNC.bind(TES(<location1>), FORWARDARGS))
# then in cpp we call C(LHCb::Particle)

# newly
D = FUNC.bind(APPLY_TES, APPLY(FUNC1.bind(APPLY_TES, APPLY(FORWARDARGS))))
# then in cpp we call D(<location1>)(<location2>)(LHCb::Particle)
```
Here `APPLY(<AnyFunctor>)` is a functor that receives anything as input and returns `<AnyFunctor>`.
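These mechanics can be sketched in plain Python with closures (all names below are hypothetical stand-ins mirroring the pseudo-code above, not the real Functors API):

```python
# Stand-in event store for the sketch
FAKE_TES = {"/Event/Loc1": 10.0, "/Event/Loc2": 3.0}

def FORWARDARGS(x):
    """Identity functor: forwards its input unchanged."""
    return x

def TES(location):
    """Traditional style: the location is baked in when the functor is built."""
    return lambda _input: FAKE_TES[location]

def APPLY_TES(location):
    """Proposed style: receives a location at call time, returns a TES reader."""
    return lambda _input: FAKE_TES[location]

def APPLY(functor):
    """Receives anything as input and returns `functor` unchanged."""
    return lambda _anything: functor

def FUNC(tes_reader, inner):
    """Toy composite: multiply the TES value by the inner functor's result."""
    return lambda particle: tes_reader(particle) * inner(particle)

def bind(func, *arg_functors):
    """bind(F, g1, g2, ...): on input x, evaluate F(g1(x), g2(x), ...)."""
    return lambda x: func(*(g(x) for g in arg_functors))

# traditionally: the location is part of A, so each location needs its own A
A = FUNC(TES("/Event/Loc1"), FORWARDARGS)
assert A(2.0) == 20.0                 # A(particle)

# newly: B is a single object; the location becomes a call-time argument
B = bind(FUNC, APPLY_TES, APPLY(FORWARDARGS))
assert B("/Event/Loc1")(2.0) == 20.0  # B(location)(particle)
assert B("/Event/Loc2")(2.0) == 6.0   # same B, different location
```

Case 2 follows by nesting the same pattern, so each additional location is consumed by one more level of currying.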
Using these types of functors that return a new functor, we should be able to use any extra inputs to configure very complex composite functors. We can use Python classes to hide these mechanics from users, for example by automatically handling and replacing `F.TES` with `F.APPLY_TES`.
I am opening this issue to discuss these new ideas. Please note that such implementations can be designed to affect FunTuple exclusively, leaving the HLT lines untouched, which makes this a topic for DPA discussion.
FYI: @amathad @erodrigu @ahennequ
Please notify more individuals who might be interested in this discussion.