Speed up qm test by 10 minutes

assigned to @sstahl

assigned to @sponce and unassigned @sstahl

I wonder how much time we could gain by improving the validator of the tests...

changed the description

added RTA label

approved this merge request

/ci-test

Started integration test build. Once done, check the results or the comparison to a reference build.

Throughput Test Moore_hlt1_pp_default: 28392.2 Events/s -- change of -0.20% vs. reference

Throughput Test Moore_hlt1_pp_default: 28479.3 Events/s -- change of 0.10% vs. reference

Throughput Test Moore_hlt2_fastest_reco: 491.9 Events/s -- change of 0.31% vs. reference

Throughput Test Moore_hlt2_fastest_reco: 490.9 Events/s -- change of 0.12% vs. reference

Throughput Test Moore_hlt2_pp_thor: 278.8 Events/s -- change of 0.09% vs. reference

As expected, execution time of the test went from 1462 to 103 seconds.

Worth it indeed ! For info after discussion with @clemenci we have a nice idea to fix that everywhere : the main point is that the time is not spent in the validation but actually in the diffing with the reference. So we propose a simple approach : replace at the validation level every line of more than 200 chars with a line of type : "Very long line, original content has been removed. Original checksum : 0x...". This way the diff will be fast and still fail when content has changed.

@sponce, do you mean that the actual ref file will contain "Very long line..." instead of the original content?

Actually this test does not do a reference check, so for tests where only error lines are counted we might need a different solution.

That solution will work too actually. Maybe I was not really clear I must say, by "validation", I meant preprocessing, before counting the errors in this case.

And to answer @chasse, yes the ref would have "Very long..." as we preprocess the refs. So indeed we would lose the original content. Now it it's a problem, people can just have more decent line size and problem will be sorted out

@sponce I don't think replacing output in a ref with a checksum is a good idea, it makes it impossible to know which value changed.
Dropping a detailed diff in favor of just saying there is a diff somewhere in that long line, is fine, but let's not replace the output.

Wait a second, we do not drop anything in case people behave properly ! We only protect the system against abnormal cases where lines are insanely long. In these cases, the drop of the "detailed diff" is actually an incentive to go back to decent logs. We can always discuss on the line length limitation of course.

Just to give ideas of what I call "insane". In the case of this MR, the 4 longest lines were of length 4095, 4095, 2143 and 941 characters. So a limit at 500 chars is probably fine, although such lines are already non sense.

If you overwrite the content with a checksum, and something changes, how is one then supposed to debug this? IMHO this really is an impossible situation to put user into.

Wait a second, we do not drop anything in case people behave properly

It's not helpful to just point at someone and say "well their own fault". While 4k chars is long, and probably not very useful, it's a bit crazy that is snowballs into crazy validator times. So, this should somehow be fixed on the internal side, instead of just blaming the user.

IMHO this really is an impossible situation to put user into.

In my opinion, the user has put her/himself in this situation by not respecting the (not so well defined) contract of the logs. Coming back to this in a second

this should somehow be fixed on the internal side

Is there anything to be fixed ? Other than refusing bad behaviors ?

Now I have to come back to the (badly defined) contract. To me, text log is by definition supposed to be handled by a human being, and that's what I call the "contract". Otherwise we would use binary logs, that's what computers understand. The fact that logs are handled by humans is actually proven by the very existence of diffs. Now this "contract" is, I believe, where we start disagreeing. Otherwise, human limitations make lines of 4K non sense and thus my position kind of obvious.

So what's your position here ? Should logs be machine targeted ? But if yes, does a diff make sense at all ? I'm not sure I understand how humans are supposed to deal with a amount of information they cannot ingest.

Last comment : if your answer will be "the line may be long, but the diff will be small, as only a portion will change", then great, we do not have a problem anymore ! Let's call it a single long line if you want, but let's make sure there are regularly '\n' characters in it ;-)

not respecting the (not so well defined) contract of the logs.

There was never any indication to users that a line length limit exists or should be respected.

And again:

If you overwrite the content with a checksum, and something changes, how is one then supposed to debug this?

You can't just replace the content and thus break the whole point of comparing to actual values.

There was never any indication to users that a line length limit exists or should be respected.

Indeed, but when people push it too much, it's slow. It works but it's slow

So that I can understand why it's so important to have such very very very long lines, could you point me to an example where they make sense and are needed ?

My 2cts on this subject.

Fact: the algorithm we are using to compare stdout with reference is extremely slow when there are very long lines (we never noticed the problem until we started to have extremely long lines)

Changing the algorithm as it was suggested some time ago is not practical so we have to solve test validation slowness in another way.

What seems more practical is to preprocess the output to reduce the length of those lines, for example by truncating them (the use of a hash allows us to detect changes in the dropped part). Indeed you are blind to the actual diff, but you can (yes, with a bit of pain) find out what the real change is... and if the test is actually meant to compare the value of the long line, you should stop abusing stdout and write your data to a file, then you use the custom validator feature to read the file back and check that it makes sense.

Another way to shorten the lines is to inject artificial line breaks, but I believe it does not fit that well the API of the stdout preprocessor (it can modify lines but not emit multiple lines from a single line of input IIRC). Even if it was practical to go in this direction it still gives the wrong message that comparing stdout with a reference file is a good way to write tests (it is an acceptable way to write tests when you do not have better options, but we do have better options).

Said that, as @sstahl pointed out, this test only checks if there are warnings or errors, so an output level < WARNING is not useful to the test (yes, it helps only the person trying to understand the problem)... but most important the diff step is not run, so the slowness must come from somewhere else that I'll have to debug at some point.

I'm already digging into it, the slowdown comes from applying the regex of the lineskipper to very long lines. Currently checking if some of those problematic regexes can be improved/removed.

Thanks!

checking if some of those problematic regexes can be improved/removed

we could also decide not to apply the regex to long lines... but we will still have issues trying to generate the minimal diff.

I stick to my opinion that long lines do not belong in the stdout diff.

fwiw I agree with @chasse approach to this problem & his above opinions

I would add a note of caution against mixing technical questions with process management questions. Technically long lines in the stdout are nonsensical, however as @chasse crucially points out "There was never any indication to users that a line length limit exists or should be respected." so we cannot very well blame people for sometimes generating them. If we think they should not be allowed then the proper way forward is to give clearer instructions to developers and at the same time to set up additional tests which would catch such lines. Which of course comes with maintenance overhead, as always. Simply turning them into a hash is not in my view a desirable process, whatever its technical benefits.

Criticizing is easy, art is difficult. In other terms, what do you propose practically ?

@chasse has made a nice job in dropping useless regexps, and @sstahl has fixed the most problematic test by removing the unused long needs, so the problem is less critical right now. But nevertheless it may come back.

Practically I think you can choose to either treat these problems as they arise or if you believe this is enough of a maintainer quality of life problem you give clear guidelines to developers that long lines are forbidden in reference files and add a test to check that nobody commits reference files with such long lines. Or if this is now mostly a solved problem you can open an issue to look into a more long-term solution once we have fewer immediate burning fires to put out with respect to datataking. Since you've now been the on-duty maintainer for a while you're clearly best-placed to judge how real of a problem in day-to-day work this actually is.

Rate test Moore_spruce_rate_and_size: 21161.0 Hz - change of 0.00% vs. reference. No new line is added, 32 lines have a rate of 0 Hz and 33 have a rate larger than 500 Hz.

I'm tempted to open an issue indeed. In theory, it's the right thing to do.

In practice, my experience says that if you want to forget something, issues are THE way to go. It's ideal to have your conscience released and be certain at the same time that nothing will ever be done (in the sense done thanks to the issue, not done at all).

So I've opened an issue (LHCb#238), but also wrote it down in my ToDo list. And when I'll do it, I won't remember the issue anymore and leave it open forever, as usual :-)

It's not the destination, it's the issues you acquire along the way

Rate test Moore_hlt2_rate_and_size: 308780.0 Hz - change of 0.00% vs. reference. No new line is added, 353 lines have a rate of 0 Hz and 88 have a rate larger than 1000 Hz.

added ci-test-triggered label

[2022-07-14 17:54] Validation started with lhcb-master-mr#5055

resolved all threads

merged

mentioned in commit 058b6692

resolved all threads

mentioned in issue #456 (closed)

mentioned in issue LHCb#252 (closed)

Speed up qm test by 10 minutes

Merged by Sebastien Ponce 2 years ago (Jul 15, 2022 6:36am UTC) 2 years ago

Activity

Speed up qm test by 10 minutes

Merge request reports

Merged by Sebastien Ponce 2 years ago (Jul 15, 2022 6:36am UTC) 2 years ago

Activity