Preprocessing plots - set default to plotting the whole sample
The number of jets that are plotted during the preprocessing can be set in the preprocessing config like here.
However, I think we should set the default to "all jets plotted", since plotting only a fraction of the sample can be misleading:
When I ran a preprocessing with PDF sampling, targeting the b-jets pT distribution, the resulting plot for njets_to_plot = 3e5 looks like this:
There are two problems here:
- We only see b-jets and bb-jets; light-flavour jets and c-jets are missing. The reason is that the first 300_000 jets in the sample are all either b-jets or bb-jets. (Fixed with !558 (merged).)
- The pT distributions do not match, even though they should (since this is after resampling). The reason is again the rather coarse shuffling that has happened at this point, so the first N jets of the sample are not representative.
Plotting the whole file resolves both problems (I haven't tried with the scripts from the preprocessing, but I expect the outcome to be the same).
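To illustrate why the first N jets can be unrepresentative, here is a toy sketch in plain Python. It is not the preprocessing code: the function names, flavour labels, and block sizes are all hypothetical, and the "coarse shuffling" is modelled simply as shuffling large single-flavour blocks rather than individual jets.

```python
import random
from collections import Counter


def make_coarsely_shuffled_sample(n_per_class, block_size, seed=0):
    """Toy sample (assumption): labels are stored in large single-flavour
    blocks and only the order of the blocks is shuffled."""
    rng = random.Random(seed)
    flavours = ["b", "c", "light"]  # hypothetical labels
    blocks = []
    for flavour in flavours:
        for _ in range(n_per_class // block_size):
            blocks.append([flavour] * block_size)
    rng.shuffle(blocks)  # shuffle blocks, not individual jets
    return [label for block in blocks for label in block]


def class_fractions(labels):
    """Fraction of each flavour in the given list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {flavour: n / total for flavour, n in sorted(counts.items())}


sample = make_coarsely_shuffled_sample(n_per_class=30_000, block_size=10_000)
n_plot = 10_000

first_n = sample[:n_plot]                           # "plot the first N jets"
random_n = random.Random(1).sample(sample, n_plot)  # random subset of N jets

print("whole sample:", class_fractions(sample))
print("first N     :", class_fractions(first_n))
print("random N    :", class_fractions(random_n))
```

In this toy setup the first N entries contain a single flavour, while the whole sample and a random subset of the same size both show all three flavours in roughly equal proportions. If plotting everything turns out to be too slow for large files, drawing the plotted jets at random would be one possible alternative to taking the first N.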