Implement tool for managing samples
For running at full scale, we will need tools for running over different samples.
Starting from the user's point of view, this would ideally be as simple as specifying a list of strings, e.g.
sample_list = ["ttH_M125", "tHq_M125", "Data"]
and then giving this to a class SampleManager that creates a Sample instance for each:
from higgs_dna.samples.sample_manager import SampleManager
sample_manager = SampleManager(
    samples=sample_list,
    years=["2016", "2017", "2018"],
)
samples = sample_manager.produce()
where samples is a list of Sample objects that would contain:
Sample.files[year] # list of nanoAOD files for specified year
Sample.xs # cross section in pb
Sample.scale1fb[year] # per-year normalization factor
...
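A minimal sketch of what such a Sample container could look like, assuming a plain dataclass. The field names follow the attributes listed above; the `name` and `systematics` fields, the use of `None` as the cross section for data, and all values in the usage lines are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sample:
    """Hypothetical container for one sample (a sketch, not the final design)."""

    name: str
    # Cross section in pb; None is used here for data (an assumption).
    xs: Optional[float] = None
    # year -> list of nanoAOD files
    files: Dict[str, List[str]] = field(default_factory=dict)
    # year -> normalization factor
    scale1fb: Dict[str, float] = field(default_factory=dict)
    # sample-specific systematics, keyed by systematic name
    systematics: Dict[str, dict] = field(default_factory=dict)


# Placeholder values only: 1.0 is not a real cross section.
sample = Sample(
    name="ttH_M125",
    xs=1.0,
    files={"2017": ["file1.root", "file2.root"]},
)
print(sample.files["2017"])
```

Using `default_factory` gives every instance its own dictionaries, so filling in the files for one sample cannot leak into another.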
A Sample object should also be able to specify any systematics, reweightings, etc. that should be applied only to that sample.
In practice, we could deal with this by creating a json file with metadata about each sample, similar to the [cross sections json in flashgg](https://github.com/cms-analysis/flashgg/blob/dev_legacy_runII/MetaData/data/cross_sections.json).
For a given sample, we might have an entry like:
"ttH_M125" : {
    "xs" : XX, # pb
    "files" : {
        "2016" : ["file1.root", "file2.root", ...], # could be hard-coded
        "2017" : "/ttH_M125/UL2017_production/NANOAODSIM" # or could provide DAS name and have a tool to look up file names
    },
    "systematics" : { # same construction as for any systematic
        "tth_specific_theory_unc" : {
            "type" : "event",
            "method" : "from_branch",
            ...
        }
    }
}
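As a rough sketch of how SampleManager.produce() could turn such an entry into Sample objects: the version below reads the metadata from an already-parsed dictionary and treats a string under "files" as a dataset name to be resolved. The `metadata` and `lookup_dataset` constructor arguments, the `resolve_files` helper, the dummy cross section of 1.0, and the fake lookup function are all assumptions made for this example; the real file lookup (e.g. via DAS) is deliberately left as a user-supplied callable.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Union

# Dummy metadata in strict JSON; 1.0 is a placeholder, not a real cross section.
METADATA_JSON = """
{
    "ttH_M125": {
        "xs": 1.0,
        "files": {
            "2016": ["file1.root", "file2.root"],
            "2017": "/ttH_M125/UL2017_production/NANOAODSIM"
        },
        "systematics": {
            "tth_specific_theory_unc": {"type": "event", "method": "from_branch"}
        }
    }
}
"""


@dataclass
class Sample:
    name: str
    xs: Optional[float]
    files: Dict[str, List[str]] = field(default_factory=dict)
    systematics: Dict[str, dict] = field(default_factory=dict)


def resolve_files(entry: Union[str, List[str]],
                  lookup_dataset: Callable[[str], List[str]]) -> List[str]:
    """Return a file list; a string is treated as a dataset name to look up."""
    if isinstance(entry, str):
        return lookup_dataset(entry)
    return list(entry)


class SampleManager:
    def __init__(self, samples, years, metadata, lookup_dataset):
        self.samples = samples
        self.years = years
        self.metadata = metadata              # hypothetical: parsed json contents
        self.lookup_dataset = lookup_dataset  # hypothetical: dataset name -> files

    def produce(self) -> List[Sample]:
        produced = []
        for name in self.samples:
            if name not in self.metadata:
                raise KeyError(f"No metadata entry for sample '{name}'")
            info = self.metadata[name]
            # Keep only the requested years that have an entry for this sample.
            files = {
                year: resolve_files(info["files"][year], self.lookup_dataset)
                for year in self.years
                if year in info.get("files", {})
            }
            produced.append(Sample(
                name=name,
                xs=info.get("xs"),
                files=files,
                systematics=info.get("systematics", {}),
            ))
        return produced


def fake_lookup(dataset: str) -> List[str]:
    """Stand-in for a real dataset lookup tool; returns a made-up file name."""
    return [f"{dataset}/placeholder.root"]


sample_manager = SampleManager(
    samples=["ttH_M125"],
    years=["2016", "2017", "2018"],
    metadata=json.loads(METADATA_JSON),
    lookup_dataset=fake_lookup,
)
samples = sample_manager.produce()
```

Passing the lookup in as a callable keeps the manager testable without network access, and years with no entry (2018 here) are simply skipped rather than raising.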
Then, the list of Sample objects can be given to a JobsManager or similar, which will take care of setting up the sample-specific options (e.g. adding a sample-specific theory weight/unc to the SystematicsProducer) and splitting these up into jobs.
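One possible shape for the job-splitting step, sketched under the assumption that each job is a fixed-size batch of files from a single sample and year, and that sample-specific systematics are merged on top of a common set. The `make_jobs` and `chunk` functions, the `files_per_job` parameter, the job dictionary layout, and the example systematic names are all hypothetical; samples are plain dictionaries here only to keep the example self-contained.

```python
from typing import Dict, Iterator, List, Optional


def chunk(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def make_jobs(samples: List[dict],
              files_per_job: int = 2,
              common_systematics: Optional[Dict[str, dict]] = None) -> List[dict]:
    """Build one job configuration per batch of files, per sample and year."""
    jobs = []
    for sample in samples:
        # Sample-specific systematics are added to (and override) the common ones.
        systematics = dict(common_systematics or {})
        systematics.update(sample.get("systematics", {}))
        for year, files in sample["files"].items():
            for index, batch in enumerate(chunk(files, files_per_job)):
                jobs.append({
                    "sample": sample["name"],
                    "year": year,
                    "job_id": index,
                    "files": batch,
                    "systematics": dict(systematics),
                })
    return jobs


# Placeholder inputs for illustration only.
example_samples = [{
    "name": "ttH_M125",
    "files": {"2016": ["file1.root", "file2.root", "file3.root"]},
    "systematics": {"tth_specific_theory_unc": {"type": "event", "method": "from_branch"}},
}]
jobs = make_jobs(
    example_samples,
    files_per_job=2,
    common_systematics={"common_unc": {"type": "event"}},
)
for job in jobs:
    print(job["sample"], job["year"], job["job_id"], job["files"])
```

Each job carries its own copy of the merged systematics, so a job can be serialized and run independently of the others.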