Implement an analysis manager class
Running an analysis consists of specifying:
- a list of samples
- a list of systematics to consider (which may vary by sample)
- a tag sequence
- details of the output files: variables of interest (`m_gg`, `m_jj`, etc.) and output format (e.g. `parquet`)
- job submission details
I think it would be nice to have an `Analysis` (or similarly named) class, which is constructed from the 5 inputs above and runs an entire analysis via `Analysis.run()`.
For example, we might have:
```python
sample_list = ["ttH_M125", "tHq_M125", "Data"]
years = ["2016", "2017", "2018"]

samples = SamplesManager(
    sample_list = sample_list,
    years = years
)

syst_options = {
    "weights" : {
        "dummy_theory_sf" : {
            ...
        }
    },
    "independent_collections" : {
        ...
    }
}

tag_sequence = TagSequence(
    tag_list = [
        diphoton_tagger,
        [tth_tagger, thq_tagger]
    ]
)

jobs_manager = JobsManager(
    batch = "local",
    n_events_per_output = 10**6
)

analysis = Analysis(
    samples = samples,
    systematics = syst_options,
    tag_sequence = tag_sequence,
    variables_of_interest = ["m_gg", "m_jj"],
    output_format = "parquet",
    jobs_manager = jobs_manager
)
analysis.run()
```
where `analysis.run()` would do the following (see the orchestration sketch after this list):
- Go through each `Sample` in `samples` and:
    - construct the function to run the systematics + tag sequence (we may have e.g. different systematics for different samples)
    - add jobs to the `jobs_manager` for each `Sample`, taking into account the specific function for this sample
- Submit jobs through the `JobsManager`
- Monitor jobs and record their metadata. At a very basic level, this would simply be checking whether the job succeeded. If a job succeeds, it would also be useful to record physics information about this job: how many events were processed, what the efficiency of each `Tagger`'s selections is (and perhaps the efficiency of each cut of each `Selection` of each `Tagger`), and summary information about the systematics: the mean/std dev of each systematic, etc. A possible metadata layout is sketched below.
- Post-process: once a large enough fraction of jobs has finished (we need 100% for data, but not strictly necessary for MC), merge outputs and update `scale1fb` according to the processed number of events for each sample.
- Summarize: print out summary info and write a `json` with high-level info. This would entail properly merging the metadata returned by each job (see the merging sketch at the end).
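To make the orchestration concrete, here is a minimal sketch of what `Analysis.run()` could look like. Everything beyond the constructor arguments shown above is an assumption: `get_samples`, `add_job`, `submit`, and `monitor` are hypothetical `SamplesManager`/`JobsManager` methods, and `run_systematics_and_tags` is a stand-in for the real per-event worker.

```python
import functools

def run_systematics_and_tags(events, systematics, tag_sequence, variables):
    # Hypothetical worker: apply the systematics, run the tag sequence,
    # and return the variables of interest for the selected events.
    ...

class Analysis:
    def __init__(self, samples, systematics, tag_sequence,
                 variables_of_interest, output_format, jobs_manager):
        self.samples = samples
        self.systematics = systematics
        self.tag_sequence = tag_sequence
        self.variables_of_interest = variables_of_interest
        self.output_format = output_format
        self.jobs_manager = jobs_manager

    def systematics_for(self, sample):
        # Hypothetical hook: restrict the systematics to those that apply
        # to this sample (e.g. no theory weights for data).
        return self.systematics

    def run(self):
        # 1. Per sample: build the specific processing function and queue jobs.
        for sample in self.samples.get_samples():
            process_fn = functools.partial(
                run_systematics_and_tags,
                systematics = self.systematics_for(sample),
                tag_sequence = self.tag_sequence,
                variables = self.variables_of_interest,
            )
            self.jobs_manager.add_job(sample, function = process_fn)

        # 2. Submit jobs through the JobsManager.
        self.jobs_manager.submit()

        # 3. Monitor: block until jobs finish; collect per-job metadata.
        job_metadata = self.jobs_manager.monitor()

        # 4./5. Post-process and summarize; possible implementations
        #        are sketched further below.
        self.post_process(job_metadata)
        self.summarize(job_metadata)
```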
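For the monitoring step, the metadata a successful job hands back might look like the following; both the structure and the numbers are purely illustrative dummy values, not an existing format.

```python
# Hypothetical shape of the metadata record returned by one job.
job_metadata = {
    "sample"         : "ttH_M125",
    "year"           : "2017",
    "succeeded"      : True,
    "n_events"       : 987654,     # events processed by this job (dummy value)
    "tag_efficiency" : {           # per-Tagger selection efficiency
        "diphoton_tagger" : 0.42,
        "tth_tagger"      : 0.08,
    },
    "cut_efficiency" : {           # per-cut breakdown of each Selection
        "diphoton_tagger" : {"pt_cut" : 0.75, "id_cut" : 0.61},
    },
    "systematics"    : {           # mean/std dev of each systematic
        "dummy_theory_sf" : {"mean" : 1.01, "std" : 0.05},
    },
}
```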
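Finally, a rough sketch of the post-process/summarize logic. The `scale1fb` update assumes the common convention that `scale1fb` is inversely proportional to the number of events in the sample, so processing only part of an MC sample rescales it by `n_total / n_processed`; the merging iterates over the hypothetical metadata records sketched above.

```python
import json

def update_scale1fb(scale1fb, n_total, n_processed):
    # If only a subset of an MC sample was processed, rescale so that
    # weighted yields still correspond to 1 fb^-1 of data.
    return scale1fb * n_total / n_processed

def summarize(job_metadata, output_path = "summary.json"):
    # Merge per-job metadata into per-sample totals (sketch), then
    # write the high-level summary json.
    summary = {}
    for md in job_metadata:
        if not md["succeeded"]:
            continue
        entry = summary.setdefault(md["sample"], {"n_events" : 0, "n_jobs" : 0})
        entry["n_events"] += md["n_events"]
        entry["n_jobs"]   += 1

    with open(output_path, "w") as f:
        json.dump(summary, f, indent = 2)
    return summary
```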