Commit 3e7c1e4e authored by Guido Sterbini

Major tuning
# %%
"""
### Introduction
See https://codimd.web.cern.ch/p/0QX9ebi1bn#/ for the latest version.
Our community is often confronted with the need to run complex algorithms for a set of different inputs.
E.g. a DA computation with tune scan + beam-beam + errors.
This implies staging the algorithm in different steps, sometimes corresponding to different codes (MADX, SixTrack, ...) and/or different hardware (local CPU, GPU, HTCondor/LSF clusters, BOINC, ...).
The topic of this brainstorming is to discuss a python package that could provide a **standard** approach in order to
- avoid re-inventing the wheel each time,
- improve the way we share our work-flow for the different simulations,
- provide a standard way to babysit the simulations and post-process the output.
Clearly the package can be integrated with other solutions (see next [presentation]()).
The challenge here is to maintain a good balance between simplicity (to be user-friendly) and flexibility (to cover a large gamut of use cases).
You can find a proposal at https://gitlab.cern.ch/abpcomputing/sandbox/tree_maker.
We will first present its rationale (a bit abstract, 5 min) and then explore together a simple example (pragmatic and complementary to the first part, 15 min).
### Rationale
The general way to describe our problem (running a staged algorithm for a set of different inputs) is to associate a **job** with each stage and input.
A job can be represented as a **node** in a **graph** (nodes connected with edges).
The main idea is to downscale the problem from a generic graph to a simpler one, a **tree**.
A **tree** is a simplified [**DAG**](https://en.wikipedia.org/wiki/Directed_acyclic_graph) (Directed Acyclic Graph) where each node has at most one parent.
The tree is convenient since it can be directly mapped onto a file system (the folder structure of a file system is a tree).
In python a tree can be represented, for example, with the `anytree` package (see [000_example](https://gitlab.cern.ch/abpcomputing/sandbox/tree_maker/-/blob/master/examples/000_example/000.ipynb)).
The `AnyNode` object of the `anytree` package can be generalized to any class.
Indeed we generalized it to our `NodeJob` class, which inherits all the methods/attributes of `AnyNode`, e.g. root, parent, children, ancestors, siblings, leaves, depth, height, searching/filtering methods...
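For reference, a minimal `anytree` sketch (plain `AnyNode`s, before any `tree_maker` specialization; the `id` field is illustrative):
```
from anytree import AnyNode, RenderTree

root = AnyNode(id='root')
first = AnyNode(id='000', parent=root)
second = AnyNode(id='001', parent=root)
leaf = AnyNode(id='000', parent=first)

# print the tree structure
for pre, _, node in RenderTree(root):
    print(f'{pre}{node.id}')
```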
The main idea is that each node of our simulation tree
1. is an instance of `NodeJob` (extending the `anytree` node),
2. refers to a **template node** (e.g., a MadX mask): `NodeJob.template_path`,
3. has a specific dictionary of inputs: `NodeJob.dictionary`,
4. is mapped to a file system: `NodeJob.path`,
5. has a specific submit command: `NodeJob.submit_command`,
6. has a specific log file: `NodeJob.log_file`.
The users should spend 99% of their time on the physics (the templates; each template is well "isolated" for a deep understanding of its physics) and use the package to build/orchestrate the tree.
#### Building of the tree
The building of the tree is done in three steps:
- instantiating the nodes,
- **cloning** (i.e. copying) the templates to `NodeJob.path`,
- **mutating** (i.e. changing) the input of the template with the info in `NodeJob.dictionary`.
#### Orchestrating the tree
Each node can be run (via `NodeJob.submit_command`) and logged (to `NodeJob.log_file`).
One can orchestrate the simulation by writing to and reading from the different logs.
In this way we factorize the physics (the template), the parameters (the dictionary) and the folders (`NodeJob.path`), while maintaining the very same interface (`NodeJob`) for all nodes.
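A minimal sketch of such an orchestration loop (the `submitted` tag and the loop itself are illustrative; `has_been`, `tag_as` and `submit` are the `NodeJob` methods used later in this document):
```
# poll the tree and submit every node whose parent has finished
for node in root.descendants:
    if node.parent.has_been('completed') and not node.has_been('submitted'):
        node.tag_as('submitted')
        node.submit()
```
We will now walk through a simple example to clarify all these ingredients.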
### Simple example ([001_example](https://gitlab.cern.ch/abpcomputing/sandbox/tree_maker/-/blob/master/examples/001_example/001.ipynb))
Let us assume that we need to perform this computation
$\sqrt{|(a+b)\times c|}$
and we want to compute the standard deviation of the result, assuming that a, b and c are normally distributed independent variables. Clearly the problem is quite naive, but we want to address it as if we needed a cluster to solve it.
For example, we can partition the problem in three consecutive stages:
1. A sum: $(a+b)$
2. A multiplication of the result 1 with c: $(a+b)\times c$
3. A sqrt of the result of 2: $\sqrt{|(a+b)\times c|}$
For each stage we build a template.
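A plain-python sketch of the three stages (the function names mirror the template folders `sum_it`, `multiply_it` and `square_root_it` used below):
```
import numpy as np

def sum_it(a, b):
    # stage 1: the sum
    return a + b

def multiply_it(x, c):
    # stage 2: multiply the result of stage 1 by c
    return x * c

def square_root_it(y):
    # stage 3: square root of the absolute value
    return np.sqrt(np.abs(y))

# e.g. square_root_it(multiply_it(sum_it(1, -3), 2)) == 2.0
```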
Documentation (only started; you need to be on the CERN GPN) can be found at https://acc-py.web.cern.ch/gitlab/abpcomputing/sandbox/tree_maker/docs/master/.
"""
# %%
import tree_maker
from tree_maker import NodeJob
# %%
# Clearly for this easy task one can do everything in the very same python kernel
# BUT here we want to mimic the typical flow
# 1. MADX for optics matching/error seeding
# 2. Tracking for FMA and or DA studies
# 3. simulation baby-sitting and
# 4. postprocessing
import numpy as np
a=np.random.randn(4)
b=np.random.randn(4)
c=np.random.randn(2)
my_list_original = []
for ii in c:
    my_list_original += list(np.sqrt(np.abs((a+b)*ii)))
my_list_original = sorted(my_list_original)
# %%
"""
#### The root of the tree
"""
# %%
#root
root = NodeJob(name='root', parent=None)
root.path = '/home/jovyan/local_host_home/CERNBox/2021/tree_maker/examples/001_example/study_000'
root.template_path = root.path + '/../templates'
root.log_file = root.path + "/log.yaml"
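# NB: the paths above are absolute and specific to the author's environment;
# adapt root.path to your setup (the second example below builds it from os.getcwd()).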
# %%
"""
#### First generation of nodes
"""
# %%
# first generation
for node in root.root.generation(0):
    node.children = [NodeJob(name=f"{child:03}",
                             parent=node,
                             path=f"{node.path}/{child:03}",
                             template_path=root.template_path + '/sum_it',
                             submit_command='python run.py',
                             log_file=f"{node.path}/{child:03}/log.yaml",
                             dictionary={'a': float(a[child]),
                                         'b': float(b[child])})
                     for child in range(len(a))]
# To combine different lists one can use the product or the zip functions
#import itertools
#[[i, j, z] for i, j, z in itertools.product(['a','b'],['c','d'],[1,2,3])]
#[[i, j, z] for i, j, z in zip(['a','b'],['c','d'],[1,2,3])]
root.print_it()
# %%
"""
#### Second generation of nodes
"""
# %%
# second generation
for node in root.root.generation(1):
    node.children = [NodeJob(name=f"{child:03}",
                             parent=node,
                             path=f"{node.path}/{child:03}",
                             template_path=root.template_path + '/multiply_it',
                             submit_command='python run.py',
                             log_file=f"{node.path}/{child:03}/log.yaml",
                             dictionary={'c': float(c[child])})
                     for child in range(len(c))]
root.print_it()
# %%
"""
#### Third generation of nodes
"""
# %%
# third generation
for node in root.root.generation(2):
    node.children = [NodeJob(name=f"{child:03}",
                             parent=node,
                             path=f"{node.path}/{child:03}",
                             template_path=root.template_path + '/square_root_it',
                             submit_command='python run.py',
                             log_file=f"{node.path}/{child:03}/log.yaml",
                             dictionary={'log_file': f"{node.path}/{child:03}/log.yaml"})
                     for child in range(1)]
root.print_it()
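# %%
"""
The tree now has 4 first-generation nodes (one per element of `a` and `b`), each with 2 children (one per element of `c`) and a single `square_root_it` grandchild: 4 x 2 x 1 = 8 leaves, one per element of `my_list_original`.
"""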
# %%
# we can inspect the data structure
root.children[3].children[1].children[0].submit_command
# %%
# or we can modify the attributes of the tree
if False:
    for i, node in enumerate(root.leaves):
        if i > 3:
            print(i)
            node.submit_command = 'condor_submit run.sub -batch-name square_root'
# %%
# we can dump the tree information to a yaml file for the orchestration later
root.to_yaml()
# %%
"""
### Cloning the templates of the nodes
From python objects we move the nodes to the file-system.
"""
# %%
# We map the pythonic tree onto a folder tree
root.clean_log()
root.rm_children_folders()
for depth in range(root.height):
    [x.clone_children() for x in root.generation(depth)]
# VERY IMPORTANT, tagging
root.tag_as('cloned')
# %%
"""
### Launching the jobs
"""
# %%
root.tag_as('launched')
for node in root.generation(1):
    node.cleanlog_mutate_submit()
# %%
for node in root.generation(2):
    node.cleanlog_mutate_submit()
# %%
for node in root.generation(3):
    node.cleanlog_mutate_submit()
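# NB: the generations are launched in order because each template reads the
# output.yaml of its parent (see the multiply_it run.py further down); the
# cron-job examples below automate this dependency handling with smart_run.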
# %%
# check if all root descendants are completed
if all([descendant.has_been('completed') for descendant in root.descendants]):
    root.tag_as('completed')
    print('All jobs are completed!')
# %%
"""
### Post-processing
"""
# %%
# retrieve the output
my_list = []
for node in root.leaves:
    output = tree_maker.from_yaml(node.path + '/output.yaml')
    my_list.append(output['result'])
# %%
# sanity check
assert np.allclose(sorted(my_list), my_list_original)
# %%
# std of the results
np.std(my_list)
# %%
"""
### Monitoring
"""
# %%
root = tree_maker.tree_from_yaml(
    '/home/jovyan/local_host_home/CERNBox/2021/tree_maker/examples/001_example/study_000/tree.yaml')
# %%
# checking the status
my_filter = lambda node: node.depth == 2 and node.has_been('completed')
for node in root.descendants:
    if my_filter(node):
        print(node.path)
# one can also use root.find(filter_=lambda node: node.depth==1 and node.has_been('completed'))
# %%
def my_test(node):
    output = tree_maker.from_yaml(node.path + '/output.yaml')
    return node.is_leaf and node.has_been('completed') and output['result'] < 1.2

for node in root.descendants:
    if my_test(node):
        print(node.path)
# %%
# or (better)
for node in root.generation(3):
    if my_test(node):
        print(node.path)
# %%
# %%
"""
The diff also updates the committed template artifacts of example 001; after the change they read:

an `output.yaml`:
```
result: 1.0
```

the `sum_it` `config.yaml`:
```
# This is my input
a: -1 # this is the first element of the sum
b: -1 # this is the second element of the sum
run_command: 'python run.py'
log_file: './log.yaml'
```

another `output.yaml`:
```
result: -2
```

and a `run.sh`:
```
#!/bin/bash
#bsub -q hpc_acc -e %J.err -o %J.out cd $PWD && ./run.sh
source /afs/cern.ch/eng/tracking-tools/python_installations/miniconda3/bin/activate
python run.py
```
"""
# %%
import tree_maker
from tree_maker import NodeJob
import time
import numpy as np
a=np.random.randn(20)
b=np.random.randn(20)
c=np.random.randn(10)
my_list_original = []
for ii in c:
    my_list_original += list((a+b)*ii)
my_list_original = sorted(my_list_original)
# %%
"""
#### The root of the tree
"""
start_time = time.time()
# %%
#root
import os
my_folder = os.getcwd()
root = NodeJob(name='root', parent=None)
root.path = my_folder + '/study_000'
root.template_path = my_folder + '/templates'
root.log_file = root.path + "/log.json"
# %%
"""
#### First generation of nodes
"""
# %%
# first generation
for node in root.root.generation(0):
    node.children = [NodeJob(name=f"{child:03}",
                             parent=node,
                             path=f"{node.path}/{child:03}",
                             template_path=root.template_path + '/sum_it',
                             submit_command=f'bsub -q hpc_acc -e %J.err -o %J.out {root.template_path}/sum_it/run.sh &',
                             log_file=f"{node.path}/{child:03}/log.json",
                             dictionary={'a': float(a[child]),
                                         'b': float(b[child])})
                     for child in range(len(a))]
# %%
"""
To combine different lists one can use the product or the zip functions
```
import itertools
[[i, j, z] for i, j, z in itertools.product(['a','b'],['c','d'],[1,2,3])]
[[i, j, z] for i, j, z in zip(['a','b'],['c','d'],[1,2,3])]
```
"""
# %%
"""
#### Second generation of nodes
"""
# %%
# second generation
for node in root.root.generation(1):
    node.children = [NodeJob(name=f"{child:03}",
                             parent=node,
                             path=f"{node.path}/{child:03}",
                             template_path=f'{root.template_path}/multiply_it',
                             submit_command=f'bsub -q hpc_acc -e %J.err -o %J.out {root.template_path}/multiply_it/run.sh &',
                             log_file=f"{node.path}/{child:03}/log.json",
                             dictionary={'c': float(c[child])})
                     for child in range(len(c))]
root.to_json()
print('Done with the tree creation.')
print("--- %s seconds ---" % (time.time() - start_time))
# %%
"""
### Cloning the templates of the nodes
From python objects we move the nodes to the file-system.
"""
# %%
# We map the pythonic tree onto a folder tree
start_time = time.time()
root.clean_log()
root.rm_children_folders()
from joblib import Parallel, delayed
for depth in range(root.height):
    # [x.clone_children() for x in root.generation(depth)]
    Parallel(n_jobs=8)(delayed(x.clone_children)() for x in root.generation(depth))
# VERY IMPORTANT, tagging
root.tag_as('cloned')
print('The tree structure is moved to the file system.')
print("--- %s seconds ---" % (time.time() - start_time))
# %%
"""
Example of a cron job
"""
# %%
import tree_maker
from tree_maker import NodeJob
# %%
try:
    root = tree_maker.tree_from_json('./study_000/tree.json')
except Exception as e:
    print(e)
    print('Probably you forgot to edit the path of your json file...')

if root.has_been('completed'):
    print('All descendants of root are completed!')
else:
    for node in root.descendants:
        node.smart_run()
    if all([descendant.has_been('completed') for descendant in root.descendants]):
        root.tag_as('completed')
        print('All descendants of root are completed!')
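# %%
"""
A hypothetical crontab entry to run this babysitting script periodically, e.g. every 5 minutes (the paths are placeholders to adapt):
```
*/5 * * * * cd /path/to/the/study && python babysit.py >> cron.log 2>&1
```
"""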
# %%
"""
Example of a cron job
"""
# %%
import tree_maker
from tree_maker import NodeJob
import pandas as pd
# %%
# Load the tree from a json
try:
    root = tree_maker.tree_from_json('./study_000/tree.json')
except Exception as e:
    print(e)
    print('Probably you forgot to edit the path of your json file...')

my_list = []
if root.has_been('completed'):
    print('All descendants of root are completed!')
    for node in root.generation(2):
        my_list.append(node.has_been('completed'))
    assert all(my_list)
    print('Sanity check passed.')
else:
    print('Complete all jobs first.')
# %%
"""
Example of a cron job
"""
# %%
import tree_maker
from tree_maker import NodeJob
import pandas as pd
import awkward as ak
# %%
# Load the tree from a json
try:
    root = tree_maker.tree_from_json('./study_000/tree.json')
except Exception as e:
    print(e)
    print('Probably you forgot to edit the path of your json file...')

my_list = []
if root.has_been('completed'):
    print('All descendants of root are completed!')
    for node in root.generation(2)[0:100]:
        my_list.append(ak.from_parquet(f'{node.path}/test.parquet',
                                       columns=['x'], row_groups=99)['x', -1])
    print(my_list)
else:
    print('Complete all jobs first.')
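# %%
"""
NB: `row_groups=99` selects the last of the 100 row groups: the `multiply_it` template (see its `run.py` further down) writes `test.parquet` with 100000 rows and `row_group_size=1000`, i.e. row groups 0 to 99.
"""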
# %%
"""
Example of a cron job
"""
# %%
import tree_maker
from tree_maker import NodeJob
import pandas as pd
import awkward as ak
import os
# %%
# Load the tree from a json
try:
    root = tree_maker.tree_from_json('./study_000/tree.json')
except Exception as e:
    print(e)
    print('Probably you forgot to edit the path of your json file...')

my_list = []
if root.has_been('completed'):
    print('All descendants of root are completed!')
    for node in root.generation(1):
        node.tag_as('postprocessing_submitted')
        node.submit_command = f'bsub -q hpc_acc {node.template_path}/postprocess.sh &'
        node.submit()
else:
    print('Complete all jobs first.')
# This is my input
parent: '../sum_it' # this is the first element of the product
c: -1 # this is the second element of the product
log_file: './log.yaml'
{
"0": {
"tag": "started",
"unix_time": 1624890907618272000,
"human_time": "2021-06-28 16:35:07.618272"
},
"1": {
"tag": "completed",
"unix_time": 1624890908593553920,
"human_time": "2021-06-28 16:35:08.593554"
},
"2": {
"tag": "started",
"unix_time": 1624890995812024064,
"human_time": "2021-06-28 16:36:35.812024"
},
"3": {
"tag": "completed",
"unix_time": 1624890995928683008,
"human_time": "2021-06-28 16:36:35.928683"
},
"4": {
"tag": "started",
"unix_time": 1624891021181616128,
"human_time": "2021-06-28 16:37:01.181616"
},
"5": {
"tag": "completed",
"unix_time": 1624891021380608000,
"human_time": "2021-06-28 16:37:01.380608"
},
"6": {
"tag": "started",
"unix_time": 1624891070778615040,
"human_time": "2021-06-28 16:37:50.778615"
},
"7": {
"tag": "completed",
"unix_time": 1624891070982253056,
"human_time": "2021-06-28 16:37:50.982253"
},
"8": {
"tag": "started",
"unix_time": 1624891074472503808,
"human_time": "2021-06-28 16:37:54.472504"
},
"9": {
"tag": "completed",
"unix_time": 1624891074613457920,
"human_time": "2021-06-28 16:37:54.613458"
}
}
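# A minimal sketch (assuming a log.json like the one above, with unix_time
# in nanoseconds) to measure the time elapsed between consecutive tags:
import json

with open('log.json') as fid:
    log = json.load(fid)

events = [log[key] for key in sorted(log, key=int)]
for before, after in zip(events, events[1:]):
    elapsed = (after['unix_time'] - before['unix_time']) / 1e9
    print(f"{before['tag']} -> {after['tag']}: {elapsed:.3f} s")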
import numpy as np
import pandas as pd
import ruamel.yaml
import tree_maker

# load the configuration
with open('config.yaml', 'r') as file:
    yaml = ruamel.yaml.YAML()
    cfg = yaml.load(file)

# load the output of the parent node
with open(cfg['parent'] + '/output.yaml', 'r') as file:
    yaml = ruamel.yaml.YAML()
    parent_out = yaml.load(file)

tree_maker.tag_json.tag_it(cfg['log_file'], 'started')

# define the function (product of two numbers)
def my_function(my_x, my_y):
    'Just a multiplication'
    return my_x * my_y

# run the code
result = my_function(parent_out['result'], cfg['c'])

with open('output.yaml', 'w') as fp:
    yaml = ruamel.yaml.YAML()
    yaml.dump({'result': result}, fp)

# write some tracking-like data: 100000 rows in row groups of 1000
pd.DataFrame(np.random.randn(100000, 6),
             columns=['x', 'xp', 'y', 'yp', 'z', 'zp']).to_parquet('test.parquet', row_group_size=1000)

tree_maker.tag_json.tag_it(cfg['log_file'], 'completed')
#!/bin/bash
source /afs/cern.ch/eng/tracking-tools/python_installations/miniconda3/bin/activate
python /gpfs/gpfs/gpfs_maestro_home_new/hpc/sterbini/tree_maker/examples/002_example/templates/multiply_it/run.py
#initialdir = .
executable = run.sh
output = .output.txt
error = .err.txt
log = .log.txt
should_transfer_files = yes
when_to_transfer_output = on_exit
transfer_input_files = config.yaml, run.py
# The line below can be uncommented if necessary
#transfer_output_files = output.yaml
+JobFlavour = "espresso"
queue
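# The file above is meant to be used with the HTCondor submit command shown
# earlier, e.g.: condor_submit run.sub -batch-name square_root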
# This is my input
a: -1 # this is the first element of the sum
b: -1 # this is the second element of the sum
run_command: 'python run.py'
log_file: './log.yaml'
import glob
import awkward as ak
import numpy as np

# aggregate the children results: mean of each node's test.parquet
my_folders = sorted(glob.glob('0*'))
my_list = []
for my_folder in my_folders:
    aux = ak.from_parquet(f'{my_folder}/test.parquet')
    my_list.append(np.mean(aux))
aux = ak.Array(my_list)
ak.to_parquet(aux, './summary.parquet')
#!/bin/bash
#bsub -q hpc_acc -e %J.err -o %J.out cd $PWD && ./run.sh
source /afs/cern.ch/eng/tracking-tools/python_installations/miniconda3/bin/activate
python /gpfs/gpfs/gpfs_maestro_home_new/hpc/sterbini/tree_maker/examples/002_example/templates/sum_it/postprocess.py