add_handler_remove_hashcode
Currently, there are hash codes at the end of some directory names, making it difficult for other handlers to get the correct path. This handler, `HashRemoveHandler`, recursively obtains all the directories, removes the hash codes, and saves a new file. A hash code is identified as an underscore followed by 8 random characters at the end of the directory's name, i.e. the pattern `_(\w{8})$`. Directories that share the same name and differ only in their hash code are distinguished by appending `_variantX`, where X ∈ 1, 2, 3 ... N.
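The renaming rule above can be sketched in plain Python. This is an illustrative sketch, not the actual handler code: the function names, the collision-numbering convention, and the assumption that the first occurrence keeps the plain name are mine.

```python
import re

# A hash suffix is an underscore followed by exactly 8 word characters
# at the end of a name, e.g. "BestLongTrackChecker_a1b2c3d4".
HASH_RE = re.compile(r"_(\w{8})$")

def strip_hash(name):
    """Remove a trailing _<8-char-hash> suffix, if present.

    Caveat: the pattern matches *any* 8 trailing word characters, so a
    name such as "Long_variant2" would also be stripped ("variant2" is
    8 characters); a real handler may want a stricter pattern.
    """
    return HASH_RE.sub("", name)

def dedupe(names):
    """Strip hashes and disambiguate collisions with _variantX suffixes.

    Assumed convention: the first occurrence keeps the plain name,
    later duplicates become <name>_variant1, <name>_variant2, ...
    """
    seen = {}  # stripped name -> occurrences so far
    out = []
    for name in names:
        base = strip_hash(name)
        n = seen.get(base, 0)
        seen[base] = n + 1
        out.append(base if n == 0 else f"{base}_variant{n}")
    return out
```

Under this convention, `dedupe(["A_abcd1234", "A_efgh5678"])` gives `["A", "A_variant1"]`; a name without a hash passes through unchanged.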
Merge request reports
Activity
requested review from @msaur
assigned to @xueting
@msaur is it understood where these hashes are coming from? Is it the new histograms service in Gaudi which appends them to names/paths automatically to avoid problems when running in concurrent/multithreaded mode? Is it possible to simply configure the underlying application not to have these hashes in the final output? If not, would it not be better to have a generic handler which renames the paths, cutting the hashes out of them? In that case it could be reused by other apps, such as Gauss and Boole, instead of having something RTA-specific.
I fully agree that some general handler which would do the parsing and/or renaming would be better; one could pass a list of expected strings as input to the algorithm and then just re-use it anywhere as needed.
I still haven't managed to have a chat with Sebastien, but in fact I think we could ask him directly here. @sponce here we are dealing with hashes in the names of algorithms, which are then saved as paths in the output ROOT file. When the configuration changes, which triggers a change of the hashes, the paths change as well, and that makes it problematic to maintain the stability of our monitoring tasks.
To give a specific example: we are using PrUpgradeChecking.py, which then calls (among many others) PrChecker.cpp, which saves histograms for various track and particle categories in paths like
Track/BestLongTrackChecker_{hash}/BestLong/
So we are trying to deal with this issue, and @dpopov raised the question of whether it is possible to change the configuration so that the final file does not save hashes in the paths, or whether that is more problematic.
After a quick discussion with @dovombru at the WP3 meeting, it seems any general solution within the configuration is rather difficult and would require quite some time, so I would go forward with having a generic handler which could work for all tests in LHCbPR.
@xueting I wonder if your new handler could be modified so that the list of correct paths is always passed in by whichever handler needs to rename some paths. We would then have a generic handler able to save a new file with the correct paths, configured by those handlers that need it. @dpopov Better ideas?
Is passing a list of names necessary? From what I see at a glance, the hash is always eight symbols long and is added as a trailing suffix to the directory name, separated (from the algorithm name?) by an underscore. We can double-check in the Gaudi(?) code how the hash is calculated and whether its length is fixed or variable, but it seems rather trivial to locate and remove it, unless I am missing something.
I think it makes sense indeed to remove it. This is the solution that was adopted for the Allen online monitoring (see this comment). The only thing to be careful about is the case where one algorithm is configured twice with different conditions, both of them producing output that would be relevant for the dashboard: in this case only one of the outputs would be produced.
I am not sure what the plans are for running LHCb applications in MT mode, but if in the general case this may happen, then it is better to foresee it in the code. It does not sound complicated: if a directory with such an algorithm name already exists, then the duplicate is stored with some suffix with a counter, `_variant2`, or something like that (a counter would make sense if there may be more than one duplicate)?

Ah, ok, so it is just to distinguish between multiple copies of the same algorithm, which may have different configurations, I see. This does not change the result. However, now I start wondering: is there a way to distinguish between these algorithms by the contents of the output ROOT file alone? If they appear in random order, they will be renamed in random order, and there will be no consistency in names between different results. Do we care about this or not?
As discussed with @dpopov, we may go forward with just removing `_{hash}` from the names of folders and then saving a new file. The naming scheme looks stable, so it should be simple to create generic code which iterates over all folders and removes the `_{hash}` suffix. In case of more than one instance of a folder, it can just be renamed, e.g. as `folder_1`. @xueting Can you take a look at this?
Checking the files at `/eos/lhcb/user/x/xueting/RTA/hash_code`, for the original file I see:
Track/TrackResCheckerBestLong/Long
whereas in the modified file we have:
Track/TrackResCheckerBestLong/Long_variant2
Something similar is happening for quite a few other folders as well. This seems to be a byproduct of the required changes; when there is no hash, no change should happen at all. I guess that since for the moment we will just rewrite the original file, this could be simpler.
In the new commit, this issue has been solved. It happened because there are identically named subfolders within different parent folders. I changed the way folder names are compared, and the problem no longer occurs. You can take a look at /eos/lhcb/user/x/xueting/RTA/hash_code to see the newly modified file. Besides, I slightly modified the code: if both input_file_path and output_file_path are provided, the modified file is saved to output_file_path; if only input_file_path is provided, the modified file replaces the old file.
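The fix described here can be illustrated with a small sketch (plain Python with hypothetical function names, not the actual handler code): number collisions with a counter that is fresh for each parent directory, so identically named subfolders under *different* parents never receive a spurious `_variantX` suffix.

```python
import re

# Assumed hash pattern: underscore plus 8 word characters at the end.
HASH_RE = re.compile(r"_\w{8}$")

def rename_children(child_names):
    """Rename the subfolders of ONE parent directory.

    The collision counter lives inside the call, so it is fresh per
    parent: "Track/Long" and "Calo/Long" each keep the plain name
    "Long", while two hash-variants under the same parent become
    e.g. "X" and "X_variant1".
    """
    seen = {}  # stripped name -> occurrences so far, within this parent
    out = []
    for name in child_names:
        base = HASH_RE.sub("", name)
        n = seen.get(base, 0)
        seen[base] = n + 1
        out.append(base if n == 0 else f"{base}_variant{n}")
    return out
```

Calling this once per directory while walking the file tree is what keeps the comparison local to each parent, which is the behaviour the new commit restores.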
It seems that @xueting was proactive and already implemented changes to the efficiency handlers; however, we still need this `HashRemoveHandler` to fix the files which are used as input for the dashboard (i.e. the direct output of every job we are using). So it still needs to be added to each job we are running. Otherwise lhcb-core/LHCbPR2FE2!182 would not work and we would still have problems with showing plots on the dashboard.
mentioned in merge request !245 (closed)
- Resolved by Xueting Yang
- Resolved by Xueting Yang
- Resolved by Xueting Yang
added 1 commit
- c5c96bee - Update HashRemoveHandler.py, put the example usage below the general class description
added 1 commit
- a7ee38ad - Update PrCheckerSummaryHandler.py, rename the new file it's original name
added 1 commit
- 62c23fa3 - Update PrCheckerSummaryHandler_withoutUT.py, rename the new file to it's original name
mentioned in commit 465e676a