Draft: working EMPFlowGA_flows+loose.json (!713) · Merge requests · atlas-flavor-tagging-tools / training-dataset-dumper · GitLab

Open Ivan Oleksiyuk requested to merge ioleksiy/training-dataset-dumper:EMPFlowGA into main 8 months ago

1 unresolved thread

Description

Adding a config to dump EMPFlow jets with:

Ghost-associated tracks with the r22default selection: "tracks"
Ghost-associated tracks with the r22loose selection: "tracks_loose"
Collection of all PFlow objects: "flows"
Separate collections of charged and neutral PFlow objects: "charged" and "neutral" !?!
Tracks left after overlap removal with flows: "tracks_OR" and "tracks_OR_loose" !?!

Each collection has a limit of 80 !?! constituents (instead of the default 40)

!?! - points that are still up for discussion. The size of this dump will be huge, as we have twice the number of constituents and way too many collections. Some studies still have to be run to decide whether to remove "charged", "neutral", "tracks_OR" and "tracks_OR_loose". The "tracks" collection might be replaced with "tracks_loose" in the future.
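To put a rough number on the size concern, here is a minimal sketch that counts the per-jet constituent slots for the collections listed above. It assumes every slot costs the same, which is a simplification: the collections store different variables, so this is only an order-of-magnitude guide. The helper `constituent_slots` is hypothetical and not part of the dumper.

```python
# Rough size-scaling sketch. Assumption: every constituent slot costs the
# same, which is not true in practice (collections store different variables).

def constituent_slots(collections, max_constituents):
    """Per-jet constituent slots if every collection is filled up to the cap."""
    return len(collections) * max_constituents

# Collection names as listed in the description.
proposed = ["tracks", "tracks_loose", "flows", "charged", "neutral",
            "tracks_OR", "tracks_OR_loose"]
# Same list without the overlap-removal collections that are up for discussion.
without_or = [name for name in proposed if "_OR" not in name]

slots_proposed = constituent_slots(proposed, 80)   # 7 collections * 80
slots_reduced = constituent_slots(without_or, 40)  # 5 collections * 40

print(slots_proposed, slots_reduced, slots_proposed / slots_reduced)
# prints: 560 200 2.8
```

Under that assumption, dropping the two OR collections and returning to the default cap of 40 would cut the constituent slots per jet by a factor of about 2.8.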

The config should be used with: dump-single-btag

Review checklist:

CI Passing
Comments addressed
Source branch is up to date with target

Edited 8 months ago by Ivan Oleksiyuk

Members who can merge are allowed to add commits.

Requires 1 approval from eligible users.

Activity

Ivan Oleksiyuk changed the description 8 months ago

Samuel Van Stroud @svanstro started a thread on an old version of the diff 7 months ago

configs/EMPFlowGA_flows+loose.json 0 → 100644

	1	{
Samuel Van Stroud @svanstro · 7 months ago · Maintainer

Thanks @ioleksiy. I don't love the filename (we should probably avoid `+`). Can we call this `EMPFlow_GN3dev.json` or something?

Dan Guest @dguest · 7 months ago · Owner

To be fair, we do have one other config file that has `-` in the name, so apparently it works, but I do seem to remember the grid having some issues naming datasets with `-` in the name, so I would assume we could get issues from `+`. Maybe best to avoid those and stick with things that would be a valid variable name in Python.

Edited 7 months ago by Dan Guest

Ivan Oleksiyuk changed this line in version 2 of the diff 7 months ago

Ivan Oleksiyuk @ioleksiy · 7 months ago · Author, Contributor

Hi, I renamed the file. Sorry for waiting. I also removed the OR collections (as we discussed, OR is not needed for training) and reduced the number of constituents back to a maximum of 40 (it is a default for us for now, as I understand, although not based on any studies as far as I am aware).
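The naming rule suggested in this thread (file names that would be valid Python variable names) can be checked mechanically. The following is a minimal sketch, not part of the repository; the helper name `is_safe_config_name` is hypothetical. It relies only on the standard-library `str.isidentifier()`, which also accepts Python keywords such as `class`, so it is a slightly looser test than "valid variable name".

```python
from pathlib import Path

def is_safe_config_name(filename):
    """True if the file name without its extension is a valid Python identifier."""
    return Path(filename).stem.isidentifier()

# The original and the suggested file names from this thread.
for name in ["EMPFlowGA_flows+loose.json", "EMPFlow_GN3dev.json"]:
    print(name, is_safe_config_name(name))
# prints:
# EMPFlowGA_flows+loose.json False
# EMPFlow_GN3dev.json True
```

A check along these lines would reject both `+` and `-` in config names, which matches the concern about grid dataset naming raised above.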

Nicole Michelle Hartman approved this merge request 7 months ago

Ivan Oleksiyuk added 1 commit 7 months ago

8ada9ce7 - renamed file, removed OR, reduced to 40 constituents

Ivan Oleksiyuk reset approvals from @hartman by pushing to the branch 7 months ago

Ivan Oleksiyuk added 19 commits 7 months ago

8ada9ce7...2ee28420 - 18 commits from branch atlas-flavor-tagging-tools:main
860cc944 - Merge branch 'main' of...

Samuel Van Stroud mentioned in merge request !744 (merged) 6 months ago
