Commit fe554f31 authored by Robin Hofsaess

Merge branch 'hotfix-flowsim-gpu' into 'master'

Hotfix flowsim gpu

See merge request !1056
parents 7612f3e8 2f3867ea
# Changelog for CMS-FlowSim-GPU
## [Unreleased]
## [v0.1] Initial blank version
- Setting up naming and cleanup
- Parser is reused unchanged
# Add here any workload-specific Dockerfile instructions.
# They will be appended to the Dockerfile generated from a common template.
# Get inputs
RUN wget -q https://hep-benchmarks.web.cern.ch/hep-workloads/data/cms/flowsim/checkpoint-latest.pt -O /bmk/data/checkpoint-latest.pt; \
wget -q https://hep-benchmarks.web.cern.ch/hep-workloads/data/cms/flowsim/gen_ttbar_2M_final-001.npy -O /bmk/data/gen_ttbar_2M_final-001.npy; \
wget -q https://github.com/francesco-vaselli/FlowSim/archive/ecb8256a9eca6ff77c7850da0ab5ded724a5cae6.tar.gz -O /tmp/benchmark.tgz ;
RUN tar -xvzf /tmp/benchmark.tgz -C /bmk; rm -f /tmp/benchmark.tgz; mv /bmk/FlowSim* /bmk/FlowSim-benchmark;
RUN python3 -m pip install --upgrade setuptools
RUN python3 -m pip install wheel
RUN python3.9 -m pip install -r /bmk/FlowSim-benchmark/requirements.txt
HEPWL_BMKEXE=cms-flowsim-gpu-bmk.sh
HEPWL_BMKOPTS="-e 1 -c1 -m none -x '--device cpu --n-objects 45000 --batch-size 1000 --n-model-instances 1 ' "
HEPWL_BMKDIR=cms-flowsim-gpu
HEPWL_BMKDESCRIPTION="FlowSim as per https://github.com/francesco-vaselli/FlowSim"
HEPWL_DOCKERIMAGENAME=cms-flowsim-gpu-bmk
HEPWL_DOCKERIMAGETAG=ci-v0.1
HEPWL_CVMFSREPOS=NONE
HEPWL_EXTEND_SFT_SPEC=""
HEPWL_BMKOS="gitlab-registry.cern.ch/linuxsupport/alma9-base:latest"
HEPWL_BUILDARCH="x86_64,aarch64"
HEPWL_BMKUSEGPU=0
This benchmark replicates the inference step of the CMS FlashSim simulation framework.
An ML model is deployed to simulate an arbitrary number of reconstructed particle jets starting from a collection of generator-level jets.
Models can also be deployed sequentially to replicate the simulation of different, correlated object collections in FlashSim.
The code is taken from https://github.com/francesco-vaselli/FlowSim/tree/benchmark
The figure of merit being tracked is the throughput, measured in objects simulated per second.
A number of additional statistics related to CPU/GPU usage are also logged.
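For illustration, the overall score ends up under 'wl-scores.flowsim' in the summary JSON written by this workload's result parser; assuming the jq tool is available, it can be read out with:
  jq '."wl-scores".flowsim' <rundir>/parser_output.json
(<rundir> stands for the base working directory of a run.)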
-- Usage: --
The script accepts the following arguments (an example invocation is shown after this list):
'-n': Number of copies
'-t': Number of threads [This has no effect when running on GPU!]
['-e': Unused in this workload; the number of events is set via -x '--n-objects' instead!]
'-x': The following EXTRA_ARGS are provided:
'--mode': Run mode: cpu, gpu, full, or manual. The mode sets '--device cpu/cuda/vulkan' automatically (see below).
'--n-gpus': Number of GPUs to use [-1: use all available]. Note that in GPU-only mode, NCOPIES needs to be <= NGPUS!
'--nevents-reduced': Number of events to be processed in a CPU copy. Note: This overrides '--n-objects' for CPU WLs!
'--cpu-copies': [manual mode only] Number of CPU copies to run
'--gpu-copies': [manual mode only] Number of GPU copies to run
['--device': Defines where to run: cpu or cuda/vulkan. It is not directly configurable but is set by the mode (see above).]
'--n-objects': Number of objects to simulate [Scales runtime and GPU utilization]
'--batch-size': Batch size for inference [Scales VRAM consumption]
'--n-model-instances': Number of model instances to run [Scales mainly runtime; small influence on GPU utilization]
'--additional': String of additional, non-validated CLI arguments for the WL:
{"
--gpu-memory-limit: GPU memory limit in GB
--num-threads: Number of PyTorch threads to use [Has no effect in GPU mode!]
--monitor-interval: Resource monitoring interval in seconds
[--gpu-id: GPU device ID to use (default: 0) -> not configurable in the current version!]
"}
-- Run Modes: --
In >>cpu<< mode, only the CPU is used (-> --device cpu).
The >>gpu<< mode runs the WL on GPUs, with exactly one copy per GPU. Note that no overbooking is allowed (NGPUS must be <= the number of available GPUs, and NCOPIES <= NGPUS).
Only the >>manual<< mode allows overbooking a machine. Here, all input variables must be set manually and are not validated. BE CAREFUL!
The >>full<< mode automatically loads the entire machine and does not allow any configuration.
Note that the number of threads can only be steered for CPU copies; GPU copies always run single-threaded.
Additionally, the number of events is not steered by '-e', but by -x '--n-objects XYZ'.
In CPU mode, '--nevents-reduced' can additionally be used to limit the number of processed events (see above).
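As a hypothetical illustration, a >>manual<< mode run that overbooks a node with four CPU copies and two GPU copies could look like this (the values are arbitrary and, as stated above, not validated):
  ./cms-flowsim-gpu-bmk.sh -x '--mode manual --cpu-copies 4 --gpu-copies 2 --n-objects 45000 --batch-size 1000 --n-model-instances 1'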
import json
import glob
import os
from collections import defaultdict
from math import log10, floor
def round_to_significant_digits(value, digits):
if not isinstance(value, (int, float)):
return value
if value == 0:
return 0
else:
return round(value, digits - int(floor(log10(abs(value)))) - 1)
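# Illustration (hypothetical values): round_to_significant_digits(123.456, 5) -> 123.46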
def process_value(value, significant_digits):
if isinstance(value, dict):
return {k: process_value(v, significant_digits) for k, v in value.items()}
elif isinstance(value, (int, float)):
return round_to_significant_digits(value, significant_digits)
else:
return value
def merge_dicts(dicts, significant_digits=5):
"""Merge multiple dictionaries with identical structure, creating lists at the leaf nodes."""
def merge_values(val_list, significant_digits=5):
if all(isinstance(v, dict) for v in val_list):
# If all values are dictionaries, merge them recursively
merged = {}
keys = set(k for d in val_list for k in d)
for key in keys:
merged[key] = merge_values([d[key] for d in val_list if key in d])
return merged
else:
# If the values are not dictionaries, return them as a list
return [round_to_significant_digits(v, significant_digits) for v in val_list]
return merge_values(dicts, significant_digits)
def parse_results(baseWDir, significant_digits=5):
# Initialize variables to store cumulative results
total_throughput = 0
all_data = []
# Loop through each JSON file and collect data
for jsonFile in glob.glob(os.path.join(baseWDir, 'proc_*/flowsim_output_*.json')):
with open(jsonFile, 'r') as f:
data = json.load(f)
total_throughput += data['performance']['throughput_objects_per_second']
all_data.append(data)
# Merge all collected data
merged_results = merge_dicts(all_data, significant_digits)
# Generate the JSON output
resJSON = {
"wl-scores": {
"flowsim": round_to_significant_digits(total_throughput, significant_digits)
},
"wl-stats": process_value(merged_results, significant_digits)
}
# Write the JSON output to a file
output_file = os.path.join(baseWDir, 'parser_output.json')
with open(output_file, 'w') as f:
json.dump(resJSON, f, indent=4)
# Example usage:
# parse_results('/path/to/baseWDir', significant_digits=5)
if __name__ == "__main__":
import sys
parse_results(sys.argv[1], significant_digits=5)
# Copyright 2019-2020 CERN. See the COPYRIGHT file at the top-level
# directory of this distribution. For licensing information, see the
# COPYING file at the top-level directory of this distribution.
parseResultsDir=$(cd $(dirname ${BASH_SOURCE}); pwd) # needed to locate parseResults.py
# Function parseResults must be defined in each benchmark (or in a separate file parseResults.sh)
# The following variables are guaranteed to be defined and exported: NCOPIES, NTHREADS, NEVENTS_THREAD, BMKDIR, DEBUG, APP
# Logfiles have been stored in process-specific working directories <basewdir>/proc_<1...NCOPIES>
# The function is started in the base working directory <basewdir>:
# please store here the overall json summary file for all NCOPIES processes combined
function parseResults(){
echo "[parseResults] current directory: $(pwd)"
#-----------------------
# Parse results (python)
#-----------------------
echo "[parseResults] python parser starting"
# Call the Python script
python3 ${parseResultsDir}/parseResults.py "$baseWDir"
shstatus=$?
[ "$shstatus" != "0" ] && return $shstatus
#-----------------------
# Return status
#-----------------------
return $shstatus
}
#!/bin/bash
#$(dirname $0)/../../../common/parsertest.sh $(dirname $0)