Commit fe554f31 authored by Robin Hofsaess

Merge branch 'hotfix-flowsim-gpu' into 'master'

Hotfix flowsim gpu

See merge request !1056
parents 7612f3e8 2f3867ea
# Changelog for CMS-FlowSim-GPU
## [Unreleased]
## [v0.1] Initial blank version
- Setting up naming and cleanup
- Parser is reused unchanged
# Add here any workload-specific Dockerfile instructions.
# They will be appended to the Dockerfile generated from a common template.
# Get inputs
RUN wget -q https://hep-benchmarks.web.cern.ch/hep-workloads/data/cms/flowsim/checkpoint-latest.pt -O /bmk/data/checkpoint-latest.pt; \
wget -q https://hep-benchmarks.web.cern.ch/hep-workloads/data/cms/flowsim/gen_ttbar_2M_final-001.npy -O /bmk/data/gen_ttbar_2M_final-001.npy; \
wget -q https://github.com/francesco-vaselli/FlowSim/archive/ecb8256a9eca6ff77c7850da0ab5ded724a5cae6.tar.gz -O /tmp/benchmark.tgz ;
RUN tar -xvzf /tmp/benchmark.tgz -C /bmk; rm -f /tmp/benchmark.tgz; mv /bmk/FlowSim* /bmk/FlowSim-benchmark;
RUN python3 -m pip install --upgrade setuptools
RUN python3 -m pip install wheel
RUN python3.9 -m pip install -r /bmk/FlowSim-benchmark/requirements.txt
HEPWL_BMKEXE=cms-flowsim-gpu-bmk.sh
HEPWL_BMKOPTS="-e 1 -c1 -m none -x '--device cpu --n-objects 45000 --batch-size 1000 --n-model-instances 1 ' "
HEPWL_BMKDIR=cms-flowsim-gpu
HEPWL_BMKDESCRIPTION="FlowSim as per https://github.com/francesco-vaselli/FlowSim"
HEPWL_DOCKERIMAGENAME=cms-flowsim-gpu-bmk
HEPWL_DOCKERIMAGETAG=ci-v0.1
HEPWL_CVMFSREPOS=NONE
HEPWL_EXTEND_SFT_SPEC=""
HEPWL_BMKOS="gitlab-registry.cern.ch/linuxsupport/alma9-base:latest"
HEPWL_BUILDARCH="x86_64,aarch64"
HEPWL_BMKUSEGPU=0
This benchmark replicates the inference step of the CMS FlashSim simulation framework.
An ML model is deployed to simulate an arbitrary number of reconstructed particle jets starting from a collection of generator-level jets.
Models can also be deployed sequentially to replicate the simulation of different, correlated object collections in FlashSim.
The code is taken from https://github.com/francesco-vaselli/FlowSim/tree/benchmark
The figure of merit being tracked is the throughput, measured in objects simulated per second.
A number of additional statistics related to CPU/GPU usage are also logged.
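For illustration, the overall score ends up under 'wl-scores.flowsim' in the summary JSON written by this workload's result parser; assuming the jq tool is available, it can be read out with:
  jq '."wl-scores".flowsim' <rundir>/parser_output.json
(<rundir> stands for the base working directory of a run.)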
-- Usage: --
The script accepts the following arguments (an example invocation is shown after this list):
'-n': Number of copies
'-t': Number of threads [This has no effect when running on GPU!]
['-e': Unused in this workload; the number of events is set via -x '--n-objects' instead!]
'-x': The following EXTRA_ARGS are provided:
'--mode': Run mode: cpu, gpu, full, or manual. The mode sets '--device cpu/cuda/vulkan' automatically (see below).
'--n-gpus': Number of GPUs to use [-1: use all available]. Note that in GPU-only mode, NCOPIES needs to be <= NGPUS!
'--nevents-reduced': Number of events to be processed in a CPU copy. Note: This overrides '--n-objects' for CPU WLs!
'--cpu-copies': [manual mode only] Number of CPU copies to run
'--gpu-copies': [manual mode only] Number of GPU copies to run
['--device': Defines where to run: cpu or cuda/vulkan. It is not directly configurable but is set by the mode (see above).]
'--n-objects': Number of objects to simulate [Scales runtime and GPU utilization]
'--batch-size': Batch size for inference [Scales VRAM consumption]
'--n-model-instances': Number of model instances to run [Scales mainly runtime; small influence on GPU utilization]
'--additional': String of additional, non-validated CLI arguments for the WL:
{"
--gpu-memory-limit: GPU memory limit in GB
--num-threads: Number of PyTorch threads to use [Has no effect in GPU mode!]
--monitor-interval: Resource monitoring interval in seconds
[--gpu-id: GPU device ID to use (default: 0) -> not configurable in the current version!]
"}
-- Run Modes: --
In >>cpu<< mode, only the CPU is used (-> --device cpu).
The >>gpu<< mode runs the WL on GPUs, with exactly one copy per GPU. Note that no overbooking is allowed (NGPUS must be <= the number of available GPUs, and NCOPIES <= NGPUS).
Only the >>manual<< mode allows overbooking a machine. Here, all input variables must be set manually and are not validated. BE CAREFUL!
The >>full<< mode automatically loads the entire machine and does not allow any configuration.
Note that the number of threads can only be steered for CPU copies; GPU copies always run single-threaded.
Additionally, the number of events is not steered by '-e', but by -x '--n-objects XYZ'.
In CPU mode, '--nevents-reduced' can additionally be used to limit the number of processed events (see above).
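As a hypothetical illustration, a >>manual<< mode run that overbooks a node with four CPU copies and two GPU copies could look like this (the values are arbitrary and, as stated above, not validated):
  ./cms-flowsim-gpu-bmk.sh -x '--mode manual --cpu-copies 4 --gpu-copies 2 --n-objects 45000 --batch-size 1000 --n-model-instances 1'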
import json
import glob
import os
from collections import defaultdict
from math import log10, floor
def round_to_significant_digits(value, digits):
if not isinstance(value, (int, float)):
return value
if value == 0:
return 0
else:
return round(value, digits - int(floor(log10(abs(value)))) - 1)
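# Illustration (hypothetical values): round_to_significant_digits(123.456, 5) -> 123.46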
def process_value(value, significant_digits):
if isinstance(value, dict):
return {k: process_value(v, significant_digits) for k, v in value.items()}
elif isinstance(value, (int, float)):
return round_to_significant_digits(value, significant_digits)
else:
return value
def merge_dicts(dicts, significant_digits=5):
"""Merge multiple dictionaries with identical structure, creating lists at the leaf nodes."""
def merge_values(val_list, significant_digits=5):
if all(isinstance(v, dict) for v in val_list):
# If all values are dictionaries, merge them recursively
merged = {}
keys = set(k for d in val_list for k in d)
for key in keys:
merged[key] = merge_values([d[key] for d in val_list if key in d])
return merged
else:
# If the values are not dictionaries, return them as a list
return [round_to_significant_digits(v, significant_digits) for v in val_list]
return merge_values(dicts, significant_digits)
def parse_results(baseWDir, significant_digits=5):
# Initialize variables to store cumulative results
total_throughput = 0
all_data = []
# Loop through each JSON file and collect data
for jsonFile in glob.glob(os.path.join(baseWDir, 'proc_*/flowsim_output_*.json')):
with open(jsonFile, 'r') as f:
data = json.load(f)
total_throughput += data['performance']['throughput_objects_per_second']
all_data.append(data)
# Merge all collected data
merged_results = merge_dicts(all_data, significant_digits)
# Generate the JSON output
resJSON = {
"wl-scores": {
"flowsim": round_to_significant_digits(total_throughput, significant_digits)
},
"wl-stats": process_value(merged_results, significant_digits)
}
# Write the JSON output to a file
output_file = os.path.join(baseWDir, 'parser_output.json')
with open(output_file, 'w') as f:
json.dump(resJSON, f, indent=4)
# Example usage:
# parse_results('/path/to/baseWDir', significant_digits=5)
if __name__ == "__main__":
import sys
parse_results(sys.argv[1], significant_digits=5)
# Copyright 2019-2020 CERN. See the COPYRIGHT file at the top-level
# directory of this distribution. For licensing information, see the
# COPYING file at the top-level directory of this distribution.
parseResultsDir=$(cd $(dirname ${BASH_SOURCE}); pwd) # needed to locate parseResults.py
# Function parseResults must be defined in each benchmark (or in a separate file parseResults.sh)
# The following variables are guaranteed to be defined and exported: NCOPIES, NTHREADS, NEVENTS_THREAD, BMKDIR, DEBUG, APP
# Logfiles have been stored in process-specific working directories <basewdir>/proc_<1...NCOPIES>
# The function is started in the base working directory <basewdir>:
# please store here the overall json summary file for all NCOPIES processes combined
function parseResults(){
echo "[parseResults] current directory: $(pwd)"
#-----------------------
# Parse results (python)
#-----------------------
echo "[parseResults] python parser starting"
# Call the Python script
python3 ${parseResultsDir}/parseResults.py "$baseWDir"
shstatus=$?
[ "$shstatus" != "0" ] && return $shstatus
#-----------------------
# Return status
#-----------------------
return $shstatus
}
#!/bin/bash
#$(dirname $0)/../../../common/parsertest.sh $(dirname $0)