Commit 1f3a69aa authored by Andrew McNab

Update README for HTCondor

parent 38dac6b1
Machine/Job Features Scripts
============================
Sources are maintained at https://gitlab.cern.ch/machinejobfeatures/mjf-scripts
See https://twiki.cern.ch/twiki/bin/view/LCG/MachineJobFeaturesImplementations
for more about the mjf-scripts implementations and Machine/Job Features
These files can either be used directly, or the torque-rpm and htcondor-rpm
Makefile targets can be used to build RPMs for SL 6.x. This README assumes
you have built the RPM yourself or downloaded the pre-built RPM from
https://repo.gridpp.ac.uk/machinejobfeatures/mjf-scripts/
Torque/PBS
----------
1. Common configuration
2. Torque/PBS configuration
3. HTCondor configuration
Pre-built RPMs are available at
https://repo.gridpp.ac.uk/machinejobfeatures/mjf-scripts/
1. Common configuration
-----------------------
In the simplest case, just install the RPM and the /etc/rc.d/init.d/mjf
script will be run to create /etc/machinefeatures. $MACHINEFEATURES is
set to this value by /etc/profile.d/mjf.sh and /etc/profile.d/mjf.csh which
are (likely to be) sourced by new logins/jobs.
The files /etc/sysconfig/mjf and /var/run/mjf are read when creating the
$MACHINEFEATURES (and $JOBFEATURES) directories, and can provide default
values for Machine/Job Features key/value pairs. /var/run/mjf values take
precedence. Note that files in /var/run are deleted at system boot time.
Values given this way override values obtained from the system (e.g. the
total number of logical processors), but are overridden in turn when
per-job values can be determined from the batch system (e.g. the number
of logical processors allocated to this job).
If you know the HS06 of the worker node, you can also include a line like
hs06=99.99 in /etc/sysconfig/mjf, which will be picked up when populating
/etc/machinefeatures/ (you can force updates after changing that file with
service mjf start, as the mjf script looks like a SysV service). This is then
used to create $MACHINEFEATURES/hs06 for the whole WN.
To use the RPMs, just install the RPM and the /etc/rc.d/init.d/mjf script
is run to create /etc/machinefeatures/ and
/var/lib/torque/mom_priv/prologue.user which is run by Torque at the start of
each job to create a jobfeatures-$PBS_JOBID in the user’s home directory.
By default, the per-job $JOBFEATURES directories will be created under
/tmp/mjf-$USER but you can use a directory other than /tmp by setting
mjf_tmp_dir=/DESIRED/PATH in either mjf file.
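The precedence described above (system value, then /etc/sysconfig/mjf, then
/var/run/mjf) can be illustrated with a minimal sketch. The mjf_lookup
function is hypothetical and is not part of mjf-scripts; it assumes both files
hold simple key=value lines, and does not reproduce how the real scripts read
them.

```shell
# Hypothetical helper (not part of mjf-scripts): print the value for a key.
# Arguments: key, system-derived fallback value, then the two config files,
# which default to the paths named in this README.
mjf_lookup() {
    key=$1
    value=$2
    # The later file in the list wins, so /var/run/mjf takes precedence
    for file in "${3:-/etc/sysconfig/mjf}" "${4:-/var/run/mjf}"; do
        if [ -r "$file" ]; then
            # Use the last key=value line for this key, if there is one
            found=$(sed -n "s/^${key}=//p" "$file" | tail -n 1)
            [ -n "$found" ] && value=$found
        fi
    done
    printf '%s\n' "$value"
}
```

For example, mjf_lookup total_cpu 16 prints 16 unless one of the two files
sets total_cpu.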
2. Torque/PBS
-------------
The mjf-torque RPM installs /var/lib/torque/mom_priv/prologue.user, which is
run by Torque at the start of each job to create
$JOBFEATURES=/tmp/mjf-$USER/jobfeatures-$PBS_JOBID (by default), and installs
/var/lib/torque/mom_priv/epilogue.user, which runs at the end of the job to
clean up that directory. There are mjf.sh and mjf.csh scripts created in
/etc/profile.d which define $MACHINEFEATURES and $JOBFEATURES for the job
itself to use.
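From inside a job, the values can then be read back from the two directories.
The sketch below assumes each key is a small file named after the key (as with
$MACHINEFEATURES/hs06 and $JOBFEATURES/hs06_job in this README). The mjf_read
function is hypothetical and only shows one way to cope with a missing
directory or key.

```shell
# Hypothetical helper (not part of mjf-scripts): print the value stored in
# DIR/KEY, or DEFAULT if DIR is empty/unset or the key file is not readable.
# Arguments: directory, key name, default value.
mjf_read() {
    if [ -n "$1" ] && [ -r "$1/$2" ]; then
        cat "$1/$2"
    else
        printf '%s\n' "$3"
    fi
}

# Example: processors allocated to this job, defaulting to 1
# allocated=$(mjf_read "$JOBFEATURES" allocated_cpu 1)
```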
If you know the HS06 of each worker node, you can also create
/etc/sysconfig/mjf with a line like hs06=99.99 which will be picked up
when populating /etc/machinefeatures/ (you can force updates after
changing that file with service mjf start as the mjf script looks like a
SysV service.) This is then used to create $MACHINEFEATURES/hs06 for the
whole WN and $JOBFEATURES/hs06_job with the pro-rata HS06 for that job
based on the number of processors Torque has assigned it (the ppn number,
but it defaults to 1.)
When creating $MACHINEFEATURES/total_cpu, the scripts use the value
given in /etc/sysconfig/mjf (and/or the transient /var/run/mjf),
or if not available, the value obtained by running the pbsnodes command,
or if not available, the value obtained by counting 'processor' lines in
that directory.
$JOBFEATURES/hs06_job is calculated from $MACHINEFEATURES/hs06 with a pro-rata
share for the job in question, based on $JOBFEATURES/allocated_cpu which is
in turn taken from the Torque ppn for the job (default 1).
When creating $MACHINEFEATURES/total_cpu, the /usr/sbin/mjf-get-total-cpu
script uses the value obtained by running the pbsnodes command for the node.
This can be overridden by setting total_cpu in either mjf file. If the value
cannot otherwise be found, it is obtained by counting 'processor' lines in
/proc/cpuinfo.
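The last-resort fallback can be sketched as follows. This is an illustration
only, not the code of /usr/sbin/mjf-get-total-cpu; count_processors is a
hypothetical name, and the optional file argument exists only so the function
can be tried against a sample file.

```shell
# Hypothetical helper (not part of mjf-scripts): count the logical processors
# listed in a cpuinfo-style file, one 'processor' line per logical processor.
count_processors() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}
```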
3. HTCondor
-----------
The mjf-htcondor RPM installs the /usr/sbin/make-jobfeatures script which must
be run as part of the HTCondor user job wrapper. If a job wrapper is not
already defined, then this can simply be done by setting
USER_JOB_WRAPPER=/usr/sbin/mjf-job-wrapper in the HTCondor configuration.
If a job wrapper is already being used, then it must be modified to run
/usr/sbin/make-jobfeatures in the way mjf-job-wrapper does.
$JOBFEATURES/hs06_job is calculated from $MACHINEFEATURES/hs06 with a pro-rata
share for the job in question, based on $JOBFEATURES/allocated_cpu which is
in turn taken from the CpusProvisioned value in the job ad (default 1).
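As a worked illustration of the pro-rata share, the sketch below assumes the
share is hs06 * allocated_cpu / total_cpu, where total_cpu is the value in
$MACHINEFEATURES/total_cpu. That formula is an assumption about what
"pro-rata" means here, the function is hypothetical, and the real scripts may
round or format the result differently.

```shell
# Hypothetical helper (not part of mjf-scripts): pro-rata HS06 for one job.
# Arguments: node HS06, processors allocated to the job, total processors.
hs06_job() {
    awk -v hs06="$1" -v alloc="${2:-1}" -v total="$3" 'BEGIN {
        # Refuse to divide by a missing or zero processor count
        if (total <= 0) exit 1
        printf "%.2f\n", hs06 * alloc / total
    }'
}
```

With a node HS06 of 100 and 8 logical processors, a job allocated 2 processors
gets hs06_job 100 2 8, which is 25.00.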
When creating $MACHINEFEATURES/total_cpu, the /usr/sbin/mjf-get-total-cpu
script uses the value obtained by running condor_config_val NUM_CPUS to
discover the number of logical processors HTCondor can allocate to jobs.
This can be overridden by setting total_cpu in either mjf file. If the value
cannot otherwise be found, it is obtained by counting 'processor' lines in
/proc/cpuinfo.
set mjf_tmp_dir=/tmp
if ( -r /etc/sysconfig/mjf ) then
source /etc/sysconfig/mjf
endif
if ( -r /var/run/mjf ) then
source /var/run/mjf
endif
if ( -d /etc/machinefeatures ) then
setenv MACHINEFEATURES /etc/machinefeatures
endif
@@ -8,7 +8,6 @@ endif
if ( -d /etc/machinefeatures ) then
setenv MACHINEFEATURES /etc/machinefeatures
endif
test
if ( "$PBS_JOBID" != "" && -d "$mjf_tmp_dir/mjf-$USER/jobfeatures-$PBS_JOBID" ) then
setenv JOBFEATURES "$mjf_tmp_dir/mjf-$USER/jobfeatures-$PBS_JOBID"
endif
mjf_tmp_dir=/tmp
if [ -r /etc/sysconfig/mjf ] ; then
. /etc/sysconfig/mjf
fi
if [ -r /var/run/mjf ] ; then
. /var/run/mjf
fi
if [ -d /etc/machinefeatures ] ; then
export MACHINEFEATURES=/etc/machinefeatures
fi
\ No newline at end of file