Commit ef151f77 authored by Andrew McNab

Add Grid Engine to docs

parent 7f386bb0
@@ -12,22 +12,31 @@ https://repo.gridpp.ac.uk/machinejobfeatures/mjf-scripts/
1. Common configuration
2. Torque/PBS configuration
3. HTCondor configuration
4. Only Machine Features
5. DIRAC Benchmark (DB12)
4. Grid Engine configuration
5. Only Machine Features
6. DIRAC Benchmark (DB12)
1. Common configuration
-----------------------
In the simplest case, just install either Torque or HTCondor RPM and the
/etc/rc.d/init.d/mjf script will be run to create /etc/machinefeatures.
$MACHINEFEATURES is set to this value by /etc/profile.d/mjf.sh and
/etc/profile.d/mjf.csh which are (likely to be) sourced by new logins/jobs.
In the simplest case, just install either the Torque, HTCondor, or Grid
Engine RPM and the /etc/rc.d/init.d/mjf script will be run to create
/etc/machinefeatures. $MACHINEFEATURES is set to this value by
/etc/profile.d/mjf.sh and /etc/profile.d/mjf.csh which are (likely to be)
sourced by new logins/jobs.
The files /etc/sysconfig/mjf and /var/run/mjf are read when creating the
$MACHINEFEATURES (and $JOBFEATURES) directories, and can provide default
values for Machine/Job Features key/value pairs. /var/run/mjf values take
precedence. Note that files in /var/run are deleted at system boot time.
These files can contain the following $MACHINEFEATURES keys:
total_cpu hs06 grace_secs shutdowntime
These files can contain the following $JOBFEATURES keys:
allocated_cpu hs06_job wall_limit_secs cpu_limit_secs
max_rss_bytes max_swap_bytes scratch_limit_bytes
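As a rough illustration of the precedence rule above, the sketch below merges the two files so that /var/run/mjf wins. It assumes simple key=value lines (as in the hs06=99.99 example) and treats blank lines and lines starting with # as ignorable, which is an assumption rather than documented behaviour. The function names parse_mjf_text and merge_mjf_defaults are hypothetical and are not part of mjf-scripts.

```python
def parse_mjf_text(text):
    """Parse key=value lines; blank lines and '#' comments are skipped (assumed)."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, value = line.split('=', 1)
        values[key.strip()] = value.strip()
    return values


def merge_mjf_defaults(sysconfig_text, run_text):
    """Combine both files; values from /var/run/mjf take precedence."""
    merged = parse_mjf_text(sysconfig_text)   # contents of /etc/sysconfig/mjf
    merged.update(parse_mjf_text(run_text))   # contents of /var/run/mjf
    return merged
```

For example, merge_mjf_defaults("total_cpu=8\n", "total_cpu=16\n") keeps the value from the second argument.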
Values given this way override values obtained from the system (eg the
total number of logical processors), but are overridden in turn when
per-job values can be determined from the batch system (eg the number
@@ -36,10 +45,10 @@ of logical processors allocated to this job.)
The values cpu_limit_secs_per_cpu, max_rss_bytes_per_cpu,
max_swap_bytes_per_cpu, and scratch_limit_bytes_per_cpu can be set in
either mjf file to cause the scripts to calculate the corresponding
per-job value (eg cpu_limit_secs) using $JOBFEATURES/allocated_cpu (which
will be determined from the batch system if available, otherwise 1.)
per-job value (eg cpu_limit_secs) by multiplying by $JOBFEATURES/allocated_cpu
(which will be determined from the batch system if available, otherwise 1.)
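The per-CPU calculation described above might look something like the following sketch. The helper name apply_per_cpu_defaults is hypothetical, and the choice to let an explicit per-job key win over its _per_cpu counterpart is an assumption, not something the README states.

```python
PER_JOB_KEYS = ['cpu_limit_secs', 'max_rss_bytes',
                'max_swap_bytes', 'scratch_limit_bytes']


def apply_per_cpu_defaults(settings, allocated_cpu=1):
    """Derive per-job limits from *_per_cpu settings read from the mjf files.

    settings maps key names to string values. allocated_cpu defaults to 1,
    matching the fallback used when the batch system gives no value.
    """
    jobfeatures = {}
    for key in PER_JOB_KEYS:
        if key in settings:
            # Assumption: an explicit per-job value is used unchanged
            jobfeatures[key] = int(settings[key])
        elif key + '_per_cpu' in settings:
            jobfeatures[key] = int(settings[key + '_per_cpu']) * allocated_cpu
    return jobfeatures
```

With cpu_limit_secs_per_cpu=3600 and four allocated processors, this gives a cpu_limit_secs of 14400.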
If you know the HS06 of the worker node, you can also include a line like
If you know the HS06 of the worker node, you can include a line like
hs06=99.99 which will be picked up when populating /etc/machinefeatures/
(you can force updates after changing that file with service mjf start
as the mjf script looks like a SysV service.) This is then used to create
@@ -55,8 +64,8 @@ mjf_tmp_dir=/DESIRED/PATH in either mjf file.
The mjf-torque RPM installs /var/lib/torque/mom_priv/prologue.user which is
run by Torque at the start of each job to create
$JOBFEATURES=/tmp/mjf-$USER/jobfeatures-$PBS_JOBID (by default), and installs
/var/lib/torque/mom_priv/epilogue.user runs at the end of the job to clean up
that directory.
/var/lib/torque/mom_priv/epilogue.user which runs at the end of the job to
clean up that directory.
$JOBFEATURES/hs06_job is calculated from $MACHINEFEATURES/hs06 with a pro-rata
share for the job in question, based on $JOBFEATURES/allocated_cpu which is in
@@ -71,7 +80,7 @@ cannot otherwise by found, it is obtained by counting 'processor' lines in
3. HTCondor
-----------
The mjf-htcondor RPM installs /usr/sbin/make-jobfeatures script which must
The mjf-htcondor RPM installs the /usr/sbin/make-jobfeatures script which must
be run as part of the HTCondor user job wrapper. If a job wrapper is not
already defined, then this can simply be done by setting
USER_JOB_WRAPPER = /usr/sbin/mjf-job-wrapper in the HTCondor configuration.
@@ -90,7 +99,25 @@ This can be overriden by setting total_cpu in either mjf file. If the value
cannot otherwise by found, it is obtained by counting 'processor' lines in
/proc/cpuinfo.
4. Only Machine Features
4. Grid Engine
--------------
The mjf-gridengine RPM installs the /usr/sbin/make-jobfeatures script which
must be run as part of the user environment set up and creates the $JOBFEATURES
directory. The files /etc/profile.d/mjf.sh and mjf.csh are installed to do this.
NOTE that the value of wall_limit_secs MUST be set in either /etc/sysconfig/mjf
or /var/run/mjf as this value is not supplied to jobs by Grid Engine.
$JOBFEATURES/hs06_job is calculated from $MACHINEFEATURES/hs06 with a pro-rata
share for the job in question, based on $JOBFEATURES/allocated_cpu which is in
turn taken from $NSLOTS set by Grid Engine for the job (default 1.)
Setting total_cpu in either mjf file will set the value to use for
$MACHINEFEATURES/total_cpu . Otherwise it is obtained by counting 'processor'
lines in /proc/cpuinfo.
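To make the pro-rata share concrete, here is a minimal sketch. It assumes "pro-rata" means hs06 scaled by allocated_cpu / total_cpu, which is the natural reading but is not spelled out in the text. The function names are hypothetical; only $NSLOTS, the default of 1, and the 'processor' line count come from the description above.

```python
def allocated_cpu_from_env(environ):
    """Take allocated_cpu from $NSLOTS as set by Grid Engine, defaulting to 1."""
    try:
        return int(environ.get('NSLOTS', 1))
    except ValueError:
        return 1


def count_processors(cpuinfo_text):
    """Fallback for total_cpu: count 'processor' lines in /proc/cpuinfo text."""
    return sum(1 for line in cpuinfo_text.splitlines()
               if line.startswith('processor'))


def hs06_job(hs06, allocated_cpu, total_cpu):
    """Assumed pro-rata share: the node HS06 scaled by the job's slot fraction."""
    return float(hs06) * allocated_cpu / total_cpu
```

On a node with hs06=100 and 8 logical processors, a job with NSLOTS=2 would be credited with a quarter of the node's HS06 under this assumption.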
5. Only Machine Features
------------------------
The mjf-onlymf RPM only installs the common scripts to create
@@ -100,10 +127,10 @@ If the value cannot otherwise by found, it is obtained by counting 'processor'
lines in /proc/cpuinfo.
$JOBFEATURES is neither defined nor the files created. The mjf-onlymf RPM
should only be used on systems other than Torque/PBS or HTCondor so at
least $MACHINEFEATURES is available.
should only be used on systems other than Torque/PBS, HTCondor, or Grid
Engine so at least $MACHINEFEATURES is available.
5. DIRAC Benchmark (DB12)
6. DIRAC Benchmark (DB12)
-------------------------
Support for the DIRAC fast benchmark (DB12) is also included, which is
@@ -132,4 +159,3 @@ started after db12 has run.
/etc/db12/total_cpu should match $MACHINEFEATURES/total_cpu so that the number of
DB12 instances run matches the number of processors available to be allocated to
jobs.
@@ -181,19 +181,12 @@ for key in ['cpu_limit_secs', 'max_rss_bytes',
except:
pass
#
#
# Get jobfeatures['wall_limit_secs'] here, somehow?
#
#
if not 'cpu_limit_secs' in jobfeatures and 'wall_limit_secs' in jobfeatures:
    # If not given in mjf files, we create a CPU seconds limit from wallclock
    # and allocated CPUs/processors
    jobfeatures['cpu_limit_secs'] = jobfeatures['wall_limit_secs'] * jobfeatures['allocated_cpu']
# Write out if these have been set from files or prologue.user arguments
# Write out if these have been set
for key in ['allocated_cpu', 'wall_limit_secs', 'cpu_limit_secs',
'max_rss_bytes', 'max_swap_bytes', 'scratch_limit_bytes']:
......