
Red Hat Enterprise MRG 2

Grid Developer Guide

Developer-focused information for the Grid component of Red Hat Enterprise MRG

Edition 1

Alison Young

Red Hat Engineering Content Services

Legal Notice

Copyright © 2011 Red Hat, Inc.
The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution–Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version.
Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.
Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, MetaMatrix, Fedora, the Infinity Logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries.
Linux® is the registered trademark of Linus Torvalds in the United States and other countries.
Java® is a registered trademark of Oracle and/or its affiliates.
XFS® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.
MySQL® is a registered trademark of MySQL AB in the United States, the European Union and other countries.
All other trademarks are the property of their respective owners.


1801 Varsity Drive
Raleigh, NC 27606-2072 USA
Phone: +1 919 754 3700
Phone: 888 733 4281
Fax: +1 919 754 3701

Abstract
This book contains information for developers using MRG Grid.

Preface
1. Document Conventions
1.1. Typographic Conventions
1.2. Pull-quote Conventions
1.3. Notes and Warnings
2. Getting Help and Giving Feedback
2.1. Do You Need Help?
2.2. We Need Feedback!
1. Overview
2. API Types
2.1. SOAP and WSDL
2.2. Aviary Model
3. Aviary Installation and Configuration
3.1. Installation
3.2. Configuration
4. Aviary Core Types
4.1. JobId
4.2. SubmissionId
4.3. Attribute
4.4. JobStatus
4.5. ResourceConstraint
5. Job Submission and Management
6. Job Data Queries
7. Security
8. Client Examples
8.1. SOAP XML
8.2. Ruby
8.3. Python
9. More Information
A. Revision History

Preface

1. Document Conventions

This manual uses several conventions to highlight certain words and phrases and draw attention to specific pieces of information.
In PDF and paper editions, this manual uses typefaces drawn from the Liberation Fonts set. The Liberation Fonts set is also used in HTML editions if the set is installed on your system. If not, alternative but equivalent typefaces are displayed. Note: Red Hat Enterprise Linux 5 and later includes the Liberation Fonts set by default.

1.1. Typographic Conventions

Four typographic conventions are used to call attention to specific words and phrases. These conventions, and the circumstances they apply to, are as follows.
Mono-spaced Bold
Used to highlight system input, including shell commands, file names and paths. Also used to highlight keycaps and key combinations. For example:
To see the contents of the file my_next_bestselling_novel in your current working directory, enter the cat my_next_bestselling_novel command at the shell prompt and press Enter to execute the command.
The above includes a file name, a shell command and a keycap, all presented in mono-spaced bold and all distinguishable thanks to context.
Key combinations can be distinguished from keycaps by the plus sign connecting each part of a key combination. For example:
Press Enter to execute the command.
Press Ctrl+Alt+F2 to switch to the first virtual terminal. Press Ctrl+Alt+F1 to return to your X-Windows session.
The first paragraph highlights the particular keycap to press. The second highlights two key combinations (each a set of three keycaps with each set pressed simultaneously).
If source code is discussed, class names, methods, functions, variable names and returned values mentioned within a paragraph will be presented as above, in mono-spaced bold. For example:
File-related classes include filesystem for file systems, file for files, and dir for directories. Each class has its own associated set of permissions.
Proportional Bold
This denotes words or phrases encountered on a system, including application names; dialog box text; labeled buttons; check-box and radio button labels; menu titles and sub-menu titles. For example:
Choose System → Preferences → Mouse from the main menu bar to launch Mouse Preferences. In the Buttons tab, click the Left-handed mouse check box and click Close to switch the primary mouse button from the left to the right (making the mouse suitable for use in the left hand).
To insert a special character into a gedit file, choose Applications → Accessories → Character Map from the main menu bar. Next, choose Search → Find… from the Character Map menu bar, type the name of the character in the Search field and click Next. The character you sought will be highlighted in the Character Table. Double-click this highlighted character to place it in the Text to copy field and then click the Copy button. Now switch back to your document and choose Edit → Paste from the gedit menu bar.
The above text includes application names; system-wide menu names and items; application-specific menu names; and buttons and text found within a GUI interface, all presented in proportional bold and all distinguishable by context.
Mono-spaced Bold Italic or Proportional Bold Italic
Whether mono-spaced bold or proportional bold, the addition of italics indicates replaceable or variable text. Italics denotes text you do not input literally or displayed text that changes depending on circumstance. For example:
To connect to a remote machine using ssh, type ssh username@domain.name at a shell prompt. If the remote machine is example.com and your username on that machine is john, type ssh john@example.com.
The mount -o remount file-system command remounts the named file system. For example, to remount the /home file system, the command is mount -o remount /home.
To see the version of a currently installed package, use the rpm -q package command. It will return a result as follows: package-version-release.
Note the words in bold italics above — username, domain.name, file-system, package, version and release. Each word is a placeholder, either for text you enter when issuing a command or for text displayed by the system.
Aside from standard usage for presenting the title of a work, italics denotes the first use of a new and important term. For example:
Publican is a DocBook publishing system.

1.2. Pull-quote Conventions

Terminal output and source code listings are set off visually from the surrounding text.
Output sent to a terminal is set in mono-spaced roman and presented thus:
books        Desktop   documentation  drafts  mss    photos   stuff  svn
books_tests  Desktop1  downloads      images  notes  scripts  svgs
Source-code listings are also set in mono-spaced roman but add syntax highlighting as follows:
package org.jboss.book.jca.ex1;

import javax.naming.InitialContext;

public class ExClient
{
   public static void main(String args[]) 
       throws Exception
   {
      InitialContext iniCtx = new InitialContext();
      Object         ref    = iniCtx.lookup("EchoBean");
      EchoHome       home   = (EchoHome) ref;
      Echo           echo   = home.create();

      System.out.println("Created Echo");

      System.out.println("Echo.echo('Hello') = " + echo.echo("Hello"));
   }
}

1.3. Notes and Warnings

Finally, we use three visual styles to draw attention to information that might otherwise be overlooked.

Note

Notes are tips, shortcuts or alternative approaches to the task at hand. Ignoring a note should have no negative consequences, but you might miss out on a trick that makes your life easier.

Important

Important boxes detail things that are easily missed: configuration changes that only apply to the current session, or services that need restarting before an update will apply. Ignoring a box labeled 'Important' will not cause data loss but may cause irritation and frustration.

Warning

Warnings should not be ignored. Ignoring warnings will most likely cause data loss.

2. Getting Help and Giving Feedback

2.1. Do You Need Help?

If you experience difficulty with a procedure described in this documentation, visit the Red Hat Customer Portal at http://access.redhat.com. Through the customer portal, you can:
  • search or browse through a knowledgebase of technical support articles about Red Hat products.
  • submit a support case to Red Hat Global Support Services (GSS).
  • access other product documentation.
Red Hat also hosts a large number of electronic mailing lists for discussion of Red Hat software and technology. You can find a list of publicly available mailing lists at https://www.redhat.com/mailman/listinfo. Click on the name of any mailing list to subscribe to that list or to access the list archives.

2.2. We Need Feedback!

If you find a typographical error in this manual, or if you have thought of a way to make this manual better, we would love to hear from you! Please submit a report in Bugzilla: http://bugzilla.redhat.com/ against the product Red Hat Enterprise MRG.
When submitting a bug report, be sure to mention the manual's identifier: Grid_Developer_Guide
If you have a suggestion for improving the documentation, try to be as specific as possible when describing it. If you have found an error, please include the section number and some of the surrounding text so we can find it easily.

Chapter 1. Overview

MRG Grid provides a web service interface, called Aviary, for job submission, management and queries. This interface is designed to hide some of MRG Grid's complexity and to provide access over universal network protocols such as HTTP. Aviary uses SOAP for request and response exchanges between MRG Grid, Aviary-enabled components, and web service clients. Web service clients can be developed using Java, Python, Ruby, or other languages.
Aviary is targeted at developers wanting to start using MRG Grid quickly. Developers wanting to use Aviary can do so without the depth of knowledge associated with MRG Grid's High Throughput Computing capabilities.

Chapter 2. API Types

2.1. SOAP and WSDL

The API types are described using XML schema, and the operations are described using the Web Services Description Language (WSDL). This schema-based approach allows developers to generate the code for types and operations in their preferred programming language. Code is generated using a client web service toolkit. Some popular web service toolkits are:
  • Apache Axis or CXF to generate Java
  • Suds to generate Python
  • Savon to generate Ruby

2.2. Aviary Model

Entities of this API include job, submission and attribute.
A job is the basic unit of work and has a minimum set of attributes. These attributes include the full path of the command to be executed, command arguments, job owner, and requirements that provide information to MRG Grid. The requirements list enables matching with a resource that can execute the job.
A submission is an association of jobs under a common name key, such as my_submission_for_today. Aviary can generate a submission name if one is not given.
An attribute describes an aspect of a job. Some attributes can be set when the job is submitted, or edited later while the job is still being processed in the MRG Grid job queue. MRG Grid sets many job attributes after a submission, but you can also provide custom attributes if they are meaningful to the execution of the application represented by a job.

Chapter 3. Aviary Installation and Configuration

3.1. Installation

RPM
Install the condor-aviary package for your platform. This will install the required software components, including the WSDL and schema files for Aviary. These files can be used to develop a remote web service client.

Note

Currently, due to a limitation in the underlying web service stack (Axis2/C), it is not possible to dynamically retrieve the WSDL and imported XSD over HTTP using the ?wsdl URL syntax.
Source
Aviary can be included in a MRG Grid source build using the following variables when cmake is invoked:
-DWANT_CONTRIB:BOOL=TRUE -DWITH_AVIARY:BOOL=TRUE

3.2. Configuration

To enable Aviary, use Remote Configuration to apply the following two features to your MRG Grid pool:
  • AviaryScheduler - configuration to activate a component that provides the Aviary job submission and management capabilities
  • QueryServer - configuration to activate a component that provides the Aviary job query capabilities
Refer to the Remote Configuration chapter in the MRG Grid User Guide for information on applying features to pools.

Chapter 4. Aviary Core Types

The XML schema defines core types that are essential to understanding how Aviary operations are invoked and how their results are interpreted.

4.1. JobId

A JobId is a unit of information that fully describes the identity of a job. It contains the following parameters:
  • job - the local identifier for a job assigned to a specific scheduler. It is a string that encodes two non-negative whole numbers separated by a period, such as 1.0, 84.3 or 2004.68. The first number refers to a local job grouping that may have multiple parts with attributes in common; the second number counts those parts. A typical example is a group of jobs that share the same command but pass different arguments to it, with each job writing its output to a different file.
  • scheduler - a string that identifies which scheduler the job was submitted to.
  • pool - a string that identifies a MRG Grid deployment. A deployment is an arena of schedulers, job execution resources and components that match jobs to those resources.
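
As a rough illustration of the structure above, the following Python sketch models a JobId and splits the job string into its two numeric parts. The field names follow the parameters listed above; the helper name split_job and the sample scheduler and pool values are assumptions made for this example and are not defined by Aviary.

```python
from collections import namedtuple

# Field names follow the JobId parameters described above.
JobId = namedtuple("JobId", ["job", "scheduler", "pool"])

def split_job(job):
    """Split a job string such as '84.3' into its two numeric parts.

    Hypothetical helper: treats both parts as whole numbers, which
    matches the examples 1.0, 84.3 and 2004.68.
    """
    group, part = job.split(".")
    return int(group), int(part)

# Sample values only; a real scheduler and pool name come from the service.
jid = JobId(job="84.3",
            scheduler="username@localhost.localdomain",
            pool="localhost")
print(split_job(jid.job))
```

Keeping the job value as a string in the JobId itself avoids losing information, since that is the form in which it is exchanged with the service.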

4.2. SubmissionId

A SubmissionId is a unit of information that describes a submission in the following two parts:
  • name - a string provided by the user or generated on behalf of the user at the time of submission. A submission name is a way to associate and aggregate jobs in a manner that is meaningful to the developer. An example of a meaningful name is my_submission_04302011. As submissions are open-ended, a user can continue to add individual jobs to this aggregating name over time, even though the individual jobs may have been scheduled and executed at different times by MRG Grid. For example, the jobs 1.0, 28.0 and 2011.0 could all be part of the submission named my_submission_04302011.
  • owner - a string containing the name of the original submitter.
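
The relationship between a submission and its jobs can be pictured with a small local data structure. This is only a sketch of the association: Aviary maintains it on the server side, and the add_job helper and the owner value used here are hypothetical.

```python
from collections import defaultdict

# (submission name, owner) -> list of job identifiers.
# Purely local bookkeeping to illustrate the association.
submissions = defaultdict(list)

def add_job(name, owner, job):
    """Attach a job to the submission identified by name and owner."""
    submissions[(name, owner)].append(job)

# Jobs scheduled at different times can share one submission name.
for job in ("1.0", "28.0", "2011.0"):
    add_job("my_submission_04302011", "username", job)

print(submissions[("my_submission_04302011", "username")])
```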

4.3. Attribute

An attribute is type-coded information used by MRG Grid to evaluate, organize and execute job matching and processing. MRG Grid jobs have multiple attributes; some are specified by the user before submission, and many more are attached to a job by the MRG Grid infrastructure when the job is added to the job queue. A MRG Grid job is the sum of its attributes. An attribute consists of:
  • name - a string denoting the attribute name. A name can be pre-defined and understood by the MRG Grid infrastructure, or it can be a custom name.
  • type - an enumerated string with values string, integer, float, expression or boolean.
  • value - the string form of the value.
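
The name, type and value parts can be sketched in Python as follows. The make_attribute helper is hypothetical, and the upper-case type codes are an assumption based on the STRING and INTEGER spellings used in the examples elsewhere in this guide; check the schema shipped with Aviary for the exact values and string forms it accepts.

```python
def make_attribute(name, value, is_expression=False):
    """Return a name/type/value dict with the value in string form.

    Hypothetical helper: the type code is inferred from the Python type
    of the value unless the caller marks it as an expression.
    """
    if is_expression:
        type_code = "EXPRESSION"
    elif isinstance(value, bool):   # test bool first: bool is a subclass of int
        type_code = "BOOLEAN"
    elif isinstance(value, int):
        type_code = "INTEGER"
    elif isinstance(value, float):
        type_code = "FLOAT"
    else:
        type_code = "STRING"
    return {"name": name, "type": type_code, "value": str(value)}

print(make_attribute("JobPrio", 2))
print(make_attribute("Recipe", "secret sauce"))
```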

4.4. JobStatus

A JobStatus exists in one of the following states:
  • idle - the job is waiting in the queue and is not currently assigned to a resource.
  • running - the job is assigned to and running on a resource.
  • held - the job exists in the MRG Grid queue but is held back from execution.
  • completed - the job ran to completion.
  • removed - the job was deleted from the job queue by a user.
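
A client may find it convenient to represent these states as an enumeration. The sketch below assumes lower-case state names as written in the list above; the spelling used on the wire is defined by the Aviary schema and may differ. The is_finished helper is hypothetical.

```python
from enum import Enum

class JobStatus(Enum):
    # Values mirror the state names listed above (assumed spelling).
    IDLE = "idle"
    RUNNING = "running"
    HELD = "held"
    COMPLETED = "completed"
    REMOVED = "removed"

def is_finished(status):
    """True for states in which the job is no longer a candidate to run."""
    return status in (JobStatus.COMPLETED, JobStatus.REMOVED)

print(is_finished(JobStatus("held")))
```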

4.5. ResourceConstraint

A ResourceConstraint is a basic quality that MRG Grid should consider when matching a new job to a resource. Aviary defines five basic constraints:
  • OS - Linux or Windows.
  • ARCH - INTEL for 32-bit platforms, or X86_64 for 64-bit platforms. This is important when the executable needed by the job is compiled for a particular architecture.
  • MEMORY - the expected total RAM required to execute the job.
  • DISK - the expected total disk space required to execute the job.
  • FILESYSTEM - the domain name representing a uniformly mounted network file system, as configured by a MRG Grid administrator.
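
In a submission request, each constraint is expressed as a type and value pair inside a requirements element, as the SOAP XML client example in this guide shows. The following sketch builds such an element with the Python standard library. The requirement helper is hypothetical, and no units are assumed for MEMORY or DISK.

```python
import xml.etree.ElementTree as ET

# The five constraint types listed above.
CONSTRAINT_TYPES = ("OS", "ARCH", "MEMORY", "DISK", "FILESYSTEM")

def requirement(ctype, value):
    """Build one <requirements> element holding a type/value pair."""
    if ctype not in CONSTRAINT_TYPES:
        raise ValueError("unknown constraint type: %s" % ctype)
    elem = ET.Element("requirements")
    ET.SubElement(elem, "type").text = ctype
    ET.SubElement(elem, "value").text = str(value)
    return elem

print(ET.tostring(requirement("OS", "LINUX"), encoding="unicode"))
```

Rejecting unknown types locally gives an earlier and clearer error than waiting for the service to refuse the request.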

Chapter 5. Job Submission and Management

Job Management is used for job control and reporting. Methods in job management include job submission, hold, release and removal.
Table 5.1. Job Submission and Management Operations
Operation: submitJob
Inputs: Job submission request fields are:
  • cmd - a string containing the absolute path to an executable or script
  • args - an optional string containing arguments for the cmd
  • owner - a string identifying the submitter
  • iwd - the initial working directory where the job will be executed
  • submission_name - an optional string identifying the submission that should be created or that this job is to be attached to
  • requirements - an optional list of ResourceConstraints that specify what type of resource this job should be targeted at
  • extra - an optional list of Attributes that refine the request beyond the basic fields or supersede the MRG Grid Attributes implied by other basic fields in this request
Outputs: 'OK' and the JobId, or an error containing diagnostic text if a problem was encountered.
Notes: MRG Grid users familiar with crafting specific attributes, such as complex requirements, may do so using the extra field in conjunction with the allowOverrides XML attribute in the request.

Operation: holdJob
Inputs: A single JobId and a hold reason in string format.
Outputs: 'OK', or an error with text if the job is not found or the JobId cannot be parsed.
Notes: A hold is a temporary interruption of job execution against a resource. Holds can be used to edit job attributes without needing to resubmit the job.

Operation: releaseJob
Inputs: A single JobId and a release reason in string format.
Outputs: 'OK', or an error with text if the job is not found or the JobId cannot be parsed.
Notes: Releasing a job moves it out of the held state and back to where it is ready to be scheduled again with a resource.

Operation: removeJob
Inputs: A single JobId and a remove reason in string format.
Outputs: 'OK', or an error with text if the job is not found or the JobId cannot be parsed.
Notes: Job removal means that the job is prevented from executing to completion. Note that a record of the job is still maintained in the MRG Grid queue.

Operation: setJobAttribute
Inputs: A single JobId and a single Attribute.
Outputs: 'OK', or an error with text if the job is not found or the JobId cannot be parsed.
Notes: Attributes are either predefined by MRG Grid or created by the user. For example, in name/type/value shorthand:
  • JobPrio/INTEGER/2 is the job attribute predefined by MRG Grid to control job priority, set to a value of 2, which gives the job higher priority than the default of 0
  • Recipe/STRING/secret sauce is a custom job attribute provided by a user, meaningful only to their application and irrelevant to the MRG Grid infrastructure
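
To show how the submitJob fields map onto a request, the following sketch assembles a SubmitJob SOAP envelope using only the Python standard library. The element names, their order and the namespaces are taken from the SOAP XML client example in this guide; the build_submit_job function is hypothetical, covers only the basic fields, and does not send anything to the service.

```python
import xml.etree.ElementTree as ET

SOAPENV = "http://schemas.xmlsoap.org/soap/envelope/"
JOB_NS = "http://job.aviary.grid.redhat.com"

def build_submit_job(cmd, owner, iwd, args=None,
                     submission_name=None, allow_overrides=False):
    """Return a SubmitJob SOAP envelope as a string (hypothetical helper)."""
    # Use the same prefixes as the SOAP XML example for readability.
    ET.register_namespace("soapenv", SOAPENV)
    ET.register_namespace("job", JOB_NS)
    envelope = ET.Element("{%s}Envelope" % SOAPENV)
    ET.SubElement(envelope, "{%s}Header" % SOAPENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAPENV)
    submit = ET.SubElement(
        body, "{%s}SubmitJob" % JOB_NS,
        allowOverrides="true" if allow_overrides else "false")
    ET.SubElement(submit, "cmd").text = cmd
    if args is not None:                  # optional field
        ET.SubElement(submit, "args").text = args
    ET.SubElement(submit, "owner").text = owner
    ET.SubElement(submit, "iwd").text = iwd
    if submission_name is not None:       # optional field
        ET.SubElement(submit, "submission_name").text = submission_name
    return ET.tostring(envelope, encoding="unicode")

print(build_submit_job("/bin/sleep", "ownername", "/tmp", args="40"))
```

In practice a web service toolkit generates this envelope from the WSDL; building it by hand is mainly useful for understanding or debugging the messages.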

Chapter 6. Job Data Queries

Table 6.1. Job Queries for Data
Operation: getJobStatus
Inputs: Zero to many JobIds.
Outputs: Returns the current status for each JobId input, or an error indicating that the job could not be parsed or found.
Notes: The most efficient query, as it returns the least amount of data per job.

Operation: getJobSummary
Inputs: Zero to many JobIds.
Outputs: Returns a summary for each JobId input, or an error indicating that the job could not be parsed or found.
Notes: The summary returned includes:
  • command
  • command arguments
  • scheduler local time when the job was added to the job queue
  • scheduler local time of the last update to the job status
  • job status
  • reason why the job was held, released or removed

Operation: getJobDetails
Inputs: Zero to many JobIds.
Outputs: Returns all Attributes for each JobId input, or an error indicating that the job could not be parsed or found.
Notes: A potentially expensive operation, because it is possible to request all the attributes for all the jobs tracked in MRG Grid. If performance is a concern, consider using summaries for certain job sets instead.

Operation: getJobData
Inputs: A single JobId, the type of data file content requested (ERR, OUT, LOG), the maximum number of bytes to be returned, and whether the file should be read from the front or from the back.
Outputs: Returns the file content requested if successful.
Notes: Each job can specify an error file (ERR), a log file (LOG) or an output file (OUT). The log file is used by MRG Grid to monitor the job's progress.

Operation: getSubmissionSummary
Inputs: Zero to many SubmissionIds.
Outputs: For each valid submission returned, these job totals will be listed:
  • completed
  • held
  • idle
  • removed
  • running
Notes: Individual job summaries can be included in the response by setting the XML attribute includeJobSummaries to true in the request.
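
The job totals in a submission summary are counts of jobs per state. The service computes them itself; the following local sketch only illustrates what the five totals represent, using a hypothetical submission_totals helper and lower-case state names as an assumed spelling.

```python
from collections import Counter

# The five totals listed for getSubmissionSummary.
STATES = ("completed", "held", "idle", "removed", "running")

def submission_totals(job_statuses):
    """Count jobs per state, reporting zero for states with no jobs."""
    counts = Counter(job_statuses)
    return {state: counts.get(state, 0) for state in STATES}

print(submission_totals(["idle", "running", "running", "completed"]))
```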

Chapter 7. Security

Aviary does not currently support HTTPS communication. A production deployment that requires transport-level security for authentication and encryption could place an SSL-capable reverse proxy server, such as Squid, in front of the Aviary web service endpoints. The reverse proxy server can be configured to forward HTTPS traffic to the HTTP endpoints. The iptables rules on the endpoint hosts would then be configured so that only the reverse proxy server can reach the HTTP ports remotely.

Chapter 8. Client Examples

Aviary clients can be developed using a variety of programming languages. Below are code examples of client actions using SOAP XML, Ruby and Python.

8.1. SOAP XML

The following example shows the request and response SOAP XML for a job submission.
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:job="http://job.aviary.grid.redhat.com">
   <soapenv:Header/>
   <soapenv:Body>
      <job:SubmitJob allowOverrides="false">
         <cmd>/bin/sleep</cmd>
         <!--Optional:-->
         <args>40</args>
         <owner>ownername</owner>
         <iwd>/tmp</iwd>
         <!--Optional:-->
         <submission_name>my_submission</submission_name>
         <!--Zero or more repetitions:-->
         <requirements>
            <type>OS</type>
            <value>LINUX</value>
         </requirements>
         <!--Zero or more repetitions:-->
         <extra>
            <name>MYDATA</name>
            <type>STRING</type>
            <value>the data</value>
         </extra>
      </job:SubmitJob>
   </soapenv:Body>
</soapenv:Envelope>

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
   <soapenv:Body>
      <n:SubmitJobResponse xmlns:n="http://job.aviary.grid.redhat.com">
         <id>
            <job>247.0</job>
            <pool>localhost</pool>
            <scheduler>username@localhost.localdomain</scheduler>
            <submission>
               <name>my_submission</name>
               <owner>username</owner>
            </submission>
         </id>
         <status>
            <code>OK</code>
            <text/>
         </status>
      </n:SubmitJobResponse>
   </soapenv:Body>
</soapenv:Envelope>
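
A client that handles the raw XML itself needs to pull the status code and the job identifier out of the response. The following sketch does this with the Python standard library, using a shortened copy of the response shown above. The parse_submit_response function is hypothetical; a web service toolkit would normally do this work.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the SubmitJobResponse from the example above.
RESPONSE = """<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
   <soapenv:Body>
      <n:SubmitJobResponse xmlns:n="http://job.aviary.grid.redhat.com">
         <id>
            <job>247.0</job>
            <pool>localhost</pool>
            <scheduler>username@localhost.localdomain</scheduler>
         </id>
         <status>
            <code>OK</code>
            <text/>
         </status>
      </n:SubmitJobResponse>
   </soapenv:Body>
</soapenv:Envelope>"""

# Prefix-to-namespace map used when searching the document.
NS = {"soapenv": "http://schemas.xmlsoap.org/soap/envelope/",
      "n": "http://job.aviary.grid.redhat.com"}

def parse_submit_response(xml_text):
    """Return (status code, job string) from a SubmitJobResponse document."""
    root = ET.fromstring(xml_text)
    resp = root.find("soapenv:Body/n:SubmitJobResponse", NS)
    # The child elements carry no namespace, so plain paths are used.
    return resp.findtext("status/code"), resp.findtext("id/job")

print(parse_submit_response(RESPONSE))
```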

8.2. Ruby

The following example shows a Ruby Savon web service client that generates a basic submission.
# uses Savon http://savonrb.com/
require 'rubygems'
# httpi >= 0.9.2
require 'httpi'
# savon >= 0.9.1
require 'savon'
require 'openssl'

# read the local job WSDL and point the client at the submitJob endpoint
client = Savon::Client.new do |wsdl|
  wsdl.document = "/var/lib/condor/aviary/services/job/aviary-job.wsdl"
  wsdl.endpoint = "http://localhost:9090/services/job/submitJob"
end

# build the basic submission fields for the request body
xml = Builder::XmlMarkup.new
xml.cmd("/bin/sleep")
xml.args("40")
xml.owner("condor")
xml.iwd("/tmp")

# send the SubmitJob request in the Aviary job namespace
response = client.request :job, "SubmitJob" do
  soap.namespaces["xmlns:job"] = "http://job.aviary.grid.redhat.com"
  soap.body = xml.target!
end

8.3. Python

The following example shows a Python Suds web service client that invokes a job query operation chosen by the user on the command line. The script also takes an optional job identifier argument; if it is omitted, the query covers all jobs.
# uses Suds - https://fedorahosted.org/suds/
import logging
from sys import exit, argv

from suds.client import Client

# enable these to see the SOAP messages
#logging.basicConfig(level=logging.INFO)
#logging.getLogger('suds.client').setLevel(logging.DEBUG)

# change these for other default locations and ports
job_wsdl = 'file:/var/lib/condor/aviary/services/query/aviary-query.wsdl'

cmds = ['getJobStatus', 'getJobSummary', 'getJobDetails']

# arguments: <command> [job id] [service url]
cmdarg = argv[1] if len(argv) > 1 else None
cproc = argv[2] if len(argv) > 2 else None
job_url = argv[3] if len(argv) > 3 else "http://localhost:9091/services/query/"

if cmdarg not in cmds:
    print("error unknown command: %s" % cmdarg)
    print("available commands are: %s" % ", ".join(cmds))
    exit(1)

client = Client(job_wsdl)
job_url += cmdarg
client.set_options(location=job_url)

# enable to see service schema
#print(client)

# set up our JobID
if cproc:
    jobId = client.factory.create("ns0:JobID")
    jobId.job = cproc
else:
    # returns all jobs
    jobId = None

try:
    # look up the operation by name on the service proxy and invoke it
    func = getattr(client.service, cmdarg)
    result = func(jobId)
except Exception as e:
    print("invocation failed: %s" % job_url)
    print(e)
    exit(1)

print(result)

Chapter 9. More Information

Reporting Bugs
Follow these instructions to enter a bug report:
  1. You will need a Bugzilla account. You can create one at Create Bugzilla Account.
  2. Once you have a Bugzilla account, log in and click on Enter A New Bug Report.
  3. You will need to identify the product (Red Hat Enterprise MRG), the version (2.0), and whether the bug occurs in the software (component=grid) or in the documentation (component=Grid_Developer_Guide).
Further Reading
Red Hat Enterprise MRG and MRG Grid Product Information
MRG Grid User Guide and other Red Hat Enterprise MRG manuals
Condor Manual
Red Hat Knowledgebase

Revision History

Revision 1-3, Wed Sep 07 2011, Alison Young
Prepared for publishing
Revision 1-2, Tue Aug 23 2011, Alison Young
BZ#731649 - Change getSubmissionSummaries to getSubmissionSummary
Revision 1-1, Thu Jun 23 2011, Alison Young
Prepared for publishing
Revision 1-0, Thu Jun 23 2011, Alison Young
Prepared for publishing
Revision 0.1-5, Thu Jun 02 2011, Alison Young
BZ#674385 - Minor updates
Revision 0.1-4, Mon May 09 2011, Alison Young
Minor XML updates
Revision 0.1-3, Thu May 05 2011, Alison Young
BZ#674385 - Restructured book and inserted additional source content provided
Revision 0.1-2, Wed Mar 30 2011, Alison Young
Inserted Submit, Hold and Release method descriptions in Job Management
Revision 0.1-1, Thu Mar 3 2011, Alison Young
Added skeleton chapters and sections
Revision 0.1-0, Thu Mar 3 2011, Alison Young
Initial creation of book by publican