Submitting Jobs
The "Submit Job" screen allows you to start an SnB
job on the local computer or submit it to a batch processing system, such
as PBS or LoadLeveler, if one is available. It also supports submission
to Condor, a system that scavenges unused computing time on a network
of workstations (for more information on Condor, see http://www.cs.wisc.edu/condor/). These
options provide you with convenient ways to take maximum advantage of
the inherently parallel nature of the Shake-and-Bake algorithm
by dividing the trial structures among as many processors as possible.
Thus, jobs can be run in several parts with each subjob creating its own
set of output files. The results, however, are combined for inspection
using the tools provided by the Evaluate Trials screen.
We (the SnB developers) have a limited number of platforms available
for development and testing. Your system configuration may differ from
ours, and the batch submission options may not work as expected. In that
case, please contact us at snbhelp@hwi.buffalo.edu so that we
can work with you to support your configuration.
There are three sections on this screen: Required Information,
Local Options, and Batch Options. The required information
must be supplied. Whether or not the other sections need to be completed
depends on the choices you make in the required information section.
- Required Information
- Queueing System: Select the queueing system you would
like to use.
None (local machine) runs the job on the machine where the
GUI is running. If you are using X-Windows, note that this is not
necessarily the same as the machine where the GUI is being displayed.
PBS will submit the job to a PBS queue. The 'qsub' program
must be installed and configured on your local machine, even if
PBS is actually submitting jobs to a remote machine.
LoadLeveler submits a job to a LoadLeveler queue on an IBM
SP system.
Condor allows submission to a Condor flock.
- Don't Run SnB: Clicking "Yes" generates
the dat files required to run SnB without actually starting
the job. This is useful if you want to run SnB via a batch
queueing system that is not supported directly by SnB. Given
the dat files, you can write a script that submits the job to the
batch queueing system in use at your site (a sketch of such a
wrapper appears after this section).
- File name prefix for results: All
files that are generated by the SnB run will start with the
prefix entered here. Appended to this prefix will be an underscore
and a number ranging from zero to one less than the number of SnB
processes you request (see the next variable). For example, a prefix
of myjob with four processes produces files beginning with myjob_0
through myjob_3. Do NOT use an underscore in the prefix name itself
(hyphens are OK).
- Number of SnB processes to run: If the local
run method is selected, the GUI will initiate this many processes
on the local machine. If you select one of the batch methods (PBS,
LoadLeveler, Condor), this variable indicates the
number of nodes to be requested from the batch queueing system.
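If you select "Don't Run SnB" because your site's queueing system is
not supported directly, a small wrapper along the following lines can
turn the generated dat files into one subjob each. This is only a
sketch: the prefix, the process count, the snb command line, and the
submission command are placeholders that must be adapted to your site
and your SnB installation.

    #!/bin/sh
    # Create one small job script per generated dat file
    # (myjob_0.dat, myjob_1.dat, ...) so that each subjob can be
    # handed to your site's own submission command.  PREFIX, NPROC,
    # and the snb command line are placeholders.
    PREFIX=myjob          # matches "File name prefix for results"
    NPROC=4               # matches "Number of SnB processes to run"
    i=0
    while [ "$i" -lt "$NPROC" ]; do
        script="${PREFIX}_${i}.job"
        printf '#!/bin/sh\ncd %s\nsnb < %s_%d.dat > %s_%d.log\n' \
            "$(pwd)" "$PREFIX" "$i" "$PREFIX" "$i" > "$script"
        chmod +x "$script"
        echo "would submit $script"   # replace with qsub, bsub, sbatch, ...
        i=$((i + 1))
    done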
- Local Options
- Priority: Used to choose the "nice" value at
the time of job submission. If you are sharing a machine and wish
to run a background job, choose "low" priority.
- Process jobs: When you have finished
filling in all the required fields, click this button to begin processing
the job.
- Batch Options
- Queue: Select the queue for PBS and LoadLeveler jobs.
Condor does not support different queues.
- Copy input files to remote machine(s): Select "yes"
if you want to copy all input files to the machine(s) where the job
will be run. When SnB is finished, it will copy the output
files back to the working directory on the local machine. Copying
the files does not greatly improve overall performance, since the
only significant I/O occurs at the start of the job. However,
transferring the input files to remote cluster machines is still
recommended: such machines typically have low disk and network
I/O performance, so their network and disk subsystems can become
overloaded when a job starts (a sketch of this staging step appears
at the end of this section).
- Remote directory: The directory for staging files. You
need to supply this information only if you selected "yes"
for "copy input files to remote machine." If your batch
environment provides a temporary directory name in an environment
variable, you can enter that here.
- Queue type: Your choices are serial, parallel (shared
memory), and parallel (cluster). For example, suppose you entered
"8" for the number of SnB processes to run (in
the required information section). Choosing serial would
cause eight single-processor jobs to be submitted to the queue that
you selected. Both parallel selections will submit a single eight-processor
job. The difference between the two is that the parallel shared
memory option will use cp to stage files whereas the
parallel cluster option uses rcp (a shared file system
is not assumed). When running LoadLeveler jobs, you are not prompted
for this item.
Shared memory machines include the SGI Origin2000, Sun Enterprise
10000, and any other machine that has multiple processors in the
same physical unit. On these machines you should select parallel
shared memory as the queue type.
Cluster machines include the IBM SP and Beowulf-style clusters.
Clusters consist of two or more distinct computers that are coupled
together via software. For these machines you should select parallel
cluster as the queue type.
Serial can be chosen for either shared memory or cluster
computers. Whether you choose serial or one of the parallel options
is a matter of preference. One serial job will start up when a single
processor is free. On the other hand, a parallel job that requires
n processors will have to wait until n processors are
free. Your computing site will also have limits on how many jobs
you can have running as well as how many processors you can allocate
for a parallel job. These limits will also influence which option
you should choose. If you are unsure, you should contact the administrator
of the machine you are using.
- Tasks per node (LoadLeveler only): The number of tasks
to start on each SP node. If you are using SMP nodes, you can
set this number to the number of processors in each node. The
total number of processors that your job will use is then
(tasks per node)*(number of nodes); for example, 4 tasks per
node on 2 nodes uses 8 processors.
- Number of nodes (LoadLeveler only): The number of nodes
to allocate for the job.
- Process jobs: When you have finished filling in all the
required fields, click this button to submit your job to the batch
system that you have selected.
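The staging step behind "Copy input files to remote machine(s)",
"Remote directory", and the cp/rcp distinction between the two
parallel queue types can be pictured roughly as follows. This is only
a sketch under assumptions: STAGE_DIR, node1, and the file names are
placeholders, and your batch environment may stage files differently.

    #!/bin/sh
    # Rough illustration of file staging for a parallel (cluster) job.
    # STAGE_DIR stands in for the "Remote directory" field and node1
    # for a compute node; both are placeholders.  A parallel (shared
    # memory) job would use cp instead of rcp, since a shared file
    # system can be assumed.
    STAGE_DIR=${TMPDIR:-/tmp/snb_stage}
    PREFIX=myjob

    # stage the input files on the compute node before the run
    rcp "${PREFIX}"_*.dat "node1:${STAGE_DIR}/"

    # ... the subjobs then run in STAGE_DIR on the compute node ...

    # when SnB is finished, copy the output files back to the local
    # working directory
    rcp "node1:${STAGE_DIR}/${PREFIX}_*" .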