CCMST computer resources are for authorized research and educational purposes only. Use of CCMST computer facilities is subject to the campus computer and network usage, data access, and World Wide Web policies developed by OIT. Any use in violation of these policies may lead to termination of your CCMST computer accounts.
All published work that includes calculations performed on CCMST computers should include the following statement in its acknowledgements: "Computations were supported by the Center for Computational Molecular Science and Technology at the Georgia Institute of Technology and partially funded through a Shared University Research (SUR) grant from IBM and the Georgia Institute of Technology." A paper or electronic copy of the work should also be provided to the CCMST to help us satisfy our reporting requirements.
|
Computing needs of CCMST members and external users are served by the CCMST IBM RS/6000 SP, referred to as the CCMST SP for brevity. The CCMST SP consists of 18 "WinterHawk II" nodes connected by an SP switch which provides high-bandwidth communication between the nodes. Each node has four 375-MHz POWER3-II processors, for a total of 72 processors. The machine has a total of 47 GB RAM and 756 GB temporary disk storage. The control workstation, a model IBM RS/6000 F50, hosts an additional 108 GB of long-term disk storage.
|
| Software Title | Version | Description |
| IDL | 5.4 | Powerful data analysis and visualization package |
| Jaguar | 4.1.059 | Very robust ab initio quantum chemistry package for treating large systems |
| Molcas | 5.0 | Ab initio quantum chemistry package useful for treating molecular systems with degeneracies |
| MPQC | 2.1.3 | Next generation massively parallel ab initio quantum chemistry package |
| NAMD | 2.5b1 | Parallel classical molecular dynamics package |
| Psi | 3.2 | Ab initio quantum chemistry package under active development by a number of groups, including CCMST |
| Q-Chem | 2.0 | Robust production level ab initio quantum chemistry package |
For more specific information on installed software, refer to the Software section of our website.
|
All access to the SP occurs through the control workstation (CW), cgate.chemistry.gatech.edu, only. The CW is the central server for the system. It maintains user databases, hosts users' home directories and software installations, and distributes users' jobs among the nodes of the SP. To access the CW you must have a valid account, which may be obtained by filling out an account application.
After you have established an account, the next step is to log in to the control workstation. A remote login to cgate.chemistry.gatech.edu is possible with an SSH-enabled client only (see below for more information on SSH). When you log into the system for the first time, you should become familiar with the directory structure of the control workstation. Most user home directories reside under /cgate/common; for example, the home directory of user yourusername would be /cgate/common/yourusername. This is your permanent storage area: files there will never be removed. However, each user may store only up to 100 Megabytes of data in the home directory. Users may temporarily store larger files in /cgate/scratch/yourusername. There is no quota on the amount one user can store in that area, but old files are cleaned up automatically.
The directory structure on the nodes of the SP, where jobs actually run, is almost identical. /cgate/common/yourusername and /cgate/scratch/yourusername are accessible from every node. In addition, each node has its own high-bandwidth storage for temporary files, which your jobs can read and write under /scratch/yourusername. The amount of available space in that directory varies from node to node but is at least 15 GBytes.
A possible strategy for using these storage areas is to keep heavily accessed and/or large temporary files in the node's /scratch/yourusername, to keep temporary or data files that you want to transfer later to your local machine for further analysis in /cgate/scratch/yourusername, and to keep permanent application data, job input and output files, etc. in /cgate/common/yourusername.
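The storage strategy described above might be sketched as the following shell function. This is only an illustration: the function name run_staged and its argument order are hypothetical, and the choice to save every *.out file is an assumption about which results matter.

```shell
# Sketch: run a program in fast node-local scratch, then save its results.
# run_staged and its argument order are hypothetical, for illustration only.
# Usage: run_staged INPUT_FILE LOCAL_SCRATCH_DIR RESULTS_DIR COMMAND [ARGS...]
run_staged() {
    input=$1      # input file kept in permanent storage
    scratch=$2    # e.g. /scratch/yourusername on the node
    results=$3    # e.g. /cgate/scratch/yourusername
    shift 3       # the remaining arguments are the command to run

    mkdir -p "$scratch" "$results" || return 1
    cp "$input" "$scratch/" || return 1

    # Run inside the scratch directory so large temporary files land there.
    ( cd "$scratch" && "$@" ) || return 1

    # Assumption: only *.out files are worth keeping for later transfer.
    cp "$scratch"/*.out "$results"/ || return 1
}
```

Inside a job script this could be called, for example, as run_staged /cgate/common/yourusername/chem/file.in /scratch/yourusername /cgate/scratch/yourusername my.exe file.in file.out.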
To transfer files between your machine and the SP control workstation you have two choices:
There are a number of choices available for editing text files. The most common UNIX editor is vi. If you are completely unfamiliar with vi, we recommend you read this online vi tutorial. For more experienced users, we recommend this vi reference manual. Another popular choice for text file editing is emacs, which is rather more powerful than vi. If you want to learn how to use emacs, refer to this tutorial.
|
Some useful utilities and software packages produce visual data or require a graphical
user interface for their normal execution. If you have a Unix workstation with a graphical
window manager running, then it is very easy to use such programs. When you log into
the control workstation using a secure shell client, you will be able to execute graphical
commands and see their output on your screen.
The situation is more difficult for Windows and MacOS users. You might have to install
special X server software to be able to use graphical programs on cgate.
We provide the last free version of the MicroImages MI/X server for Windows,
TNTlite MIXServer v5.6.
MicroImages also makes the MI/X server for MacOS, which is freely available
here (both PowerPC
and 68k series versions are available). Full-featured X Window servers
can be purchased. Here is a listing of available commercial X Window servers:
|
If there is a software package that you need for your work, you may have it installed for you, or you may get permission to install it yourself. Only licensed software products or freeware obtained from established sources may be installed on the CCMST SP. You may obtain permission to install the software package, or request assistance with installing it, by contacting the CCMST system administrator.
The CCMST SP also has a rich set of development tools that enable users to develop their own computer codes. These include the IBM C for AIX compiler version 5, IBM Visual Age C++ compiler version 5, IBM XL Fortran compiler version 7.1, Perl 5.5, GNU make version 3.79.1, GNU autoconf version 2.13, GNU bison version 1.28, GNU flex version 2.5.4, IBM Engineering and Scientific Subroutines Library (ESSL) version 3.2, IBM Parallel ESSL version 2.2, IBM Parallel Operating Environment 3.1 with Message Passing Library (MPI), the libpthreads library, and more. Documentation for the IBM development tools can be found on the control workstation: log into cgate, make sure an X server is running on your machine (see the section on how to access the machine), and execute the command netscape http://control/. Alternatively, one may visit the IBM AIX documentation library. Manual pages for GNU tools can be read on cgate via the man command. Additional information on software development tools may be found in the Software section.
|
Introduction
All production jobs can only be executed on the SP via the IBM queueing software called
LoadLeveler (current version 2.2). This program keeps track of all the jobs that have
been submitted, prioritizes them, and distributes them among the SP nodes when the
requested resources (memory, disk, processors) become available. It can handle both serial (single-processor)
and parallel jobs.
Here we give a brief tutorial on how to start using LoadLeveler. Refer to the user guide to LoadLeveler for more information.
How to submit a job to LoadLeveler
There are two ways to submit a job to LoadLeveler. You may prepare a command file using your favorite
text editor and feed it to LoadLeveler with the llsubmit command. Or, if you have an X server running on your machine,
you may use xloadl, an easy-to-use graphical user interface to LoadLeveler. Even if you plan to use xloadl
exclusively, we strongly recommend you learn the syntax of LoadLeveler command files first.
|
Each job needs a LoadLeveler command file; we will assume for the sake of
convention that these are named with a .cmd suffix, although this is
not required. A typical LoadLeveler command file for running an executable my.exe
in a single-processor mode might look as follows:
#!/bin/csh
# @ job_type = serial
# @ initialdir = /cgate/common/username/chem
# @ notify_user = your@email.address
# @ account_no = useraccount
# @ input = /dev/null
# @ output = $(jobid).stdout
# @ error = $(jobid).err
# @ class = cpu
# @ notification = complete
# @ checkpoint = no
# @ restart = no
# @ requirements = (Arch == "power3") && (OpSys == "AIX43")
# @ queue
my.exe file.in file.out
This sample job runs the command my.exe file.in file.out
in the directory /cgate/common/username/chem (you
need to ensure that the command my.exe is in your
path). After
completion of the job, an email message is sent to your@email.address.
This job is charged to account useraccount.
As configured, this job will read standard input from /dev/null
(which means it is a batch job and does not require any user input), and
send standard error and standard output to
$(jobid).err and $(jobid).stdout,
respectively, where $(jobid) is the id number assigned to the job
by LoadLeveler when the job is submitted.
Alternatively, the user may wish to rename these files
according to the job being run (for example, file.stdout and
file.err). These files are usually only of interest if the job
crashes, since they might contain debugging information (although some
programs will send their usual output to stdout). Typically, these
files should be cleaned up after the results of the job have been checked.
Note the class = cpu setting. LoadLeveler is configured with several different classes (think of them as queues), and this is where the class is selected. The valid classes are listed below.
| Class name | CPU Time Limit | Memory Limit | Comments |
| interactive | 30 minutes | 1 Gigabyte | For very short jobs |
| quick | 24 hours | 1 Gigabyte | For medium length CPU-intensive jobs |
| cpu | 14 days | 1 Gigabyte | Most jobs fall into this category |
| long | 90 days | 2 Gigabyte | Only 2 jobs of this class can run at the same time. For super-long jobs only. |
| io | 14 days | 1 Gigabyte | Should be used for I/O-intensive medium to long jobs only |
| special | 365 days | 1 Gigabyte | By default, no one is allowed to use this class. If you need to run jobs of this length, please contact the CCMST system administrator. Only 1 such job can be run at any time. |
The CPU time limit is the limit on the total CPU time of the job. For parallel jobs, it means the sum of the tasks' CPU times. Hence, while 90 days is a lot of CPU time, a job with 12 tasks will consume 90 days of CPU time in 7.5 days of wall-clock time, assuming 100% execution efficiency. Choose the class for your jobs with this in mind.
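The arithmetic above can be sketched as a small shell helper. The function name wall_clock_days and the optional efficiency argument are hypothetical, introduced only for illustration; the formula is simply the CPU time limit divided by the number of tasks times their efficiency.

```shell
# Sketch: wall-clock days until a parallel job exhausts its class CPU limit.
# wall_clock_days is a hypothetical helper, not a LoadLeveler command.
# Usage: wall_clock_days CPU_LIMIT_DAYS TASKS [EFFICIENCY]
wall_clock_days() {
    # EFFICIENCY defaults to 1, i.e. every task is busy 100% of the time.
    awk -v limit="$1" -v tasks="$2" -v eff="${3:-1}" \
        'BEGIN { printf "%.1f\n", limit / (tasks * eff) }'
}

wall_clock_days 90 12    # the example from the text: prints 7.5
```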
Class cpu is intended for longer computations (up to 14 days, requiring up to 1GB RAM). Other classes include quick (up to 24 hours, up to 1GB RAM) and interactive (up to 30 minutes, up to 1GB RAM). Any job going beyond the specified limits is killed automatically.
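As a rough aid for choosing a class, the limits in the table might be encoded as follows. This is a sketch under stated assumptions: pick_class is a hypothetical helper, 1 Gigabyte is taken to mean 1024 MB, and the io and special classes are deliberately left out because they depend on the job's I/O behavior or on administrator approval.

```shell
# Sketch: pick the smallest class whose limits cover a job.
# pick_class is a hypothetical helper; limits are copied from the class table.
# Usage: pick_class CPU_HOURS MEMORY_MB
pick_class() {
    awk -v h="$1" -v mb="$2" 'BEGIN {
        if (mb > 2048)         print "none";            # exceeds every class
        else if (mb > 1024)    print (h <= 90 * 24 ? "long" : "none");
        else if (h <= 0.5)     print "interactive";     # 30 minutes
        else if (h <= 24)      print "quick";           # 24 hours
        else if (h <= 14 * 24) print "cpu";             # 14 days
        else if (h <= 90 * 24) print "long";            # 90 days
        else                   print "none";            # ask about class special
    }'
}
```

For example, pick_class 100 1500 reports long, since only that class allows more than 1 Gigabyte of memory.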
To gain more confidence with LoadLeveler, let us submit a simple job. Unix provides a command hostname, which prints the canonical name of the machine on which it is run to stdout. Let's set up a job which will execute the hostname command on one of the nodes. The command file will look like this:
#!/bin/csh
# @ job_type = serial
# @ account_no = useraccount
# @ initialdir = /cgate/common/username
# @ notify_user = username@control
# @ input = /dev/null
# @ output = hostname.stdout
# @ error = hostname.err
# @ class = interactive
# @ notification = complete
# @ checkpoint = no
# @ restart = no
# @ requirements = (Arch == "power3") && (OpSys == "AIX43")
# @ queue
/bin/hostname
Just cut and paste this into some file (say, hostname.cmd), replace useraccount with the account name you were assigned when you registered with CCMST, and username with your actual user name. That simple. Note that we specified class = interactive, since this job will take no time at all to run.
Once the .cmd file is prepared, it's a simple matter to submit it to LoadLeveler. On cgate, simply type
llsubmit hostname.cmd
where the LoadLeveler command file is named hostname.cmd. This
will verify the job's username, account name, and class, and feed the job to LoadLeveler,
which will place it in the appropriate queue.
Perhaps the easiest and most practical way to create LoadLeveler command files is to use one of the generic templates provided in /home/loadl/templates. There you will find templates for serial and parallel jobs. Then use one of them to create a command file template specific for your purposes, and just use that template in your work.
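If you prefer to generate command files rather than edit a template by hand, something along these lines could work. It is a sketch only: make_serial_cmd is a hypothetical helper, not one of the templates in /home/loadl/templates, and the keywords it writes are simply those of the serial examples above.

```shell
# Sketch: print a serial LoadLeveler command file to standard output.
# make_serial_cmd is a hypothetical helper, for illustration only.
# Usage: make_serial_cmd ACCOUNT INITIALDIR CLASS NOTIFY_USER COMMAND [ARGS...]
make_serial_cmd() {
    account=$1; initialdir=$2; class=$3; notify=$4
    shift 4    # the remaining arguments form the command line of the job
    # \$(jobid) is escaped so that LoadLeveler, not this shell, expands it.
    cat <<EOF
#!/bin/csh
# @ job_type = serial
# @ account_no = $account
# @ initialdir = $initialdir
# @ notify_user = $notify
# @ input = /dev/null
# @ output = \$(jobid).stdout
# @ error = \$(jobid).err
# @ class = $class
# @ notification = complete
# @ checkpoint = no
# @ restart = no
# @ requirements = (Arch == "power3") && (OpSys == "AIX43")
# @ queue
$*
EOF
}
```

For instance, make_serial_cmd useraccount /cgate/common/username interactive username@control /bin/hostname > hostname.cmd writes a file that can then be passed to llsubmit.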
Once a job or jobs have been submitted, their status may be monitored with the llq command, which lists all jobs in the queues. Since this can be a fairly long list, a particular user's jobs can be queried using llq -u username.
Sometimes you may want to cancel a job that has been submitted, or even started running. To cancel a job jobid, issue llcancel jobid. To cancel all of your jobs, use llcancel -u username.
|
Many computations, especially highly accurate quantum computations, require a lot of memory. To keep compute nodes from running out of memory, the command file should specify the memory requirements of the job if they exceed 256 MB. This is done via the ConsumableMemory resource:
# @ resources = ConsumableMemory(1024)
In the example shown, LoadLeveler will reserve 1024 MB of memory for this job. For parallel jobs the memory requirements are specified on a per-task basis (see here).
|
To run in parallel on the SP, the software (whether obtained from an outside source or written by you) has to be written and compiled in a special way. How you run it depends on which parallel model your software uses:
# @ resources = ConsumableCpus(N)
where N is the number of processors. Usually the number of processors should be set to the number of compute threads.
/usr/bin/poe /your/mp/program/name
/usr/bin/poe can only be invoked from within a parallel LoadLeveler job (when job_type = parallel in the command file). LoadLeveler will handle the allocation of the necessary number of tasks, setting up the environment, etc. automatically (see the section on parallel jobs in the LoadLeveler user's guide).
It is imperative that you specify memory requirements for the parallel program. The amount of ConsumableMemory resource (see here) should specify how much memory each MPI task will require.
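To make the per-task bookkeeping concrete, the helper below prints the parallel-specific lines of a command file. Treat it as a sketch under assumptions: parallel_directives is a hypothetical name, the node and tasks_per_node keywords do not appear elsewhere on this page and should be checked against the LoadLeveler user's guide, and the limit of 4 tasks per node is inferred from the four processors in each node.

```shell
# Sketch: print the parallel-specific directives for an MPI job.
# parallel_directives is a hypothetical helper; the node and tasks_per_node
# keywords are assumptions to verify in the LoadLeveler user's guide.
# Usage: parallel_directives NODES TASKS_PER_NODE MB_PER_TASK
parallel_directives() {
    nodes=$1; tasks=$2; mb=$3
    # Each node has four processors, so more than 4 tasks would oversubscribe it.
    if [ "$tasks" -gt 4 ]; then
        echo "at most 4 tasks per node" >&2
        return 1
    fi
    cat <<EOF
# @ job_type = parallel
# @ node = $nodes
# @ tasks_per_node = $tasks
# @ resources = ConsumableMemory($mb)
EOF
    # Plain comment line summarizing what the request adds up to.
    echo "# total MPI tasks: $((nodes * tasks)); memory reserved per node: $((tasks * mb)) MB"
}
```

Because ConsumableMemory is counted per task, parallel_directives 2 4 512 describes 8 tasks that together reserve 2048 MB on each of the two nodes.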
|
There is also a graphical front-end to LoadLeveler submission and querying, called xloadl. If you are logging in from a machine running the X-Window system, you may invoke the graphical program by typing xloadl &.
The main xloadl window is split into 3 parts. The top pane is called "Jobs", and shows each job submitted to LoadLeveler (the content is identical to the output of llq command). The middle pane is called "Machines", and shows each machine that LoadLeveler is aware of. The bottom pane is called "Messages", and shows responses of LoadLeveler to various actions that you tell it to perform. Each pane, except "Messages", has its own pull-down menu.
The "Jobs" pane is the one most commonly used by regular users. It lets you build and submit jobs, check their status, cancel unnecessary jobs, etc. The next few paragraphs go through the most common job-related functions of xloadl, all of which use the "Jobs" pane.
Let's use xloadl to build a job identical to the one we submitted in the previous subsection. To build a job, pull down the File menu, click Build a Job..., and select the type of job you want to submit. Valid choices here are serial (single-processor job) and parallel (multiprocessor job using the Message Passing Interface (MPI) library). We will assume a single-processor job for now, so choose serial here. This will create another window called "Build a Job". It contains a web-like submission form with a number of entries. Luckily, most of those entries do not need to be filled in. The most important entries are
Other important entries, which you may need for more complex jobs, can be accessed by pressing the "Requirements" and "Limits" buttons located on a pane in the lower right corner. The most commonly used requirements are for memory and disk, specified in megabytes. The most useful limit is the wall clock limit, specified in seconds; it can be used to limit the job length to below the default value for the class. This may allow the job to be picked for execution sooner, so it makes sense to set this limit if you can estimate the length of the job. For this simple job you do not need to specify any additional requirements or limits.
To submit the job, press the "Submit" button at the bottom of the "Build a Job" window. In the main LoadLeveler window
you should see a message saying that the job has been submitted and what job ID it was assigned, or describing the reason why the job could not be submitted.
Once a job has been submitted, you will be able to find it in the "Jobs" pane. You can then monitor the state of the job by using
commands in the "Actions" menu.
You also have the option to construct a LoadLeveler command file from the information
entered in the "Build a Job" window by pressing the "Save" button. This is a useful feature
that you may use to construct command file templates as you become more experienced with
LoadLeveler.
|
Each user account is allowed to use one or more CCMST CPU accounts, which are used to keep track of CPU usage. Each account has a certain quota. Once the quota is exceeded, the account cannot be used until more hours are added to it. To check the status of a CCMST CPU account, use the llacctinfo command. Given a CPU account name, it returns the account's quota and the number of CPU hours that have been charged to the account.
|
It is Georgia Tech policy that remote logins to campus servers are permitted only with secure shell clients. The secure shell protocol uses sophisticated encryption schemes to provide secure communication between machines. If you do not have a secure shell client installed on your machine, refer to the OIT page on ssh, where you may obtain licensed copies of a Windows ssh client and find links to ssh implementations for MacOS, Linux, and other platforms. Solaris users may find OpenSSH binaries on Sunfreeware.com. We should also mention the FreeSSH website, which contains links to freely available implementations of ssh clients for a multitude of platforms.
|
For users of the dgate cluster, we are using the PBS batch queue system. The basic commands are summarized below:
The structure of the command file is a little unusual. PBS reads lines that start with #PBS as directives; they are not treated as comments. A doubled comment character (##PBS) comments a directive out. An example job control file is in ~evaleev/PBS/test.cmd. The minimal information is in the sample below:
# Sample PBS script
# How much memory to reserve
#PBS -l mem=200mb
# How much (CPU) time to request (determines queue). The below
# requests 3 hours, 10 minutes
#PBS -l cput=3:10:00
# Uncomment (i.e., delete one of the comment chars) this for sherrill
# group I/O heavy jobs
##PBS -q sio
# How many nodes and processors you want. For Sherrill group, we had
# to hack the system and pretend that each node, which has 2 processors,
# has 5 "virtual processors". Queue sio (above) defaults to ppn=3,
# which will prevent another I/O job from running on the same queue.
# For Sherrill group, double the number of processors you really want,
# and add one if you need I/O. Note: ppn will default to 3 if you uncomment
# the PBS -q sio line above. If you use the following line, you will
# override that ppn. So, usually use the line above for sio, or the
# one below, but not both.
##PBS -l nodes=1:ppn=2
# This tells PBS what directory to go to
cd $PBS_O_WORKDIR
# Then just put the command below. For example,
runqchem h2o.in h2o.out
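The ppn rule described in the script comments (double the number of processors you really want, and add one if you need I/O) can be written out as a tiny helper. The name sherrill_ppn and its yes/no argument are hypothetical, used here only to illustrate the arithmetic.

```shell
# Sketch: ppn value for a Sherrill group job, following the rule in the
# sample script comments. sherrill_ppn is a hypothetical helper.
# Usage: sherrill_ppn PROCESSORS IO    (IO is "yes" or "no")
sherrill_ppn() {
    ppn=$(( 2 * $1 ))            # double the real processor count
    if [ "$2" = "yes" ]; then
        ppn=$(( ppn + 1 ))       # one extra virtual processor for I/O
    fi
    echo "$ppn"
}

sherrill_ppn 1 no     # prints 2, as in the nodes=1:ppn=2 line above
sherrill_ppn 1 yes    # prints 3, the default for queue sio
```

A job that wants both real processors of a node plus I/O gets 5, which uses up all five virtual processors of that node.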