Cgate

From CCMSTWiki

Policies for Use

CCMST computer resources are for authorized research and educational purposes only. Use of CCMST computer facilities is subject to campus computer and network usage, data access, and World Wide Web policies[1] developed by OIT[2]. Any use in violation of these policies may lead to termination of your CCMST computer accounts.

All published work that includes calculations performed on CCMST computers should include the following statement in their acknowledgements: "Computations were supported by the Center for Computational Molecular Science and Technology at the Georgia Institute of Technology and partially funded through a Shared University Research (SUR) grant from IBM and the Georgia Institute of Technology." A paper or electronic copy of the work should also be provided to the CCMST to help us satisfy our reporting requirements.

Computing Resources

Computing needs of CCMST members and external users are served by the CCMST IBM RS/6000 SP,[3] referred to as the CCMST SP for brevity. The CCMST SP consists of 18 "WinterHawk II" nodes connected by an SP switch, which provides high-bandwidth communication between the nodes. Each node has four 375-MHz POWER3-II processors, for a total of 72 processors. The machine has a total of 47 GB of RAM and 756 GB of temporary disk storage. The control workstation, an IBM RS/6000 Model F50, hosts an additional 108 GB of long-term disk storage.

Installed Application Software

Software title | Version | Description
IDL[4] | 5.4 | Powerful data analysis and visualization package
Jaguar[5] | 4.1.059 | Very robust ab initio quantum chemistry package for treating large systems
Molcas[6] | 5.0 | Ab initio quantum chemistry package useful for treating molecular systems with degeneracies
MPQC[7] | 2.1.3 | Next generation massively parallel ab initio quantum chemistry package
NAMD[8] | 2.5b1 | Parallel classical molecular dynamics package
Psi[9] | 3.2 | Ab initio quantum chemistry package under active development by a number of groups, including CCMST
Q-Chem[10] | 2.0 | Robust production level ab initio quantum chemistry package

For more specific information on installed software, refer to the Software section of our website.

How to Access the CCMST IBM SP System

All access to the SP goes through the control workstation (CW), cgate.chemistry.gatech.edu. The CW is the central server for the system: it maintains user databases, hosts users' home directories and software installations, and distributes users' jobs among the nodes of the SP. To access the CW you must have a valid account, which may be obtained by filling out an account application.[11]

After you have established an account, the next step is to log in to the control workstation. Remote logins to cgate.chemistry.gatech.edu are possible only with an SSH-enabled client (see below for more information on SSH). When you log into the system for the first time, you should become familiar with the directory structure of the control workstation. Most user home directories are under /cgate/common; for example, user yourusername's home directory would be /cgate/common/yourusername. This is your permanent storage area: files there will never be removed. However, each user may store at most 100 MB of data in their home directory. Larger files may be stored temporarily in /cgate/scratch/yourusername. There is no quota on how much one user can store in that area, but old files are cleaned up automatically.
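Since the home directory is limited to 100 MB, it can help to check your usage before it becomes a problem. The following is a minimal sketch in POSIX sh, not a CCMST-provided tool: usage_kb and over_quota are hypothetical helper names, and usage is measured with du -sk, so the result may differ slightly from however the quota is actually enforced.

```shell
# Sketch only: compare a directory's disk usage with the 100 MB
# home-directory quota described above. usage_kb and over_quota are
# hypothetical helpers, not CCMST commands.
usage_kb() {
    # du -sk prints "SIZE<TAB>PATH" in kilobytes; keep only the size
    du -sk "$1" | awk '{ print $1 }'
}

over_quota() {
    # $1 = directory, $2 = limit in MB (defaults to the 100 MB home quota)
    limit_kb=$(( ${2:-100} * 1024 ))
    used_kb=$(usage_kb "$1")
    # succeeds (status 0) only when usage is above the limit
    [ "$used_kb" -gt "$limit_kb" ]
}

# Example (on cgate):
#   if over_quota "$HOME"; then
#       echo "Over 100 MB: move large files to /cgate/scratch/$USER"
#   fi
```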

The directory structure on the SP nodes, where jobs actually run, is almost identical: /cgate/common/yourusername and /cgate/scratch/yourusername are accessible from every node. In addition, each node has its own high-bandwidth local storage for temporary files, which your jobs can use by reading and writing in /scratch/yourusername. The amount of available space in that directory varies from node to node but is at least 15 GB.

One possible strategy is to use the node's /scratch/yourusername for large and/or heavily accessed temporary files, /cgate/scratch/yourusername for temporary or data files that you want to transfer to your local machine later for further analysis, and /cgate/common/yourusername for permanent application data, job input and output files, etc.
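The strategy above can be sketched as a small sh function that stages an input file into a fast working directory (for example the node-local /scratch/$USER), runs a command there, copies the working directory's contents back to permanent storage, and frees the local disk. This is an illustration under stated assumptions: run_staged is a hypothetical helper, the caller chooses all directory names, and the sample LoadLeveler command files in this guide use csh, so you would call this from a small sh script.

```shell
# Sketch only: run_staged is a hypothetical helper, not a CCMST tool.
# usage: run_staged WORKDIR RESULTDIR INPUTFILE COMMAND [ARGS...]
run_staged() {
    workdir=$1 resultdir=$2 inputfile=$3
    shift 3
    [ -n "$workdir" ] || return 1               # refuse an empty path
    mkdir -p "$workdir" "$resultdir" || return 1
    cp "$inputfile" "$workdir"/ || return 1     # stage the input
    ( cd "$workdir" && "$@" )                   # run inside WORKDIR
    rc=$?
    # Copy everything in WORKDIR (outputs and the staged input) back;
    # delete WORKDIR only if the copy succeeded, so results are not lost.
    if cp -R "$workdir"/. "$resultdir"/; then
        rm -rf "$workdir"
    else
        echo "copy-back failed; results left in $workdir" >&2
        rc=1
    fi
    return $rc
}

# Example (inside a job):
#   run_staged /scratch/$USER/job$$ /cgate/common/$USER/results \
#       file.in my.exe file.in file.out
```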

To transfer files between your machine and the SP control workstation you have two choices:

  • a secure FTP (sftp) client. On UNIX this is the sftp command, which is included in many SSH distributions. Some graphical FTP clients also support secure FTP.
  • the secure copy (scp) command. scp is part of most SSH distributions. It may be more convenient than sftp, but otherwise the two offer similar functionality; check your SSH documentation for details on using scp. For example, to copy a file localfile from a local Unix machine to cgate, issue the following command on your local machine: scp localfile yourcgateusername@cgate.chemistry.gatech.edu:remotelocation. Conversely, scp yourcgateusername@cgate.chemistry.gatech.edu:remotelocation localfile copies the remote file to your local machine.
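If you transfer files often, the two scp commands above can be wrapped in small sh functions. A minimal sketch, assuming a local Unix shell: cgate_put and cgate_get are hypothetical names, CGATE_USER is a placeholder for your cgate user name (it falls back to your local user name), and SCP is a hypothetical override variable that lets you substitute another command, such as scp -r for directories.

```shell
# Sketch only: hypothetical wrappers around scp for cgate transfers.
CGATE_HOST=cgate.chemistry.gatech.edu

# cgate_put LOCALFILE REMOTEPATH  (relative REMOTEPATHs are resolved
# relative to your home directory on cgate)
cgate_put() {
    ${SCP:-scp} "$1" "${CGATE_USER:-$USER}@$CGATE_HOST:$2"
}

# cgate_get REMOTEPATH LOCALPATH
cgate_get() {
    ${SCP:-scp} "${CGATE_USER:-$USER}@$CGATE_HOST:$1" "$2"
}

# Example:
#   CGATE_USER=yourcgateusername cgate_put file.in chem/
#   CGATE_USER=yourcgateusername cgate_get chem/file.out .
```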

There are a number of choices for editing text files. The most common UNIX editor is vi. If you are completely unfamiliar with vi, we recommend this online vi tutorial.[12] More experienced users may prefer this vi reference manual.[13] Another popular editor is emacs, which is more powerful than vi. To learn emacs, refer to this tutorial.[14]

How to Use the Graphical Capabilities of the Control Workstation

Some useful utilities and software packages produce graphical output or require a graphical user interface. If you have a Unix workstation running the X Window System, using such programs is easy: when you log into the control workstation using a secure shell client with X11 forwarding enabled, you can run graphical programs on cgate and see their output on your screen.

The situation is more complicated for Windows and Mac OS users, who may have to install X server software before they can use graphical programs on cgate. X11 is distributed with Mac OS X, although it is not installed by default. For Windows there are free options such as Cygwin[15], which includes an X server, and VNC[16].
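Graphical programs on cgate only work if your session has an X display. With OpenSSH clients, X11 forwarding is requested with ssh -X, and the remote shell then has DISPLAY set. The sketch below, with the hypothetical helper name start_x, checks for DISPLAY before starting a program in the background, as in xloadl &; it does not verify that the display actually accepts connections.

```shell
# Sketch only: start_x is a hypothetical helper.
# start_x PROGRAM [ARGS...]
start_x() {
    if [ -z "${DISPLAY:-}" ]; then
        echo "DISPLAY is not set: start your X server and log in with X11 forwarding (e.g. ssh -X)" >&2
        return 1
    fi
    "$@" &    # run in the background, like "xloadl &"
}

# Example (on cgate):
#   start_x xloadl
#   start_x netscape http://control/
```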

Compiling and Installing Software

If there is a software package that you need for your work, you may have it installed for you, or you may get permission to install it yourself. Only licensed software products or freeware obtained from established sources may be installed on the CCMST SP. To obtain permission to install a software package, or to request assistance with installing it, contact the CCMST system administrator.

The CCMST SP also has a rich set of development tools for users who develop their own codes. These include the IBM C for AIX compiler version 5, IBM VisualAge C++ compiler version 5, IBM XL Fortran compiler version 7.1, Perl 5.5, GNU make version 3.79.1, GNU autoconf version 2.13, GNU bison version 1.28, GNU flex version 2.5.4, IBM Engineering and Scientific Subroutine Library (ESSL) version 3.2, IBM Parallel ESSL version 2.2, IBM Parallel Operating Environment (POE) 3.1 with its Message Passing Interface (MPI) library, the libpthreads library, and more. Documentation for the IBM development tools is available on the control workstation: log into cgate with an X server running on your machine (see the section on the graphical capabilities of the control workstation) and execute the command netscape http://control/. Alternatively, you may visit the IBM AIX documentation library.[17] Manual pages for the GNU tools can be read on cgate with the man command. Additional information on software development tools may be found in the Software section.
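As an illustration of using these tools together, the sketch below compiles a Fortran source file with XL Fortran and links it against ESSL (-lessl). It is an assumption-laden example, not a recommended build recipe: build_essl is a hypothetical helper, the optimization flags are one reasonable choice for POWER3-II processors (-qarch=pwr3, -qtune=pwr3), and FC is a hypothetical override so the same line can use another compiler driver, such as POE's mpxlf for an MPI code.

```shell
# Sketch only: build_essl is a hypothetical helper.
# build_essl SOURCE EXECUTABLE
build_essl() {
    # xlf by default; -qarch/-qtune target the POWER3-II nodes
    ${FC:-xlf} -O3 -qarch=pwr3 -qtune=pwr3 -o "$2" "$1" -lessl
}

# Example (on cgate):
#   build_essl solver.f solver
#   FC=mpxlf build_essl mpi_solver.f mpi_solver
```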

Running Jobs

Introduction

All production jobs must be run on the SP through LoadLeveler (current version 2.2), IBM's job queueing system. LoadLeveler keeps track of all submitted jobs, prioritizes them, and distributes them among the SP nodes as the requested resources (memory, disk, processors) become available. It can handle both serial (single-processor) and parallel jobs.

Here we give a brief tutorial on getting started with LoadLeveler. Refer to the LoadLeveler user's guide for more information.[18]

How to submit a job to LoadLeveler

There are two ways to submit a job to LoadLeveler. You may prepare a command file with your favorite text editor and submit it to LoadLeveler with the llsubmit command. Or, if you have an X server running on your machine, you may use xloadl, an easy-to-use graphical user interface to LoadLeveler. Even if you plan to use xloadl exclusively, we strongly recommend that you learn the syntax of LoadLeveler command files first.

Constructing LoadLeveler Command Files

Each job needs a LoadLeveler command file. By convention we give these files a .cmd suffix, although this is not required. A typical LoadLeveler command file for running an executable my.exe on a single processor might look as follows:

#!/bin/csh
# @ job_type = serial
# @ initialdir = /cgate/common/username/chem
# @ notify_user = your@email.address
# @ account_no = useraccount
# @ input = /dev/null
# @ output = $(jobid).stdout
# @ error = $(jobid).err
# @ class = cpu
# @ notification = complete
# @ checkpoint = no
# @ restart = no
# @ requirements = (Arch == "power3") && (OpSys == "AIX43")
# @ queue

my.exe file.in file.out


This sample job runs the command my.exe file.in file.out in the directory /cgate/common/username/chem (make sure that my.exe is in your path). When the job completes, an email message is sent to your@email.address. The job is charged to the account useraccount. As configured, the job reads standard input from /dev/null (it is a batch job and does not require any user input), and sends standard error and standard output to $(jobid).err and $(jobid).stdout, respectively, where $(jobid) is the job ID number that LoadLeveler assigns when the job is submitted. Alternatively, you may name these files after the job being run (for example, file.stdout and file.err). These files are usually of interest only if the job crashes, since they may contain debugging information (although some programs send their normal output to stdout). Typically, they should be cleaned up once the results of the job have been checked.

Note the class = cpu setting. LoadLeveler is configured with several classes (think of them as queues), and this keyword selects one. The valid classes are listed below.

Class name | CPU time limit | Memory limit | Comments
interactive | 30 minutes | 1 GB | For very short jobs
quick | 24 hours | 1 GB | For medium-length CPU-intensive jobs
cpu | 14 days | 1 GB | Most jobs fall into this category
long | 90 days | 2 GB | For very long jobs only. Only 2 jobs of this class can run at the same time.
io | 14 days | 1 GB | For I/O-intensive medium to long jobs only
special | 365 days | 1 GB | By default, no one is allowed to use this class. If you need to run jobs of this length, please contact the CCMST system administrator. Only 1 such job can run at any time.


The CPU time limit applies to the total CPU time of the job; for parallel jobs, this is the sum of the CPU times of all tasks. So although 90 days is a lot of CPU time, a job with 12 tasks will consume 90 days of CPU time in 7.5 days of wall-clock time, assuming 100% execution efficiency. Plan carefully which class your jobs should belong to.

Class cpu is intended for longer computations (up to 14 days, requiring up to 1GB RAM). Other classes include quick (up to 24 hours, up to 1GB RAM) and interactive (up to 30 minutes, up to 1GB RAM). Any job going beyond the specified limits is killed automatically.
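The arithmetic above (class CPU limit divided by the number of tasks) can be checked with a short sh sketch. The limits are the class table values converted to minutes; max_wall_minutes and cpu_limit_minutes are hypothetical helpers, and the result is an estimate that assumes every task keeps its processor 100% busy.

```shell
# Sketch only: hypothetical helpers based on the class table above.
# Prints a class's CPU time limit in minutes.
cpu_limit_minutes() {
    case $1 in
        interactive) echo 30 ;;        # 30 minutes
        quick)       echo 1440 ;;      # 24 hours
        cpu|io)      echo 20160 ;;     # 14 days
        long)        echo 129600 ;;    # 90 days
        special)     echo 525600 ;;    # 365 days
        *)           echo "unknown class: $1" >&2; return 1 ;;
    esac
}

# max_wall_minutes CLASS NTASKS
# Wall-clock minutes before NTASKS fully busy tasks exhaust the class limit.
max_wall_minutes() {
    [ "$2" -gt 0 ] || return 1
    limit=$(cpu_limit_minutes "$1") || return 1
    echo $(( limit / $2 ))
}

# Example: max_wall_minutes long 12 prints 10800 (7.5 days)
```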

To gain some experience with LoadLeveler, let us submit a simple job. The Unix command hostname prints the name of the machine on which it runs to stdout. Let's set up a job that executes hostname on one of the nodes. The command file looks like this:

#!/bin/csh
# @ job_type = serial
# @ account_no = useraccount
# @ initialdir = /cgate/common/username
# @ notify_user = username@control
# @ input = /dev/null
# @ output = hostname.stdout
# @ error = hostname.err
# @ class = interactive
# @ notification = complete
# @ checkpoint = no
# @ restart = no
# @ requirements = (Arch == "power3") && (OpSys == "AIX43")
# @ queue

/bin/hostname


Just cut and paste this into a file (say, hostname.cmd), and replace useraccount with the account name you were assigned when you registered with CCMST and username with your actual user name. That's all. Note that we specified class = interactive, since this job takes almost no time to run.

Once the .cmd file is prepared, submitting it to LoadLeveler is simple. On cgate, type

   llsubmit hostname.cmd
   

where hostname.cmd is the name of the LoadLeveler command file. llsubmit checks the job's user name, account name, and class, and passes the job to LoadLeveler, which places it in the appropriate queue.

Perhaps the easiest and most practical way to create LoadLeveler command files is to start from one of the generic templates in /home/loadl/templates, which include templates for serial and parallel jobs. Adapt one of them into a command file template for your own purposes, and reuse that template in your work.
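One way to turn a template into something reusable is a small sh function that writes a serial command file modelled on the hostname.cmd example. This is a sketch, not the contents of /home/loadl/templates: write_serial_cmd is a hypothetical helper, it always writes the keywords used in the examples in this guide, and notify_user is omitted (add it as in the examples if you want notification mail sent to a specific address).

```shell
# Sketch only: write_serial_cmd is a hypothetical helper.
# write_serial_cmd CMDFILE INITIALDIR ACCOUNT CLASS COMMAND [ARGS...]
write_serial_cmd() {
    cmdfile=$1 jobdir=$2 account=$3 jobclass=$4
    shift 4
    # Unquoted here-document: $jobdir etc. are expanded now, while
    # \$(jobid) is written literally for LoadLeveler to fill in.
    cat > "$cmdfile" <<EOF
#!/bin/csh
# @ job_type = serial
# @ initialdir = $jobdir
# @ account_no = $account
# @ input = /dev/null
# @ output = \$(jobid).stdout
# @ error = \$(jobid).err
# @ class = $jobclass
# @ notification = complete
# @ checkpoint = no
# @ restart = no
# @ requirements = (Arch == "power3") && (OpSys == "AIX43")
# @ queue

$*
EOF
}

# Example (on cgate):
#   write_serial_cmd hostname.cmd /cgate/common/username useraccount interactive /bin/hostname
#   llsubmit hostname.cmd
```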

Once jobs have been submitted, their status may be monitored with the llq command, which lists all jobs in the queues. Since this list can be long, a particular user's jobs can be queried with llq -u username.

Sometimes you may want to cancel a job that has been submitted, or even started running. To cancel a job jobid, issue llcancel jobid. To cancel all of your jobs, use llcancel -u username.

Note on jobs that require a lot of memory

Many computations, especially highly accurate quantum chemistry calculations, require a lot of memory. To avoid compute nodes running out of memory, you should specify the memory requirements of a job in its command file whenever they exceed 256 MB. This is done with the ConsumableMemory resource:

  # @ resources = ConsumableMemory(1024)

In this example LoadLeveler will reserve 1024 MB of memory for the job. For parallel jobs, memory requirements are specified per task (see the note on parallel jobs below).
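The 256 MB rule above is easy to encode when generating command files. The sketch below, with the hypothetical helper name memory_resource_line, prints a ConsumableMemory line only when the request (per task, for parallel jobs) is above 256 MB, and warns when it exceeds the 1 GB memory limit that most classes in the class table have. Place the line before the # @ queue line of the command file.

```shell
# Sketch only: memory_resource_line is a hypothetical helper.
# memory_resource_line MB
memory_resource_line() {
    if [ "$1" -gt 1024 ]; then
        # most classes allow 1 GB; class long allows 2 GB
        echo "warning: $1 MB exceeds the 1 GB limit of most classes" >&2
    fi
    if [ "$1" -gt 256 ]; then
        echo "# @ resources = ConsumableMemory($1)"
    fi
}

# Example: memory_resource_line 1024 prints
#   # @ resources = ConsumableMemory(1024)
```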


Note on parallel jobs

To run in parallel on the SP, software (obtained from an outside source or written by you) has to be written and compiled for parallel execution. How you run it depends on the parallel model it uses:

Multithreaded programs
If your program uses threads for concurrent execution, you may run it directly, like any other program, by invoking its name. Certain modules of PSI 3 are an example. Pure multithreaded programs can only run on a single node (see below for mixed MPI+threads programs). Unfortunately, LoadLeveler is not aware of multithreaded executables and treats every job (or task) as if it had only one thread. It is therefore your responsibility to specify how many processors to reserve for your job. This is done by specifying the number of ConsumableCpus the job needs:
     # @ resources = ConsumableCpus(N)
where N is the number of processors. Usually the number of processors should be set to the number of compute threads.
Message-passing programs
Most distributed parallel programs use the MPI library for message passing. Such programs have to be compiled and executed using the Parallel Operating Environment (POE). For compilation, POE provides MPI-enabled compiler scripts (mpcc, mpCC, mpxlf, etc.). Message-passing programs are executed as follows:
     /usr/bin/poe /your/mp/program/name
/usr/bin/poe can only be invoked from within a parallel LoadLeveler job (job_type = parallel in the command file). LoadLeveler automatically handles allocating the necessary number of tasks, setting up the environment, etc. (see the section on parallel jobs in the LoadLeveler user's guide).
Mixed MPI+threads programs
These programs must also be compiled and executed with POE. The only difference from pure message-passing programs is that, because LoadLeveler is not thread-aware, you have to specify the number of processors each task will use with the ConsumableCpus resource, as explained above.

It is imperative that you specify memory requirements for parallel programs. The ConsumableMemory resource (see the note on jobs that require a lot of memory above) should specify how much memory each MPI task will require.
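Putting the parallel notes together, the sketch below writes a command file for an MPI or MPI+threads job: job_type = parallel, node and tasks_per_node for task placement, ConsumableCpus for threads per task, ConsumableMemory per task, and /usr/bin/poe to start the program. write_parallel_cmd is a hypothetical helper; node, tasks_per_node, and resources are standard LoadLeveler keywords, but the network.MPI setting (css0,shared,US) and class = cpu are assumptions, so compare the result with the parallel template in /home/loadl/templates before using it. The check reflects the four processors per node described in the Computing Resources section.

```shell
# Sketch only: write_parallel_cmd is a hypothetical helper.
# usage: write_parallel_cmd CMDFILE INITIALDIR ACCOUNT NODES TASKS_PER_NODE THREADS MEM_MB PROGRAM
write_parallel_cmd() {
    cmdfile=$1 jobdir=$2 account=$3 nodes=$4 tpn=$5 threads=$6 mem=$7 program=$8
    # Each WinterHawk II node has 4 processors: tasks x threads must fit.
    if [ $(( tpn * threads )) -gt 4 ]; then
        echo "tasks_per_node x threads exceeds 4 processors per node" >&2
        return 1
    fi
    # network.MPI adapter settings below are an assumption for this SP.
    cat > "$cmdfile" <<EOF
#!/bin/csh
# @ job_type = parallel
# @ initialdir = $jobdir
# @ account_no = $account
# @ input = /dev/null
# @ output = \$(jobid).stdout
# @ error = \$(jobid).err
# @ class = cpu
# @ node = $nodes
# @ tasks_per_node = $tpn
# @ network.MPI = css0,shared,US
# @ resources = ConsumableCpus($threads) ConsumableMemory($mem)
# @ requirements = (Arch == "power3") && (OpSys == "AIX43")
# @ queue

/usr/bin/poe $program
EOF
}

# Example: 2 nodes x 4 MPI tasks, 1 thread and 512 MB per task
#   write_parallel_cmd mpi.cmd /cgate/common/username useraccount 2 4 1 512 ./my_mpi.exe
```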

Using Graphical User Interface xloadl

There is also a graphical front end to LoadLeveler job submission and querying, called xloadl. If you are logging in from a machine running the X Window System, you can start it by typing xloadl &.

The main xloadl window is split into 3 parts. The top pane is called "Jobs", and shows each job submitted to LoadLeveler (the content is identical to the output of llq command). The middle pane is called "Machines", and shows each machine that LoadLeveler is aware of. The bottom pane is called "Messages", and shows responses of LoadLeveler to various actions that you tell it to perform. Each pane, except "Messages", has its own pull-down menu.

The "Jobs" pane is the one regular users use most. It lets you build and submit jobs, check their status, cancel unnecessary jobs, etc. In the next few paragraphs, which cover the most common job-related functions of xloadl, we assume you are working in the "Jobs" pane.

Let's use xloadl to build a job identical to the one we submitted in the previous subsection. To build a job, pull down the File menu, click Build a Job..., and select the type of job you want to submit. Valid choices here are serial (a single-processor job) and parallel (a multiprocessor job using the Message Passing Interface (MPI) library). This is a single-processor job, so choose serial. This opens another window, called "Build a Job", containing a web-like submission form with a number of entries. Fortunately, most of these entries can be left blank. The most important entries are

  • Executable - name of the executable (include the full path to executable, e.g. /bin/hostname).
  • Arguments - command-line arguments to pass to the executable. In this case, the executable does not require any arguments, so leave this blank.
  • Stdin - where to assign standard input. Since LoadLeveler is designed to handle batch jobs, it should be /dev/null.
  • Stdout - where to send standard output. Although many jobs do not write to standard output, we strongly recommend specifying a file name here, e.g. hostname.out. You may also use variables like $(jobid) here. If you specify just a file name (a string that doesn't start with '/'), the file will be put in the initial directory of the job (see below).
  • Stderr - where to send standard error output. Although many jobs do not write to standard error, we strongly recommend specifying a file name here, e.g. hostname.err. You may also use variables like $(jobid) here. If you specify just a file name (a string that doesn't start with '/'), the file will be put in the initial directory of the job (see below).
  • Initialdir - the current working directory for the job, i.e. where the input (in this case, none) and output files (in this case, hostname.out and hostname.err) are found. For our simple job set this to your home directory.
  • Class - the LoadLeveler class to use for the job. Valid job classes are listed in the class table above.
  • Account Number - the CCMST account to which the job is charged. Enter only a valid account name or number assigned to you by CCMST.
  • Node Usage - whether to request nodes for exclusive use (not shared) or to allow sharing with other jobs (shared). Shared is appropriate for most jobs. To set it, click the "Set" button to the right.

Other important entries, which you may need for more complex jobs, can be reached with the "Requirements" and "Limits" buttons on the pane in the lower right corner. The most commonly used requirements are memory and disk, specified in megabytes. The most useful limit is the wall clock limit, specified in seconds. It can be used to limit the job length to less than the default value for the class, which may allow the job to be picked for execution sooner; it therefore makes sense to set this limit whenever you can estimate how long the job will run. This simple job needs no additional requirements or limits.
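The same limit can also be set in a command file with LoadLeveler's wall_clock_limit keyword. The sketch below converts an estimate in seconds (the unit xloadl uses) into an hours:minutes:seconds line; wall_clock_line is a hypothetical helper, and you should check the LoadLeveler user's guide for the exact value formats the installed version accepts.

```shell
# Sketch only: wall_clock_line is a hypothetical helper.
# wall_clock_line SECONDS
wall_clock_line() {
    s=$1
    printf '# @ wall_clock_limit = %02d:%02d:%02d\n' \
        $(( s / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 ))
}

# Example: wall_clock_line 5400 prints
#   # @ wall_clock_limit = 01:30:00
```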

To submit the job, press the "Submit" button at the bottom of the "Build a Job" window. The main xloadl window should then show a message saying that the job has been submitted and which job ID it was assigned, or explaining why the job could not be submitted. Once a job is submitted, you will find it in the "Jobs" pane and can monitor its state using the commands in the "Actions" menu. You can also create a LoadLeveler command file from the information entered in the "Build a Job" window by pressing the "Save" button. This is a useful way to build command file templates as you become more experienced with LoadLeveler.

CCMST CPU accounts


Each user account may use one or more CCMST CPU accounts, which keep track of CPU usage. Each CPU account has a quota; once the quota is exceeded, the account cannot be used until more hours are added to it. To check the status of a CCMST CPU account, use the llacctinfo command: given a CPU account name, it reports the account's quota and the number of CPU hours charged to the account so far.
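Before submitting a large job, you can estimate whether it fits in what is left of a CPU account. In this sketch, fits_quota is a hypothetical helper, QUOTA and USED are the hours you read off llacctinfo's output (its format is not assumed here), and the charge is estimated as processors times wall-clock hours. This is the worst case, when every processor is fully busy for the whole run.

```shell
# Sketch only: fits_quota is a hypothetical helper.
# fits_quota QUOTA_HOURS USED_HOURS NPROCS WALL_HOURS
# Succeeds if USED + NPROCS x WALL_HOURS stays within QUOTA.
fits_quota() {
    need=$(( $3 * $4 ))
    [ $(( $2 + need )) -le "$1" ]
}

# Example: 8 processors for 24 hours on an account with 1000 hours,
# 800 of them already used (800 + 192 = 992 <= 1000):
#   fits_quota 1000 800 8 24 && echo "job fits"
```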

More Information on SSH

It is Georgia Tech policy that all remote logins to campus servers are permitted only with secure shell (SSH) clients. The secure shell protocol uses sophisticated encryption schemes to provide secure communication between machines. If you do not have a secure shell client installed on your machine, refer to the OIT page on SSH,[19] where you can obtain licensed copies of a Windows SSH client and find links to SSH implementations for MacOS, Linux, and other platforms. Solaris users can find OpenSSH binaries on Sunfreeware.com.[20] The FreeSSH website[21] also links to freely available SSH clients for a multitude of platforms.