Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, Georgia Institute of Technology

CCMST Weekly News, March 4, 2011

March 4, 2011 9:30 pm EST

1. Announcements
2. Statistics
3. Tip of the Week

ANNOUNCEMENTS

Physical Chemistry Seminar Series

March 8, 2011 4:00 PM – 5:00 PM
MoSE 3201A
Prof. Jon Camden, University of Tennessee, Knoxville
Plasmonic Nanostructures for Surface Enhanced Nonlinear Spectroscopy and Direct Imaging of Plasmon Modes

NOTE: Planned Power Outage Postponed

Please note that the power outage for the MS&E building, originally planned for March 26-27, 2011, has been postponed until later in the spring semester. The date is to be determined.

STATISTICS

FGATE

Uptime: 50 days
/home directory usage: 73% (1.6 TB available)
/backups directory usage: 85%

LSF usage for Week 8 (2/21-2/27) (times are in minutes)
Group       Jobs   Total CPU     %   Avg CPU   Avg Wait   Avg Trnr.
Bredas      2338      317258   16%       136         17          83
Hernandez    479      631838   33%      1319       1335        5695
Sherrill    2210      178749    9%        81        108         189
Other          2        6255    0%      3128          0        3128
Total       5029     1134105   59%       226        182         666

Note: percentages refer to the total CPU time available for the period.

Most productive user of the week: atucker, with 330618 minutes of CPU time.

EGATE

Egate is currently down. Please check back next week for the usage statistics.

TIP OF THE WEEK

By Massimo

Job Limits

The main limits are on the total number of jobs and processors a single user can have. These are currently configured as follows:

                       Soft Limit            Hard Limit
                     Jobs  Processors      Jobs  Processors
Running                80         120       100         150
Running + Eligible    100         150       100         150

Soft limits are applied under heavy load conditions; otherwise hard limits apply. An eligible job is a job that is not running but is eligible to run as soon as resources become available, and thus sits in the queue acquiring priority. Jobs in excess of the "Running + Eligible" limit are kept in a blocked state (they sit in the queue but do not acquire priority). The showq command displays the jobs divided between active (running), eligible, and blocked.
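
For example (a minimal sketch, assuming the standard Moab showq options are available on our system):

showq              # all jobs, grouped into active, eligible, and blocked
showq -u $USER     # restrict the listing to your own jobs
showq -b -u $USER  # only your blocked jobs, e.g. those held back by the
                   # "Running + Eligible" limit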

Time

There are no limits on the maximum time a job can run, but the queue implements backfilling, so jobs requesting a short running time have increased chances of starting sooner. Wall clock time requests are specified with the construct

#PBS -l walltime=DD:HH:MM:SS

If a job does not specify a walltime request, it is assigned an infinite walltime and will never be eligible for backfilling. When requested, wall time limits are enforced by the scheduler: a running job that exceeds 120% of its requested wall time will be killed by the queue. A warning e-mail is sent to the user when the job reaches 100% of its allotted wall time.
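
For example, a minimal job script sketch requesting 12 hours of wall time (the job name and executable are hypothetical):

#!/bin/bash
#PBS -N short_job              # hypothetical job name
#PBS -l walltime=00:12:00:00   # 0 days, 12 hours; short requests benefit from backfilling

cd $PBS_O_WORKDIR
./my_program                   # hypothetical executable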

Memory

There are no limits on the maximum amount of memory a job can request, except that a job cannot exceed the physical memory limits of the system. Memory requests are specified by

#PBS -l pmem=1500mb

Note that memory requests are per processor, so the total memory requested by the job is the value given in the pmem line multiplied by the number of processors requested. If a job does not place a memory request, it is assigned a rather high default value of pmem=8000mb, i.e. 8 GB per processor. Jobs with low memory requests have increased chances of an early start. Memory limits are enforced: a job using more than 105% of its requested memory will be mercilessly terminated by the batch queue.
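
As a worked example (hypothetical values), the following pair of requests reserves 4 x 1500mb = 6000mb, i.e. about 6 GB in total:

#PBS -l nodes=4:ppn=1
#PBS -l pmem=1500mb   # per processor: total = 4 x 1500mb = 6000mb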

Fair Share

The queue implements a fair share algorithm based on usage data for the past 100 days (with decaying weight), so users with lower recent usage have increased priority with respect to heavy users.
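
To see how fair share is affecting job priorities, a possible check (a sketch, assuming the standard Moab diagnostic commands are available on our system) is:

mdiag -f   # fair-share usage per user and group
mdiag -p   # priority breakdown, including the fair-share component,
           # for jobs currently in the queue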

Processor Allocation

The allocation of nodes and processors is specified by the syntax

#PBS -l nodes=n_1:ppn=n_2

The queue will allocate a total of n_1*n_2 processors to the job. Note that in this specification, the only parameter the queue system is guaranteed to honor is the total number of processors. When distributing processes to compute nodes, the queue system is configured to pack as many processes as possible onto each compute node: if resources are available, Moab will override the initial request and place the processes on as few compute nodes as possible. The number of processes per node given in the resource request is therefore a lower bound on what the job will be assigned at run time. To increase the chances of an early start, specify as low a number of processors per node as possible, ideally just one, to leave Moab maximum flexibility in allocating the processes; the scheduler will take care of packing them together when possible. For example, to request a parallel job using 8 processors, the recommended resource request is

#PBS -l nodes=8:ppn=1

This will give maximum flexibility to the scheduler on process allocation.
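
Putting the pieces together, a minimal sketch of a complete 8-processor script (the job name, executable, and time/memory values are hypothetical):

#!/bin/bash
#PBS -N parallel_job           # hypothetical job name
#PBS -l nodes=8:ppn=1          # 8 processors, maximum scheduling flexibility
#PBS -l walltime=00:06:00:00   # 6 hours
#PBS -l pmem=1500mb            # per processor: 8 x 1500mb = 12000mb total

cd $PBS_O_WORKDIR
mpirun -np 8 ./my_parallel_program   # hypothetical MPI executable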

I/O Intensive Jobs

There is a special resource defined for I/O intensive jobs, to prevent competition for the I/O resources of the compute nodes. Jobs that do heavy disk I/O should request the special resource gres=lscr. This is a compute-node resource, and can be specified when requesting processors:

#PBS -l nodes=2:ppn=1:gres=lscr

Moab will not allocate more than two jobs requesting the gres=lscr resource to each compute node, to avoid saturating the I/O subsystem of the nodes. Note that this is a per-processor request, so the above example consumes two lscr resources (if the two processors are allocated to the same compute node, no other I/O intensive job will be scheduled on that node). When requesting the lscr resource it is very important never to ask for more than two processors per node. For instance, the following request

#PBS -l nodes=1:ppn=4:gres=lscr

will result in a job that sits in the queue forever: Moab will not assign fewer than 4 processes per node to this job, yet no more than two lscr processes can run on the same node, so the request can never be satisfied.

Note also that this resource is meant for I/O intensive jobs, typically post-HF calculations on large systems or with large basis sets. Jobs that do only occasional I/O, like typical HF and DFT jobs, do not need to specify the gres=lscr attribute. However, even light-I/O jobs should always be configured to write temporary files to the local scratch directory of the compute nodes (/scratch), to avoid overloading the NFS system.
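
As an illustration, a sketch of a single-processor I/O intensive job using node-local scratch (the job name, scratch directory layout, and executable are hypothetical):

#!/bin/bash
#PBS -N io_job                    # hypothetical job name
#PBS -l nodes=1:ppn=1:gres=lscr   # one processor, one lscr resource
#PBS -l walltime=01:00:00:00      # 1 day

cd $PBS_O_WORKDIR
# Write temporary files to node-local /scratch instead of NFS:
export TMPDIR=/scratch/$USER/$PBS_JOBID   # hypothetical directory layout
mkdir -p $TMPDIR
./post_hf_program                         # hypothetical executable
rm -rf $TMPDIR                            # clean up local scratch when done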

Do you have usage tips that you want to share with the other CCMST users? Please send them to Massimo (massimo.malagoli@chemistry.gatech.edu) for inclusion in the Tip of the Week section.