Dgate
From CCMSTWiki
Getting Started with Dgate
Dgate can only be accessed via ssh. Log in to the master node with ssh username@dgate.chemistry.gatech.edu; its internal name is control.ccmst. The cluster nodes can be reached from the master node using ssh. The Dgate nodes are summarized below:
| Host names | Owner | Hardware | Access |
|---|---|---|---|
| cnode0101-cnode0106 | Hernandez group | Dual 2.8 GHz Xeon, 512 MB RAM, 30 GB disk | PBS batch |
| cnode0201-cnode0206 | Sherrill group | Dual 2.8 GHz Xeon, 4 GB RAM, 210 GB disk | PBS batch |
| cnode0207-cnode0218 | Bredas group | Dual 2.8 GHz Xeon, 4 GB RAM, 30 GB disk | interactive |
| onode0101-onode0105 | Bredas group | Dual 2.2 GHz Opteron 248, 6 GB RAM, 140 GB disk | interactive |
To check who is running jobs on your group's nodes, use the check_nodes script or the PBS command qstat.
PBS queueing system
PBS (Portable Batch System) is a batch queueing system similar to LoadLeveler, DQS, and others. It is the only way to run jobs on the nodes marked PBS in the table above. For reference information about PBS, type 'man pbs'.
Most users learn PBS by copying and modifying existing PBS scripts. A good starting point is ~evaleev/PBS/test.cmd, a sample PBS command file with extensive comments that you should be able to adapt easily. Submit PBS jobs with the 'qsub' command (type 'man qsub' to learn more); on successful submission, qsub reports the ID number assigned to the job. Monitor queued and running jobs with the 'qstat' command ('qstat -ans' is the most common form). To delete a job, type 'qdel XXX', where XXX is the ID of the job you want to kill.
NOTE: Sherrill group jobs follow different rules for requesting processors (see ~evaleev/PBS/test.cmd). The number of processors you request must be at least twice the number of physical processors you need. For example, if your job needs 2 processors, request 4 by adding the following line to your PBS script:
#PBS -l nodes=1:ppn=4
This instructs PBS to allocate 2 physical processors on 1 node. For jobs that use disk I/O heavily (such as ACESII and PSI3 coupled cluster jobs), request three times the number of physical processors you need; only one such job can run on any given node. Thus, to run a large single-processor coupled cluster calculation, add the following line to the PBS script:
#PBS -l nodes=1:ppn=3
This does not mean you should increase the amount of time requested from PBS: an I/O-heavy job that requests 3 processors from PBS still runs only one task, and only that task consumes CPU time. The requested processors are virtual and only serve to ensure that two I/O jobs are not placed on the same node.
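The processor arithmetic can be captured in a small helper. The sketch below is hypothetical (sherrill_ppn is not a Dgate command). It follows the rule stated in the comments of the sample script in the Running Jobs section (double the physical processors, then add one for I/O-heavy jobs) and assumes, from those same comments, that each node exposes 5 virtual processors. For a single-processor I/O job this gives ppn=3, matching the text above; for larger I/O jobs the "three times" rule would ask for 3 × 2 = 6, which cannot fit within 5 virtual processors, so the sketch uses the sample script's rule instead.

```sh
#!/bin/sh
# Hypothetical helper: print the PBS node request for a Sherrill group job.
# Usage: sherrill_ppn NPROCS [io]
# Rule (from the sample script comments): ppn = 2 * NPROCS, plus 1 for I/O-heavy jobs.
# Assumes each node exposes 5 virtual processors, so larger requests cannot fit.
sherrill_ppn() {
    nprocs=$1
    ppn=$((2 * nprocs))
    if [ "${2:-}" = "io" ]; then
        ppn=$((ppn + 1))
    fi
    if [ "$ppn" -gt 5 ]; then
        echo "error: ppn=$ppn exceeds the 5 virtual processors per node" >&2
        return 1
    fi
    echo "#PBS -l nodes=1:ppn=$ppn"
}
```

For example, sherrill_ppn 2 prints #PBS -l nodes=1:ppn=4, and sherrill_ppn 1 io prints #PBS -l nodes=1:ppn=3.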
Running Jobs
Jobs on the Dgate cluster are run through the PBS batch queue system. The basic commands are summarized below:
- check_nodes: Not strictly a PBS command, but it shows what all the nodes are currently doing.
- qstat -ans: Look at the status of the queue
- qsub filename.cmd: Submits the specified command file to the queue
- qdel jobnumber: Delete a job from the queue (may leave a mess of scratch files around)
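These commands are often used together right after submitting a job. A minimal sketch, assuming only that qsub prints the new job's ID on standard output when submission succeeds (as PBS does); submit_job is a hypothetical helper, not a Dgate script:

```sh
#!/bin/sh
# Hypothetical helper: submit a PBS command file, report its job ID,
# and show the job's entry in the queue.
# Usage: submit_job filename.cmd
submit_job() {
    # qsub prints the job ID on success and exits non-zero on failure
    jobid=$(qsub "$1") || return 1
    echo "Submitted $1 as job $jobid (delete with: qdel $jobid)"
    # -a: alternative display format, -n: list allocated nodes,
    # -s: show scheduler/administrator comments
    qstat -ans "$jobid"
}
```

Keeping the ID in a variable makes a later qdel unambiguous when several of your jobs are queued at once.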
The structure of the command file is a little unusual. Lines that start with #PBS look like shell comments, but PBS reads them as directives. To disable a directive, add a second comment character (##PBS). An example job control file is in ~evaleev/PBS/test.cmd; the minimal information is shown in the sample below:
# Sample PBS script
# How much memory to reserve
#PBS -l mem=200mb
# How much (CPU) time to request (determines the queue). The line
# below requests 3 hours, 10 minutes
#PBS -l cput=3:10:00
# Uncomment this (i.e., delete one of the comment chars) for Sherrill
# group I/O-heavy jobs
##PBS -q sio
# How many nodes and processors you want. For the Sherrill group, we had
# to hack the system and pretend that each node, which has 2 processors,
# has 5 "virtual processors". Queue sio (above) defaults to ppn=3,
# which prevents another I/O job from running on the same node.
# For the Sherrill group, double the number of processors you really want,
# and add one if you need I/O. Note: ppn defaults to 3 if you uncomment
# the #PBS -q sio line above. If you use the following line, you will
# override that ppn. So, usually use either the line above for sio or
# the line below, but not both.
##PBS -l nodes=1:ppn=2
# Change to the directory the job was submitted from
cd $PBS_O_WORKDIR
# Then put the command to run below. For example:
runqchem h2o.in h2o.out
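Because a disabled ##PBS line looks almost identical to an active #PBS line, it can help to list the directives PBS will actually read before submitting. The sketch below is a hypothetical awk-based helper. It assumes, as qsub does, that a directive must start at the beginning of a line and that scanning for directives stops at the first executable line (any line that is neither blank nor a comment).

```sh
#!/bin/sh
# Hypothetical helper: print the #PBS directives that qsub would honor.
# Usage: active_pbs_directives filename.cmd
active_pbs_directives() {
    awk '
        /^#PBS/ { print; next }            # active directive
        /^[ \t]*#/ || /^[ \t]*$/ { next }  # other comments (incl. ##PBS) and blank lines
        { exit }                           # first executable line: stop scanning
    ' "$1"
}
```

For the sample script above saved with one statement per line, this prints only the mem and cput directives; both ##PBS lines are skipped.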
Using Jaguar
Jaguar is an ab initio quantum chemistry software package developed by Schrodinger, Inc[1]. It is fast and has an extensive set of features. Some simple tasks (HF and DFT optimizations and closed-shell MP2 energies) can be run in parallel. For information on the program's capabilities, refer to Jaguar's website[2]. User manuals for all Schrodinger products are available here[3]. The current version of Jaguar on Dgate is 5.5.
To start using Jaguar on Dgate, you first need to set some environment variables. This is usually automated by adding the line
source /theoryfs/common/software/etc/cshrc.schrodinger
to your .cshrc shell initialization file. If your .cshrc does not contain that line, append it to the end of the file and log in again.
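Appending the line by hand risks adding it twice. A minimal sketch of an idempotent append using only grep and printf; append_once is a hypothetical helper name, and it can be run from any POSIX shell even though the line it adds is csh syntax:

```sh
#!/bin/sh
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain it as a whole line. Usage: append_once FILE LINE
append_once() {
    file=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        # Avoid gluing the new line onto an unterminated last line
        if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
            printf '\n' >> "$file"
        fi
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (commented out so that loading the helper changes nothing):
# append_once "$HOME/.cshrc" 'source /theoryfs/common/software/etc/cshrc.schrodinger'
```

After running the example once, log out and back in so the new settings take effect.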
Jaguar has a graphical user interface called Maestro. To start Maestro from the master node, simply type 'maestro'. You can use Maestro to create molecules, set up parameters of Jaguar calculations, and save resulting Jaguar input files. NEVER run Jaguar jobs from Maestro on Dgate.
To submit a Jaguar job to PBS, you need a simple PBS command file (see the PBS section above). The command that runs Jaguar from a PBS script is /usr/local/bin/runjaguar mol.in, where mol.in is the name of the Jaguar input file. Once the job starts running, two files appear: mol.log with the essential calculation output, and mol.out with the complete output. A sample PBS command file for use with Jaguar can be found at ~evaleev/PBS/jaguar/ethanol.cmd.
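A minimal sketch of such a command file, modeled on the generic sample in the Running Jobs section. The memory and CPU-time values are placeholders rather than recommendations, and the actual ~evaleev/PBS/jaguar/ethanol.cmd may differ:

```sh
# Hypothetical PBS command file for a Jaguar job (values are placeholders)
# Memory to reserve
#PBS -l mem=500mb
# CPU time to request: 2 hours
#PBS -l cput=2:00:00
# Change to the directory the job was submitted from
cd $PBS_O_WORKDIR
# Run Jaguar on the input file; output goes to mol.log and mol.out
/usr/local/bin/runjaguar mol.in
```

Submit it with qsub like any other command file.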
If your group still runs jobs on Dgate interactively, you cannot simply use the runjaguar script. To run Jaguar interactively on a single processor, type jaguar run mol.in, where mol.in is the name of the input file; this starts the calculation in the background. To run an interactive parallel calculation, do the following:
- Create a file with the list of nodes available to you. Here's an example:
cnode0207.ccmst:2
cnode0208.ccmst:2
The number of processors available on each node is separated from the name by a colon. If there is only one processor per node then the node name will suffice.
- Set the environment variable SCHRODINGER_MPI_FLAGS as follows:
setenv SCHRODINGER_MPI_FLAGS '-machinefile <machinefile>'
where <machinefile> is the name of the file with the list of nodes you just created. Use the absolute path to the file. Any other command-line options for MPICH's mpirun can be appended to SCHRODINGER_MPI_FLAGS.
- Set other environment variables for parallel Jaguar runs:
source /theoryfs/common/software/etc/cshrc.schrodinger.parallel
- Start the job:
jaguar run -PROCS XXX -HOST YYY mol.in
where mol.in is the name of the input file, XXX is the number of MPI tasks to use, and YYY is the name of the node on which task 0 will run (typically, the name of the first node in the machine file). The number of MPI tasks should equal the number of processors available for the job.
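The -PROCS and -HOST values can be read off the machine file. The sketch below is a hypothetical helper (jaguar_cmd is not part of Jaguar or Dgate). It sums the processor counts in a machine file of the form shown in the first step, treating a bare node name as one processor, and prints the matching jaguar run command so you can check it before running it:

```sh
#!/bin/sh
# Hypothetical helper: print the "jaguar run" command matching a machine file.
# Usage: jaguar_cmd MACHINEFILE INPUT
# Each line is "node" (1 processor) or "node:N" (N processors).
jaguar_cmd() {
    awk -v input="$2" '
        NF == 0 { next }                       # skip blank lines
        {
            n = split($1, part, ":")
            procs += (n > 1 ? part[2] : 1)     # bare node name counts as 1
            if (first == "") first = part[1]   # task 0 runs on the first node
        }
        END {
            if (first == "") exit 1            # empty machine file
            printf "jaguar run -PROCS %d -HOST %s %s\n", procs, first, input
        }
    ' "$1"
}
```

For a machine file listing cnode0207.ccmst:2 and cnode0208.ccmst:2 on separate lines, jaguar_cmd nodes.txt mol.in prints jaguar run -PROCS 4 -HOST cnode0207.ccmst mol.in (the file name nodes.txt is only an example).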
