Molpro
From CCMSTWiki
Contents |
Adding Content to the Page
If you wish to add any content to this page, make sure you have read the Suggestions for Adding Content. Hopefully, this will allow anything added to this page to be as helpful as possible.
User Manual
Molpro User's Manual (On vergil)
Bugs
Molpro 2002 cannot be trusted for vibrational frequencies. It computes frequencies numerically, but no matter how tightly you converge the energies, a bug creates numerical noise which means that you do not get the 6 (or 5, for a linear molecule) frequencies which are supposed to be zero to actually be zero --- often they are around 50/cm or less. Because MOLPRO cannot distinguish the actual small frequencies from these "fake" small frequencies, it suppresses small frequencies in its computation of ZPVEs or finite-temperature thermal corrections. This is not a huge problem for ZPVE's, since a small frequency won't contribute much, but it IS a huge problem for thermal corrections, because small frequencies contribute a lot to the vibrational partition function. Hence, do not use MOLRPO 2002 for temperature corrections, and treat all frequencies with extreme caution. UPDATE: Molpro 2006.1 seems to have fixed the problem of reporting small numerical frequencies. It either has high energy and gradient thresholds which make these numerical frequencies very close to zero and/or knows how to project our rotations and translations and/or suppresses the printing of these annoying little frequencies.
Direct vs Semi-Direct vs non-Direct
Molpro has the ability to run any computaions (except full CI and CCSD(T)) in a fully "direct" mode, meaning that no (or little) disk is used, and the integrals are re-calculated over and over as needed. This takes more CPU time (because of the need to recalculate integrals), but removes the need to read and write very large disk files. For a sufficiently large number of integrals, the direct mode becomes preferable because the disk files for the integrals (and other intermediates) eventually become too large to store. Even if they can be stored, if they are large enough and the disks aren't fast enough, it could possibly take more time to read and write these files than to recalculate the integrals. For MP2 (and possibly other methods?) there is also a semi-direct method that is in between fully direct and the conventional (disk based method).
To tell Molpro to do a computation in direct mode, use the keyword gdirect. For MP2, it will then try to decide if direct or semi-direct is faster (although the SCF part will be performed direct). If you want to force fully direct mode for MP2, use gdirect;dmp2=0. Remember if you run conventionally or semi-direct, you need to reserve a Scratch resource. If you run fully direct, you do not. The speed of the semi-direct and direct computations will depend on the amount of memory you give the program. The more memory, the fewer passes through the integrals will be required. The cost of the computation is (approximately) directly proportional to the number of integral passes.
How can you tell if you need to run direct or semi-direct? We need to compile some data on this. Certainly if your job crashes because the disk is filled, then you need to run direct or semi-direct instead of the conventional (non-direct) method.
| Case | Conventional | Semi-direct | Direct | Passes |
|---|---|---|---|---|
| Sandwich benzene dimer, aug-cc-pVTZ | disk fill | 28,299 | 36,762 | 1 |
Parallel Computations
To run Molpro in parallel on Egate, a typical submission script would look like
#!/bin/sh #BSUB -J 2procTemplate #BSUB -R "rusage[mem=1500] span[ptile=2]" #BSUB -W 200:00 #BSUB -n 2 setenv NUM_COMPUTE_THREADS 2 runmolprop test.com
It is essential to include "span[ptile=2]" in the resource requirement and set the environmenal variable "NUM_COMPUTE_THREADS" to 2 because parallel Molpro jobs which are supposed to use one of two processors on a node have been known to create multiples threads actually using both processors on the same node. That obviously creates a lot of problems.
You should give the MOLPRO input file the amount of memory to use per thread
Below are some benchmarks of Molpro parallel jobs on Egate. Tables II shows the scaling for large single energy point calculations while Table III compares the timing for transition state searches and vibrational frequency calculations using numerical gradients and Hessians. Table IV demonstrates the disadvantage of using parallel computations for smaller calculations where the overhead of parallelization becomes comparable to or exceeds the actual computation time.
Reported are real times in seconds decomposed into their main components, namely integral computation (ints), SCF iterations (hf) and correlated wavefuction calculations [ks, mp2, ccsd(t)]. For transition state and vibrational frequency calculations where numerical gradients and Hessians are computed numerically from a series of single point energy calculations, the total time for these processes is lumped into TSsearch/opt and freq. Parallel jobs ran with 'span[ptile=2]'.
| Method | t(n=1) | t(n=2) | t(n=4) | t(n=8) | t1/t2 | t1/t4 | t1/t8 |
|---|---|---|---|---|---|---|---|
| B3LYP-int | 46.13 | 34.82 | 16.67 | 14.72 | 1.32 | 2.77 | 3.13 |
| B3LYP-hf | 163.20 | 65.68 | 38.51 | 26.77 | 2.48 | 4.24 | 6.10 |
| B3LYP-ks | 257.86 | 116.78 | 61.68 | 36.47 | 2.21 | 4.18 | 7.07 |
| CCSD(T)-int | 39.48 | 30.82 | 21.01 | 13.84 | 1.28 | 1.88 | 2.85 |
| CCSD(T)-hf | 167.15 | 62.30 | 41.90 | 25.37 | 2.68 | 3.99 | 6.59 |
| CCSD(T) | 424.46 | 198.19 | 129.83 | 87.41 | 2.14 | 3.27 | 4.86 |
| MP2-int | 39.27 | 30.27 | 19.42 | 15.01 | 1.30 | 2.02 | 2.62 |
| MP2-hf | 156.40 | 61.35 | 40.54 | 25.45 | 2.55 | 3.86 | 6.15 |
| MP2 | 162.64 | 64.65 | 42.36 | 26.55 | 2.52 | 3.84 | 6.13 |
The scaling for single energy points is fairly good up to 8 processors. The integral computation part doesn't seem to parallelize very well.
| Method | t(n=1) | t(n=2) | t(n=4) | t1/t2 | t1/t4 |
|---|---|---|---|---|---|
| B3LYP-int | 178.18 | 119.60 | 91.59 | 1.49 | 1.95 |
| B3LYP-hf | 259.75 | 161.14 | 113.13 | 1.61 | 2.30 |
| B3LYP-ks | 353.47 | 208.89 | 138.31 | 1.69 | 2.56 |
| B3LYP-TSsearch | 51961.88 | 32422.17 | 19184.04 | 1.60 | 2.71 |
| CCSD(t)-int | 173.68 | 112.62 | 80.07 | 1.54 | 2.17 |
| CCSD(T)-hf | 255.48 | 154.43 | 101.55 | 1.65 | 2.52 |
| CCSD(T) | 3106.21 | 1746.15 | 1137.25 | 1.78 | 2.73 |
| CCSD(T)-TSsearch | 226589.35 | 123904.63 | 102099.57 | 1.83 | 2.22 |
| MP2-int | 176.30 | 113.76 | 118.92 | 1.55 | 1.48 |
| MP2-hf | 257.84 | 155.21 | 148.08 | 1.66 | 1.74 |
| MP2 | 271.20 | 162.88 | 153.93 | 1.67 | 1.76 |
| MP2-TSsearch | 15381.71 | 9459.53 | 6454.25 | 1.63 | 2.38 |
The scaling for this transition state search is also pretty good up to 4 processors. Going from 1 to 2 processors gives a much better scaling than going from 1 to 4 or 2 to 4 processors.
| Method | t(n=1) | t(n=2) | t(n=4) | t(n=8) | t1/t2 | t1/t4 | t1/t8 |
|---|---|---|---|---|---|---|---|
| B3LYP-int | 0.54 | 0.26 | 0.90 | 0.33 | 2.08 | 0.60 | 1.64 |
| B3LYP-hf | 0.66 | 0.33 | 0.95 | 0.37 | 2.00 | 0.69 | 1.78 |
| B3LYP-ks | 2.09 | 1.09 | 2.10 | 0.83 | 1.92 | 1.00 | 2.52 |
| B3LYP-opt | 72.77 | 112.43 | - | - | 0.65 | - | - |
| B3LYP-freq | 812.84 | 564.85 | - | - | 1.44 | - | - |
| CCSD(T)-int | 0.23 | 0.27 | 0.34 | 0.26 | 0.85 | 0.68 | 0.88 |
| CCSD(T)-hf | 0.35 | 0.34 | 0.39 | 0.30 | 1.03 | 0.90 | 1.17 |
| CCSD(T) | 1.00 | 0.73 | 1.66 | 0.58 | 1.37 | 0.60 | 1.72 |
| CCSD(T)-opt | 27.05 | 19.08 | 17.47 | 14.92 | 1.42 | 1.55 | 1.81 |
| CCSD(T)-freq | 277.15 | 176.85 | 136.56 | 116.67 | 1.57 | 2.03 | 2.38 |
| MP2-int | 0.23 | 0.26 | 1.04 | 0.35 | 0.88 | 0.22 | 0.66 |
| MP2-hf | 0.34 | 0.33 | 1.10 | 0.40 | 1.03 | 0.31 | 0.85 |
| MP2 | 0.51 | 0.39 | 1.32 | 0.65 | 1.31 | 0.39 | 0.78 |
| MP2-opt | 15.71 | 31.65 | - | - | 0.50 | - | - |
| MP2-freq | 53.93 | 104.37 | - | - | 0.52 | - | - |
The scaling here is not good and often sporadic. MP2 and B3LYP optimization and frequency calculations couldn't be done for more than 2 processors. This is probably an ideal case showing that using multiple processors on small jobs is not a good idea because the overhead of managing different processes and threads becomes comparable or more time-consuming than the actual computation. So, unless a Molpro job is sufficiently large, one should not run it in parallel.
| R | n=1, 1 pass | n=2, 1 pass | n=2, 2passes |
|---|---|---|---|
| - | (850MW) | (400MWx2) | (200MWx2) |
| 3.7 | 292035 | 288758 | 526072 |
| 3.9 | 304145 | - |
Note: We haven't really tested the effect of the global array memory size (specified by the -G option) on the efficiency of MOLPRO parallel computations.
Quick Notes
Disclaimer: The comments below are pertinent to Molpro 2002.6, but not necessarily to newer versions.
- Molpro has no analytic gradients or hessians for any method except for closed shell SCF.BT
- Molpro has disabled UMP2 geometry optimization and transition state search. BT
- Molpro assumes all GTOs are spherical, so make sure you use the 'cartesian' keyword when specifying GTOs in basis sets like 3-21G, 6-31G ...etc if they contain d, f, and higher order angular momentum functions. BT
- Molpro's default convergence criteria are generally questionable. One should probably set thresholds using the GTHRESH keyword. Very conservative settings would be:
GTHRESH,ZERO=1.d-14,ONEINT=1.d-14,TWOINT=1.d-14,ENERGY=1.d-12,GRAD=1.d-6,step=1.d-6
For single-points of van der Waals complexes, one should use an energy convergence of at least 1.d-8. Somewhat less conservative settings, which should be ok if no numerical differentiation is being done, might be:
GTHRESH,ENERGY=1.e-8,ONEINT=1.e-14,TWOINT=1.e-12,GRAD=1.d-6
- Molpro's default print levels are not great either - it doesn't even print orbital energies. One change the print level using the GPRINT keywork like
GPRINT,basis,orbital
- Molpro freezes core for correlated wavefunction methods be default. One must include the 'core' keyword to unfreeze core orbitals. Especially when using ECPs, it might become problematic because it could attempt to freeze core electrons of the atom for which an ECP is used. BT
- When trying to compute include relativistic effects using Douglas-Kroll integrals, whether you include 'dkroll=1' keyword before before or after your basis set makes a difference, especially when using correlation consistent basis.
dkroll=1 basis=cc-pvdz
computes Douglas-Kroll integrals for the cc-pVDZ basis. On the other hand,
basis=cc-pvdz dkroll=1
sets the basis as cc-pVDZ-DK and computes Douglas-Kroll integrals on that. For the case of H2 the energies computed using the above two specifications are -1.16684858 a.u. and -1.16684111 a.u., respectively. For larger systems, the difference would probably be much more significant.
- Whenever using Molpro for DFT calculations, one must set a very tight integration grid by using the GRID and THR keywords.
GRID,1800.2,UNKNOWN THR,1e-10,0,0
The calculations would take longer, considering the default is THR,1e-5,0,0. For more on this, see the Molpro man pages
SCS-MP2 Singlet and Triplet Pair Energies
MOLPRO SCS calculations print out a section that looks like:
SCS-MP2 correlation energy: -2.247339312134 (PS= 1.200000 PT= 0.333333)
SCS-MP2 total energy: -555.538672906519
Norm of first-order wave function: 0.589320334537
Coefficient of reference function: 0.793221140007
MP2 singlet pair energy -1.417576015355
MP2 triplet pair energy -0.877898722031
Reference energy -553.291333594385
Correlation energy -2.295474737386
!MP2 STATE 1.1 ENERGY -555.586808331771
The SCS-MP2 correlation energy is:
ESCS = (2/3)*PT*Et + PS*Eopp
or:
ESCS = (2/3)*PT*Et + (1/3)*PS*Et+ PS*Es
where,
PT = scsfact, the same spin scaling factor
Et = MP2 triplet pair energy
PS = scsfacs, the opposite spin scaling factor
Eopp = ECorrelation energy - (2/3)Et
Es = MP2 singlet pair energy
