PBS

Portable Batch System (PBS) is the name of computer software that performs job scheduling. Its primary task is to allocate computational tasks, i.e., batch jobs, among the available computing resources.1

qsub Synopsis2

qsub is the command used for job submission to the cluster. It takes several command line arguments and can also use special directives found in the submission scripts or command file. Several of the most widely used arguments are described in detail below(you can use man qsub to get more info).

qsub
[-a date_time]
[-A account_string]
[-b secs]
[-c checkpoint_options]
              n  No checkpointing is to be performed.
              s  Checkpointing is to be performed only when the server executing the job is shutdown.
              c  Checkpointing is to be performed at the default minimum time for the server executing
                 the job.
              c=minutes
                 Checkpointing is to be performed at an interval of minutes, which is the integer number
                 of minutes of CPU time used by the job. This value must be greater than zero.
[-C directive_prefix] [-d path] [-D path] [-e path] [-f] [-h]
[-I ]
[-j join ]
[-k keep ]
[-l resource_list ]
[-m mail_options]
[-M user_list]
[-N name]
[-o path]
[-p priority]
[-P user[:group]]
[-q destination]
[-r c]
[-S path_list]
[-t array_request]
[-u user_list]
[-v variable_list]
[-V ]
[-W additional_attributes]
[-X]
[-z]

An Example of PBS Script

The following example is a .pbs for scf calculation of Pervosikie-MgSiO3.

Line 1 indicates this script uses bash shell.

Line 2-6 specify the job name, nodes, error file, output file, destination respectively.

The rest lines mainly give the input for Quantum Espresso. For details please check input description of Quantum Espresso.

The second to last line tells job system to use MPI to execute pw.x (part of Quantum Espresso).

The last line will remove temporary folder, i.e. $toutput.

#!/bin/bash
#PBS -N MgSiO3-scf-phon                          #<- job name is MgSiO3-scf-phon which will be shown in the queue if you use qstat
#PBS -l nodes=node03:ppn=16          #<- use node03 (16 cores in total).
#PBS -l walltime=03:00:00                        #<- run job for 3 hours(that means the job will be terminated if 3 hour limit is reached) 
#PBS -e error                                    #<- errors will be written to file 'error'
#PBS -o OUT                                      #<- output will be written to file 'OUT'
#PBS -q gentai                                   #<- use “gentai” queue (there only one queue on our cluster)
                                                 #<- your commands start here
toutput=/tmp${PBS_O_WORKDIR#/home}/$PBS_JOBID    #temporary data saved to the disk of the computing node, this is crutial

mkdir -p $toutput

cd  $PBS_O_WORKDIR

cat>P0.scf.in<<EOF
&CONTROL
    calculation='relax',
    outdir='$toutput',
    pseudo_dir='/home/coiby/pseudo',
    forc_conv_thr=1.0d-4,
    dt=30,
    disk_io='none', 
    nstep=400,
    tstress=.true.
    prefix='MgCO3',
    tprnfor=.true.
    restart_mode = 'from_scratch'
/
&SYSTEM
    ibrav=0,
    celldm(1)=1,
    ntyp=3,
    nat=30,
    ecutwfc=70.0,
/
&ELECTRONS
    mixing_beta=0.7,
    conv_thr=1.0d-8,
/
&IONS
    pot_extrapolation='second_order'
    wfc_extrapolation='second_order'
/
ATOMIC_SPECIES
 C   12.001   C_ca_bm.vdb
 Mg  24.305   Mg.vbc3
 O   15.999   O.rw2

CELL_PARAMETERS (alat)
8.816874495 -0.000004976    0.000002135
-4.408437524    7.560034917 -0.000002844
6.98011E-06 -0.000006627    27.60465642

ATOMIC_POSITIONS (crystal)
C       -0.000001320  -0.000000481   0.249999911
C        0.666672716   0.333323958   0.583331350
C        0.333323138   0.666672504   0.916668710
C        0.000001320   0.000000481   0.750000089
C        0.666676862   0.333327496   0.083331290
C        0.333327284   0.666676042   0.416668650
Mg       0.000000000   0.000000000   0.500000000
Mg       0.666662516   0.333341968   0.833335218
Mg       0.333337484   0.666658032   0.166664782
Mg       0.000000000   0.000000000   0.000000000
Mg       0.666658052   0.333337960   0.333334661
Mg       0.333341948   0.666662040   0.666665339
O        0.278033459  -0.000001189   0.249999051
O        0.944708349   0.333326638   0.583332706
O        0.611357605   0.666671004   0.916668766
O       -0.000001585   0.278034535   0.250000631
O        0.666671111   0.611358469   0.583331487
O        0.333325623   0.944708030   0.916667204
O        0.721963072   0.721963994   0.249999894
O        0.388638421   0.055288841   0.583331107
O        0.055288234   0.388638169   0.916669162
O        0.000001585   0.721965465   0.749999369
O        0.666674377   0.055291970   0.083332796
O        0.333328889   0.388641531   0.416668513
O        0.721966541   0.000001189   0.750000949
O        0.388642395   0.333328996   0.083331234
O        0.055291651   0.666673362   0.416667294
O        0.278036928   0.278036006   0.750000106
O        0.944711766   0.611361831   0.083330838
O        0.611361579   0.944711159   0.416668893

K_POINTS {automatic}
2  2  2  0  0  0

EOF
mpirun -np 16 -npool 4 /opt/software/espresso-5.4.0/bin/pw.x <  P0.scf.in > P0.scf.out
rm -rf $toutput

There are seven computing nodes on our cluster, node01-2 (64 cores), node03-6 (32 cores), node08 (36 cores).

Useful Commands

  • submit a job

$qsub filename.pbs

  • check job status

$qstat jobid

  • delete a job

$qdel jobid

  • check all jobs

$qstat -a

  • pbsnodes

show status of all nodes

  • bash delete jobs

$qdel 62{64..76}.server

Important Notes

​1. Node05, 06 (128+64 = 192G) have much larger memory than node03, 04 (128G), if you have memory-consuming jobs, please submit them to node05, 06.

​2. Avoiding writing temporary data to disk as much as possible. To learn more about disk_io, please check 3.3.1 Understanding parallel I/O in User’s Guide for Quantum ESPRESSO.

'low' :

    store wfc in memory, save only at the end


'none' :

    do not save anything, not even at the end
    ('scf', 'nscf', 'bands' calculations; some data
    may be written anyway for other calculations)

​3. If we have to save the temporay data during computation, output it to the local disk of the computing node. Never choose a folder under your home direcotory. toutput=/tmp${PBS_O_WORKDIR#/home}/$PBS_JOBID in the above PBS script will be a folder located in the local disk of the computing node.

​4. Choose proper parameter value for parallelization level, for example, you may choose less CPU cores (-npools, -nk) to reduce communication costs between cores. For details, check 3.3 Parallelization levels in User’s Guide for Quantum ESPRESSO.

mpirun -np 4096 ./neb.x -ni 8 -nk 2 -nt 4 -nd 144 -i my.input

​5. Select the node which has the most free resources.

Q&A

1. What can I do if the job is in the state "E" and also can't be deleted?

qdel: Request invalid for state of job MSG=invalid state for job - EXITING

The administrator can forcibly purge the job

qdel -p jobid

References

1. Wikipedia - Portable Batch System
2. Tutorial - Submitting a job using qsub - High Performance Computing at NYU - NYU Wikis

results matching ""

    No results matching ""