Tutorial: Phonons, EELS and magnons for HPC and GPUs

Slides

Phonon modes of CnSnI3 at Gamma

PWscf simulation, step 1

Files needed:

#!/bin/bash
#SBATCH --nodes 1
#SBATCH --time=00:10:00
#SBATCH --partition=gpu
#SBATCH --ntasks-per-node=4
#SBATCH --ntasks-per-socket=2
#SBATCH --cpus-per-task=32
#SBATCH --gres=gpu:4
#SBATCH --job-name=phstep1
#SBATCH --error=err.job-%j
#SBATCH --output=out.job-%j
#SBATCH --hint=nomultithread
#SBATCH --reservation=maxgpu

export WORK=/ceph/hpc/data/d2021-135-users

module purge
module use ${WORK}/modules

module load QuantumESPRESSO/DEV-NVHPC-21.2

export ESPRESSO_PSEUDO=${PWD}/../../../pseudo
export OMP_NUM_THREADS=1

mpiopt="-mca pml ucx -mca btl ^uct,tcp,openib,vader  --map-by socket:PE=32 --rank-by core"

mpirun $mpiopt -np npw pw.x -ni 1 -nk 1 -i pw.CnSnI3.in > pw.CnSnI3.out

Perform a vc-relax calculation for CnSnI3 using the pw.x program.

  1. Copy ../inputs/pw.CnSnI3.in into the current folder and modify the &CONTROL namelist to perform a vc-relax calculation (a sketch of both edits follows this list)

    calculation=""

  2. Open submit.slurm and replace npw with the number of MPI tasks, so that the run uses R&G (plane-wave) parallelism on 4 MPIs:GPUs

  3. Submit the job file

    sbatch submit.slurm

    Check if convergence has been achieved.

  4. Copy the output directory (out/) into the folder of step 2.

    cp -r ./out ../step2/
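
A minimal sketch of the two edits (the namelist change in pw.CnSnI3.in and the process count in submit.slurm), keeping everything else in the provided files unchanged:

    &CONTROL
      calculation='vc-relax'
      ! keep the remaining entries of the provided &CONTROL namelist unchanged
    /

    # submit.slurm: 4 MPI tasks : GPUs, R&G parallelism only (no pools, no images)
    mpirun $mpiopt -np 4 pw.x -ni 1 -nk 1 -i pw.CnSnI3.in > pw.CnSnI3.out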

Solution

Phonon calculation, step 2

Files needed:

#!/bin/bash
#SBATCH --nodes 1
#SBATCH --time=00:20:00
#SBATCH --partition=gpu
#SBATCH --ntasks-per-node=1
#SBATCH --ntasks-per-socket=1
#SBATCH --cpus-per-task=32
#SBATCH --gres=gpu:1
#SBATCH --job-name=phstep2
#SBATCH --error=err.job-%j
#SBATCH --output=out.job-%j
#SBATCH --hint=nomultithread
#SBATCH --reservation=maxgpu

export WORK=/ceph/hpc/data/d2021-135-users

module purge
module use ${WORK}/modules
module load QuantumESPRESSO/DEV-NVHPC-21.2

export ESPRESSO_PSEUDO=${PWD}/../../../pseudo
export OMP_NUM_THREADS=1

mpiopt="-mca pml ucx -mca btl ^uct,tcp,openib,vader --map-by socket:PE=32 --rank-by core "
mpirun $mpiopt -np 1 ph.x -ni 1 -nk 1 -i ph.CnSnI3.in > ph.CnSnI3.out

Perform a phonon calculation at Gamma for CnSnI3 using the ph.x program.

  1. Copy ../inputs/ph.CnSnI3.in into the current folder and modify the &inputph namelist; add the coordinates of the Gamma point after the namelist (see the sketch after this list)

    &inputph
    	prefix=''
    	amass(1)=
    	amass(2)=
    	amass(3)=
    /
    X	Y	Z
    
  2. Submit the jobfile to run ph.x on 1 MPI : GPU

  3. Check the number of k points

    awk '/number of k/' ph.CnSnI3.out

  4. Check the number of irreducible representations

    awk '/irreducible/' ph.CnSnI3.out

  5. Check the dynamical matrix written by ph.x to the file harmdyn_support

    tail -n 97 harmdyn_support
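
A minimal sketch of the completed input: the Gamma point is q = (0 0 0), given on the line after the namelist. The prefix and atomic masses are left as placeholders here, since they must match the values used for CnSnI3 in step 1:

    &inputph
      prefix=''      ! same prefix as in the step 1 pw.x calculation
      amass(1)=      ! atomic masses in a.m.u., one entry per species
      amass(2)=
      amass(3)=
    /
    0.0 0.0 0.0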

Solution

ASR rule application, step 3

Files needed:

#!/bin/bash
#SBATCH --nodes 1
#SBATCH --time=00:10:00
#SBATCH --partition=gpu
#SBATCH --ntasks-per-node=1
#SBATCH --ntasks-per-socket=1
#SBATCH --cpus-per-task=32
#SBATCH --gres=gpu:1
#SBATCH --job-name=phstep3
#SBATCH --error=err.job-%j
#SBATCH --output=out.job-%j
#SBATCH --hint=nomultithread
#SBATCH --reservation=maxgpu

export WORK=/ceph/hpc/data/d2021-135-users

module purge
module use ${WORK}/modules
module load QuantumESPRESSO/DEV-NVHPC-21.2

export ESPRESSO_PSEUDO=${PWD}/../../../pseudo
export OMP_NUM_THREADS=1

mpiopt="-mca pml ucx -mca btl ^uct,tcp,openib,vader --map-by socket:PE=32 --rank-by core "
mpirun $mpiopt -np 1 dynmat.x -ni 1 -nk 1 -i dyn.CnSnI3.in > dyn.CnSnI3.out

Apply the Acoustic Sum Rule (ASR) with dynmat.x

  1. Copy ../inputs/dyn.CnSnI3.in into the current folder and add the ‘crystal’ ASR rule (see the sketch after this list)

    &input
    	asr=''
    
  2. Copy ../step2/harmdyn_support into the current folder

  3. Submit the job

  4. Check the phonon frequencies with the ASR applied in dyn.CnSnI3.out
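
A minimal sketch of the dynmat.x input with the ‘crystal’ ASR, assuming the dynamical-matrix file copied from step 2 is the one referenced by fildyn (any other entries of the provided input are kept unchanged):

    &input
      fildyn='harmdyn_support'
      asr='crystal'
    /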

Solution

Multi-GPU offload with pools, step 4

Files needed:

#!/bin/bash
#SBATCH --nodes 1
#SBATCH --time=00:20:00
#SBATCH --partition=gpu
#SBATCH --ntasks-per-node=2
#SBATCH --ntasks-per-socket=1
#SBATCH --cpus-per-task=32
#SBATCH --gres=gpu:2
#SBATCH --job-name=phstep4
#SBATCH --error=err.job-%j
#SBATCH --output=out.job-%j
#SBATCH --hint=nomultithread
#SBATCH --reservation=maxgpu

export WORK=/ceph/hpc/data/d2021-135-users

module purge
module use ${WORK}/modules
module load QuantumESPRESSO/DEV-NVHPC-21.2

export ESPRESSO_PSEUDO=${PWD}/../../../pseudo
export OMP_NUM_THREADS=1

mpiopt="-mca pml ucx -mca btl ^uct,tcp,openib,vader --map-by socket:PE=32 --rank-by core"

mpirun $mpiopt -np 2 ph.x -ni 1 -nk npools -i ph.CnSnI3.in > ph.CnSnI3.out

Perform a phonon calculation at Gamma on 2 GPUs for CnSnI3 using the ph.x program.

  1. Copy the input of step 2 (../step2/ph.CnSnI3.in) into the current folder

  2. Copy the ../step1/out directory into the current folder

  3. Modify npools in submit.slurm to distribute the calculation on 2 MPIs : GPUs with pool parallelization (see the sketch after this list)

  4. Submit the jobfile

    sbatch submit.slurm

  5. Check the wall time of the parallel execution

    tail ph.CnSnI3.out
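
A minimal sketch of the modified command line, with 2 MPI tasks : GPUs and 2 k-point pools (one GPU per pool):

    mpirun $mpiopt -np 2 ph.x -ni 1 -nk 2 -i ph.CnSnI3.in > ph.CnSnI3.out

The total wall time is reported in the line beginning with PHONON near the end of ph.CnSnI3.out and can be compared with the single-GPU timing of step 2.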

Solution

Multi-GPU offload with images, step 5

Files needed:

#!/bin/bash
#SBATCH --nodes 1
#SBATCH --time=00:30:00
#SBATCH --partition=gpu
#SBATCH --ntasks-per-node=4
#SBATCH --ntasks-per-socket=2
#SBATCH --cpus-per-task=32
#SBATCH --gres=gpu:4
#SBATCH --job-name=phstep5
#SBATCH --error=err.job-%j
#SBATCH --output=out.job-%j
#SBATCH --hint=nomultithread
#SBATCH --reservation=maxgpu

export WORK=/ceph/hpc/data/d2021-135-users

module purge
module use ${WORK}/modules
module load QuantumESPRESSO/DEV-NVHPC-21.2

export ESPRESSO_PSEUDO=${PWD}/../../../pseudo
export OMP_NUM_THREADS=1

mpiopt="-mca pml ucx -mca btl ^uct,tcp,openib,vader --map-by socket:PE=32 --rank-by core "

mpirun $mpiopt -np 4 ph.x -ni nimages -nk 1 -i ph.CnSnI3.in > out.0_0
mpirun $mpiopt -np 1 ph.x -ni 1 -nk 1 -i ph.CnSnI3.recover.in > ph.CnSnI3.recover.out

Perform a phonon calculation at Gamma on 4 GPUs for CnSnI3 using the ph.x program.

  1. Copy the input of step 2 (../step2/ph.CnSnI3.in) into the current folder

  2. Copy ph.CnSnI3.in as ph.CnSnI3.recover.in and add recover=.true. in &inputph

  3. Copy the ../step1/out directory into the current folder

  4. Modify nimages in submit.slurm to distribute the calculation on 4 MPIs : GPUs with image parallelization (see the sketch after this list)

  5. Submit the jobfile

    sbatch submit.slurm

  6. With image parallelism there is one output file per image, named out.*_0, where * is the image rank. Check the workload of each image

    awk '/I am image/ {x=NR+3} (NR<=x) {print $0} ' out.*_0
    
  7. Compare the wall times. Which image takes the longest time? Why?
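
A minimal sketch of the modified command lines, with 4 images of 1 MPI task : GPU each; the second, single-task run recollects the results computed by the images (hence the recover=.true. input):

    mpirun $mpiopt -np 4 ph.x -ni 4 -nk 1 -i ph.CnSnI3.in > out.0_0
    mpirun $mpiopt -np 1 ph.x -ni 1 -nk 1 -i ph.CnSnI3.recover.in > ph.CnSnI3.recover.out

Since the irreducible representations are divided among the images at the start of the run, the workloads differ and the total wall time is set by the most heavily loaded image.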

Solution


EELS in bulk Silicon

Calculation of the electron energy loss spectra (EELS) of bulk silicon.

Submit files needed:

#!/bin/bash
#SBATCH --job-name=pwSi
#SBATCH -N 1
#SBATCH --ntasks=64
#SBATCH --time=00:30:00
#SBATCH --partition=cpu
#SBATCH --ntasks-per-node=64
#SBATCH --cpus-per-task=1
#SBATCH --exclusive
#SBATCH --reservation=maxcpu

module purge
module load QuantumESPRESSO/7.1-foss-2022a

export OMP_NUM_THREADS=1

mpirun -np 64 pw.x -nk 16 -i pw.Si.scf.in > pw.Si.scf.out

Input files needed:

 &control
    calculation='scf'
    restart_mode='from_scratch',
    prefix='Sieels'
    pseudo_dir = '../../pseudo'
    outdir='./tempdir'
 /
 &system
    ibrav = 2,
    celldm(1) = 10.26,
    nat = 2,
    ntyp = 1,
    ecutwfc = 20.0
 /
 &electrons
    conv_thr =  1.0d-10
 /
ATOMIC_SPECIES
Si  28.08  Si.upf
ATOMIC_POSITIONS {alat}
Si 0.00 0.00 0.00
Si 0.25 0.25 0.25
K_POINTS {automatic}
12 12 12 1 1 1
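
A quick check after the SCF run (a minimal sketch, assuming the job script above is saved as submit.slurm; both grep patterns are standard pw.x output markers):

    sbatch submit.slurm
    grep 'convergence has been achieved' pw.Si.scf.out
    grep '!' pw.Si.scf.out   # final total energy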

Step-by-step for running the tutorial can be found in the slides linked at the top of this page!


Calculation of the magnon spectra of bulk iron

Submit files needed:

#!/bin/bash
#SBATCH --job-name=pwFe
#SBATCH -N 1
#SBATCH --ntasks=64
#SBATCH --time=00:30:00
#SBATCH --partition=cpu
#SBATCH --ntasks-per-node=64
#SBATCH --cpus-per-task=1
#SBATCH --exclusive
#SBATCH --reservation=maxcpu

module purge
module load QuantumESPRESSO/7.1-foss-2022a

export OMP_NUM_THREADS=1

mpirun -np 64 pw.x -nk 16 -i pw.Fe.scf.in > pw.Fe.scf.out

Input files needed:

 &control
    calculation='scf'
    restart_mode='from_scratch',
    outdir='./tempdir',
    prefix='Femag'
    pseudo_dir="../../pseudo"
    verbosity='high'
 /
 &system
    nosym           = .true.
    noinv           = .true.
    noncolin        = .true.
    lspinorb        = .false.
    ibrav           = 3
    celldm(1)       = 5.406
    nat             = 1
    ntyp            = 1
    ecutwfc         = 40
    occupations     = 'smearing'
    smearing        = 'gaussian'
    degauss         = 0.01
    starting_magnetization(1) = 0.15
 /
 &electrons
    mixing_beta     = 0.3
    conv_thr        = 1.d-9
 /
ATOMIC_SPECIES
Fe  55.85   Fe.pz-n-nc.UPF
ATOMIC_POSITIONS alat
Fe  0.00000000 0.00000000 0.00000000
K_POINTS automatic
 4 4 4 0 0 0
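
A quick check of the converged magnetic state after the SCF run (a minimal sketch, assuming the job script above is saved as submit.slurm; both grep patterns are standard pw.x output markers):

    sbatch submit.slurm
    grep 'convergence has been achieved' pw.Fe.scf.out
    grep 'total magnetization' pw.Fe.scf.out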


Step-by-step for running the tutorial can be found in the slides linked at the top of this page!