                   :-)  GROMACS - gmx mdrun, 2021.3  (-:

                            GROMACS is written by:
 Andrey Alekseenko     Emile Apol          Rossen Apostolov     Paul Bauer
 Herman J.C. Berendsen Par Bjelkmar        Christian Blau       Viacheslav Bolnykh
 Kevin Boyd            Aldert van Buuren   Rudi van Drunen      Anton Feenstra
 Gilles Gouaillardet   Alan Gray           Gerrit Groenhof      Anca Hamuraru
 Vincent Hindriksen    M. Eric Irrgang     Aleksei Iupinov      Christoph Junghans
 Joe Jordan            Dimitrios Karkoulis Peter Kasson         Jiri Kraus
 Carsten Kutzner       Per Larsson         Justin A. Lemkul     Viveca Lindahl
 Magnus Lundborg       Erik Marklund       Pascal Merz          Pieter Meulenhoff
 Teemu Murtola         Szilard Pall        Sander Pronk         Roland Schulz
 Michael Shirts        Alexey Shvetsov     Alfons Sijbers       Peter Tieleman
 Jon Vincent           Teemu Virolainen    Christian Wennberg   Maarten Wolf
 Artem Zhmurov
                           and the project leaders:
        Mark Abraham, Berk Hess, Erik Lindahl, and David van der Spoel

Copyright (c) 1991-2000, University of Groningen, The Netherlands.
Copyright (c) 2001-2019, The GROMACS development team at
Uppsala University, Stockholm University and
the Royal Institute of Technology, Sweden.
check out http://www.gromacs.org for more information.

GROMACS is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License
as published by the Free Software Foundation; either version 2.1
of the License, or (at your option) any later version.

GROMACS:      gmx mdrun, version 2021.3
Executable:   /veracruz/projects/t/training/gromacs-2021.3/bin/gmx
Data prefix:  /veracruz/projects/t/training/gromacs-2021.3
Working dir:  /veracruz/home/m/mabraham/git/gromacs-gpu-performance-master/content/exercises/using-pme
Process ID:   174568
Command line:
  gmx mdrun -ntmpi 1 -noconfout -resetstep 10000 -g manual-nb -nb gpu -pme cpu

GROMACS version:    2021.3
Verified release checksum is c5bf577cc74de0e05106b7b6426476abb7f6530be7b4a2c64f637d6a6eca8fcb
Precision:          mixed
Memory model:       64 bit
MPI library:        thread_mpi
OpenMP support:     enabled (GMX_OPENMP_MAX_THREADS = 64)
GPU support:        CUDA
SIMD instructions:  AVX_512
FFT library:        fftw-3.3.9-sse2-avx
RDTSCP usage:       enabled
TNG support:        enabled
Hwloc support:      disabled
Tracing support:    disabled
C compiler:         /apps/software/GCCcore/9.3.0/bin/cc GNU 9.3.0
C compiler flags:   -mavx512f -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -pthread -O3 -DNDEBUG
C++ compiler:       /apps/software/GCCcore/9.3.0/bin/c++ GNU 9.3.0
C++ compiler flags: -mavx512f -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -pthread -fopenmp -O3 -DNDEBUG
CUDA compiler:      /apps/software/CUDAcore/11.0.2/bin/nvcc nvcc: NVIDIA (R) Cuda compiler driver;Copyright (c) 2005-2020 NVIDIA Corporation;Built on Thu_Jun_11_22:26:38_PDT_2020;Cuda compilation tools, release 11.0, V11.0.194;Build cuda_11.0_bu.TC445_37.28540450_0
CUDA compiler flags:-std=c++17;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-Wno-deprecated-gpu-targets;-gencode;arch=compute_35,code=compute_35;-gencode;arch=compute_53,code=compute_53;-gencode;arch=compute_80,code=compute_80;-use_fast_math;;-mavx512f -mfma -Wno-missing-field-initializers -fexcess-precision=fast -funroll-all-loops -pthread -fopenmp -O3 -DNDEBUG
CUDA driver:        11.10
CUDA runtime:       11.0


Running on 1 node with total 80 cores, 80 logical cores, 1 compatible GPU
Hardware detected:
  CPU info:
    Vendor: Intel
    Brand:  Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz
    Family: 6   Model: 85   Stepping: 4
    Features: aes apic avx avx2 avx512f avx512cd avx512bw avx512vl avx512secondFMA clfsh cmov cx8 cx16 f16c fma hle htt intel lahf mmx msr nonstop_tsc pcid pclmuldq pdcm pdpe1gb popcnt pse rdrnd rdtscp rtm sse2 sse3 sse4.1 sse4.2 ssse3 tdt x2apic
    Number of AVX-512 FMA units: 2
  Hardware topology: Only logical processor count
  GPU info:
    Number of GPUs detected: 1
    #0: NVIDIA Tesla V100-PCIE-16GB, compute cap.: 7.0, ECC: yes, stat: compatible


++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
M. J. Abraham, T. Murtola, R. Schulz, S. Páll, J. C. Smith, B. Hess, E. Lindahl
GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers
SoftwareX 1 (2015) pp. 19-25
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Páll, M. J. Abraham, C. Kutzner, B. Hess, E. Lindahl
Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS
In S. Markidis & E. Laure (Eds.), Solving Software Challenges for Exascale 8759 (2015) pp. 3-27
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Pronk, S. Páll, R. Schulz, P. Larsson, P. Bjelkmar, R. Apostolov, M. R. Shirts, J. C. Smith, P. M. Kasson, D. van der Spoel, B. Hess, and E. Lindahl
GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit
Bioinformatics 29 (2013) pp. 845-54
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and C. Kutzner and D. van der Spoel and E. Lindahl
GROMACS 4: Algorithms for highly efficient, load-balanced, and scalable molecular simulation
J. Chem. Theory Comput. 4 (2008) pp. 435-447
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
D. van der Spoel, E. Lindahl, B. Hess, G. Groenhof, A. E. Mark and H. J. C. Berendsen
GROMACS: Fast, Flexible and Free
J. Comp. Chem. 26 (2005) pp. 1701-1719
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
E. Lindahl and B. Hess and D. van der Spoel
GROMACS 3.0: A package for molecular simulation and trajectory analysis
J. Mol. Mod. 7 (2001) pp. 306-317
-------- -------- --- Thank You --- -------- --------

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
H. J. C. Berendsen, D. van der Spoel and R. van Drunen
GROMACS: A message-passing parallel molecular dynamics implementation
Comp. Phys. Comm. 91 (1995) pp. 43-56
-------- -------- --- Thank You --- -------- --------

++++ PLEASE CITE THE DOI FOR THIS VERSION OF GROMACS ++++
https://doi.org/10.5281/zenodo.5053201
-------- -------- --- Thank You --- -------- --------

The number of OpenMP threads was set by environment variable OMP_NUM_THREADS to 20

Input Parameters:
   integrator                     = md
   tinit                          = 0
   dt                             = 0.002
   nsteps                         = 20000
   init-step                      = 0
   simulation-part                = 1
   mts                            = false
   comm-mode                      = Linear
   nstcomm                        = 100
   bd-fric                        = 0
   ld-seed                        = 1993
   emtol                          = 10
   emstep                         = 0.01
   niter                          = 20
   fcstep                         = 0
   nstcgsteep                     = 1000
   nbfgscorr                      = 10
   rtpi                           = 0.05
   nstxout                        = 0
   nstvout                        = 0
   nstfout                        = 0
   nstlog                         = 0
   nstcalcenergy                  = 10
   nstenergy                      = 0
   nstxout-compressed             = 0
   compressed-x-precision         = 1000
   cutoff-scheme                  = Verlet
   nstlist                        = 10
   pbc                            = xyz
   periodic-molecules             = false
   verlet-buffer-tolerance        = 0.005
   rlist                          = 1
   coulombtype                    = PME
   coulomb-modifier               = Potential-shift
   rcoulomb-switch                = 0
   rcoulomb                       = 1
   epsilon-r                      = 1
   epsilon-rf                     = 1
   vdw-type                       = Cut-off
   vdw-modifier                   = Potential-shift
   rvdw-switch                    = 0
   rvdw                           = 1
   DispCorr                       = EnerPres
   table-extension                = 1
   fourierspacing                 = 0.12
   fourier-nx                     = 96
   fourier-ny                     = 96
   fourier-nz                     = 128
   pme-order                      = 4
   ewald-rtol                     = 1e-05
   ewald-rtol-lj                  = 0.001
   lj-pme-comb-rule               = Geometric
   ewald-geometry                 = 0
   epsilon-surface                = 0
   tcoupl                         = V-rescale
   nsttcouple                     = 10
   nh-chain-length                = 0
   print-nose-hoover-chain-variables = false
   pcoupl                         = No
   pcoupltype                     = Semiisotropic
   nstpcouple                     = -1
   tau-p                          = 5
   compressibility (3x3):
      compressibility[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      compressibility[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   ref-p (3x3):
      ref-p[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   refcoord-scaling               = COM
   posres-com (3):
      posres-com[0]= 0.00000e+00
      posres-com[1]= 0.00000e+00
      posres-com[2]= 0.00000e+00
   posres-comB (3):
      posres-comB[0]= 0.00000e+00
      posres-comB[1]= 0.00000e+00
      posres-comB[2]= 0.00000e+00
   QMMM                           = false
   qm-opts:
      ngQM                        = 0
   constraint-algorithm           = Lincs
   continuation                   = false
   Shake-SOR                      = false
   shake-tol                      = 0.0001
   lincs-order                    = 4
   lincs-iter                     = 1
   lincs-warnangle                = 45
   nwall                          = 0
   wall-type                      = 9-3
   wall-r-linpot                  = -1
   wall-atomtype[0]               = -1
   wall-atomtype[1]               = -1
   wall-density[0]                = 0
   wall-density[1]                = 0
   wall-ewald-zfac                = 3
   pull                           = false
   awh                            = false
   rotation                       = false
   interactiveMD                  = false
   disre                          = No
   disre-weighting                = Conservative
   disre-mixed                    = false
   dr-fc                          = 1000
   dr-tau                         = 0
   nstdisreout                    = 100
   orire-fc                       = 0
   orire-tau                      = 0
   nstorireout                    = 100
   free-energy                    = no
   cos-acceleration               = 0
   deform (3x3):
      deform[    0]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    1]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
      deform[    2]={ 0.00000e+00,  0.00000e+00,  0.00000e+00}
   simulated-tempering            = false
   swapcoords                     = no
   userint1                       = 0
   userint2                       = 0
   userint3                       = 0
   userint4                       = 0
   userreal1                      = 0
   userreal2                      = 0
   userreal3                      = 0
   userreal4                      = 0
   applied-forces:
     electric-field:
       x:
         E0                       = 0
         omega                    = 0
         t0                       = 0
         sigma                    = 0
       y:
         E0                       = 0
         omega                    = 0
         t0                       = 0
         sigma                    = 0
       z:
         E0                       = 0
         omega                    = 0
         t0                       = 0
         sigma                    = 0
     density-guided-simulation:
       active                     = false
       group                      = protein
       similarity-measure         = inner-product
       atom-spreading-weight      = unity
       force-constant             = 1e+09
       gaussian-transform-spreading-width = 0.2
       gaussian-transform-spreading-range-in-multiples-of-width = 4
       reference-density-filename = reference.mrc
       nst                        = 1
       normalize-densities        = true
       adaptive-force-scaling     = false
       adaptive-force-scaling-time-constant = 4
   grpopts:
     nrdf:        67964.4     49247.5      196258
     ref-t:           310         310         310
     tau-t:           0.1         0.1         0.1
     annealing:        No          No          No
     annealing-npoints:  0           0           0
     acc:               0           0           0
     nfreeze:           N           N           N
   energygrp-flags[  0]: 0

Changing nstlist from 10 to 100, rlist from 1 to 1.132

1 GPU selected for this run.
Mapping of GPU IDs to the 1 GPU task in the 1 rank on this node:
  PP:0
PP tasks will do (non-perturbed) short-ranged and most bonded interactions on the GPU
PP task will update and constrain coordinates on the CPU
Using 1 MPI thread

Non-default thread affinity set, disabling internal thread affinity
Using 20 OpenMP threads

The -resetstep functionality is deprecated, and may be removed in a future version.

System total charge: 0.000
Will do PME sum in reciprocal space for electrostatic interactions.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee and L. G. Pedersen
A smooth particle mesh Ewald method
J. Chem. Phys. 103 (1995) pp. 8577-8592
-------- -------- --- Thank You --- -------- --------

Using a Gaussian width (1/beta) of 0.320163 nm for Ewald
Potential shift: LJ r^-12: -1.000e+00 r^-6: -1.000e+00, Ewald -1.000e-05
Initialized non-bonded Ewald tables, spacing: 9.33e-04 size: 1073

Generated table with 1066 data points for 1-4 COUL.
Tabscale = 500 points/nm
Generated table with 1066 data points for 1-4 LJ6.
Tabscale = 500 points/nm
Generated table with 1066 data points for 1-4 LJ12.
Tabscale = 500 points/nm
Long Range LJ corr.: 6.7202e-04

Using GPU 8x8 nonbonded short-range kernels

Using a dual 8x8 pair-list setup updated with dynamic, rolling pruning:
  outer list: updated every 100 steps, buffer 0.132 nm, rlist 1.132 nm
  inner list: updated every  14 steps, buffer 0.001 nm, rlist 1.001 nm
At tolerance 0.005 kJ/mol/ps per atom, equivalent classical 1x1 list would be:
  outer list: updated every 100 steps, buffer 0.283 nm, rlist 1.283 nm
  inner list: updated every  14 steps, buffer 0.057 nm, rlist 1.057 nm

Using Lorentz-Berthelot Lennard-Jones combination rule

Removing pbc first time

Initializing LINear Constraint Solver

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
B. Hess and H. Bekker and H. J. C. Berendsen and J. G. E. M. Fraaije
LINCS: A Linear Constraint Solver for molecular simulations
J. Comp. Chem. 18 (1997) pp. 1463-1472
-------- -------- --- Thank You --- -------- --------

The number of constraints is 13620

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
S. Miyamoto and P. A. Kollman
SETTLE: An Analytical Version of the SHAKE and RATTLE Algorithms for Rigid Water Models
J. Comp. Chem. 13 (1992) pp. 952-962
-------- -------- --- Thank You --- -------- --------

The -noconfout functionality is deprecated, and may be removed in a future version.

++++ PLEASE READ AND CITE THE FOLLOWING REFERENCE ++++
G. Bussi, D. Donadio and M. Parrinello
Canonical sampling through velocity rescaling
J. Chem. Phys. 126 (2007) pp. 014101
-------- -------- --- Thank You --- -------- --------

There are: 141677 Atoms

Constraining the starting coordinates (step 0)

Constraining the coordinates at t0-dt (step 0)

Center of mass motion removal mode is Linear
We have the following groups for center of mass motion removal:
  0:  System

RMS relative constraint deviation after constraining: 3.88e-06
Initial temperature: 279.104 K

Started mdrun on rank 0 Thu Sep 9 10:25:46 2021

           Step           Time
              0        0.00000

   Energies (kJ/mol)
           Bond          Angle    Proper Dih.  Ryckaert-Bell.  Improper Dih.
    8.51519e+02    8.91827e+04    8.15628e+04    1.56545e+04    1.81361e+03
  Improper Dih.          LJ-14     Coulomb-14        LJ (SR)  Disper. corr.
    3.30515e+03    4.75419e+04    3.40936e+05    7.67685e+04   -3.46877e+04
   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.   Total Energy
   -2.51729e+06    2.10380e+04   -1.87332e+06    3.63069e+05   -1.51025e+06
  Conserved En.    Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd
   -1.51025e+06    2.78605e+02   -3.54583e+02    2.48652e+03    3.78959e-06

step  900: timed with pme grid 96 96 128, coulomb cutoff 1.000: 1554.6 M-cycles
step 1100: timed with pme grid 84 84 112, coulomb cutoff 1.116: 1305.7 M-cycles
step 1300: timed with pme grid 72 72 96, coulomb cutoff 1.302: 1075.1 M-cycles
step 1500: timed with pme grid 64 64 84, coulomb cutoff 1.472: 986.6 M-cycles
step 1500: the maximum allowed grid scaling limits the PME load balancing to a coulomb cut-off of 1.563
step 1700: timed with pme grid 60 60 80, coulomb cutoff 1.563: 942.1 M-cycles
step 1900: timed with pme grid 64 64 80, coulomb cutoff 1.545: 956.2 M-cycles
step 2100: timed with pme grid 64 64 84, coulomb cutoff 1.472: 984.6 M-cycles
step 2300: timed with pme grid 64 64 96, coulomb cutoff 1.465: 1019.2 M-cycles
step 2500: timed with pme grid 80 80 96, coulomb cutoff 1.288: 1142.7 M-cycles
step 2700: timed with pme grid 80 80 100, coulomb cutoff 1.236: 1164.7 M-cycles
step 2900: timed with pme grid 80 80 104, coulomb cutoff 1.189: 1192.7 M-cycles
step 3100: timed with pme grid 80 80 108, coulomb cutoff 1.172: 1206.2 M-cycles
step 3300: timed with pme grid 84 84 108, coulomb cutoff 1.145: 1264.9 M-cycles
              optimal pme grid 60 60 80, coulomb cutoff 1.563

step 10000: resetting all time and cycle counters

Restarted time on rank 0 Thu Sep 9 10:26:43 2021

           Step           Time
          20000       40.00000

   Energies (kJ/mol)
           Bond          Angle    Proper Dih.  Ryckaert-Bell.  Improper Dih.
    4.39138e+04    8.77100e+04    8.18219e+04    1.53269e+04    1.91457e+03
  Improper Dih.          LJ-14     Coulomb-14        LJ (SR)  Disper. corr.
    3.28573e+03    4.34871e+04    3.30255e+05    7.62739e+04   -3.46877e+04
   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.   Total Energy
   -2.49297e+06    4.32923e+03   -1.83934e+06    4.02728e+05   -1.43661e+06
  Conserved En.    Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd
   -1.50946e+06    3.09038e+02   -3.54583e+02    9.00010e+01    3.91816e-06

Energy conservation over simulation part #1 of length 40 ns, time 0 to 40 ns
  Conserved energy drift: 1.40e-04 kJ/mol/ps per atom


        <======  ###############  ==>
        <====  A V E R A G E S  ====>
        <==  ###############  ======>

        Statistics over 20001 steps using 2001 frames

   Energies (kJ/mol)
           Bond          Angle    Proper Dih.  Ryckaert-Bell.  Improper Dih.
    4.09673e+04    8.82822e+04    8.19917e+04    1.66794e+04    1.88034e+03
  Improper Dih.          LJ-14     Coulomb-14        LJ (SR)  Disper. corr.
    3.32440e+03    4.36054e+04    3.30595e+05    7.66853e+04   -3.46877e+04
   Coulomb (SR)   Coul. recip.      Potential    Kinetic En.   Total Energy
   -2.49512e+06    5.60827e+03   -1.84019e+06    4.03868e+05   -1.43632e+06
  Conserved En.    Temperature Pres. DC (bar) Pressure (bar)   Constr. rmsd
   -1.50984e+06    3.09912e+02   -3.54583e+02    1.21126e+02    0.00000e+00

   Total Virial (kJ/mol)
    1.28727e+05    3.38679e+02   -2.48871e+01
    3.38283e+02    1.28160e+05   -4.91087e+00
   -2.40573e+01   -3.71763e+00    1.29182e+05

   Pressure (bar)
    1.27361e+02   -7.43006e+00    8.13203e-01
   -7.42198e+00    1.38735e+02    5.49980e-01
    7.96263e-01    5.25619e-01    9.72825e+01

      T-Protein         T-DOPC  T-Water_and_ions
    3.09767e+02    3.09128e+02    3.10160e+02


       P P   -   P M E   L O A D   B A L A N C I N G

 NOTE: The PP/PME load balancing was limited by the maximum allowed grid scaling,
       you might not have reached a good load balance.
 PP/PME load balancing changed the cut-off and PME settings:
           particle-particle                    PME
            rcoulomb  rlist            grid      spacing   1/beta
   initial  1.000 nm  1.001 nm      96  96 128   0.117 nm  0.320 nm
   final    1.563 nm  1.564 nm      60  60  80   0.188 nm  0.500 nm
 cost-ratio           3.81             0.24
 (note that these numbers concern only part of the total PP and PME load)


        M E G A - F L O P S   A C C O U N T I N G

 NB=Group-cutoff nonbonded kernels    NxN=N-by-N cluster Verlet kernels
 RF=Reaction-Field  VdW=Van der Waals  QSTab=quadratic-spline table
 W3=SPC/TIP3p  W4=TIP4p (single or pairs)
 V&F=Potential and force  V=Potential only  F=Force only

 Computing:                               M-Number         M-Flops  % Flops
-----------------------------------------------------------------------------
 Pair Search distance check            2192.954896       19736.594     0.0
 NxN QSTab Elec. + LJ [F]            2692271.571840   110383134.445    84.9
 NxN QSTab Elec. + LJ [V&F]           299440.249472    17666974.719    13.6
 Calc Weights                           4250.735031      153026.461     0.1
 Spread Q Bspline                      90682.347328      181364.695     0.1
 Gather F Bspline                      90682.347328      544094.084     0.4
 3D-FFT                               104472.126168      835777.009     0.6
 Solve PME                                36.003600        2304.230     0.0
 Shift-X                                  14.309377          85.856     0.0
 Virial                                  141.863722        2553.547     0.0
 Stop-CM                                  14.309377         143.094     0.0
 Calc-Ekin                               283.495677        7654.383     0.0
 Lincs                                   136.213620        8172.817     0.0
 Lincs-Mat                               715.871580        2863.486     0.0
 Constraint-V                           1251.905178       11267.147     0.0
 Constraint-Vir                          111.669558        2680.069     0.0
 Settle                                  326.492646      120802.279     0.1
-----------------------------------------------------------------------------
 Total                                               129942634.917   100.0
-----------------------------------------------------------------------------


     R E A L   C Y C L E   A N D   T I M E   A C C O U N T I N G

On 1 MPI rank, each using 20 OpenMP threads

 Computing:          Num   Num      Call    Wall time         Giga-Cycles
                     Ranks Threads  Count      (s)         total sum    %
-----------------------------------------------------------------------------
 Neighbor search        1   20        101       1.169         55.957   3.0
 Launch GPU ops.        1   20      10001       0.797         38.162   2.0
 Force                  1   20      10001       0.408         19.514   1.0
 PME mesh               1   20      10001      29.177       1397.202  73.8
 Wait Bonded GPU        1   20       1001       0.035          1.656   0.1
 Wait GPU NB local      1   20      10001       0.236         11.321   0.6
 NB X/F buffer ops.     1   20      19901       3.452        165.284   8.7
 Update                 1   20      10001       1.808         86.571   4.6
 Constraints            1   20      10001       1.773         84.886   4.5
 Rest                                           0.661         31.658   1.7
-----------------------------------------------------------------------------
 Total                                         39.514       1892.211 100.0
-----------------------------------------------------------------------------
 Breakdown of PME mesh computation
-----------------------------------------------------------------------------
 PME spread             1   20      10001      14.416        690.352  36.5
 PME gather             1   20      10001       9.493        454.568  24.0
 PME 3D-FFT             1   20      20002       4.644        222.376  11.8
 PME solve Elec         1   20      10001       0.546         26.143   1.4
-----------------------------------------------------------------------------

               Core t (s)   Wall t (s)        (%)
       Time:      790.275       39.514     2000.0
                 (ns/day)    (hour/ns)
Performance:       43.735        0.549
Finished mdrun on rank 0 Thu Sep 9 10:27:23 2021