
HPC Application Performance of NVIDIA Data Center GPUs


Some of the world’s most pressing scientific and engineering problems can only be solved with modern HPC data centers. NVIDIA A100, V100, and T4 GPUs deliver breakthrough performance with dramatically fewer servers, lower power consumption, and less network overhead, for total cost savings of 5X to 10X.

The node replacement factor (NRF) is the number of CPU-only servers that a single GPU-accelerated server can replace. NRF is derived by comparing the GPU server against application performance measured on up to eight CPU-only servers; beyond eight servers, linear scaling is used to extrapolate. NRF varies by application.
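The exact procedure is not spelled out here, so the following is only a minimal sketch under stated assumptions: CPU-only throughput is measured at 1 to k nodes (k up to 8), NRF is the possibly fractional node count at which CPU-only throughput matches one GPU server (interpolating between measured counts), and beyond the last measurement throughput is assumed to grow linearly at the last measured per-node rate. The function name `node_replacement_factor` and the sample numbers are hypothetical, not taken from the tables.

```python
def node_replacement_factor(cpu_throughput, gpu_throughput):
    """Estimate how many CPU-only nodes one GPU-accelerated server replaces.

    Hypothetical helper sketching the NRF idea under stated assumptions.

    cpu_throughput: measured CPU-only throughput for 1, 2, ..., k nodes.
    gpu_throughput: throughput of one GPU-accelerated server.
    Both must be "bigger is better" (e.g. ns/day); convert time-based
    metrics with 1 / seconds before calling.
    """
    if not cpu_throughput or gpu_throughput <= 0:
        raise ValueError("need CPU measurements and a positive GPU throughput")

    prev_nodes, prev_perf = 0, 0.0
    for nodes, perf in enumerate(cpu_throughput, start=1):
        if perf >= gpu_throughput:
            # Interpolate linearly between the two bracketing node counts.
            fraction = (gpu_throughput - prev_perf) / (perf - prev_perf)
            return prev_nodes + fraction
        prev_nodes, prev_perf = nodes, perf

    # Past the last measurement: assume linear scaling at the per-node
    # rate observed for the largest measured node count.
    per_node = prev_perf / prev_nodes
    return gpu_throughput / per_node


# Illustrative numbers only: CPU-only throughput on 1-4 nodes, scaling sublinearly.
cpu = [10.0, 19.0, 27.0, 34.0]
print(node_replacement_factor(cpu, 23.0))  # 2.5 (between 2 and 3 nodes)
print(node_replacement_factor(cpu, 68.0))  # 8.0 (extrapolated past 4 nodes)
```

Under this assumption, sublinear CPU scaling within the measured range pushes NRF above a plain single-node ratio, which is one reason NRF can differ from application to application.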

AMBER

Molecular Dynamics: A suite of programs for simulating the molecular dynamics of biomolecules

VERSION: 20.12-AT_21.12

ACCELERATED FEATURES: PMEMD Explicit Solvent and GB Implicit Solvent

SCALABILITY: Multi-GPU and single-node

MORE INFORMATION: http://ambermd.org/GPUSupport.ph

| Application | Metric | Test module | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AMBER [PME-Cellulose_NPT_4fs] | ns/day | DC-Cellulose_NPT | yes | 4.34 | 152 | 303 | 606 | 1,212 | 154 | 307 | 614 | 1,228 |
| AMBER [PME-Cellulose_NPT_4fs] | NRF | DC-Cellulose_NPT | yes | 1x | 35x | 70x | 140x | 279x | 35x | 71x | 141x | 283x |
| AMBER [PME-Cellulose_NVE_4fs] | ns/day | DC-Cellulose_NVE | yes | 4.37 | 164 | 328 | 656 | 1,313 | 165 | 331 | 662 | 1,324 |
| AMBER [PME-Cellulose_NVE_4fs] | NRF | DC-Cellulose_NVE | yes | 1x | 38x | 75x | 150x | 300x | 38x | 76x | 151x | 303x |
| AMBER [PME-FactorIX_NPT_4fs] | ns/day | DC-FactorIX_NPT | yes | 22.24 | 534 | 1,067 | 2,134 | 4,268 | 517 | 1,033 | 2,066 | 4,133 |
| AMBER [PME-FactorIX_NPT_4fs] | NRF | DC-FactorIX_NPT | yes | 1x | 24x | 48x | 96x | 192x | 23x | 46x | 93x | 186x |
| AMBER [PME-FactorIX_NVE_4fs] | ns/day | DC-FactorIX_NVE | yes | 22.74 | 570 | 1,140 | 2,279 | 4,558 | 590 | 1,180 | 2,360 | 4,720 |
| AMBER [PME-FactorIX_NVE_4fs] | NRF | DC-FactorIX_NVE | yes | 1x | 25x | 50x | 100x | 200x | 26x | 52x | 104x | 208x |
| AMBER [PME-JAC_NPT_4fs] | ns/day | DC-JAC_NPT | yes | 93.9 | 1,178 | 2,356 | 4,712 | 9,424 | 1,263 | 2,527 | 5,053 | 10,106 |
| AMBER [PME-JAC_NPT_4fs] | NRF | DC-JAC_NPT | yes | 1x | 13x | 25x | 50x | 100x | 13x | 27x | 54x | 108x |
| AMBER [PME-JAC_NVE_4fs] | ns/day | DC-JAC_NVE | yes | 95.41 | 1,242 | 2,484 | 4,967 | 9,934 | 1,256 | 2,512 | 5,024 | 10,048 |
| AMBER [PME-JAC_NVE_4fs] | NRF | DC-JAC_NVE | yes | 1x | 13x | 26x | 52x | 104x | 13x | 26x | 53x | 105x |
| AMBER [PME-STMV_NPT_4fs] | ns/day | DC-STMV_NPT | yes | 1.43 | 55 | 110 | 221 | 441 | 55 | 110 | 220 | 440 |
| AMBER [PME-STMV_NPT_4fs] | NRF | DC-STMV_NPT | yes | 1x | 39x | 77x | 154x | 308x | 38x | 77x | 154x | 307x |
| AMBER [FEP-GTI_Complex 1fs] | ns/day | FEP-GTI_Complex | yes | 10.97 | 145 | 289 | 578 | 1,157 | 151 | 302 | 603 | 1,206 |
| AMBER [FEP-GTI_Complex 1fs] | NRF | FEP-GTI_Complex | yes | 1x | 13x | 26x | 53x | 105x | 14x | 27x | 55x | 110x |
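For the AMBER rows, the printed NRF matches a plain ratio of each GPU result to the single CPU-only server's result, rounded to a whole number. The sketch below (the helper name `simple_ratio_nrf` is hypothetical) reproduces the PME-Cellulose_NPT_4fs NRF row from its ns/day row; the `bigger_is_better` flag mirrors the table column, so time-based rows are inverted. Treat it as a consistency check rather than the stated method: it does not reproduce every table in this article (the GROMACS rows, for example, report higher NRFs than the plain ratio).

```python
def simple_ratio_nrf(cpu_value, gpu_values, bigger_is_better=True):
    """Round each GPU result's speedup over one CPU-only server.

    Hypothetical helper. For throughput metrics (bigger is better) the
    speedup is gpu / cpu; for time metrics (bigger is not better) it is
    cpu / gpu.
    """
    if bigger_is_better:
        speedups = [gpu / cpu_value for gpu in gpu_values]
    else:
        speedups = [cpu_value / gpu for gpu in gpu_values]
    return [round(s) for s in speedups]


# AMBER [PME-Cellulose_NPT_4fs], ns/day, from the table above:
# 1x/2x/4x/8x A100 SXM 80GB, then 1x/2x/4x/8x A100 PCIe 80GB.
cpu_ns_per_day = 4.34
gpu_ns_per_day = [152, 303, 606, 1212, 154, 307, 614, 1228]
print(simple_ratio_nrf(cpu_ns_per_day, gpu_ns_per_day))
# [35, 70, 140, 279, 35, 71, 141, 283], the same as the NRF row
```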

CHROMA

Physics: Lattice Quantum Chromodynamics (LQCD)

VERSION: 2021.05

ACCELERATED FEATURES: Wilson-clover fermions, Krylov solvers, domain decomposition

SCALABILITY: Multi-GPU and multi-node

MORE INFORMATION: http://jeffersonlab.github.io/chroma/, https://ngc.nvidia.com/catalog/containers/hpc:chroma

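| Application | Metric | Test module | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Chroma | Total Time (Sec) | szscl21_24_128 | no | 1,115 | 36 | 20 | 11 | 7 | 44 | 25 | 13 | 9 |
| Chroma | NRF | szscl21_24_128 | yes | 1x | 32x | 55x | 99x | 163x | 26x | 46x | 84x | 129x |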

FUN3D

Engineering: A suite of tools, actively developed at NASA, that supports aeronautics and space technology by modeling fluid flow

VERSION: 13.7 (update 1)

ACCELERATED FEATURES: Full range of Mach number regimes for the Reynolds-averaged Navier-Stokes (RANS) formulation

SCALABILITY: Multi-GPU and single-node

MORE INFORMATION: https://fun3d.larc.nasa.gov

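| Application | Metric | Test module | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FUN3D | Loop Time (Sec) | dpw_wbt0_crs-3.6Mn_5 | no | 524 | 52 | 28 | 16 | 11 | 54 | 28 | 16 | 12 |
| FUN3D | NRF | dpw_wbt0_crs-3.6Mn_5 | yes | 1x | 13x | 24x | 41x | 58x | 12x | 23x | 42x | 53x |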

GROMACS

Molecular Dynamics: Simulation of biochemical molecules with complicated bond interactions

VERSION: 2022

ACCELERATED FEATURES: Implicit (5x) and Explicit (2x) Solvent

SCALABILITY: Multi-GPU, single-node

MORE INFORMATION: http://www.gromacs.org, https://ngc.nvidia.com/catalog/containers/hpc:gromacs

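| Application | Metric | Test module | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GROMACS [ADH Dodec] | ns/day | ADH Dodec | yes | 58 | 331 | - | 598 | - | 324 | 370 | 548 | - |
| GROMACS [ADH Dodec] | NRF | ADH Dodec | yes | 1x | 8x | - | 14x | - | 7x | 8x | 13x | - |
| GROMACS [Cellulose] | ns/day | Cellulose | yes | 17 | 97 | 145 | 250 | 291 | 95 | 134 | 169 | 204 |
| GROMACS [Cellulose] | NRF | Cellulose | yes | 1x | 9x | 13x | 22x | 26x | 8x | 12x | 15x | 18x |
| GROMACS [STMV] | ns/day | STMV | yes | 4 | 23 | 40 | 59 | 114 | 22 | 39 | 57 | 80 |
| GROMACS [STMV] | NRF | STMV | yes | 1x | 6x | 10x | 15x | 30x | 6x | 10x | 15x | 21x |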
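Throughput inside a single GPU server also grows very differently from one application to the next. On 1 to 8 A100 SXM 80GB GPUs, AMBER PME-Cellulose_NPT_4fs rises from 152 to 1,212 ns/day, close to 8x, while GROMACS Cellulose rises from 97 to 291 ns/day, exactly 3x. The usual parallel-efficiency measure, efficiency(n) = throughput(n) / (n * throughput(1)), makes that gap explicit. The sketch below is illustrative; the helper name `scaling_efficiency` is hypothetical.

```python
def scaling_efficiency(gpu_counts, throughputs):
    """Parallel efficiency of each run relative to the first (smallest) run.

    Hypothetical helper: efficiency = (throughput / base_throughput)
    divided by (gpus / base_gpus). 1.0 means perfectly linear scaling.
    """
    base_gpus, base_tp = gpu_counts[0], throughputs[0]
    return [
        (tp / base_tp) / (gpus / base_gpus)
        for gpus, tp in zip(gpu_counts, throughputs)
    ]


counts = [1, 2, 4, 8]           # A100 SXM 80GB GPUs in one server
amber = [152, 303, 606, 1212]   # AMBER [PME-Cellulose_NPT_4fs], ns/day
gromacs = [97, 145, 250, 291]   # GROMACS [Cellulose], ns/day

for name, values in [("AMBER", amber), ("GROMACS", gromacs)]:
    effs = scaling_efficiency(counts, values)
    print(name, [f"{e:.0%}" for e in effs])
```

AMBER stays near 100% at every GPU count, while GROMACS Cellulose falls below 40% on eight GPUs, which matches its much smaller NRF growth between the 4x and 8x columns.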

GTC

Physics: GTC is used for gyrokinetic particle simulation of turbulent transport in burning plasmas

VERSION: 4.5 (updated)

ACCELERATED FEATURES: Push, shift, and collision

SCALABILITY: Multi-GPU and multi-node

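| Application | Metric | Test module | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GTC | Mpush/Sec | moi#proc.in | yes | 35 | 489 | 923 | 1,803 | 3,566 | 480 | 893 | 1,748 | 3,436 |
| GTC | NRF | moi#proc.in | yes | 1x | 14x | 27x | 53x | 104x | 14x | 26x | 51x | 100x |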

ICON

Weather and Climate: A global unified atmosphere model for numerical weather prediction and climate modeling research

VERSION: 2.6.5_RC

ACCELERATED FEATURES: Full model of dynamics and physics

SCALABILITY: Multi-GPU and multi-node

MORE INFORMATION: https://code.mpimet.mpg.de/projects/iconpublic

| Application | Metric | Test module | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ICON [SLAM 191 - 160KM - no radiation] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution without radiation | no | 2,292 | 326 | 242 | 177 | 157 | 331 | 245 | 200 |
| ICON [SLAM 191 - 160KM - no radiation] | NRF | SLAM 191 levels 160 km resolution without radiation | yes | 1x | 7x | 9x | 13x | 15x | 7x | 9x | 11x |
| ICON [QUBICC 160 km resolution] | Integrate_nh (sec) | SLAM 191 levels 160 km resolution with radiation | no | 2,304 | 302 | 215 | 155 | 139 | 294 | 209 | 155 |
| ICON [QUBICC 160 km resolution] | NRF | SLAM 191 levels 160 km resolution with radiation | yes | 1x | 8x | 11x | 15x | 17x | 8x | 11x | 15x |

LAMMPS

Molecular Dynamics: Classical molecular dynamics package

VERSION: patch_4May2022

ACCELERATED FEATURES: Lennard-Jones, Gay-Berne, Tersoff, and many more potentials

SCALABILITY: Multi-GPU and multi-node

MORE INFORMATION: http://lammps.sandia.gov/index.html, https://ngc.nvidia.com/catalog/containers/hpc:lammps

| Application | Metric | Test module | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LAMMPS [LJ 2.5] | ATOM-Time Steps/s | LJ 2.5 | yes | 1.11E+08 | 5.57E+08 | 1.07E+09 | 2.00E+09 | 3.64E+09 | 5.44E+08 | 1.03E+09 | 1.86E+09 | - |
| LAMMPS [LJ 2.5] | NRF | LJ 2.5 | yes | 1x | 5x | 10x | 18x | 34x | 5x | 10x | 17x | - |
| LAMMPS [EAM] | ATOM-Time Steps/s | EAM | yes | 5.04E+07 | 2.77E+08 | 4.89E+08 | 8.76E+08 | 1.53E+09 | 2.76E+08 | 4.84E+08 | 8.19E+08 | - |
| LAMMPS [EAM] | NRF | EAM | yes | 1x | 6x | 10x | 17x | 30x | 6x | 10x | 16x | - |
| LAMMPS [ReaxFF/C] | ATOM-Time Steps/s | ReaxFF/C | yes | 4.23E+05 | 4.32E+06 | 7.98E+06 | 1.46E+07 | 2.34E+07 | 4.29E+06 | 7.92E+06 | 1.37E+07 | 1.76E+07 |
| LAMMPS [ReaxFF/C] | NRF | ReaxFF/C | yes | 1x | 17x | 32x | 59x | 94x | 17x | 32x | 55x | 71x |
| LAMMPS [SNAP] | ATOM-Time Steps/s | SNAP | yes | 1.09E+05 | 2.11E+06 | 4.19E+06 | 8.23E+06 | 1.57E+07 | 2.07E+06 | 4.10E+06 | 8.21E+06 | 1.57E+07 |
| LAMMPS [SNAP] | NRF | SNAP | yes | 1x | 19x | 38x | 75x | 143x | 19x | 37x | 75x | 143x |
| LAMMPS [Tersoff] | ATOM-Time Steps/s | Tersoff | yes | 2.80E+07 | 4.90E+08 | 8.98E+08 | 1.60E+09 | 2.86E+09 | 4.76E+08 | 8.05E+08 | 1.30E+09 | - |
| LAMMPS [Tersoff] | NRF | Tersoff | yes | 1x | 17x | 32x | 57x | 102x | 17x | 29x | 46x | - |

MILC

Physics: Lattice Quantum Chromodynamics (LQCD) codes that simulate how elementary particles are formed and bound by the “strong force” to create larger particles such as protons and neutrons

VERSION: develop_3971e182

ACCELERATED FEATURES: Staggered fermions, Krylov solvers, gauge-link fattening

SCALABILITY: Multi-GPU and multi-node

MORE INFORMATION: https://ngc.nvidia.com/catalog/containers/hpc:milc

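| Application | Metric | Test module | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MILC | Total Time (Sec) | Apex Medium | no | 67,366 | 2,291 | 1,342 | 703 | 399 | 2,357 | 1,369 | 676 | 664 |
| MILC | NRF | Apex Medium | yes | 1x | 32x | 55x | 105x | 186x | 31x | 54x | 110x | 112x |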

NAMD

Molecular Dynamics: Designed for high-performance simulation of large molecular systems

VERSION: 3.0a11 (GPU and AMD CPU); 2.15a AVX512 (Intel CPU)

ACCELERATED FEATURES: Full electrostatics with PME and most simulation features

SCALABILITY: Up to 100M atoms, multi-GPU, single-node

MORE INFORMATION: http://www.ks.uiuc.edu/Research/namd/, https://ngc.nvidia.com/catalog/containers/hpc:namd

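| Application | Metric | Test module | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NAMD [apoa1_npt_cuda] | Ave ns/day | apoa1_npt_cuda | yes | 19.12 | 134 | 270 | 542 | 1,072 | 140 | 277 | 556 | 1,112 |
| NAMD [apoa1_npt_cuda] | NRF | apoa1_npt_cuda | yes | 1x | 7x | 14x | 28x | 56x | 7x | 14x | 29x | 58x |
| NAMD [apoa1_nptsr_cuda] | Ave ns/day | apoa1_nptsr_cuda | yes | 19.63 | 141 | 282 | 565 | 1,123 | 145 | 290 | 576 | 1,161 |
| NAMD [apoa1_nptsr_cuda] | NRF | apoa1_nptsr_cuda | yes | 1x | 7x | 14x | 29x | 57x | 7x | 15x | 29x | 59x |
| NAMD [apoa1_nve_cuda] | Ave ns/day | apoa1_nve_cuda | yes | 20.64 | 185 | 363 | 727 | 1,469 | 189 | 378 | 756 | 1,506 |
| NAMD [apoa1_nve_cuda] | NRF | apoa1_nve_cuda | yes | 1x | 9x | 18x | 35x | 71x | 9x | 18x | 37x | 73x |
| NAMD [stmv_npt_cuda] | Ave ns/day | stmv_npt_cuda | yes | 1.78 | 14 | 27 | 54 | 108 | 14 | 27 | 54 | 108 |
| NAMD [stmv_npt_cuda] | NRF | stmv_npt_cuda | yes | 1x | 8x | 15x | 30x | 61x | 8x | 15x | 30x | 61x |
| NAMD [stmv_nptsr_cuda] | Ave ns/day | stmv_nptsr_cuda | yes | 1.8 | 14 | 28 | 55 | 111 | 14 | 28 | 55 | 111 |
| NAMD [stmv_nptsr_cuda] | NRF | stmv_nptsr_cuda | yes | 1x | 8x | 15x | 31x | 62x | 8x | 16x | 31x | 62x |
| NAMD [stmv_nve_cuda] | Ave ns/day | stmv_nve_cuda | yes | 1.95 | 16 | 32 | 64 | 129 | 16 | 32 | 64 | 128 |
| NAMD [stmv_nve_cuda] | NRF | stmv_nve_cuda | yes | 1x | 8x | 17x | 33x | 66x | 8x | 16x | 33x | 65x |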

QUANTUM ESPRESSO

Material Science (Quantum Chemistry): An open-source suite of computer codes for electronic-structure calculations and materials modeling at the nanoscale

VERSION: 6.7 (CPU); 7.0 (GPU)

ACCELERATED FEATURES: Linear algebra (matrix multiply), explicit computational kernels, 3D FFTs

SCALABILITY: Multi-GPU and multi-node

MORE INFORMATION: http://www.quantum-espresso.org

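| Application | Metric | Test module | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Quantum ESPRESSO | Total Wall Time | AUSURF112-jR | no | 718 | 118 | 77 | 53 | 42 | 120 | 75 | 53 | 47 |
| Quantum ESPRESSO | NRF | AUSURF112-jR | yes | 1x | 7x | 10x | 15x | 19x | 7x | 11x | 15x | 17x |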

RELION

Microscopy: A stand-alone computer program that employs an empirical Bayesian approach to the refinement of (multiple) 3D reconstructions or 2D class averages in electron cryo-microscopy (cryo-EM)

VERSION: 3.1.3

ACCELERATED FEATURES: Reduced memory requirements; high-resolution cryo-EM structure determination in a matter of days on a single workstation

SCALABILITY: Multi-GPU and single-node

MORE INFORMATION: https://www2.mrc-lmb.cam.ac.uk/relion/index.php/Download_%26_install, https://ngc.nvidia.com/catalog/containers/hpc:relion

| Application | Metric | Test module | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Relion [Plasmodium Ribosome] | Total Wall Clock (Sec) | MB numbers Plasmodium Ribosome on Relion-3.0 | no | 12,742 | 3,110 | 1,781 | 1,458 | 1,313 | 3,401 | 1,994 | 1,838 | - |
| Relion [Plasmodium Ribosome] | NRF | MB numbers Plasmodium Ribosome on Relion-3.0 | yes | 1x | 4x | 7x | 9x | 10x | 4x | 6x | 7x | - |
| Relion [Plasmodium Ribosome 2D] | Total Wall Clock (Sec) | Plasmodium Ribosome (2D) | no | 90,108 | 22,960 | 12,416 | 8,587 | 6,299 | 25,275 | 13,414 | 9,583 | 7,266 |
| Relion [Plasmodium Ribosome 2D] | NRF | Plasmodium Ribosome (2D) | yes | 1x | 4x | 7x | 10x | 14x | 4x | 7x | 9x | 12x |

RTM

Geoscience: Reverse time migration (RTM) modeling is a critical component of the seismic processing workflow in oil and gas exploration

VERSION: nvidia_2021_05

ACCELERATED FEATURES: Batch algorithm

SCALABILITY: Multi-GPU and multi-node

MORE INFORMATION: http://www.tsunamidevelopment.com/assets/rtm.pdf

| Application | Metric | Test module | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RTM [Isotropic Radius 4] | Mcells/s | Isotropic Radius 4 | yes | 11,318 | 89,414 | 178,194 | 356,418 | 712,731 | 85,271 | 170,060 | 339,993 | 679,883 |
| RTM [Isotropic Radius 4] | NRF | Isotropic Radius 4 | yes | 1x | 8x | 16x | 31x | 63x | 8x | 15x | 30x | 60x |
| RTM [TTI Radius 8 1-pass] | Mcells/s | TTI Radius 8 1-pass | yes | 3,773 | 12,828 | 25,462 | 50,863 | 100,871 | 12,864 | 25,688 | 51,313 | 102,502 |
| RTM [TTI Radius 8 1-pass] | NRF | TTI Radius 8 1-pass | yes | 1x | 3x | 7x | 13x | 27x | 3x | 7x | 14x | 27x |
| RTM [TTI RX 2Pass mgpu] | Mcells/s | TTI RX 2Pass mgpu | yes | 3,773 | 13,910 | 27,437 | 54,323 | 107,533 | 13,589 | 27,049 | 53,755 | 107,254 |
| RTM [TTI RX 2Pass mgpu] | NRF | TTI RX 2Pass mgpu | yes | 1x | 4x | 7x | 14x | 29x | 4x | 7x | 14x | 28x |

SPECFEM3D

Geoscience: Simulates seismic wave propagation

VERSION: devel_44e098a3

ACCELERATED FEATURES: OpenCL and CUDA hardware accelerators, based on an automatic source-to-source transformation library

SCALABILITY: Multi-GPU and single-node

MORE INFORMATION: https://geodynamics.org/cig/software/specfem3d/

| Application | Metric | Test module | Bigger is better | Dual Cascade Lake 6240 (CPU-Only) | 1x A100 SXM 80GB | 2x A100 SXM 80GB | 4x A100 SXM 80GB | 8x A100 SXM 80GB | 1x A100 PCIe 80GB | 2x A100 PCIe 80GB | 4x A100 PCIe 80GB | 8x A100 PCIe 80GB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SPECFEM3D | Total Time (Sec) | four_material_simple_model | no | 1,920 | 76 | 40 | 22 | 14 | 77 | 40 | 21 | 16 |
| SPECFEM3D | NRF | four_material_simple_model | yes | 1x | 29x | 54x | 101x | 155x | 28x | 55x | 102x | 135x |
