Hopper GPU Accelerates Dynamic Programming Up to 40x Using DPX Instructions


Dynamic programming, which is used in algorithms for genomics, quantum computing, route optimization, and more, can be sped up by up to 40% with new DPX instructions in the NVIDIA Hopper GPU architecture that was shown off today at GTC.

An instruction set built into NVIDIA H100 GPUs called DPX will help developers write code that can speed up dynamic programming algorithms in a wide range of industries, like disease diagnosis and quantum simulation.

What is Dynamic Programming?

Dynamic programming was first used in the 1950s to solve difficult problems. It uses two important techniques: recursion and memoization.

Recursion is a way to break down a problem into smaller, easier problems, which saves both time and money. These sub-problems, which are used a lot when solving the main problem, are kept in mind when memorizing. Using a technique called “memorization,” you can speed things up by not having to recalculate subproblems when they come up later in the main problem.

Up to 7x faster: Dynamic programming algorithms run more quickly on an NVIDIA H100 GPU than on NVIDIA Ampere-based GPUs. That speed can be even faster in a node with four NVIDIA H100 GPUs.

Use Cases

It is used in many optimizations, data processing, and omics algorithms. Until now, most developers have run these kinds of algorithms on CPUs or FPGAs. But with DPX instructions on NVIDIA Hopper GPUs, they can get huge speedups.


Omics is a term for a lot of different types of biology, like genomics, proteomics, and transcriptomics (focused on RNA). They all use algorithms that can be sped up with DPX instructions, which are used in important work like disease research and drug discovery.

For example, the Smith-Waterman and Needleman-Wunsch dynamic programming algorithms are used to align DNA sequences, classify proteins, and fold proteins, for example. Scores are used in both ways to see if the genetic sequences of different samples match up well.

Smith-Waterman is a good way to get accurate results, but it takes more computing resources and time than other alignment methods. A node with four NVIDIA H100 GPUs can speed up this process by 35 times. This means that the work of base calling and aligning can be done at the same rate as DNA sequencing.

This will help make genomic analysis more common in hospitals around the world, which will help scientists get closer to giving patients personalized medicine.

Route Optimization

It’s important for autonomous robots to find the best way to move through a dynamic warehouse, or even for a sender to send data to multiple receivers in a computer network.

Developers use Floyd-Warshall, a dynamic programming algorithm, to solve this optimization problem. Floyd-Warshall is used to find the shortest distances between all pairs of destinations on a map or graph, like a graph or a map. In a server with four NVIDIA H100 GPUs, Floyd-Warshall acceleration is 40 times faster than in a server with a traditional dual-socket CPU.

For real-time applications in factories, self-driving cars or algorithms that map and route in abstract graphs, this speedup could be used with NVIDIA cuOpt AI logistics.

Quantum Simulation

DPX instructions on NVIDIA H100 GPUs could be used to speed up many other dynamic programming algorithms. There is a lot of promise in quantum computing, where dynamic programming is used in tensor optimization algorithms to make quantum simulations. DPX instructions could help developers speed up the process of finding the right way to shrink tensors.

SQL Query Optimization

Another possible use is in data science. There are often a lot of “join” operations that must be done when data scientists use the SQL programming language to work with a group of tables. Dynamic programming helps find the best way to join these two databases, which can save a lot of time and speed up SQL queries.

Leave a Reply