
Understanding the NVIDIA FP8 format

Reviewed on 10 April 2024
Published on 23 October 2023

Scaleway offers GPU Instances featuring L4, L40S, and H100 GPUs that support FP8 (8-bit floating point), a datatype introduced by NVIDIA. It enables higher throughput for matrix multiplications and convolutions.

FP8 is an 8-bit floating point standard which was jointly developed by NVIDIA, ARM, and Intel to speed up AI development by improving memory efficiency during AI training and inference processes.

The ongoing industry evolution is evident in the shift from 32-bit to 16-bit, and currently, to 8-bit precision formats. This change is especially advantageous for transformer networks, a key AI breakthrough, as they can run faster and use less memory at 8-bit precision without significant loss of accuracy.
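The memory savings behind this shift are straightforward to quantify: halving the bit width halves the storage needed for the same number of values. As a rough sketch (the model size below is hypothetical and the calculation counts only weight storage, ignoring activations and framework overhead):

```python
# Approximate weight-storage footprint of a 7-billion-parameter model
# at each precision. Illustration only: ignores activations, optimizer
# state, and runtime overhead.
params = 7_000_000_000

for name, bytes_per_param in [("FP32", 4), ("FP16", 2), ("FP8", 1)]:
    gib = params * bytes_per_param / 2**30
    print(f"{name}: {gib:.1f} GiB")
```

Moving from FP32 to FP8 cuts the footprint by a factor of four, which also reduces memory bandwidth pressure during training and inference.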

This uniform standard ensures compatibility across diverse hardware and software environments, thus driving computing capabilities to unprecedented levels.

FP8 sets forth two distinct eight-bit floating-point formats, E5M2 and E4M3, to facilitate interoperability between different hardware platforms. It aligns closely with the current IEEE 754 standards for floating-point computations, achieving an ideal harmony between hardware efficiency and software performance. This strategy seeks to leverage existing frameworks, speed up uptake, and boost developer efficiency.

The E5M2 format follows the conventions of the IEEE FP16 format, allocating five bits to the exponent and two bits to the mantissa. The E4M3 format assigns four bits to the exponent and three bits to the mantissa, deviating slightly from IEEE conventions: it does not represent infinities and reserves only a single bit pattern per sign for NaN, which extends its dynamic range. These two eight-bit configurations are designed to optimize both training and inference phases, cutting computational and memory loads in comparison to their higher-precision counterparts.
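The two bit layouts can be made concrete with a small decoder. The sketch below is illustrative (the `decode_fp8` helper is not part of any NVIDIA library); it assumes the published FP8 conventions: bias 7 for E4M3, bias 15 for E5M2, IEEE-style infinities and NaNs for E5M2 only, and a single NaN pattern (all exponent and mantissa bits set) for E4M3.

```python
import math

def decode_fp8(byte: int, fmt: str = "e4m3") -> float:
    """Decode one byte as an FP8 value (hypothetical helper for illustration)."""
    if fmt == "e4m3":
        exp_bits, man_bits, bias = 4, 3, 7
    elif fmt == "e5m2":
        exp_bits, man_bits, bias = 5, 2, 15
    else:
        raise ValueError(f"unknown format: {fmt}")

    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    max_exp = (1 << exp_bits) - 1

    # E5M2 keeps IEEE-style specials: all-ones exponent means inf or NaN.
    if fmt == "e5m2" and exp == max_exp:
        return sign * math.inf if man == 0 else math.nan
    # E4M3 has no infinities; only the all-ones pattern is NaN.
    if fmt == "e4m3" and exp == max_exp and man == (1 << man_bits) - 1:
        return math.nan

    if exp == 0:  # zero or subnormal
        return sign * (man / (1 << man_bits)) * 2.0 ** (1 - bias)
    return sign * (1.0 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

# Largest finite values under these conventions:
print(decode_fp8(0x7E, "e4m3"))   # 0b0.1111.110  -> 448.0
print(decode_fp8(0x7B, "e5m2"))   # 0b0.11110.11  -> 57344.0
```

The example shows the trade-off between the two formats: E4M3 spends its extra mantissa bit on precision and tops out at 448, while E5M2's extra exponent bit pushes the maximum finite value to 57344, making it the better fit for quantities with a wide dynamic range such as gradients.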

The FP8 standard preserves accuracy comparable to 16-bit formats across a wide range of applications, architectures, and networks.

For more information about the FP8 standard, and instructions on how to use it with H100 GPU Instances, refer to NVIDIA's official FP8 documentation and the code example repository.
