A fused multiply-add (FMA) computes a multiply-accumulate

FMA(A,B,C) = A*B+C

with a single rounding: the product A*B is formed exactly, and only the final sum is rounded to a floating-point number.

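When implemented in a microprocessor, an FMA is typically faster than a multiply operation followed by an add. It also makes it possible to recover the bottom half of a multiplication, that is, the part of the exact product that is lost when the product is rounded. In the sequence below, H is the rounded product and L is its rounding error, so that H + L equals A*B exactly (barring overflow or underflow). E.g.,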

  • H = FMA(A,B,0.0)
  • L = FMA(A,B,-H)

The FMA is implemented as an instruction in the PowerPC and Itanium processor families. With this instruction, a dedicated hardware divide or square root unit is not needed, since both operations can be implemented in software using the FMA.

The FMA operation was added to the IEEE 754 standard in its 2008 revision, which was developed under the name IEEE 754r.