Static single assignment form

In compilers, static single assignment form, more often abbreviated SSA form or just SSA, is an intermediate representation in which every variable is assigned exactly once. Existing variables in the original IR are split into versions, new variables typically indicated by the original name with a subscript, so that every definition gets its own version. In SSA form, use-def chains are explicit and each contain a single element.

SSA was developed by researchers at IBM in the 80's.

Table of contents

1 Benefits of SSA
2 Converting to SSA

2.1 Introduction
2.2 Computing minimal SSA using dominance frontiers

Benefits of SSA

The primary usefulness of SSA comes from how it simultaneously simplifies and improves the results of a variety of compiler optimizations, by simplifying the properties of variables. For example, consider this piece of code:

y := 1
y := 2
x := y

As humans, we can see that the first assignment isn't necessary, and that the value of y being used in the third line comes from the second assignment of y. A program would have to perform reaching definition analysis to determine this. But if the program is in SSA form, both of these are immediate:

y₁ := 1
y₂ := 2
x₁ := y₂

Compiler optimization algorithms which are either permitted or strongly enhanced by the use of SSA include:

constant propagation
dead code elimination
global value numbering
partial redundancy elimination
register allocation
sparse conditional constant propagation

Converting to SSA

Introduction

Converting ordinary code into SSA form is primarily a simple matter of replacing the target of each assignment with a new variable, and replacing each use of a variable with the "version" of the variable reaching that point. For example, consider the following control flow graph:

Notice that we could change the name on the left side of "x x - 3", and change the following uses of x to use that new name, and the program would still do the same thing. We exploit this in SSA by create two new variables, x₁ and x₂, each of which is assigned only once. We likewise give distinguishing subscripts to all the other variables, and we get this:

We've figured out which definition each use is referring to, except for one thing: the uses of y in the bottom block could be referring to either y₁ or y₂, depending on which way the control flow came from. So how do we know which one to use?

The answer is that we add a special statement, called a φ (phi) function, to the beginning of the last block. This statement will generate a new definition of y, y₃, by "choosing" either y₁ or y₂, depending on which arrow control arrived from:

Now, the uses of y in the last block can simply use y₃, and they'll obtain the correct value either way. You might ask at this point, do we need to add a φ function for x too? The answer is no; only one version of x, namely x₂ is reaching this place, so there's no problem.

A more general question along the same lines is, given an arbitrary control flow graph, how can I tell where to insert φ functions, and for what variables? This is a difficult question, but one that has an efficient solution that can be computed using a concept called dominance frontiers.

Computing minimal SSA using dominance frontiers

More needed here describing the dominance-frontier based algorithm for computing "minimal" SSA, and other things.