In computer science, a reference is a small object containing information which refers to data elsewhere, as opposed to containing the data itself. Accessing the value that a reference refers to is called dereferencing it. References are fundamental in constructing many data structures and in exchanging information between different parts of a program.

References also increase flexibility in where objects can be stored, how they are allocated, and how they are passed between areas of code. As long as we can access a reference to the data, we can access the data through it, and the data itself need not be moved. References also make it easier to share data between different areas of code: each area keeps its own reference to the same data.
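As a minimal sketch of this kind of sharing, the C++ fragment below lets two objects hold references (here, raw pointers) to one vector. The names Logger, Reporter, and shared_buffer_demo are hypothetical and chosen only for illustration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical "area of code" that writes to shared data.
struct Logger {
    std::vector<std::string>* sink;  // a reference to the data, not a copy of it
    void log(const std::string& msg) { sink->push_back(msg); }  // dereference, then append
};

// Hypothetical "area of code" that only reads the same data.
struct Reporter {
    const std::vector<std::string>* source;  // read-only reference to the same vector
    std::size_t count() const { return source->size(); }
};

// Returns how many messages the reporter sees after the logger writes two.
std::size_t shared_buffer_demo() {
    std::vector<std::string> messages;  // the data lives here and is never copied or moved
    Logger logger{&messages};
    Reporter reporter{&messages};
    logger.log("started");
    logger.log("finished");
    return reporter.count();  // 2: both objects refer to the same vector
}
```

Neither object owns the vector in this sketch, so the references are only valid while messages is still in scope.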

The mechanism of references, though varying in implementation, is a fundamental programming language feature common to nearly all modern programming languages. Even some languages that support no direct use of references make some internal or implicit use of them. For example, the call by reference calling convention can be implemented with either explicit or implicit use of references.
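The difference between explicit and implicit use can be illustrated in C++, which offers both forms. The function names below are hypothetical, and this is only a sketch of the idea, not a description of how any particular compiler implements the calling convention.

```cpp
// By value: the function receives a copy, so the caller's variable is unchanged.
void increment_copy(int n) { n += 1; }

// Implicit reference: the caller passes the variable itself,
// and no dereference syntax is needed in the body.
void increment_ref(int& n) { n += 1; }

// Explicit reference: the caller passes an address,
// and the function dereferences it with *.
void increment_ptr(int* n) { *n += 1; }

// Applies all three to the same variable and returns the result.
int call_demo() {
    int x = 10;
    increment_copy(x);  // x is still 10
    increment_ref(x);   // x is now 11
    increment_ptr(&x);  // x is now 12
    return x;
}
```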

Pointers are the most primitive type of reference, storing only the address of an object in memory; they are also among the most powerful and efficient, and the most dangerous. Smart pointers are opaque data structures that act like pointers but can only be accessed through particular methods.
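The following C++ sketch contrasts the two. It uses std::shared_ptr from the standard library as one example of a smart pointer; the function names are hypothetical.

```cpp
#include <memory>

// Raw pointer: stores only an address; nothing checks that it stays valid.
int raw_pointer_demo() {
    int value = 41;
    int* p = &value;  // p holds the address of value
    *p += 1;          // dereference to modify the object it points to
    return value;     // 42
}

// Smart pointer: the address is wrapped in an object that is used through
// its own operations (operator*, use_count, reset, ...).
long shared_owner_count() {
    std::shared_ptr<int> first = std::make_shared<int>(7);
    std::shared_ptr<int> second = first;  // a second reference to the same int
    *second = 8;                          // visible through first as well
    return first.use_count();             // 2 owners at this point
}
```

In this sketch the shared integer is released automatically when the last std::shared_ptr referring to it is destroyed, which is one way smart pointers reduce the danger of raw pointers.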

External and internal storage

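In many data structures, large, complex objects are composed of smaller objects. These objects are typically stored in one of two ways:

  • With internal storage, the contents of the smaller object are stored inside the larger object.
  • With external storage, the smaller objects are allocated in their own location, and the larger object stores only references to them.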

Internal storage is usually more efficient, because there is a space cost for the references and dynamic allocation metadata, and a time cost associated with dereferencing a reference and with allocating the memory for the smaller objects. Internal storage also enhances locality of reference by keeping different parts of the same large object close together in memory. However, there are a variety of situations in which external storage is preferred:

  • If the data structure is recursive, meaning it may contain itself, it cannot be represented with internal storage.
  • If the larger object is being stored in an area with limited space, such as the stack, then we can prevent running out of storage by storing large component objects in another memory region and referring to them using references.
  • If the smaller objects may vary in size, it is often inconvenient or expensive to resize the larger object so that it can still contain them.
  • References are often easier to work with and adapt better to new requirements.
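
The trade-offs above can be sketched in C++. The types Address, PersonInternal, PersonExternal, and ListNode are hypothetical, std::unique_ptr stands in for "a reference", and the size comparison assumes a typical implementation in which a std::unique_ptr is the size of a raw pointer.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical component object (96 bytes of character data).
struct Address { char street[64]; char city[32]; };

// Internal storage: the Address is embedded in the larger object.
struct PersonInternal { int id; Address home; };

// External storage: the Address is allocated elsewhere; only a reference is kept.
struct PersonExternal { int id; std::unique_ptr<Address> home; };

// The externally stored variant is smaller, which matters in limited
// regions such as the stack (assumption: unique_ptr is pointer-sized).
bool external_is_smaller() {
    return sizeof(PersonExternal) < sizeof(PersonInternal);
}

// Recursive structure: a node cannot embed another node by value,
// so the link to the next node must be stored externally.
struct ListNode {
    int value = 0;
    std::unique_ptr<ListNode> next;
};

// Allocates a new node and places it in front of the existing list.
std::unique_ptr<ListNode> push_front(std::unique_ptr<ListNode> head, int value) {
    auto node = std::make_unique<ListNode>();
    node->value = value;
    node->next = std::move(head);
    return node;
}

// Builds the list 1 -> 2 -> 3 and returns its length by following references.
std::size_t list_length_demo() {
    std::unique_ptr<ListNode> head;
    for (int v = 3; v >= 1; --v) head = push_front(std::move(head), v);
    std::size_t n = 0;
    for (const ListNode* p = head.get(); p != nullptr; p = p->next.get()) ++n;
    return n;  // 3
}
```

Declaring ListNode with a member of type ListNode instead of std::unique_ptr<ListNode> would not compile, because the object would need to contain itself; this is the recursive case from the list above.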

External links

  • Pointer Fun With Binky: introduction to pointers in a 3-minute educational video (Stanford Computer Science Education Library)