Data compression, a fundamental topic in computer science, is the process of encoding data so that it takes less storage space or less transmission time than it would uncompressed. This is possible because most real-world data is highly redundant: it is not represented in its most concise form.
One very simple means of compression, for example, is run-length encoding, wherein long runs of consecutive identical data values are replaced by a simple code giving the data value and the length of the run. This is an example of lossless data compression, where the data is compressed in such a way that it can be recovered exactly. For symbolic data such as spreadsheets, text, and executable programs, losslessness is essential, because in all but a few limited cases a change to even a single bit cannot be tolerated.
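Run-length encoding can be sketched in a few lines of Python (the function names here are illustrative, not from any standard library):

```python
def rle_encode(data):
    """Run-length encode a sequence into (value, run-length) pairs."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1  # extend the current run
        else:
            runs.append([value, 1])  # start a new run
    return [(value, count) for value, count in runs]

def rle_decode(pairs):
    """Expand (value, run-length) pairs back into the original sequence."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out
```

For example, `rle_encode("aaabbc")` yields `[('a', 3), ('b', 2), ('c', 1)]`, and decoding those pairs recovers the input exactly, which is what makes the scheme lossless. Note that data with few long runs can actually grow under this encoding.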
In other kinds of data, such as sounds and pictures, a small loss of quality can be tolerated without losing the essential nature of the data, so lossy data compression methods can be used. These frequently offer a range of compression efficiencies: the user can choose highly compressed data with a noticeable loss of quality, or higher-quality data with less compression. In particular, compression of images and sounds can take advantage of limitations of the human sensory system to compress data in ways that are lossy but nearly indistinguishable from the original.
Many data compression systems are best viewed with a four-stage model.
Closely allied with data compression are the fields of coding theory and cryptography. Theoretical background is provided by information theory and algorithmic information theory. When compressing information in the form of signals we often use digital signal processing methods. The idea of data compression is deeply connected with statistical inference and particularly with the maximum likelihood principle.
Data compression topics:
- Lossless compression
  - Run-length encoding
  - Minimum-redundancy coding, also called entropy encoding
    - Huffman coding (simple entropy coding)
    - Arithmetic coding (more advanced entropy coding; encumbered by patents as of October 2001)
  - Dictionary coders
  - Burrows-Wheeler transform (block sorting preprocessing that makes compression easier)
  - Prediction by partial matching (PPM)
- Lossy compression
  - Discrete cosine transform based coders
  - Fractal compression
  - Wavelet compression
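To make the entropy-coding entries above concrete, here is a minimal Huffman coding sketch in Python, using only the standard `heapq` and `collections` modules (the function names are illustrative). It repeatedly merges the two least frequent subtrees, so frequent symbols end up with shorter bit strings:

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Build a Huffman code mapping each symbol to a bit string."""
    freq = Counter(text)
    # Heap entries: (frequency, unique tie-breaker, {symbol: code-so-far}).
    # The tie-breaker keeps the heap from ever comparing the dicts.
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: only one distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        f1, _, codes1 = heapq.heappop(heap)  # least frequent subtree
        f2, _, codes2 = heapq.heappop(heap)  # next least frequent
        # Prefix each subtree's codes with 0 / 1 and merge them.
        merged = {sym: "0" + code for sym, code in codes1.items()}
        merged.update({sym: "1" + code for sym, code in codes2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

def huffman_encode(text, codes):
    """Concatenate the code words for each symbol of the text."""
    return "".join(codes[ch] for ch in text)
```

For the string `"aaaabbc"`, the frequent symbol `a` receives a one-bit code while `b` and `c` receive two-bit codes, so the whole string encodes in 10 bits instead of the 14 bits a fixed two-bit-per-symbol code would need. The resulting code is prefix-free, which is what allows the bit stream to be decoded unambiguously.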
Compression of sounds is generally called audio compression, where methods of psychoacoustics are used to remove non-audible components of the signal to make compression more efficient. Audio compression is therefore lossy compression. Different audio compression standards are listed under audio codecs.
See also: algorithmic complexity theory, minimum description length, zip, tar, gzip, bzip2