Hyper-Threading is Intel's trademark for its implementation of simultaneous multithreading on the Pentium 4 microarchitecture. It is essentially a more advanced form of Super-threading; it debuted on Intel Xeon processors and was later added to Pentium 4 processors. The technology improves processor performance under certain workloads by providing useful work for execution units that would otherwise be idle, for example during a cache miss.
The advantages of Hyper-Threading are listed as improved support for multi-threaded code, the ability to run multiple threads simultaneously, improved reaction and response time, and an increased number of users that a server can support.
Hyper-Threading works by duplicating certain sections of the processor (those that store the architectural state) but not the main execution resources. This allows a Hyper-Threading-equipped processor to present itself as two "logical" processors to the host operating system, which can then schedule two threads or processes simultaneously. Where execution resources in a processor without Hyper-Threading would go unused by the current task, and especially when the processor is stalled, a Hyper-Threading-equipped processor may use those resources to execute the other scheduled task. (Reasons for a processor to stall include a cache miss, a branch misprediction, and waiting for the results of previous instructions before the current one can be executed.)
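The effect of filling stall cycles can be illustrated with a toy model. The sketch below is not a description of Intel's hardware: it assumes a single shared execution unit, and it represents each thread as a string in which "X" is an instruction that needs the unit for one cycle and "S" is a cycle spent stalled (for instance on a cache miss). The function names and the encoding are invented for illustration.

```python
def cycles_without_smt(threads):
    """Run the threads one after another on a single processor.

    Every stall cycle leaves the execution unit idle, so the total is
    simply the sum of all instruction and stall cycles.
    """
    return sum(len(t) for t in threads)


def cycles_with_smt(threads):
    """Run the threads as logical processors sharing one execution unit.

    Each cycle, stalled threads make progress on their stall in the
    background, and at most one ready thread issues an instruction.
    """
    n = len(threads)
    pos = [0] * n          # next step to perform in each thread
    cycle = 0
    while any(pos[i] < len(threads[i]) for i in range(n)):
        unit_busy = False
        # Rotate which thread is considered first so neither one starves.
        for i in [(cycle + k) % n for k in range(n)]:
            if pos[i] >= len(threads[i]):
                continue                  # this thread has finished
            if threads[i][pos[i]] == "S":
                pos[i] += 1               # a stall elapses regardless
            elif not unit_busy:
                unit_busy = True          # this thread gets the unit
                pos[i] += 1
            # otherwise the thread is ready but must wait a cycle
        cycle += 1
    return cycle


stalling = "XSSSX"   # two instructions separated by a three-cycle stall
busy = "XXX"         # three back-to-back instructions

print(cycles_without_smt([stalling, busy]))  # 8
print(cycles_with_smt([stalling, busy]))     # 5
```

In this model the second thread's instructions fit entirely into the first thread's stall, so the pair finishes in 5 cycles instead of 8. Two threads with no stalls at all, such as "XX" and "XX", take 4 cycles either way, which matches the observation that the benefit appears only under certain workloads.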
Except for its performance implications, this innovation is transparent to operating systems and programs. All that is required to take advantage of Hyper-Threading is symmetric multiprocessing (SMP) support in the operating system, as the logical processors appear as standard separate processors.
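Because logical processors look like ordinary processors, software that wants to tell them apart has to consult topology information from the operating system. As a minimal sketch, assuming the "processor" and "physical id" fields that Linux exposes in /proc/cpuinfo on x86 systems, the following function counts logical processors and physical packages. The function name and the abridged sample text are hypothetical.

```python
def count_processors(cpuinfo_text):
    """Return (logical_processors, physical_packages).

    Assumes Linux x86 /proc/cpuinfo-style text, where each logical
    processor has a "processor" line and a "physical id" line naming
    the physical package it belongs to.
    """
    logical = 0
    packages = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue                      # blank or malformed line
        key = key.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            packages.add(value.strip())
    return logical, len(packages)


# Abridged, hypothetical sample: one physical processor that presents
# two logical processors to the operating system.
SAMPLE = """\
processor   : 0
physical id : 0
siblings    : 2

processor   : 1
physical id : 0
siblings    : 2
"""

print(count_processors(SAMPLE))  # (2, 1)
```

On a real Linux system the same function could be applied to the contents of /proc/cpuinfo. Python's os.cpu_count() reports only the logical count, which is consistent with the logical processors appearing as separate processors.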
However, it is possible to optimise operating system behaviour on Hyper-Threading capable systems, such as the Linux techniques discussed in Kernel Traffic. (One such optimisation concerns a dual-processor system where both processors are capable of Hyper-Threading. The cost of moving a process from one logical processor to another on the same physical processor is almost nothing, whereas processor affinity provides significant reasons to keep processes on the same physical processor.)
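The placement preference described above can be sketched as a small policy function. This is an illustrative policy only, not the algorithm used by the Linux scheduler; the function name, its parameters, and the four-processor topology are assumptions made for the example.

```python
def pick_target_cpu(current_cpu, idle_cpus, package_of):
    """Choose the logical processor a process should run on next.

    current_cpu: logical processor the process last ran on
    idle_cpus:   set of logical processors that are currently idle
    package_of:  mapping from logical processor to physical processor
    """
    # Staying put is always cheapest.
    if current_cpu in idle_cpus:
        return current_cpu
    # Moving to a sibling on the same physical processor costs almost
    # nothing, because the execution resources and caches are shared.
    home = package_of[current_cpu]
    siblings = [c for c in sorted(idle_cpus) if package_of[c] == home]
    if siblings:
        return siblings[0]
    # Only leave the physical processor when no sibling is free.
    others = sorted(idle_cpus)
    return others[0] if others else current_cpu


# Hypothetical dual-processor system, two logical processors each.
PACKAGE_OF = {0: 0, 1: 0, 2: 1, 3: 1}

print(pick_target_cpu(0, {1, 2}, PACKAGE_OF))  # 1 (sibling preferred)
print(pick_target_cpu(0, {2, 3}, PACKAGE_OF))  # 2 (no sibling is idle)
```

A real scheduler would weigh further factors, such as load on each physical processor. On Linux, a program can also restrict itself to a chosen set of logical processors with os.sched_setaffinity.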
According to Intel, the first implementation used only 5% more die area than the "normal" processor.