A computer which, among existing general-purpose computers at any given time, is superlative, often in several senses: highest computation rate, largest memory, or highest cost. Predominantly, the term refers to the fastest “number crunchers,” that is, machines designed to perform numerical calculations at the highest speed that the latest electronic device technology and the state of the art of computer architecture allow.
The demand for the ability to execute arithmetic operations at the highest possible rate originated in computer application areas collectively referred to as scientific computing. Large-scale numerical simulations of physical processes are often needed in fields such as physics, structural mechanics, meteorology, and aerodynamics. A common technique is to compute an approximate numerical solution to a set of partial differential equations which mathematically describe the physical process of interest but are too complex to be solved by formal mathematical methods. This solution is obtained by first superimposing a grid on a region of space, with a set of numerical values attached to each grid point, and then repeatedly recomputing the values at each grid point from those at neighboring points as the simulation advances through discrete time steps. Large-scale scientific computations of this type often require hundreds of thousands of grid points with 10 or more values attached to each point, 10 to 500 arithmetic operations to compute each updated value, and hundreds of thousands of time steps over which the computation must be repeated before a steady-state solution is reached. See Computational fluid dynamics, Numerical analysis, Simulation
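The following is a minimal sketch, in C, of this kind of grid computation: explicit time stepping of the two-dimensional heat equation on a small square grid. The grid size, step count, and diffusion coefficient are illustrative assumptions; production simulations use far larger grids, many values per grid point, and more elaborate update rules.

```c
/* Sketch of grid-based scientific computing: explicit time stepping
 * of the 2-D heat equation. All sizes and constants are illustrative
 * assumptions, chosen small for clarity. */
#include <stdio.h>

#define N     64        /* grid points per side */
#define STEPS 1000      /* number of time steps */

static double u[N][N], unew[N][N];

int main(void) {
    u[N / 2][N / 2] = 100.0;   /* initial condition: one hot spot */

    for (int t = 0; t < STEPS; t++) {
        /* Update every interior point from its four neighbors. */
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                unew[i][j] = u[i][j] + 0.1 * (u[i - 1][j] + u[i + 1][j]
                                            + u[i][j - 1] + u[i][j + 1]
                                            - 4.0 * u[i][j]);
        for (int i = 1; i < N - 1; i++)
            for (int j = 1; j < N - 1; j++)
                u[i][j] = unew[i][j];
    }
    printf("center value after %d steps: %f\n", STEPS, u[N / 2][N / 2]);
    return 0;
}
```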
Two lines of technological advancement have contributed significantly to what roughly amounts to a doubling of the speed of the fastest computers every year since the early 1950s: the steady improvement of electronic device technology and the accumulation of improvements in the architectural design of digital computers.
Computers incorporate very large-scale integrated (VLSI) circuits with tens of millions of transistors per chip for both logic and memory components. A variety of types of integrated circuitry are used in contemporary supercomputers. Several use high-speed complementary metal-oxide-semiconductor (CMOS) technology. Throughout most of the history of digital computing, supercomputers generally used the highest-performance switching circuitry available at the time, which was usually the most exotic and expensive. However, many supercomputers now use the conventional, inexpensive device technology of commodity microprocessors and rely on massive parallelism for their speed. See Computer storage technology, Concurrent processing, Integrated circuits, Logic circuits
Increases in computing speed which are purely due to the architectural structure of a computer can largely be attributed to the introduction of some form of parallelism into the machine's design: two or more operations which were performed one after the other in previous computers can now be performed simultaneously. See Computer systems architecture
Pipelining is a technique which allows several operations to be in progress in the central processing unit at once. The first form of pipelining used was instruction pipelining. Since every instruction passes through the same basic sequence of steps, namely instruction fetch, instruction decode, operand fetch, and execution, it is feasible to construct an instruction pipeline in which each of these steps is handled by a separate stage. The efficiency of the instruction pipeline depends on the likelihood that the program being executed allows a steady stream of instructions to be fetched from contiguous locations in memory.
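The throughput gain can be seen in a toy cycle-by-cycle model, sketched below in C under the assumption of a four-stage pipeline with one cycle per stage: n instructions complete in n + 3 cycles rather than the 4n cycles a non-overlapped unit would need.

```c
/* Toy model of a 4-stage instruction pipeline (fetch, decode, operand
 * fetch, execute). The stage set and single-cycle stages are
 * illustrative assumptions, not a description of any real processor. */
#include <stdio.h>

int main(void) {
    const char *stages[] = { "IF", "ID", "OF", "EX" };
    const int S = 4;    /* pipeline stages */
    const int N = 6;    /* instructions    */

    /* Instruction i occupies stage s during cycle i + s, so the whole
     * stream finishes in S + N - 1 cycles instead of S * N. */
    for (int cycle = 0; cycle < S + N - 1; cycle++) {
        printf("cycle %2d:", cycle + 1);
        for (int i = 0; i < N; i++) {
            int s = cycle - i;
            if (s >= 0 && s < S)
                printf("  I%d:%s", i + 1, stages[s]);
        }
        printf("\n");
    }
    return 0;
}
```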
The central processing unit nearly always has a much faster cycle time than the memory, which implies that it can consume data items faster than a single memory unit can provide them. Interleaved memory is an organization of memory into several independently accessible banks, with consecutive addresses assigned to different banks, which at least partially relieves this problem: successive references to consecutive locations can be serviced by different banks concurrently.
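The sketch below, in C, shows the address mapping used in low-order interleaving; the bank count is an illustrative assumption. Because word addresses 0, 1, 2, ... fall in different banks, a stride-1 stream of references keeps all banks busy at once.

```c
/* Low-order memory interleaving: consecutive word addresses map to
 * different banks. NUM_BANKS is an assumed power-of-two bank count. */
#include <stdio.h>

#define NUM_BANKS 8

int main(void) {
    for (unsigned addr = 0; addr < 16; addr++) {
        unsigned bank   = addr % NUM_BANKS;   /* which bank       */
        unsigned offset = addr / NUM_BANKS;   /* location in bank */
        printf("word %2u -> bank %u, offset %u\n", addr, bank, offset);
    }
    return 0;
}
```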
Parallelism within arithmetic and logical circuitry has been introduced in several ways. Adders, multipliers, and dividers now operate in bit-parallel mode, while the earliest machines performed bit-serial arithmetic. Independently operating parallel functional units within the central processing unit can each perform an arithmetic operation such as add, multiply, or shift. Array processing is a form of parallelism in which the instruction execution portion of a central processing unit is replicated several times and connected to its own memory device as well as to a common instruction interpretation and control unit. In this way, a single instruction can be executed at the same time on each of several execution units, each on a different set of operands. This kind of architecture is often referred to as single-instruction stream, multiple-data stream (SIMD).
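The SIMD idea survives inside modern microprocessors as short-vector instructions. The C sketch below uses x86 AVX intrinsics, in which a single add instruction operates on eight floating-point elements at once; it illustrates SIMD execution within one processor rather than the replicated execution units of a classical array processor, and it assumes an AVX-capable CPU (compile with -mavx).

```c
/* Single instruction, multiple data: one AVX add instruction adds
 * eight pairs of floats. Requires an AVX-capable x86 processor. */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    float a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    float b[8] = { 10, 20, 30, 40, 50, 60, 70, 80 };
    float c[8];

    __m256 va = _mm256_loadu_ps(a);      /* load 8 floats           */
    __m256 vb = _mm256_loadu_ps(b);
    __m256 vc = _mm256_add_ps(va, vb);   /* 8 adds, one instruction */
    _mm256_storeu_ps(c, vc);

    for (int i = 0; i < 8; i++)
        printf("%g ", c[i]);
    printf("\n");
    return 0;
}
```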
Vector processing is the term applied to the use of pipelined arithmetic units specialized for performing arithmetic operations on vectors, which are uniform, linear arrays of data values. It can be thought of as a type of SIMD processing, since a single instruction invokes the execution of the same operation on every element of the array. See Computer programming, Programming languages
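The archetypal vector operation is a loop such as the following DAXPY (y = a*x + y) written in C. On a vector machine, a single vector instruction streams the two arrays through a pipelined arithmetic unit; a modern optimizing compiler will typically auto-vectorize the same loop (for example, gcc -O3).

```c
/* DAXPY: the same multiply-add applied to every element of a vector,
 * the pattern vector hardware and auto-vectorizers are built for. */
#include <stddef.h>

void daxpy(size_t n, double a, const double *x, double *y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```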
A central processing unit can contain multiple sets of the instruction execution hardware for either scalar or vector instructions. The task of scheduling instructions which can correctly execute in parallel with one another is generally the responsibility of the compiler or special scheduling hardware in the central processing unit. Instruction-level parallelism is almost never visible to the application programmer.
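The hypothetical C fragment below shows the kind of independence that a compiler or scheduling hardware exploits: the two multiplications do not depend on each other and can be issued together, while the final addition must wait for both.

```c
/* Instruction-level parallelism: p and q are independent and can be
 * computed simultaneously; the sum is the only serializing step. */
double dot2(double a, double b, double c, double d) {
    double p = a * b;   /* may issue in parallel with q */
    double q = c * d;
    return p + q;       /* depends on both p and q      */
}
```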
Multiprocessing is a form of parallelism that has complete central processing units operating in parallel, each fetching and executing instructions independently from the others. This type of computer organization is called multiple-instruction stream, multiple-data stream (MIMD). See Multiprocessing
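A minimal MIMD-style sketch in C, assuming OpenMP is available (compile with -fopenmp): each thread acts as a complete processor fetching and executing its own instruction stream, independently of the others.

```c
/* MIMD multiprocessing sketch using OpenMP threads as stand-ins for
 * independent processors. */
#include <omp.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel   /* one independent instruction stream each */
    {
        printf("processor %d of %d executing independently\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}
```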