CUDA



The architecture of the NVIDIA graphics processing unit (GPU), starting with its GeForce 8 chips. The CUDA application programming interface (API) exposes the inherent parallel processing capabilities of the GPU to the developer and enables scientific and financial applications to run on the GPU rather than the CPU (see GPGPU).

CUDA also supports NVIDIA's PhysX physics simulation algorithms for game developers who want to add more realism to their video games (see PhysX). CUDA was originally an acronym for Compute Unified Device Architecture.

CUDA C/C++ and CUDA Fortran
CUDA operations are programmed in traditional programming languages. C/C++ and Fortran source code is compiled with NVIDIA's own CUDA compilers for each language. The CUDA Fortran compiler was developed by the Portland Group (PGI), which was acquired by NVIDIA. See GPU.
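As a minimal sketch of the CUDA C programming model described above, the following vector-addition example shows the typical pattern: allocate memory on the GPU, copy input data over, launch a kernel across many parallel threads, and copy the result back. The kernel, file, and variable names here are illustrative only; the program would be compiled with NVIDIA's nvcc compiler (e.g. nvcc vecadd.cu -o vecadd) and requires a CUDA-capable GPU to run.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one pair of elements. The __global__
// qualifier marks a function that runs on the device and is launched
// from host (CPU) code.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Allocate and fill host (CPU) arrays.
    float *ha = (float *)malloc(bytes);
    float *hb = (float *)malloc(bytes);
    float *hc = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Allocate device (GPU) memory and copy the inputs over.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(da, db, dc, n);

    // Copy the result back to the host and spot-check one element.
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    free(ha); free(hb); free(hc);
    return 0;
}
```

The grid/block launch configuration (the <<<blocks, threads>>> syntax) is what maps the computation onto the GPU's parallel hardware: each of the n element-wise additions is handled by its own thread.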