The architecture also provides two supplementary memory types that are read-only and are accessible by all the threads: the constant memory and the texture memory
. The lifetime of the global, constant and texture memory
is the same with the kernel function's lifetime .
Unlike the strategy of thread-based task assignment proposed for aggregation and disaggregation at coarser level , we make a further step in data storage by using the four times larger texture memory
in Kepler than Fermi.
More precisely, we load the terrain profile into the Compute Unified Device Architecture (CUDA) texture memory
and parallelize the transmitter-receiver path profile computation.
Both constant and texture memory
provide small on-chip caches that allow threads to take advantage of fine-grained spatial and temporal locality.
Due to non-linear, tiled texture memory
addressing (seehttp://www.x.org/wiki/Development/Documentation/HowVideoCardsWork/#index1h3), doing horizontal and vertical compute shader passes has essentially equal performance.
For example, global memory, constant memory, and texture memory
are visible for both the parent and child grids and can be written within the parent and child grids coherently.
There are two additional read-only memory spaces accessible by all threads and constant and texture memory
spaces (Figure 1).
CUDA threads can access data from different memory spaces on device, such as global, shared, constant, texture memory
, and registers.
Therefore, it is not surprising that, after careful optimizations (e.g., to efficiently use GPUs' texture memory
and improve data access locality) GPU ports of exact (5,6) or approximate (10) sequence alignment tools achieve significant speedups.
Topics covered include thread cooperation, texture memory
, atomics and the use of CUDA C with multiple GPUs.