The architecture also provides two supplementary memory types that are read-only and accessible to all threads: the constant memory and the
texture memory. The lifetime of the global, constant and
texture memory is the same as the kernel function's lifetime [13].
Unlike the thread-based task assignment strategy proposed for aggregation and disaggregation at coarser levels [16], we go a step further in data storage by exploiting the
texture memory of Kepler, which is four times larger than that of Fermi.
Both constant and
texture memory provide small on-chip caches that allow threads to take advantage of fine-grained spatial and temporal locality.
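
As a rough illustration of reading through the texture path, the sketch below binds a linear device buffer to a texture object and has each thread fetch two neighbouring elements, an access pattern with the kind of spatial locality a texture cache is meant to serve. The names `neighbor_sum`, `texture_mismatches`, and the size `N` are hypothetical and come from none of the quoted sources. Texture objects assume a device of compute capability 3.0 or later, and the actual caching behaviour varies by architecture.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define N 1024  // hypothetical element count

// Each thread fetches two adjacent elements through the texture path.
__global__ void neighbor_sum(cudaTextureObject_t tex, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n - 1) {
        out[i] = tex1Dfetch<float>(tex, i) + tex1Dfetch<float>(tex, i + 1);
    }
}

// Hypothetical host helper: returns the number of wrong outputs, or -1 on a CUDA error.
int texture_mismatches()
{
    std::vector<float> h_in(N), h_out(N - 1);
    for (int i = 0; i < N; ++i) h_in[i] = (float)i;

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, N * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, (N - 1) * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h_in.data(), N * sizeof(float), cudaMemcpyHostToDevice);

    // Describe the linear buffer that the texture object will read from.
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeLinear;
    resDesc.res.linear.devPtr = d_in;
    resDesc.res.linear.desc = cudaCreateChannelDesc<float>();
    resDesc.res.linear.sizeInBytes = N * sizeof(float);

    cudaTextureDesc texDesc = {};
    texDesc.readMode = cudaReadModeElementType;  // return raw float values

    cudaTextureObject_t tex = 0;
    if (cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr) != cudaSuccess) {
        cudaFree(d_in);
        cudaFree(d_out);
        return -1;
    }

    neighbor_sum<<<(N + 255) / 256, 256>>>(tex, d_out, N);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, (N - 1) * sizeof(float),
                                 cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    // With h_in[i] = i, every output should equal i + (i + 1).
    int bad = 0;
    for (int i = 0; i < N - 1; ++i) {
        if (h_out[i] != (float)(2 * i + 1)) ++bad;
    }
    return bad;
}
```

Calling `texture_mismatches()` from a `main` function and checking for a return value of 0 is one way to try the sketch. The kernel never writes through the texture object, in line with the read-only role described above.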
Due to non-linear, tiled
texture memory addressing (see http://www.x.org/wiki/Development/Documentation/HowVideoCardsWork/#index1h3), doing horizontal and vertical compute shader passes has essentially equal performance.
For example, global memory, constant memory, and
texture memory are visible to both the parent and child grids and can be written within the parent and child grids coherently.
CUDA threads can access data from different memory spaces on the device, such as global, shared, and constant memory,
texture memory, and registers.
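
To make the memory spaces concrete, here is a minimal sketch of a kernel that touches global, shared, and constant memory and keeps its intermediate value in a register. Texture memory is left out for brevity. The names `reverse_and_scale`, `scale`, `count_mismatches`, and the size `N` are hypothetical and come from none of the quoted sources.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define N 256  // hypothetical problem size: one block of N threads

// Constant memory: read-only inside kernels, written by the host.
__constant__ float scale;

// Reads global memory, stages it in shared memory, and writes the
// block-reversed, scaled values back to global memory.
__global__ void reverse_and_scale(const float *in, float *out)
{
    __shared__ float tile[N];           // shared memory, visible to the whole block
    int i = threadIdx.x;                // held in a register
    tile[i] = in[i];                    // global -> shared
    __syncthreads();                    // wait until the whole tile is filled
    float v = tile[N - 1 - i] * scale;  // shared and constant -> register
    out[i] = v;                         // register -> global
}

// Hypothetical host helper: returns the number of wrong outputs, or -1 on a CUDA error.
int count_mismatches()
{
    std::vector<float> h_in(N), h_out(N);
    for (int i = 0; i < N; ++i) h_in[i] = (float)i;
    const float h_scale = 2.0f;

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, N * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, N * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h_in.data(), N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(scale, &h_scale, sizeof(float));  // fill constant memory

    reverse_and_scale<<<1, N>>>(d_in, d_out);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, N * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    // Element i should hold 2 * (N - 1 - i).
    int bad = 0;
    for (int i = 0; i < N; ++i) {
        if (h_out[i] != 2.0f * (float)(N - 1 - i)) ++bad;
    }
    return bad;
}
```

Calling `count_mismatches()` from a `main` function and checking for 0 exercises the sketch. The single-block launch is a simplification so that one shared-memory tile covers the whole input.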
Topics covered include thread cooperation,
texture memory, atomics and the use of CUDA C with multiple GPUs.