We are using single precision numbers, so a single
cache line can store 16 values.
Besides, with [T.sub.C] representing the time used for copying a memory block which maps to a single
cache line, [T.sub.CP] can then be expressed as:
This technique states that we should replace the
cache line that has not been accessed for the longest period.
In this paper there are analyzed important technical aspects that can influence the overall performance of an application developed for CUDA enabled GPUs: the increased speedup offered by putting into use the shared and cache memory; the alignment of data in memory; optimal memory access patterns; aligning to the L1
cache line; the balance achieved between single or double precision and its effect on memory usage; joining more kernel functions into a single one; adjusting the code to the available memory bandwidth in accordance with the memory latency and the necessity to transfer data between the host and the device.
When occurs the
cache line loading from dynamic RAM, the two LSB bits, are obtained from the numerator, which increments the address with four locations.
The most sophisticated of the external cache systems also offer the ability to flexibly configure the
cache line size to match the size of incoming data traffic and to control the rate at which data is flushed to attached RAID systems.
Sun is also going to increase the number of transistors dedicated to L2 memory tags by a factor of eight in the Jaguar kickers while shortening the
cache line size from 512 bytelines to 128 bytelines.
The two-way set-associative caches are nonblocking and support
cache line locking.
We assume the following in these examples: a
cache line is 32 bytes long; an instruction is four bytes long (hence one
cache line contains eight instructions); hardware next-8-line prefetching is enabled; and the prefetching distance is 20 instructions.
In a direct-mapped cache, the relationship between the memory address and the
cache line number is simply