"With the elimination of the current 3.5GB limit on D3 virtual memory, the per-process memory limit imposed by a 32-bit application and true 64-bit provide for improved process performance, larger memory resident dataset, reduced memory contention
, and enhanced overall system performance," explained John Bramley, Lab Business Area executive at Rocket Software.
Linear speedup is not entirely achieved because of memory contention, that is, concurrent accesses to different data structures in the same memory modules.
We believe that the degradation is due to (1) memory contention as the disks transfer data to memory, (2) preemption of the file system and application threads due to disk interrupts, and (3) contention for kernel data structures such as the ready queues.
We suspect that increased memory contention is the culprit; the increased number of memory accesses to file structure data (building blocks and hash tables in this case) compete in the memory with the DMA accesses from the disks.
For good performance, we had to spend a large part of our implementation effort on developing techniques to distribute and replicate file system state across multiple memory modules to increase locality and avoid memory contention. This may have been partly an artifact of our ("mature") hardware base that was not cache coherent.