In , cache misses in cc-NUMA multiprocessors are first classified according to the actions the directory must perform to satisfy them, and a novel node architecture is then proposed that makes extensive use of on-processor-chip integration to reduce the latency of each miss category in the classification.
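Such a classification can be made concrete by keying each miss to the directory action it triggers. The sketch below is illustrative only; the category names and the classification logic are assumptions, not the cited paper's exact taxonomy:

```python
from enum import Enum, auto
from typing import Optional, Set

class DirAction(Enum):
    """Directory action needed to satisfy a miss (illustrative names)."""
    MEMORY_SUPPLY = auto()   # line is clean at home: memory provides the data
    CACHE_TO_CACHE = auto()  # a remote cache owns a dirty copy: forward request
    INVALIDATE = auto()      # upgrade miss: invalidate sharers, grant ownership

def classify_miss(dirty_owner: Optional[int],
                  sharers: Set[int],
                  is_write_to_shared: bool) -> DirAction:
    """Classify a miss by the coherence action the home directory takes."""
    if is_write_to_shared and sharers:
        return DirAction.INVALIDATE       # requester already holds a shared copy
    if dirty_owner is not None:
        return DirAction.CACHE_TO_CACHE   # dirty data lives in another cache
    return DirAction.MEMORY_SUPPLY        # home memory holds the latest copy
```

The cache-to-cache and upgrade categories correspond to the misses targeted by the prediction techniques discussed below.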
The new Compaq AlphaServer GS320  is an example of a cc-NUMA architecture specifically targeted at medium-scale multiprocessing (up to 64 processors).
Additionally, the authors presented the first design of a speculative coherent cc-NUMA system that uses pattern-based predictors to execute coherence operations speculatively, hiding the remote read latency .
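The flavor of such prediction can be illustrated with a simple last-owner predictor gated by a 2-bit confidence counter; this is a minimal stand-in for the pattern-based predictors cited above, whose actual designs use richer access-pattern history:

```python
class OwnerPredictor:
    """Last-owner predictor with a 2-bit confidence counter per block.

    Illustrative sketch only: real pattern-based predictors track longer
    histories and more elaborate patterns than a single last owner.
    """
    def __init__(self):
        self.table = {}  # block address -> [predicted owner node, confidence 0..3]

    def predict(self, block):
        """Return a node to probe speculatively, or None to go through the directory."""
        entry = self.table.get(block)
        if entry and entry[1] >= 2:       # speculate only when confident
            return entry[0]
        return None

    def update(self, block, actual_owner):
        """Train the predictor with the owner that actually supplied the data."""
        entry = self.table.setdefault(block, [actual_owner, 0])
        if entry[0] == actual_owner:
            entry[1] = min(3, entry[1] + 1)           # strengthen confidence
        else:
            entry[1] -= 1                              # weaken on misprediction
            if entry[1] < 0:
                self.table[block] = [actual_owner, 0]  # retrain to new owner
```

When the predictor is confident, a read request can be forwarded directly to the predicted owner in parallel with (or instead of) the directory lookup, saving the home-node indirection on a correct guess.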
"Owner Prediction for Accelerating Cache-to-Cache Transfer Misses in cc-NUMA Multiprocessors".
"The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors".
"Switch Cache: A Framework for Improving the Remote Memory Access Latency of CC-NUMA Multiprocessors".
"Using Switch Directories to Speed Up Cache-to-Cache Transfers in CC-NUMA Multiprocessors".
Besides the flexibility to support a wide variety of workloads efficiently, this approach has a number of additional advantages over other system software designs targeted at CC-NUMA machines.
Besides handling CC-NUMA multiprocessors, the approach also inherits all the advantages of traditional virtual machine monitors.
This is a particularly important task on CC-NUMA machines, since the commodity operating system depends on Disco to deal with the nonuniform memory access times.
Disco provides a complete CC-NUMA memory management facility that includes page placement as well as a dynamic page migration and page replication policy.
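The essence of such a policy can be sketched as a counter-based heuristic that migrates hot remotely-written pages and replicates read-mostly ones. The threshold and data structures below are assumptions for illustration, not Disco's actual implementation:

```python
from collections import defaultdict

MIGRATE_THRESHOLD = 16   # remote accesses before acting (assumed value)

class PagePolicy:
    """Counter-based NUMA page policy, loosely modeled on the Disco approach:
    pages that are written are migrated to the accessing node (one copy),
    while read-mostly pages are replicated so each node reads locally."""
    def __init__(self):
        self.remote_counts = defaultdict(int)   # (page, node) -> remote access count
        self.written = set()                    # pages ever seen with a write

    def on_access(self, page, node, home_node, is_write):
        """Record one access; return a (decision, node) pair or None."""
        if is_write:
            self.written.add(page)
        if node == home_node:
            return None                          # local access: nothing to do
        self.remote_counts[(page, node)] += 1
        if self.remote_counts[(page, node)] >= MIGRATE_THRESHOLD:
            # Written pages must stay single-copy, so they migrate;
            # read-only pages can safely be replicated per node.
            return ("migrate", node) if page in self.written else ("replicate", node)
        return None
```

In Disco itself these decisions are made transparently beneath the commodity operating system, using the virtual machine monitor's extra level of address translation to move or copy pages without the guest's knowledge.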
Operating system support for improving data locality on CC-NUMA computer servers.