Authors: Philipp A. Friese, Martin Schulz
Abstract:
The continuously rising number of cores per socket puts a growing demand on on-chip interconnects. The topology of these interconnects is largely kept hidden from the user, yet they can be the source of measurable performance differences on large many-core processors, depending on where cores are placed on the interconnect. This paper investigates the ARM Coherent Mesh Network (CMN) on an Ampere Altra Max processor. We provide novel insights into the interconnect by experimentally deriving key information on the CMN topology, such as the positions of cores and of memory and cache controllers. Based on this insight, we evaluate the performance characteristics of several benchmarks and tune the thread-to-core mapping to improve application performance. Our methodology is directly applicable to all ARM-based processors using the ARM CMN and, in principle, applies to all mesh-based on-chip networks.
#Link: