Apple-CORE: Microgrids of SVP cores: flexible, general-purpose, fine-grained hardware concurrency management

Authors
Publication date 2012
Host editors
  • S. Nair
Book title DSD 2012: proceedings, 15th Euromicro Conference on Digital System Design: 5-8 September 2012, Cesme, Izmir, Turkey
ISBN
  • 9780769547985
Event 15th Euromicro Conference on Digital System Design
Pages (from-to) 501-508
Publisher Piscataway, NJ: IEEE Computer Society
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
To harness the potential of chip multiprocessors (CMPs) for scalable, energy-efficient performance in general-purpose computers, the Apple-CORE project has co-designed a general machine model and concurrency control interface together with dedicated hardware support for concurrency control across multiple cores. Its SVP interface combines dataflow synchronisation with imperative programming, aiming at efficient use of parallelism in general-purpose workloads. The corresponding hardware implementation provides dedicated logic that coordinates single-issue, in-order, multi-threaded RISC cores into on-chip computation clusters called Microgrids. Unlike the traditional "accelerator" approach, Microgrids are intended as components of distributed systems on chip that treat both clusters of small cores and optional larger cores optimised for sequential performance as system services shared between applications. The key aspects of the design are asynchrony, i.e. the ability to tolerate operations with irregular and variably long latencies; a scale-invariant programming model; a distributed view of the chip’s structure; and transparent performance scaling of a single program binary across multiple cluster sizes. This paper describes the execution model, the core micro-architecture, its realisation in a many-core, general-purpose processor chip, and its software environment. The reference chip parameters include 128 cores, a 4 MB on-chip distributed cache network and four DDR3-1600 memory channels. The paper also presents cycle-accurate simulation results for various key algorithmic and cryptographic kernels. The results show efficient hardware utilisation despite high-latency memory accesses, and good scalability across relatively large clusters of cores.
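The abstract's pairing of dataflow synchronisation with imperative code, and its claim that one program binary scales transparently across cluster sizes, can be loosely illustrated in software. The Python sketch below is an analogy under stated assumptions, not the SVP interface or the Microgrid hardware: `SyncCell` and `run_family` are hypothetical names, a write-once cell whose reads block until it has been written stands in for dataflow synchronisation, and a thread pool of varying size stands in for clusters of different sizes.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SyncCell:
    """Hypothetical write-once cell: reads block until a value is written.

    A software stand-in for dataflow synchronisation; assumes exactly one
    writer per cell, so the check-then-set in write() is not made atomic.
    """

    def __init__(self):
        self._ready = threading.Event()
        self._value = None

    def write(self, value):
        if self._ready.is_set():
            raise RuntimeError("cell already written")
        self._value = value
        self._ready.set()  # wake every reader blocked in read()

    def read(self):
        self._ready.wait()  # suspend until the producer has written
        return self._value


def run_family(n, workers):
    """Run n imperative 'threads' linked by a dataflow chain on `workers` OS threads."""
    cells = [SyncCell() for _ in range(n + 1)]
    cells[0].write(0)  # seed value of the dependency chain

    def thread_body(i):
        local = i * i            # independent work: may overlap with other threads
        prev = cells[i].read()   # blocks until thread i-1 has produced its result
        cells[i + 1].write(prev + local)

    # Tasks are submitted in index order and the pool hands them out FIFO,
    # so a blocked task only ever waits on a task that has already started.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(n):
            pool.submit(thread_body, i)
    return cells[n].read()


# Same "program", different "cluster sizes".
for w in (1, 2, 8):
    print(w, run_family(100, w))
```

Calling `run_family(100, w)` for w = 1, 2 or 8 should return 328350 (the sum of i² for i < 100) in every case: the dependency chain fixes the result, while the pool size only changes how much of the independent work can overlap. Submitting tasks in index order matters in this sketch, since the pool dequeues work first-in first-out, which rules out a blocked task waiting on a producer that was never scheduled.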
Document type Conference contribution
Language English
Published at https://doi.org/10.1109/DSD.2012.25