Academic Paper

Detail Design and Evaluation of Fab Cache
Document Type
Conference
Source
2014 Second International Symposium on Computing and Networking (CANDAR), pp. 591-595, Dec. 2014
Subject
Computing and Processing
Multicore processing
Random access memory
Power demand
Estimation
Microarchitecture
Registers
Heterogeneous multi-core processor
Cache generator
Design automation
VLSI design
Language
ISSN
2379-1888
2379-1896
Abstract
Single-ISA heterogeneous multi-core architectures, which consist of diverse superscalar cores, are becoming increasingly important in processor architecture. Running each program on a superscalar core suited to its characteristics helps reduce energy consumption and improve performance. However, designing a heterogeneous multi-core processor requires a large design and verification effort. We have therefore proposed Fab Hetero, which automatically generates diverse heterogeneous multi-core processors using Fab Scalar, Fab Cache, and Fab Bus; these generate various designs of superscalar cores, cache systems, and flexible shared bus systems, respectively. This paper extends our previous work and presents the details of Fab Cache. The previous paper did not describe the detailed design of the L1 data cache, and it did not implement mechanisms for high-end performance such as a non-blocking cache. It also did not describe the physical design or power estimation. To address these gaps, this paper describes the detailed design of Fab Cache, in particular the L1 data cache, to show its suitability for high-end processors. This paper also covers performance estimation and the physical design of caches generated by Fab Cache with arbitrary parameters such as cache capacity, line size, associativity, access latency, and line transmission width between cache hierarchy levels. According to the estimation results, Fab Cache generates cache systems with almost the same area and power consumption as a hand-tuned cache despite the overhead of automatic generation: the ratio of the L1 instruction cache controller, including extra circuits, is only 3.5%, and the increase in power consumption compared with a hand-tuned cache is less than 0.1%.
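As an illustration of how the cache parameters named in the abstract (capacity, line size, associativity) relate to one another, the following minimal sketch derives the number of sets and the address field widths of a set-associative cache. It is not taken from Fab Cache: the names `CacheParams` and `geometry`, the 32-bit address width, and the power-of-two restriction are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheParams:
    """Hypothetical parameter record; field names are assumptions."""
    capacity_bytes: int   # total data capacity
    line_bytes: int       # cache line size
    ways: int             # associativity
    addr_bits: int = 32   # assumed physical address width


def _is_pow2(n: int) -> bool:
    return n > 0 and (n & (n - 1)) == 0


def geometry(p: CacheParams) -> dict:
    """Derive set count and offset/index/tag widths for a set-associative cache."""
    if not (_is_pow2(p.capacity_bytes) and _is_pow2(p.line_bytes) and _is_pow2(p.ways)):
        raise ValueError("this sketch assumes power-of-two parameters")
    if p.capacity_bytes < p.line_bytes * p.ways:
        raise ValueError("capacity too small for the given line size and associativity")

    sets = p.capacity_bytes // (p.line_bytes * p.ways)
    offset_bits = p.line_bytes.bit_length() - 1   # log2 of a power of two
    index_bits = sets.bit_length() - 1
    tag_bits = p.addr_bits - index_bits - offset_bits
    if tag_bits < 0:
        raise ValueError("address width too small for this cache")
    return {
        "sets": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
    }


if __name__ == "__main__":
    # Example: 32 KiB, 64-byte lines, 4-way set associative
    print(geometry(CacheParams(32 * 1024, 64, 4)))
```

A generator that accepts arbitrary parameters would need this kind of derivation, or an equivalent, before sizing its tag and data arrays; how Fab Cache actually does this is not stated in the abstract.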