Academic Paper

From Xetal-II to Xetal-Pro: On the Road Toward an Ultralow-Energy and High-Throughput SIMD Processor
Document Type
Periodical
Source
IEEE Transactions on Circuits and Systems for Video Technology (IEEE Trans. Circuits Syst. Video Technol.). 21(4):472-484 Apr, 2011
Subject
Components, Circuits, Devices and Systems
Communication, Networking and Broadcast Technologies
Computing and Processing
Signal Processing and Analysis
Frequency modulation
Kernel
Random access memory
System-on-a-chip
Computer architecture
Throughput
Hybrid memory system
SIMD
sub/near threshold
ultralow-energy
Xetal
Language
ISSN
1051-8215
1558-2205
Abstract
Looking forward to the next generation of mobile streaming computing, the energy efficiency demanded of end-user terminals will become ever more stringent. This paper presents the Xetal-Pro processor, the latest member of the Xetal low-power single-instruction multiple-data (SIMD) processor family from Philips. Its predecessor, Xetal-II, already ranks as one of the most computationally efficient processors available today [in terms of giga operations per second (GOPS) per watt]; however, it cannot yet achieve the demanded energy efficiency (less than 1 pJ per operation). Unlike Xetal-II, Xetal-Pro supports ultrawide supply voltage $(V_{dd})$ scaling from the nominal supply to the subthreshold region. Although aggressive $V_{dd}$ scaling causes severe throughput degradation, this can be partly compensated for by the massive parallelism in the Xetal family. Xetal-II includes a large on-chip frame memory (FM), which does not scale well to an ultralow $V_{dd}$ and hence creates a big obstacle to increasing energy efficiency. Therefore, we investigate both different FM realizations and memory organization alternatives. A hybrid memory system (HMS), which reduces the nonlocal memory traffic and enables further $V_{dd}$ scaling, is proposed. To support design space exploration of the right number of scratchpad memory (SM) entries, the corresponding data locality analysis is also provided. Moreover, some circuit implementation issues unique to Xetal-Pro, such as the customized level shifter, are discussed. Compared to Xetal-II operating at the nominal voltage, Xetal-Pro provides up to a twofold energy efficiency improvement even without $V_{dd}$ scaling (essentially a consequence of data localization in the SM) when delivering the same ultrahigh throughput. With $V_{dd}$ scaling into the sub/near-threshold region, Xetal-Pro could gain more than a tenfold energy reduction while still delivering a high throughput of 0.69 GOPS (counting multiply and add operations only). The new insight of Xetal-Pro sheds light on the direction of future ultralow-energy SIMD processors.
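
The abstract quotes efficiency both as GOPS/Watt and as energy per operation, so a short unit-conversion sketch may help relate the two. The function names below are hypothetical and chosen only for illustration. The single example of power at 0.69 GOPS assumes exactly 1 pJ per operation (the target bound named in the abstract), which is not a measured figure from the paper.

```python
def pj_per_op(gops_per_watt: float) -> float:
    """Convert computational efficiency (GOPS/W) to energy per operation (pJ)."""
    # 1 GOPS/W is 1e9 operations per joule, i.e. 1e-9 J = 1000 pJ per operation.
    return 1000.0 / gops_per_watt


def gops_per_watt(pj: float) -> float:
    """Inverse conversion: energy per operation (pJ) to GOPS/W."""
    return 1000.0 / pj


def power_mw(throughput_gops: float, pj: float) -> float:
    """Power in milliwatts for a given throughput and energy per operation."""
    # (1e9 ops/s) * (1e-12 J/op) = 1e-3 W, so GOPS * pJ/op gives mW directly.
    return throughput_gops * pj


if __name__ == "__main__":
    # The "less than 1 pJ per operation" target corresponds to more than 1000 GOPS/W.
    print(gops_per_watt(1.0))
    # Illustrative only: 0.69 GOPS at an assumed 1 pJ/op.
    print(power_mw(0.69, 1.0))
```

Under these definitions, the 1 pJ per operation target is equivalent to 1000 GOPS/W, and a tenfold energy reduction per operation corresponds to a tenfold increase in GOPS/W at a fixed operation count.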