by Richard Dorrance, Dejan Marković
Abstract:
A DSP for sparse-BLAS is realized in 40nm CMOS. Featuring an efficient data stream reordering scheme and an intelligent, CSC-aware memory controller, the DSP achieves a peak energy efficiency of 190 GFLOPS/W at 0.6V, 160MHz, and a peak performance of 4.12 GFLOPS at 1V, 515MHz showing more than 6,600x, 2,700x, 1,100x, and 450x higher energy efficiency than state-of-the-art CPU, GPU, DSP, and FPGA hardware designs, respectively.
Reference:
R. Dorrance and D. Marković, "A 190GFLOPS/W DSP for Energy-Efficient Sparse-BLAS in Embedded IoT," in Proceedings of the 2016 Symposium on VLSI Circuits (VLSI’16), pp. 182–183, June 2016.
Bibtex Entry:
@INPROCEEDINGS{Dorrance2016:VLSI,
author = {Dorrance, Richard and Markovi\'{c}, Dejan},
title = {{A 190GFLOPS/W DSP for Energy-Efficient Sparse-BLAS in Embedded IoT}},
booktitle = {Proceedings of the 2016 Symposium on VLSI Circuits (VLSI'16)},
year = {2016},
month = {June},
pages = {182--183},
doi = {10.1109/VLSIC.2016.7573527},
abstract = {A DSP for sparse-BLAS is realized in 40nm CMOS. Featuring an efficient data stream reordering scheme and an intelligent, CSC-aware memory controller, the DSP achieves a peak energy efficiency of 190 GFLOPS/W at 0.6V, 160MHz, and a peak performance of 4.12 GFLOPS at 1V, 515MHz showing more than 6,600x, 2,700x, 1,100x, and 450x higher energy efficiency than state-of-the-art CPU, GPU, DSP, and FPGA hardware designs, respectively.},
url = {https://rdorrance.com/pdf/Dorrance2016VLSI.pdf}
}