Document Type

Article

Publication Date

2002

DOI

10.1080/10655140290011140

Publication Title

VLSI Design

Volume

14

Issue

4

Pages

337-347

Abstract

We propose a VLSI inner product processor architecture in which broadcasting takes place only over short buses, each containing fewer than 64 switches. The architecture leads to an efficient algorithm for inner product computation. Specifically, computing the inner product of two arrays of N = 2⁹ elements, each element consisting of m = 64 bits, takes 13 broadcasts (each over fewer than 64 switches) plus 2 carry-save additions (t_csa) and 2 carry-lookahead additions (t_cla). Using VLSI area of the same order, our algorithm runs faster than the best known fast inner product algorithm, that of Smith and Torng ["Design of a fast inner product processor," Proceedings of IEEE 7th Symposium on Computer Arithmetic (1985)], which takes about 28 t_csa + t_cla for the same computation.
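The abstract counts cost in carry-save additions (t_csa) and carry-lookahead additions (t_cla). As a reminder of what those two operations do, the following is a minimal software sketch, not the short-bus architecture of the paper and not the Smith and Torng design. It assumes non-negative integer operands, and the names `carry_save_add` and `inner_product_csa` are hypothetical, chosen only for this illustration. Partial products are accumulated in redundant (sum, carry) form with a 3:2 carry-save step, and a single ordinary addition at the end stands in for the final carry-propagating (carry-lookahead) addition.

```python
def carry_save_add(a, b, c):
    """3:2 carry-save step: return (s, k) with a + b + c == s + k.

    Only bitwise operations are used, so no carry ripples across bit
    positions. Assumes non-negative integers.
    """
    partial_sum = a ^ b ^ c                      # per-bit sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1   # per-bit majority, shifted left
    return partial_sum, carry


def inner_product_csa(xs, ys):
    """Inner product of two equal-length lists of non-negative integers.

    Illustrative only: the loop is sequential, whereas hardware would
    reduce many partial products in parallel.
    """
    if len(xs) != len(ys):
        raise ValueError("arrays must have the same length")
    s, k = 0, 0  # running result kept in redundant (sum, carry) form
    for x, y in zip(xs, ys):
        bit = 0
        while y >> bit:
            if (y >> bit) & 1:
                # add the shifted partial product x * 2**bit
                s, k = carry_save_add(s, k, x << bit)
            bit += 1
    # one carry-propagating addition resolves the redundant form
    return s + k


if __name__ == "__main__":
    print(inner_product_csa([3, 5, 7], [2, 4, 6]))
```

The point of the redundant form is that each carry-save step has a delay independent of the word length, so the expensive carry-propagating addition is paid only at the end; this is why the abstract reports the two kinds of addition separately.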

Original Publication Citation

Lin, R., & Olariu, S. (2002). Fast inner product computation on short buses. VLSI Design, 14(4), 337-347. doi: 10.1080/10655140290011140

Included in

Computer Sciences Commons

ODU Digital Commons

Computer Science Faculty Publications

Fast Inner Product Computation on Short Buses
