A Reconfigurable Multiprocessor System for DSP Behavioral Simulation
Wook Kuh, Ph.D. 1990
(advisor: Jan Rabaey)
A major part of the design effort for DSP systems
is devoted to the algorithmic verification and specification process. The
behavioral simulation of DSP algorithms on a
programmable computer provides the flexibility to develop the algorithms
and enables a short design cycle. However, such simulation often requires
high computational throughput and the processing of
large amounts of data, which makes it too slow or
too costly to run on a general-purpose computer. Therefore,
a dedicated simulation engine called SMART has been developed and is
presented in this report. It is a multiprocessor architecture optimized for
real-time behavioral simulation of Digital
Signal Processing (DSP) systems. The first prototype,
containing 10 processors, is currently operational with a peak performance
of 120 MFLOPS. The
SMART system features a Configurable Bus and a Bypass Unit that trade off
overall communication bandwidth against latency by taking advantage of the local
communication between processors. System
performance is further improved by a
Distributed Shared Memory system, which lets the communication latency overlap
with the computation time of the processors.
Barriers, locks, and events are supported in hardware to minimize the
synchronization overhead. Benchmarks have
demonstrated that the SMART architecture achieves the targeted low
communication and synchronization overhead. In
the SMART simulation environment, the designer can describe algorithms in a
high-level language, either C or Silage. The C programming environment, which
requires partitioning information to be given in the program, is currently available. A
high-level software system based on Silage is
under development to automatically schedule the
algorithmic description onto the SMART processor array with balanced loading
and efficient use of the communication system. Performance of the actual
SMART system was measured for typical DSP programs
using floating-point operations. The measurements show an average speedup of 76
times over a SUN 3/60 and 29 times
over a SUN SPARCstation 1. With extensive use of library routines
in programming, these speedups can easily be doubled.
The performance is expected to increase even further
when the system is upgraded from 120 MFLOPS
to 200 MFLOPS.
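
To illustrate what it means for a C program to carry its own partitioning information, the following is a minimal sketch and not code from the SMART environment. It splits the output samples of an FIR filter into contiguous blocks, one per processor. The names `NUM_PROCS`, `partition_range`, `fir_slice`, and `fir_partitioned` are hypothetical, the FIR filter is only an example workload, and the processor array is stood in for by a sequential loop. The actual SMART communication and synchronization primitives are not shown because the abstract does not describe their interface.

```c
#include <stddef.h>

/* Assumption: 10 workers, matching the size of the first prototype. */
#define NUM_PROCS 10

/* Hypothetical helper: compute the half-open range [*begin, *end) of
 * output samples owned by processor `pid` when `n` samples are split
 * as evenly as possible across NUM_PROCS processors. */
static void partition_range(size_t n, size_t pid, size_t *begin, size_t *end)
{
    size_t base = n / NUM_PROCS;   /* samples every processor gets */
    size_t extra = n % NUM_PROCS;  /* first `extra` processors get one more */

    *begin = pid * base + (pid < extra ? pid : extra);
    *end = *begin + base + (pid < extra ? 1 : 0);
}

/* Work done by one processor: y[i] = sum over k of h[k] * x[i - k],
 * restricted to the samples in that processor's own range.
 * Samples before x[0] are treated as zero. */
static void fir_slice(const float *x, size_t n, const float *h, size_t taps,
                      float *y, size_t pid)
{
    size_t begin, end;
    partition_range(n, pid, &begin, &end);

    for (size_t i = begin; i < end; ++i) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps && k <= i; ++k)
            acc += h[k] * x[i - k];
        y[i] = acc;
    }
}

/* Sequential stand-in for the processor array: on real parallel hardware
 * each iteration of this loop would run on a different processor. */
static void fir_partitioned(const float *x, size_t n, const float *h,
                            size_t taps, float *y)
{
    for (size_t pid = 0; pid < NUM_PROCS; ++pid)
        fir_slice(x, n, h, taps, y, pid);
}
```

Because each slice reads a few input samples that precede its own block, neighbouring processors need some of each other's data. This is the kind of local traffic that a partitioning would ideally keep between adjacent processors, and an uneven split would leave some processors idle, which is why balanced loading matters.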
