Thursday, April 9, 2009

Signal Architecture

Shuvra Bhattacharyya explores how emerging hardware platforms enable more advanced software for image-processing applications


Shuvra Bhattacharyya is a professor in the Department of Electrical and Computer Engineering, University of Maryland at College Park, and holds a joint appointment in the University of Maryland Institute for Advanced Computer Studies and an affiliate appointment in the Department of Computer Science. He received his BS from the University of Wisconsin at Madison and PhD from the University of California at Berkeley.


VSD: Could you provide us with some background information on your experience?


Bhattacharyya: My research interests include architectures and design tools for signal-processing systems, biomedical circuits and systems, embedded software, and hardware/software co-design. Before joining the University of Maryland, I was a researcher at Hitachi America Semiconductor Research Laboratory (San Jose, CA, USA) and a compiler developer at Kuck & Associates (Champaign, IL, USA). I'm presently the chair of the IEEE Signal Processing Society technical committee on design and implementation of signal-processing systems.


Books that I have co-authored or co-edited include Embedded Multiprocessors: Scheduling and Synchronization (second edition to be published by CRC Press in 2009); Embedded Computer Vision (Springer, 2008); and Memory Management for Synthesis of DSP Software (CRC Press, 2006).


VSD: Which aspects of image processing interest you? What current research are you or your students pursuing?


Bhattacharyya: My research group at the University of Maryland--known as the Maryland DSPCAD Research Group--is focused on design methodologies and CAD tools for efficiently implementing DSP systems.


The objective of our work in the area of image processing is to develop programming models that capture the high-level structure of image-processing systems. We are also looking at analysis techniques for deriving implementation properties such as memory requirements and processing throughput from these representations. And we are looking at synthesis techniques for deriving optimized implementations on different kinds of target architectures, including programmable DSPs, FPGAs, and embedded multiprocessors.


The programming models we work with are based on dataflow principles and specialized to the area of signal processing, including applications that process signals from image, wireless communication, audio, and video streams. By applying specialized programming models, our methods are able to efficiently expose and exploit high-level computational structure in signal-processing applications that is extremely time-consuming or impossible to derive from general-purpose program representations.
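
As an illustration of the kind of structure a dataflow model exposes, the sketch below uses synchronous dataflow, one widely used dataflow model in which each component produces and consumes a fixed number of data items per firing. It solves the balance equations to find how many times each component must fire per schedule iteration. The three-stage graph, its rates, and the function name are hypothetical and chosen only for illustration; this is not the DSPCAD group's tooling.

```python
from fractions import Fraction
from math import gcd, lcm  # variadic gcd/lcm require Python 3.9+


def repetitions_vector(actors, edges):
    """Return the smallest positive firing counts that balance every edge.

    edges is a list of (src, dst, produced, consumed) tuples, where src
    produces `produced` tokens per firing and dst consumes `consumed`.
    """
    # Fix the first actor's relative rate and propagate along edges.
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:
        changed = False
        for src, dst, prod, cons in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * prod / cons
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * cons / prod
                changed = True
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Every edge must satisfy: firings(src) * produced == firings(dst) * consumed.
    for src, dst, prod, cons in edges:
        if rate[src] * prod != rate[dst] * cons:
            raise ValueError("inconsistent rates: no bounded-memory schedule")
    # Scale the rational rates to the smallest integer solution.
    scale = lcm(*(r.denominator for r in rate.values()))
    counts = {a: int(rate[a] * scale) for a in actors}
    g = gcd(*counts.values())
    return {a: n // g for a, n in counts.items()}


# Hypothetical pipeline: src emits 2 tokens per firing, filt consumes 3 and
# emits 1, sink consumes 2.
edges = [("src", "filt", 2, 3), ("filt", "sink", 1, 2)]
print(repetitions_vector(["src", "filt", "sink"], edges))
```

Because the rates are declared on the component interfaces, firing counts like these can be computed without inspecting the code inside each component, which is hard to do from a general-purpose program representation.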


Some particular challenges in applying dataflow-based design methodologies to image-processing systems include incorporating multidimensional data into the formal stream representations used by the programming models and managing the large volumes of data and high performance requirements. In addition, increasing use of image processing in portable, energy-constrained systems makes it important to incorporate methods for aggressively optimizing power consumption while maintaining adequate image-processing performance and accuracy.


I have been specifically involved in developing new design methods and tools for two image-processing domains: distributed networks of smart cameras and medical image registration. The first is through an NSF-sponsored collaboration with Rama Chellappa (University of Maryland) and Wayne Wolf (Georgia Institute of Technology), and the second is through a collaboration with Raj Shekhar and William Plishker, who are jointly affiliated with the schools of Engineering and Medicine at the University of Maryland.


VSD: How do you think this research will impact future generations of image-processing and machine-vision systems?


Bhattacharyya: I think that research on dataflow programming environments and tools will allow designers of these future systems greater flexibility in experimenting with different kinds of embedded processors and heterogeneous multiprocessor platforms. Most dataflow-based tools for signal processing operate at a high level of abstraction, where individual software components in conventional programming languages (e.g., C or Verilog/VHDL) are selected based on the back-end tools associated with the targeted platform.


These platform language components are interfaced through dataflow-style restrictions and conventions that allow for the inter-component behavior to be analyzed and optimized using formal dataflow techniques. The output of these tools is an optimized, monolithic implementation in the selected platform language; or, for heterogeneous platforms, the output is a set of multiple, cooperating platform language implementations. This output can then be further processed by the toolchain (e.g., the C compiler or HDL synthesis tools) associated with the target platform.


This kind of design flow provides a number of advantages that are promising for next-generation image-processing and computer-vision systems. First, the emphasis on component-based design--where components adhere to thoroughly and precisely defined interfacing conventions--facilitates agile, high-productivity, modularity-oriented design practices.


Second, the use of dataflow as effectively a source-to-source framework in terms of the platform language provides for efficient re-targetability across different kinds of platforms, and allows designers to leverage the often highly developed and highly specialized back-end tools of commercial embedded processing platforms. This provides a complementary relationship between the high-level design transformations, which are handled effectively by dataflow tools, and the low-level (intra-component) optimizations and machine-level translation, which are best handled by platform tools.


A general challenge facing this kind of two-level design methodology is the overhead of inter-component data communications, which can sometimes dominate performance if it is not handled through a more integrated design flow. I expect that designers and tool developers will continue to make advances in this direction by using techniques for carefully controlling the granularity of components, using block processing within components, and exploring new ways to model and optimize the mapping of component interfaces into hardware and software.
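
The effect of block processing on communication overhead can be sketched with a toy model. In the example below, each hand-off between components counts as one transfer, and larger blocks reduce the number of hand-offs without changing the result. The stage functions, the sample data, and the transfer-counting cost model are assumptions made for illustration; real overheads depend on the platform and on how the interfaces are mapped.

```python
def run_pipeline(samples, stages, block_size):
    """Pass samples through the stages one block at a time.

    Returns (outputs, transfers), where transfers counts the hand-offs
    between components: one per stage per block.
    """
    outputs = []
    transfers = 0
    for start in range(0, len(samples), block_size):
        block = samples[start:start + block_size]
        for stage in stages:
            block = stage(block)  # one inter-component hand-off
            transfers += 1
        outputs.extend(block)
    return outputs, transfers


# Two hypothetical element-wise components.
def scale(block):
    return [2 * x for x in block]


def offset(block):
    return [x + 1 for x in block]


samples = list(range(8))
per_sample, fine_transfers = run_pipeline(samples, [scale, offset], block_size=1)
per_block, coarse_transfers = run_pipeline(samples, [scale, offset], block_size=4)

# Same results, but 16 hand-offs with single-sample blocks versus 4 with
# four-sample blocks.
print(per_sample == per_block, fine_transfers, coarse_transfers)
```

Larger blocks amortize the per-transfer cost but also increase buffering and latency, so block size is one of the granularity parameters that has to be tuned for a given design.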



Dataflow graph that represents an accelerator for evaluating polynomials. Each circle or oval represents a computational operation; the arrows that connect operations specify how data passes between operations. Annotations specify certain properties about the rates at which the incident operations produce and consume data. The operation labeled "controller" (broken out on right) has a hierarchical "nested" dataflow representation. (Adapted from Plishker, W. et al., Proc. International Symposium on Rapid System Prototyping, pp. 17-23, Monterey, CA, June 2008.)


VSD: What developments in FPGA design will affect hardware developments and how will system designers incorporate them?


Bhattacharyya: I think that support for heterogeneous multiprocessing in FPGAs--both in terms of rapid prototyping and developing high-performance implementations--will contribute significantly to the increased use and customization of such multiprocessor technologies in image-processing systems. Modern FPGA devices provide valuable platforms on which designers can experiment with different multiprocessor architectures, including different combinations of processing units and different kinds of networks for inter-processor communication. This opens up a valuable dimension of the design space that must be explored more deeply to achieve the most competitive implementations of next-generation applications.


Both "hard" and "soft" processor cores play useful roles in FPGA-based design methodologies and in applying these methodologies to develop embedded multiprocessor systems. Although soft cores incur significant penalties in terms of performance and resource utilization, they are relatively easy to configure in different ways, so designers can experiment with different numbers and kinds of processors and get an idea of how an application will map onto and scale with different system architectures.


This kind of rapid prototyping approach allows designers to develop much better intuition about system architecture alternatives before investing large amounts of specialized effort developing or applying a specific multiprocessor platform. On the other hand, hard processor cores, together with signal processing accelerators and other kinds of specialized IP blocks, provide valuable frameworks for accelerating image-processing applications in performance-oriented production systems.


VSD: Recent software developments in image processing include pattern recognition, tracking, and 3-D modeling. What algorithms and specific software developments do you see emerging in the next five years?


Bhattacharyya: I expect an accelerated use of heterogeneous platforms for image-processing software development, such as platforms involving combinations of GPUs and CPUs, or multiprocessors and FPGA-based accelerators. Heterogeneous platforms allow for more streamlined implementation, including exploitation of different forms and levels of parallelism in the application, and efficient integration of control and data processing.


The use of heterogeneous platforms, however, is conceptually more difficult, and the associated design flows are more complex. I expect increased attention to and application of frameworks that are aimed at application development on heterogeneous multiprocessor platforms. Some examples of emerging frameworks in this space are the Open Computing Language (OpenCL), which is geared towards platforms that integrate GPU and CPU devices, and OpenDF, which is a dataflow-based toolset geared towards platform FPGAs and multicore systems.