Friday, September 25, 2009

Production Necessity

Jun Mitsudo describes advances in semiconductor inspection enabled by machine-vision algorithms, sensors, and processors

Jun Mitsudo holds a PhD in 3-D shape measurement from Ritsumeikan University (Kyoto, Japan) and is currently assistant manager of the Research and Development Center of Canon Machinery (Kusatsu, Japan). He has been involved with machine-vision technology since the late 1990s.

VSD: What is the mission of Canon Machinery in designing and building machine-vision systems for end users? Which industries do you serve?

Mitsudo: Canon Machinery consists of two business divisions: one that develops machines for factory automation and another that builds die-bonding machines for semiconductor test and assembly. Canon is the largest manufacturer of these machines in Japan and the fourth largest worldwide.

Being committed to investment in research and development in semiconductor production technology, we realize that machine-vision technology is a necessity. Indeed, because semiconductor production equipment is always increasing in complexity, the number of cameras required per machine is becoming larger each year. These cameras are used in a number of automated machine-vision processes including high-accuracy alignment, part recognition, part identification, and optical character recognition (OCR) applications.

VSD: What are end users requiring from Canon Machinery in the design of new systems?

Mitsudo: In factory automation systems, many different features are required that can only be produced at a reasonable cost by closely collaborating with end users. However, in the development of automated die-bonding equipment, the most important criterion is the throughput of the system. To achieve the highest possible throughput, many different technical factors such as speed, accuracy, and robustness need to be considered.

In addition, machine operators must configure these systems as quickly as possible. This is especially important since semiconductor manufacturing is now being performed in developing countries, where an easy-to-use operator interface is critical to the manufacturer's success. In future, these sophisticated operator interfaces will take advantage of different types of sensing technologies including machine vision to detect the status of a system and inform the operator accordingly.

VSD: What technologies and components do you use in these applications?

Mitsudo: We choose components case by case, selecting those that best fit the features and specifications requested for each machine. Because semiconductor devices differ in size, die-bonding machines are required to accommodate many different types. Indeed, the smaller the size of the device, the greater the required throughput of the system.

For this reason, CMOS cameras with programmable regions of interest (ROIs) are especially useful since these ROIs can be dynamically changed depending on the size of the individual IC. These types of cameras also eliminate the need for relatively expensive zoom lenses.

To perform image analysis, we use Halcon from MVTec Software (Munich, Germany) and create our own features based on the library. In the past, we developed our own image-processing hardware or bought off-the-shelf image-processing boards. However, in the late 1990s, the processing power of the PC increased dramatically and after an extensive evaluation we selected Halcon as our software package of choice.

VSD: What developments in embedded computing, GPUs, multicore CPUs, and multicore DSPs do you see? How will these technologies affect hardware development and how will system designers incorporate these developments?

Mitsudo: Of the different types of hardware currently available, perhaps graphics processing units (GPUs) are the most important. The high level of data parallelism used in these devices makes them an interesting alternative to general-purpose CPUs, especially in image-processing applications where very large images must be processed at high speeds.

For this to occur, however, system designers must have an intimate knowledge of computer architectures, algorithms, signal processing, optics, and mechanical design. In current die-bonding applications, newer algorithms are required to replace gray value edge-based template matching, and we expect such algorithms to be ported to GPU-based machines to increase their speed.

Canon Bestem-D02 is a multipurpose die bonder with a bonding speed of 0.29 s/cycle. The bonder incorporates CMOS image sensors with programmable ROI imaging. Image analysis is performed using Halcon from MVTec and a library customized by Canon Machinery.

VSD: What algorithms and specific software developments do you see emerging in the next five years?

Mitsudo: Different algorithms for 3-D pose calculation and 3-D shape reconstruction must become easier to integrate and maintain. Although these technologies are already practical, their use is restricted by limited acceptance among system designers. In the future, however, sophisticated software interfaces will make such software much easier to use.

VSD: What could vision component manufacturers do to make your job easier?

Mitsudo: In industrial machine-vision systems, the introduction of high-end machine-vision tools for template matching, caliper measurement, and blob analysis has made the development of die-bonding machines much easier. As these features migrate to smart vision sensors, they will become more practical and more widely used on the factory floor.

Other functions such as the fast Fourier transform (FFT), feature point extraction, calibration tools, neural networks, and support vector machines (SVMs) are also being incorporated into many off-the-shelf software packages. As system designers, we are committed to providing end users with the best solutions by combining these elemental technologies.

To do this, we must test the feasibility of each function, which requires an enormous amount of time. Single software packages that incorporate all of these functions therefore prove most valuable.

Because we incorporate ROI processing of CMOS cameras, we can dynamically change image-acquisition parameters to search for any specialized ROI within the image. Because this requires sending commands to the cameras continuously, standard digital interfaces such as Camera Link, FireWire, or GigE are useful in easing the setup of these types of cameras in semiconductor inspection applications.
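
The ROI selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `roi_for_die`, the 25% search margin, the 8-pixel window granularity, and the sensor dimensions are all hypothetical and are not taken from Canon Machinery's software or from any camera interface.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Roi:
    x: int       # left column of the readout window, in pixels
    y: int       # top row of the readout window, in pixels
    width: int
    height: int


def roi_for_die(die_w_um, die_h_um, um_per_pixel, sensor_w, sensor_h,
                center_x, center_y, margin=0.25, step=8):
    """Return a readout window that covers the die plus a search margin.

    margin is the extra fraction added on each side; step is an assumed
    granularity for the window width and height.
    """
    def span(size_um, limit):
        pixels = size_um / um_per_pixel * (1.0 + 2.0 * margin)
        # Round up to the assumed granularity, then clamp to the sensor size.
        rounded = math.ceil(pixels / step) * step
        return min(max(rounded, step), limit)

    width = span(die_w_um, sensor_w)
    height = span(die_h_um, sensor_h)
    # Center the window on the expected die position and keep it on the sensor.
    x = min(max(center_x - width // 2, 0), sensor_w - width)
    y = min(max(center_y - height // 2, 0), sensor_h - height)
    return Roi(x, y, width, height)
```

With these assumptions, a small die yields a small readout window and a large die falls back to the full frame, so the same fixed lens serves both cases. The resulting window would then be sent to the camera over whichever digital interface is in use.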

VSD: In which industries do you see the most growth? In which geographic areas?

Mitsudo: Alternative energy sources have found increased popularity, especially after the price of oil increased to over $140 per barrel. We see this trend continuing with developers looking to produce automated systems for the inspection of solar wafers, solar cells, solar panels, and compact rechargeable batteries.

VSD: What kinds of new applications for machine vision do you expect to emerge? What new software, components, and subsystems will be needed?

Mitsudo: Although many newer image-processing algorithms offer high potential, they typically cannot meet the cost and speed requirements of die-bonding applications. However, given future innovations in systems based on DSPs, GPUs, multiple CPUs, or FPGAs, such algorithms may soon become practical.

In future, we hope to deploy systems that automatically detect the multiple processing resources available on a system and combine them efficiently for different processing tasks. These systems may perform functions such as point processing and neighborhood operations on an FPGA and perform other functions using a distributed computing system consisting of multiple GPUs or multicore CPUs. From a user's perspective, the use of this hardware must be transparent.
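
One way to picture such a system is a small planner that probes for devices and assigns each operation to the most suitable kind that is present. The sketch below is hypothetical: the probe functions, the operation classes, and every preference ordering except "point and neighborhood operations go to an FPGA first" are assumptions made for illustration.

```python
import os

# Preferred device kinds per operation class, most preferred first.
# Only the FPGA-first choice for point and neighborhood operations comes
# from the interview; the other orderings are assumptions.
PREFERENCES = {
    "point": ["fpga", "gpu", "cpu"],
    "neighborhood": ["fpga", "gpu", "cpu"],
    "matching": ["gpu", "cpu"],
    "other": ["cpu"],
}

# A real system would add probes for GPUs and FPGAs; only the CPU probe
# can be written with the standard library.
DEFAULT_PROBES = {"cpu": lambda: os.cpu_count() or 1}


def detect_resources(probes=DEFAULT_PROBES):
    """Run each probe and return {kind: unit count} for the kinds present."""
    found = {}
    for kind, probe in probes.items():
        count = probe()
        if count > 0:
            found[kind] = count
    return found


def plan(operations, resources):
    """Assign each (name, op_class) pair to the best available device kind."""
    assignment = {}
    for name, op_class in operations:
        for kind in PREFERENCES.get(op_class, ["cpu"]):
            if kind in resources:
                assignment[name] = kind
                break
        else:
            raise RuntimeError(f"no device available for {name}")
    return assignment
```

The caller only lists operations and never names a device, which is the sense in which the hardware stays transparent to the user.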

Wednesday, May 6, 2009

Toward a Machine-Vision Benchmark

Following the publication of Vision Systems Design's proposal in November 2008*, Wolfgang Eckstein shows how a machine-vision benchmark could be realized

To develop a successful benchmark for a machine-vision or image-processing system, it is necessary to understand the purpose of benchmarking. Although information about other components such as illumination, cameras, or frame grabbers may be required, it should not be the aim of a vision benchmark to evaluate this hardware.

Any successful machine-vision benchmark (MVB) should evaluate only the software and how it performs on various types of hardware. Results should be presented as they relate to whether a standard CPU or a GPU is used. Having said this, an MVB should not be limited to software packages running on PCs but should also evaluate how image-processing solutions perform on open systems, embedded systems, and smart cameras.

The intention of any MVB should be to bring more transparency into the market for vision software and vision systems. It should enable vision-system users to determine more easily which software is most suitable for the requirements of a given application.

The aim of developing a benchmark should not be to compare single methods such as the execution time of a Sobel filter but to evaluate how well an application can be solved with the software. Additionally, a single benchmark should focus not only on the speed of such applications but also their accuracy and robustness.

This kind of benchmark can be accomplished by supplying machine-vision and image-processing vendors with a set of one or more images stored as image files -- together with a description of the images and the benchmark task.

To develop this type of benchmark, a company or an organization could specify the rules and benchmarks, perform them, and publish the data and results. As a second option, experts within the vision community could propose such rules, which would then be edited by an unbiased third party or by an MVB consortium.

Based on these rules, single benchmarks could be offered by different manufacturers and added to an overall MVB. Everyone in the vision community could then download the MVB and perform the benchmarks. Or the benchmarks could be hosted by a neutral organization, such as the European Machine Vision Association (EMVA) or the Automated Imaging Association (AIA).

In practice, the second option is preferable since the MVB would not be controlled by a single company but would be open to every manufacturer. Furthermore, this approach would facilitate the development of an extensible MVB and, because the results would be visible to the whole community and to end users, every manufacturer would have a vested interest in ensuring that the MVB is up to date by using their latest software. This would ensure the MVB remains viable and always contains relevant information.

Rules for a benchmark
In the development of an MVB, certain rules first need to be established. This could include a description of a task to be solved and how the benchmark data was generated.

Benchmarks would be chosen from classical fields of image processing, like blob analysis, measuring, template matching, or OCR. Such benchmarks require a general statement of the task to be accomplished -- without restricting the selection of operators. Alternatively, a specific -- but widely needed -- feature of a tool should be analyzed, such as the robustness of a data code reader that is used to read perspectively distorted codes.

Finally, a benchmark must specify how the data used are generated -- whether they were generated synthetically (or modified) or whether the image used was captured from a camera. For general documentation purposes, it would be useful to specify further data such as the optics and camera used for acquiring the test images.

In addition to data, there must be a clear description of the task that must be solved. It is important that the solution is not limited and that any suitable software can be used.

Benchmark results must specify which information was used to solve the task. For example, it must be clear whether the approximate location of an object or the orientation of a barcode was used to restrict the search for a barcode within an image, because these restrictions influence speed and robustness.

MVTec proposes a number of benchmarks, each of which consists of a set of image sequences. Each sequence tests a specific behavior of a method. Within each sequence the influence of a "defect" is continuously increased. For example, in template matching, a sequence of a PCB position could be generated by successively changing the distance to check for robustness against defocus.

To motivate many companies and organizations to perform the MVB, it is important that the results be transparent. To accomplish this, each manufacturer or organization must show the specific version of the software that was used, the hardware that the software was run on, and the benchmark's execution time.

Various methods of image processing also require the tuning of parameters used within a specific software package. Since these parameters might differ from the default values, they must also be specified. Optional information could also include the code fragment used to solve the benchmark task. This would allow users to learn more about the use of a given system and to perform the same test.
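
The reporting requirements above could be captured in a simple record. The following Python sketch is an assumption about how such a record might look; the class name, field names, and JSON output are hypothetical and are not part of any published MVB format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class BenchmarkResult:
    benchmark: str             # name of the benchmark task
    software: str
    software_version: str
    hardware: str              # platform the software ran on
    execution_time_ms: float
    parameters: dict = field(default_factory=dict)         # non-default tuning values
    prior_information: list = field(default_factory=list)  # e.g. a restricted search region
    code_fragment: str = ""    # optional: code used to solve the task

    def missing_fields(self):
        """Return the names of required entries that were left empty."""
        required = ["benchmark", "software", "software_version", "hardware"]
        missing = [name for name in required if not getattr(self, name)]
        if self.execution_time_ms <= 0:
            missing.append("execution_time_ms")
        return missing

    def to_json(self):
        """Serialize the record so that others can archive and compare it."""
        return json.dumps(asdict(self), sort_keys=True)
```

A hosting organization could reject any submission for which `missing_fields()` is not empty before tabulating the results.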

How to perform a benchmark
After developing the MVB, the benchmark data and its description should be made freely available. Based on these benchmarks, each manufacturer can develop optimal solutions, perform them, and provide the results. After checking whether the rules are fulfilled for each specific task, the results would then be tabulated and be made freely available to others to cross-validate the published data.

To begin the development of an MVB, these single benchmarks should be easy to understand, have clear semantics, cover typical machine-vision tasks, and allow an easy comparison of vision systems.

MVTec proposes a number of benchmarks (see below), each of which consists of a set of image sequences. Each sequence tests a specific behavior of a method. Within each sequence the influence of a "defect" is continuously increased. In template matching, an original image of a PCB could be generated and then successively defocused to provide a specific image sequence (see figure). The quality of specific software can then be measured by the number of images that can be processed correctly. The tests would check the speed, robustness, and accuracy of each application task.
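
A minimal sketch of how one such sequence might be scored is shown below. It assumes that each image has a known ground-truth position and that a result counts as correct when the reported position lies within a pixel tolerance; the function names and the one-pixel default are hypothetical choices, not part of the proposal.

```python
import math


def is_correct(found, truth, tol_px=1.0):
    """A result is correct if a position was found within tol_px of the truth."""
    if found is None:  # the software reported no match
        return False
    return math.hypot(found[0] - truth[0], found[1] - truth[1]) <= tol_px


def score_sequence(results, truths, tol_px=1.0):
    """Return (number of correct images, index of the first failure or None).

    Images are ordered by increasing defect strength, so the index of the
    first failure indicates how much of the defect the software tolerates.
    """
    if len(results) != len(truths):
        raise ValueError("one result is required per image")
    flags = [is_correct(f, t, tol_px) for f, t in zip(results, truths)]
    first_failure = flags.index(False) if False in flags else None
    return sum(flags), first_failure
```

The count of correct images corresponds to the quality measure described above, while the position error against the ground truth gives a handle on accuracy.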

For each test sequence, typically 20-40 images of VGA resolution would be required. Since one image typically has a size of 200 kbytes using, for example, the PNG format, this results in a total size of about 500 Mbytes for all the benchmarks listed.
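
The arithmetic behind these figures can be checked quickly. The number of sequences is not stated in the text, so the sketch below derives the range implied by the quoted numbers, assuming decimal units (1 Mbyte = 1000 kbytes).

```python
IMAGE_KB = 200   # typical size of one VGA image in PNG format, from the text
TOTAL_MB = 500   # approximate total size quoted for all benchmarks


def sequence_size_mb(images, image_kb=IMAGE_KB):
    """Size of one test sequence in Mbytes."""
    return images * image_kb / 1000


def implied_sequences(total_mb, images, image_kb=IMAGE_KB):
    """Number of sequences that would add up to total_mb."""
    return total_mb / sequence_size_mb(images, image_kb)


for images in (20, 40):
    print(images, sequence_size_mb(images), implied_sequences(TOTAL_MB, images))
```

Each sequence therefore occupies roughly 4 to 8 Mbytes, and a total of about 500 Mbytes corresponds to somewhere between about 60 and 125 sequences across all the benchmarks.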

MVTec would offer these test images together with the appropriate task descriptions, if a neutral organization such as the EMVA or the AIA would be willing to host it. Besides this, MVTec invites other manufacturers and users to an open discussion to bring the idea of an MVB forward to increase transparency in the machine-vision market.

Wolfgang Eckstein is managing director of MVTec Software, Munich, Germany.

* "Setting the Standard: Despite the myriad machine-vision software packages now available, there is yet no means to properly benchmark their performance," Vision Systems Design, November 2008, pp. 89-95.

Thursday, April 9, 2009

Signal Architecture

Shuvra Bhattacharyya explores how emerging hardware platforms enable more advanced software for image-processing applications

Shuvra Bhattacharyya is a professor in the Department of Electrical and Computer Engineering, University of Maryland at College Park, and holds a joint appointment in the University of Maryland Institute for Advanced Computer Studies and an affiliate appointment in the Department of Computer Science. He received his BS from the University of Wisconsin at Madison and PhD from the University of California at Berkeley.

VSD: Could you provide us with some background information on your experience?

Bhattacharyya: My research interests include architectures and design tools for signal-processing systems, biomedical circuits and systems, embedded software, and hardware/software co-design. Before joining the University of Maryland, I was a researcher at Hitachi America Semiconductor Research Laboratory (San Jose, CA, USA) and a compiler developer at Kuck & Associates (Champaign, IL, USA). I'm presently the chair of the IEEE Signal Processing Society technical committee on design and implementation of signal-processing systems.

Books that I have co-authored or co-edited include Embedded Multiprocessors: Scheduling and Synchronization (second edition to be published by CRC Press in 2009); Embedded Computer Vision (Springer, 2008); and Memory Management for Synthesis of DSP Software (CRC Press, 2006).

VSD: Which aspects of image processing interest you? What current research are you or your students pursuing?

Bhattacharyya: My research group at the University of Maryland--known as the Maryland DSPCAD Research Group--is focused on design methodologies and CAD tools for efficiently implementing DSP systems.

The objective of our work in the area of image processing is to develop programming models that capture the high-level structure of image-processing systems. We are also looking at analysis techniques for deriving implementation properties such as memory requirements and processing throughput from these representations. And we are looking at synthesis techniques for deriving optimized implementations on different kinds of target architectures, including programmable DSPs, FPGAs, and embedded multiprocessors.

The programming models we work with are based on dataflow principles and specialized to the area of signal processing, including applications that process signals from image, wireless communication, audio, and video streams. By applying specialized programming models, our methods are able to efficiently expose and exploit high-level computational structure in signal-processing applications that is extremely time consuming or impossible to derive from general-purpose program representations.

Some particular challenges in applying dataflow-based design methodologies to image-processing systems include incorporating multidimensional data into the formal stream representations used by the programming models and managing the large volumes of data and high performance requirements. In addition, increasing use of image processing in portable, energy-constrained systems makes it important to incorporate methods for aggressively optimizing power consumption while maintaining adequate image-processing performance and accuracy.

Two image-processing domains for which I have been specifically involved in developing new design methods and tools are distributed networks of smart cameras and medical image registration. The first is through an NSF-sponsored collaboration with Rama Chellappa (University of Maryland) and Wayne Wolf (Georgia Institute of Technology); and the second is through a collaboration with Raj Shekhar and William Plishker, who are jointly affiliated with the schools of Engineering and Medicine at the University of Maryland.

VSD: How do you think this research will impact future generations of image-processing and machine-vision systems?

Bhattacharyya: I think that research on dataflow programming environments and tools will allow designers of these future systems greater flexibility in experimenting with different kinds of embedded processors and heterogeneous multiprocessor platforms. Most dataflow-based tools for signal processing operate at a high level of abstraction, where individual software components in conventional programming languages (e.g., C or Verilog/VHDL) are selected based on the back-end tools associated with the targeted platform.

These platform language components are interfaced through dataflow-style restrictions and conventions that allow for the inter-component behavior to be analyzed and optimized using formal dataflow techniques. The output of these tools is an optimized, monolithic implementation in the selected platform language; or, for heterogeneous platforms, the output is a set of multiple, cooperating platform language implementations. This output can then be further processed by the toolchain (e.g., the C compiler or HDL synthesis tools) associated with the target platform.

This kind of design flow provides a number of advantages that are promising for next-generation image-processing and computer-vision systems. First, the emphasis on component-based design--where components adhere to thoroughly and precisely defined interfacing conventions--facilitates agile, high-productivity, modularity-oriented design practices.

Second, the use of dataflow as effectively a source-to-source framework in terms of the platform language provides for efficient re-targetability across different kinds of platforms, and allows designers to leverage the often highly developed, and highly specialized back-end tools of commercial embedded processing platforms. This provides a complementary relationship between the high-level design transformations, which are handled effectively by dataflow tools, and low-level (intra-component) optimizations and machine-level translation, which are best handled by platform tools.

A general challenge facing this kind of two-level design methodology is the overhead of inter-component data communications, which can sometimes dominate performance if it is not handled through a more integrated design flow. I expect that designers and tool developers will continue to make advances in this direction by using techniques for carefully controlling the granularity of components, using block processing within components, and exploring new ways to model and optimize the mapping of component interfaces into hardware and software.

Dataflow graph that represents an accelerator for evaluating polynomials. Each circle or oval represents a computational operation; the arrows that connect operations specify how data passes between operations. Annotations specify certain properties about the rates at which the incident operations produce and consume data. The operation labeled "controller" (broken out on right) has a hierarchical "nested" dataflow representation. (Adapted from Plishker, W. et al, Proc. International Symposium on Rapid System Prototyping, pp. 17-23, Monterey, CA, June 2008).
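
To illustrate what production and consumption rate annotations make possible, the sketch below assumes the simplest case, synchronous dataflow, in which every rate is a fixed integer. It solves the balance equations to find how often each operation must fire in one period of a schedule. The actor names and rates in the usage note are hypothetical and are not those of the graph in the figure.

```python
from collections import deque
from fractions import Fraction
from functools import reduce
from math import gcd


def repetitions(actors, edges):
    """Solve the balance equations q[src] * produced == q[dst] * consumed.

    edges is a list of (src, dst, produced, consumed) tuples. Returns the
    smallest positive integer firing counts, or raises ValueError if the
    rates are inconsistent or the graph is not connected.
    """
    neighbors = {a: [] for a in actors}
    for src, dst, prod, cons in edges:
        neighbors[src].append((dst, Fraction(prod, cons)))
        neighbors[dst].append((src, Fraction(cons, prod)))

    # Propagate relative firing rates outward from the first actor.
    rate = {actors[0]: Fraction(1)}
    queue = deque([actors[0]])
    while queue:
        a = queue.popleft()
        for b, ratio in neighbors[a]:
            expected = rate[a] * ratio
            if b not in rate:
                rate[b] = expected
                queue.append(b)
            elif rate[b] != expected:
                raise ValueError("inconsistent rates: no periodic schedule exists")
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")

    # Scale the rational solution to the smallest integer vector.
    scale = reduce(lambda m, f: m * f.denominator // gcd(m, f.denominator),
                   rate.values(), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: c // common for a, c in counts.items()}
```

For a chain in which `A` produces 2 tokens per firing, `B` consumes 3 and produces 1, and `C` consumes 2, `repetitions(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)])` gives 3, 2, and 1 firings respectively. From such counts a tool can bound buffer sizes and derive a static schedule before any code is generated.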

VSD: What developments in FPGA design will affect hardware developments and how will system designers incorporate them?

Bhattacharyya: I think that support for heterogeneous multiprocessing in FPGAs--both in terms of rapid prototyping and developing high-performance implementations--will contribute significantly to the increased use and customization of such multiprocessor technologies in image-processing systems. Modern FPGA devices provide valuable platforms on which designers can experiment with different multiprocessor architectures, including different combinations of processing units and different kinds of networks for inter-processor communication. This opens up a valuable dimension of the design space that must be explored more deeply to achieve the most competitive implementations of next-generation applications.

Both "hard" and "soft" processor cores play useful roles in FPGA-based design methodologies and in applying these methodologies to develop embedded multiprocessor systems. Although soft cores incur significant penalties in terms of performance and resource utilization, they are relatively easy to configure in different ways to experiment with different numbers and kinds of processors, and to get an idea of how an application will map onto and scale with different system architectures.

This kind of rapid prototyping approach allows designers to develop much better intuition about system architecture alternatives before investing large amounts of specialized effort developing or applying a specific multiprocessor platform. On the other hand, hard processor cores, together with signal processing accelerators and other kinds of specialized IP blocks, provide valuable frameworks for accelerating image-processing applications in performance-oriented production systems.

VSD: Recent software developments in image processing include pattern recognition, tracking, and 3-D modeling. What algorithms and specific software developments do you see emerging in the next five years?

Bhattacharyya: I expect an accelerated use of heterogeneous platforms for image-processing software development, such as platforms involving combinations of GPUs and CPUs, or multiprocessors and FPGA-based accelerators. Heterogeneous platforms allow for more streamlined implementation, including exploitation of different forms and levels of parallelism in the application, and efficient integration of control and data processing.

The use of heterogeneous platforms, however, is conceptually more difficult, and the associated design flows are more complex. I expect increased attention to and application of frameworks that are aimed at application development on heterogeneous multiprocessor platforms. Some examples of emerging frameworks in this space are the Open Computing Language (OpenCL), which is geared towards platforms that integrate GPU and CPU devices, and OpenDF, which is a dataflow-based toolset geared towards platform FPGAs and multicore systems.