Wednesday, May 6, 2009

Toward a Machine-Vision Benchmark

Following the publication of Vision Systems Design's proposal in November 2008*, Wolfgang Eckstein shows how a machine-vision benchmark could be realized

To develop a successful benchmark for a machine-vision or image-processing system, it is necessary to understand the purpose of benchmarking. Although information about other components such as illumination, cameras, or frame grabbers may be required, it should not be the aim of a vision benchmark to evaluate this hardware.

Any successful machine-vision benchmark (MVB) should evaluate only the software and how it performs on various types of hardware. Results should be presented as they relate to whether a standard CPU or a GPU is used. Having said this, an MVB should not be limited to software packages running on PCs but should also evaluate how image-processing solutions perform on open systems, embedded systems, and smart cameras.

The intention of any MVB should be to bring more transparency into the market for vision software and vision systems. It should enable vision-system users to determine more easily which software is most suitable for the requirements of a given application.

The aim of developing a benchmark should not be to compare single methods such as the execution time of a Sobel filter but to evaluate how well an application can be solved with the software. Additionally, a single benchmark should focus not only on the speed of such applications but also their accuracy and robustness.

This kind of benchmark can be accomplished by supplying machine-vision and image-processing vendors with a set of one or more images stored as image files -- together with a description of the images and the benchmark task.

To develop this type of benchmark, a company or an organization could specify the rules and benchmarks, perform them, and publish the data and results. As a second option, experts within the vision community could propose such rules, which would then be edited by an unbiased third party or by an MVB consortium.

Based on these rules, single benchmarks could be offered by different manufacturers and added to an overall MVB. Everyone in the vision community could then download the MVB and perform the benchmarks. Or the benchmarks could be hosted by a neutral organization, such as the European Machine Vision Association (EMVA) or the Automated Imaging Association (AIA).
In practice, the second option is preferable since the MVB would not be controlled by a single company but would be open to every manufacturer. Furthermore, this approach would facilitate the development of an extensible MVB and, because the results would be visible to the whole community and to end users, every manufacturer would have a vested interest in ensuring that the MVB is up to date by using their latest software. This would ensure the MVB remains viable and always contains relevant information.

Rules for a benchmark
In the development of an MVB, certain rules first need to be established. This could include a description of a task to be solved and how the benchmark data was generated.

Benchmarks would be chosen from classical fields of image processing, like blob analysis, measuring, template matching, or OCR. Such benchmarks require a general statement of the task to be accomplished—without restricting the selection of operators. Alternatively, a specific -- but widely needed -- feature of a tool should be analyzed, such as the robustness of a data code reader that is used to read perspectively distorted codes.

Finally, a benchmark must specify how the data used are generated -- whether they were generated synthetically (or modified) or whether the image used was captured from a camera. For general documentation purposes, it would be useful to specify further data such as the optics and camera used for acquiring the test images.

In addition to data, there must be a clear description of the task that must be solved. It is important that the solution is not limited and that any suitable software can be used.

Benchmark results must specify which information was used to solve the task. For example, it must be clear whether the approximate location of an object or the orientation of a barcode was used to restrict the search for a barcode within an image, because these restrictions influence speed and robustness.

MVTec proposes a number of benchmarks, each of which consists of a set of image sequences. Each sequence tests a specific behavior of a method. Within each sequence the influence of a "defect" is continuously increased. For example, in template matching, a sequence of a PCB position could be generated by successively changing the distance to check for robustness against defocus.

To motivate many companies and organizations to perform the MVB, it is important that the results be transparent. To accomplish this, each manufacturer or organization must show the specific version of the software that was used, the hardware that the software was run on, and the benchmark's execution time.

Various methods of image processing also require the tuning of parameters used within a specific software package. Since these parameters might differ from the default values, they must also be specified. Optional information could also include the code fragment used to solve the benchmark task. This would allow users to learn more about the use of a given system and to perform the same test.

How to perform a benchmark
After developing the MVB, the benchmark data and its description should be made freely available. Based on these benchmarks, each manufacturer can develop optimal solutions, perform them, and provide the results. After checking whether the rules are fulfilled for each specific task, the results would then be tabulated and be made freely available to others to cross-validate the published data.

To begin the development of an MVB, these single benchmarks should be easy to understand, have clear semantics, cover typical machine-vision tasks, and allow an easy comparison of vision systems.

MVTec proposes a number of benchmarks (see below), each of which consists of a set of image sequences. Each sequence tests a specific behavior of a method. Within each sequence the influence of a "defect" is continuously increased. In template matching, an original image of a PCB could be generated and then successively defocused to provide a specific image sequence (see figure). The quality of specific software can then be measured by the number of images that can be processed correctly. The tests would check the speed, robustness, and accuracy of each application task.

For each test sequence, typically 20-40 images of VGA resolution would be required. Since one image typically has a size of 200 kbytes using, for example, the PNG format, this results in a total size of about 500 Mbytes for all the benchmarks listed.

MVTec would offer these test images together with the appropriate task descriptions, if a neutral organization such as the EMVA or the AIA would be willing to host it. Besides this, MVTec invites other manufacturers and users to an open discussion to bring the idea of an MVB forward to increase transparency in the machine-vision market.

Wolfgang Eckstein is managing director of MVTec Software, Munich, Germany;

* "Setting the Standard: Despite the myriad machine-vision software packages now available, there is yet no means to properly benchmark their performance," Vision Systems Design, November 2008, pp. 89-95.