Vision and Security Technology Lab
Flexible Imaging Systems and Biometrics and Biometric fusion

FLEXIBLE IMAGING SYSTEMS

Report 5/2004

DARPA HID program contract number N00014-00-1-0929

 

 Principal Investigators:     

 

            Terrance E. Boult

          Department of Computer Science

          University of Colorado at Colorado Springs              

          1420 Austin Bluffs Parkway

          Colorado Springs CO 80933-7150

          Fax 719 262 3900

          email 

         

            Shree K. Nayar

          1214 Amsterdam Avenue

            Department of Computer Science, Columbia University

            New York, NY 10027.

            Phone: 212-939-7092

            Fax: 212-939-7172

            Email:

 

 

 

1. Overview

2. Omni-directional and Super wide field-of-view Sensors for HID

3. Surveillance Issues in HID

4. Synthetic Sensor Evaluations

5. Dynamic Range

6. Evaluation and prediction

Conclusions

Publications

 


 

1. Overview:

Typical pan-tilt-zoom sensors have a five to fifty degree field of view and track a single target. Locating and retaining a facial target over a large range is not only difficult but also limited by sensor parameters. To address these limitations, this project designed, developed, and implemented new catadioptric omnidirectional sensors capable of a large effective optical zoom. The fundamental feature underlying the sensors is the ability to transition between an omnidirectional image and a fine-resolution image for use in a HID system.

 

The first sensor, dubbed the Zoomnicam, uses a combination of catadioptrics and a physical zoom. By physically zooming into different parts of an omnidirectional image, the effective Zoomnicam field of view ranges from 360 down to 15 degrees.  The second alternative uses Mega-Pixel digital still cameras.  Our hypothesis is that if the resolution on the face is approximately the same, these sensors will have the same facial recognition performance as one another, and that performance will be comparable to traditional cameras.
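The link between field of view and resolution on the face can be illustrated with a rough calculation. The sketch below is not from the project; it assumes pixels are spread uniformly across the horizontal field of view (a simplification for catadioptric optics), and the function name, the 0.15 m face width and the 640-pixel image width are hypothetical values chosen only for illustration.

```python
import math


def pixels_across_face(fov_deg, image_width_px, distance_m, face_width_m=0.15):
    """Approximate number of pixels spanning a face.

    Assumes (hypothetically) that pixels are distributed uniformly over
    the horizontal field of view, which is only a rough model.
    """
    if not 0.0 < fov_deg <= 360.0:
        raise ValueError("fov_deg must be in (0, 360]")
    if distance_m <= 0.0:
        raise ValueError("distance_m must be positive")
    # Angle subtended by the face as seen from the sensor.
    face_angle_deg = math.degrees(2.0 * math.atan(face_width_m / (2.0 * distance_m)))
    return image_width_px * face_angle_deg / fov_deg


# Same hypothetical 640-pixel-wide image at the two ends of the zoom range.
wide = pixels_across_face(360.0, 640, distance_m=3.0)
narrow = pixels_across_face(15.0, 640, distance_m=3.0)
print(f"360 degrees: {wide:.1f} px, 15 degrees: {narrow:.1f} px")
```

Under this model, narrowing the field of view from 360 to 15 degrees puts 24 times as many pixels across the same face, which is why a zoom stage or a much higher pixel count is needed to reach comparable face resolution.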

 

Field of view, however, is not the only sensor parameter that impacts recognition.  The project evaluated a number of sensor parameters (blur, gamma, compression, and dynamic range) for their impact on face recognition. These synthetic sensor evaluations helped direct other aspects of the project and also answered important questions about the impact of blur, gamma, dynamic range and compression on the resulting images.

 

Our analysis and experimentation made it very clear that dynamic range is a significant issue, and it is probably a major factor in why outdoor face recognition is so much weaker than indoor.  This led us to design and develop sensors for adaptive dynamic range, and to study other approaches for improving dynamic range for HID.

 

Finally, there is a need for a method to evaluate the efficacy of the new sensors. Previous HID system evaluation has focused on algorithm evaluation, in which the only parameters varied are the algorithms themselves. Sensor evaluation introduces significantly more degrees of freedom in HID system input and, more importantly, makes it impossible to test with identical inputs. To address these issues, we developed a new evaluation paradigm defined with statistical confidence measures.  In addition, we developed techniques to predict when face-recognition systems were going to fail, then found ways, at the system level, to help reduce those failures.

 

We briefly review the project results in each of these areas.  For details see the papers in the reference sections, which also relate our work to others in the field.

 

 

2. Omni-directional and Super wide field-of-view Sensors for HID

For flexible imaging we have undertaken three data collections (omni-directional, zoomnicam and long-distance), with the resulting data submitted as part of the DARPA HBASE and also available directly from Dr. Boult.  These are in addition to the much larger photo-head dataset, which is used both in this project and in the Columbia-led Vision in Bad Weather project.    Combined, these were almost a Terabyte of test data exploring sensor design, resolution, lighting, distance and weather effects.   We briefly review those sensors, the experiments and the results.

 



The first of our non-traditional sensor projects sought to address the question of how well an omni-directional camera could be used for face recognition.  To allow it to operate over a wider range of distances we chose to use a 3.1 Megapixel camera, the Nikon 990.  The omni-directional images were obtained using Remote Reality’s OneShot lens attached to the Nikon 990.   Dr. Boult and students developed Linux software that controlled the Nikon 990 and combined video-rate person tracking (using its analog TV output) with high-resolution image capture to provide face images suitable for recognition.  While the camera supported uncompressed TIFFs, the project always used the high-resolution JPEG format.   This software simplified our data collection process and was used in four different data collections at NIST.  In all but one collection the subject stood at fixed distances to support repeatable measurements.


 

 


In the collections there are multiple images of each subject at each setting with over 8000 images in total.  Variations included view angle, lighting (artificial and natural), distance and time. Standard camera images of each subject were also available.

 

 


Two examples of the analysis are shown above, with the example subject's gallery image shown in the middle.  The left graph shows face recognition performance at an off-angle view of 10 degrees as additional lights are added.  The images were taken with a fixed aperture and shutter speed so that the brightness variations are not masked by automatic gain controls.  The three graphs are for variations of the matching algorithms in the FaceIt SDK, with algorithm F13 using full template matching and the other two using smaller (faster) templates.    The error bars show 95% confidence intervals computed using the BRR we presented in [Micheal-Boult-01].   With two lights, consistent with what is used in the standard images and gallery, the recognition rate for the unwarped omni-directional images is approximately 90%, consistent with off-axis regular images of comparable resolution.

The graph on the right shows the impact of distance on recognition from omni-directional cameras in ambient lighting conditions.    It is clear the images are quite dark, yet the recognition rates are reasonable.  It is important to remember that these are omni-directional images, with the face image cropped from them, so one might interpret the results as suggesting the system could recognize 60% of all the people within 12ft of the camera, and 70% of those within 6ft.  Combining the omni-directional distance results with the lighting results, it is clear that in a well-lit area the system can increase that to near 90% recognition.    The results show that omni-directional sensors have potential for human identification at a distance.


The second flexible imaging sensor explored was an omni-directional system capable of zooming, or a zoomnicamera. The zoomnicam prototype was designed and implemented at Columbia.  The unit uses a Sony DFW-V500 color zoom camera and a relay lens to make the imaging approximately telecentric and allow it to focus on the nearby parabolic mirror. The mirror was mounted on an xy-stage with 4” of motion, though most motions were much smaller and hence much faster than those of a traditional Pan/Tilt system.   Dr. Nayar and the team at Columbia developed a controller for the stage and calibration software that allowed unwarping the resulting image to a perspectively correct image.    The unit was transferred to Dr. Boult and students at Lehigh, where the control software was extended to support facial image collections and experiments.

 

 

 

 

 


 

Again, real-time tracking was possible from the video output, but the focus of this project was facial recognition.  The Zoomnicam data was collected on two different dates, imaging stationary subjects at 4 or 5 distances and producing a total collection of over 2000 images from 85 subjects, with 72 overlapping between the collections.   In addition to the zoomnicamera images, all subjects had same-day and different-day traditional camera images taken as part of parallel collections at NIST, and most were also imaged by the mega-pixel omni-directional system.    This figure shows a standard camera image (upper left) from 3 feet and then a range of zoomnicamera images at distances from 6 to 15 feet.   Note how the zoomnicam images also have strong directional lighting effects, effects that would impact the recognition rate of standard face recognition.

 

 

 

 

 

 

 

 



 


Using the results of these data collections, we have been assessing the facial recognition quality of these sensors.   The experimental analysis tested the hypothesis that the zoomnicamera, when zoomed to provide a resolution similar to that of the MegaPixel omni-camera, would have essentially the same recognition rate.  Since the first experiments evaluated the omnicamera in a wide range of settings, a smaller set of experiments was done with the zoomnicameras.  The next graph summarizes these results.  The error bars show the results of our STRAT/BRR technique for performance analysis, allowing us to draw conclusions across different data sets.  Clearly the two curves are not statistically different.  The upper curve shows the results when the zoomnicamera data was used as both probe and gallery images.  Multiple images were taken, so the probe and gallery contain different images, but they have similar lighting. This suggests that lighting is a much stronger factor than the difference in sensors.

 

 

 

 

 


Ultra-wide field of view is not the only issue for HID sensors.  We can think of the omni-directional sensor as a special case of defining a mapping from the scene to an image.  Dr. Nayar and students at Columbia developed a general approach to catadioptric system design that allows them to solve for the mirror shape for any given image-to-scene mapping.   Using such a flexible system, one might desire an anamorphic imaging system in which the mirror is shaped so that faces at expected distances all have the same size. Such an example is shown here, where the faces in the bottom row of people are shown for a perspective image and for an anamorphic system.  This is only one of infinitely many constraints that might be imposed on the scene-to-image mapping and then realized using the general imaging theory developed.

 

 

 

 

 

 

 

 

 


 



3. Surveillance Issues in HID

In the final year of the Flexible Imaging project, Dr. Boult and students instituted a new direction: building on our earlier work in visual surveillance, we began looking at issues in recognizing particular human activities.  The work builds on our geo-spatial detection and tracking work, which began as part of the DARPA VSAM program.   With regard to wide-area sensing and detailed assessment, Dr. Boult, as part of the Army SmartSensorWeb (SSW) program, extended his system for tracking using a 360-degree FOV camera. Not only did it detect and track targets, it geo-located their positions and used them to hand off targets to a PTZ camera that could then be used for detailed assessment.  The omni-directional video supports tracking multiple targets simultaneously and is very useful for crowded areas.
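The hand-off from a geo-located target to a PTZ camera comes down to converting a position into pan and tilt angles. The following is a minimal sketch of that geometry only, not the project's software: the function name, the local east-north-up coordinate frame, and the convention that pan is measured clockwise from north are all assumptions made for illustration.

```python
import math


def ptz_angles(camera_enu, target_enu):
    """Return (pan_deg, tilt_deg) that point a camera at a target.

    Both arguments are hypothetical (east, north, up) positions in metres
    in a shared local frame. Pan is measured clockwise from north in
    [0, 360); tilt is positive above the horizon.
    """
    d_east = target_enu[0] - camera_enu[0]
    d_north = target_enu[1] - camera_enu[1]
    d_up = target_enu[2] - camera_enu[2]
    ground_range = math.hypot(d_east, d_north)
    if ground_range == 0.0 and d_up == 0.0:
        raise ValueError("target coincides with the camera")
    pan = math.degrees(math.atan2(d_east, d_north)) % 360.0
    tilt = math.degrees(math.atan2(d_up, ground_range))
    return pan, tilt


# A camera mounted 10 m up, target on the ground 10 m to the north.
print(ptz_angles((0.0, 0.0, 10.0), (0.0, 10.0, 0.0)))
```

A real hand-off would also have to account for camera calibration, mounting offsets and target motion during the slew, none of which is modeled here.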

While detection and tracking still need research, especially for following individuals within crowds, we postulate that they are not the most significant problem.  More important is developing an approach that allows one to specify what activity is of interest and to recognize complex activities.  Well-known models used for event and anomaly detection, such as Hidden Markov Models (HMMs) or stochastic grammars, both require lots of training data and are very difficult to use, even for expert computer users.  In a recent study, we had graduate students in CS and EE learn to use an HMM system, then use it to specify/learn simple events.   After training, those who could figure out how to specify given events took more than 25 min, on average, to specify a single event of interest. Furthermore, 13 of the 20 students were unable to develop an HMM for the relatively simple events within an hour.

We proposed [Yu-Boult-2003] a radically new approach, IU-GUI: Image Understanding of Graphical User Interfaces, which offers a new solution and significant promise for human activity recognition.   The approach ties the “event recognition” to the GUI display of targets and sensory data.  An event is defined using what the end-user sees in the GUI (so it still depends on good low-level processing), and combines different icons in spatio-temporal patterns. This figure is an example from our video tracking, with target type and localization displayed graphically.  The rule (the box) looks for someone being dropped off.  Furthermore, IU-GUI supports effective in-the-field sensor integration: if the results from a new sensor can be displayed on the user's screen, they can be used as part of an IU-GUI rule.
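To make the idea of a spatio-temporal icon pattern concrete, here is a hedged sketch of how a drop-off rule over displayed icons might be checked. It is an illustration only and not the IU-GUI rule language from [Yu-Boult-2003]: the frame representation, the icon labels "vehicle" and "person", and the three-stage pattern are all hypothetical.

```python
def in_box(icon, box):
    """True if an icon (kind, x, y) lies inside box (x0, y0, x1, y1)."""
    _, x, y = icon
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def detect_drop_off(frames, box):
    """Return the index of the frame completing a drop-off, or None.

    Hypothetical pattern inside the box: a vehicle icon alone, then a
    vehicle and a person together, then the person alone.
    """
    stage = 0
    for index, icons in enumerate(frames):
        kinds = {icon[0] for icon in icons if in_box(icon, box)}
        vehicle = "vehicle" in kinds
        person = "person" in kinds
        if stage == 0 and vehicle and not person:
            stage = 1  # vehicle has arrived
        elif stage == 1 and vehicle and person:
            stage = 2  # someone appeared next to the vehicle
        elif stage == 1 and not vehicle:
            stage = 0  # vehicle left before anyone appeared
        elif stage == 2 and person and not vehicle:
            return index  # vehicle gone, person remains
        elif stage == 2 and not person:
            stage = 1 if vehicle else 0
    return None


frames = [
    [],
    [("vehicle", 5, 5)],
    [("vehicle", 5, 5), ("person", 6, 5)],
    [("person", 6, 5)],
]
print(detect_drop_off(frames, box=(0, 0, 10, 10)))
```

Because the rule only consumes icon types and screen positions, any sensor whose output can be drawn as an icon could feed the same check, which is the integration property described above.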

Using the IU-GUI approach, the same 20 students as above were trained to use the system in under 5 min, and specified each of their activities of interest in, on average, 11 seconds. Compared with the more than 25 min per event reported above for HMMs, that is over 100 times faster.  Here we show an ROC curve of performance; clearly the IU-GUI approach was also significantly better than HMM.  While preliminary, these early experiments show the approach has very significant potential for activity recognition.

4. Synthetic Sensor Evaluations

We also pursued a collection of “simulation” experiments in which simulated sensor effects were applied to a subset of the FERET data. The subset consists of 256 subjects with 4 images per person, to permit the use of STRAT for estimating confidence intervals.  The first three of these synthetic experiments examined the impact of blur, gamma, and compression.  Spatial blurring was done using 7x7 windows with an approximate Gaussian of a given standard deviation (sigma), given in pixels.  The results for blur were not surprising, showing the expected impact of blur, which was statistically significant even for a single-pixel sigma. While there have been other informal reports of blur improving results, our results did not show such a pattern.   Again, each bar in the graph is a 95% confidence interval from STRAT/BRR.
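The blur simulation can be sketched as a 7x7 Gaussian window applied to a gray-scale image. This is a minimal pure-Python illustration rather than the code used in the experiments: the function names are hypothetical, and the border handling (clamping coordinates to the image edge) is an assumption, since the report does not say how borders were treated.

```python
import math


def gaussian_kernel(sigma, size=7):
    """Return a size x size Gaussian window normalized to sum to 1."""
    half = size // 2
    weights = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def blur(image, sigma, size=7):
    """Blur a 2D list of gray values with the Gaussian window.

    Pixels outside the image are replaced by the nearest edge pixel
    (an assumed border policy).
    """
    kernel = gaussian_kernel(sigma, size)
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dy in range(-half, half + 1):
                rr = min(max(r + dy, 0), rows - 1)
                for dx in range(-half, half + 1):
                    cc = min(max(c + dx, 0), cols - 1)
                    acc += kernel[dy + half][dx + half] * image[rr][cc]
            out[r][c] = acc
    return out


# A single bright pixel spreads out as sigma (in pixels) grows.
spike = [[0.0] * 9 for _ in range(9)]
spike[4][4] = 255.0
print(round(blur(spike, sigma=1.0)[4][4], 2))
```

Note that a 7x7 window truncates the Gaussian at three pixels from the center, so for larger sigma values the window is only an approximation of the full Gaussian, as the text indicates.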

 

 

 

 

 

 In the analysis for gamma, the simulations cannot change the gamma at capture time; rather, they use reprocessed images derived from good-quality images with moderate dynamic range but unknown gamma (the original FERET images used were digitized from film).  Thus the analysis does not measure the impact of improved dynamic range, only the results of brightness variations.   The images were gamma corrected, with gamma=1 corresponding to the original image.
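A gamma reprocessing step of this kind can be sketched as a lookup table applied to 8-bit gray values. The exact convention used in the experiments is not stated, so the power law below (output = 255 * (input / 255) ** (1 / gamma)) is an assumption, as is the function name; the only property taken from the text is that gamma=1 leaves the image unchanged.

```python
def gamma_correct(image, gamma):
    """Apply an assumed power-law gamma to a 2D list of 8-bit gray values.

    Uses out = 255 * (in / 255) ** (1 / gamma), so gamma=1 returns the
    original values, gamma > 1 brightens and gamma < 1 darkens.
    """
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    # Precompute the mapping once for all 256 possible input values.
    lut = [round(255.0 * (v / 255.0) ** (1.0 / gamma)) for v in range(256)]
    return [[lut[v] for v in row] for row in image]


row = [[0, 64, 128, 255]]
print(gamma_correct(row, 1.0))  # identity under this convention
print(gamma_correct(row, 2.0))  # mid-tones brightened
```

Because the mapping only redistributes existing gray levels, it changes brightness without adding dynamic range, which matches the caveat in the paragraph above.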

 

 

 

 

 

 

The analysis for JPEG compression did present some unexpected results. The images used were uncompressed FERET images that were converted from film.   The results considered compression of the gallery (first number in the pair), the probe (second number), or both gallery and probe.