People See Shapes, Computers Compute Shapes
Associate Professor and Chair, Department of Computer Science
Flying through the 3D model recovered from images taken during the Mars rovers expedition, students tour the Martian terrain; lawenforcement officers routinely use software to match suspects’ fingerprints; at a Bank United ATM in Houston, an iris scan is used instead of a bank card and PIN to allow customer access; a brain surgeon uses a 3D model of a patient’s brain for preoperative study; dancers in New York and San Francisco meet and practice together in a 3D virtual studio. These are some illustrations of computer vision: computer systems that use images as input and make inferences about the real world (a 3D model, identification of a criminal, identification of a bank customer, virtualized dance session).Computer vision (CV) is a relatively young field. It emerged during the1970s, marked by the work of David Marr (1982). The grand objective ofcomputer vision is the creation of machines that can see. The field is closely related to artificial intelligence,visual psychology and neuroscience. The problem of duplicating the human visioncapabilities in machines proved to beharder than initially expected. The success stories are in industrial or other applications, with well-specified restricted domains, under clearly defined assumptions and conditions. While, at the beginning, the problems CV addressed were top-down and idea driven, they are more recently bottom-up and driven by applications such as: homeland security and defense surveillance, medical imaging and diagnostics, computer-assisted surgery, autonomous robotics, unmanned vehicles, driver assistance, traffic control, entertainment, education and training, social networking and human computer interaction. A typical computer vision program addresses one or more ofthe vision tasks, for example, an ability to detect features and objects in images; to recognize object classes (like people and buildings) and specific objects (like Abraham Lincoln or the Empire State Building); to make estimates of object attributes, position and location in a 3D image; to compare or match a target object to a collection of objects in a database; or to detect events and track objects in video. Although a large amount of CV research and technology development addresses 2D domain (detecting faces in photographs, matching 2D contours and shapes in images), the inference of 3D information from images is crucial for certain applications that require action in 3D.We live in a 3D world, and extracting 3D information is of great importance, particularly for medical and robotics applications. The 3D-related tasks include reconstructing the 3D objects, surfaces, terrains and scenes represented in the images, making measurements in 3D, reconstructing 3D shapes, and matching and deforming 3D shapes, as well as autonomous vehicle and robot navigation.
My recent research is focused on the latter type of problems: the development of algorithms and computer programs that recover, detect, model and measure 3D objects and shapes. In developing the theory and algorithms, the research relies heavily on theoretical computer science, linear algebra, projective and differential geometry, and probability and statistics. In this presentation, I omit the mathematical treatment and theory and instead try to provide a glimpse into the types of problems we address and the results obtained.
Figure 1. From 2D images to3D objects; (a) the inputimages; (b) the estimated 3Dpoint cloud; (c) the estimatedlocal orientation, at eachpoint a little disk represent a“fish-scale,” i.e., a tangent planeto the unknown surface at thepoint, and the red stick at eachdisk represents the surfacenormal, i.e., a vectorperpendicular to the surface;(d) a 3D mesh modelapproximating the original 3Dsurface of the object; and (e) aview of the textured 3D meshmodel. The model can beviewed from any direction; aview different from those in theinput images is shown (e).[GRASP Laboratory Demo,(Sara and Bajcsy, 1998),University of Pennsylvania].
From 2D Images to 3D Models
Figure 1 illustrates the typical problem of 3D model recovery from 2D images. Given 2D images as input, the goal is to come up with a 3D model of the surfaces of the objects represented in the images (thus we talk about surface recovery or reconstruction). Notice that this problem tries to invert the process that has taken place during the picture taking: while the images represent 2D projections of the 3D scene observed, computer vision tries to go back and recover the 3D information from the 2D images. The process of 3D reconstruction involves several steps, shown in Figure 1. Exploiting visible features in multiple images, (a), computer programs estimate the real 3D points that correspond to those matched features. The resulting set of unorganized 3D points is referred to as a 3D point cloud, (b). For each point in the cloud, local orientation is estimated (a small tangent plane, called a fishscale, is attached to each point in the cloud, describing the way the original, unknown surface is facing), (c). The next step is the recovery of the surface topology, i.e., for each point identifying its closest neighbors. Finally the 3D surface model is approximated, i.e., the geometry is captured, (d).
Figure 2. Recovering the surface shape: (a) an image of a face; (b) a subsample of the 3D point cloud recovered; (c) the oriented cloud (every point has a normal vector, showing the alignment of the local surface patch); (d) and (e) the principal curvature directions (the direction of the minimal and maximal curvature at the points); (f) and (g) the mean curvature surface (the mean curvature is related to the pressure that must be used in order to recover the shape of the face from a flat sheet). The mean curvature and the principal curvatures encode the shape of the object. [Results from Kamberov and Kamberova 2002) obtained by the Conformal Method for Quantitative Shape Extraction.]
Estimating the Topology and Geometry of 3D Point Clouds
3D point clouds may be obtained from images (as discussed above), as well as from laser scanning systems or various medical imaging sensors. Although textured mesh models, Figure 1, (e), are desirable for human viewing, gaming or the entertainment industry, they are not sufficient for many engineering, medical and scientific applications where accurate and precise geometric measurements need to be made from the models. To address the problem, we study the recovery of the orientation, topology and geometry of the unknown surfaces, working directly on the cloud, without the use of intermediate surface fitting and approximate global surface models or meshes. This is a collaborative work with George Kamberov (Stevens Institute of Technology). We have developed new methods for estimating the surface topology (Kamberov and Kamberova, 2004b; Kamberov, Kamberova and Jain, 2005) by defining a proximity likelihood function that expresses the likelihood that two point clouds with attached normals (or fish-scales, as defined by Sara and Bajcsy, 1998) are neighbors. The standard approaches, developed by others, use the straight line distance between points in 3D to define proximity. Using such straight line distance would work only in simple cases, but for clouds that come from more curved surfaces, it can lead to gross errors. For example, two points on the surface of a human brain cortex could be very close in 3D with respect to a straight line distance, and yet be very far apart from each other on the surface of the cortex due to deep folds of the brain. If an algorithm identifies such points as neighbors, instead of recovering a typical cortex surface with many folds, an egg-like smooth surface would be computed instead. In defining the proximity likelihood function (Kamberov and Kamberova, 2004b; Kamberov, Kamberova and Jain, 2005), we use not only the straight line (Euclidean) distance, but also the deviation of the surface normals and an implicit measure of the unknown surface distance. We select as neighbors of a point, those for which the proximity likelihood is maximized. A 3D point cloud with normals at each point and neighborhood is called an oriented topologized point cloud (Figure 1, c). Using a new differential geometric result (presented in Kamberov et al., (2002), we developed in Kamberov and Kamberova (2002) the Conformal Method for Quantitative Shape Extraction for oriented topologized point clouds. This approach allows the computation of multiple estimates of the shape invariants (mean and Gauss curvatures) and robust statistics to estimate the curvatures at the surface points. The computations are local (per neighborhood), a global model is not recovered, and the algorithms are highly parallelizable. At each surface point, we obtain quantitative estimates of the mean curvature, the Gauss curvature, and the principal curvature directions. See Figure 2.
In a performance evaluation study (Kamberov and Kamberova, 2004a), we compared with the same data set the performance of the Conformal Method for Quantitative Shape Extraction (CMQSE) against the other most popular and praised approaches. As demonstrated in Surazhsky et al. (2003), none of the previous approaches is universal even on meshes, i.e., depending on the type of the unknown surface, and the parameter being estimated (for example, mean or Gauss curvature), different methods of the surveyed ones dominate the rest. Thus, there is a problem: to get best results, the system must know the type of the surface to be recovered (locally, planar or spherical, or elliptic, or parabolic, etc.), but recovering the surface is the primary objective, even if its type is not known apriori. We demonstrate empirically in Kamberov and Kamberova (2004a) that CMQSE is admissible, i.e., on all of the published test data sets, independent of which shape invariant (mean or Gauss curvature) is needed, CMQSE is either better or comparable to the best performer of all methods surveyed in Surazhsky et al. (2003). The main advantages of the CMQSE stem from the use of the modern shape theory Kamberov et al. (2002), the use of stable computations, the availability of multiple estimates of the shape invariants per point, and the use of robust statistics for computing the final estimates (Kamberov and Kamberova, 2002).
Figure 3. Scene reconstruction from unorganized sets of images. Several of the images, (a), left, and the recovered 3D point cloud, (a), right. The cloud is non-uniform and noisy (there are errors). The cloud, (a), right, is shown from different viewpoints and at various levels of resolution. From the cloud (a), right, using the pipeline represented schematically in Figure 4, the connected components are extracted, (b). Each component has its own automatically recovered level of resolution. In (b), we show the two largest components. [Results from Kamberov et al., 2006.]
Images to Geometry Pipeline for Large-Scale, Complex Scene Reconstruction
Presently, it is possible to extract 3D point clouds from complex 3D scenes (Figure 3). What is needed are automatic and computationally stable methods to extract geometry from the noisy, non-uniform clouds. These methods must handle automatically under-sampled regions; for example, they have to deal with surface boundaries and resolve the ambiguities imposed by objects that are or appear to be tangent to each other. Furthermore, given the expected low quality of the data, it is crucial to have methods to evaluate the consistency of the recovered geometry, even in the absence of ground truth. Collaborating with Radim Sara and other researchers at The Center for Machine Perception, Czech Technical University, and George Kamberov, Stevens Institute of Technology, we developed the Images to Geometry Pipeline (Kamberov and Kamberova, 2004b; Kamberov et al. (2006), a system for automatically recovering a 3D cloud from an unorganized set of images, partitioning the cloud into components, and recovering the component geometry. This work was motivated by many applications where, due to communications, synchronization, equipment failures, and environmental factors, an intelligent system is forced to make decisions from an unorganized set of images, offering partial views of the scene. The system is organized as a pipeline consisting of three modules: the image-data pipeline, the orientationtopology module, and the geometry pipeline (See Figure 5). The input data are photographs of the scene to be reconstructed, taken with a hand-held compact digital camera, which is free to move in the scene and to zoom in and out. The images are input into the data pipeline. To deal with the complexity of the data and noise, the intermediate point sets recovered in the data pipeline use millions, redundant, noisy points; close by points are clustered using statistics of nearby points represented by covariance ellipsoids. Each cluster of points is replaced by the corresponding covariance ellipsoid center; thus a refined cloud of ellipsoid centers is obtained. Each covariance ellipsoid defines a tangent plane (a “fish-scale” ) at the corresponding refined point. The fish-scales (3D points with tangent planes) are input into the orientationtopology module. Each tangent plane defines a unique line through the fishscale center orthogonal (perpendicular) to the plane. The next step is to define the cloud orientation, consistently to choose the “out” surface direction for the lines through the fish-scale centers, and thus associate with each cloud point a unique normal vector. The estimation of the normals is a key step in geometry reconstruction – it is also notoriously prone to errors. A number of authors have studied the reliability of orientation estimates. Many authors have used integrability on graph surfaces to help reconstruction and resolve ambiguities. Using the conformal geometry approach to surface theory (Kamberov et al., 2002), we develop an integrability constraint satisfied by the orientation of a smooth surface in 3D, not necessarily graphs (Kamberov and Kamberova, 2007). This General Integrability Constraint (GOC) is used at the orientation-topology stage of the pipeline (See Figure 4). The GOC also serves as a foundation to provide an approach to quantify the performance of geometry reconstruction techniques from 3D point clouds and meshes. Furthermore, the GOC can be used to enhance different reconstruction methods. The output of the orientationtopology module is a consistently oriented, topologized point cloud, each point of which is equipped with a normal and a neighborhood of points closest to it in terms of surface distance; the points are classified as isolated, curve, and surface points, and the latter points are divided into interior and boundary points. The connected components of the topologized cloud are identified; thus, the scene is segmented into objects. The data acquisition method implied by our scenario could lead to reconstructions of exceptional geometric complexity. The point clouds could be sparse and nonuniform, and thus, components may be recovered at different scales; both positions and tangent plane estimates are expected to be noisy. Because of noise, but also because of the freedom to collect and combine different views, we might have objects that touch each other; such touching objects should be separated. Furthermore, many of the reconstructed surfaces will not be closed; they will have boundary points. Ours is the only existing, completely automatic approach for recovering scene geometry from unorganized sets of images.
Figure 4. Schematic representation of the Image to Geometry Pipeline (Kamberov et al., 2006), incorporating the General Integrability Constraint from Kamberov and Kamberova (2007).
The recovery of the geometry is very important for the registration and matching of surfaces. At present we are addressing this problem and also the problem of surface deformations. Those have important medical applications, and we are pursuing collaborations with the Hofstra University School of Medicine in partnership with North Shore-LIJ Health System.
Figure 5. The image to geometry pipeline: (a) a subset of an unorganized set of input images; (b) the recovered 3D point cloud; by using the proximity likelihood function [3-4], we are able to refine the orientation and topology of the cloud and break it into components; (c) the two largest components; each component is processed independently using the conformal method for quantitative shape extraction  to compute at shape of the component. [Results from Kamberov, Kamberova and Jain (2005) and Kamberov et al. (2006).]
Over the years, I have involved students in this research, including Hofstra students Thomas Re, Scott Klertzkin and Russel Kole, and high school student David Golub, participant in Hofstra Summer Science Research Program. David Golub was a semifinalist at the Intel Science Talent Search in 2006.
This research was partially supported by Hofstra Faculty Research and Development Grants (2001-2007) and a Hofstra Presidential Award (2004).
Kamberov, G., and Kamberova, G. (2002). Recovering surfaces from the restoring force. Proc. of The European Conference in Computer Vision, Copenhagen, Demark, June 2002. Lecture Notes in Comp. Science, Volume 2351, pp. 598-612, Springer- Verlag, Berlin, Heidelberg.
Kamberov, G., and Kamberova, G. (2004a). Conformal method for quantitative shape extraction: performance evaluation.” Proc. of International Conference on Pattern Recognition, IEEE Proceedings Series, Cambridge, UK, August, 2004.
Kamberov, G., and Kamberova, G. (2004b). Topology and geometry of unorganized point clouds. Proc. of 2nd Intl. Symp. 3D Data processing, Visualization and Transmission, Thessaloniki, Greece, IEEE Proceedings Series, in cooperation with Eurographics and ACM SIGGRAPH.
Kamberov, G., Kamberova, G., and Jain, A. (2005). 3D shape from unorganized 3D point clouds. Proc. of 1st Intl. Symposium on Visual Computing, Lake Tahoe, NV, Springer Lecture Notes in Computer Science.
Kamberov, G., Kamberova, G., Chum, O., Obdrzalek, S., Martinec, D., Kostkova, J., Pajdla, T., Matas, J., and Sara, R. (2006). 3D geometry from uncalibrated images. Proc. of 2nd Intl. Symposium on Visual Computing, Lake Tahoe, NV, Springer Lecture Notes in Computer Science, Vol. 4292 II: 802-813.
Kamberov, G., and Kamberova, G. (2007). Geometric integrability and consistency of 3D point clouds. Proc. of 11th IEEE Intl. Conf. Comp. Vis., Rio de Janeiro, 2007, IEEE Proceedings Series.
Kamberov, G., Norman, P., Pedit, F., and Pinkall, U. (2002). Quaternions, Spinors, and Surfaces. American Mathematical Society, ISBN 0821819283.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. New York: Freeman, ISBN 0-7167-1567-8.
Sara, R., and Bajcsy, R. (1998). Fish-scales: Representing fuzzy manifolds. Proc. of International Conference on Computer Vision, pp. 811-817, IEEE, Narosa Publishing House, January 1998.
Surazhsky, T., Magid, E., Soldea, O., Elber, G., and Rivlin, E. (2003). A comparison of Gaussian and mean curvatures estimation methods on triangular meshes. Proc. of IEEE International Conference on Robotics & Automation.