Abstract. In this paper, we describe a novel algorithm to group, label, identify and perform optical tracking of marker sets, which are grouped into two specific configurations, and whose projective invariant properties will allow obtaining a unique identification for each predefined marker pattern. These configurations are formed by 4 collinear and 5 coplanar markers. This unique identification is used to correctly recognize various and different marker patterns inside the same tracking area, in real time. The algorithm only needs image coordinates of markers to perform the identification of marker patterns. For grouping the dispersed markers that appear in the image, the algorithm uses a “divide and conquer” strategy to segment the image and give some neighborhood reference among markers.
Keywords:Point-based Feature, Point Set Matching, Projective Invariants, Optical Tracking.
For many years, tracking systems have been essential and intensively used tools in the implementation of virtual and augmented reality applications. Their purpose is to track different objects or markers, either individually or as a group, within a predefined area. Among the diverse technologies used to develop these tracking devices we can mention mechanical, magnetic, sound, optical and hybrid ones. Among these, optical technologies have been one of the mostly used due to some advantages they have in relation to others, such as: their sensors require no wires, they are less sensitive to noise, and they allow incrementing simultaneously the number of markers tracked within an area without significantly affecting the system, especially hardware.
Optical tracking systems often have a well structured and standardized hardware and software architecture that can be summarized as follows:
Concerning hardware, this type of tracking is usually composed of video capturing devices varying from a simple web camera to robust cameras for industrial vision. Another hardware element are markers, which can range from small retro-reflective spheres, IR (infrared) leds, up to dots or features of the scene itself, such as corners, lines and other characteristic details the system might use as tracking markers.
Regarding software, optical tracking systems present a more standardized operation flow based on the implementation of different computer vision techniques with the purpose of extracting, recognizing and tracking markers.
Among computer vision techniques used to implement optical tracking systems, image processing is the first technique implemented, with the objective of extracting the 2D coordinates that represent markers’ positions within the image captured from the tracking area. Such coordinates constitute the principal information on which the system will operate.
The following stages are the stereo matching of the marker sets appearing in the images (when more than one camera is used) and the 3D reconstruction of the markers based on information about the cameras and the 2D coordinates provided by image processing. After 3D reconstruction, identification of markers generates expensive combinatory operations because of wrong results achieved during matching stage.
The matching stage is usually made with stereo correspondence and epipolar geometry. Due to the lack of a previous standardized identification of markers included in 2D coordinates within the image, false matches and subsequent wrongly reconstructed 3D points are generated. To deal with this problem, other heuristics are required to discard these false markers, such as using 3D metrics to find the correct combination of markers that belong to a specific pattern. These metrics may be implemented based on distances, angles or graphs representing marker distribution.
It is precisely in this stage prior to matching that the algorithm proposed in this paper will be used to reach a way of individually grouping and identifying in 2D the marker sets that form a specific pattern to be tracked. The objective of the proposed algorithm is to be a support tool to reduce the number of cases of false markers generated in the matching stage and subsequently in the 3D reconstruction stage.
This paper is organized as follows: in the next section previous work is discussed. In section 3 the configuration of the hardware system used is presented. In section 4 the theoretical part of the computer vision techniques used to implement the algorithm is described, as well as the algorithm’s process flow. In section 5 some results are presented, and in section 6 we draw conclusions and present proposals for future work.
2 Related Work
The marker matching and pattern identification process has been widely investigated in the area of pattern recognition while being used in computer vision. Often, robust and well developed techniques in the pattern recognition area are “exported” to be used in computer vision. In the case of the proposed algorithm, we have used two techniques employed in pattern recognition, here applied in the implementation of marker pattern matching in an optical tracking system. These two techniques on which the algorithm is based are the theory about projective invariant properties described in [3a],[8a],[10a],[14a] and the “divide and conquer” strategy exemplified by the use of a QuadTree to segment an image where the leaves will be the image coordinates of the markers, which are spread over the image.
The use and efficiency of these techniques have been presented in separate works. The implementation of projective invariant properties for pattern recognition was presented in [2a],[11a],[10a],[13a]. Image segmentation based on the “divide and conquer” strategy was presented in [6a],[7a]. In these works one can see that both techniques have good individual performance, but so far no work presented an integration of these techniques. This was a further motivation behind the present work, which intends to show that these techniques can yield good results by working together.
Some variations on the implementation of these techniques will be presented in section 4, besides some additional tools proposed to improve continuity in pattern tracking and identification, and in the individual markers that compose the patterns.
To run and test the algorithm, the implementation of a basic optical tracking system was required. This system is based on the standard architecture for optical tracking systems, i.e. an architecture based on cameras with infrared light spots and filters that irradiate a scene containing markers covered with retro-reflective material with the purpose of highlighting the markers in relation to other objects present in the scene. This type of marker is called passive marker. The idea behind this architecture was used in our system, which also uses cameras with infrared filters but without the light spots – instead, the markers are small 5mm and 2mm incandescent lights powered by a set of batteries for each pattern. As was previously defined, the tracking patterns used will have a predefined format with the following configurations:
Pattern I: formed by 4 markers placed in a collinear manner.
Pattern II: formed by 5 markers placed in a coplanar manner, with the additional characteristic that one marker will be surrounded by the 4 other markers.
These formats can be seen in Fig. 1 and will be further explained in section 4.3, where we will succinctly describe the theory behind the invariant properties of projection and permutation.
In this section we will briefly summarize the theory behind the techniques used in the implementation of the proposed algorithm, and present a general description of this algorithm.
The algorithm’s purpose is to group, label, individually identify each pattern, and track it optically in real time, having as operational data only the image coordinates of the markers that compose the tracking patterns. To achieve this goal, the algorithm has two steps: firstly an offline step used for training and generation of a unique identifier to each pattern to be tracked, and secondly an online step that will be the basic optical tracking system, running in real time, used to test the algorithm.
The main techniques used in each step will be presented as follows. Then, the process flow of each step of the algorithm will be presented.
The image processing used in the algorithm has as main goal analyzing and extracting a 2D representation for each marker displayed on the video images captured by the tracking system cameras. This 2D representation indicates the position of the markers in image coordinates. This sub-process is composed by a set of techniques that analyze the image sequentially, according to the following action flow:
Capture the video images with the devices used (cameras).
Convert each image to a single grayscale channel and apply a threshold filter to make it binary.
Apply a connected-component algorithm to identify circular areas that will be the representations projected on the marker images.
Extract the center of each connected area as the image coordinate for the marker candidate inside the image.
To implement these techniques in the proposed algorithm, the implementation made in the OpenCV [4a] library was employed. This technique is widely used in diverse implementations of optical tracking systems, both academic [5a] and commercial [1a],[15a], and is successfully executed applied to the restrictive characteristics defined in the hardware configuration used.
The QuadTree implementation has the purpose of grouping, in quadrants, the total set of markers spread through the image. This grouping is based on the principle that markers belonging to the same pattern should be very close to one another. The key step to take advantage of this marker quasi-grouping is reading and traversing the QuadTree to extract very close subgroups of markers. This way, candidate subgroups of markers will be formed and tested, and they will contain the tracking patterns inside the tracked image.
The process of traversing the QuadTree in each analyzed frame is basically done by moving down through the four branches of the QuadTree. As one moves downwards, a leaf or other a parent node is found. If a parent node is found then this node is verified to see if it contains the minimum number of markers that form one of the predefined patterns. The process of traversing the QuadTree and generating groups follows this action flow:
Generate a QuadTree for each group of markers detected in the image processing stage for each frame captured by the camera.
Traverse each branch of the QuadTree.
If the tested node is a parent node and if it only has leaf nodes, then
If the number of children is larger or equal to the maximum number of markers containing our patterns, i.e. 4 markers for pattern I or 5 for pattern II, then
Generate subgroups by combining patterns as C4n or C5n, depending of what pattern we are testing, where “n” is the number of leaf nodes the parent has, provided that “n” >=4 or “n” >=5 correspondingly.
Else, return to the parent node.
Else, move down through the 4 leaves of the node running the previous step recursively.
Each time the flow reaches a node containing the minimum number of markers and generates the combinations to compose the marker sets, then test the correspondence of each of these sets against the patterns defined in the training stage.
If a group is correctly matched to a given pattern, then the markers forming the matching group are removed from the QuadTree structure.
The unmatched markers return to the parent node identified as unmatched, and will be part of the same matching process with other unmatched children nodes of the same parent.
As can be seen, matching is a recursive process made at each branch, firstly top-down but becoming bottom-up as the markers are matched in each branch analyzed.
The role of the QuadTree is merely as a tool that implicitly provides a neighborhood reference among the markers. An example of this implementation in our system is showed in Fig. 1.
Fig. 1. Grouping for image coordinates of our patterns done with a QuadTree.
The implementation of projective invariant properties has been discussed especially in the area of pattern recognition [3a],[10a]. It is based on the invariant property of cross ratio, particularly in the case of perspective projection. The cross ratio property states that if we have 4 collinear points (A,B,C,D) we can define a cross ratio value based on these points’ distances according to the following relationship:
Cross ratio (A,B,C,D) =
This property has been expanded in order to cover not only collinear patterns but also patterns with coplanar markers. In this case, the cross ratio is obtained based on the areas of the triangles generated using combinations of the vertices that constitute a coplanar pattern, as shown in [9a]. This extended variation of cross ratio projective invariant properties applied to coplanar points was developed until it became invariant to possible changes in the labels that compose coplanar patterns. This new approach and its application are shown in [2a]. In [3a] there is another extension to generate projective invariant properties in cases with over 5 coplanar points.
In these works, the identifier of the projective invariant for a set of 5 points is defined as a vector with 5 ranges, with a minimum and maximum value for each vector position. This vector is obtained by applying the P2-Invariant technique [2a] over the 5-point sample, where each vector position is related to each marker in the set. According to [3a] this allows not only group matching but also individual matching for each marker in the 5-point set. The only drawback is the need to generate the whole 5-range vector for each candidate group of markers to be matched to a given pattern. In [10a] were presented an improvement in relation to [3a]. In the case of 5 coplanar points, it demonstrates that the 5-interval vector can be reduced to a range similar to the case of 4 collinear points defined in [3a]. This new approach proposes the generation of a vector with 2 to 6 possible value ranges to define the pattern’s projective invariant property. Differently from [3a], however, [10a] states that one could compute only the first two value ranges in the 6-range vector, and use these values to match the 2-position vector generated for a candidate set of 5 points against the values computed for a predefined pattern.
In fact, the first 2 ranges in the vector correspond to the values of the base projective invariants, and the other ranges are normalizations based on functions that use the 2 base values as data. This new approach also adds a new restriction: to generate this single vector with 2 value ranges, in the tracked pattern a specific marker from the group must be well recognized in all views while the pattern is being tracked. This is required because the value of the projective invariant vector is generated based on the cross ratio of the triangles formed by the combination of the vertices, where that specific marker is part of the 4 triangles used to compute the cross ratio. Another overall restriction is that no 3-marker set in a group of 5 be collinear, as this would generate a null triangle area.
Finally, based on the results of the works described above, we have defined the configuration of the patterns used in the algorithm proposed here. Pattern I has two invariant features to perspective projection: collinearity and, as a consequence, cross ratio. Pattern II, with 4 points around a 5th point, favors compliance with the need to identify a specific marker in the 5-marker set – in this case, the central marker. This characteristic allows computing the value(s) of the properties that are invariant to projection and permutation in a set of 5 markers, and allows this specific configuration to be used as filter to discard false candidates that do not comply with the restriction of 4 markers surrounding a 5th one.