A New GPU Bundle Adjustment Method for Large-Scale Data
Zheng Maoteng, Zhou Shunping, Xiong Xiaodong, and Zhu Junfeng
Abstract
We developed a fast and effective bundle adjustment method for large-scale datasets. The preconditioned conjugate gradient (PCG) algorithm and GPU parallel computing technology are applied simultaneously to handle large-scale data and to accelerate the bundle adjustment process. The whole bundle adjustment process is modified to enable parallel computing, and the critical optimizations of parallel task assignment and GPU memory usage are described in detail. The proposed method was tested on 10 datasets. The traditional Levenberg-Marquardt (LM) method, the advanced PCG method [8], Wu's method [16], and the proposed GPU parallel computing method are compared and analyzed. Preliminary results show that the proposed method can process a large-scale dataset of about 13,000 images in less than three minutes on a common computer with a GPU device. The efficiency of the proposed method is about the same as that of Wu's method, while the accuracy is better.
Introduction
Bundle adjustment (BA) is an essential procedure in the geosciences, especially in structure from motion (SFM) and 3D model reconstruction applications. It has been developed and applied since the 1950s. The first block adjustment program based on bundles was developed in the 1960s [1]. In the following decades, more work was dedicated to obtaining good results in bundle adjustment [2, 3, 4, 5]. The traditional methods are mostly based on the Levenberg-Marquardt (LM) algorithm [6, 7]. In the early stages of bundle adjustment, the main concerns were the accuracy and robustness of the system. However, with the general increase in data sizes, efficiency has gradually become the main concern of the community. The bottlenecks are memory requirements and computational complexity.
To deal with large-scale data, a preconditioned conjugate gradient (PCG) based method was selected to solve the large normal equation, as discussed in our previous work [8]. The selection of different preconditioners has been well studied in the applied mathematics community, and some stable and effective preconditioners have been proposed, such as block Jacobi [11, 12, 13, 14], symmetric successive over-relaxation (SSOR) [13, 14], and QR factorization [13]. In a previous study, we used an efficient block-based sparse matrix compression (BSMC) method combined with the PCG algorithm to cope with large-scale data [8]. However, the BSMC method still stores the compressed normal matrix; as the size of the normal matrix continues to increase, the BSMC method will eventually become inapplicable. Hence, in this paper, we develop a new method in which the normal matrix is not stored in memory at all, but is computed in real time using GPU high-performance parallel computing.
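To make the matrix-free idea concrete, the following is a minimal sketch of a PCG solver that never materializes the normal matrix. The callbacks apply_JtJ (which recomputes the product (J^T J + lambda*D)p from the observations each time it is needed) and apply_precond (e.g., a block-Jacobi preconditioner) are assumptions for illustration, not the authors' implementation.

    #include <vector>
    #include <cmath>
    #include <cstddef>

    using Vec = std::vector<double>;

    static double dot(const Vec& a, const Vec& b) {
        double s = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
        return s;
    }

    // Solve (J^T J + lambda*D) x = g without storing the normal matrix:
    // apply_JtJ(p, Ap) evaluates the matrix-vector product on the fly,
    // and apply_precond(r, z) applies M^-1 (both hypothetical callbacks).
    template <class ApplyA, class ApplyM>
    Vec pcg(const Vec& g, ApplyA apply_JtJ, ApplyM apply_precond,
            int max_iter, double tol) {
        const std::size_t n = g.size();
        Vec x(n, 0.0), r = g, z(n, 0.0), Ap(n, 0.0);
        apply_precond(r, z);                       // z = M^-1 r
        Vec p = z;
        double rz = dot(r, z);
        for (int k = 0; k < max_iter; ++k) {
            apply_JtJ(p, Ap);                      // Ap = (J^T J + lambda*D) p
            const double alpha = rz / dot(p, Ap);
            for (std::size_t i = 0; i < n; ++i) {
                x[i] += alpha * p[i];
                r[i] -= alpha * Ap[i];
            }
            if (std::sqrt(dot(r, r)) < tol) break; // residual small enough
            apply_precond(r, z);
            const double rz_new = dot(r, z);
            const double beta = rz_new / rz;
            rz = rz_new;
            for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
        }
        return x;
    }

Only a handful of dense vectors of the size of the unknowns are kept in memory; the cost of each iteration is dominated by the on-the-fly product, which is exactly the part that can be parallelized on the GPU.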
GPU parallel computing is one of the most effective ways to improve computational efficiency, and many researchers have already reported GPU parallel computing systems, for instance [9, 10, 12, 15, 16, 17, 18, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]. However, some of these still store the Jacobian, which requires a large memory space [9, 10, 15, 17]. The bundle adjustment procedure needs extensive modification for parallel computing, especially for GPU parallel computing. Optimizing GPU parallel computing with respect to the special architecture of the GPU hardware is critical to improving computational efficiency [19]. This includes balanced parallel task assignment, optimal GPU memory usage, suitable device code (using as few logic operations as possible), and so on. Yet most previous works did not specify their methods for optimizing GPU parallel computing [10, 15]. The balance of parallel tasks and the GPU memory usage directly affect the speed of GPU parallel computing due to the latency of GPU global memory. Our work is mainly focused on these two aspects.
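As an illustration of these two aspects, the following hypothetical CUDA kernel (a sketch, not the authors' actual code) assigns one thread per image observation, which balances the workload because every observation costs roughly the same. The per-observation work stays in registers, and only a short atomic accumulation touches high-latency global memory. The buffer names and layout are assumptions made for the sketch.

    // One thread per observation: a naturally balanced task assignment.
    // jac_cam holds a row-major 2x6 camera Jacobian block per observation;
    // residuals holds the 2-vector of reprojection residuals per observation.
    // atomicAdd on double requires compute capability 6.0 or later.
    __global__ void accumulate_gradient(
        const double* __restrict__ residuals,  // 2 per observation
        const double* __restrict__ jac_cam,    // 2x6 per observation
        const int*    __restrict__ cam_index,  // owning camera per observation
        int num_obs,
        double* grad_cam)                      // 6 per camera, pre-zeroed
    {
        const int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= num_obs) return;

        const double* r = residuals + 2 * i;
        const double* J = jac_cam + 12 * i;

        double g[6];                               // g = J^T r, kept in registers
        for (int c = 0; c < 6; ++c)
            g[c] = J[c] * r[0] + J[6 + c] * r[1];

        double* out = grad_cam + 6 * cam_index[i]; // scatter to the owning camera;
        for (int c = 0; c < 6; ++c)                // atomics resolve collisions when
            atomicAdd(&out[c], g[c]);              // several threads share a camera
    }

Because consecutive threads read consecutive residual and Jacobian entries, the global-memory reads are coalesced; the only contended traffic is the final atomic scatter, which is short compared with the per-observation arithmetic.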
Related Work
To improve the efficiency of bundle adjustment, several
researchers have tried to develop effective and practical meth-
ods in both algorithmic and technical perspectives.
Endeavors on Algorithm Improvements
Bundle adjustment algorithms have been studied extensively, and there are a large number of published articles in the field. Lourakis et al. [15] argued that the LM method can be viewed as a combination of the steepest descent and Gauss-Newton methods, proposed the Dog Leg (DL) algorithm, and claimed that DL is superior to LM, especially for large-scale problems. However, both the LM and DL algorithms need to store and solve the normal equations; thus, they are still constrained by the bottlenecks of memory requirements and computational complexity mentioned previously. Ni et al. [16] developed an out-of-core bundle adjustment method for large-scale 3D reconstruction. They decoupled the original problem into several sub-maps that have their own local coordinate systems and can be optimized in parallel, and can therefore reconstruct a very large-scale scene in a computationally efficient manner. But the initialization problem was not directly addressed, and the traditional LM algorithm is still applied [16]. A Cholesky factorization-based solver for the normal equation is commonly used in LM. For bundle adjustment with massive data, an iterative method such as the preconditioned conjugate gradient is preferred.
Zheng Maoteng and Zhou Shunping are with the National Engineering Research Center for Geographic Information Systems, China University of Geosciences (Wuhan), No. 388 Lumo Road, Wuhan, China.
Xiong Xiaodong and Zhu Junfeng are with Mirauge3D Technology Inc., No. 9 Guang'an Road, Fengtai District, Beijing, China.
Photogrammetric Engineering & Remote Sensing, Vol. 83, No. 9, September 2017, pp. 633–641.
0099-1112/17/633–641
© 2017 American Society for Photogrammetry and Remote Sensing
doi: 10.14358/PERS.83.9.633