
depth-estimation algorithm using a structure tensor to compute directions of feature pixels in an epipolar plane image (EPI). Yu et al. (2013) analyzed the 3D geometry of lines in a light-field image and encoded the line constraints to improve the disparity map. Tao et al. (2013) combined defocus and correspondence cues to estimate scene depth using EPIs, and optimized the depth map using a Markov random field (MRF). Tosic and Berkner (2014) estimated depth by defining a description of EPI texture and mapping this texture to scale-depth space. Sabater et al. (2015) proposed a depth-estimation algorithm based on block matching using subaperture images without demosaicking. Compared with these algorithms, Jeon et al. (2015), Zhang, Liu, and Dai (2015), and Yang et al. (2019) improved depth accuracy by achieving subpixel displacement estimation of subaperture images using the phase-shift theorem. However, none of these algorithms consider occlusion, and they produce oversmoothed results at occlusion boundaries. Kim et al. (2013) computed reliable depth estimates around object boundaries using densely sampled light fields and implicitly handled occlusions. Chen et al. (2014) introduced a bilateral consistency metric on the surface camera to estimate depth in the presence of occlusions. However, both algorithms solve the occlusion problem for images with a wide baseline, so they are not applicable to images acquired by a single light-field camera at one position.
When a pixel is occluded, the photo-consistency assumption no longer holds, since some viewpoints will be blocked by the occluder. Enforcing photo-consistency on the occluded pixels will lead to an incorrect depth result, causing oversmoothing around sharp occlusion boundaries. To solve the occlusion problem, T.-C. Wang et al. (2016) proposed a single-occluder occlusion theory and derived the occluder consistency between the spatial and angular patches for the occluded pixels: when refocused to the correct depth, the angular patch can be separated into occluded and unoccluded views by a line with the same orientation as the occlusion edge. The algorithm selects the occluded pixels by dilating the edge detected by the Canny edge detector in the center-view image, extracts the unoccluded views of the occluded pixels according to the occluder consistency, and improves depth estimation by computing the cost only in the unoccluded views. However, the proposed occluder consistency is unsuitable for multi-occluder occlusion, because in that situation the occluded and unoccluded views in the angular patch cannot simply be divided into two regions by a straight line.
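As a concrete illustration, the single-occluder consistency just described can be sketched as follows. This is a minimal sketch, not the published implementation: the 7×7 angular resolution, the function name, and the choice of half-plane are all illustrative assumptions; in practice the unoccluded side must be identified by comparing the two halves against the central view.

```python
import numpy as np

def unoccluded_views_single_occluder(ang_size, edge_theta):
    """Split the (ang_size x ang_size) angular patch by a line through
    its center with the same orientation as the occlusion edge
    (edge_theta, radians), and return a boolean mask of one half-plane,
    standing in for the views assumed to see the background."""
    c = (ang_size - 1) / 2.0
    # u varies along columns, v along rows (default 'xy' indexing).
    u, v = np.meshgrid(np.arange(ang_size) - c, np.arange(ang_size) - c)
    # Signed distance to the dividing line; views strictly on one side
    # are treated as the unoccluded half for this illustration.
    side = -u * np.sin(edge_theta) + v * np.cos(edge_theta)
    return side > 0

# A horizontal occlusion edge (theta = 0) splits a 7x7 angular patch
# into a 3x7 half-plane on each side of the central row.
mask = unoccluded_views_single_occluder(7, 0.0)
print(mask.sum())  # 21 views fall on the selected side of the line
```

The key point the sketch captures is that, under the single-occluder model, one straight line suffices to separate the angular patch; with multiple occluders no single line can do so, which motivates the multi-occluder model discussed next.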
Zhu, Wang, and Yu (2017) derived the occluder consistency between the spatial and angular patches for multi-occluder occlusion: the corresponding views of the occluder are the occluded views, and the corresponding views of the background are the unoccluded views. The occluded and unoccluded views in the angular patch of an occluded pixel correspond to the regions into which the spatial patch of the pixel is divided. To obtain the unoccluded views of each pixel at occlusion boundaries, that algorithm divides the spatial patch of the pixel into two regions using k-means clustering according to the occluder consistency, with the occluded and unoccluded views in the angular patch of the pixel corresponding to the two regions. For each pixel around an occlusion edge, the algorithm finds the two edge pixels closest to the pixel, and the unoccluded views of the pixel are obtained as the intersection of the unoccluded views of those two edge pixels; this is called the voting strategy.
However, the selection of unoccluded views in this method is unsatisfactory in complex-textured regions for two reasons. First, the k-means clustering requires specifying the number of clusters in advance, which may differ from the actual number of clusters and make the clustering results inaccurate. Second, the voting strategy for obtaining the unoccluded views of pixels around the occlusion boundaries is not very effective in regions with complex textures. A consensus in computer vision is that more effective views lead to more accurate depth estimation, so the key to an accurate depth map is selecting the correct unoccluded views. However, it is difficult to select unoccluded views effectively in complex-textured regions with prior methods. Therefore, we propose an algorithm to accurately select the unoccluded views in the angular patch.
In addition, identifying the occluded pixels by dilating the edge pixels, as in the algorithms of T.-C. Wang et al. (2016) and Zhu et al. (2017), includes many unoccluded pixels among the selected occluded pixels. As a result, the step of selecting unoccluded views from the angular patch, intended for occluded pixels, is also performed for these unoccluded pixels. This is unnecessary, because all views in the angular patch of an unoccluded pixel are unoccluded views. Moreover, since the selected unoccluded views are only a portion of all the unoccluded views, using them to estimate the depth of the unoccluded pixels in these algorithms decreases the depth accuracy. Therefore, effectively identifying the occluded pixels is very important.
Different from the methods of T.-C. Wang et al. (2016) and Zhu et al. (2017), Schilling et al. (2018) used EPIs to handle occlusion. By integrating the occlusion handling, their method improved performance for object borders and smooth surface reconstruction. Besides the conventional methods, deep-learning methods have been used for depth estimation in light fields. Shin et al. (2018) achieved fast and accurate depth estimation based on a fully convolutional neural network and proposed a data-augmentation method to overcome the lack of training data. Tsai et al. (2020) proposed an attention-based view-selection network for light-field depth estimation and improved accuracy by using the views more effectively and reducing redundancy within views.
In this article, we explicitly take occlusion into account.
By effectively identifying the occluded pixels and accurately
selecting the unoccluded views in complex-textured regions,
we obtain accurate depth for multi-occluder occlusion bound-
aries. Our main contributions are the following:
• We present an algorithm to effectively identify occluded
pixels, improving depth accuracy.
• We propose an algorithm to accurately select the unoc-
cluded views in the angular patch, obtaining more accurate
unoccluded views compared with prior methods.
• We propose an algorithm to accurately estimate depth
which can preserve occlusion boundaries.
In the next section, we introduce the single-occluder and multi-occluder occlusion models. Then an accurate depth-estimation method for multi-occluder occlusion is proposed and elucidated: first, the occluded pixels are effectively identified; second, the unoccluded views for the occluded pixels are accurately selected; third, the initial depth map is improved by computing the cost volumes in the unoccluded views; finally, we refine the depth with MRF regularization. In the section after that, we demonstrate the advantages of our proposed method compared with state-of-the-art algorithms quantitatively and qualitatively.
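The cost-volume step of the pipeline just outlined can be sketched as follows. This is a simplified winner-take-all illustration under stated assumptions: the variance-based photo-consistency cost, the toy intensities, and the function name are ours, not the paper's exact formulation.

```python
import numpy as np

def depth_from_cost_volume(angular_patches, unocc_masks):
    """Winner-take-all depth selection. angular_patches[d] holds the
    per-view intensities of one pixel refocused to candidate depth d;
    unocc_masks[d] flags the views treated as unoccluded. The cost is
    the intensity variance over the unoccluded views; the depth label
    with minimum cost wins."""
    costs = []
    for patch, mask in zip(angular_patches, unocc_masks):
        views = np.asarray(patch, dtype=float)[np.asarray(mask, dtype=bool)]
        costs.append(views.var())
    return int(np.argmin(costs))

patches = [[0.8, 0.8, 0.2, 0.2],   # wrong depth: views disagree
           [0.8, 0.8, 0.8, 0.0]]   # true depth: one view hits the occluder
all_views = [[True] * 4, [True] * 4]
masks = [[True] * 4, [True, True, True, False]]  # drop the occluded view
print(depth_from_cost_volume(patches, all_views))  # 0 (occluder corrupts the cost)
print(depth_from_cost_volume(patches, masks))      # 1 (correct after masking)
```

The toy example shows why view selection matters: with every view included, the occluded sample inflates the variance at the true depth and the wrong label wins; restricting the cost to unoccluded views recovers the correct depth.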
Light-Field Occlusion Theory
In this section, the single-occluder and multi-occluder occlusion models are introduced. T.-C. Wang et al. (2016) developed a light-field single-occluder occlusion model based on the physical image formation and proved the occluder consistency for single-occluder occlusion. Each pixel on the occlusion edge is assumed to be occluded by only one occluder.
444
July 2020
PHOTOGRAMMETRIC ENGINEERING & REMOTE SENSING