PE&RS May 2015 - page 378

corresponds to one processor in the MPI environment. Processors are identified by non-negative integer ranks. In the adopted parallel mechanism, the processors are divided into two types: master and slave(s). One processor commonly acts as the master and the others as slaves. Assuming there are k processors, the rank of the master is 0 and the ranks of the slaves range from 1 to (k − 1). The processing tasks of the blocks into which the output bounds are partitioned are also numbered, in row-first order. There are (n × m) processing tasks, numbered from 0 to (n × m − 1), if a row and a column are decomposed into m and n blocks, respectively. These processing tasks are distributed only to, and processed only by, the slaves. In the course of parallel processing, task i is assigned to processor (i % (k − 1) + 1). In other words, given the tile size and the bounds of the output, the rank of a slave determines which tasks are dispatched to it. As a result, the processing tasks of the blocks are distributed equally among the slaves.
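The numbering and assignment rule above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names taskNumber, slaveRankForTask, and tasksForSlave are hypothetical, and no MPI calls are made. The rank arithmetic is the formula i % (k − 1) + 1 from the text.

```cpp
#include <cassert>
#include <vector>

// Row-first task number of the block at (blockRow, blockCol), where each row
// of the output is split into blocksPerRow (m in the text) blocks.
int taskNumber(int blockRow, int blockCol, int blocksPerRow) {
    return blockRow * blocksPerRow + blockCol;
}

// Rank of the slave that processes a task when k processors are used.
// Rank 0 is the master; ranks 1..k-1 are slaves.
int slaveRankForTask(int task, int k) {
    assert(k >= 2);  // at least one master and one slave
    return task % (k - 1) + 1;
}

// All task numbers dispatched to one slave, in the order it processes them.
std::vector<int> tasksForSlave(int rank, int k, int totalTasks) {
    assert(k >= 2 && rank >= 1 && rank <= k - 1);
    std::vector<int> tasks;
    for (int task = rank - 1; task < totalTasks; task += k - 1) {
        tasks.push_back(task);
    }
    return tasks;
}
```

For example, with k = 4 processors and 8 tasks, slaves 1, 2, and 3 receive tasks {0, 3, 6}, {1, 4, 7}, and {2, 5}. When n × m is not a multiple of k − 1, the slave workloads differ by at most one task.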
The procedure and the interactions between the master and the slaves are shown in Figure 6. Each slave handles one block at a time. After a slave completes one block task, it processes the next block assigned to it. The slave processors execute the recursive procedure to generate the final results for each block and then send the results to the master. The recursive procedure consists of several fetching and exporting operations, reading the necessary data from the input file(s), and a series of steps. The master receives the results sent by the slave processors and writes them to the output file until all blocks in the result image are finished. In these parallel tests, the number of input files differs among the algorithms. Filtering, de-correlation stretch, and geometric correction need only one input file (Figure 6a). Image fusion and DEM extraction need two input files (Figure 6b), while image mosaics can use more than two (Figure 6c). In general, these algorithms produce one output file, in the form of either 3D points (Figure 6d) or an image (Figure 6e).
This parallel strategy is similar to those used in the literature (Nicolescu and Jonker, 2002; Plaza et al., 2006); the only difference is that in this paper parallelism is applied to file reading as well as computation, while in the literature (Nicolescu and Jonker, 2002; Plaza et al., 2006) it is applied only to computation. The adopted mechanism aims at high parallelism, which is achieved in two ways. First, many slaves read and process blocks simultaneously, so the processing times of the slaves overlap one another. Second, because the master and the slaves execute simultaneously, writing by the master and computing by the slaves are concurrent, as shown in Figure 6. As a result, the computation time overlaps the disk I/O time, so the overall runtime decreases. In short, the mechanism is capable of processing multiple blocks concurrently. The improvements in processing performance are demonstrated in the next section.
Another advantage of the adopted parallel mechanism is its small memory consumption. At any one time, the parallel procedure reads into memory only as many blocks as there are slaves and writes one block to the result file. Compared with importing or exporting the entire image in one step, our parallel mechanism consumes far less memory. For large files, it is sometimes impossible to load or write a whole image in one step.
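The memory argument above can be made concrete with a rough estimate. The sketch below is only illustrative: the function names blockBytes and residentBlockBytes, the 512 × 512 tile size, and the choice of k = 8 processors are assumptions, and working buffers used by the algorithms themselves are ignored. It counts one block per slave plus the one block being written by the master.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed to hold one block (or a whole image) of pixel samples.
std::uint64_t blockBytes(std::uint64_t width, std::uint64_t height,
                         std::uint64_t bands, std::uint64_t bytesPerSample) {
    return width * height * bands * bytesPerSample;
}

// Rough number of block bytes resident at once with k processors:
// (k - 1) blocks read by the slaves plus one block written by the master.
std::uint64_t residentBlockBytes(std::uint64_t k, std::uint64_t oneBlock) {
    assert(k >= 2);  // at least one master and one slave
    return (k - 1) * oneBlock + oneBlock;
}
```

As a usage example, a four-band image of 28,820 × 28,155 pixels at one byte per sample (an assumption) occupies blockBytes(28820, 28155, 4, 1) = 3,245,708,400 bytes, whereas residentBlockBytes(8, blockBytes(512, 512, 4, 1)) gives 8,388,608 bytes for the blocks held at once under the assumed tile size.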
Parallel Performance and Analysis
Each tested algorithm, coupled with the above parallel mechanism, is implemented in the C++ programming language and runs on two multi-core computers with the Windows and Linux operating systems, respectively. Several datasets are selected to demonstrate the performance of the implemented algorithms. The parallel performance obtained with different numbers of processors is given in The Performance of Parallel Processing Section. Based on the achieved performance, the factors determining parallel performance are discussed in the Factors Determining Parallel Performance Section.
Datasets
Preprocessing
The tested dataset for filtering is a pan-sharpened SPOT5 image with an image size of 28,820 × 28,155 pixels and a file size of 3.02 GB. The image has four bands. The file size of the filtered image is the same as that of the input. A hyperspectral image is used in the experiment for de-correlation stretch. The dataset is described as follows: (a) Sensor: OMIS, (b) Image size: 512 × 4,000 pixels, (c) Bands: 128, and (d) File size: 500 MB. The images before and after de-correlation stretch are displayed in Plate 1.
Geometric Correction
The input dataset is as follows: (a) Radarsat-2 strip model image, (b) SLC image with two channels, (c) Image size: 9,712 × 11,349, (d) Data type: signed 16-bit, and (e) File size: 422 MB. The corrected image is a 556 MB file with two channels corresponding to the magnitude and phase values, respectively. The images before and after geometric correction are shown in Figure 7.
Image Fusion
To show the fusion effect, the high-resolution QuickBird images in a scene are fused by the BR algorithm. Information about the images is shown in Table 1. Two segments of the panchromatic image with 0.6-meter resolution, the multispectral image with 2.4-meter resolution, and the fusion results are shown in Plate 2.
Image Mosaics
In this experiment, nine TM images acquired at different times around Beijing and Tianjin, China, were selected as input images. The file size of each image, which contains only 3 bands, ranges from 177 MB to 202 MB. The result is shown in Plate 3; its file size is 1.11 GB and its image size is 20,678 × 19,151 pixels.
DEM Extractions
A 2.5-meter Cartosat-1 dataset provided by the program “Benchmarking and quality analysis of DEM generated from high and very high resolution optical stereo satellite data” (Reinartz et al., 2010), organized by the ISPRS Working Group I/4 (2008 to 2012), is used in this experiment. The selected region is an area in Catalonia, Spain, that includes moderately undulating terrain as well as steep mountainous terrain. The stereo coverage in the dataset has an image size of approximately 2,000 × 2,000 pixels. The 3D point clouds automatically extracted by the tested algorithm are displayed in Plate 4.
The Performance of Parallel Processing
A workstation running the Windows 7 operating system (OS) is used in the experiments; its configuration is shown in Table 2. A second, customized workstation running Linux OS is also used; its configuration is shown in Table 3. The Linux workstation has lower capabilities than the Windows computer.
The speedup index S(p), which evaluates the performance of parallel computing, is used in the following experiments and subsequent analysis (Wilkinson and Allen, 1999). The index expresses the increase in execution speed of the parallel algorithm over the serial algorithm and is given by Equation 2: