Wireless sensor network have recently been the focus of many research
efforts and emerged as an important new area in wireless technology.
As chapter 1 already discuss about the problem of data aggregation,
this chapter presents an overview of data aggregation. This chapter
first discusses in-network aggregation and then different approaches
that are widely used for data aggregation.
2.1
In-Network Aggregation
In a typical sensor network scenario, different node collect data
from the environment and then send it to some central node or sink
which analyze and process the data and then send it to the
application. But in many cases, data produced by different node can
be jointly processed while being forwarded to the sink node. So
in-network aggregation deals with this distributed processing of data
within the network.
Data aggregation techniques explore how the data is to be routed in
the network as well as the processing method that are applied on the
packets received by a node. They have a great impact on the energy
consumption of nodes and thus on network efficiency by reducing
number of transmission or length of packet. Elena Fosolo et al in [7]
defines the in-network aggregation process as follows: “In-network
aggregation is the global process of gathering and routing
information through a multi-hop network, processing data at
intermediate nodes with the objective of reducing resource
consumption (in particular energy), thereby increasing network
lifetime.”
There are two approaches for in-network aggregation: with size
reduction and without size reduction. In-network aggregation with
size reduction refers to the process of combining & compressing
the data packets received by a node from its neighbors in order to
reduce the packet length to be transmitted or forwarded towards sink.
As an example, consider the situation when a node receives two
packets which have a spatial correlated data. In this case it is
worthless to send both packets. Instead of that one should apply any
function like AVG, MAX, MIN and then send a single packet. This
approach considerably reduces the amount of bits transmitted in the
network and thus saving a lot of energy but on the other hand, it
also reduces the precision of value of data received. In-network
aggregation without size reduction refers to the process merging data
packets received from different neighbors in to a single data packet
but without processing the value of data. As an example, two packets
may contain different physical quantities (like temperature &
humidity) and they can be merged in to a single packet by keeping
both values intact but keeping a single header. This approach
preserves the value of data and thus transmit more bits in the
network but still reduce the overhead by keeping single header.
This of the two approaches to use depends on many factors like the
type of application, data rate, network characteristics and so on.
There is also a trade-off between energy consumption and precision of
data for the two approaches.
Most of the work done till now on in-network aggregation mainly deals
with problem of forwarding packets from source to sink, to facilitate
aggregation therein. Actually the main idea behind were to enhance
existing routing protocols such that they can efficiently aggregate
data. So till now, most of the data aggregation techniques fall under
three categories. They are tree-based approaches, multi-path
approaches, and cluster-based approaches. There also some hybrid
approaches that combines any of the three techniques above. So, all
the three approaches will be described in coming sections with giving
details of some of the main techniques by different authors.
2.2
Tree-Based Approach
The simplest way to aggregate data is to organize the nodes in a
hierarchical manner and then select some nodes as the aggregation
point or aggregators. The tree-based approach perform aggregation by
constructing an aggregation tree, which could be a minimum spanning
tree, rooted at sink and source nodes are considered as leaves. Each
node has a parent node to forward its data. Flow of data starts from
leaves nodes up to the sink and therein the aggregation done by
parent nodes. The way this approach operates has some drawbacks. As
we know like any wireless network the wireless sensor networks are
also not free from failures. In case of packet loss at any level of
tree, the data will be lost not only for a single level but for whole
related sub-tree as well. In spite of high cost for maintaining tree
structure in dynamic networks and scarce robustness of the system,
this approach is very much suitable for designing optimal aggregation
technique and energy-efficient techniques.
S. Madden et al. in [14] proposed a data-centric protocol which is
based on aggregation tress, known as Tiny Aggregation (TAG) approach
[14]. TAG works in two phases: distribution phase and collection
phase. In distribution phase, TAG organizes nodes in to a routing
tree rooted at sink. The tree formation starts with broadcasting a
message from sink specify level or distance from root. When a node
receive this message it sets its own level to be the level of message
plus one and elect parent as node from which it receives the message.
After that, node rebroadcast this message with its own level. This
process continues until all nodes elect their parent. After tree
formation, sink send queries along structure to all nodes in the
network. TAG uses database query language (SQL) for selection and
aggregation functions. In collection phase, data is forwarded and
aggregated from leaves nodes to root. A parent node has to wait for
data from all its child node before it can send its aggregate up the
tree. Apart from the simple aggregation function provided by SQL (eg:
COUNT, MIN, MAX, SUM, and AVG), TAG also partitions aggregates
according to the duplicate sensitivity, exemplary and summary, and
monotonic properties. Though TAG periodically refresh tree structure
of network but as most of the tree-based schemes are inefficient for
dynamic network, so TAG may be.
C. Intanagonwiwat et al. in [3] proposed a reactive data-centric
protocol for applications where sink ask some specific information by
flooding, known as directed diffusion paradigm. The main idea behind
directed diffusion paradigm is to combine data coming from different
source and en-route them by eliminating redundancy, minimizing the
number of data transmission; thus maximizing network lifetime.
Directed diffusion consists of several elements: interests, data
messages, gradients, and reinforcements.
Figure 2.1 Simplified
schematic for directed diffusion. (a) Interest propagation. (b)
Initial gradients setup. (c) Data delivery along reinforced path [3].
The base station (BS) requests data by broadcasting an interest
message which contains a description of a sensing task. This
interest message propagates through the network hop-by-hop and each
node also broadcast interest message to its neighbor. As interest
message propagates throughout the network, gradients are setup
by every node within the network. The gradient direction is set
toward the neighboring node from which the interest is received. This
process continues until gradients are setup from source node to base
station. Loops are not checked at this stage but removed at later
stage. After this path of information flow are formed and then best
path are reinforced to prevent further flooding according to a local
rule. Data aggregation took place on the way of different paths from
different sources to base station or sink. The base station
periodically refresh & resend the interest message as soon as it
start to receives data from sources to provide reliability. The
problem with directed diffusion is that it may not be applied to
applications (e.g. environmental monitoring) that require continuous
data delivery to base station. This is because query driven on demand
data model may not help in this regard. Also matching data to queries
might require some extra overhead at the sensor nodes. Mobility of
sink nodes can also degrade the performance as path from sources to
sinks cannot be updated until next interest message is flooded
throughout the network. To cope up with above issue if introduce
frequent flooding then also too much overhead of bandwidth and
battery power will be introduced. Furthermore, exploratory data
follow all possible paths in the network following gradients which
lead to unnecessary communications overhead.
M. Lee et al. in [2] proposed a new low-control-overhead data
dissemination scheme, which they called as pseudo-distance data
dissemination (PDDD), for efficiently disseminating data from all
sensor nodes to mobile sink. Some assumption have been made, they
are: (1) all source nodes maintain routes to mobile sink node, (2) no
periodically messaging for topological changes due to mobile sink
node, (3) all link are bi-directional and no control messages are
lost, (4) mobile sink nodes have unlimited battery power, so no need
to care about battery efficiency of sink node, and (5) network
partitioning is not considered. Data dissemination process is
influenced by directed diffusion [3]. Though mobile sink periodically
broadcast interest message, sensor nodes do not send exploratory data
and do not wait reinforcement message because each sensor node
already has routes to the sink node. After getting interest message,
adjacent nodes set a parent-child relationship using pseudo-distance
of each node and finally a partial ordered graph (POG) has been
build. Optimal data dissemination is achieved in terms of path length
by forwarding packets to a parent node until topology is unchanged.
Then each sensor node is assigned a level for a corresponding sink
node with pseudo-distance. In order to overcome the shortcoming of
POG, author used totally ordered graph (TOG) in place of POG. The
problem identified in this approach is that due to mobility of sink
node all sensor nodes have to maintain routes and for any change in
topology nodes have to again change route accordingly which led to
energy waste.
Marc Lee et al. in [8] proposed an energy-aware spanning tree
algorithm for data aggregation, referred as E-Span. E-Span is a
distributed protocol in which source node that has highest residual
energy is chosen as root. Other source nodes choose their parent
based on residual energy and distance to the root. The protocol uses
configuration message to exchange information of node i.e., residual
energy and distance to the root. Each node performs single-hop
broadcast operation to send packets. Single-hop broadcast refers to
the operation of sending a packet to all single-hop neighbors [8].
Figure
2.2 E-Span protocol [8].
2.3
Multi-path Approach
One of the main drawbacks of tree-based approach is the scarce
robustness of the system. To overcome this drawback, a new approach
was proposed by many researchers. Instead of sending partially
aggregated data to a single parent node in aggregation tree, a node
could send data over multiple paths. The idea behind is that each
node can send the data to its possibly multiple neighbors by
exploiting the wireless medium characteristic. Hence data will flow
from sources to sink along multiple paths and aggregation can be
performed by each intermediate node. Clearly schemes using this
approach will make the system robust but with some extra overhead.
One of the aggregation structures that fit well with this approach is
ring topology, where network is divided in to concentric circles with
defining levels according to the hop distance from sink.
S. Nath et al. in [15] presented a data aggregation technique using
multi-path approach, known as synopsis diffusion. Synopsis diffusion
works in two phases: distribution of queries and data retrieval
phase. During distribution of queries phase, a node sends a query in
the network. The network nodes then form a set of rings around the
querying node. The node which is i hop away from querying node is
considered is ring Ri. In the second phase, aggregation
starts from outermost ring and propagate level by level towards the
sink. Here a source node can have multiple paths towards sink.
Figure
2.3 Examples of aggregation paths over a ring structure [7].
L. Gatani et al. in [5] describe a new strategy for data gathering in
wireless sensor network that consider both issues: energy efficiency
and robustness. Authors first say that single path to connect each
node to the base station is simple & energy-saving approach but
expose a high risk of disconnection due to node/link failures. But
multi-path approach would require more nodes to participate with
consequent waste of energy. Authors present a clever use of
multi-path only when there is loss of packet which is implemented by
smart caching of data at sensor nodes. Authors also argue that in
many practical situation data may be gathered only from a particular
region, so they use a different approach that relies on a spanning
tree and provides alternative paths only when a malfunctioning is
detected. Algorithm adopts a tree-based approach for forwarding
packets through the network. In the ideal situation when no failures
occur, this is certainly the best choice, as the minimum number of
nodes is engaged in the transmission phase. In the presence of link
or node failures, the algorithm will discover alternative paths, so
as ensure the delivery of as many packets as possible within the time
constraints. The problem with this approach is that it may cause the
arising of hot spots and nodes along preferred paths will consume
their energy resources quickly, possibly causing disconnection in the
network.
2.4
Cluster-Based Approach
We talked about hierarchical organization of the network in
tree-based approach. Another scheme to organize the network in
hierarchical manner is cluster-based approach. In cluster-based
approach, whole network is divided in to several clusters. Each
cluster has a cluster-head which is selected among cluster members.
Cluster-heads do the role of aggregator which aggregate data received
from cluster members locally and then transmit the result to sink.
The advantages and disadvantages of the cluster-based approaches is
very much similar to tree-based approaches.
K. Dasgupta et al. in [16] proposed a maximum lifetime data
aggregation (MLDA) algorithm which finds data gathering schedule
provided location of sensors and base-station, data packet size, and
energy of each sensor. A data gathering schedule specifies how data
packet are collected from sensors and transmitted to base station for
each round. A schedule can be thought of as a collection of
aggregation trees. In [6], they proposed heuristic-greedy
clustering-based MLDA based on MLDA algorithm. In this they
partitioned the network in to cluster and referred each cluster as
super-sensor. They then compute maximum lifetime schedule for the
super-sensors and then use this schedule to construct aggregation
trees for the sensors.
W. Choi et al. in [1] present a two-phase clustering (TPC) scheme.
Phase I of this scheme creates clusters with a cluster-head and each
node within that cluster form a direct link with cluster-head. Phase
I of this scheme is similar to various scheme used for clustering but
differ in one way that the cluster-head rotation is localized and is
done based on the remaining energy level of the sensor nodes which
minimize time variance of sensors and this lead to energy saving from
unnecessary cluster-head rotation. In phase II, each node within the
cluster searches for a neighbor closer than cluster-head which is
called data relay point and setup up a data relay link. Now the
sensor nodes within a cluster either use direct link or data relay
link to send their data to cluster head which is an energy efficient
scheme. The data relay point aggregates data at forwarding time to
another data relay point or cluster-head. In case of high network
density, TPC phase II will setup unnecessary data relay link between
neighbors as closely deployed sensor will sense same data and this
lead to a waste of energy.
Figure 2.4 Illustration of
Two Phase Clustering [1].
H. Cam et al. in [4] present energy efficient and secure pattern
based data aggregation protocol which is designed for clustered
environment. In conventional method data is aggregated at
cluster-head and cluster-head eliminate redundancy by checking the
content of data. This protocol says that instead of sending raw data
to cluster-head, the cluster members send corresponding pattern codes
to cluster-head for data aggregation. If multiple nodes send the same
pattern code then only one of them is finally selected for sending
actual data to cluster-head. For pattern matching, authors present a
pattern comparison algorithm.
Figure
2.5 Data transmission using ESPDA [4].
2.5
Simulation Tools
We have plenty of simulation tools or simulators for simulating
wireless networks. The simulators which are most popular are NS-2,
OPNET, OMNet++, J-Sim, GlomoSim, Qualnet, TOSSIM and so on. Since
wireless sensor networks are special type of wireless networks, most
of the simulators available are not enough supported for simulating a
wireless sensor network scenario. The literature shows that the
simulators which are mostly used for wireless sensor network are
NS-2, J-Sim, GlomoSim, OPNET, TOSSIM, PROWLER and even MATLAB is also
used.
NS-2 is the most popular and powerful simulator. NS-2 is an
object-oriented discrete time event simulator and its modular design
made it to be extensible. The detail of NS-2 is provided in appendix
A. To simulate sensor network, there had been made attempt to put
some add-ons. The most appreciable extension to NS-2 for wireless
sensor networks was developed in 2004 by Ian Downward of Naval
Research Laboratory (NRL) [18]. In this extension, they had created
phenomenon packet which trigger event for sensor nodes. They also
designed sensor agent, sensor application and other application to
support wireless sensor network. But there has been no improvement or
work done since 2006. This is the reason why all research is going by
using NS-2 without any extension support. Other simulators like
J-Sim, GlomoSim, OPNET, OMNet++ are exceptionally used by
researchers. Another simulators which is MATLAB based, known as
PROWLER, is sometimes used for simulating routing protocol or some
MAC issues in wireless sensor network. MATLAB is also used for
simulating some physical layer issues. But still there is no
efficient simulator which is purely dedicated to wireless sensor
network. More details for comparative study on simulator for wireless
sensor network could be found in [19].