Data Aggregation in Cluster-based Wireless Sensor Networks

Tuesday, February 9, 2016

Chapter 3. Energy-aware Balanced In-Network Aggregation

After the detailed description of wireless sensor networks, problem of data aggregation for maximizing network lifetime, and existing aggregation techniques in chapter 1 and 2, we now present our proposed protocol for solving data aggregation problem. This chapter first gives you an introduction of our protocol “Energy-aware Balanced In-Network Aggregation (E-BINA)” and then briefly describes the system and energy model that has been considered for designed protocol. After this a detailed description of the algorithm will be presented. As there are some issues involved with every protocol, so with our proposed protocol also. These issues will be presented in brief and finally we summarize this chapter.

3.1 Introduction

The three broad categories of data aggregation techniques that we have described in previous chapter are: tree-based approach, multi-path approach, and cluster-based approach. Focusing on cluster-based approach, we found that the existing protocols assume a sensor network which is divided in to several clusters. Depending upon the protocol operation, each cluster-head receives the data packets from some cluster-member or from all cluster-member nodes directly and then cluster-head perform aggregation operation.

Taking the advantageous features of tree-based approach, we have designed our protocol which takes the merits of both cluster-based and tree-based approach. E-BINA assumes a cluster-based wireless sensor network and applies tree-based approach inside each cluster. When a cluster is formed and cluster-head selected, it consider cluster-head as root and construct an aggregation tree over cluster-member nodes. The process of aggregation tree construction requires the sensor nodes to reduce their transmission power as sensor nodes now have to send their data packets to the neighbor node which is selected as parent. Energy consumed in wireless transmission is directly proportional to the square of the distance between nodes in communication [13]. Since cluster-member node now sends their data packets to the neighbor node instead of cluster-head, the transmission distance is reduced and hence the energy consumption of the sensor node. Likewise, overall energy consumption of sensor nodes in a cluster is reduced and so for the whole sensor network. Hence overall network lifetime will be increased.

Energy-aware Balanced In-Network Aggregation (E-BINA) protocol is energy-aware as it has taken the residual energy of sensor node in to consideration while constructing the aggregation tree. The protocol also balances the network load by selecting different parent for a node according to the energy level remain in the sensor node during the aggregation tree construction process. Each parent node performs aggregation of data packet that it receives from its child nodes and hence the protocol justifies the in-network aggregation concept.

3.2 System & Energy Model

Consider a homogeneous network of n sensor nodes and a base station or sink node distributed over a region. The location of the sensors and the base station are fixed and known priori. Each sensor produces some information as it monitors its vicinity. We assume that the whole network is divided in to several clusters; each cluster has a cluster-head (CH). The clustering and the selection of cluster-head (CH) can be done by using any existing protocol like LEACH, such that cluster-head (CH) is maximum k-hop away from any node in cluster. We also assume that after the formation of cluster the transmission power of all nodes is adjusted in such a way that they can perform single hop broadcast. Single hop broadcast refers to the operation of sending a packet to all single-hop neighbors [8].

Our energy model for the sensors is based on the first order radio model described in [17]. A sensor consumes Eelec = 50nJ/bit to run the transmitter or receiver circuitry and Eamp = 100pJ/bit/m² for the transmitter amplifier. Thus, the energy consumed by a sensor i in receiving a l-bit data packet is given by,

E_Rxi = E_elec . l (1)

while the energy consumed in transmitting a data packet to sensor j is given by,

E_Txi,j = E_elec . l + E_amp . d_i,j² . l (2)

where di,j is the distance between nodes i and j.

3.3 Protocol Description

In a cluster-based wireless sensor network, our algorithm is designed to provide energy-aware in-network data aggregation in a cluster. Each cluster uses this algorithm independently. In a cluster, the nodes can be categorized as: one cluster-head (CH) and other cluster member node.

Function of cluster-head (CH)

Receive a query from base station.
Cluster-head (CH) sends configuration packets to all single-hop neighbors.
Receive data packets from all single hop neighbors.
Finally aggregate the data packets received and route it to base station.

Function of cluster member

Receive configuration packets from neighbor nodes.
Update and forward configuration packets to all single-hop neighbors.
Receive data packets from neighbor nodes.
Aggregate all data packets by applying redundancy factor and send it to selected parent node.

The algorithm works in two phases: Configuration packet flow and Data packet flow that are described below.

Figure 3.1 A typical scenario of data aggregation in a cluster.

3.3.1 Configuration Packet Flow

Initially cluster-head broadcast configuration packet to all its neighbors. Configuration packet contains the following fields:

Node Id  location of node that each node know in prior

Hop Distance  distance from cluster-head in terms of hop count (set zero for CH)

Residual Energy  current energy in node

Each node upon receiving the broadcast configuration packet that is originated from cluster-head adds the sender of the packet in the list of its possible parents with its node id, hop distance, residual energy. After this the node again broadcast the configuration packet to all its neighbors by updating node id to its own id, incrementing hop distance by one and its own residual energy. This process continues until all the nodes in cluster receive configuration packet. All nodes that broadcast the configuration packet do so by predefined and common signal strength that is know to all the nodes.

3.3.2 Data Packet Flow

When all nodes receives configuration packets, each node now select the parent to which it can forward the data packet. The parent selection procedure is shown in fig. 3.2.

Each node looks in to the list of all its possible parents. The neighbor node which has least hop distance, ie closest to cluster-head, is selected as parent by a node. In case when two neighbor nodes have the least but equal hop distance, the node checks the residual energy of two neighbor nodes. The neighbor node that has greater residual energy is now selected as parent. In both the cases, node also calculate the difference of residual energy of two neighbor nodes, which have least hop distance, and checks whether this difference is less than the threshold or not. If it is then the node selects the parent as usual. But if it is not then the node selects other neighbor node as its parent.

Figure 3.2 Parent selection procedure.

This allows a node that has more available resources to be selected as a parent node. This also balances the consumption of energy of nodes in the cluster and leads to die out of nodes nearly at same time.

After selecting the parent node, each node now forwards its data to its parent. When a parent node receives multiple data packets from its neighbor nodes, it performs aggregation operation by eliminating redundancy in the data. Each parent node checks the equation below:

| V_Ni– V_Nj | < K (3)

where, V_Ni data value of node i

V_Nj  data value of node j

K  redundancy factor

If this equation satisfies, the parent node perform aggregation by applying any aggregation functions like MIN, MAX, and AVG on the values of data packet and send only one packet while discarding other packets. But if this equation do not satisfies, the parent performs aggregation by simply concatenating two data packet in to one keeping value of both packets intact.

The selection of value for redundancy factor (K) has a trade-off between precision and energy consumption. If the application wants more precision, it should select a low value for redundancy factor otherwise a high value. Selecting high value for K means sending only one value thus less number of bits needs to be transmitted and hence low energy consumption.

3.4 Issues

E-BINA significantly reduces the energy consumption of all nodes in the cluster by reducing the transmission power of all nodes. The one issue that arises in our designed protocol is that after the formation of cluster and selection of cluster-head, all nodes have to reduce their transmission power. All nodes have to reduce their transmission power in such a way that they could only reach their single-hop distance neighbors. This operation requires some kind of synchronization among all nodes. The nodes have to program before to perform the above task. For this, the programming task needs little extra effort. Now when cluster-head received all data packets and aggregated them, it has to now increase its transmission power so that it can transmit the final aggregated data up in the cluster-head hierarchy towards the sink. Another issue that remains with any tree based approach is robustness of the system. In case of failure of any intermediate node in the tree hierarchy during operation will lead to the loss of data.

Though E-BINA requires all nodes to adjust their transmission power after the deployment and requires extra effort for programming before, it conserves a significant amount of energy. So in the presence of the above issue, E-BINA outperforms when we try to maximize the network lifetime.

3.5 Summary

E-BINA takes the advantageous features of cluster-based and tree-based approaches.
Instead of sending data directly to cluster-head, nodes form an aggregation tree in each cluster.
In aggregation tree, the selection of parent is based on two factors: hop distance and residual energy of node.
Energy model is based on first order radio model.
Protocol requires all nodes to adjust their transmission power.
Protocol defines two types of packets: configuration packet and data packet.
Each node eliminates redundancy in the data by satisfying equation (3).
In-network aggregation is performed during data flow towards cluster-head.
Issue: Adjustment of transmission power after deployment requires extra programming effort.

Tuesday, February 2, 2016

Chapter 2. Data Aggregation : An Overview

Wireless sensor network have recently been the focus of many research efforts and emerged as an important new area in wireless technology. As chapter 1 already discuss about the problem of data aggregation, this chapter presents an overview of data aggregation. This chapter first discusses in-network aggregation and then different approaches that are widely used for data aggregation.

2.1 In-Network Aggregation

In a typical sensor network scenario, different node collect data from the environment and then send it to some central node or sink which analyze and process the data and then send it to the application. But in many cases, data produced by different node can be jointly processed while being forwarded to the sink node. So in-network aggregation deals with this distributed processing of data within the network.

Data aggregation techniques explore how the data is to be routed in the network as well as the processing method that are applied on the packets received by a node. They have a great impact on the energy consumption of nodes and thus on network efficiency by reducing number of transmission or length of packet. Elena Fosolo et al in [7] defines the in-network aggregation process as follows: “In-network aggregation is the global process of gathering and routing information through a multi-hop network, processing data at intermediate nodes with the objective of reducing resource consumption (in particular energy), thereby increasing network lifetime.”

There are two approaches for in-network aggregation: with size reduction and without size reduction. In-network aggregation with size reduction refers to the process of combining & compressing the data packets received by a node from its neighbors in order to reduce the packet length to be transmitted or forwarded towards sink. As an example, consider the situation when a node receives two packets which have a spatial correlated data. In this case it is worthless to send both packets. Instead of that one should apply any function like AVG, MAX, MIN and then send a single packet. This approach considerably reduces the amount of bits transmitted in the network and thus saving a lot of energy but on the other hand, it also reduces the precision of value of data received. In-network aggregation without size reduction refers to the process merging data packets received from different neighbors in to a single data packet but without processing the value of data. As an example, two packets may contain different physical quantities (like temperature & humidity) and they can be merged in to a single packet by keeping both values intact but keeping a single header. This approach preserves the value of data and thus transmit more bits in the network but still reduce the overhead by keeping single header.

This of the two approaches to use depends on many factors like the type of application, data rate, network characteristics and so on. There is also a trade-off between energy consumption and precision of data for the two approaches.

Most of the work done till now on in-network aggregation mainly deals with problem of forwarding packets from source to sink, to facilitate aggregation therein. Actually the main idea behind were to enhance existing routing protocols such that they can efficiently aggregate data. So till now, most of the data aggregation techniques fall under three categories. They are tree-based approaches, multi-path approaches, and cluster-based approaches. There also some hybrid approaches that combines any of the three techniques above. So, all the three approaches will be described in coming sections with giving details of some of the main techniques by different authors.

2.2 Tree-Based Approach

The simplest way to aggregate data is to organize the nodes in a hierarchical manner and then select some nodes as the aggregation point or aggregators. The tree-based approach perform aggregation by constructing an aggregation tree, which could be a minimum spanning tree, rooted at sink and source nodes are considered as leaves. Each node has a parent node to forward its data. Flow of data starts from leaves nodes up to the sink and therein the aggregation done by parent nodes. The way this approach operates has some drawbacks. As we know like any wireless network the wireless sensor networks are also not free from failures. In case of packet loss at any level of tree, the data will be lost not only for a single level but for whole related sub-tree as well. In spite of high cost for maintaining tree structure in dynamic networks and scarce robustness of the system, this approach is very much suitable for designing optimal aggregation technique and energy-efficient techniques.

S. Madden et al. in [14] proposed a data-centric protocol which is based on aggregation tress, known as Tiny Aggregation (TAG) approach [14]. TAG works in two phases: distribution phase and collection phase. In distribution phase, TAG organizes nodes in to a routing tree rooted at sink. The tree formation starts with broadcasting a message from sink specify level or distance from root. When a node receive this message it sets its own level to be the level of message plus one and elect parent as node from which it receives the message. After that, node rebroadcast this message with its own level. This process continues until all nodes elect their parent. After tree formation, sink send queries along structure to all nodes in the network. TAG uses database query language (SQL) for selection and aggregation functions. In collection phase, data is forwarded and aggregated from leaves nodes to root. A parent node has to wait for data from all its child node before it can send its aggregate up the tree. Apart from the simple aggregation function provided by SQL (eg: COUNT, MIN, MAX, SUM, and AVG), TAG also partitions aggregates according to the duplicate sensitivity, exemplary and summary, and monotonic properties. Though TAG periodically refresh tree structure of network but as most of the tree-based schemes are inefficient for dynamic network, so TAG may be.

C. Intanagonwiwat et al. in [3] proposed a reactive data-centric protocol for applications where sink ask some specific information by flooding, known as directed diffusion paradigm. The main idea behind directed diffusion paradigm is to combine data coming from different source and en-route them by eliminating redundancy, minimizing the number of data transmission; thus maximizing network lifetime. Directed diffusion consists of several elements: interests, data messages, gradients, and reinforcements.

Figure 2.1 Simplified schematic for directed diffusion. (a) Interest propagation. (b) Initial gradients setup. (c) Data delivery along reinforced path [3].

The base station (BS) requests data by broadcasting an interest message which contains a description of a sensing task. This interest message propagates through the network hop-by-hop and each node also broadcast interest message to its neighbor. As interest message propagates throughout the network, gradients are setup by every node within the network. The gradient direction is set toward the neighboring node from which the interest is received. This process continues until gradients are setup from source node to base station. Loops are not checked at this stage but removed at later stage. After this path of information flow are formed and then best path are reinforced to prevent further flooding according to a local rule. Data aggregation took place on the way of different paths from different sources to base station or sink. The base station periodically refresh & resend the interest message as soon as it start to receives data from sources to provide reliability. The problem with directed diffusion is that it may not be applied to applications (e.g. environmental monitoring) that require continuous data delivery to base station. This is because query driven on demand data model may not help in this regard. Also matching data to queries might require some extra overhead at the sensor nodes. Mobility of sink nodes can also degrade the performance as path from sources to sinks cannot be updated until next interest message is flooded throughout the network. To cope up with above issue if introduce frequent flooding then also too much overhead of bandwidth and battery power will be introduced. Furthermore, exploratory data follow all possible paths in the network following gradients which lead to unnecessary communications overhead.

M. Lee et al. in [2] proposed a new low-control-overhead data dissemination scheme, which they called as pseudo-distance data dissemination (PDDD), for efficiently disseminating data from all sensor nodes to mobile sink. Some assumption have been made, they are: (1) all source nodes maintain routes to mobile sink node, (2) no periodically messaging for topological changes due to mobile sink node, (3) all link are bi-directional and no control messages are lost, (4) mobile sink nodes have unlimited battery power, so no need to care about battery efficiency of sink node, and (5) network partitioning is not considered. Data dissemination process is influenced by directed diffusion [3]. Though mobile sink periodically broadcast interest message, sensor nodes do not send exploratory data and do not wait reinforcement message because each sensor node already has routes to the sink node. After getting interest message, adjacent nodes set a parent-child relationship using pseudo-distance of each node and finally a partial ordered graph (POG) has been build. Optimal data dissemination is achieved in terms of path length by forwarding packets to a parent node until topology is unchanged. Then each sensor node is assigned a level for a corresponding sink node with pseudo-distance. In order to overcome the shortcoming of POG, author used totally ordered graph (TOG) in place of POG. The problem identified in this approach is that due to mobility of sink node all sensor nodes have to maintain routes and for any change in topology nodes have to again change route accordingly which led to energy waste.

Marc Lee et al. in [8] proposed an energy-aware spanning tree algorithm for data aggregation, referred as E-Span. E-Span is a distributed protocol in which source node that has highest residual energy is chosen as root. Other source nodes choose their parent based on residual energy and distance to the root. The protocol uses configuration message to exchange information of node i.e., residual energy and distance to the root. Each node performs single-hop broadcast operation to send packets. Single-hop broadcast refers to the operation of sending a packet to all single-hop neighbors [8].

Figure 2.2 E-Span protocol [8].

2.3 Multi-path Approach

One of the main drawbacks of tree-based approach is the scarce robustness of the system. To overcome this drawback, a new approach was proposed by many researchers. Instead of sending partially aggregated data to a single parent node in aggregation tree, a node could send data over multiple paths. The idea behind is that each node can send the data to its possibly multiple neighbors by exploiting the wireless medium characteristic. Hence data will flow from sources to sink along multiple paths and aggregation can be performed by each intermediate node. Clearly schemes using this approach will make the system robust but with some extra overhead. One of the aggregation structures that fit well with this approach is ring topology, where network is divided in to concentric circles with defining levels according to the hop distance from sink.

S. Nath et al. in [15] presented a data aggregation technique using multi-path approach, known as synopsis diffusion. Synopsis diffusion works in two phases: distribution of queries and data retrieval phase. During distribution of queries phase, a node sends a query in the network. The network nodes then form a set of rings around the querying node. The node which is i hop away from querying node is considered is ring R_i. In the second phase, aggregation starts from outermost ring and propagate level by level towards the sink. Here a source node can have multiple paths towards sink.

Figure 2.3 Examples of aggregation paths over a ring structure [7].

L. Gatani et al. in [5] describe a new strategy for data gathering in wireless sensor network that consider both issues: energy efficiency and robustness. Authors first say that single path to connect each node to the base station is simple & energy-saving approach but expose a high risk of disconnection due to node/link failures. But multi-path approach would require more nodes to participate with consequent waste of energy. Authors present a clever use of multi-path only when there is loss of packet which is implemented by smart caching of data at sensor nodes. Authors also argue that in many practical situation data may be gathered only from a particular region, so they use a different approach that relies on a spanning tree and provides alternative paths only when a malfunctioning is detected. Algorithm adopts a tree-based approach for forwarding packets through the network. In the ideal situation when no failures occur, this is certainly the best choice, as the minimum number of nodes is engaged in the transmission phase. In the presence of link or node failures, the algorithm will discover alternative paths, so as ensure the delivery of as many packets as possible within the time constraints. The problem with this approach is that it may cause the arising of hot spots and nodes along preferred paths will consume their energy resources quickly, possibly causing disconnection in the network.

2.4 Cluster-Based Approach

We talked about hierarchical organization of the network in tree-based approach. Another scheme to organize the network in hierarchical manner is cluster-based approach. In cluster-based approach, whole network is divided in to several clusters. Each cluster has a cluster-head which is selected among cluster members. Cluster-heads do the role of aggregator which aggregate data received from cluster members locally and then transmit the result to sink. The advantages and disadvantages of the cluster-based approaches is very much similar to tree-based approaches.

K. Dasgupta et al. in [16] proposed a maximum lifetime data aggregation (MLDA) algorithm which finds data gathering schedule provided location of sensors and base-station, data packet size, and energy of each sensor. A data gathering schedule specifies how data packet are collected from sensors and transmitted to base station for each round. A schedule can be thought of as a collection of aggregation trees. In [6], they proposed heuristic-greedy clustering-based MLDA based on MLDA algorithm. In this they partitioned the network in to cluster and referred each cluster as super-sensor. They then compute maximum lifetime schedule for the super-sensors and then use this schedule to construct aggregation trees for the sensors.

W. Choi et al. in [1] present a two-phase clustering (TPC) scheme. Phase I of this scheme creates clusters with a cluster-head and each node within that cluster form a direct link with cluster-head. Phase I of this scheme is similar to various scheme used for clustering but differ in one way that the cluster-head rotation is localized and is done based on the remaining energy level of the sensor nodes which minimize time variance of sensors and this lead to energy saving from unnecessary cluster-head rotation. In phase II, each node within the cluster searches for a neighbor closer than cluster-head which is called data relay point and setup up a data relay link. Now the sensor nodes within a cluster either use direct link or data relay link to send their data to cluster head which is an energy efficient scheme. The data relay point aggregates data at forwarding time to another data relay point or cluster-head. In case of high network density, TPC phase II will setup unnecessary data relay link between neighbors as closely deployed sensor will sense same data and this lead to a waste of energy.

Figure 2.4 Illustration of Two Phase Clustering [1].

H. Cam et al. in [4] present energy efficient and secure pattern based data aggregation protocol which is designed for clustered environment. In conventional method data is aggregated at cluster-head and cluster-head eliminate redundancy by checking the content of data. This protocol says that instead of sending raw data to cluster-head, the cluster members send corresponding pattern codes to cluster-head for data aggregation. If multiple nodes send the same pattern code then only one of them is finally selected for sending actual data to cluster-head. For pattern matching, authors present a pattern comparison algorithm.

Figure 2.5 Data transmission using ESPDA [4].

2.5 Simulation Tools

We have plenty of simulation tools or simulators for simulating wireless networks. The simulators which are most popular are NS-2, OPNET, OMNet++, J-Sim, GlomoSim, Qualnet, TOSSIM and so on. Since wireless sensor networks are special type of wireless networks, most of the simulators available are not enough supported for simulating a wireless sensor network scenario. The literature shows that the simulators which are mostly used for wireless sensor network are NS-2, J-Sim, GlomoSim, OPNET, TOSSIM, PROWLER and even MATLAB is also used.

NS-2 is the most popular and powerful simulator. NS-2 is an object-oriented discrete time event simulator and its modular design made it to be extensible. The detail of NS-2 is provided in appendix A. To simulate sensor network, there had been made attempt to put some add-ons. The most appreciable extension to NS-2 for wireless sensor networks was developed in 2004 by Ian Downward of Naval Research Laboratory (NRL) [18]. In this extension, they had created phenomenon packet which trigger event for sensor nodes. They also designed sensor agent, sensor application and other application to support wireless sensor network. But there has been no improvement or work done since 2006. This is the reason why all research is going by using NS-2 without any extension support. Other simulators like J-Sim, GlomoSim, OPNET, OMNet++ are exceptionally used by researchers. Another simulators which is MATLAB based, known as PROWLER, is sometimes used for simulating routing protocol or some MAC issues in wireless sensor network. MATLAB is also used for simulating some physical layer issues. But still there is no efficient simulator which is purely dedicated to wireless sensor network. More details for comparative study on simulator for wireless sensor network could be found in [19].

Tuesday, February 9, 2016

Chapter 3. Energy-aware Balanced In-Network Aggregation

Tuesday, February 2, 2016

Chapter 2. Data Aggregation : An Overview

Search This Blog