Data Aggregation in Cluster-based Wireless Sensor Networks: February 2016

Tuesday, February 23, 2016

REFERENCES

[1] W. Choi, P. Shah, and S. K. Das, “A Framework for Energy-Saving Data Gathering Using Two-Phase Clustering in Wireless Sensor Networks”, in Proceedings of the International Conference on Mobile and Ubiquitous Systems: Networking and Services (MobiQuitous), Boston, 2004, pp. 203-212.

[2] M. Lee, and S. Lee, “Data Dissemination for Wireless Sensor Networks”, in Proceedings of the 10^th IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing (ISORC’07).

[3] C. Intanagonwiwat, R. Govindan, D. Estrin, J. Heidemann, and F. Silva, “Directed Diffusion for Wireless Sensor Networking”, IEEE/ACM Transactions on Networking, Vol. 11, no. 1, Feb 2003.

[4] H. Cam, S. Ozdemir, P. Nair, and D. Muthuavinashiappan, “ESPDA: Energy-Efficient and Secure Pattern-based Data Aggregation for Wireless Sensor Networks”, in Proceedings of IEEE Sensor- The Second IEEE Conference on Sensors, Toronto, Canada, Oct. 22-24, 2003, pp. 732-736.

[5] L. Gatani, G. Lo Re, and M. Ortolani, “Robust and Efficient Data Gathering for Wireless Sensor Networks”, in Proceeding of the 39th Hawaii International Conference on System Sciences – 2006.

[6] K. Dasgupta, K. Kalpakis, and P. Namjoshi, “An Efficient Clustering-based Heuristic for Data Gathering and Aggregation in Sensor Networks”, IEEE 2003.

[7] E. Fasolo, M. Rossi, J. Widmer, and M. Zorzi, “In-Network Aggregation Techniques for Wireless Sensor Networks: A Survey”, IEEE Wireless communication 2007.

[8] M. Lee and V.W.S. Wong, “An Energy-aware Spanning Tree Algorithm for Data Aggregation in Wireless Sensor Networks,” IEEE PacRrim 2005, Victoria, BC, Canada, Aug. 2005.

[9] M. Ding, X. Cheng, and G. Xue, “Aggregation tree construction in sensor networks,” in Proc. of IEEE VTC’03, Vol. 4, Orlando, FL, Oct. 2003.

[10] “The Design Space of Wireless Sensor Networks” by Kay R¨omer and Friedemann Mattern

http://www.vs.inf.ethz.ch/publ/papers/wsn-designspace.pdf

[11] http://en.wikipedia.org/wiki/Wireless_Sensor_Networks

[12] “Handbook of Sensor Networks: Compact Wireless and Wired Sensing Systems” by Mohammad Ilyas and Imad Mahgoub.

[13] “Ad-hoc & Sensor Networks” by Carlos de Morais Cordeiro and Dharma Prakash Agrawal.

[14] S. Madden et al., “TAG: a Tiny Aggregation Service for Ad-hoc Sensor Networks,” OSDI 2002, Boston, MA, Dec. 2002.

[15] S. Nath et al., “Synopsis Diffusion for Robust Aggregation in Sensor Networks,” ACM SenSys 2004, Baltimore, MD, Nov. 2004.

[16] K. Dasgupta et al., “Maximum Lifetime Data Gathering and Aggregation in Wireless Sensor Networks”, In Proc. of IEEE Networks’02 Conference, 2002.

[17] W. Heinzelman, A.P. Chandrakasan, and H. Balakrishnan, “Energy-Efficient Communication Protocols for Wireless Microsensor Networks”, In Proc. Of Hawaiian Intl. Conference on System Science, 2000.

[18] NRL’s Sensor Network Extension to ns-2. http://nrlsensorsim.pf.itd.nrl.navy.mil/

[19] David curren, “A survey of Simulation in Sensor Networks”. www.cs.binghamton.edu/~kang/teaching/cs580s/david.pdf

[20] The Network Simulator NS-2, http://www.isi.edu/nsnam/ns/ , January 2008.

Network Simulator-2

The Network Simulator version 2 (NS-2) is a deterministic discrete event network simulator, initiated at the Lawrence Berkeley National Laboratory (LBNL) through the DARPA funded Virtual InterNetwork Testbed (VINT) project. The VINT project is collaboration between the Information Sciences Institute (ISI) at the University of Southern California (USC), Xerox's Palo Alto Research Center (Xerox PARC), University of California at Berkeley (UCB) and LBNL [20].

NS-2 was initially created in 1989 as an alternative to the REAL Network Simulator. Since then there is significant growth in uses and width of NS project. Although there are several different network simulators available today, ns-2 is one of the most common. NS-2 differs from most of the others by being open source software, supplying the source code for free to anyone that wants it. Whereas most commercial network simulators will offer support and a guarantee but keeping the moneymaking source code for themselves. The release of the source code helps users to create their own functions and subprograms, but also makes it easier to implement them into the ns-2 environment. One of the main benefits for the ns project group releasing the source code is that independent researchers can help in the development of ns-2. It is fairly common that a researcher contributes with the code of a non-implemented protocol or algorithm, after constructing it for his studies.

It should be noted that NS-2 is a research progressive effort and not a kind of commercial software release. The difference is that there are very few people in the ns project group compared to ordinary software, leading to difficulties in supporting all the users. That problem has lead to the solution of having a huge mailing list (http://mailman.isi.edu/mailman/listinfo/ns-users) for anyone interested, as well as a complete archive of all the mails ever been sent to this mailing list. The mailing list is based on the idea of user helping user, taking the load of the ns project group. The mailing list and the archives are a huge help for all users of ns-2, no matter old or new, since usually someone else has had the same problem before.

Another important thing to note is that NS-2 is an ongoing progressive project and hence can not be considered as a complete product. This is the reason why it is free of cost and only offers the mailing lists as support. The people that are in charge of the project heavily rely on the users to find bugs and faults and reporting these when discovered. This also leaves the validating of results to the user, but the user is not alone so help is just an email away. The most commonly used protocols are so well implemented and checked so the main concerns are the new implementations. New implementations usually start out as a research assignment not linked to the ns project group. Since the project group does not have a full company helping them in verification and implementation they have no possibility to do everything themselves thus encouraging any help they can get.

A 1 The NS-2 structure

NS-2 is made up of hundreds of smaller programs, separated to help the user sort through and find what he or she is looking for. Every separate protocol, as well as variations of the same, sometimes has separate files. Though some are simple, but still dependent on the parental class [20].

Figure A.1 The basic structure of NS-2

A 1.1 C++

C++ is the predominant programming language in ns-2. It is the language used for all the small programs that make up the ns-2 hierarchy. C++, being one of the most common programming languages and specially designed for object- oriented coding, was therefore a logical choice what language to be used. This helps when the user wants to either understand the code or do some alterations to the code. There are several books about C++ and hundreds, if not thousands, of pages on the Internet about C++ simplifying the search for help or answers concerning the ns-2 code.

A 1.2 OTcl

Object Tcl (OTcl) is object-oriented version of the command and syntax driven programming language Tool Command Language (Tcl). This is the second of the two programming languages that NS-2 uses. The front-end interpreter in NS-2 is OTcl which link the script type language of Tcl to the C++ backbone of NS-2. Together these two different languages create a script controlled C++ environment. This helps when creating a simulation, simply writing a script that will be carried out when running the simulation. These scripts will be the formula for a simulation and is needed for setting the specifications of the simulation itself. Without a script properly defining a network topology as well as the data-rows, both type and location, nothing will happen. For a more in depth presentation of these scripts, it is better to have a closer look at the introduction and related chapters in the NS-2 manual.

A 1.3 Nodes

A node is exactly what it sounds like, a node in the network. A node can be either an end connection or an intermediate point in the network. All agents and links must be connected to a node to work. There are also different kinds of nodes based on the kind of network that is to be simulated. The main types are node and mobile node, where node is used in most wired networks and the mobile node for wireless networks. There are several different commands for setting the node protocols to be used, for instance what kind of routing is to be used or if there is a desire to specify a route that differs from the shortest one. Most of the commands for node and mobile node can be easily found in the ns documentation. Nodes and the closely connected link creating commands, like simplex link and duplex link, could be considered to simulate the behavior of both the Link Layer.

A 1.4 Agents

An agent is the collective name for most of the protocols you can find in the transport layer. In the ns-2 documentation agents are defined as the endpoints where packets are created and consumed. All the agents defined in ns-2, like tcp, udp etc., are all connected to their parent class, simply called Agent. This is where their general behavior is set and the offspring classes are mostly based on some alterations to the inherent functions in the parent class. The modified functions will overwrite the old and thereby change the performance in order to simulate the wanted protocol.

A 1.5 Applications

The applications in ns-2 are related to the Application Layer in the TCP/IP suite. The hierarchy here works in the similar way as in the agents case. To simulate some of the most important higher functions in network communication, the ns-2 applications are used. Since the purpose of ns-2 is not to simulate software, the applications only represent some different aspects of the higher functions. Only a few of the higher layer protocols has been implemented, since some are quite similar when it comes to using the lower functions of the TCP/IP stack. For instance there is no use adding both a SMTP and a HTTP application since they both use TCP to transfer small amounts of data in a similar way. The only applications incorporated in the release version of ns-2 are a number of different traffic generators for use with UDP and telnet and FTP for using TCP. All the applications are script controlled and when concerning the traffic generators, you set the interval and packet-size of the traffic. FTP can be requested to send a data packet whenever the user wants to, or to start a transfer of a file of an arbitrary size. If starting an FTP transmission and not setting a file-size the transmission will go on until someone calls a stop.

A 1.6 NAM

The Network Animator NAM is a graphic tool to use with ns-2. It requires a nam-tracefile recorded during the simulation and will then show a visual representation of the simulation. This will give the user the possibility to view the traffic packet by packet as they move along the different links in the network. NAM offers the possibility of tracing a single packet during its travel and the possibility to move the nodes around for a user to draw up his network topology according to his own wishes. Since the simulation has already been performed there is no possibility for the user to change the links or any other aspect of the simulation except the representation. The existence of an X-server allows NAM to be able to open a graphical window. Therefore if NAM is to work, there must be a version of X-server running.

Chapter 4. Performance Analysis

In previous chapter we have explained our proposed protocol for data aggregation in cluster-based wireless sensor networks, called E-BINA. Now in this chapter we are going to analyses the performance of our protocol. We will show how our protocol outperforms in terms of energy efficiency in comparison to conventional protocol for data aggregation in cluster-based wireless sensor networks by presenting simulation analysis with different simulation parameters taken and results obtained.

4.1 Simulation Analysis

Though many simulation tools are available for wireless sensor networks as discussed in chapter 2, we have chosen Network Simualtor-2 (NS-2) [20], in particular NS-2.29.3, as our tool to simulate the proposed protocol.

4.1.1 Simulation Setup

A square field of 160m X 160m is taken where 11 nodes are randomly deployed. One node is designated as cluster-head (CH) and one node is designated data source.

Command:

set val(sc) "/root/Desktop/mov1"

set val(cp) ""

#setdest is found in "/root/ns-allinone-2.29/ns-2.29/indep-utils/cmu-#scen-gen/setdest"

exec ./setdest -n 10 -M 0.1 -p 21 -x 160 -y 160 -t 20 > mov1 &

puts "loadin RANDom sceanrio"

source $val(sc)

The snapshot of node scenario in NAM is shown below. The three colors of node show the energy levels of nodes. The initial color of nodes is green. When energy drop to first threshold level the color turns to yellow and when drop to second threshold level the color turns to red. After this level a node is dead and color is red.

Figure 4.1 Random nodes scenario in NAM.

Energy model is ON. Transmit power, Receive power, Idle power, Sleep power, Transition power, and Initial Energy of nodes is set accordingly. Also transmission range is set by controlling the transmit power and receiving threshold of antenna of nodes. All other parameters are taken default values.

Command:

set val(energymodel) EnergyModel ;# energy model is on

set val(initialenergy) 1 ;# Initial energy in Joules

set val(sleeppower) 0.0 ;# sleep power in Watt

set val(tp) 0.002 ;# transition power consumption(Watt)in state transition from sleep to idle (active)

set val(tt) 0.005 ;# transition time(second) use instate transition from sleep to idle (active)

set val(ip) 0.035 ;# idle power

Transmit power, Receive power, transmit power & receiving threshold of antenna of nodes are set with different values to use different transmission range of nodes and show the comparison between E-BINA and conventional protocol.

Case 1: E-BINA

set val(rxPower) 0.395 ;# receive power in Watt

set val(txPower) 0.66 ;# transmit power in Watt

Phy/WirelessPhy set Pt_ 8.5872e-4 ;# 40m

Phy/WirelessPhy set RXThresh_ 3.66152e-10

Case 2: Conventional protocol

set val(rxPower) 1.0 ;# receieve power in Watt

set val(txPower) 2.0 ;# transmit power in Watt

Phy/WirelessPhy set Pt_ 7.214e-3 ;# 100m

Phy/WirelessPhy set RXThresh_ 3.65209e-10

Other values that could be used are:

#Phy/WirelessPhy set Pt_ 1.33826e-3 ;# Transmission range 50m,

#Phy/WirelessPhy set Pt_ 0.281838 ;# 250m

The value of RXThresh_ is obtained by executing threshold.cc defined in "/root/ns-allinone-2.29/ns-2.29/indep-utils/propogation”. Snapshot is given below:

Figure 4.2 Snapshot of console

Some other parameters used are:

Channel Type  Wireless channel

Propagation Model  Two Ray Ground

MAC Type  802.11

Network Interface Type  Phy/WirelessPhy

Interface Queue Type  Queue/DropTail/PriQueue

Antenna Model  Antenna/OmniAntenna

Routing Protocol  AODV

Simulation Time  20 sec

Parameters set for data transfer are:

Cluster-head  node 10 with UDP agent attached

Source node  node 0 with UDP agent attached

Traffic Type  CBR with a rate of 5 packets / second

Packet Size  136 bytes

4.1.2 Simulation Run

Case 1: E-BINA

For E-BINA we set transmission range of 40m such that a node sends its data to its single-hop neighbor and data is forwarded in a multi-hop fashion. Figure 4.3 shows the data transfer between node 0 and node 10. Node 1 and 2 are relay nodes. Since the transmission range is set to 40m, node 0 can only send its data to node 1 and so other nodes.

Figure 4.3 Data transfer between node 0 and 10 through node 1 & 2.

Figure 4.4 shows that after some time energy level of node 2, 1, 0, and 10 dropped to first threshold level and hence the color of nodes turns to yellow.

Figure 4.4 Energy level of nodes drop to first threshold.

Figure 4.5 shows that after some more time energy level of node 1, 2, 0, and 10 dropped to second threshold level and hence the color of nodes turns to red.

Figure 4.5 Energy level of nodes drop to second threshold.

Case 2: Conventional Protocol

In conventional method, all nodes in a cluster send their data directly to cluster-head. For this reason we set transmission range of nodes to be 100m so that source node 0 can send data directly to node 10. To able to have a large transmission range the transmitting and receiving power of nodes are more than double of as in case 1.

The data transfer start between node 0 and node 10 directly. As happened in last case, again after some time energy level of node 0 and node 10 decreased to first threshold level and color of nodes change from green to yellow and then energy level of both nodes go down to second threshold level and nodes turn to red.

4.1.3 Simulation Results

A. Conserving Energy

We determine residual energy of the source node which is defined as the remaining energy of a node and considered that as the metric to prove energy efficiency of our proposed protocol. We used this metric to show the impact of transmission power on energy reduction. Figure 4.9 shows the significant reduction in energy consumption by using E-BINA when compared with conventional protocol. This shows the benefit of sending data in a multi-hop fashion towards cluster-head.

Figure 4.6 Residual energy of source as a function of time.

B. Throughput

We have also measured the throughput of the receiving node i.e. cluster-head node 10 in our scenario for both the cases. Throughput of a node is defined as the average rate of successful message delivery over a communication channel. Figure 4.10 show that E-BINA achieves high throughput in comparison with conventional protocol.

Figure 4.7 Throughput as function of time.

C. Network Density

By considering the changes in the network density, we also study the relationship between the network lifetime and network density. In our experiment we have considered the change in the residual energy of source node i.e. node 0 in the end of simulation. The density of network is calculated via equation [9]:

λ = NπR² / A²

Where, N is sensor number,

R is sensor range,

A is sensor area.

By keeping network area constant and increasing the number of nodes, we have increased network density. Due to increase in the network density, the hop count between source node and sink node also increases. When hop count increases node now transmit data to nearer node with less transmit power and hence consume less energy. Figure 4.8 shows the increase in the residual energy when we increase the hop count. We have taken N as 11, 21, 31, 41, and 51.

Figure 4.8 Effect of network density

D. Packet Delivery Ratio

Besides examining the network lifetime extension roughly via energy saving, we also evaluate the network efficiency influenced by E-BINA. Here, we measure the efficiency in term of data delivery ratio which is defined as the number of received packets divided by the number of sent packets for a certain time period. From our simulation results illustrated in figure 4.9, we find that this ratio does not change much while the network is alive. It shows the stable performance of our protocol. When the network energy is running out, the data delivery ratio collapses rapidly. This phenomenon probably can be taken as a sign of the network death.

Figure 4.9 Packet delivery ratio of the network.

Tuesday, February 9, 2016

Chapter 3. Energy-aware Balanced In-Network Aggregation

After the detailed description of wireless sensor networks, problem of data aggregation for maximizing network lifetime, and existing aggregation techniques in chapter 1 and 2, we now present our proposed protocol for solving data aggregation problem. This chapter first gives you an introduction of our protocol “Energy-aware Balanced In-Network Aggregation (E-BINA)” and then briefly describes the system and energy model that has been considered for designed protocol. After this a detailed description of the algorithm will be presented. As there are some issues involved with every protocol, so with our proposed protocol also. These issues will be presented in brief and finally we summarize this chapter.

3.1 Introduction

The three broad categories of data aggregation techniques that we have described in previous chapter are: tree-based approach, multi-path approach, and cluster-based approach. Focusing on cluster-based approach, we found that the existing protocols assume a sensor network which is divided in to several clusters. Depending upon the protocol operation, each cluster-head receives the data packets from some cluster-member or from all cluster-member nodes directly and then cluster-head perform aggregation operation.

Taking the advantageous features of tree-based approach, we have designed our protocol which takes the merits of both cluster-based and tree-based approach. E-BINA assumes a cluster-based wireless sensor network and applies tree-based approach inside each cluster. When a cluster is formed and cluster-head selected, it consider cluster-head as root and construct an aggregation tree over cluster-member nodes. The process of aggregation tree construction requires the sensor nodes to reduce their transmission power as sensor nodes now have to send their data packets to the neighbor node which is selected as parent. Energy consumed in wireless transmission is directly proportional to the square of the distance between nodes in communication [13]. Since cluster-member node now sends their data packets to the neighbor node instead of cluster-head, the transmission distance is reduced and hence the energy consumption of the sensor node. Likewise, overall energy consumption of sensor nodes in a cluster is reduced and so for the whole sensor network. Hence overall network lifetime will be increased.

Energy-aware Balanced In-Network Aggregation (E-BINA) protocol is energy-aware as it has taken the residual energy of sensor node in to consideration while constructing the aggregation tree. The protocol also balances the network load by selecting different parent for a node according to the energy level remain in the sensor node during the aggregation tree construction process. Each parent node performs aggregation of data packet that it receives from its child nodes and hence the protocol justifies the in-network aggregation concept.

3.2 System & Energy Model

Consider a homogeneous network of n sensor nodes and a base station or sink node distributed over a region. The location of the sensors and the base station are fixed and known priori. Each sensor produces some information as it monitors its vicinity. We assume that the whole network is divided in to several clusters; each cluster has a cluster-head (CH). The clustering and the selection of cluster-head (CH) can be done by using any existing protocol like LEACH, such that cluster-head (CH) is maximum k-hop away from any node in cluster. We also assume that after the formation of cluster the transmission power of all nodes is adjusted in such a way that they can perform single hop broadcast. Single hop broadcast refers to the operation of sending a packet to all single-hop neighbors [8].

Our energy model for the sensors is based on the first order radio model described in [17]. A sensor consumes Eelec = 50nJ/bit to run the transmitter or receiver circuitry and Eamp = 100pJ/bit/m² for the transmitter amplifier. Thus, the energy consumed by a sensor i in receiving a l-bit data packet is given by,

E_Rxi = E_elec . l (1)

while the energy consumed in transmitting a data packet to sensor j is given by,

E_Txi,j = E_elec . l + E_amp . d_i,j² . l (2)

where di,j is the distance between nodes i and j.

3.3 Protocol Description

In a cluster-based wireless sensor network, our algorithm is designed to provide energy-aware in-network data aggregation in a cluster. Each cluster uses this algorithm independently. In a cluster, the nodes can be categorized as: one cluster-head (CH) and other cluster member node.

Function of cluster-head (CH)

Receive a query from base station.
Cluster-head (CH) sends configuration packets to all single-hop neighbors.
Receive data packets from all single hop neighbors.
Finally aggregate the data packets received and route it to base station.

Function of cluster member

Receive configuration packets from neighbor nodes.
Update and forward configuration packets to all single-hop neighbors.
Receive data packets from neighbor nodes.
Aggregate all data packets by applying redundancy factor and send it to selected parent node.

The algorithm works in two phases: Configuration packet flow and Data packet flow that are described below.

Figure 3.1 A typical scenario of data aggregation in a cluster.

3.3.1 Configuration Packet Flow

Initially cluster-head broadcast configuration packet to all its neighbors. Configuration packet contains the following fields:

Node Id  location of node that each node know in prior

Hop Distance  distance from cluster-head in terms of hop count (set zero for CH)

Residual Energy  current energy in node

Each node upon receiving the broadcast configuration packet that is originated from cluster-head adds the sender of the packet in the list of its possible parents with its node id, hop distance, residual energy. After this the node again broadcast the configuration packet to all its neighbors by updating node id to its own id, incrementing hop distance by one and its own residual energy. This process continues until all the nodes in cluster receive configuration packet. All nodes that broadcast the configuration packet do so by predefined and common signal strength that is know to all the nodes.

3.3.2 Data Packet Flow

When all nodes receives configuration packets, each node now select the parent to which it can forward the data packet. The parent selection procedure is shown in fig. 3.2.

Each node looks in to the list of all its possible parents. The neighbor node which has least hop distance, ie closest to cluster-head, is selected as parent by a node. In case when two neighbor nodes have the least but equal hop distance, the node checks the residual energy of two neighbor nodes. The neighbor node that has greater residual energy is now selected as parent. In both the cases, node also calculate the difference of residual energy of two neighbor nodes, which have least hop distance, and checks whether this difference is less than the threshold or not. If it is then the node selects the parent as usual. But if it is not then the node selects other neighbor node as its parent.

Figure 3.2 Parent selection procedure.

This allows a node that has more available resources to be selected as a parent node. This also balances the consumption of energy of nodes in the cluster and leads to die out of nodes nearly at same time.

After selecting the parent node, each node now forwards its data to its parent. When a parent node receives multiple data packets from its neighbor nodes, it performs aggregation operation by eliminating redundancy in the data. Each parent node checks the equation below:

| V_Ni– V_Nj | < K (3)

where, V_Ni data value of node i

V_Nj  data value of node j

K  redundancy factor

If this equation satisfies, the parent node perform aggregation by applying any aggregation functions like MIN, MAX, and AVG on the values of data packet and send only one packet while discarding other packets. But if this equation do not satisfies, the parent performs aggregation by simply concatenating two data packet in to one keeping value of both packets intact.

The selection of value for redundancy factor (K) has a trade-off between precision and energy consumption. If the application wants more precision, it should select a low value for redundancy factor otherwise a high value. Selecting high value for K means sending only one value thus less number of bits needs to be transmitted and hence low energy consumption.

3.4 Issues

E-BINA significantly reduces the energy consumption of all nodes in the cluster by reducing the transmission power of all nodes. The one issue that arises in our designed protocol is that after the formation of cluster and selection of cluster-head, all nodes have to reduce their transmission power. All nodes have to reduce their transmission power in such a way that they could only reach their single-hop distance neighbors. This operation requires some kind of synchronization among all nodes. The nodes have to program before to perform the above task. For this, the programming task needs little extra effort. Now when cluster-head received all data packets and aggregated them, it has to now increase its transmission power so that it can transmit the final aggregated data up in the cluster-head hierarchy towards the sink. Another issue that remains with any tree based approach is robustness of the system. In case of failure of any intermediate node in the tree hierarchy during operation will lead to the loss of data.

Though E-BINA requires all nodes to adjust their transmission power after the deployment and requires extra effort for programming before, it conserves a significant amount of energy. So in the presence of the above issue, E-BINA outperforms when we try to maximize the network lifetime.

3.5 Summary

E-BINA takes the advantageous features of cluster-based and tree-based approaches.
Instead of sending data directly to cluster-head, nodes form an aggregation tree in each cluster.
In aggregation tree, the selection of parent is based on two factors: hop distance and residual energy of node.
Energy model is based on first order radio model.
Protocol requires all nodes to adjust their transmission power.
Protocol defines two types of packets: configuration packet and data packet.
Each node eliminates redundancy in the data by satisfying equation (3).
In-network aggregation is performed during data flow towards cluster-head.
Issue: Adjustment of transmission power after deployment requires extra programming effort.

Tuesday, February 2, 2016

Chapter 2. Data Aggregation : An Overview

Wireless sensor network have recently been the focus of many research efforts and emerged as an important new area in wireless technology. As chapter 1 already discuss about the problem of data aggregation, this chapter presents an overview of data aggregation. This chapter first discusses in-network aggregation and then different approaches that are widely used for data aggregation.

2.1 In-Network Aggregation

In a typical sensor network scenario, different node collect data from the environment and then send it to some central node or sink which analyze and process the data and then send it to the application. But in many cases, data produced by different node can be jointly processed while being forwarded to the sink node. So in-network aggregation deals with this distributed processing of data within the network.

Data aggregation techniques explore how the data is to be routed in the network as well as the processing method that are applied on the packets received by a node. They have a great impact on the energy consumption of nodes and thus on network efficiency by reducing number of transmission or length of packet. Elena Fosolo et al in [7] defines the in-network aggregation process as follows: “In-network aggregation is the global process of gathering and routing information through a multi-hop network, processing data at intermediate nodes with the objective of reducing resource consumption (in particular energy), thereby increasing network lifetime.”

There are two approaches for in-network aggregation: with size reduction and without size reduction. In-network aggregation with size reduction refers to the process of combining & compressing the data packets received by a node from its neighbors in order to reduce the packet length to be transmitted or forwarded towards sink. As an example, consider the situation when a node receives two packets which have a spatial correlated data. In this case it is worthless to send both packets. Instead of that one should apply any function like AVG, MAX, MIN and then send a single packet. This approach considerably reduces the amount of bits transmitted in the network and thus saving a lot of energy but on the other hand, it also reduces the precision of value of data received. In-network aggregation without size reduction refers to the process merging data packets received from different neighbors in to a single data packet but without processing the value of data. As an example, two packets may contain different physical quantities (like temperature & humidity) and they can be merged in to a single packet by keeping both values intact but keeping a single header. This approach preserves the value of data and thus transmit more bits in the network but still reduce the overhead by keeping single header.

This of the two approaches to use depends on many factors like the type of application, data rate, network characteristics and so on. There is also a trade-off between energy consumption and precision of data for the two approaches.

Most of the work done till now on in-network aggregation mainly deals with problem of forwarding packets from source to sink, to facilitate aggregation therein. Actually the main idea behind were to enhance existing routing protocols such that they can efficiently aggregate data. So till now, most of the data aggregation techniques fall under three categories. They are tree-based approaches, multi-path approaches, and cluster-based approaches. There also some hybrid approaches that combines any of the three techniques above. So, all the three approaches will be described in coming sections with giving details of some of the main techniques by different authors.

2.2 Tree-Based Approach

The simplest way to aggregate data is to organize the nodes in a hierarchical manner and then select some nodes as the aggregation point or aggregators. The tree-based approach perform aggregation by constructing an aggregation tree, which could be a minimum spanning tree, rooted at sink and source nodes are considered as leaves. Each node has a parent node to forward its data. Flow of data starts from leaves nodes up to the sink and therein the aggregation done by parent nodes. The way this approach operates has some drawbacks. As we know like any wireless network the wireless sensor networks are also not free from failures. In case of packet loss at any level of tree, the data will be lost not only for a single level but for whole related sub-tree as well. In spite of high cost for maintaining tree structure in dynamic networks and scarce robustness of the system, this approach is very much suitable for designing optimal aggregation technique and energy-efficient techniques.

S. Madden et al. in [14] proposed a data-centric protocol which is based on aggregation tress, known as Tiny Aggregation (TAG) approach [14]. TAG works in two phases: distribution phase and collection phase. In distribution phase, TAG organizes nodes in to a routing tree rooted at sink. The tree formation starts with broadcasting a message from sink specify level or distance from root. When a node receive this message it sets its own level to be the level of message plus one and elect parent as node from which it receives the message. After that, node rebroadcast this message with its own level. This process continues until all nodes elect their parent. After tree formation, sink send queries along structure to all nodes in the network. TAG uses database query language (SQL) for selection and aggregation functions. In collection phase, data is forwarded and aggregated from leaves nodes to root. A parent node has to wait for data from all its child node before it can send its aggregate up the tree. Apart from the simple aggregation function provided by SQL (eg: COUNT, MIN, MAX, SUM, and AVG), TAG also partitions aggregates according to the duplicate sensitivity, exemplary and summary, and monotonic properties. Though TAG periodically refresh tree structure of network but as most of the tree-based schemes are inefficient for dynamic network, so TAG may be.

C. Intanagonwiwat et al. in [3] proposed a reactive data-centric protocol for applications where sink ask some specific information by flooding, known as directed diffusion paradigm. The main idea behind directed diffusion paradigm is to combine data coming from different source and en-route them by eliminating redundancy, minimizing the number of data transmission; thus maximizing network lifetime. Directed diffusion consists of several elements: interests, data messages, gradients, and reinforcements.

Figure 2.1 Simplified schematic for directed diffusion. (a) Interest propagation. (b) Initial gradients setup. (c) Data delivery along reinforced path [3].

The base station (BS) requests data by broadcasting an interest message which contains a description of a sensing task. This interest message propagates through the network hop-by-hop and each node also broadcast interest message to its neighbor. As interest message propagates throughout the network, gradients are setup by every node within the network. The gradient direction is set toward the neighboring node from which the interest is received. This process continues until gradients are setup from source node to base station. Loops are not checked at this stage but removed at later stage. After this path of information flow are formed and then best path are reinforced to prevent further flooding according to a local rule. Data aggregation took place on the way of different paths from different sources to base station or sink. The base station periodically refresh & resend the interest message as soon as it start to receives data from sources to provide reliability. The problem with directed diffusion is that it may not be applied to applications (e.g. environmental monitoring) that require continuous data delivery to base station. This is because query driven on demand data model may not help in this regard. Also matching data to queries might require some extra overhead at the sensor nodes. Mobility of sink nodes can also degrade the performance as path from sources to sinks cannot be updated until next interest message is flooded throughout the network. To cope up with above issue if introduce frequent flooding then also too much overhead of bandwidth and battery power will be introduced. Furthermore, exploratory data follow all possible paths in the network following gradients which lead to unnecessary communications overhead.

M. Lee et al. in [2] proposed a new low-control-overhead data dissemination scheme, which they called as pseudo-distance data dissemination (PDDD), for efficiently disseminating data from all sensor nodes to mobile sink. Some assumption have been made, they are: (1) all source nodes maintain routes to mobile sink node, (2) no periodically messaging for topological changes due to mobile sink node, (3) all link are bi-directional and no control messages are lost, (4) mobile sink nodes have unlimited battery power, so no need to care about battery efficiency of sink node, and (5) network partitioning is not considered. Data dissemination process is influenced by directed diffusion [3]. Though mobile sink periodically broadcast interest message, sensor nodes do not send exploratory data and do not wait reinforcement message because each sensor node already has routes to the sink node. After getting interest message, adjacent nodes set a parent-child relationship using pseudo-distance of each node and finally a partial ordered graph (POG) has been build. Optimal data dissemination is achieved in terms of path length by forwarding packets to a parent node until topology is unchanged. Then each sensor node is assigned a level for a corresponding sink node with pseudo-distance. In order to overcome the shortcoming of POG, author used totally ordered graph (TOG) in place of POG. The problem identified in this approach is that due to mobility of sink node all sensor nodes have to maintain routes and for any change in topology nodes have to again change route accordingly which led to energy waste.

Marc Lee et al. in [8] proposed an energy-aware spanning tree algorithm for data aggregation, referred as E-Span. E-Span is a distributed protocol in which source node that has highest residual energy is chosen as root. Other source nodes choose their parent based on residual energy and distance to the root. The protocol uses configuration message to exchange information of node i.e., residual energy and distance to the root. Each node performs single-hop broadcast operation to send packets. Single-hop broadcast refers to the operation of sending a packet to all single-hop neighbors [8].

Figure 2.2 E-Span protocol [8].

2.3 Multi-path Approach

One of the main drawbacks of tree-based approach is the scarce robustness of the system. To overcome this drawback, a new approach was proposed by many researchers. Instead of sending partially aggregated data to a single parent node in aggregation tree, a node could send data over multiple paths. The idea behind is that each node can send the data to its possibly multiple neighbors by exploiting the wireless medium characteristic. Hence data will flow from sources to sink along multiple paths and aggregation can be performed by each intermediate node. Clearly schemes using this approach will make the system robust but with some extra overhead. One of the aggregation structures that fit well with this approach is ring topology, where network is divided in to concentric circles with defining levels according to the hop distance from sink.

S. Nath et al. in [15] presented a data aggregation technique using multi-path approach, known as synopsis diffusion. Synopsis diffusion works in two phases: distribution of queries and data retrieval phase. During distribution of queries phase, a node sends a query in the network. The network nodes then form a set of rings around the querying node. The node which is i hop away from querying node is considered is ring R_i. In the second phase, aggregation starts from outermost ring and propagate level by level towards the sink. Here a source node can have multiple paths towards sink.

Figure 2.3 Examples of aggregation paths over a ring structure [7].

L. Gatani et al. in [5] describe a new strategy for data gathering in wireless sensor network that consider both issues: energy efficiency and robustness. Authors first say that single path to connect each node to the base station is simple & energy-saving approach but expose a high risk of disconnection due to node/link failures. But multi-path approach would require more nodes to participate with consequent waste of energy. Authors present a clever use of multi-path only when there is loss of packet which is implemented by smart caching of data at sensor nodes. Authors also argue that in many practical situation data may be gathered only from a particular region, so they use a different approach that relies on a spanning tree and provides alternative paths only when a malfunctioning is detected. Algorithm adopts a tree-based approach for forwarding packets through the network. In the ideal situation when no failures occur, this is certainly the best choice, as the minimum number of nodes is engaged in the transmission phase. In the presence of link or node failures, the algorithm will discover alternative paths, so as ensure the delivery of as many packets as possible within the time constraints. The problem with this approach is that it may cause the arising of hot spots and nodes along preferred paths will consume their energy resources quickly, possibly causing disconnection in the network.

2.4 Cluster-Based Approach

We talked about hierarchical organization of the network in tree-based approach. Another scheme to organize the network in hierarchical manner is cluster-based approach. In cluster-based approach, whole network is divided in to several clusters. Each cluster has a cluster-head which is selected among cluster members. Cluster-heads do the role of aggregator which aggregate data received from cluster members locally and then transmit the result to sink. The advantages and disadvantages of the cluster-based approaches is very much similar to tree-based approaches.

K. Dasgupta et al. in [16] proposed a maximum lifetime data aggregation (MLDA) algorithm which finds data gathering schedule provided location of sensors and base-station, data packet size, and energy of each sensor. A data gathering schedule specifies how data packet are collected from sensors and transmitted to base station for each round. A schedule can be thought of as a collection of aggregation trees. In [6], they proposed heuristic-greedy clustering-based MLDA based on MLDA algorithm. In this they partitioned the network in to cluster and referred each cluster as super-sensor. They then compute maximum lifetime schedule for the super-sensors and then use this schedule to construct aggregation trees for the sensors.

W. Choi et al. in [1] present a two-phase clustering (TPC) scheme. Phase I of this scheme creates clusters with a cluster-head and each node within that cluster form a direct link with cluster-head. Phase I of this scheme is similar to various scheme used for clustering but differ in one way that the cluster-head rotation is localized and is done based on the remaining energy level of the sensor nodes which minimize time variance of sensors and this lead to energy saving from unnecessary cluster-head rotation. In phase II, each node within the cluster searches for a neighbor closer than cluster-head which is called data relay point and setup up a data relay link. Now the sensor nodes within a cluster either use direct link or data relay link to send their data to cluster head which is an energy efficient scheme. The data relay point aggregates data at forwarding time to another data relay point or cluster-head. In case of high network density, TPC phase II will setup unnecessary data relay link between neighbors as closely deployed sensor will sense same data and this lead to a waste of energy.

Figure 2.4 Illustration of Two Phase Clustering [1].

H. Cam et al. in [4] present energy efficient and secure pattern based data aggregation protocol which is designed for clustered environment. In conventional method data is aggregated at cluster-head and cluster-head eliminate redundancy by checking the content of data. This protocol says that instead of sending raw data to cluster-head, the cluster members send corresponding pattern codes to cluster-head for data aggregation. If multiple nodes send the same pattern code then only one of them is finally selected for sending actual data to cluster-head. For pattern matching, authors present a pattern comparison algorithm.

Figure 2.5 Data transmission using ESPDA [4].

2.5 Simulation Tools

We have plenty of simulation tools or simulators for simulating wireless networks. The simulators which are most popular are NS-2, OPNET, OMNet++, J-Sim, GlomoSim, Qualnet, TOSSIM and so on. Since wireless sensor networks are special type of wireless networks, most of the simulators available are not enough supported for simulating a wireless sensor network scenario. The literature shows that the simulators which are mostly used for wireless sensor network are NS-2, J-Sim, GlomoSim, OPNET, TOSSIM, PROWLER and even MATLAB is also used.

NS-2 is the most popular and powerful simulator. NS-2 is an object-oriented discrete time event simulator and its modular design made it to be extensible. The detail of NS-2 is provided in appendix A. To simulate sensor network, there had been made attempt to put some add-ons. The most appreciable extension to NS-2 for wireless sensor networks was developed in 2004 by Ian Downward of Naval Research Laboratory (NRL) [18]. In this extension, they had created phenomenon packet which trigger event for sensor nodes. They also designed sensor agent, sensor application and other application to support wireless sensor network. But there has been no improvement or work done since 2006. This is the reason why all research is going by using NS-2 without any extension support. Other simulators like J-Sim, GlomoSim, OPNET, OMNet++ are exceptionally used by researchers. Another simulators which is MATLAB based, known as PROWLER, is sometimes used for simulating routing protocol or some MAC issues in wireless sensor network. MATLAB is also used for simulating some physical layer issues. But still there is no efficient simulator which is purely dedicated to wireless sensor network. More details for comparative study on simulator for wireless sensor network could be found in [19].