XIA Jinzong, DU Yongwen, LÜ Xiaojian, MA Ji
(1. School of Electronics & Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China;
2. China Shipbuilding Industry Corporation No.722 Institute, Wuhan 430000, China)
Abstract:
When traditional game methods are used to study the information security of wireless sensor networks, players are mostly assumed to be completely rational. Based on bounded rationality, evolutionary game theory is used to establish the attack-defense model, analyze the strategy selection process of the players, solve the evolutionarily stable strategy and design the optimal strategy selection algorithm. Then, considering strategy dependence, an incentive and punishment mechanism is introduced to improve the replicator dynamic equation. Simulation results show that the model is reasonable and the algorithm is effective, providing new theoretical support for the security of wireless sensor networks.
Key words:
wireless sensor networks (WSNs);
evolutionary game;
replicator dynamic equation
Wireless sensor networks (WSNs) are commonly used in military reconnaissance, environmental monitoring, smart home and medical health applications because of their low power consumption, self-organization, low cost, convenience and high node redundancy. People perceive all kinds of information in the real physical world through sensors, which greatly improves the human ability to understand the objective world[1]. However, with the diversification and growing complexity of network attacks, traditional measures can no longer meet the need, and ensuring network security has become an urgent problem[2-3]. An intrusion detection system (IDS)[4] is a mechanism used to detect unauthorized actions that attempt to disrupt the normal operation of networks or damage sensor nodes. It can be divided into anomaly detection and misuse detection. Misuse detection is rule-based: by comparing the observed pattern with an attack database to judge whether intrusion behavior occurs, abnormal network behavior can be found effectively. However, as the database expands it becomes difficult to maintain detection efficiency, and achieving fast detection is a challenge[5].
Game theory mainly studies decision problems that arise from direct interaction between decision-makers and how to solve for the decision equilibrium[6-8]. Using game theory as a tool for network information security has become a research hotspot[9-11].
Related works are generally based on traditional game theory. A static network attack-defense game model was built to analyze security behaviors based on the opposition and interdependence of the players; however, the network is dynamic, and a static game model cannot analyze the dynamic process[12]. A complete-information dynamic game model of active defense was put forward to study active network defense technology and analyze the dynamic attack-defense confrontation; however, the condition of complete information cannot be met in reality, so information incompleteness must be taken into account in the game[13-14]. A non-cooperative game was used to obtain the mixed-strategy Nash equilibrium for the detection rates of anomaly detection and misuse detection in IDS, balancing the detection rate and resource cost of WSNs[15]. These works generally assume that the participants are completely rational. Owing to the constraints of bounded rationality in reality, network attack-defense behaviors cannot be completely rational, so the hypothesis is not applicable to all network attack-defense situations.
The evolutionary game[16-17] combines the idea of biological evolution and proposes reaching the game equilibrium by trial and error under the general condition of bounded rationality of players, taking groups as the research objects. It has the characteristics of the traditional game together with dynamic change and bounded rationality, which makes it more suitable for real network attack-defense scenes. Scholars have therefore begun to combine the evolutionary game with information security, but the research is still in its infancy.
The replicator dynamic function has been used to describe network confrontation, but only two strategies are considered[18]. An evolutionary game model of the Internet of things was constructed under the bounded rationality of network nodes, and the change process of node strategies was analyzed[19]; the model can provide a theoretical reference for security risk prediction and optimal defense strategy deployment in the Internet of things. Starting from the bounded rationality of players, the non-cooperative evolutionary game was used to construct an attack-defense evolutionary game model and design an optimal defense strategy selection algorithm[20].
In actual network attack-defense scenarios, external attackers can compromise sensor network security through packet dropping attacks, Sybil attacks, black hole attacks and flooding attacks, which the IDS detects by matching against its database. The learning mechanism in the evolutionary game is essentially a way of finding the equilibrium by trial and error. In the face of diverse attack methods and database expansion, accelerating the convergence of the participants' evolutionary learning can improve detection efficiency, reduce energy consumption and increase the lifetime of the network. Therefore, the evolutionary game and an incentive and punishment mechanism are introduced based on the assumption of bounded rationality. The replicator dynamic equation is used to analyze the variation trend and stability of the strategies, and an optimal defense strategy selection algorithm is proposed. Then, the influence of the reward and punishment factors on strategy convergence is explored to guide the defense of WSNs and provide a reference for network information security defense decisions.
Based on the evolutionary game, an attack-defense evolutionary game model is constructed for wireless sensor networks under the condition of bounded rationality.
The wireless sensor networks attack-defense evolutionary game model can be represented as a seven-tuple, WADEM = (N, S, P, C, B, Ω, U).
N = (ND, NA) is the player space, where ND is the IDS (defender) and NA is the attacker.
S = (SD, SA) is the strategy space, where SD represents the strategies of defenders and SA represents the strategies of attackers.
P = (p, q) is the game belief space, where p is the set of probabilities of attackers choosing attack strategies in SA, and q is the set of probabilities of defenders choosing defense strategies in SD.
C = (Cd, Ca) is the cost of a pure strategy for the defender and the attacker, respectively. Different types of strategies have different costs.
B = (Bd, Ba), where Bd(SD) is the payoff of the node when defense strategy SD succeeds, and Ba(SA) is the payoff of the attacker when strategy SA is adopted.
Ω = (α, β) is the pair of reward and punishment factors, where α is the reward factor for the defender's defense and β is the punishment factor for the attacker's attack.
U = (UD, UA) is the set of payoff functions, representing the game payoffs.
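The tuple above can be held in a small container; a minimal sketch, in which the concrete layout (lists for strategy spaces, numpy arrays for beliefs and payoff matrices) is an illustrative assumption rather than the paper's implementation:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class WADEM:
    """Sketch of the attack-defense evolutionary game model tuple.

    Field names mirror the definitions above; shapes are assumptions.
    """
    S_A: list          # attacker strategy space SA = {SA1, ..., SAm}
    S_D: list          # defender strategy space SD = {SD1, ..., SDn}
    p: np.ndarray      # attackers' belief: probabilities over S_A
    q: np.ndarray      # defenders' belief: probabilities over S_D
    A: np.ndarray      # attacker payoff matrix
    B: np.ndarray      # defender payoff matrix
    alpha: float = 0.0 # reward factor for a successful defense
    beta: float = 0.0  # punishment factor for a detected attack

    def __post_init__(self):
        # The beliefs must be proper probability distributions
        assert abs(self.p.sum() - 1.0) < 1e-9
        assert abs(self.q.sum() - 1.0) < 1e-9
```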
In the WSNs attack-defense confrontation, the attacker NA and the defender ND each have multiple strategies to choose from. Suppose SD = {SD1, SD2, …, SDn}, SA = {SA1, SA2, …, SAm}, m, n ∈ N+ and m, n ≥ 2. Attackers and defenders adopt their strategies with different probabilities, and these probabilities change over time under the effect of the learning mechanism. Fig.1 shows the game tree of network attack-defense, in which pi is the probability that attackers select SAi and qj is the probability that defenders select SDj.
When different strategies are used in the attack-defense game, corresponding payoffs are generated. The values are commonly arranged in the following payoff matrix, where Aij and Bij represent the respective payoffs when attackers take SAi and defenders take SDj.
Fig.1 Network attack-defense game tree
UDS1 = p1b11 + p2b12 + p3b13 + … + pmb1m,
UDS2 = p1b21 + p2b22 + p3b23 + … + pmb2m,
⋮
UDSn = p1bn1 + p2bn2 + p3bn3 + … + pmbnm.    (1)
In the evolutionary game, players with lower payoffs learn and imitate the strategies chosen by those with higher payoffs. For the optional strategy SDj in the defense strategy set, qj(t) represents the proportion of players choosing SDj at time t; this proportion changes over time and satisfies ∑qj(t) = 1.
For a specific defense strategy SDj, the proportion of players selecting this strategy is a function of time, and its rate of change can be expressed in the form of the replicator dynamic equation

dqj(t)/dt = qj(t)(UDSj − ŪD),    (2)

where ŪD = ∑j qj(t)UDSj is the average payoff of the defender population.
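Equations (1)-(2) translate directly into a vectorized update; a minimal sketch, assuming the defender payoffs are stored in a matrix B with B[j, i] = bji and using only numpy (the function name is mine):

```python
import numpy as np

def defender_replicator_rates(q, p, B):
    """Right-hand side of Eq.(2): dq_j/dt = q_j (UDSj - average payoff).

    q : defenders' current mixed strategy (length n, sums to 1)
    p : attackers' current mixed strategy (length m, sums to 1)
    B : n x m defender payoff matrix with B[j, i] = b_ji, as in Eq.(1)
    """
    fitness = B @ p            # UDSj for every pure defense strategy, Eq.(1)
    average = q @ fitness      # population-average defender payoff
    return q * (fitness - average)
```

Strategies whose payoff exceeds the population average grow in share; the rates always sum to zero, so q remains a probability distribution under the dynamics.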
UAS1 = q1a11 + q2a12 + q3a13 + … + qna1n,
UAS2 = q1a21 + q2a22 + q3a23 + … + qna2n,
⋮
UASm = q1am1 + q2am2 + q3am3 + … + qnamn.    (3)
The dynamic change of the proportions of the different attack strategies can be represented by pi(t), where ∑pi(t) = 1.
Using SAi as an optional strategy for attackers, the corresponding replicator dynamic equation is

dpi(t)/dt = pi(t)(UASi − ŪA),    (4)

where ŪA = ∑i pi(t)UASi is the average payoff of the attacker population.
The solution process is shown as 1)-6).
1) Initialize the parameters of the model, including the number of players, the strategy space, etc.
2) Calculate the costs Cd and Ca when the attacker and defender adopt a pure strategy, respectively.
3) Establish the probability distributions p and q over the sets of alternative strategies.
4) Calculate attackers’ replicator dynamic equation.
Using the attackers' strategy-selection probabilities p and the payoffs of the different strategies, the expected payoff and average payoff of each attack strategy can be calculated by Eq.(3).
The attackers' replicator dynamic equation is then obtained from Eq.(4).
5) Calculate defenders’ replicator dynamic equation.
Using the defenders' strategy-selection probabilities q and the payoffs of the different strategies, the expected payoff and average payoff of each defense strategy can be calculated by Eq.(1).
6) Calculate the evolutionarily stable strategy.
According to the description, for convenience of analysis, the optional strategy set for attackers is {SA1, SA2} and that for defenders is {SD1, SD2}. The 2×2 attack-defense game tree is shown in Fig.2.
Fig.2 2×2 attack-defense game tree
In Fig.2, aij and bij represent the attacker's and defender's payoffs, respectively. The payoff matrices are shown in Tables 1-2.
Table 2 Definition of parameters
The expected payoffs of the two defense strategies and the average defender payoff are

UDS1 = pb11 + (1-p)b21,    (5)
UDS2 = pb12 + (1-p)b22,    (6)
ŪD = qUDS1 + (1-q)UDS2 = q[pb11 + (1-p)b21] + (1-q)[pb12 + (1-p)b22].    (7)
In the attack-defense process of the evolutionary game, defenders choosing different defense strategies obtain different payoffs. According to the replicator dynamics, players with lower payoffs learn to imitate the higher-payoff strategy. In the model, the two optional defense strategies are {SD1, SD2}. The proportions of players choosing the two strategies change over time and are represented by q(t) and 1-q(t), respectively.
Therefore, the rate of change of the defenders' strategy proportion can be obtained from the above description and represented by the replicator dynamic equation

dq/dt = q(1-q)[p(b11-b21-b12+b22)+b21-b22].    (8)
Similarly, the same can be done for attackers.
UAS1 = qa11 + (1-q)a12,    (9)
UAS2 = qa21 + (1-q)a22,    (10)
ŪA = pUAS1 + (1-p)UAS2 = p[qa11 + (1-q)a12] + (1-p)[qa21 + (1-q)a22].    (11)
Therefore, the attacker's replicator dynamic equation is

dp/dt = p(1-p)[q(a11-a21-a12+a22)+a12-a22].    (12)
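Equations (8) and (12) can be transcribed directly; a sketch (the function and variable names are mine):

```python
def dq_dt(q, p, b11, b12, b21, b22):
    """Eq.(8): defender replicator dynamics in the 2x2 game.

    The bracket is UDS1 - UDS2 from Eqs.(5)-(6); its constant term
    b21 - b22 is that difference evaluated at p = 0.
    """
    return q * (1 - q) * (p * (b11 - b21 - b12 + b22) + b21 - b22)

def dp_dt(p, q, a11, a12, a21, a22):
    """Eq.(12): attacker replicator dynamics in the 2x2 game.

    The bracket is UAS1 - UAS2 from Eqs.(9)-(10); its constant term
    a12 - a22 is that difference evaluated at q = 0.
    """
    return p * (1 - p) * (q * (a11 - a21 - a12 + a22) + a12 - a22)
```

Setting each bracket to zero (with p, q strictly between 0 and 1) yields the interior rest point of the system.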
Each of these equilibrium points is a candidate ESS of the players. The two players adjust their strategies according to their own payoffs. Even after an ESS is reached, the individuals in the group still change constantly; the ESS is therefore a dynamic process in which the internal individuals keep changing while the overall state remains unchanged.
According to the evolutionary game, a stable state must be robust to small perturbations of the system, so Y1, Y2, Y3 and Y4 are saddle points of the system, and Y5 is an unstable point. In this paper, an evolutionarily stable equilibrium exists in the system. The evolutionarily stable state can be obtained through game analysis, and the future changes of the strategy selection mechanism can be predicted and evaluated, so as to form a security defense strategy selection method and guide defense decisions.
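The rest points Y1-Y5 follow from setting both replicator equations (8) and (12) to zero; the four pure-strategy corners are always rest points, and the interior point Y5 comes from zeroing the two brackets. A sketch (the function name and payoff values in the test are illustrative):

```python
def interior_rest_point(a, b):
    """Interior rest point (p*, q*) = Y5 of the 2x2 replicator system.

    a, b : 2x2 nested lists of attacker / defender payoffs a_ij, b_ij.
    Zeroing the bracket of Eq.(8) gives p*; zeroing that of Eq.(12)
    gives q*. The corners (0,0), (0,1), (1,0), (1,1) are the other
    four rest points Y1-Y4.
    """
    p_star = (b[1][1] - b[1][0]) / (b[0][0] - b[1][0] - b[0][1] + b[1][1])
    q_star = (a[1][1] - a[0][1]) / (a[0][0] - a[1][0] - a[0][1] + a[1][1])
    return p_star, q_star
```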
To describe the dynamic process more precisely: if every trajectory starting in a small neighborhood of an equilibrium point of the dynamic system eventually evolves to that equilibrium point, the equilibrium point is locally asymptotically stable, and such a dynamic equilibrium point is called an ESS.
ESS can be used as a strategy to resist aggression, and is an equilibrium strategy with real stability in the evolutionary game. In this section, the ESS is analyzed in detail according to the replicator dynamics existing in both sides of network attack-defense.
According to the theory of ESS, a group in the ESS will not be disturbed by individual mutation. The group will eventually return to the stable state of the evolutionary game by constantly correcting such minor changes. That is, if an individual in the population chooses another strategy due to an accidental mutation, the entire population can repair this state through the replicator dynamic learning mechanism and stabilize back to the ESS. Mathematically, this is the stability principle of differential equations: when an accidental mutation pushes the value of x above the ESS, dx/dt should be less than 0; conversely, when x is below the ESS, dx/dt should be greater than 0.
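The sign condition on dx/dt can be checked numerically; a small helper, assuming the rest point lies in the interior of [0, 1] (the helper name and the hawk-dove-style example in the test are mine):

```python
def is_asymptotically_stable(x_star, dx_dt, eps=1e-3):
    """Stability test from the text: just above the rest point the flow
    must point down (dx/dt < 0), and just below it the flow must point
    up (dx/dt > 0), so small mutations are corrected."""
    return dx_dt(x_star + eps) < 0 and dx_dt(x_star - eps) > 0
```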
Fig.3 Defender phase
Fig.4 Defender phase
Fig.5 Defender phase
Fig.6 Attacker phase
Fig.7 Attacker phase
Fig.8 Attacker phase
The basic idea of obtaining the optimal defense strategy is based on the game tree. Both players have their own sets of alternative strategies with different selection probabilities. Based on the evolutionary game, the replicator dynamics are established. To study the convergence rate of the learning mechanism, the influence of the reward and punishment factors on the convergence to the optimal strategy is studied, and a selection algorithm for the optimal defense strategy in the evolutionary game is designed.
Input: WADEM model
Output: Optimal defense strategy
1 Initialize the attacker's strategy space SA
2 Initialize the defender's strategy space SD
3 The attacker chooses attack strategy SA1 with probability p
4 The defender chooses defense strategy SD1 with probability q
5 Establish reward and punishment factors (α, β)
6 Set up the attacker replicator dynamic equation A(p, q)
7 Set up the defender replicator dynamic equation D(q, p)
8 t = 0
9 while the evolutionary game is not over do
10   p_{t+1} = A(p_t, q_t)
11   q_{t+1} = D(p_t, q_t)
12   t++
13 Output the defender's strategy set
14 end
The time complexity of the algorithm is O((m+n)²). The space consumption is mainly the storage of the payoffs and the intermediate values of the equilibrium solution, so the space complexity is O(mn). The algorithm not only yields the payoff of each strategy, but also reveals, through the replicator dynamics, how the convergence speed of strategy selection changes with the reward and punishment factors. It provides guidance for the defense analysis of WSNs.
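The iteration loop in the listing can be sketched as forward-Euler integration of Eqs.(8) and (12); the step size, the fixed iteration budget as a stopping rule, and the function name are my assumptions, not the authors' exact implementation:

```python
def select_strategy(A, B, p0, q0, dt=0.01, steps=20000):
    """Iterate the 2x2 replicator equations (8) and (12) from (p0, q0).

    A[i][j] = a_ij (attacker payoffs), B[i][j] = b_ij (defender payoffs);
    p is the probability of SA1 and q the probability of SD1.
    Returns the final (p, q), from which the defender reads off its
    optimal strategy.
    """
    p, q = p0, q0
    for _ in range(steps):
        dp = p * (1 - p) * (q * (A[0][0] - A[1][0] - A[0][1] + A[1][1])
                            + A[0][1] - A[1][1])
        dq = q * (1 - q) * (p * (B[0][0] - B[1][0] - B[0][1] + B[1][1])
                            + B[1][0] - B[1][1])
        # Euler step, clamped so p and q remain valid probabilities
        p = min(max(p + dt * dp, 0.0), 1.0)
        q = min(max(q + dt * dq, 0.0), 1.0)
    return p, q
```

With a pure attacker population (p0 = 1) the attacker term vanishes, and the defenders converge to the matching strategy, reproducing the behavior described in Exp.2 below.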
During the experiments, the strategies are generated from multiple atomic strategies, namely SAi = {a1, a2, a3, …, ak} and SDj = {b1, b2, b3, …, bl}. Referring to China's national information security vulnerability database (CNIVD), relevant WSNs atomic attack strategies and corresponding defense strategies are selected for this experiment, as shown in Tables 3-4.
Table 3 Description of atomic attack strategy
Table 4 Description of atomic defense strategy
5.1 Experimental environment
In order to verify the model and the equilibrium solution method, the simulation is carried out in Matlab 2018b. The experiments are divided into two parts. The first part verifies the ESS under actual conditions. The second part explores the influence of the reward and punishment factors on the selection of the ESS. The strategy sets are SA = {SA1, SA2} and SD = {SD1, SD2}.
5.2 Evolution of attack and defense
Exp.1:
When the initial state is Y1, Y2, Y3 or Y4, both parties adopt pure strategies for the confrontation (see Fig.9). After evolution, the strategy choice of both parties remains unchanged, because all nodes use the same strategy and there is no other strategy to learn from and imitate.
(a) Initial state of Y1
Exp.2:
When all malicious attackers adopt the same attack strategy, WSNs nodes adjust their strategy through self-learning and finally reach a stable state. As shown in Fig.10, if malicious attackers initially choose SA1 with probability p = 1, the nodes finally adopt defense strategy SD1 after evolution.
Similarly, when attackers choose SA1 with probability p = 0 (that is, all attackers choose the SA2 strategy), the IDS adopts defense strategy SD2 through evolution, which achieves the goal of minimizing risk and is in line with the actual scenario.
(a) Attacker chooses strategy A1
Exp.3:
When all defenders adopt the same defense strategy, attackers adjust their strategy by self-learning and finally reach the optimal stable state. As shown in Fig.11, if the WSNs choose SD1 with probability q = 1 in the initial state, all external malicious attackers finally adopt attack strategy SA1 after evolution. Similarly, when the defenders choose SD1 with probability q = 0 (that is, they choose defense strategy SD2), the attackers always adopt attack strategy SA2 after evolution, consistent with the analysis above.
(a) Defender chooses strategy D1
Exp.4:
When Y5 = [0.666 7, 0.571 4]T is selected as the initial state, the strategies of both sides remain unchanged along with the evolution. When the initial state deviates from the equilibrium, the two sides approach the evolutionarily stable equilibrium solution in the short term but do not remain there. As time increases, the strategy choices fluctuate cyclically, as shown in Figs.12-13. Once the state deviates from the equilibrium, neither side's strategy choice is stable, so the risk coefficient of this equilibrium strategy is large.
Exp.5:
When the initial strategy selection of both parties is not fixed, their strategies change as shown in Figs.14-17. The system is always in an unstable state and fluctuates periodically over time. From the above analysis, the replicator dynamic relationship diagram shown in Fig.18 is obtained. When the strategies of both sides change, there is no stable state for either side, so the defender must constantly adjust its strategies to match its defensive measures to the attacks.
Fig.12 Initial state is Y5
Fig.13 Initial state is not Y5
Fig.14 Initial state is in region A (p=0.4, q=0.6)
Fig.15 Initial state is in region B (p=0.7, q=0.6)
Fig.16 Initial state is in region C (p=0.7, q=0.4)
Fig.17 Initial state is in region D (p=0.3, q=0.2)
Fig.18 Change of replicator dynamic
5.3 Effect of rewards and punishment factors on replicator dynamic function
Exp.6:
The addition of the incentive and punishment factors accelerates the rate at which defenders learn the optimal strategy while slowing the learning rate of attackers, as shown in Figs.19-20. When attackers choose strategy SA1 or SA2 and defenders successfully defend with the matching strategy, the reward accelerates the defenders' learning of that strategy. Similarly, when defenders use defense strategy SD1 or SD2, the attacker's payoff decreases with each successful defense, so the attacker's learning rate toward its optimal strategy slows down. The defender's learning rate is positively correlated with the reward and punishment factors.
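One concrete reading of this mechanism can be sketched as follows; the multiplicative form of α and β is my assumption (this excerpt does not spell out the exact functional form): the reward factor scales up the defender's payoff for a successful matching defense (b11, b22), and the punishment factor scales down the attacker's payoff when its attack is met by the matching defense (a11, a22).

```python
def dq_dt_rewarded(q, p, b11, b12, b21, b22, alpha=0.0):
    """Eq.(8) with a reward factor alpha boosting the successful-defense
    payoffs b11 and b22 (assumed multiplicative form)."""
    b11, b22 = b11 * (1 + alpha), b22 * (1 + alpha)
    return q * (1 - q) * (p * (b11 - b21 - b12 + b22) + b21 - b22)

def dp_dt_punished(p, q, a11, a12, a21, a22, beta=0.0):
    """Eq.(12) with a punishment factor beta shrinking the attacker's
    payoff when the matching defense succeeds (assumed multiplicative form)."""
    a11, a22 = a11 * (1 - beta), a22 * (1 - beta)
    return p * (1 - p) * (q * (a11 - a21 - a12 + a22) + a12 - a22)
```

Under this form, α > 0 enlarges the defender's payoff advantage and hence |dq/dt|, while β > 0 shrinks the attacker's advantage and hence |dp/dt|, matching the trend described above.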
(a) Defender chooses D1
(a) Attacker chooses A1
Exp.7:
Under the same initial conditions, the addition of the reward and punishment factors affects the volatility of the replicator dynamic convergence for both sides. As shown in Figs.21-23, when the initial condition is located at a point near Y5 or in area A, the fluctuation range of the attacker's strategy convergence increases significantly, while that of the defender's strategy changes little. The results show that defenders can narrow the selection range of the optimal defense strategy and deal with a variety of attack strategies. The factors accelerate the selection of the defender's optimal defense strategy and narrow the selection range of the hybrid strategy.
Fig.21 Effect on replication dynamics, initial value is near Y5
Fig.22 Effect on attack strategy
Fig.23 Effect on defense strategy
Based on the bounded rationality of both players in the misuse detection process of the WSNs intrusion detection system, the evolutionary game is introduced to describe the strategy change process. In view of the diversification of attack patterns, an attack-defense evolutionary game model is built, and the evolution trends of pure and mixed strategies are analyzed by using the replicator dynamic equation. A WSNs optimal defense strategy selection algorithm based on the evolutionary game is proposed. According to the evolutionary mechanism and the dependence between strategies in actual detection, the incentive and punishment factors are used to describe the change of the payoffs after the game and the change of the players' learning rates toward the optimal strategy. On the one hand, the factors accelerate the defenders' learning of the optimal strategy; on the other hand, they slow the attackers' evolution toward their optimal strategy. Moreover, the factors accelerate the selection of the optimal defense strategy, narrow its selection scope, and drive the system to a stable state more quickly, which improves the defense efficiency and stability of WSNs. The experiments show the rationality of the evolutionary game model and the effectiveness of the mechanism in accelerating learning convergence, which is of significance for WSNs security research.
However, there are still some differences between the attack-defense game model and actual scenes; establishing a game model closer to the actual situation of wireless sensor networks will be a challenge for follow-up research.