7.7 Example: Detecting DDoS Attacks Against Apache

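DDoS attacks can cause heavy losses to a company's Internet business, interrupting service for hours or even days. In this example we use the KDD 99 sample data and try to identify DDoS attacks against Apache with the naive Bayes (NB) algorithm (see Figure 7-5). For a detailed introduction to the KDD 99 data, see Chapter 3. The complete demonstration code is 7-5.py in the book's GitHub repository.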

Figure 7-5 Data processing workflow for detecting DDoS attacks against Apache

1. Data Collection and Data Cleaning

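Most of the data cleaning for KDD 99 has already been done. Each connection in the KDD 99 data set is described by 41 features, followed by a label: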


0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,private,SF,105,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,snmpgetattack.
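Each record above therefore has 42 comma-separated fields: 41 features and a label ending in a period. A minimal sketch of splitting one record; the helper name `split_record` is hypothetical and not taken from the book's 7-5.py. It also shows that columns 1 to 3 are symbolic strings rather than numbers.

```python
# Hypothetical helper: split one KDD 99 record into its 41 features and its label.
def split_record(line):
    fields = line.strip().split(',')
    features, label = fields[:41], fields[41]
    # Labels carry a trailing period, e.g. "normal." or "apache2."
    return features, label.rstrip('.')


# The first sample record shown above
sample = ("0,udp,private,SF,105,146,"
          "0,0,0,0,0,0,0,0,"
          "0,0,0,0,0,0,0,0,"
          "1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,"
          "255,254,1.00,0.01,0.00,0.00,0.00,0.00,0.00,0.00,normal.")

features, label = split_record(sample)
print(len(features), label)   # 41 normal
# protocol_type, service and flag are symbolic, not numeric
print(features[1:4])          # ['udp', 'private', 'SF']
```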

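The features related to DDoS fall mainly into three groups:

·Basic features of network connections, see Table 7-1.

·Time-based network traffic statistics, see Table 7-2.

·Host-based network traffic statistics, see Table 7-3.

Table 7-1 Basic network connection features in KDD 99 related to DDoS

Table 7-2 Time-based network traffic statistical features in KDD 99 related to DDoS

Table 7-3 Host-based network traffic statistical features in KDD 99 related to DDoS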
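For orientation, the sketch below lists the standard KDD 99 feature names in column order and groups them into the usual four blocks: basic, content, time-based (computed over a 2-second window) and host-based (computed over the last 100 connections to the same destination host). The names come from the data set's standard feature list; the exact subsets shown in Tables 7-1 to 7-3 are not reproduced here, and `KDD99_FEATURES` and `GROUPS` are illustrative names.

```python
# Standard KDD 99 feature names in column order (indices 0-40);
# column 41 of each record holds the label.
KDD99_FEATURES = [
    # basic connection features (0-8)
    "duration", "protocol_type", "service", "flag", "src_bytes",
    "dst_bytes", "land", "wrong_fragment", "urgent",
    # content features (9-21)
    "hot", "num_failed_logins", "logged_in", "num_compromised",
    "root_shell", "su_attempted", "num_root", "num_file_creations",
    "num_shells", "num_access_files", "num_outbound_cmds",
    "is_host_login", "is_guest_login",
    # time-based traffic features, 2-second window (22-30)
    "count", "srv_count", "serror_rate", "srv_serror_rate",
    "rerror_rate", "srv_rerror_rate", "same_srv_rate", "diff_srv_rate",
    "srv_diff_host_rate",
    # host-based traffic features, last 100 connections to the same host (31-40)
    "dst_host_count", "dst_host_srv_count", "dst_host_same_srv_rate",
    "dst_host_diff_srv_rate", "dst_host_same_src_port_rate",
    "dst_host_srv_diff_host_rate", "dst_host_serror_rate",
    "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
]

# Column index ranges of each block
GROUPS = {
    "basic": range(0, 9),
    "content": range(9, 22),
    "time_based": range(22, 31),
    "host_based": range(31, 41),
}

for group, cols in GROUPS.items():
    print(group, [KDD99_FEATURES[i] for i in cols])
```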

Load the records from the KDD 99 data set:


def load_kdd99(filename):
    # Read the KDD 99 file: one connection per line, fields separated by commas
    x = []
    with open(filename) as f:
        for line in f:
            line = line.strip('\n')
            line = line.split(',')
            x.append(line)
    return x

Keep only the records labeled apache2 or normal whose service field is http:


# Inside the loop over each record x1 of the loaded data
if (x1[41] in ['apache2.', 'normal.']) and (x1[2] == 'http'):
    if x1[41] == 'apache2.':
        y.append(1)    # apache2 DDoS attack
    else:
        y.append(0)    # normal traffic
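The same filter can be packaged as a self-contained function. This is a sketch under the assumption that each record is the 42-field list produced by `load_kdd99`; the function name `filter_apache2_http` and the toy records are hypothetical, built only to exercise the logic.

```python
# Hypothetical helper mirroring the filter above: keep HTTP records labeled
# apache2 (attack, y=1) or normal (y=0) and drop everything else.
def filter_apache2_http(records):
    kept, labels = [], []
    for rec in records:
        if rec[41] in ('apache2.', 'normal.') and rec[2] == 'http':
            kept.append(rec)
            labels.append(1 if rec[41] == 'apache2.' else 0)
    return kept, labels


# Toy 42-field records (all-zero features), not real KDD 99 data
def toy_record(service, label):
    rec = ['0'] * 41 + [label]
    rec[2] = service
    return rec


records = [toy_record('http', 'apache2.'), toy_record('http', 'normal.'),
           toy_record('private', 'normal.'), toy_record('http', 'smurf.')]
kept, labels = filter_apache2_http(records)
print(len(kept), labels)   # 2 [1, 0]
```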

2. Feature Extraction

Select the DDoS-related columns as the sample features:


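# Inside the same if-branch: keep only the DDoS-related columns
x1 = [x1[0]] + x1[4:8] + x1[22:30] + x1[31:40]
v.append(x1)
# After the loop: convert every selected field from string to float
for x1 in v:
    v1 = []
    for x2 in x1:
        v1.append(float(x2))
    w.append(v1)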
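The slicing above keeps 22 numeric columns: duration (0); src_bytes, dst_bytes, land and wrong_fragment (4 to 7); count through diff_srv_rate (22 to 29); and dst_host_count through dst_host_rerror_rate (31 to 39). Because a Python slice excludes its end index, srv_diff_host_rate (30) and dst_host_srv_rerror_rate (40) are not selected; the text does not say whether that is intended. A sketch of the same selection that yields a NumPy matrix, with hypothetical names `SELECTED` and `to_matrix`:

```python
import numpy as np

# Same columns as x1[0], x1[4:8], x1[22:30], x1[31:40] in the text
SELECTED = [0] + list(range(4, 8)) + list(range(22, 30)) + list(range(31, 40))


def to_matrix(records):
    # Pick the selected columns and convert each field from string to float
    return np.array([[float(r[i]) for i in SELECTED] for r in records])


# Toy HTTP record: 41 features + label, not taken from the data set
record = ['0', 'tcp', 'http', 'SF', '215', '45076'] + ['0'] * 35 + ['normal.']
X = to_matrix([record])
print(X.shape)        # (1, 22)
print(X[0, 1:3])      # src_bytes and dst_bytes as floats
```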

3. Training the Model

Instantiate the Gaussian naive Bayes classifier:


from sklearn.naive_bayes import GaussianNB

clf = GaussianNB()
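GaussianNB treats each feature, within a class, as an independent normal distribution and predicts the class with the largest log prior plus summed log likelihood. The from-scratch sketch below illustrates that rule on synthetic data (not KDD 99). It assumes scikit-learn's default variance floor, `var_smoothing=1e-9` times the largest feature variance; `gaussian_nb_predict` is a hypothetical name used only for this comparison.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB


def gaussian_nb_predict(X_train, y_train, X_test, var_smoothing=1e-9):
    classes = np.unique(y_train)
    # Variance floor: var_smoothing * largest per-feature variance
    eps = var_smoothing * np.var(X_train, axis=0).max()
    scores = []
    for c in classes:
        Xc = X_train[y_train == c]
        prior = len(Xc) / len(X_train)
        mean = Xc.mean(axis=0)
        var = Xc.var(axis=0) + eps
        # log P(c) + sum_j log N(x_j; mean_j, var_j), features assumed independent
        log_lik = (-0.5 * np.sum(np.log(2 * np.pi * var))
                   - 0.5 * np.sum((X_test - mean) ** 2 / var, axis=1))
        scores.append(np.log(prior) + log_lik)
    # Rows are classes, columns are test samples
    return classes[np.argmax(np.vstack(scores), axis=0)]


rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(4, 1, (50, 3))])
y_train = np.array([0] * 50 + [1] * 50)
X_test = np.array([[0.1, -0.2, 0.3], [3.9, 4.2, 4.1]])

ours = gaussian_nb_predict(X_train, y_train, X_test)
theirs = GaussianNB().fit(X_train, y_train).predict(X_test)
print(ours, theirs)   # [0 1] [0 1]
```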

4. Evaluation

We evaluate with 10-fold cross-validation:


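# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import cross_val_score

# x holds the feature vectors and y the labels
print(cross_val_score(clf, x, y, n_jobs=-1, cv=10))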
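With an integer `cv` and a classifier, scikit-learn splits the data into stratified folds, so each fold keeps roughly the attack/normal ratio. Since the two classes need not be balanced, accuracy alone can hide missed attacks; the sketch below also reports precision and recall with `cross_validate`. It runs on a synthetic, imbalanced stand-in generated by `make_classification`, not on KDD 99, so its numbers say nothing about the book's results.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the 22-column feature matrix (not real KDD 99 data)
X, y = make_classification(n_samples=2000, n_features=22, n_informative=8,
                           weights=[0.8, 0.2], random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
res = cross_validate(GaussianNB(), X, y, cv=cv,
                     scoring=["accuracy", "precision", "recall"])

for metric in ("accuracy", "precision", "recall"):
    scores = res["test_" + metric]
    print(f"{metric}: mean={scores.mean():.3f} std={scores.std():.3f}")
```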

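The per-fold accuracies are shown below. All ten folds score between about 99.0% and 99.98%, averaging about 99.8%, which is quite good.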


[ 0.99925094  0.99875156  0.99950062  0.99950062  0.996004    0.9995005
  0.997003    0.98975768  0.99975019  0.99925056]
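The summary figures can be checked directly from the ten printed fold scores; the sketch only recomputes statistics over the numbers above.

```python
import statistics

# The ten per-fold accuracies printed above
fold_scores = [0.99925094, 0.99875156, 0.99950062, 0.99950062, 0.996004,
               0.9995005, 0.997003, 0.98975768, 0.99975019, 0.99925056]

print(f"mean={statistics.mean(fold_scores):.4f}")              # mean=0.9978
print(f"min={min(fold_scores):.4f} max={max(fold_scores):.4f}")  # min=0.9898 max=0.9998
print(f"stdev={statistics.pstdev(fold_scores):.4f}")
```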