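A Rootkit is a special kind of malware whose purpose is to hide itself, along with specified files, processes, network connections, and other information, on the system where it is installed. Rootkits are generally used together with other malicious programs such as trojans and backdoors. In this section we use sample data from KDD 99 and try to identify Rootkit behavior over telnet connections with the KNN algorithm; the detection flow is shown in Figure 5-6. For a detailed introduction to the KDD 99 data, see the relevant material in Chapter 3. The complete demo code is in 5-4.py in this book's GitHub repository.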
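1. Data Collection and Data Cleaning
Most of the data cleaning for KDD 99 has already been done. Each connection in the KDD 99 dataset is described by 41 features, followed by a label: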
35,tcp,ftp,SF,96,533,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,221,3,0.01,0.03,0.00,0.00,0.00,0.00,0.00,0.00,rootkit.
0,tcp,ftp_data,SF,116,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,52,0.20,0.03,0.20,0.00,0.20,0.00,0.02,0.00,rootkit.
15,tcp,ftp,SF,45,214,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,226,4,0.02,0.03,0.00,0.00,0.00,0.00,0.00,0.00,rootkit.
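To make the record layout concrete, the following sketch splits the first sample record above into its feature fields and its trailing label. The variable names are illustrative and are not taken from 5-4.py.

```python
# First sample record shown above, copied verbatim.
record = ("35,tcp,ftp,SF,96,533,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,1,"
          "0.00,0.00,0.00,0.00,1.00,0.00,0.00,221,3,0.01,0.03,"
          "0.00,0.00,0.00,0.00,0.00,0.00,rootkit.")

fields = record.split(',')

# Everything except the last field is a feature; the last field is the label.
features, label = fields[:-1], fields[-1]

print(len(features))                # number of feature fields
print(fields[1], fields[2], label)  # protocol, service, label
```

Note that the label keeps its trailing period ('rootkit.'), which is why the filtering code later compares against 'rootkit.' and 'normal.' rather than the bare names.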
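Figure 5-6 Rootkit detection flow based on telnet connections
The features relevant to Rootkits are mainly the content features of the TCP connection; see Table 5-1 for details.
Table 5-1 KDD 99 TCP connection content features
Load the data from the KDD 99 dataset: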
def load_kdd99(filename):
    # Read the file and split each comma-separated record into a list of fields.
    x = []
    with open(filename) as f:
        for line in f:
            line = line.strip('\n')
            line = line.split(',')
            x.append(line)
    return x
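As an alternative, the same loading step could be written with Python's standard csv module. This is only a sketch: the function name load_kdd99_csv and the one-row temporary file are made up for illustration, and the row is a placeholder with 42 fields rather than real KDD 99 data.

```python
import csv
import os
import tempfile


def load_kdd99_csv(filename):
    """Return every non-empty row of the file as a list of string fields."""
    with open(filename, newline='') as f:
        return [row for row in csv.reader(f) if row]


# Placeholder row: 4 header-like fields, 37 zeros, and a label (42 fields).
sample = "0,tcp,telnet,SF," + ",".join(["0"] * 37) + ",normal.\n"

with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    rows = load_kdd99_csv(path)
finally:
    os.remove(path)  # clean up the temporary file

print(len(rows), len(rows[0]), rows[0][2], rows[0][41])
```

The csv reader strips line endings on its own and skips nothing silently, so the explicit filter on empty rows guards against a trailing blank line in the file.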
Select the records that are labeled rootkit or normal and that use the telnet service:
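# x1 is one parsed record: field 41 is the label, field 2 is the service.
if (x1[41] in ['rootkit.', 'normal.']) and (x1[2] == 'telnet'):
    if x1[41] == 'rootkit.':
        y.append(1)  # Rootkit sample
    else:
        y.append(0)  # normal sample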
2. Feature Extraction
Pick the Rootkit-related features as the sample features:
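# For each selected record, keep fields 9 through 20 (0-based).
x1 = x1[9:21]
v.append(x1)

# Convert the selected fields from strings to floats.
for x1 in v:
    v1 = []
    for x2 in x1:
        v1.append(float(x2))
    w.append(v1)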
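The filtering and feature-extraction fragments above can be combined into one self-contained function, sketched below. The names select_telnet_samples and make_record are hypothetical, and the demo rows are made-up placeholders, not KDD 99 data; only the field positions (service at index 2, label at index 41, features at 9 through 20) come from the code above.

```python
def select_telnet_samples(records):
    """Keep telnet rows labeled rootkit./normal.; return (features, labels)."""
    features, labels = [], []
    for fields in records:
        if fields[41] in ('rootkit.', 'normal.') and fields[2] == 'telnet':
            labels.append(1 if fields[41] == 'rootkit.' else 0)
            # Fields 9..20 as floats, mirroring x1[9:21] above.
            features.append([float(value) for value in fields[9:21]])
    return features, labels


def make_record(service, label, hot):
    """Build a made-up 42-field row; only service, label and field 9 vary."""
    fields = ['0', 'tcp', service, 'SF'] + ['0'] * 37 + [label]
    fields[9] = str(hot)
    return fields


demo = [
    make_record('telnet', 'rootkit.', 3),
    make_record('telnet', 'normal.', 0),
    make_record('ftp', 'rootkit.', 1),   # wrong service, dropped
    make_record('telnet', 'smurf.', 0),  # other label, dropped
]

X, y = select_telnet_samples(demo)
print(len(X), y, X[0])
```

Building the feature list inside the same loop as the label keeps the two lists aligned by construction, which is easy to get wrong when they are filled in separate passes.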
3. Training
Instantiate the KNN algorithm with the number of neighbors set to 3:
from sklearn.neighbors import KNeighborsClassifier

clf = KNeighborsClassifier(n_neighbors=3)
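To illustrate what a neighbor count of 3 means, here is a minimal plain-Python sketch of the idea: find the three training points closest to a query and take a majority vote over their labels. It assumes Euclidean distance and equal-weight votes, which match scikit-learn's defaults for KNeighborsClassifier, but it is not the library's implementation and it ignores tie-breaking details. The function name knn_predict and the toy points are made up for illustration; math.dist requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of query by majority vote among its k nearest points."""
    # Sort (point, label) pairs by Euclidean distance to the query.
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Two well-separated toy clusters: label 0 near the origin, label 1 near (5, 5).
train_x = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
           [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
train_y = [0, 0, 0, 1, 1, 1]

pred_a = knn_predict(train_x, train_y, [0.5, 0.5])
pred_b = knn_predict(train_x, train_y, [5.5, 5.5])
print(pred_a, pred_b)
```

An odd neighbor count such as 3 avoids tied votes in a two-class problem like rootkit versus normal.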
4. Validation
We use ten-fold cross-validation.
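For readers unfamiliar with the term, ten-fold cross-validation splits the samples into ten parts, and each part serves once as the test set while the other nine are used for training. The sketch below shows only the index bookkeeping in plain Python. The function name kfold_indices and the sample count of 97 are made up for illustration. scikit-learn additionally stratifies the folds by class when it is given a classifier and an integer cv, which this simplified version does not do.

```python
def kfold_indices(n_samples, n_folds=10):
    """Split range(n_samples) into n_folds contiguous (train, test) index pairs."""
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for fold in range(n_folds):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


folds = kfold_indices(97, 10)
fold_sizes = [len(test) for _, test in folds]
print(fold_sizes)
```

With a sample count that is not a multiple of ten, the folds differ in size by one, so a single misclassified sample shifts the per-fold accuracy by a different amount from fold to fold.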
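# The cross_validation module was removed in scikit-learn 0.20;
# cross_val_score now lives in sklearn.model_selection.
from sklearn.model_selection import cross_val_score

print(cross_val_score(clf, x, y, n_jobs=-1, cv=10))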
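The results are as follows. Nine of the ten folds score 90% or higher, and the mean accuracy across the folds is about 96%.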
[ 0.9 0.9 1. 1. 1. 0.77777778 1. 1. 1. 1. ]
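As a quick arithmetic check, the per-fold scores listed above can be summarized with the standard library. The values are copied from the output; the variable names are illustrative.

```python
from statistics import mean

# Per-fold accuracy values copied from the output above.
scores = [0.9, 0.9, 1.0, 1.0, 1.0, 0.77777778, 1.0, 1.0, 1.0, 1.0]

avg = mean(scores)
worst = min(scores)
print(round(avg, 3), round(worst, 3))
```

Reporting the weakest fold alongside the mean is useful here, because one fold falls well below the others.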