我们先演示一下K近邻的基础用法。完整演示代码请见本书GitHub上的5-1.py。
用Ball Tree算法举个简单的例子,其他算法类似,不再赘述。训练数据集合是二维平面上的若干个点:
>>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
调用Scikit-Learn的KNN算法,邻居数设置为2,进行训练:
>>> from sklearn.neighbors import NearestNeighbors >>> import numpy as np >>> X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]]) >>> nbrs = NearestNeighbors(n_neighbors=2, algorithm='ball_tree').fit(X)
计算结果为:
>>> distances, indices = nbrs.kneighbors(X) >>> indices array([[0, 1], [1, 0], [2, 1], [3, 4], [4, 3], [5, 4]]) >>> distances array([[ 0. , 1. ], [ 0. , 1. ], [ 0. , 1.41421356], [ 0. , 1. ], [ 0. , 1. ], [ 0. , 1.41421356]])
可视化结果为:
>>> nbrs.kneighbors_graph(X).toarray() array([[ 1., 1., 0., 0., 0., 0.], [ 1., 1., 0., 0., 0., 0.], [ 0., 1., 1., 0., 0., 0.], [ 0., 0., 0., 1., 1., 0.], [ 0., 0., 0., 1., 1., 0.], [ 0., 0., 0., 0., 1., 1.]])
KNN用于监督学习也非常简单,如下所示:
>>> X = [[0], [1], [2], [3]] >>> y = [0, 0, 1, 1] >>> from sklearn.neighbors import KNeighborsClassifier >>> neigh = KNeighborsClassifier(n_neighbors=3) >>> neigh.fit(X, y) KNeighborsClassifier(...) >>> print(neigh.predict([[1.1]])) [0] >>> print(neigh.predict_proba([[0.9]])) [[ 0.66666667 0.33333333]]