当前位置: 移动技术网 > IT编程>脚本编程>Python > 机器学习【算法】:KNN近邻

机器学习【算法】:KNN近邻

2018年09月19日  | 移动技术网IT编程  | 我要评论

引言

  本文讨论的knn算法是监督学习中分类方法的一种。所谓监督学习与非监督学习,是指训练数据是否有标注类别,若有则为监督学习,若否则为非监督学习。监督学习是根据输入数据(训练数据)学习一个模型,能对后来的输入做预测。在监督学习中,输入变量与输出变量可以是连续的,也可以是离散的。若输入变量与输出变量均为连续变量,则称为回归;输出变量为有限个离散变量,则称为分类;输入变量与输出变量均为变量序列,则称为标注

  有监督的分类学习

knn算法的基本要素大致有三个:

  1、k 值的选择 (即输入新实例要取多少个训练实例点作为近邻)

  2、距离的度量 (欧氏距离,曼哈顿距离等)

  3、分类决策规则 (常用的方式是取k个近邻训练实例中类别出现次数最多者作为输入新实例的类别)

使用方式:

  k 值会对算法的结果产生重大影响。k值较小意味着只有与输入实例较近的训练实例才会对预测结果起作用,容易发生过拟合;如果 k 值较大,优点是可以减少学习的估计误差,缺点是学习的近似误差增大,这时与输入实例较远的训练实例也会对预测起作用,是预测发生错误。在实际应用中,k 值一般选择一个较小的数值,通常采用交叉验证的方法来选择最有的 k 值。随着训练实例数目趋向于无穷和 k=1 时,误差率不会超过贝叶斯误差率的2倍,如果k也趋向于无穷,则误差率趋向于贝叶斯误差率。

  距离度量一般采用 lp 距离,当p=2时,即为欧氏距离,在度量之前,应该将每个属性的值规范化这样有助于防止具有较大初始值域的属性比具有较小初始值域的属性的权重过大(规范)

  算法中的分类决策规则往往是多数表决,即由输入实例的 k 个最临近的训练实例中的多数类决定输入实例的类别

 

简述 

  knn分类算法,是理论上比较成熟的方法,也是最简单的机器学习算法之一。

  该方法的思路是:如果一个样本在特征空间中的k个最相似(即特征空间中最邻近)的样本中的大多数属于某一个类别,则该样本也属于这个类别。

  knn算法中,所选择的邻居都是已经正确分类的对象。该方法在定类决策上只依据最邻近的一个或者几个样本的类别来决定待分样本所属的类别。

   一个对于knn算法解释最清楚的图如下所示:

  

  蓝方块和红三角均是已有分类数据,当前的任务是将绿色圆块进行分类判断,判断是属于蓝方块或者红三角。

  当然这里的分类还跟k值是有关的:

  如果k=3(实线圈),红三角占比2/3,则判断为红三角;

  如果k=5(虚线圈),蓝方块占比3/5,则判断为蓝方块。

  由此可以看出knn算法实际上根本就不用进行训练,而是直接进行计算的,训练时间为0,计算时间为训练集规模n。

  knn算法在分类时主要的不足是,当样本不平衡时,如果一个类的样本容量很大,而其他类样本容量很小时,有可能导致当输入一个新样本时,该样本的 k 个邻居中大容量类的样本占多数。

 

*算法实现

1、函数

输入: 训练数据集 ,其中为训练实例的特征向量(xi为特征值),为训练实例的类别(yi为xi对应的类别

输出: 新输入实例x所属类别y实例类别

1、根据给定的距离度量,在训练集t中找到与x最近的k个点,涵盖这k个点的邻域记为n_k(x)上图中实体圆圈包含的数据

2、在n_k(x)中根据分类决策规则(如多数表决)决定x所属的类别y(即统计n_k(x)哪个y类别出现的次数多,则取哪个):

          

其中i为指示函数,仅当y_i = c_j时i(*)的值为1,否则为0.(argmax公式表示后面取值最大时,cj的变量值,公式的目的也是求此值;xi为n_k(x)中的值,cj为涵盖领域存在的类别;利用求和公式∑,当cj为c1时,xi∈n_k(x)分别代入,得出值;xi为已知数据,cj值不断变化,求出公式的最大值,得出最终y得取值

2、距离度量方式

较为常用的距离度量方式是欧式距离,定义可以使用其他更为一般的l_p距离或闵科夫斯基(minkowski)距离。

设特征空间\chi 为n为实数向量空间r^nx_i,x_j \in \chi,x_i=(x^{(1)}_i,x^{(2)}_i,x^{(3)}_i,...,x^{(n)}_i)^tx_j=(x^{(1)}_j,x^{(2)}_j,x^{(3)}_j,...,x^{(n)}_j)^tx_i,x_jl_p距离可定义为:

  l_p(x_i,x_j) = (\sum_{l=1}^{n}{|x_i^{(l)}-x_j^{(l)}|^p} )^{1/p}    闵科夫斯基距离

  l_p(x_i,x_j) = (\sum_{l=1}^{n}{|x_i^{(l)}-x_j^{(l)}|^2} )^{1/2}    欧氏距离,p取2

  l_p(x_i,x_j) = \sum_{l=1}^{n}{|x_i^{(l)}-x_j^{(l)}} |        曼哈顿距离,p取1

  l_p(x_i,x_j) = \max \limits_l {|x_i^{(l)}-x_j^{(l)}} |        p取\infty

3、k值选择

   一般会先选择较小的k值,然后进行交叉验证选取最优的k值。k值较小时,整体模型会变得复杂,且对近邻的训练实例点较为敏感,容易出现过拟合。k值较大时,模型则会趋于简单,此时较远的训练实例点也会起到预测作用,容易出现欠拟合,特殊的,当k取n时,此时无论输入实例是什么,都会将其预测为属于训练实例中最多的类别。

4、分类决策规则

  knn学习模型:输入 

  通过学习得到决策函数:输出类别 

假设分类损失函数为0-1损失函数,即分类正确时损失函数值为0,分类错误时则为1。假如给预测类别为,即;同时由式子(1)可知k邻域的样本点对学习模型的贡献度是均等的,则knn学习模型误分类率为

  

若要最小化误分类率,则应

  

所以,最大表决规则等价于经验风险最小化。

k值得选取对knn学习模型有着很大的影响。若k值过小,预测结果会对噪音样本点显得异常敏感。特别地,当k等于1时,knn退化成最近邻算法,没有了显式的学习过程。若k值过大,会有较大的邻域训练样本进行预测,可以减小噪音样本点的减少;但是距离较远的训练样本点对预测结果会有贡献,以至于造成预测结果错误。下图给出k值的选取对于预测结果的影响:

  

前面提到过,k邻域的样本点对预测结果的贡献度是相等的;但距离更近的样本点应有更大的相似度,其贡献度应比距离更远的样本点大。可以加上权值进行修正,则最大表决原则变成:

  

 

应用案例 

对data2.txt中的数据进行运算,取前10%的数据进行验证,90%的数据做训练,根据k值的不同,算出误差率,选出最佳k值;

40920    8.326976    0.953952    3
14488    7.153469    1.673904    2
26052    1.441871    0.805124    1
75136    13.147394    0.428964    1
38344    1.669788    0.134296    1
72993    10.141740    1.032955    1
35948    6.830792    1.213192    3
42666    13.276369    0.543880    3
67497    8.631577    0.749278    1
35483    12.273169    1.508053    3
50242    3.723498    0.831917    1
63275    8.385879    1.669485    1
5569    4.875435    0.728658    2
51052    4.680098    0.625224    1
77372    15.299570    0.331351    1
43673    1.889461    0.191283    1
61364    7.516754    1.269164    1
69673    14.239195    0.261333    1
15669    0.000000    1.250185    2
28488    10.528555    1.304844    3
6487    3.540265    0.822483    2
37708    2.991551    0.833920    1
22620    5.297865    0.638306    2
28782    6.593803    0.187108    3
19739    2.816760    1.686209    2
36788    12.458258    0.649617    3
5741    0.000000    1.656418    2
28567    9.968648    0.731232    3
6808    1.364838    0.640103    2
41611    0.230453    1.151996    1
36661    11.865402    0.882810    3
43605    0.120460    1.352013    1
15360    8.545204    1.340429    3
63796    5.856649    0.160006    1
10743    9.665618    0.778626    2
70808    9.778763    1.084103    1
72011    4.932976    0.632026    1
5914    2.216246    0.587095    2
14851    14.305636    0.632317    3
33553    12.591889    0.686581    3
44952    3.424649    1.004504    1
17934    0.000000    0.147573    2
27738    8.533823    0.205324    3
29290    9.829528    0.238620    3
42330    11.492186    0.263499    3
36429    3.570968    0.832254    1
39623    1.771228    0.207612    1
32404    3.513921    0.991854    1
27268    4.398172    0.975024    1
5477    4.276823    1.174874    2
14254    5.946014    1.614244    2
68613    13.798970    0.724375    1
41539    10.393591    1.663724    3
7917    3.007577    0.297302    2
21331    1.031938    0.486174    2
8338    4.751212    0.064693    2
5176    3.692269    1.655113    2
18983    10.448091    0.267652    3
68837    10.585786    0.329557    1
13438    1.604501    0.069064    2
48849    3.679497    0.961466    1
12285    3.795146    0.696694    2
7826    2.531885    1.659173    2
5565    9.733340    0.977746    2
10346    6.093067    1.413798    2
1823    7.712960    1.054927    2
9744    11.470364    0.760461    3
16857    2.886529    0.934416    2
39336    10.054373    1.138351    3
65230    9.972470    0.881876    1
2463    2.335785    1.366145    2
27353    11.375155    1.528626    3
16191    0.000000    0.605619    2
12258    4.126787    0.357501    2
42377    6.319522    1.058602    1
25607    8.680527    0.086955    3
77450    14.856391    1.129823    1
58732    2.454285    0.222380    1
46426    7.292202    0.548607    3
32688    8.745137    0.857348    3
64890    8.579001    0.683048    1
8554    2.507302    0.869177    2
28861    11.415476    1.505466    3
42050    4.838540    1.680892    1
32193    10.339507    0.583646    3
64895    6.573742    1.151433    1
2355    6.539397    0.462065    2
0    2.209159    0.723567    2
70406    11.196378    0.836326    1
57399    4.229595    0.128253    1
41732    9.505944    0.005273    3
11429    8.652725    1.348934    3
75270    17.101108    0.490712    1
5459    7.871839    0.717662    2
73520    8.262131    1.361646    1
40279    9.015635    1.658555    3
21540    9.215351    0.806762    3
17694    6.375007    0.033678    2
22329    2.262014    1.022169    1
46570    5.677110    0.709469    1
42403    11.293017    0.207976    3
33654    6.590043    1.353117    1
9171    4.711960    0.194167    2
28122    8.768099    1.108041    3
34095    11.502519    0.545097    3
1774    4.682812    0.578112    2
40131    12.446578    0.300754    3
13994    12.908384    1.657722    3
77064    12.601108    0.974527    1
11210    3.929456    0.025466    2
6122    9.751503    1.182050    3
15341    3.043767    0.888168    2
44373    4.391522    0.807100    1
28454    11.695276    0.679015    3
63771    7.879742    0.154263    1
9217    5.613163    0.933632    2
69076    9.140172    0.851300    1
24489    4.258644    0.206892    1
16871    6.799831    1.221171    2
39776    8.752758    0.484418    3
5901    1.123033    1.180352    2
40987    10.833248    1.585426    3
7479    3.051618    0.026781    2
38768    5.308409    0.030683    3
4933    1.841792    0.028099    2
32311    2.261978    1.605603    1
26501    11.573696    1.061347    3
37433    8.038764    1.083910    3
23503    10.734007    0.103715    3
68607    9.661909    0.350772    1
27742    9.005850    0.548737    3
11303    0.000000    0.539131    2
0    5.757140    1.062373    2
32729    9.164656    1.624565    3
24619    1.318340    1.436243    1
42414    14.075597    0.695934    3
20210    10.107550    1.308398    3
33225    7.960293    1.219760    3
54483    6.317292    0.018209    1
18475    12.664194    0.595653    3
33926    2.906644    0.581657    1
43865    2.388241    0.913938    1
26547    6.024471    0.486215    3
44404    7.226764    1.255329    3
16674    4.183997    1.275290    2
8123    11.850211    1.096981    3
42747    11.661797    1.167935    3
56054    3.574967    0.494666    1
10933    0.000000    0.107475    2
18121    7.937657    0.904799    3
11272    3.365027    1.014085    2
16297    0.000000    0.367491    2
28168    13.860672    1.293270    3
40963    10.306714    1.211594    3
31685    7.228002    0.670670    3
55164    4.508740    1.036192    1
17595    0.366328    0.163652    2
1862    3.299444    0.575152    2
57087    0.573287    0.607915    1
63082    9.183738    0.012280    1
51213    7.842646    1.060636    3
6487    4.750964    0.558240    2
4805    11.438702    1.556334    3
30302    8.243063    1.122768    3
68680    7.949017    0.271865    1
17591    7.875477    0.227085    2
74391    9.569087    0.364856    1
37217    7.750103    0.869094    3
42814    0.000000    1.515293    1
14738    3.396030    0.633977    2
19896    11.916091    0.025294    3
14673    0.460758    0.689586    2
32011    13.087566    0.476002    3
58736    4.589016    1.672600    1
54744    8.397217    1.534103    1
29482    5.562772    1.689388    1
27698    10.905159    0.619091    3
11443    1.311441    1.169887    2
56117    10.647170    0.980141    3
39514    0.000000    0.481918    1
26627    8.503025    0.830861    3
16525    0.436880    1.395314    2
24368    6.127867    1.102179    1
22160    12.112492    0.359680    3
6030    1.264968    1.141582    2
6468    6.067568    1.327047    2
22945    8.010964    1.681648    3
18520    3.791084    0.304072    2
34914    11.773195    1.262621    3
6121    8.339588    1.443357    2
38063    2.563092    1.464013    1
23410    5.954216    0.953782    1
35073    9.288374    0.767318    3
52914    3.976796    1.043109    1
16801    8.585227    1.455708    3
9533    1.271946    0.796506    2
16721    0.000000    0.242778    2
5832    0.000000    0.089749    2
44591    11.521298    0.300860    3
10143    1.139447    0.415373    2
21609    5.699090    1.391892    2
23817    2.449378    1.322560    1
15640    0.000000    1.228380    2
8847    3.168365    0.053993    2
50939    10.428610    1.126257    3
28521    2.943070    1.446816    1
32901    10.441348    0.975283    3
42850    12.478764    1.628726    3
13499    5.856902    0.363883    2
40345    2.476420    0.096075    1
43547    1.826637    0.811457    1
70758    4.324451    0.328235    1
19780    1.376085    1.178359    2
44484    5.342462    0.394527    1
54462    11.835521    0.693301    3
20085    12.423687    1.424264    3
42291    12.161273    0.071131    3
47550    8.148360    1.649194    3
11938    1.531067    1.549756    2
40699    3.200912    0.309679    1
70908    8.862691    0.530506    1
73989    6.370551    0.369350    1
11872    2.468841    0.145060    2
48463    11.054212    0.141508    3
15987    2.037080    0.715243    2
70036    13.364030    0.549972    1
32967    10.249135    0.192735    3
63249    10.464252    1.669767    1
42795    9.424574    0.013725    3
14459    4.458902    0.268444    2
19973    0.000000    0.575976    2
5494    9.686082    1.029808    3
67902    13.649402    1.052618    1
25621    13.181148    0.273014    3
27545    3.877472    0.401600    1
58656    1.413952    0.451380    1
7327    4.248986    1.430249    2
64555    8.779183    0.845947    1
8998    4.156252    0.097109    2
11752    5.580018    0.158401    2
76319    15.040440    1.366898    1
27665    12.793870    1.307323    3
67417    3.254877    0.669546    1
21808    10.725607    0.588588    3
15326    8.256473    0.765891    2
20057    8.033892    1.618562    3
79341    10.702532    0.204792    1
15636    5.062996    1.132555    2
35602    10.772286    0.668721    3
28544    1.892354    0.837028    1
57663    1.019966    0.372320    1
78727    15.546043    0.729742    1
68255    11.638205    0.409125    1
14964    3.427886    0.975616    2
21835    11.246174    1.475586    3
7487    0.000000    0.645045    2
8700    0.000000    1.424017    2
26226    8.242553    0.279069    3
65899    8.700060    0.101807    1
6543    0.812344    0.260334    2
46556    2.448235    1.176829    1
71038    13.230078    0.616147    1
47657    0.236133    0.340840    1
19600    11.155826    0.335131    3
37422    11.029636    0.505769    3
1363    2.901181    1.646633    2
26535    3.924594    1.143120    1
47707    2.524806    1.292848    1
38055    3.527474    1.449158    1
6286    3.384281    0.889268    2
10747    0.000000    1.107592    2
44883    11.898890    0.406441    3
56823    3.529892    1.375844    1
68086    11.442677    0.696919    1
70242    10.308145    0.422722    1
11409    8.540529    0.727373    2
67671    7.156949    1.691682    1
61238    0.720675    0.847574    1
17774    0.229405    1.038603    2
53376    3.399331    0.077501    1
30930    6.157239    0.580133    1
28987    1.239698    0.719989    1
13655    6.036854    0.016548    2
7227    5.258665    0.933722    2
40409    12.393001    1.571281    3
13605    9.627613    0.935842    2
26400    11.130453    0.597610    3
13491    8.842595    0.349768    3
30232    10.690010    1.456595    3
43253    5.714718    1.674780    3
55536    3.052505    1.335804    1
8807    0.000000    0.059025    2
25783    9.945307    1.287952    3
22812    2.719723    1.142148    1
77826    11.154055    1.608486    1
38172    2.687918    0.660836    1
31676    10.037847    0.962245    3
74038    12.404762    1.112080    1
44738    10.237305    0.633422    3
17410    4.745392    0.662520    2
5688    4.639461    1.569431    2
36642    3.149310    0.639669    1
29956    13.406875    1.639194    3
60350    6.068668    0.881241    1
23758    9.477022    0.899002    3
25780    3.897620    0.560201    2
11342    5.463615    1.203677    2
36109    3.369267    1.575043    1
14292    5.234562    0.825954    2
11160    0.000000    0.722170    2
23762    12.979069    0.504068    3
39567    5.376564    0.557476    1
25647    13.527910    1.586732    3
14814    2.196889    0.784587    2
73590    10.691748    0.007509    1
35187    1.659242    0.447066    1
49459    8.369667    0.656697    3
31657    13.157197    0.143248    3
6259    8.199667    0.908508    2
33101    4.441669    0.439381    3
27107    9.846492    0.644523    3
17824    0.019540    0.977949    2
43536    8.253774    0.748700    3
67705    6.038620    1.509646    1
35283    6.091587    1.694641    3
71308    8.986820    1.225165    1
31054    11.508473    1.624296    3
52387    8.807734    0.713922    3
40328    0.000000    0.816676    1
34844    8.889202    1.665414    3
11607    3.178117    0.542752    2
64306    7.013795    0.139909    1
32721    9.605014    0.065254    3
33170    1.230540    1.331674    1
37192    10.412811    0.890803    3
13089    0.000000    0.567161    2
66491    9.699991    0.122011    1
15941    0.000000    0.061191    2
4272    4.455293    0.272135    2
48812    3.020977    1.502803    1
28818    8.099278    0.216317    3
35394    1.157764    1.603217    1
71791    10.105396    0.121067    1
40668    11.230148    0.408603    3
39580    9.070058    0.011379    3
11786    0.566460    0.478837    2
19251    0.000000    0.487300    2
56594    8.956369    1.193484    3
54495    1.523057    0.620528    1
11844    2.749006    0.169855    2
45465    9.235393    0.188350    3
31033    10.555573    0.403927    3
16633    6.956372    1.519308    2
13887    0.636281    1.273984    2
52603    3.574737    0.075163    1
72000    9.032486    1.461809    1
68497    5.958993    0.023012    1
35135    2.435300    1.211744    1
26397    10.539731    1.638248    3
7313    7.646702    0.056513    2
91273    20.919349    0.644571    1
24743    1.424726    0.838447    1
31690    6.748663    0.890223    3
15432    2.289167    0.114881    2
58394    5.548377    0.402238    1
33962    6.057227    0.432666    1
31442    10.828595    0.559955    3
31044    11.318160    0.271094    3
29938    13.265311    0.633903    3
9875    0.000000    1.496715    2
51542    6.517133    0.402519    3
11878    4.934374    1.520028    2
69241    10.151738    0.896433    1
37776    2.425781    1.559467    1
68997    9.778962    1.195498    1
67416    12.219950    0.657677    1
59225    7.394151    0.954434    1
29138    8.518535    0.742546    3
5962    2.798700    0.662632    2
10847    0.637930    0.617373    2
70527    10.750490    0.097415    1
9610    0.625382    0.140969    2
64734    10.027968    0.282787    1
25941    9.817347    0.364197    3
2763    0.646828    1.266069    2
55601    3.347111    0.914294    1
31128    11.816892    0.193798    3
5181    0.000000    1.480198    2
69982    10.945666    0.993219    1
52440    10.244706    0.280539    3
57350    2.579801    1.149172    1
57869    2.630410    0.098869    1
56557    11.746200    1.695517    3
42342    8.104232    1.326277    3
15560    12.409743    0.790295    3
34826    12.167844    1.328086    3
8569    3.198408    0.299287    2
77623    16.055513    0.541052    1
78184    7.138659    0.158481    1
7036    4.831041    0.761419    2
69616    10.082890    1.373611    1
21546    10.066867    0.788470    3
36715    8.129538    0.329913    3
20522    3.012463    1.138108    2
42349    3.720391    0.845974    1
9037    0.773493    1.148256    2
26728    10.962941    1.037324    3
587    0.177621    0.162614    2
48915    3.085853    0.967899    1
9824    8.426781    0.202558    2
4135    1.825927    1.128347    2
9666    2.185155    1.010173    2
59333    7.184595    1.261338    1
36198    0.000000    0.116525    1
34909    8.901752    1.033527    3
47516    2.451497    1.358795    1
55807    3.213631    0.432044    1
14036    3.974739    0.723929    2
42856    9.601306    0.619232    3
64007    8.363897    0.445341    1
59428    6.381484    1.365019    1
13730    0.000000    1.403914    2
41740    9.609836    1.438105    3
63546    9.904741    0.985862    1
30417    7.185807    1.489102    3
69636    5.466703    1.216571    1
64660    0.000000    0.915898    1
14883    4.575443    0.535671    2
7965    3.277076    1.010868    2
68620    10.246623    1.239634    1
8738    2.341735    1.060235    2
7544    3.201046    0.498843    2
6377    6.066013    0.120927    2
36842    8.829379    0.895657    3
81046    15.833048    1.568245    1
67736    13.516711    1.220153    1
32492    0.664284    1.116755    1
39299    6.325139    0.605109    3
77289    8.677499    0.344373    1
33835    8.188005    0.964896    3
71890    9.414263    0.384030    1
32054    9.196547    1.138253    3
38579    10.202968    0.452363    3
55984    2.119439    1.481661    1
72694    13.635078    0.858314    1
42299    0.083443    0.701669    1
26635    9.149096    1.051446    3
8579    1.933803    1.374388    2
37302    14.115544    0.676198    3
22878    8.933736    0.943352    3
4364    2.661254    0.946117    2
4985    0.988432    1.305027    2
37068    2.063741    1.125946    1
41137    2.220590    0.690754    1
67759    6.424849    0.806641    1
11831    1.156153    1.613674    2
34502    3.032720    0.601847    1
4088    3.076828    0.952089    2
15199    0.000000    0.318105    2
17309    7.750480    0.554015    3
42816    10.958135    1.482500    3
43751    10.222018    0.488678    3
58335    2.367988    0.435741    1
75039    7.686054    1.381455    1
42878    11.464879    1.481589    3
42770    11.075735    0.089726    3
8848    3.543989    0.345853    2
31340    8.123889    1.282880    3
41413    4.331769    0.754467    3
12731    0.120865    1.211961    2
22447    6.116109    0.701523    3
33564    7.474534    0.505790    3
48907    8.819454    0.649292    3
8762    6.802144    0.615284    2
46696    12.666325    0.931960    3
36851    8.636180    0.399333    3
67639    11.730991    1.289833    1
171    8.132449    0.039062    2
26674    10.296589    1.496144    3
8739    7.583906    1.005764    2
66668    9.777806    0.496377    1
68732    8.833546    0.513876    1
69995    4.907899    1.518036    1
82008    8.362736    1.285939    1
25054    9.084726    1.606312    3
33085    14.164141    0.560970    3
41379    9.080683    0.989920    3
39417    6.522767    0.038548    3
12556    3.690342    0.462281    2
39432    3.563706    0.242019    1
38010    1.065870    1.141569    1
69306    6.683796    1.456317    1
38000    1.712874    0.243945    1
46321    13.109929    1.280111    3
66293    11.327910    0.780977    1
22730    4.545711    1.233254    1
5952    3.367889    0.468104    2
72308    8.326224    0.567347    1
60338    8.978339    1.442034    1
13301    5.655826    1.582159    2
27884    8.855312    0.570684    3
11188    6.649568    0.544233    2
56796    3.966325    0.850410    1
8571    1.924045    1.664782    2
4914    6.004812    0.280369    2
10784    0.000000    0.375849    2
39296    9.923018    0.092192    3
13113    2.389084    0.119284    2
70204    13.663189    0.133251    1
46813    11.434976    0.321216    3
11697    0.358270    1.292858    2
44183    9.598873    0.223524    3
2225    6.375275    0.608040    2
29066    11.580532    0.458401    3
4245    5.319324    1.598070    2
34379    4.324031    1.603481    1
44441    2.358370    1.273204    1
2022    0.000000    1.182708    2
26866    12.824376    0.890411    3
57070    1.587247    1.456982    1
32932    8.510324    1.520683    3
51967    10.428884    1.187734    3
44432    8.346618    0.042318    3
67066    7.541444    0.809226    1
17262    2.540946    1.583286    2
79728    9.473047    0.692513    1
14259    0.352284    0.474080    2
6122    0.000000    0.589826    2
76879    12.405171    0.567201    1
11426    4.126775    0.871452    2
2493    0.034087    0.335848    2
19910    1.177634    0.075106    2
10939    0.000000    0.479996    2
17716    0.994909    0.611135    2
31390    11.053664    1.180117    3
20375    0.000000    1.679729    2
26309    2.495011    1.459589    1
33484    11.516831    0.001156    3
45944    9.213215    0.797743    3
4249    5.332865    0.109288    2
6089    0.000000    1.689771    2
7513    0.000000    1.126053    2
27862    12.640062    1.690903    3
39038    2.693142    1.317518    1
19218    3.328969    0.268271    2
62911    7.193166    1.117456    1
77758    6.615512    1.521012    1
27940    8.000567    0.835341    3
2194    4.017541    0.512104    2
37072    13.245859    0.927465    3
15585    5.970616    0.813624    2
25577    11.668719    0.886902    3
8777    4.283237    1.272728    2
29016    10.742963    0.971401    3
21910    12.326672    1.592608    3
12916    0.000000    0.344622    2
10976    0.000000    0.922846    2
79065    10.602095    0.573686    1
36759    10.861859    1.155054    3
50011    1.229094    1.638690    1
1155    0.410392    1.313401    2
71600    14.552711    0.616162    1
30817    14.178043    0.616313    3
54559    14.136260    0.362388    1
29764    0.093534    1.207194    1
69100    10.929021    0.403110    1
47324    11.432919    0.825959    3
73199    9.134527    0.586846    1
44461    5.071432    1.421420    1
45617    11.460254    1.541749    3
28221    11.620039    1.103553    3
7091    4.022079    0.207307    2
6110    3.057842    1.631262    2
79016    7.782169    0.404385    1
18289    7.981741    0.929789    3
43679    4.601363    0.268326    1
22075    2.595564    1.115375    1
23535    10.049077    0.391045    3
25301    3.265444    1.572970    2
32256    11.780282    1.511014    3
36951    3.075975    0.286284    1
31290    1.795307    0.194343    1
38953    11.106979    0.202415    3
35257    5.994413    0.800021    1
25847    9.706062    1.012182    3
32680    10.582992    0.836025    3
62018    7.038266    1.458979    1
9074    0.023771    0.015314    2
33004    12.823982    0.676371    3
44588    3.617770    0.493483    1
32565    8.346684    0.253317    3
38563    6.104317    0.099207    1
75668    16.207776    0.584973    1
9069    6.401969    1.691873    2
53395    2.298696    0.559757    1
28631    7.661515    0.055981    3
71036    6.353608    1.645301    1
71142    10.442780    0.335870    1
37653    3.834509    1.346121    1
76839    10.998587    0.584555    1
9916    2.695935    1.512111    2
38889    3.356646    0.324230    1
39075    14.677836    0.793183    3
48071    1.551934    0.130902    1
7275    2.464739    0.223502    2
41804    1.533216    1.007481    1
35665    12.473921    0.162910    3
67956    6.491596    0.032576    1
41892    10.506276    1.510747    3
38844    4.380388    0.748506    1
74197    13.670988    1.687944    1
14201    8.317599    0.390409    2
3908    0.000000    0.556245    2
2459    0.000000    0.290218    2
32027    10.095799    1.188148    3
12870    0.860695    1.482632    2
9880    1.557564    0.711278    2
72784    10.072779    0.756030    1
17521    0.000000    0.431468    2
50283    7.140817    0.883813    3
33536    11.384548    1.438307    3
9452    3.214568    1.083536    2
37457    11.720655    0.301636    3
17724    6.374475    1.475925    3
43869    5.749684    0.198875    3
264    3.871808    0.552602    2
25736    8.336309    0.636238    3
39584    9.710442    1.503735    3
31246    1.532611    1.433898    1
49567    9.785785    0.984614    3
7052    2.633627    1.097866    2
35493    9.238935    0.494701    3
10986    1.205656    1.398803    2
49508    3.124909    1.670121    1
5734    7.935489    1.585044    2
65479    12.746636    1.560352    1
77268    10.732563    0.545321    1
28490    3.977403    0.766103    1
13546    4.194426    0.450663    2
37166    9.610286    0.142912    3
16381    4.797555    1.260455    2
10848    1.615279    0.093002    2
35405    4.614771    1.027105    1
15917    0.000000    1.369726    2
6131    0.608457    0.512220    2
67432    6.558239    0.667579    1
30354    12.315116    0.197068    3
69696    7.014973    1.494616    1
33481    8.822304    1.194177    3
43075    10.086796    0.570455    3
38343    7.241614    1.661627    3
14318    4.602395    1.511768    2
5367    7.434921    0.079792    2
37894    10.467570    1.595418    3
36172    9.948127    0.003663    3
40123    2.478529    1.568987    1
10976    5.938545    0.878540    2
12705    0.000000    0.948004    2
12495    5.559181    1.357926    2
35681    9.776654    0.535966    3
46202    3.092056    0.490906    1
11505    0.000000    1.623311    2
22834    4.459495    0.538867    1
49901    8.334306    1.646600    3
71932    11.226654    0.384686    1
13279    3.904737    1.597294    2
49112    7.038205    1.211329    3
77129    9.836120    1.054340    1
37447    1.990976    0.378081    1
62397    9.005302    0.485385    1
0    1.772510    1.039873    2
15476    0.458674    0.819560    2
40625    10.003919    0.231658    3
36706    0.520807    1.476008    1
28580    10.678214    1.431837    3
25862    4.425992    1.363842    1
63488    12.035355    0.831222    1
33944    10.606732    1.253858    3
30099    1.568653    0.684264    1
13725    2.545434    0.024271    2
36768    10.264062    0.982593    3
64656    9.866276    0.685218    1
14927    0.142704    0.057455    2
43231    9.853270    1.521432    3
66087    6.596604    1.653574    1
19806    2.602287    1.321481    2
41081    10.411776    0.664168    3
10277    7.083449    0.622589    2
7014    2.080068    1.254441    2
17275    0.522844    1.622458    2
31600    10.362000    1.544827    3
59956    3.412967    1.035410    1
42181    6.796548    1.112153    3
51743    4.092035    0.075804    1
5194    2.763811    1.564325    2
30832    12.547439    1.402443    3
7976    5.708052    1.596152    2
14602    4.558025    0.375806    2
41571    11.642307    0.438553    3
55028    3.222443    0.121399    1
5837    4.736156    0.029871    2
39808    10.839526    0.836323    3
20944    4.194791    0.235483    2
22146    14.936259    0.888582    3
42169    3.310699    1.521855    1
7010    2.971931    0.034321    2
3807    9.261667    0.537807    2
29241    7.791833    1.111416    3
52696    1.480470    1.028750    1
42545    3.677287    0.244167    1
24437    2.202967    1.370399    1
16037    5.796735    0.935893    2
8493    3.063333    0.144089    2
68080    11.233094    0.492487    1
59016    1.965570    0.005697    1
11810    8.616719    0.137419    2
68630    6.609989    1.083505    1
7629    1.712639    1.086297    2
71992    10.117445    1.299319    1
13398    0.000000    1.104178    2
26241    9.824777    1.346821    3
11160    1.653089    0.980949    2
76701    18.178822    1.473671    1
32174    6.781126    0.885340    3
45043    8.206750    1.549223    3
42173    10.081853    1.376745    3
69801    6.288742    0.112799    1
41737    3.695937    1.543589    1
46979    6.726151    1.069380    3
79267    12.969999    1.568223    1
4615    2.661390    1.531933    2
32907    7.072764    1.117386    3
37444    9.123366    1.318988    3
569    3.743946    1.039546    2
8723    2.341300    0.219361    2
6024    0.541913    0.592348    2
52252    2.310828    1.436753    1
8358    6.226597    1.427316    2
26166    7.277876    0.489252    3
18471    0.000000    0.389459    2
3386    7.218221    1.098828    2
41544    8.777129    1.111464    3
10480    2.813428    0.819419    2
5894    2.268766    1.412130    2
7273    6.283627    0.571292    2
22272    7.520081    1.626868    3
31369    11.739225    0.027138    3
10708    3.746883    0.877350    2
69364    12.089835    0.521631    1
37760    12.310404    0.259339    3
13004    0.000000    0.671355    2
37885    2.728800    0.331502    1
52555    10.814342    0.607652    3
38997    12.170268    0.844205    3
69698    6.698371    0.240084    1
11783    3.632672    1.643479    2
47636    10.059991    0.892361    3
15744    1.887674    0.756162    2
69058    8.229125    0.195886    1
33057    7.817082    0.476102    3
28681    12.277230    0.076805    3
34042    10.055337    1.115778    3
29928    3.596002    1.485952    1
9734    2.755530    1.420655    2
7344    7.780991    0.513048    2
7387    0.093705    0.391834    2
33957    8.481567    0.520078    3
9936    3.865584    0.110062    2
36094    9.683709    0.779984    3
39835    10.617255    1.359970    3
64486    7.203216    1.624762    1
0    7.601414    1.215605    2
39539    1.386107    1.417070    1
66972    9.129253    0.594089    1
15029    1.363447    0.620841    2
44909    3.181399    0.359329    1
38183    13.365414    0.217011    3
37372    4.207717    1.289767    1
0    4.088395    0.870075    2
17786    3.327371    1.142505    2
39055    1.303323    1.235650    1
37045    7.999279    1.581763    3
6435    2.217488    0.864536    2
72265    7.751808    0.192451    1
28152    14.149305    1.591532    3
25931    8.765721    0.152808    3
7538    3.408996    0.184896    2
1315    1.251021    0.112340    2
12292    6.160619    1.537165    2
49248    1.034538    1.585162    1
9025    0.000000    1.034635    2
13438    2.355051    0.542603    2
69683    6.614543    0.153771    1
25374    10.245062    1.450903    3
55264    3.467074    1.231019    1
38324    7.487678    1.572293    3
69643    4.624115    1.185192    1
44058    8.995957    1.436479    3
41316    11.564476    0.007195    3
29119    3.440948    0.078331    1
51656    1.673603    0.732746    1
3030    4.719341    0.699755    2
35695    10.304798    1.576488    3
1537    2.086915    1.199312    2
9083    6.338220    1.131305    2
47744    8.254926    0.710694    3
71372    16.067108    0.974142    1
37980    1.723201    0.310488    1
42385    3.785045    0.876904    1
22687    2.557561    0.123738    1
39512    9.852220    1.095171    3
11885    3.679147    1.557205    2
4944    9.789681    0.852971    2
73230    14.958998    0.526707    1
17585    11.182148    1.288459    3
68737    7.528533    1.657487    1
13818    5.253802    1.378603    2
31662    13.946752    1.426657    3
86686    15.557263    1.430029    1
43214    12.483550    0.688513    3
24091    2.317302    1.411137    1
52544    10.069724    0.766119    3
61861    5.792231    1.615483    1
47903    4.138435    0.475994    1
37190    12.929517    0.304378    3
6013    9.378238    0.307392    2
27223    8.361362    1.643204    3
69027    7.939406    1.325042    1
78642    10.735384    0.705788    1
30254    11.592723    0.286188    3
21704    10.098356    0.704748    3
34985    9.299025    0.545337    3
31316    11.158297    0.218067    3
76368    16.143900    0.558388    1
27953    10.971700    1.221787    3
152    0.000000    0.681478    2
9146    3.178961    1.292692    2
75346    17.625350    0.339926    1
26376    1.995833    0.267826    1
35255    10.640467    0.416181    3
19198    9.628339    0.985462    3
12518    4.662664    0.495403    2
25453    5.754047    1.382742    2
12530    0.000000    0.037146    2
62230    9.334332    0.198118    1
9517    3.846162    0.619968    2
71161    10.685084    0.678179    1
1593    4.752134    0.359205    2
33794    0.697630    0.966786    1
39710    10.365836    0.505898    3
16941    0.461478    0.352865    2
69209    11.339537    1.068740    1
4446    5.420280    0.127310    2
9347    3.469955    1.619947    2
55635    8.517067    0.994858    3
65889    8.306512    0.413690    1
10753    2.628690    0.444320    2
7055    0.000000    0.802985    2
7905    0.000000    1.170397    2
53447    7.298767    1.582346    3
9194    7.331319    1.277988    2
61914    9.392269    0.151617    1
15630    5.541201    1.180596    2
79194    15.149460    0.537540    1
12268    5.515189    0.250562    2
33682    7.728898    0.920494    3
26080    11.318785    1.510979    3
19119    3.574709    1.531514    2
30902    7.350965    0.026332    3
63039    7.122363    1.630177    1
51136    1.828412    1.013702    1
35262    10.117989    1.156862    3
42776    11.309897    0.086291    3
64191    8.342034    1.388569    1
15436    0.241714    0.715577    2
14402    10.482619    1.694972    2
6341    9.289510    1.428879    2
14113    4.269419    0.134181    2
6390    0.000000    0.189456    2
8794    0.817119    0.143668    2
43432    1.508394    0.652651    1
38334    9.359918    0.052262    3
34068    10.052333    0.550423    3
30819    11.111660    0.989159    3
22239    11.265971    0.724054    3
28725    10.383830    0.254836    3
57071    3.878569    1.377983    1
72420    13.679237    0.025346    1
28294    10.526846    0.781569    3
9896    0.000000    0.924198    2
65821    4.106727    1.085669    1
7645    8.118856    1.470686    2
71289    7.796874    0.052336    1
5128    2.789669    1.093070    2
13711    6.226962    0.287251    2
22240    10.169548    1.660104    3
15092    0.000000    1.370549    2
5017    7.513353    0.137348    2
10141    8.240793    0.099735    2
35570    14.612797    1.247390    3
46893    3.562976    0.445386    1
8178    3.230482    1.331698    2
55783    3.612548    1.551911    1
1148    0.000000    0.332365    2
10062    3.931299    0.487577    2
74124    14.752342    1.155160    1
66603    10.261887    1.628085    1
11893    2.787266    1.570402    2
50908    15.112319    1.324132    3
39891    5.184553    0.223382    3
65915    3.868359    0.128078    1
65678    3.507965    0.028904    1
62996    11.019254    0.427554    1
36851    3.812387    0.655245    1
36669    11.056784    0.378725    3
38876    8.826880    1.002328    3
26878    11.173861    1.478244    3
46246    11.506465    0.421993    3
12761    7.798138    0.147917    3
35282    10.155081    1.370039    3
68306    10.645275    0.693453    1
31262    9.663200    1.521541    3
34754    10.790404    1.312679    3
13408    2.810534    0.219962    2
30365    9.825999    1.388500    3
10709    1.421316    0.677603    2
24332    11.123219    0.809107    3
45517    13.402206    0.661524    3
6178    1.212255    0.836807    2
10639    1.568446    1.297469    2
29613    3.343473    1.312266    1
22392    5.400155    0.193494    1
51126    3.818754    0.590905    1
53644    7.973845    0.307364    3
51417    9.078824    0.734876    3
24859    0.153467    0.766619    1
61732    8.325167    0.028479    1
71128    7.092089    1.216733    1
27276    5.192485    1.094409    3
30453    10.340791    1.087721    3
18670    2.077169    1.019775    2
70600    10.151966    0.993105    1
12683    0.046826    0.809614    2
81597    11.221874    1.395015    1
69959    14.497963    1.019254    1
8124    3.554508    0.533462    2
18867    3.522673    0.086725    2
80886    14.531655    0.380172    1
55895    3.027528    0.885457    1
31587    1.845967    0.488985    1
10591    10.226164    0.804403    3
70096    10.965926    1.212328    1
53151    2.129921    1.477378    1
11992    0.000000    1.606849    2
33114    9.489005    0.827814    3
7413    0.000000    1.020797    2
10583    0.000000    1.270167    2
58668    6.556676    0.055183    1
35018    9.959588    0.060020    3
70843    7.436056    1.479856    1
14011    0.404888    0.459517    2
35015    9.952942    1.650279    3
70839    15.600252    0.021935    1
3024    2.723846    0.387455    2
5526    0.513866    1.323448    2
5113    0.000000    0.861859    2
20851    7.280602    1.438470    2
40999    9.161978    1.110180    3
15823    0.991725    0.730979    2
35432    7.398380    0.684218    3
53711    12.149747    1.389088    3
64371    9.149678    0.874905    1
9289    9.666576    1.370330    2
60613    3.620110    0.287767    1
18338    5.238800    1.253646    2
22845    14.715782    1.503758    3
74676    14.445740    1.211160    1
34143    13.609528    0.364240    3
14153    3.141585    0.424280    2
9327    0.000000    0.120947    2
18991    0.454750    1.033280    2
9193    0.510310    0.016395    2
2285    3.864171    0.616349    2
9493    6.724021    0.563044    2
2371    4.289375    0.012563    2
13963    0.000000    1.437030    2
2299    3.733617    0.698269    2
5262    2.002589    1.380184    2
4659    2.502627    0.184223    2
17582    6.382129    0.876581    2
27750    8.546741    0.128706    3
9868    2.694977    0.432818    2
18333    3.951256    0.333300    2
3780    9.856183    0.329181    2
18190    2.068962    0.429927    2
11145    3.410627    0.631838    2
68846    9.974715    0.669787    1
26575    10.650102    0.866627    3
48111    9.134528    0.728045    3
43757    7.882601    1.332446    3
data.txt

代码:

import numpy as np
import operator
import matplotlib.pyplot as plt


def knn_classifier(inx, train_data, labels, k):
    """
    knn_classifier k近邻核心实现代码
    :param inx: 输入待测试实例
    :param train_data: 输入的训练数据集
    :param labels: 训练数据集对应的标签
    :param k: 分为k类
    :return: 返回对应的类别
    """
    train_data_size = train_data.shape[0]  # 获取数据集的
    diff_matrix = np.tile(inx, (train_data_size, 1)) - train_data  # 计算inx与训练数据集中所有数据的差值
    lp_diff_matrix = diff_matrix**2  # 这里采用的是欧氏距离
    lp_distances = lp_diff_matrix.sum(axis=1)
    distances = lp_distances**0.5
    sorted_dist_indicies = np.argsort(distances)  # 按照欧氏距离的大小进行排序
    class_count = {}
    for i in range(k):
        vote_i_label = labels[sorted_dist_indicies[i]]  # 依次存入欧氏距离前k个最小距离的编号
        class_count[vote_i_label] = class_count.get(vote_i_label, 0) + 1  # 对应一个字典
    sorted_class_count = sorted(class_count.items(), key=operator.itemgetter(1), reverse=true)  # 对字典进行排序
    return sorted_class_count[0][0]  # 最高值对应类别即为inx的类别

def file_to_matrix(filename):
    '''
    文件数据转换为矩阵类型
    :param filename:
    :return:
    '''
    f = open(filename)
    lines = f.readlines()
    lines_number = len(lines)
    feature = len(lines[1].split('\t'))
    matrix = np.zeros((lines_number, feature - 1))
    class_label_vector = []
    index = 0
    for line in lines:
        line = line.strip()
        item = line.split('\t')
        matrix[index, :] = item[0:3]
        class_label_vector.append(item[-1])
        index += 1
    return matrix, class_label_vector

def auto_norm(data_set):
    """
    auto_norm 归一化函数,消除量纲不同,不同属性的大小差异过大的影响
    归一化公式 newvalue = (oldvalue - min)/(max - min)  取值0-1
    :param dataset:
    :return:
    """
    min_vals = data_set.min(0)
    max_vals = data_set.max(0)

    ranges = max_vals - min_vals
    # print(type(min_vals))
    m = data_set.shape[0]
    data_set = data_set - np.tile(min_vals, (m, 1))
    data_set = data_set/(np.tile(ranges, (m, 1)))

    return data_set, ranges, min_vals

def dating_class_test():
    """
    dating_class_test 测试约会数据集data2的聚类预测结果
    return: null
    """
    ho_ratio = 0.1
    dating_matrix, dating_labels = file_to_matrix('data2.txt')  # 读入数据
    norm_matrix, ranges, min_vals = auto_norm(dating_matrix)  # 归一化数据
    m = norm_matrix.shape[0]  # 获取行数
    num_test_vecs = int(m * ho_ratio)  # 取出其中的10%作为测试数据
    error_count = 0.0  # 记录错误率
    for i in range(num_test_vecs):
        classifier = knn_classifier(norm_matrix[i, :], norm_matrix[num_test_vecs:m, :], dating_labels[num_test_vecs:m], 4)  #利用knn聚类算法进行预测
        print("the classifier came back with: " +classifier + " the real answer is: " + dating_labels[i])
        if(classifier != dating_labels[i]) :
            error_count += 1.0
    print('the total error rate is: %f' % (error_count/float(num_test_vecs)))


if __name__ == "__main__":
    dating_class_test()

如对本文有疑问, 点击进行留言回复!!

相关文章:

验证码:
移动技术网