
Computing Information Gain for Feature Selection in Python

January 8, 2019 | 移动技术网 IT Programming


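This article uses Python to compute the information gain used in feature selection. The implementation handles feature sets that contain both continuous and binary discrete attributes.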

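A senior labmate asked me to write some feature selection code. Most of what I found online only computes information gain for discrete attributes, but my data contains both binary discrete and continuous attributes, so I implemented it myself here.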
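For reference, the quantity being computed can be written out as follows. This is a sketch of the standard threshold-based definition, inferred from the code rather than stated by the author; the symbols (n for the number of samples, t for a candidate threshold, Y_{<t} and Y_{>t} for the labels of the samples on each side of it) are notation introduced here. The natural logarithm is used because the code calls math.log.

```latex
H(Y) = -\sum_{c} p_c \ln p_c, \qquad p_c = \frac{|\{ i : y_i = c \}|}{n}

IG(t) = H(Y) - \left( \frac{n_{<t}}{n} \, H(Y_{<t}) + \frac{n_{>t}}{n} \, H(Y_{>t}) \right)

IG = \max_{t} \, IG(t)
```

A binary feature has a single candidate threshold (0.5), so the same formula covers both the binary and the continuous case.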

Code

import numpy as np
import math

class ig():
  """Information gain of each feature (binary or continuous) w.r.t. labels y."""

  def __init__(self, x, y):
    x = np.array(x)
    y = list(y)  # list.count is used below, so accept any sequence
    n_feature = np.shape(x)[1]
    n_y = len(y)

    # Entropy of the labels before any split (natural log, so in nats).
    orig_h = self._entropy(y)

    ig_list = []
    for i in range(n_feature):
      feature = x[:, i]
      # Candidate thresholds: midpoints between adjacent distinct values.
      # Every threshold then lies strictly between two observed values,
      # so no sample equals a threshold and gets dropped from both sides.
      values = np.unique(feature)
      thre_set = (values[:-1] + values[1:]) / 2

      best_gain = 0
      for thre in thre_set:
        lower = [y[s] for s in range(n_y) if feature[s] < thre]
        higher = [y[s] for s in range(n_y) if feature[s] > thre]
        # Conditional entropy: entropy of each side weighted by its size.
        condi_h = len(lower)/n_y * self._entropy(lower) + len(higher)/n_y * self._entropy(higher)
        # Keep the largest gain over all thresholds of this feature.
        best_gain = max(best_gain, orig_h - condi_h)
      ig_list.append(best_gain)

    self.ig = ig_list

  @staticmethod
  def _entropy(labels):
    # Shannon entropy of a list of class labels.
    h = 0
    for c in set(labels):
      h += -(labels.count(c) / len(labels))*math.log(labels.count(c) / len(labels))
    return h

  def getig(self):
    return self.ig

if __name__ == "__main__":
  # Three samples with four binary features each, plus their class labels.
  x = [[1, 0, 0, 1],
       [0, 1, 1, 1],
       [0, 0, 1, 0]]
  y = [0, 0, 1]

  print(ig(x,y).getig())

Output:

[0.17441604792151594, 0.17441604792151594, 0.17441604792151594, 0.6365141682948128]
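These numbers can be checked by hand. The sketch below is not part of the original code: the helper `entropy` and the names `gain_0` and `gain_3` are introduced here for illustration. It recomputes the gain of the first and last features using the single threshold 0.5 that a binary feature allows.

```python
import math


def entropy(labels):
    """Shannon entropy of a list of labels, in nats (natural log)."""
    n = len(labels)
    h = 0.0
    for c in set(labels):
        p = labels.count(c) / n
        h -= p * math.log(p)
    return h


y = [0, 0, 1]
h_y = entropy(y)  # entropy of the labels before splitting

# Feature 0 is [1, 0, 0]; the only candidate threshold is 0.5.
# Below 0.5: samples 1 and 2 -> labels [0, 1]; above: sample 0 -> labels [0].
gain_0 = h_y - (2 / 3 * entropy([0, 1]) + 1 / 3 * entropy([0]))

# Feature 3 is [1, 1, 0]; below 0.5: labels [1]; above: labels [0, 0].
gain_3 = h_y - (1 / 3 * entropy([1]) + 2 / 3 * entropy([0, 0]))

print(round(gain_0, 4), round(gain_3, 4))  # prints: 0.1744 0.6365
```

The last feature separates the two classes perfectly, so both sides of the split have zero entropy and its gain equals the full label entropy.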

