Distance measure with missing values


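I suggest fixing the distance measure for missing values, or at least adding some notes (a warning, etc.) about how the distance is calculated.
Currently the distance is calculated as: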
    public double calculateDistance(double[] value1, double[] value2) {
        double sum = 0.0;
        int counter = 0;
        for (int i = 0; i < value1.length; i++) {
            if ((!Double.isNaN(value1[i])) && (!Double.isNaN(value2[i]))) {
                double diff = value1[i] - value2[i];
                sum += diff * diff;
                counter++;
            }
        }
        if (counter > 0) {
            return Math.sqrt(sum);
        } else {
            return Double.NaN;
        }
    }
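Consequently, missing attributes are ignored, which means that the distance to an instance with missing values is smaller than the distance to an instance without them. In other words, for kNN and other distance-based methods, instances with missing values appear closer than the others. This leads to incorrect classification results.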
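To make the bias concrete, here is a minimal sketch that reuses the logic quoted above. The class name `IgnoreMissingDemo` and the sample vectors are made up for illustration and are not part of the project.

```java
// Hypothetical demo class; only calculateDistance mirrors the code quoted above.
final class IgnoreMissingDemo {

    // Same logic as the current implementation: attributes where either
    // value is NaN contribute nothing to the sum.
    static double calculateDistance(double[] value1, double[] value2) {
        double sum = 0.0;
        int counter = 0;
        for (int i = 0; i < value1.length; i++) {
            if (!Double.isNaN(value1[i]) && !Double.isNaN(value2[i])) {
                double diff = value1[i] - value2[i];
                sum += diff * diff;
                counter++;
            }
        }
        return counter > 0 ? Math.sqrt(sum) : Double.NaN;
    }

    public static void main(String[] args) {
        double[] query = {1.0, 5.0};
        double[] complete = {1.0, 8.0};            // differs in the second attribute
        double[] withMissing = {1.0, Double.NaN};  // second attribute is missing

        System.out.println(calculateDistance(query, complete));    // prints 3.0
        System.out.println(calculateDistance(query, withMissing)); // prints 0.0
    }
}
```

The instance with the missing attribute ends up at distance 0 from the query, so a kNN search would pick it as the nearest neighbor even though nothing is known about its second attribute.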
The state-of-the-art practice is to implement it like this:
    if ((!Double.isNaN(value1[i])) && (!Double.isNaN(value2[i]))) {
        double diff = value1[i] - value2[i];
        sum += diff * diff;
        counter++;
    } else {
        double diff = max(i) - min(i);
        sum += diff * diff;
        counter++;
    }
where max(i) and min(i) are the maximum and minimum values of the given attribute in the training set,
or simply diff = 1 if the attributes are normalized.
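Put together, the proposal could look roughly like the sketch below. It assumes the per-attribute minimum and maximum from the training set are passed in up front. The class name `MissingValueDistance` and its constructor are hypothetical, not existing API.

```java
// Hypothetical class sketching the proposed behaviour; not existing project code.
final class MissingValueDistance {

    // Assumed input: range (max - min) of each attribute in the training set.
    private final double[] ranges;

    MissingValueDistance(double[] min, double[] max) {
        ranges = new double[min.length];
        for (int i = 0; i < min.length; i++) {
            ranges[i] = max[i] - min[i];
        }
    }

    public double calculateDistance(double[] value1, double[] value2) {
        double sum = 0.0;
        for (int i = 0; i < value1.length; i++) {
            double diff;
            if (!Double.isNaN(value1[i]) && !Double.isNaN(value2[i])) {
                diff = value1[i] - value2[i];
            } else {
                // A missing value is charged the largest possible difference
                // for this attribute (1.0 if attributes are normalized to [0, 1]).
                diff = ranges[i];
            }
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

For example, with training ranges `min = {0, 0}` and `max = {1, 10}`, the distance between `{0.5, NaN}` and `{0.5, 3.0}` becomes 10.0 instead of 0.0, so the incomplete instance no longer looks like a perfect match. Every attribute now contributes to the sum, so the counter from the original method is no longer needed in this sketch.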