Parameters: discretisation size and, for bandwidth estimation, the
bandwidth factor and the number of neighbours.
This code is under active development, so please excuse us if some
inconvenient 'features' appear. Feel free to annoy our young
responsible developer with complaints.
Alternatively, run it as an application: download the jar file and run
the module with
java -Xbootclasspath/p:causalLearningWithKde.jar be.ac.vub.tw.statistics.estimators.KdeComparatorPanel
(Under Windows: Start => Run => execute cmd to open a command shell,
then go to the download directory with cd <directory>.)
KDE has essentially two parameters: the kernel type and the bandwidth.
The default kernel is a Gaussian, in which case the bandwidth is its
standard deviation.
Research has shown that the shape of the kernel matters little, whereas
the choice of bandwidth strongly affects the estimate.
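To make the role of the two parameters concrete, here is a minimal
sketch of a Gaussian KDE in one dimension. It is only an illustration:
the class and method names are hypothetical and are not taken from
causalLearningWithKde.jar. The estimate at x is the average of one
Gaussian per data point, each with the bandwidth as standard deviation.

```java
// Illustrative sketch only; names are hypothetical and do not come
// from causalLearningWithKde.jar.
final class GaussianKdeSketch {

    /** Gaussian density with mean 0 and standard deviation sigma, evaluated at u. */
    static double gaussian(double u, double sigma) {
        double z = u / sigma;
        return Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2.0 * Math.PI));
    }

    /** Density estimate at x: the average of one Gaussian centred on each data point. */
    static double density(double[] data, double bandwidth, double x) {
        double sum = 0.0;
        for (double xi : data) {
            sum += gaussian(x - xi, bandwidth);
        }
        return sum / data.length;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 4.0};
        // Same data, two bandwidths: the small one gives a spikier estimate.
        for (double h : new double[] {0.2, 1.0}) {
            System.out.printf("h=%.1f  f(2.0)=%.4f  f(3.0)=%.4f%n",
                    h, density(data, h, 2.0), density(data, h, 3.0));
        }
    }
}
```

Swapping the Gaussian for another kernel would only change the
gaussian method; the bandwidth argument is what controls how far each
point spreads its weight.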
We have two methods for estimating a good bandwidth:
The 'constant bandwidth' method uses one bandwidth for all points. It is
estimated by multiplying the average distance from a point to its
closest neighbour by a certain factor. We approximate this average
distance by simply dividing the range (= max - min) by the number of
data points - 1. The factor is called the bandwidth factor and can be
set in the applets.
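The constant bandwidth rule described above can be sketched in a few
lines. This is an illustration under our own naming (the class, method
and parameter names are hypothetical), not the code shipped in the jar.

```java
// Illustrative sketch only; names are hypothetical.
final class ConstantBandwidthSketch {

    /**
     * Bandwidth = bandwidthFactor * (max - min) / (number of points - 1).
     * The quotient is the average gap between adjacent points, used here
     * as the estimate of the average distance to the closest neighbour.
     */
    static double constantBandwidth(double[] data, double bandwidthFactor) {
        if (data.length < 2) {
            throw new IllegalArgumentException("need at least two data points");
        }
        double min = data[0];
        double max = data[0];
        for (double v : data) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double averageSpacing = (max - min) / (data.length - 1);
        return bandwidthFactor * averageSpacing;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.0, 3.0, 4.0, 5.0};
        // Range 4 over 4 gaps gives spacing 1, so the bandwidth equals the factor.
        System.out.println(constantBandwidth(data, 1.5));
    }
}
```

Note that a single outlier stretches the range and therefore widens the
bandwidth for every point, which is one of the deficiencies of a
constant bandwidth on unevenly distributed data.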
The second method tries to overcome some deficiencies of the first,
especially when the points are not distributed evenly over the area. It
estimates a bandwidth for each point separately, based on how near its n
closest neighbours are. As in the first method, it divides the range
spanned by those n neighbours by n and multiplies the result by the
bandwidth factor.
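A possible reading of the per-point rule is sketched below for
one-dimensional data. Two things are our assumptions rather than
statements from the text: the range is taken over the point itself
together with its n closest neighbours, and ties between a left and a
right neighbour are broken towards the left. All names are hypothetical.

```java
import java.util.Arrays;

// Illustrative sketch only; names and tie-breaking are assumptions.
final class PointBandwidthSketch {

    /**
     * Returns one bandwidth per input point, in input order:
     * factor * (range spanned by the point and its n nearest neighbours) / n.
     */
    static double[] pointBandwidths(double[] data, int n, double factor) {
        if (n < 1 || n >= data.length) {
            throw new IllegalArgumentException("need 1 <= n < number of points");
        }
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        double[] bandwidths = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            // In sorted order the n nearest neighbours form a contiguous window.
            int lo = Arrays.binarySearch(sorted, data[i]);
            int hi = lo;
            for (int k = 0; k < n; k++) {
                double left = lo > 0 ? data[i] - sorted[lo - 1] : Double.POSITIVE_INFINITY;
                double right = hi < sorted.length - 1 ? sorted[hi + 1] - data[i] : Double.POSITIVE_INFINITY;
                if (left <= right) {
                    lo--;   // the closer neighbour is on the left
                } else {
                    hi++;   // the closer neighbour is on the right
                }
            }
            bandwidths[i] = factor * (sorted[hi] - sorted[lo]) / n;
        }
        return bandwidths;
    }

    public static void main(String[] args) {
        double[] data = {0.0, 1.0, 2.0, 10.0};
        // The isolated point at 10 receives a wider bandwidth than the cluster.
        System.out.println(Arrays.toString(pointBandwidths(data, 2, 1.0)));
    }
}
```

Points in a dense cluster get a narrow kernel while isolated points get
a wide one, which is the behaviour the second method is meant to
provide.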
The bandwidth factor is tuned by running several experiments. The
choice is a trade-off between undersmoothing (too small a factor, giving
a spiky estimate) and oversmoothing (too large a factor, washing out
detail).
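One simple way to see the trade-off is to count the local maxima of
the estimate for several bandwidth factors. The sketch below is only an
illustration of the effect, not the tuning procedure used for the
applets; it assumes a Gaussian kernel with a constant bandwidth, and all
names are hypothetical.

```java
// Illustrative sketch only; names are hypothetical.
final class SmoothingTradeOffSketch {

    /** Gaussian KDE at x with one constant bandwidth h (the standard deviation). */
    static double density(double[] data, double h, double x) {
        double sum = 0.0;
        for (double xi : data) {
            double z = (x - xi) / h;
            sum += Math.exp(-0.5 * z * z) / (h * Math.sqrt(2.0 * Math.PI));
        }
        return sum / data.length;
    }

    /** Counts local maxima of the estimate on a regular grid over [from, to]. */
    static int countModes(double[] data, double h, double from, double to, int steps) {
        int modes = 0;
        double prev = density(data, h, from);
        double curr = density(data, h, from + (to - from) / steps);
        for (int j = 2; j <= steps; j++) {
            double next = density(data, h, from + j * (to - from) / steps);
            if (curr > prev && curr >= next) {
                modes++;
            }
            prev = curr;
            curr = next;
        }
        return modes;
    }

    public static void main(String[] args) {
        double[] data = {0.0, 1.0, 2.0, 3.0, 4.0};
        double spacing = (4.0 - 0.0) / (data.length - 1);   // range / (points - 1)
        for (double factor : new double[] {0.2, 0.5, 1.0, 2.0}) {
            int modes = countModes(data, factor * spacing, -5.0, 9.0, 1400);
            System.out.printf("factor=%.1f  modes=%d%n", factor, modes);
        }
    }
}
```

With a small factor every data point shows up as its own peak
(undersmoothing); with a large factor the peaks merge into a single
broad bump (oversmoothing).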