Kernel Density Estimation

   Development conducted in collaboration with Frederik Verbist.
More information: Jan Lemeire
   Interactive tutorial


Parameters: discretisation size and, for bandwidth estimation, the bandwidth factor and the number of neighbours

  This code is still under development, so please excuse us if some inconvenient 'features' appear. Feel free to annoy our young responsible developer with complaints.

  Alternatively, run it as an application: download the jar file and run the module with java -Xbootclasspath/p:causalLearningWithKde.jar (under Windows: Start => Run => execute cmd to open a command shell, then go to the download directory with cd <directory>).

KDE has essentially two parameters: the kernel type and the bandwidth.
The default kernel is a Gaussian, in which case the bandwidth is its standard deviation.
Research has shown that the shape of the kernel matters relatively little, whereas the choice of bandwidth matters a great deal.
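To make the role of the two parameters concrete, here is a minimal sketch of a one-dimensional Gaussian KDE evaluated on a regular grid. It is an illustration only, not the code behind the applets. The function names, the sample data and the `margin` parameter are assumptions, and `grid_size` is only assumed to correspond to the 'discretisation size' parameter mentioned above.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, data, bandwidth):
    """Density estimate at x: the average of Gaussians centred on the
    data points, each with standard deviation `bandwidth`."""
    total = sum(gaussian_kernel((x - xi) / bandwidth) for xi in data)
    return total / (len(data) * bandwidth)

def kde_on_grid(data, bandwidth, grid_size, margin=3.0):
    """Evaluate the estimate on `grid_size` evenly spaced points.
    The grid extends `margin` bandwidths beyond the data (assumption)."""
    lo = min(data) - margin * bandwidth
    hi = max(data) + margin * bandwidth
    step = (hi - lo) / (grid_size - 1)
    xs = [lo + i * step for i in range(grid_size)]
    ys = [kde(x, data, bandwidth) for x in xs]
    return xs, ys

# Hypothetical sample, chosen only for illustration.
data = [1.0, 1.2, 1.9, 2.4, 5.0]
xs, ys = kde_on_grid(data, bandwidth=0.5, grid_size=200)

# Sanity check: the estimate should integrate to roughly 1.
area = sum(ys) * (xs[1] - xs[0])
print(round(area, 2))
```

A finer grid gives a smoother picture at a higher computational cost, since every grid point sums over all data points.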
We have two methods for estimating a good bandwidth:
The 'constant bandwidth' method uses a single bandwidth for all points. It is estimated by multiplying the average distance between a point and its closest neighbour by a certain factor. We approximate this average distance by simply dividing the range (= max - min) by the number of data points - 1, which is the average gap between adjacent points. The factor is called the bandwidth factor and can be set in the applets.
The second method tries to overcome some deficiencies of the first, especially when the points are not distributed evenly over the range. It estimates a bandwidth for each point separately, based on how close its n nearest neighbours are. In the same way as above, it divides the range spanned by the n neighbours by n and multiplies the result by the bandwidth factor.
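The two bandwidth estimates described above can be sketched as follows for one-dimensional data. This is a sketch under stated assumptions, not the applet's implementation: the function names and sample data are hypothetical, and the local range is assumed to span the point itself together with its n nearest neighbours (n + 1 points, hence n gaps), which the text does not state explicitly.

```python
import math

def constant_bandwidth(data, factor):
    """One bandwidth for all points: factor * range / (N - 1)."""
    return factor * (max(data) - min(data)) / (len(data) - 1)

def per_point_bandwidths(data, factor, n_neighbours):
    """One bandwidth per point: factor * (local range) / n.
    Assumption: the local range covers the point and its n nearest
    neighbours. Duplicate points could yield a zero bandwidth and
    would need extra handling."""
    bandwidths = []
    for i, xi in enumerate(data):
        others = [x for j, x in enumerate(data) if j != i]
        nearest = sorted(others, key=lambda x: abs(x - xi))[:n_neighbours]
        local = nearest + [xi]
        bandwidths.append(factor * (max(local) - min(local)) / n_neighbours)
    return bandwidths

def variable_kde(x, data, bandwidths):
    """Gaussian KDE in which every data point has its own bandwidth."""
    total = 0.0
    for xi, h in zip(data, bandwidths):
        u = (x - xi) / h
        total += math.exp(-0.5 * u * u) / (h * math.sqrt(2.0 * math.pi))
    return total / len(data)

# Hypothetical, unevenly distributed sample: a tight group and one outlier.
data = [0.0, 0.1, 0.2, 0.3, 5.0]
h_const = constant_bandwidth(data, factor=1.0)
h_local = per_point_bandwidths(data, factor=1.0, n_neighbours=2)
print(h_const)
print(h_local)
```

On this sample the constant bandwidth is inflated by the single outlier, while the per-point bandwidths stay small inside the tight group and grow large only for the isolated point.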
The bandwidth factor is tuned by running several experiments. It is a trade-off between undersmoothing (bandwidth too small, giving a spiky estimate) and oversmoothing (bandwidth too large, blurring away real structure).
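One rough way to see this trade-off is to count the local maxima (modes) of the estimate for several bandwidth factors. The sketch below does this with the constant bandwidth rule; the data set, the chosen factors and the helper `count_modes` are hypothetical and serve only as an illustration.

```python
import math

def kde(x, data, h):
    """Gaussian KDE with a single bandwidth h."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm

def count_modes(data, h, grid_size=400):
    """Count local maxima of the estimate on a regular grid."""
    lo, hi = min(data) - 3.0 * h, max(data) + 3.0 * h
    step = (hi - lo) / (grid_size - 1)
    ys = [kde(lo + i * step, data, h) for i in range(grid_size)]
    # '>=' on the right so that a two-point plateau counts once.
    return sum(1 for i in range(1, grid_size - 1)
               if ys[i - 1] < ys[i] >= ys[i + 1])

# Hypothetical sample with two clear groups of four points each.
data = [0.0, 0.2, 0.4, 0.6, 3.0, 3.2, 3.4, 3.6]
base = (max(data) - min(data)) / (len(data) - 1)  # range / (N - 1)

for factor in (0.1, 0.5, 4.0):
    print(factor, count_modes(data, factor * base))
```

On this toy data a very small factor should produce one mode per data point (undersmoothing), a moderate factor recovers the two groups, and a large factor merges everything into a single bump (oversmoothing).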

Compare estimators

Applet will be added soon...

generate from distribution applet

Last updated: February 28th, 2006 by Jan Lemeire
Parallel Computing Lab, VUB