Analysis of Kernel Density Estimation Accuracy using Quasi-Monte Carlo Methods
Kernel density estimation (KDE) is a non-parametric method for estimating the probability density function (PDF) of a random variable from a sample of observations. In traditional Monte Carlo (MC) methods, these observations are generated as independent and identically distributed (IID) points scattered randomly throughout the domain. We analyze the performance of KDEs constructed using quasi-Monte Carlo (QMC) methods, in which the observations are generated deterministically as low-discrepancy (LD) sequences of points. LD sequences are designed to cover the domain more uniformly than random points, leading to faster convergence rates and lower-variance estimates of the PDF. Our objective is to identify combinations of parameters and other factors that yield the most accurate KDE estimates, providing a comprehensive empirical study of KDEs under QMC sampling. The parameters we study include the bandwidth (smoothing parameter), the sample size, the number of dimensions, and the target distribution (uniform, Gaussian, or chi-square). In addition, we compare the accuracies obtained with MC versus QMC methods. Our results show that QMC methods are advantageous over MC methods, especially in higher dimensions, since the root mean squared errors for QMC methods are lower than those for MC methods. Furthermore, we find that QMC methods reach their optimal accuracy with smaller sample sizes and smaller bandwidths, meaning that less computation time is required and less information about the target density is smoothed away. Our results support prior theoretical work in the field of QMC research, pave the way for future work on optimizing KDE parameters, and invite new research on QMC methods for estimating the density of random variables.
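As a concrete illustration of the MC vs. QMC comparison described above, the following minimal sketch (our own illustrative code, not taken from the study) builds a one-dimensional Gaussian-kernel KDE of a standard normal density from IID (MC) samples and from scrambled Sobol' (QMC) samples, and reports the root mean squared error of each estimate against the true PDF on a grid. The bandwidth `h`, sample size `n`, and grid resolution are illustrative choices, not the values used in the paper.

```python
import numpy as np
from scipy.stats import norm, qmc

def gaussian_kde_pdf(sample, x, h):
    """Evaluate a 1-D Gaussian-kernel KDE at points x with bandwidth h."""
    # f_hat(x) = (1/(n*h)) * sum_i K((x - X_i)/h), with K the standard normal pdf
    diffs = (x[:, None] - sample[None, :]) / h
    return norm.pdf(diffs).mean(axis=1) / h

n, h = 256, 0.15                      # illustrative sample size and bandwidth
grid = np.linspace(-3.0, 3.0, 200)    # evaluation grid
true_pdf = norm.pdf(grid)             # target: standard normal density

# MC sample: IID standard normal draws
rng = np.random.default_rng(0)
mc_sample = rng.standard_normal(n)

# QMC sample: scrambled Sobol' points in (0,1) mapped through the inverse normal CDF
sobol = qmc.Sobol(d=1, scramble=True, seed=0)
qmc_sample = norm.ppf(sobol.random(n).ravel())

def rmse(est):
    return np.sqrt(np.mean((est - true_pdf) ** 2))

print("MC  RMSE:", rmse(gaussian_kde_pdf(mc_sample, grid, h)))
print("QMC RMSE:", rmse(gaussian_kde_pdf(qmc_sample, grid, h)))
```

Using `n = 256` (a power of two) keeps the Sobol' sequence balanced; the same inverse-CDF mapping extends the comparison to the other target distributions studied (chi-square, uniform) by swapping in the corresponding `scipy.stats` quantile function.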