Sunday, June 10, 2012

A Few Words on Nonparametric Estimation

The main idea behind this Google Summer of Code project is to expand the nonparametric capabilities of statsmodels, a statistical library for Python. Nonparametric estimation requires almost no assumptions about the true distribution of the data. We only require that the "true" density is smooth and differentiable.

The best-known example of a nonparametric estimator is the simple histogram. By looking at the frequency of the data, we infer characteristics of its probability distribution (normality, skewness, variance, etc.). The field of nonparametric econometrics takes this idea further by developing theoretically consistent ways of selecting the bandwidth (the analogue of choosing the number and width of the bins), incorporating multiple variables and estimating joint probability distributions of the type f(X_1, X_2, ..., X_n), working with mixed data types (continuous, ordered and unordered variables), estimating conditional densities, and so on.
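To make the role of the bandwidth concrete, here is a minimal sketch of a one-dimensional Gaussian kernel density estimator in plain Python. It is only an illustration and not the statsmodels implementation: the function names are hypothetical, the data are simulated, and the bandwidth comes from the normal-reference rule of thumb h = 1.06 * sigma * n^(-1/5), which is just one of several possible selection methods.

```python
import math
import random
import statistics


def rule_of_thumb_bandwidth(data):
    """Normal-reference bandwidth: h = 1.06 * std * n**(-1/5)."""
    n = len(data)
    return 1.06 * statistics.stdev(data) * n ** (-1 / 5)


def gaussian_kde(data, bandwidth):
    """Return a function x -> estimated density at x (Gaussian kernel)."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each observation.
        return norm * sum(
            math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data
        )

    return density


# Simulated data (an assumption for illustration): 500 standard normal draws.
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(500)]

h = rule_of_thumb_bandwidth(sample)
f_hat = gaussian_kde(sample, h)

# Compare the estimate with the known density at a few points.
for x in (-2.0, 0.0, 2.0):
    true_pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    print(f"x={x:+.1f}  estimate={f_hat(x):.3f}  true={true_pdf:.3f}")
```

A smaller h gives a wigglier estimate and a larger h a smoother one, much like narrowing or widening the bins of a histogram.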

Of course, nonparametric estimation is not a "silver bullet". Not having to specify a priori assumptions about the true state of the world is a luxury, and it comes at a price. A major drawback of nonparametric estimation is that consistent results require a great deal more data than the usual parametric methods. Furthermore, some of the bandwidth selection methods are computationally intensive and can take a long time to run. Working with many variables can also be challenging, as one quickly runs into the "curse of dimensionality": each additional continuous variable rapidly increases the amount of data needed.
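A rough back-of-the-envelope calculation shows how quickly the data requirement grows. The sketch below assumes the standard textbook result that a kernel density estimator with a second-order kernel and a twice-differentiable density has mean squared error of order n^(-4/(4+d)) in d continuous dimensions. It ignores all constants, so the numbers indicate orders of magnitude only, and the function name is hypothetical.

```python
def equivalent_sample_size(n_one_dim, d):
    """Sample size needed in d dimensions to match the 1-D error rate.

    Solves n_d ** (-4 / (4 + d)) == n_one_dim ** (-4 / 5) for n_d,
    ignoring constants (an assumption made for illustration).
    """
    return n_one_dim ** ((4 + d) / 5)


# How many observations roughly match 100 observations in one dimension?
for d in range(1, 6):
    print(f"d={d}: about {equivalent_sample_size(100, d):.0f} observations")
```

Under these assumptions the required sample size grows geometrically with the number of continuous variables, which is why dimension matters so much in practice.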

That being said, the growing computational power of computers, combined with the rapid accumulation of data from all walks of life, will make nonparametric methods an increasingly appealing inferential tool.
