density
Kernel Estimate of Probability Density Function
Description
Returns x and y coordinates of a non-parametric estimate of the probability
density function of the data.
Usage
density(x, ...)
density.default(x, bw = "nrd0", adjust = 1, kernel = c("gaussian",
"epanechnikov", "rectangular", "triangular", "biweight",
"cosine", "optcosine"), weights = NULL, window = kernel,
width, give.Rkern = FALSE, n = 512, from, to, cut = 3, na.rm = FALSE,
...)
Arguments
x |
the vector of observations from the distribution whose density is to
be estimated.
Missing values (NAs) are allowed if na.rm is TRUE.
|
bw |
the smoothing bandwidth to be used in density estimation.
bw can be either a positive number specifying the bandwidth explicitly, or
it can be a character string.
When bw is specified as a character string, case is ignored. The following table
lists each accepted character string, the function it references, and a description of that function.
| character string | function | description |
| "nrd0" | bw.nrd0 | normal reference density, never returning 0.0 |
| "nrd" | bw.nrd | normal reference density, possibly returning 0.0 |
| "bcv" | bw.bcv | biased cross-validation |
| "ucv" | bw.ucv | unbiased cross-validation |
| "sj", "sj-ste" | bw.SJ | the Sheather-Jones "plug-in" estimator with method "ste" |
| "sj-dpi" | bw.SJ | the Sheather-Jones "plug-in" estimator with method "dpi" |
These referenced functions use various algorithms for choosing the bandwidth given the data x.
(All of these functions ignore the weights argument.)
|
adjust |
the number derived from the bw argument is multiplied by adjust to make the bandwidth.
|
kernel |
a character string giving the type of kernel function used in the
computations. Must be one of: "gaussian", "epanechnikov",
"rectangular", "triangular", "biweight",
"cosine", "optcosine" (one character is sufficient).
|
weights |
a vector of the same length as x for computing a weighted density
estimate. The weights must be nonnegative and sum to 1.0.
When weights is NULL (the default), all points in x are equally weighted.
|
width |
For compatibility with S-PLUS, width can be used instead of bw. The width value
is multiplied by a kernel-dependent quantity to make it compatible with bw.
|
give.Rkern |
a logical flag. If TRUE, the quantity
integral(u^2 * K(u) du) * integral(K(u)^2 du)
of the selected kernel function is returned instead of the usual return value.
|
n |
the number of equally-spaced points at which to estimate the density. If n is greater than 512,
it is rounded up to the next power of 2.
|
from, to |
the n estimated values of density are equally-spaced between
from and to. The default is the range of the data extended by
bw*cut.
|
cut |
the multiple of the bandwidth by which the range of the x values is extended on each side.
The default is 3. cut is ignored if from and to are used.
|
na.rm |
a logical flag. If TRUE, then missing values (NAs) are removed before estimation.
If FALSE (the default), then missing values are not allowed.
|
... |
other arguments for non-default methods.
|
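The way bw, adjust, cut, and the default from and to interact can be checked with a short sketch. It assumes an R-compatible engine in which density and bw.nrd0 behave as described above. The variable names (obs, d_default, cut_used, and so on) are illustrative only and are not part of the API.

```r
# Sample data reused from the Examples section.
obs <- (cos(1:300) + 0.09)^3

# A character bw is resolved by the referenced function; case is ignored.
d_default <- density(obs, bw = "nrd0")
d_upper   <- density(obs, bw = "NRD0")
bw_direct <- bw.nrd0(obs)

# adjust scales the bandwidth derived from bw.
d_half <- density(obs, bw = "nrd0", adjust = 0.5)

# By default the grid spans the data range extended by bw * cut on each side.
cut_used <- 3
expected_range <- range(obs) + c(-1, 1) * cut_used * d_default$bw

print(c(bw_direct = bw_direct, default = d_default$bw, halved = d_half$bw))
print(rbind(expected = expected_range, actual = range(d_default$x)))
```

Supplying from and to explicitly replaces the computed range, in which case cut has no effect.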
Details
These are kernel estimates. For each x value in the output, the window is
centered on that x and the heights of the window at each datapoint are summed.
This sum, after a normalization, is the corresponding y value in the output: the value at x[i] is
y[i]=1/N*sum(K(x[i]-X))
where K is the kernel function specified
by window and width, X is the input data, and N is the length
of X. In the presence of weights the value is
y[i]=1/sum(weights)*sum(weights*K(x[i]-X)).
For efficiency, the convolution is computed using the discrete Fourier transform.
The bandwidth functions bw.SJ, bw.ucv, and bw.bcv are not yet in TIBCO Enterprise Runtime for R.
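The kernel sums above can be evaluated directly and compared with the output of density. The following is a minimal sketch, not the internal implementation: direct_kde is a hypothetical helper, and it assumes that K is a Gaussian kernel whose standard deviation equals bw. Because density uses a Fourier-transform approximation, the two results are expected to agree closely rather than exactly.

```r
# Hypothetical helper: evaluates the kernel sum directly at each grid point.
# K is taken to be a Gaussian kernel with standard deviation bw (an assumption
# about how bw scales the kernel).
direct_kde <- function(grid, obs, bw, weights = rep(1, length(obs))) {
  sapply(grid, function(g) {
    sum(weights * dnorm(g - obs, sd = bw)) / sum(weights)
  })
}

obs <- (cos(1:300) + 0.09)^3
bw_fixed <- 0.25

# Unweighted estimate: y[i] = 1/N * sum(K(x[i] - X)).
d_plain <- density(obs, bw = bw_fixed)
y_plain <- direct_kde(d_plain$x, obs, bw_fixed)

# Weighted estimate: the weights are nonnegative and sum to 1.
w <- seq_along(obs) / sum(seq_along(obs))
d_weighted <- density(obs, bw = bw_fixed, weights = w)
y_weighted <- direct_kde(d_weighted$x, obs, bw_fixed, weights = w)

# Largest absolute difference between the direct sums and density().
print(max(abs(y_plain - d_plain$y)))
print(max(abs(y_weighted - d_weighted$y)))
```

The direct sum costs on the order of n * N kernel evaluations, which is why a transform-based convolution is preferred for large inputs.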
Value
Returns the R-kernel value when give.Rkern is TRUE.
Otherwise, returns a list object of class "density" whose x and y components
are suitable for passing to approx or for plotting. The list has the following components:
x |
the vector of n points at which the density is estimated.
|
y |
the density estimate at each x point.
|
bw |
the smoothing bandwidth used in density estimation.
|
n |
the number of non-NA observations used to calculate the estimate.
|
call |
the function call.
|
data.name |
the deparsed name of x.
|
has.na |
a logical flag indicating whether the observations contain NAs.
|
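As an illustration of working with the returned object, the sketch below inspects the components, interpolates the estimate with approx, and checks that the curve integrates to roughly 1. It assumes an R-compatible engine. The variable names (d, at, area, rk) are illustrative, and the trapezoidal integration is an added sanity check rather than something density performs itself.

```r
obs <- (cos(1:300) + 0.09)^3
d <- density(obs)

# The result is a list of class "density" that carries the documented components.
print(class(d))
print(c("x", "y", "bw", "n") %in% names(d))

# Interpolate the estimate at arbitrary points with approx().
at <- c(-0.5, 0, 0.5)
interpolated <- approx(d$x, d$y, xout = at)$y

# Trapezoidal rule over the grid: a density estimate should integrate to about 1.
area <- sum((head(d$y, -1) + tail(d$y, -1)) / 2 * diff(d$x))

# With give.Rkern = TRUE only a single kernel constant is returned.
rk <- density(obs, kernel = "gaussian", give.Rkern = TRUE)

print(interpolated)
print(area)
print(rk)
```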
Background
Density estimation is essentially a smoothing operation.
Inevitably there is a trade-off between bias in the estimate and the
estimate's variability: wide windows produce smooth estimates that
may hide local features of the density, while narrow windows preserve
local features at the cost of a noisier estimate.
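The trade-off can be seen by estimating the same data with a narrow and a wide window. This is a rough sketch under the assumption of an R-compatible engine. The wiggliness function is a hypothetical helper that uses the total variation of the curve as an informal roughness measure, and the adjust values 0.25 and 4 are arbitrary choices for illustration.

```r
obs <- (cos(1:300) + 0.09)^3

# Same data, narrow versus wide window (adjust scales the default bandwidth).
d_narrow <- density(obs, adjust = 0.25)
d_wide   <- density(obs, adjust = 4)

# Total variation of the curve: a rough measure of how wiggly the estimate is.
wiggliness <- function(d) sum(abs(diff(d$y)))

print(c(narrow_bw = d_narrow$bw, wide_bw = d_wide$bw))
print(c(narrow_peak = max(d_narrow$y), wide_peak = max(d_wide$y)))
print(c(narrow = wiggliness(d_narrow), wide = wiggliness(d_wide)))
```

The wide-window estimate has a lower peak and less variation, which reflects the smoothing away of local features.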
References
Becker, R. A., Chambers, J. M., and Wilks, A. R. 1988. The New S Language: A Programming Environment for Data Analysis and Graphics. Pacific Grove, CA: Wadsworth & Brooks/Cole Advanced Books and Software.
Scott, D. W. 1992. Multivariate Density Estimation. Theory, Practice and Visualization. New York, NY: John Wiley & Sons.
Sheather, S. J. and Jones, M. C. 1991. A reliable data-based bandwidth selection method for kernel density estimation. Journal of the Royal Statistical Society, Series B. Volume 53. 683-690.
Silverman, B. W. 1986. Density Estimation for Statistics and Data Analysis. London, UK: Chapman and Hall.
Venables, W. N. and Ripley, B. D. 2002. Modern Applied Statistics with S. Fourth Edition. New York, NY: Springer.
Wegman, E. J. 1972. Nonparametric probability density estimation. Technometrics. Volume 14. 533-546.
See Also
bw.nrd, bw.nrd0, hist, approx.
Examples
density((cos(1:300)+0.09)^3)
density((cos(1:300)+0.09)^3, bw=0.25)