smooth.spline
Fit a Smoothing Spline
Description
Fits a cubic B-spline smooth to the input data.
Usage
smooth.spline(x, y = NULL, w = NULL, df, spar = NULL, cv = FALSE,
all.knots = FALSE, nknots = NULL, keep.data = TRUE, df.offset = 0,
penalty = 1, control.spar = list(), tol = 1e-06 * IQR(x))
Arguments
x |
the values of the predictor variable.
There should be at least 4 distinct x values.
x and y can be supplied in a variety of different forms, along the
lines of the function plot. For example, it can be a list with components
x and y, a two-column matrix, a formula of the form y~x where y and x
are the names of numeric vectors, or simply a single vector, taken to be a time series.
|
y |
the response variable, of the same length as x. If NULL, then it is generated
from the x argument as described above.
|
w |
a vector of weights for weighted smoothing, of the same
length as x and y.
If the measurements at different values of x have different variances,
w should be inversely proportional to the variances.
The default is that all weights are equal.
|
df |
The desired degrees of freedom of the smoother. The higher the number,
the more wiggly the fitted curve and the more closely it follows the
data. It should lie between 1 and the number of distinct points in x.
|
spar |
Do not use.
|
cv |
a logical value. If TRUE, indicates that the ordinary cross validation score should be computed. If
FALSE (the default), indicates that the generalized cross validation score should be computed.
|
all.knots |
a logical value. If TRUE, a knot is given at each distinct value in x. If
FALSE (the default), a suitable fine grid of knots is chosen, usually fewer in number than
the number of unique values of x.
|
df.offset |
allows an offset to be added to the df term used in the
calculation of the GCV criterion: df = tr(S) + df.offset.
|
penalty |
allows the df quantity used in GCV to be charged a cost = penalty
per degree of freedom.
|
nknots |
a numeric vector with at least one element,
used as knots when all.knots is FALSE.
|
keep.data |
a logical value. If TRUE (the default), the input x, y, and w are kept in the
data component of the output.
|
control.spar |
Do not use.
|
tol |
Do not use.
|
Details
The two arguments df.offset and penalty
are experimental and typically should not be used.
If used, the GCV criterion is
RSS/(n - (penalty*(trace(S)-1) + df.offset +1)).
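Written out in mathematical notation, with RSS, n, and S as in the expression above, the criterion and its reduction under the default settings look roughly like this. This is only a restatement of the formula given here, not an independent derivation:

```latex
\[
\mathrm{GCV}
  = \frac{\mathrm{RSS}}
         {n - \bigl(\mathtt{penalty}\cdot(\operatorname{tr}(S) - 1)
                    + \mathtt{df.offset} + 1\bigr)}
\]
% With the defaults penalty = 1 and df.offset = 0, the bracket collapses:
% 1 * (tr(S) - 1) + 0 + 1 = tr(S)
\[
\mathrm{GCV}\big|_{\mathtt{penalty}=1,\ \mathtt{df.offset}=0}
  = \frac{\mathrm{RSS}}{n - \operatorname{tr}(S)}
\]
```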
A cubic B-spline is fit, with care taken to ensure that the running time
of the algorithm is linear in the number of data points. For small data vectors (n < 50),
a knot is placed at every distinct data point,
and the regression is fit by penalized least squares.
For larger data sets (if all.knots=FALSE), the number of knots is reduced
to keep the computation time manageable.
The smoothing parameter spar can be chosen automatically by cross-validation
(if spar=0), supplied explicitly, or supplied implicitly through
the more intuitive df value.
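The following is a minimal sketch of the ways of setting the amount of smoothing described above: implicitly through df, or automatically through GCV or ordinary cross-validation. The data are simulated for illustration only, and the sketch assumes that omitting both df and spar triggers the automatic choice.

```r
# Simulated data (hypothetical, for illustration only).
set.seed(1)
x <- seq(0, 1, length.out = 100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)

# Amount of smoothing supplied implicitly through df.
fit_df <- smooth.spline(x, y, df = 6)

# Amount of smoothing chosen automatically (assumed behavior when neither
# df nor spar is given): GCV when cv = FALSE (the default),
# ordinary cross-validation when cv = TRUE.
fit_gcv <- smooth.spline(x, y)
fit_cv  <- smooth.spline(x, y, cv = TRUE)

# Compare the resulting degrees of freedom and criterion values.
c(df = fit_df$df, gcv = fit_gcv$df, cv = fit_cv$df)
c(gcv = fit_gcv$cv.crit, cv = fit_cv$cv.crit)
```

Specifying df is usually the easier route when a particular level of flexibility is wanted; the automatic choices are useful as a starting point when no such target exists.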
Value
Returns an object of class smooth.spline. The returned object
consists of the fitted smoothing spline evaluated at the supplied data,
some fitting criteria and constants, and a structure that
contains the essential information for computing the spline
and its derivatives for any values of x.
The components of the returned list are as follows:
x |
ordered distinct x values.
|
y |
smoothing spline fits corresponding to x.
|
w |
weights used in the fit.
This has the same length as x, and in the case of ties,
consists of the accumulated weights at each unique value of x.
|
data |
a list with components x, y, and w (the input data),
present only when keep.data is TRUE.
|
yin |
y-values used at the unique x values (the weighted averages of input y).
|
lev |
the leverage values, which are the diagonal elements of the smoother matrix S.
|
cv.crit |
the cross validation score (either GCV or CV).
|
pen.crit |
the penalized criterion.
|
crit |
the criterion of smoothing spline fits.
|
df |
the degrees of freedom of the fit estimated by the sum of lev.
If df was supplied as the smoothing parameter,
then the prescribed and resultant values of df should match
within 0.1 percent of the supplied df.
|
spar |
the smoothing parameter used in the fit. (This is useful if df was
used to specify the amount of smoothing.)
|
fit |
a list containing the details of the fits (knot locations, coefficients, and so on) to
be used by predict.smooth.spline.
|
call |
the call that produced the fit.
|
lambda |
the smoothing parameter; the "low" part of control.spar after iteration.
|
iparms |
a numeric vector with the names c("icrit", "ispar", "iter"); the "iparms" part of fit.
|
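As a rough illustration of how the components listed above relate to each other, the sketch below fits a spline to simulated data (hypothetical, for illustration only) and inspects a few of them. It relies only on the component names documented in this section.

```r
# Simulated data (hypothetical, for illustration only).
set.seed(2)
x <- seq(0, 10, length.out = 60)
y <- cos(x) + rnorm(60, sd = 0.3)
fit <- smooth.spline(x, y, df = 8)

# Fitted values at the ordered distinct x values.
head(cbind(x = fit$x, fitted = fit$y))

# The leverages are the diagonal of the smoother matrix S,
# so their sum should agree with the reported df.
sum(fit$lev)
fit$df

# Residuals relative to the y-values used at the unique x values.
res <- fit$yin - fit$y
summary(res)
```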
Differences between TIBCO Enterprise Runtime for R and Open-source R
This function is based on the S-PLUS implementation and has not been
completely converted to the R parameterization. The spar
parameters do not match: They are on the same scale, but differ
by a constant. Several arguments used in the R implementation are
included here but not used. This implementation might choose a slightly
rougher spline fit than the implementation in open-source R.
References
Chambers, J. M. and Hastie, T. J. (Eds.) 1992. Statistical Models in S. Pacific Grove, CA: Wadsworth & Brooks/Cole.
Green, P. J. and Silverman, B. W. 1994. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. London, UK: Chapman and Hall.
Hastie, T. J. and Tibshirani, R. J. 1990. Generalized Additive Models. London, UK: Chapman and Hall.
Examples
ss <- with(Sdatasets::ethanol, smooth.spline(NOx ~ E))
predict(ss, pretty(range(Sdatasets::ethanol$E), 20))
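A further sketch, using simulated data rather than a packaged data set, shows weighted smoothing with w inversely proportional to the variances, followed by evaluation of the fitted curve and its first derivative at new x values. The deriv argument of predict is assumed here to behave as it does in open-source R; the noise model and all variable names are hypothetical.

```r
# Simulated heteroscedastic data (hypothetical, for illustration only).
set.seed(3)
x <- seq(0, 1, length.out = 80)
sdev <- 0.1 + 0.4 * x                  # noise standard deviation grows with x
y <- sin(2 * pi * x) + rnorm(80, sd = sdev)

# Weights inversely proportional to the variances.
fit_w <- smooth.spline(x, y, w = 1 / sdev^2, df = 6)

# Evaluate the fitted curve and its first derivative on a new grid.
xnew <- seq(0, 1, by = 0.1)
p0 <- predict(fit_w, xnew)
p1 <- predict(fit_w, xnew, deriv = 1)  # deriv assumed as in open-source R
cbind(x = p0$x, fit = p0$y, slope = p1$y)
```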