smooth.spline
Fit a Smoothing Spline

Description

Fits a cubic B-spline smooth to the input data.

Usage

smooth.spline(x, y = NULL, w = NULL, df, spar = NULL, cv = FALSE, 
    all.knots = FALSE, nknots = NULL, keep.data = TRUE, df.offset = 0, 
    penalty = 1, control.spar = list(), tol = 1e-06 * IQR(x)) 

Arguments

x the values of the predictor variable. There should be at least 4 distinct x values.

x and y can be supplied in a variety of different forms, along the lines of the function plot. For example, x can be a list with components x and y, a two-column matrix, a formula of the form y~x where y and x are the names of numeric vectors, or simply a single vector, taken to be a time series.

y the response variable, of the same length as x. If NULL, then it is generated from the x argument as described above.
w a vector of weights for weighted smoothing, of the same length as x and y. If the measurements at different values of x have different variances, w should be inversely proportional to the variances. The default is that all weights are equal.
df The desired degrees of freedom of the smoother. The higher the number, the more wiggly the fitted curve and the more closely it follows the data. It should lie between 1 and the number of distinct points in x.
spar Do not use.
cv a logical value. If TRUE, indicates that the ordinary cross validation score should be computed. If FALSE (the default), indicates that the generalized cross validation score should be computed.
all.knots a logical value. If TRUE, a knot is given at each distinct value in x. If FALSE (the default), a suitable fine grid of knots is chosen, usually fewer in number than the number of unique values of x.
df.offset an offset added to the df term used in the calculation of the GCV criterion: df = tr(S) + df.offset.
penalty allows the df quantity used in GCV to be charged a cost = penalty per degree of freedom.
nknots a numeric vector with at least one element, used as knots when all.knots is FALSE.
keep.data a logical value. If TRUE (the default), it keeps the input x, y, w in the output data.
control.spar Do not use.
tol Do not use.
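
As an illustration of the argument forms described above, the following sketch fits the same data supplied as separate vectors and as a list, and then fits with weights inversely proportional to the variances. The simulated data and the object names (fit.vec, fit.list, fit.w, sd.y) are assumptions made for this example only; they do not come from the package.

```r
# Simulated data with two noise levels (hypothetical, for illustration only)
set.seed(1)
x <- seq(0, 1, length.out = 100)
sd.y <- rep(c(0.1, 0.4), each = 50)
y <- sin(2 * pi * x) + rnorm(100, sd = sd.y)

# x and y as separate vectors; the amount of smoothing is set through df
fit.vec <- smooth.spline(x, y, df = 6)

# The same data supplied as a single list with components x and y
fit.list <- smooth.spline(list(x = x, y = y), df = 6)

# Weighted fit: weights inversely proportional to the variances
fit.w <- smooth.spline(x, y, w = 1 / sd.y^2, df = 6)

# The resulting df should be close to the requested value
fit.w$df
```

The two unweighted calls describe the same data, so their fitted values are expected to agree.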

Details

The two arguments df.offset and penalty are experimental and typically should not be used. If used, the GCV criterion is RSS/(n - (penalty*(trace(S)-1) + df.offset +1)).
A cubic B-spline is fit, with care taken to ensure that the running time of the algorithm is linear in the number of data points. For small data vectors (n<50), a knot is placed at every distinct data point, and the regression is fit by penalized least squares. For larger data sets, the number of knots is chosen judiciously to keep the computation time manageable (if all.knots=FALSE). The smoothing parameter spar can be chosen automatically by cross-validation (if spar=0), supplied explicitly, or supplied implicitly via the more intuitive df number.
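
To make the role of df.offset and penalty in the GCV expression above concrete, here is a literal transcription of that expression as a small helper. The function gcv.doc and its argument names are hypothetical and are not part of the package; this is a sketch of the stated formula, not the internal implementation.

```r
# Hypothetical helper: evaluates RSS/(n - (penalty*(trace(S)-1) + df.offset + 1))
# rss: residual sum of squares, n: number of observations,
# trS: trace of the smoother matrix S
gcv.doc <- function(rss, n, trS, penalty = 1, df.offset = 0) {
  rss / (n - (penalty * (trS - 1) + df.offset + 1))
}

# With the defaults (penalty = 1, df.offset = 0) the denominator is n - trS
gcv.doc(rss = 10, n = 100, trS = 5)

# A larger penalty charges more per degree of freedom, which shrinks the
# denominator and so raises the criterion for the same fit
gcv.doc(rss = 10, n = 100, trS = 5, penalty = 2)
```

With the default values, the expression reduces to RSS/(n - trace(S)), which is why the two arguments can normally be left alone.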

Value

returns an object of class smooth.spline. The returned object consists of the fitted smoothing spline evaluated at the supplied data, some fitting criteria and constants, and a structure that contains the essential information for computing the spline and its derivatives for any values of x. The components of the returned list are as follows:
x ordered distinct x values.
y smoothing spline fits corresponding to x.
w weights used in the fit. This has the same length as x, and in the case of ties, consists of the accumulated weights at each unique value of x.
data a list with components x, y, and w (the input data), present only when keep.data is TRUE.
yin y-values used at the unique x values (the weighted averages of input y).
lev the leverage values, which are the diagonal elements of the smoother matrix S.
cv.crit the cross validation score (either GCV or CV).
pen.crit the penalized criterion.
crit the criterion of smoothing spline fits.
df the degrees of freedom of the fit estimated by the sum of lev. If df was supplied as the smoothing parameter, then the prescribed and resultant values of df should match within 0.1 percent of the supplied df.
spar the smoothing parameter used in the fit. (This is useful if df was used to specify the amount of smoothing.)
fit a list containing the details of the fits (knot locations, coefficients, and so on) to be used by predict.smooth.spline.
call the call that produced the fit.
lambda the smoothing parameter; the "low" part of control.spar after iteration.
iparms a numeric vector with the names c("icrit", "ispar", "iter"); this is the "iparms" part of fit.
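
The components listed above can be inspected directly on the returned object. The sketch below uses simulated data (an assumption for illustration) and checks that df is the sum of the leverages. The deriv argument of predict is assumed to behave as in open-source R's predict.smooth.spline; it is not documented on this page.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(2)
x <- seq(0, 10, length.out = 80)
y <- cos(x) + rnorm(80, sd = 0.3)
fit <- smooth.spline(x, y, df = 8)

# df is estimated by the sum of the leverage values
c(df = fit$df, sum.lev = sum(fit$lev))

# x, y, w, and lev all have one entry per distinct x value
sapply(fit[c("x", "y", "w", "lev")], length)

# The smoothing parameter that corresponds to the requested df
fit$spar

# Fitted values and first derivatives at new x values
# (deriv is assumed to be supported as in open-source R)
predict(fit, x = c(2.5, 5))
predict(fit, x = c(2.5, 5), deriv = 1)
```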

Differences between TIBCO Enterprise Runtime for R and Open-source R

This function is based on the S-PLUS implementation and has not been completely converted to the R parameterization. The spar parameters do not match: They are on the same scale, but differ by a constant. Several arguments used in the R implementation are included here but not used. This implementation might choose a slightly rougher spline fit than the implementation in open-source R.

References

Chambers, J. M. and Hastie, T. J. (Eds.) 1992. Statistical Models in S. Pacific Grove, CA: Wadsworth & Brooks/Cole.
Green, P. J. and Silverman, B. W. 1994. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. London, UK: Chapman and Hall.
Hastie, T. J. and Tibshirani, R. J. 1990. Generalized Additive Models. London, UK: Chapman and Hall.

See Also

predict.smooth.spline, print.smooth.spline.

Examples

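# Fit a smoothing spline of NOx on E using the ethanol data set
ss <- with(Sdatasets::ethanol, smooth.spline(NOx ~ E))
# Predict on a grid of about 20 round values spanning the range of E
predict(ss, pretty(range(Sdatasets::ethanol$E), 20))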
Package stats version 6.0.0-69