t.test
Student's t-test
Description
Performs one-sample, two-sample, or paired t-tests.
Usage
t.test(x, ...)
## Default S3 method:
t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"),
mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95,
treatment = NULL, ...)
## S3 method for class 'formula':
t.test(formula, data, subset, na.action, ...)
Arguments
x |
a numeric vector that contains the sample values.
Missing values (NAs) and infinite values (Infs)
are removed before calculation.
|
y |
an optional numeric vector that contains the values of the second sample.
Missing values (NAs) and infinite values (Infs)
are removed before calculation.
If paired = TRUE, length(x) must equal length(y) and
an observation pair (x[i], y[i]) is removed
if it has at least one NA or Inf value.
|
alternative |
a character string that specifies the alternative hypothesis.
Options are:
"two.sided": the true mean is not equal to mu,
"greater": the true mean is greater than mu,
"less": the true mean is less than mu.
You only need to enter enough of the character string to create a unique match.
Depending on the type of test,
alternative refers to one of the following:
One-sample and paired t-tests:
The true mean of the parent population in relation
to the hypothesized value mu.
Two-sample t-tests:
The difference between the true population mean for x
and that for y, in relation to mu.
|
mu |
a numeric scalar that represents the value of the mean or
the difference of means specified by the null hypothesis.
|
paired |
a logical value.
If TRUE, length(x) must equal length(y) and
the values in x and y are treated
as observation pairs (x[i], y[i]).
|
var.equal |
a logical value.
If TRUE, the function is evaluated under the assumption
that the variances of the parent populations of x and y are equal.
This argument is used only for two-sample (that is, unpaired) tests.
|
conf.level |
a numeric value in the range [0, 1]
that specifies the confidence level for the returned confidence interval.
|
treatment |
a vector of any kind with exactly two unique values
and the same length as x.
If supplied, it is used to split x into two samples and y
is not used.
The t-statistic numerator is the difference between the means of the two groups.
This argument is not in open-source R.
|
formula |
a formula of the form v~g that gives
the name of a numeric variable (v) and
the name of a grouping factor (g).
g must have exactly two levels and length equal to that of v.
|
data |
a data frame that contains the variables named
in the formula and subset arguments.
Defaults to the parent frame from which the function was called.
|
subset |
a vector that specifies which subset of the rows of the data should be used.
This can be a logical vector that is replicated to have length
equal to the number of rows of data,
a numeric vector that indicates the row numbers to be included, or
a character vector of the row names that should be included.
By default, all rows are included.
|
na.action |
a function that handles missing values.
See na.action for details.
|
... |
additional arguments.
|
Details
If neither y nor treatment is supplied, a one-sample t-test is computed.
If a value for y is supplied, the value of paired
is used.
If paired == TRUE, a paired t-test is computed.
If paired == FALSE, a two-sample t-test is computed and
the value of var.equal is used.
If var.equal == TRUE, the standard (pooled-variance) two-sample t-test is
computed.
If var.equal == FALSE, the Welch modified two-sample t-test is
computed.
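As a minimal sketch of how the arguments select the test, the calls below run each variant on the same data and print the method component of the result. The data values are taken from the Examples section; the variable names res_one, res_paired, res_welch, and res_pooled are chosen for illustration only.

```r
# Illustrative data (same values as the paired example in this page).
before <- c(31, 20, 18, 17, 9, 8, 10, 7)
after  <- c(18, 17, 14, 11, 10, 7, 5, 6)

# No y supplied: one-sample test.
res_one <- t.test(before)

# y supplied, paired = TRUE: paired test on the differences.
res_paired <- t.test(after, before, paired = TRUE)

# y supplied, paired = FALSE (default), var.equal = FALSE (default): Welch test.
res_welch <- t.test(after, before)

# y supplied, paired = FALSE, var.equal = TRUE: pooled-variance test.
res_pooled <- t.test(after, before, var.equal = TRUE)

# The method component names the test that was actually run.
print(c(res_one$method, res_paired$method, res_welch$method, res_pooled$method))

# Degrees of freedom differ by test: n - 1 for one-sample and paired,
# nx + ny - 2 for pooled, and a (usually non-integer) value for Welch.
print(c(res_one$parameter, res_paired$parameter,
        res_welch$parameter, res_pooled$parameter))
```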
One-sample t-test
statistic: t = (mean(x)-mu) / (sqrt(var(x))/sqrt(length(x)))
|
If x was drawn from a normal population, t has a t-distribution
with length(x)-1 degrees of freedom under the null hypothesis.
The null hypothesis in this case is that the mean of the population from which
x is drawn is mu.
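The one-sample statistic can be checked by hand. The following is a sketch only: the data values and the hypothesized mean mu0 are made up for illustration, and the two-sided p-value is computed directly from the t-distribution with pt().

```r
# Made-up sample and hypothesized mean (illustration only).
x   <- c(5.1, 4.9, 6.2, 5.8, 6.0, 5.5, 5.3, 6.1)
mu0 <- 5

n         <- length(x)
t_manual  <- (mean(x) - mu0) / (sqrt(var(x)) / sqrt(n))
df_manual <- n - 1

# Two-sided p-value from the t-distribution with n - 1 degrees of freedom.
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

res <- t.test(x, mu = mu0)

# Each comparison should report TRUE (up to floating-point tolerance).
print(all.equal(unname(res$statistic), t_manual))
print(all.equal(unname(res$parameter), df_manual))
print(all.equal(res$p.value, p_manual))
```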
Paired t-test
statistic: t = (mean(d)-mu) / (sqrt(var(d))/sqrt(length(d))) |
where d is the vector of differences x-y
|
Under the null hypothesis, t follows a t-distribution
with length(d)-1 degrees of freedom,
assuming normality of the differences d.
The null hypothesis in this case is that the population mean of the difference
x-y is equal to mu.
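Because the paired statistic depends only on the differences, a paired test should agree with a one-sample test on d. The sketch below illustrates this with the before/after values from the Examples section; the names res_paired and res_diff are illustrative.

```r
before <- c(31, 20, 18, 17, 9, 8, 10, 7)
after  <- c(18, 17, 14, 11, 10, 7, 5, 6)

# Vector of differences x - y, with x = after and y = before.
d <- after - before

# Statistic from the formula above, with mu = 0.
t_manual <- (mean(d) - 0) / (sqrt(var(d)) / sqrt(length(d)))

res_paired <- t.test(after, before, paired = TRUE)
res_diff   <- t.test(d)   # one-sample test on the differences

# All three values should agree up to floating-point tolerance.
print(c(manual = t_manual,
        paired = unname(res_paired$statistic),
        diff   = unname(res_diff$statistic)))
```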
Equal-variance two-sample t-test
statistic: t = (mean(x)-mean(y)-mu) / s1 |
where: |
s1 = sp*sqrt(1/nx+1/ny) |
sp = sqrt(((nx-1)*var(x)+(ny-1)*var(y))/(nx+ny-2)) |
nx = length(x) |
ny = length(y)
|
Assuming that x and y come from normal populations with equal variances,
t has a t-distribution with nx+ny-2 degrees of freedom
under the null hypothesis.
The null hypothesis in this case is that the population mean for x
minus that for y is mu.
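The pooled-variance formulas can be reproduced step by step. This is a sketch using the x and y values from the Examples section and a hypothesized difference mu0 = 2; the variable names mirror the symbols in the formulas above.

```r
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5)
mu0 <- 2   # hypothesized difference in means

nx <- length(x)
ny <- length(y)

# Pooled standard deviation and standard error of the difference.
sp <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
s1 <- sp * sqrt(1 / nx + 1 / ny)

t_manual  <- (mean(x) - mean(y) - mu0) / s1
df_manual <- nx + ny - 2

# var.equal = TRUE requests the pooled-variance test.
res <- t.test(x, y, var.equal = TRUE, mu = mu0)

print(all.equal(unname(res$statistic), t_manual))
print(all.equal(unname(res$parameter), df_manual))
```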
Welch modified two-sample t-test
statistic: t = (mean(x)-mean(y)-mu)/s2 |
where: |
s2 = sqrt(var(x)/nx+var(y)/ny) |
nx = length(x) |
ny = length(y)
|
If x and y come from normal populations,
the distribution of t under the null hypothesis
can be approximated by a t-distribution with (non-integral) degrees of freedom:
1/((c^2)/(nx-1)+((1-c)^2)/(ny-1)) |
where c = var(x)/(nx*s2^2)
|
The null hypothesis in this case is that the population mean for x
minus that for y is mu.
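The Welch statistic and its approximate degrees of freedom can be computed directly from the formulas above. In this sketch the quantity c is stored as cc to avoid confusion with the R function c(); the data are the x and y values from the Examples section.

```r
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5)
mu0 <- 0

nx <- length(x)
ny <- length(y)

# Unpooled standard error of the difference in means.
s2 <- sqrt(var(x) / nx + var(y) / ny)
t_manual <- (mean(x) - mean(y) - mu0) / s2

# Approximate degrees of freedom; cc plays the role of c in the formula.
cc <- var(x) / (nx * s2^2)
df_manual <- 1 / (cc^2 / (nx - 1) + (1 - cc)^2 / (ny - 1))

# var.equal defaults to FALSE, so this is the Welch test.
res <- t.test(x, y, mu = mu0)

print(all.equal(unname(res$statistic), t_manual))
print(all.equal(unname(res$parameter), df_manual))
```

The resulting degrees of freedom lie between min(nx, ny) - 1 and nx + ny - 2, which is why they are generally not an integer.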
The alternative hypothesis in each case indicates
the direction of divergence of the population mean for x
(or difference of means for x and y) from mu,
that is, "greater", "less", or "two.sided".
In all cases, if the distributions are not normal but sample sizes are
large, then t-distributions hold approximately (under certain
regularity conditions). However, large sample sizes are no help if
you use the pooled-variance test and the variances are not equal.
Value
a list of class "htest" that contains the following components:
statistic |
the t-statistic with names attribute "t".
|
parameter |
the degrees of freedom of the t-distribution associated with statistic with names attribute "df".
|
p.value |
the p-value for the test.
|
conf.int |
the confidence interval (vector of length 2) for the true mean or difference in means.
The confidence level specified by the input argument conf.level
is recorded as its conf.level attribute.
|
estimate |
the sample mean(s) or mean of the differences (vector of length 1 or 2)
that estimate the corresponding population parameters with a names attribute
that describes the estimate, for example "mean of the differences".
|
null.value |
the value of the mean or difference in means specified by the null hypothesis with a names attribute
that describes the null.value, for example "difference in means".
The value for null.value is equal to the value of the input argument mu.
|
alternative |
a character string that gives the alternative hypothesis
("two.sided", "greater", or "less")
as specified in the alternative argument.
|
method |
a character string for the name of the test used in the calculation.
|
data.name |
a character string (vector of length 1) that contains the names of the x and,
if provided, y input vectors.
|
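The components listed above can be extracted from the returned object with the usual list operators. A brief sketch, using the x and y values from the Examples section:

```r
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7.0, 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5.0, 4.1, 5.5)

res <- t.test(x, y, conf.level = 0.90)

print(class(res))         # class of the returned list
print(res$statistic)      # t-statistic, named "t"
print(res$parameter)      # degrees of freedom, named "df"
print(res$p.value)        # p-value of the test
print(res$conf.int)       # interval for the difference in means ('x' minus 'y')
print(attr(res$conf.int, "conf.level"))  # confidence level that was requested
print(res$estimate)       # the two sample means
print(res$null.value)     # value of mu under the null hypothesis
print(res$alternative)    # alternative hypothesis that was used
print(res$data.name)      # names of the input vectors
```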
References
Box, G. E. P. (1953),
"Non-normality and Tests on Variances,"
Biometrika,
pp. 318-335.
Hogg, R. V. and Craig, A. T. (1970).
Introduction to Mathematical Statistics, 3rd ed.
Toronto, Canada: Macmillan.
Mood, A. M., Graybill, F. A. and Boes, D. C. (1974).
Introduction to the Theory of Statistics, 3rd ed.
New York: McGraw-Hill.
Snedecor, G. W. and Cochran, W. G. (1980).
Statistical Methods, 7th ed.
Ames, Iowa: Iowa State University Press.
Differences between Spotfire Enterprise Runtime for R and Open-source R
The argument "treatment" is not in open-source R.
Examples
# Two-sided one-sample t-test.
# Null hypothesis is that the population mean for 'x' is zero.
# Alternative hypothesis states that it is either greater or less than zero.
# Computes a confidence interval for the population mean.
x <- rnorm(12)
t.test(x)
# One-sided paired t-test.
# Null hypothesis is that the population mean "before"
# and the one "after" are the same, or equivalently that
# the mean change ("after" minus "before") is zero.
# Alternative hypothesis is that the mean "after" is less than the one "before",
# or equivalently that the mean change is negative.
# Computes a confidence interval for the mean change.
before <- c(31, 20, 18, 17, 9, 8, 10, 7)
after <- c(18, 17, 14, 11, 10, 7, 5, 6)
t.test(after, before, alternative = "less", paired = TRUE)
# Two-sided two-sample t-test.
# Null hypothesis is that the population means for 'x' and 'y' are the same.
# Alternative hypothesis is that they are not.
# The confidence interval for the difference in true means ('x' minus 'y')
# will have a confidence level of 0.90.
x <- c(7.8, 6.6, 6.5, 7.4, 7.3, 7., 6.4, 7.1, 6.7, 7.6, 6.8)
y <- c(4.5, 5.4, 6.1, 6.1, 5.4, 5., 4.1, 5.5)
t.test(x, y, conf.level = 0.90)
# Two-sided pooled-variance two-sample t-test.
# This assumes that the two population variances are equal.
# Null hypothesis is that the population mean for 'x' minus that for 'y' is 2.
# Alternative hypothesis is that this difference is not 2.
# Computes a confidence interval for the true difference.
t.test(x, y, var.equal = TRUE, mu = 2)
# Formula interface
t.test(Mileage ~ I(Type
subset = (Type != "Sporty"))
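For a self-contained illustration of the formula interface, the sketch below uses a small made-up data frame. The name mileage_df and its columns mpg, size, and keep are hypothetical and are not part of any dataset referenced on this page. It assumes that the formula method passes the first factor level as x and the second as y, which is how open-source R behaves.

```r
# Hypothetical data: fuel economy for two size classes, plus a logical
# column used only to demonstrate the 'subset' argument.
mileage_df <- data.frame(
  mpg  = c(33, 30, 28, 27, 24, 23, 22, 21, 20, 19, 35, 18),
  size = factor(c(rep("small", 5), rep("large", 5), "small", "large")),
  keep = c(rep(TRUE, 10), FALSE, FALSE)
)

# Formula interface: numeric variable on the left, two-level factor on the right.
# Only the rows where 'keep' is TRUE are used.
res_formula <- t.test(mpg ~ size, data = mileage_df, subset = keep)

# Equivalent call through the default method. Factor levels are sorted
# alphabetically, so "large" is the first group and "small" the second.
kept <- mileage_df[mileage_df$keep, ]
res_default <- t.test(kept$mpg[kept$size == "large"],
                      kept$mpg[kept$size == "small"])

print(res_formula$statistic)
print(res_default$statistic)
```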