abbreviate
Generate Abbreviations

Description

Returns a character vector of abbreviated character strings.

Usage

abbreviate(names.arg, minlength = 4L, use.classes = TRUE, dot = FALSE,
    strict = FALSE, method = c("left.kept", "both.sides"), named = TRUE)

Arguments

names.arg a vector of character strings whose elements are to be abbreviated. Only ASCII character strings are handled properly.
minlength a number specifying the minimum length of the abbreviations produced (not including the trailing dot that is added if dot=TRUE). Abbreviations are not guaranteed to be of length minlength. When strict is FALSE, the algorithm increases minlength until it produces unique abbreviations.
use.classes a logical flag. If TRUE (the default), some special character classes are used to keep what are thought to be more meaningful characters in the abbreviation. See the discussion of the algorithm in the Details section. To see the effect, abbreviate state.name as in the first example below, but with use.classes=FALSE.
dot a logical flag. If TRUE, specifies that each abbreviation should be terminated with ".". The default is FALSE.
strict a logical value. If TRUE, specifies that abbreviations are no longer than minlength, even if this setting produces duplicate abbreviations. The default is FALSE, which allows the length of abbreviations to increase until all abbreviations are unique.
method a character string specifying the abbreviation method. It is used only when strict is FALSE.

The default "left.kept" method always truncates characters from the right. The "both.sides" method starts by truncating characters from the right, but if this does not produce unique abbreviations, it truncates the duplicates from the left instead.

named a logical value. If FALSE, the return value will not have a names attribute.
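
As a small illustration of how minlength and dot interact, the sketch below abbreviates two state names with and without the trailing dot. The input vector is typed in literally so that the snippet does not depend on a dataset package, and the variable names are chosen only for this illustration.

```r
# Two of the state names used in the Examples section, entered literally.
x <- c("Alabama", "Alaska")

plain  <- abbreviate(x, minlength = 4)
dotted <- abbreviate(x, minlength = 4, dot = TRUE)

# The trailing dot is not counted toward minlength, so each dotted
# abbreviation should be one character longer than its undotted counterpart.
print(plain)
print(dotted)
print(nchar(dotted) - nchar(plain))
```

Comparing the two results side by side is a quick way to confirm that minlength describes the abbreviation itself and not the dot.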

Details

The abbreviations do not depend on the order of the elements of names.arg, except when the algorithm produces, and has to resolve, duplicate abbreviations.
The Basic Abbreviation Algorithm
The abbreviation algorithm does not simply truncate. It has a threshold, according to which it will drop, in order:
  1. non-printing characters and white space.
  2. lower case vowels.
  3. lower case consonants and punctuation.
  4. upper case letters and special characters.
If use.classes is FALSE, the algorithm distinguishes between only white space and other characters. Each string is divided into words, separated by white space. The first letter in each word is always kept.
For a given value of the threshold, eligible letters are dropped from the end of each word, until the desired minimum length is reached. If it is not reached, the threshold is raised and the process is repeated.
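
The dropping order described above can be sketched for the simplest case, a single ASCII word with no white space. This is a rough illustration under those assumptions, not the code behind abbreviate(): the helper name sketch_abbrev is hypothetical, it ignores the white-space class and multi-word input, and it makes no attempt to resolve duplicates.

```r
# Hypothetical helper for illustration only; it is not the code behind
# abbreviate(). Assumes `word` is a single ASCII word with no white space.
sketch_abbrev <- function(word, minlength = 4L) {
  chars <- strsplit(word, "", fixed = TRUE)[[1]]
  vowels <- c("a", "e", "i", "o", "u")

  # One predicate per threshold level, in the order listed above:
  # lower case vowels, then lower case letters and punctuation,
  # then anything that is left.
  droppable <- list(
    function(ch) ch %in% vowels,
    function(ch) ch %in% letters || grepl("^[[:punct:]]$", ch),
    function(ch) TRUE
  )

  for (can_drop in droppable) {
    # Scan from the end of the word; position 1 is never dropped.
    i <- length(chars)
    while (length(chars) > minlength && i > 1L) {
      if (can_drop(chars[i])) chars <- chars[-i]
      i <- i - 1L
    }
  }
  paste(chars, collapse = "")
}

sketch_abbrev("Alabama")      # "Albm"
sketch_abbrev("Arkansas")     # "Arkn"
sketch_abbrev("Alabama", 2L)  # "Al"
```

For these single-word inputs the sketch gives the same strings that the Examples section shows for abbreviate(), but that agreement should not be assumed for multi-word or non-alphabetic input.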
This algorithm still might not produce unique abbreviations. If it does not, then minlength is increased, and the algorithm is reapplied, but only to those names not distinguished by the previous round.
The end result can be that some of the abbreviations are longer than the requested length, but as few of these as possible, given the algorithm. (See the third example below.)
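
The effect of this uniqueness step can be observed by comparing strict=TRUE with the default on a few names that share leading letters. The snippet below is a minimal sketch with the names typed in literally; the Examples section shows the same calls against the state.name dataset.

```r
# Four names whose two-letter abbreviations collide.
x <- c("Alabama", "Alaska", "Arizona", "Arkansas")

# strict = TRUE holds every abbreviation to minlength, so duplicates can remain.
strict_ab <- abbreviate(x, minlength = 2, strict = TRUE)

# The default lengthens abbreviations as needed until they are all distinct.
loose_ab <- abbreviate(x, minlength = 2)

print(strict_ab)
print(loose_ab)
print(nchar(loose_ab))          # lengths may exceed the requested minlength
print(anyDuplicated(loose_ab))  # 0 means no duplicates remain
```

Inspecting nchar() of the result, as done here, is a simple way to see which names needed a longer abbreviation.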
The method assumes you want identical names to produce identical abbreviations. The result of this tends to be abbreviations not quite like anything you've ever seen before, but usually fairly intuitive when the input names are English text.
Value
Returns a character vector containing the abbreviations. Unless the named argument is FALSE, the vector has a "names" attribute containing the original names.arg values. This attribute can make subscripting the result convenient. (See the second example.)
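
A short sketch of how the names attribute can be used follows. The input vector and variable names are made up for this illustration, and the abbreviations themselves are printed without being assumed.

```r
# Full names entered literally for the illustration.
full <- c("New Jersey", "New Mexico", "North Dakota")

ab <- abbreviate(full, minlength = 2)

print(names(ab))         # the original strings
print(ab["New Mexico"])  # look up one abbreviation by its full name

# With named = FALSE the same abbreviations come back as a bare vector.
bare <- abbreviate(full, minlength = 2, named = FALSE)
print(is.null(names(bare)))
```

Keeping the names is handy for lookups like the one above; dropping them with named=FALSE is useful when the result is stored in a data frame column or compared with another unnamed vector.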
Differences between TIBCO Enterprise Runtime for R and Open-source R
In open-source R, the "use.classes" argument is silently ignored and treated as TRUE. In some cases, this causes TIBCO Enterprise Runtime for R and open-source R to produce different results:
Example
abbreviate(c('a foobr','a foobar'))
# R:    "a foobr" "afoobar"
# TERR: "afoobr" "afoobar"
See Also
make.names, nchar, paste, substring, table
Examples
abbreviate(Sdatasets::state.name[1:10])
##  Alabama Alaska Arizona Arkansas California Colorado
##  "Albm"  "Alsk" "Arzn"  "Arkn"   "Clfr"     "Clrd"
## Connecticut Delaware Florida  Georgia
## "Cnnc"      "Dlwr"   "Flrd"   "Gerg"
abbreviate(Sdatasets::state.name, 2)["New Jersey"]
## New Jersey
## "NJ"
ab2 <- abbreviate(Sdatasets::state.name, 2)
table(nchar(ab2))
##  2  3 4
## 32 15 3
ab2[nchar(ab2)==4]
## Massachusetts Mississippi Missouri
## "Mssc"        "Msss"      "Mssr"

abbreviate(Sdatasets::state.name[1:4], 2)
## Alabama Alaska Arizona Arkansas
## "Alb"   "Als"  "Arz"   "Ark"

# abbreviations may not be unique.
abbreviate(Sdatasets::state.name[1:4], 2, strict=TRUE)
## Alabama Alaska Arizona Arkansas
## "Al"    "Al"   "Ar"    "Ar"
abbreviate(Sdatasets::state.name[1:4], 2, method="both.sides")
## Alabama Alaska Arizona Arkansas
## "Al"    "Aa"   "Ar"    "As"

# compare to the first example.
abbreviate(Sdatasets::state.name[1:10], use.classes=FALSE)
## Alabama Alaska Arizona Arkansas California Colorado
## "Alab"  "Alas" "Ariz"  "Arka"   "Cali"     "Colo"
## Connecticut Delaware Florida Georgia
## "Conn"      "Dela"   "Flor"  "Geor"

# example for "dot" usage.
abbreviate(Sdatasets::state.name[1:4], dot=TRUE)
## Alabama Alaska  Arizona Arkansas
## "Albm." "Alsk." "Arzn." "Arkn."

Package base version 6.0.0-69