Text Free Import Options
Click the V7 Free Import button in the Advanced delimited importing options dialog box to display the Text Free Import Options dialog box.
Option | Description |
---|---|
Imported file | The name of the selected text file to import is displayed here. |
File size | The size of the selected text file is displayed here. |
No. of vars | Specify the number of variables in the ASCII (text) file that is being imported. |
No. of cases | Specify the number of cases in the ASCII (text) file that is being imported. If you are uncertain of the exact number of cases, you can overestimate it; Statistica detects the actual length of the file during import. Each line in the ASCII source file can be up to 4,000 characters long. This limit applies only to individual lines, not to the total length of a case: each imported case can span many lines of the source file. |
Import case names | Select this check box to obtain case names from the first field of each record in the ASCII file. If this field contains more than 20 characters, only the first 20 are used as the case name. If the field contains numeric values instead of text, the case names are created as text representations of these values. |
Start import at row | Specify the row of the text file at which to begin the import. |
Format statement (to identify types of values) | This option explicitly defines for Statistica the exact contents of the input ASCII file. (This is an important distinction: the format statement is a set of instructions for interpreting the structure of the input file, not a definition of the Statistica variables to be created.) The format statement is entered as a list of formats of the form nX, where n is an optional integer multiplier indicating the number of times the format is repeated (no multiplier = 1) and X is the column type (for example, 40F means 40 fields containing numeric float values). For a detailed description, see Column Type Specifiers below. |
Separators | You can define the characters used as delimiters in the input file. (The final list of separators is the selected Basic set combined with any Additional separators.) |
Basic | In this drop-down list, select the type of delimiter used in the input file from four predefined sets of separators (CR stands for carriage return, LF stands for line feed, and FF stands for form feed). |
undefined | Statistica does not use a predefined set; specify the separators with the Additional button. |
blank chars | This set includes: <space>, <tab>, <FF>, <LF>, and <CR/LF>. |
standard set | This set includes: comma (,), semicolon (;), <space>, <tab>, and <CR/LF>. |
non-numeric | This set includes all characters except: 0-9, period (.), minus (-), and plus (+). |
Additional | Displays the Text Free Separators dialog box, from which you can select the delimiters. |
Treat multiple separators as MD | When you select this check box, Statistica interprets each pair of adjacent separators as an occurrence of missing data (an absent value) and places the default missing data value (-999999998) between the adjacent separators. If this option is cleared, multiple adjacent separator characters are treated as a single separator, and missing data must be explicitly coded in the ASCII file as a unique value (for instance, -999999998). Clearing this option is particularly important if, for instance, individual values in the data file are separated by spaces, with a variable number of spaces between values: if spaces are used as separators and this option is selected, each pair of spaces is seen as an occurrence of missing data, and the resulting file is full of missing values. |
Use quotation marks as text boundaries | Select this check box if double (") or single (') quotation marks are used as text boundaries and the specified separator characters appear within the values of text variables in the input file (for instance, "John Jones, Ph.D.," uses the comma both inside the text and as a separator after it). In this case, Statistica recognizes the quotation marks only as boundaries around the text values, keeping each text value and its embedded separator characters together; the quotation marks themselves are not included in the imported text values.
Note: If a text string is to contain quotation marks as part of the string itself (as in the titles of books such as "Moby Dick"), then two methods can be used to import them:
|
Trim leading spaces | Select this check box if the ASCII (text) file you are about to import contains leading blank spaces in some rows, as in the example below, where the rows starting with 9 and 8 are offset by a leading blank.
If the Trim leading spaces check box is selected, leading blank spaces are not (erroneously) interpreted as field separators, and in this example the 4 (rows) by 3 (columns) data matrix is imported properly. |
File contents | Displays the contents of the text file to import. |
OK | Accepts the options selected and imports the text file. |
Cancel | Closes the dialog box without importing the text file. |
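How the Separators, Treat multiple separators as MD, Use quotation marks as text boundaries, and Trim leading spaces options interact can be illustrated with a minimal Python sketch. It is not Statistica's implementation: the function `split_fields`, the constants `MD`, `BLANK_CHARS`, and `STANDARD_SET`, and the handling of edge cases (a trailing separator, an apostrophe inside unquoted text) are assumptions made for illustration. The sketch splits one line into slots at each separator; with the MD option on, every empty slot (including the one before a leading separator) becomes the missing-data value, and with it off, empty slots are dropped.

```python
MD = -999999998  # default missing-data value quoted in the table above

# Rough single-line equivalents of two Basic sets (line breaks are omitted
# because this hypothetical sketch receives one line at a time).
BLANK_CHARS = " \t\f"
STANDARD_SET = ",; \t"


def split_fields(line, separators, multiple_as_md=False,
                 quotes_as_boundaries=False, trim_leading_spaces=False):
    """Hypothetical sketch: split one text line into field strings."""
    if trim_leading_spaces:
        line = line.lstrip(" ")
    slots, current, quote = [], [], None
    for ch in line:
        if quote is not None:             # inside a quoted text value
            if ch == quote:
                quote = None              # closing quote is a boundary only
            else:
                current.append(ch)        # separators here stay in the text
        elif quotes_as_boundaries and ch in "\"'":
            quote = ch                    # opening quote is not kept either
        elif ch in separators:
            slots.append("".join(current))
            current = []
        else:
            current.append(ch)
    slots.append("".join(current))

    fields = []
    for slot in slots:
        if slot:
            fields.append(slot)
        elif multiple_as_md:
            fields.append(MD)             # empty slot becomes missing data
        # With the option cleared, empty slots are dropped, so a run of
        # separators acts as a single separator.
    return fields


print(split_fields("1  2", BLANK_CHARS, multiple_as_md=True))  # ['1', -999999998, '2']
print(split_fields("1  2", BLANK_CHARS))                       # ['1', '2']
```

The two calls show why clearing the MD option matters for space-separated files: the same double space yields a missing value in one case and is simply absorbed in the other.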
Column Type Specifiers
- A - Text (Alphanumeric)
- F - Float (also R - Real)
- D - Double Float (also DR - Double Real)
- I - Integer (also NI - Normal Integer) - values ± 32767
- S - Short Integer (also SI) - values ± 127
- LI - Long Integer (also J) - values ± 2,140,000,000
- L - Logical
A field specified as Logical is expected to contain text designators for true and false. Statistica recognizes the following three conventions when the data are imported:
- TRUE or FALSE if the field length is 5 or more.
- YES or NO if it is 3 or 4 characters long.
- Y or N, or T or F, if it is 1 character long.
Values of TRUE or YES are imported as 1; values of FALSE or NO are imported as 0.
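The three conventions amount to a lookup keyed on the field length, sketched here in Python. The function name `parse_logical`, case-insensitive matching, mapping Y and T to 1 and N and F to 0, and returning `None` for unrecognized text are assumptions; the text above states only that TRUE or YES import as 1 and FALSE or NO as 0. `field_length` is taken to be the declared length of the field rather than the length of the value, since TRUE itself is only four characters long.

```python
def parse_logical(text, field_length):
    """Hypothetical sketch: convert a Logical (L) field to 1, 0, or None."""
    value = text.strip().upper()          # assumption: case and padding ignored
    if field_length >= 5:
        words = {"TRUE": 1, "FALSE": 0}
    elif field_length >= 3:
        words = {"YES": 1, "NO": 0}
    elif field_length == 1:
        # Assumption: Y and T mean true, N and F mean false.
        words = {"Y": 1, "N": 0, "T": 1, "F": 0}
    else:
        words = {}                        # 2-character fields: no rule given
    return words.get(value)               # None: value not recognized


print(parse_logical("FALSE", 5))  # 0
print(parse_logical("YES", 3))    # 1
```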
Text fields (type A) must also include a length value from 1 to 255 immediately following the letter A, indicating the maximum possible length of text in this field.
The slash character (/) indicates that the remainder of the current line in the input file should be ignored (that is, the import skips to the next line of the input file).
If a multiplier precedes a list of formats enclosed in parentheses, the whole list within the parentheses is repeated the number of times specified by the multiplier.
Example: 2(2L a5) is equivalent to L L A5 L L A5 and specifies two Logical variables followed by a Text variable that can hold up to five characters, then two more Logical variables and a final Text variable (again up to five characters).
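To make the multiplier, parenthesis, slash, and text-length rules concrete, here is a minimal Python sketch that expands a format statement into a flat list of column types. The function names (`tokenize`, `parse_item`, `expand`, `expand_format`), the requirement that items be separated by whitespace or parentheses, case-insensitive type codes, and treating a multiplied slash as repeated line skips are assumptions; Statistica's own parser may accept or reject different input. Aliases such as R or J are validated but kept as written.

```python
import re

# Type codes from the list above; aliases (R, DR, NI, SI, J) are kept as written.
VALID_CODES = {"A", "F", "R", "D", "DR", "I", "NI", "S", "SI", "LI", "J", "L"}
# A token is a multiplier, a parenthesis, a slash, or a code with optional length.
TOKEN = re.compile(r"\s*(\d+|\(|\)|/|[A-Za-z]+\d*)")


def tokenize(stmt):
    stmt = stmt.strip()
    tokens, pos = [], 0
    while pos < len(stmt):
        m = TOKEN.match(stmt, pos)
        if not m:
            raise ValueError(f"unexpected character {stmt[pos]!r} at {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_item(tok):
    """Validate one column type such as F, LI, or A20 (case-insensitive)."""
    m = re.fullmatch(r"([A-Za-z]+)(\d*)", tok)
    if not m or m.group(1).upper() not in VALID_CODES:
        raise ValueError(f"unknown column type {tok!r}")
    code, length = m.group(1).upper(), m.group(2)
    if code == "A":
        # Text fields need a maximum length from 1 to 255, e.g. A20.
        if not length or not 1 <= int(length) <= 255:
            raise ValueError(f"text field needs a length of 1-255: {tok!r}")
        return f"A{int(length)}"
    if length:
        raise ValueError(f"only A takes a length: {tok!r}")
    return code


def expand(tokens, i=0, depth=0):
    """Expand tokens[i:] until a closing ')' (or the end) into a flat list."""
    out = []
    while i < len(tokens):
        tok = tokens[i]
        if tok == ")":
            if depth == 0:
                raise ValueError("unmatched ')'")
            return out, i + 1
        count = 1
        if tok.isdigit():                 # optional multiplier n in nX
            count = int(tok)
            i += 1
            if i == len(tokens):
                raise ValueError("multiplier without a format")
            tok = tokens[i]
        if tok == "(":                    # n(...) repeats the whole group
            group, i = expand(tokens, i + 1, depth + 1)
            out.extend(group * count)
        elif tok == "/":                  # skip to the next input line
            out.extend(["/"] * count)
            i += 1
        else:
            out.extend([parse_item(tok)] * count)
            i += 1
    if depth > 0:
        raise ValueError("missing ')'")
    return out, i


def expand_format(stmt):
    """Return the flat list of column types described by a format statement."""
    specs, _ = expand(tokenize(stmt))
    return specs


print(expand_format("2(2L a5)"))    # ['L', 'L', 'A5', 'L', 'L', 'A5']
print(expand_format("A20 3F / I"))  # ['A20', 'F', 'F', 'F', '/', 'I']
```

Expanding the statement up front, rather than interpreting it field by field, makes it easy to check that the statement describes the expected number of fields before any data are read.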