Spreadsheet

The Spreadsheet object provides the interface for a Statistica Spreadsheet.

This object offers basic data operations, advanced data management functionality (e.g., stacking/unstacking), and exporting (importing is available at the Application level).

The spreadsheet consists of five separate areas that are accessed separately. The first is the data area, where the spreadsheet data are stored. There are the variable headers, located immediately on top of the data area, and lists the variable names. The case headers, located immediately to the left of the data area, contain the case names and formatting. The top left of the spreadsheet contains the Info box, and is used to contain summary information of what the spreadsheet contains. The top of the spreadsheet, immediately above the variable headers, is the header area, and can contain multiple lines of header information about the spreadsheet.

The spreadsheet data area consists of variables (columns) and cases (rows) of data. Note that all references to variables and cases within a spreadsheet are always indexed starting with 1 (e.g., cell (1,1) is the top-left cell in the spreadsheet).

Many spreadsheet calls specify a cell; these include calls to get or set a value, and to check if a cell is missing data. Properties such as these include Value, Text, and TextEx.

There are also calls that return or expect a range of cells; these are represented by a Range object, which references a subset of contiguous cells.

Data Operations

The spreadsheet interface offers numerous data management features, including (but not limited to):

  • Case filtering
  • Case selection
  • Database querying
  • Duplicate data removal
  • Exporting (e.g., SAS, text, Excel)
  • Merging (by case or variable)
  • Missing data replacement
  • Recoding
  • Sorting
  • Subsetting
  • Stacking/unstacking
  • Transformation
  • Variable formulas

Statistica OLE DB Provider

An OLE DB provider for Statistica Spreadsheets is also available, which can be used to query data from a spreadsheet via any OLE DB compatible application.

The following SQL features are available for the Statistica OLE DB provider:

SELECT/FROM: Used for querying a spreadsheet's columns. For example: SELECT Bakers.Name, Bakeries.Name FROM Bakers, Bakeries. Bakers and Bakeries would be the names of the Statistica spreadsheet files, and Name would be the columns in these files to query. Columns can be specified by name and by variable reference (e.g., "Bakers".V1 will return column 1 from the "Bakers" table).

Aliasing is supported for returned columns. For example: SELECT Bakers.Name AS BAKER_NAME.

The table qualifiers can be omitted if not specified (if only one spreadsheet is referenced in the query statement).

Blanks can be used in names by specifying standard quoting.

The following row retrieval/aggregation functions are supported: DISTINCT, AVG, MAX, MIN, SUM, COUNT, and TOP.

WHERE: Used for specifying criteria for which rows to return. The following conditions are available for WHERE clauses: =, <, >, <=, >=, IN, LIKE, IS NULL, BETWEEN, NOT IN, NOT LIKE, IS NOT NULL, NOT BETWEEN.

JOIN: Used for joining multiple spreadsheets. ID columns are specified for each table, and matching values between these columns determines how the multi-table data are merged. Only equal joins are supported. For example: SELECT TABLE1."Weight (1998)" FROM "Cat Clinic" TABLE1 JOIN "Cat Clinic Measurements (2000)" TABLE2 ON (TABLE1."Patient ID" = TABLE2."ID").

ORDER BY: Used for sorting retrieved rows. The sorted column can be referenced by either name or column number. The sort order can be toggled with the ASC and DESC keywords.

GROUP BY: Used for grouping aggregated results.

CASESTATE: Used in WHERE clauses to filter on casestates within the dataset.

The following syntax is allowed for CASESTATE in a WHERE clause:
CASESTATE_casestatevalue
     CASESTATE(casestatevalue)
     CASESTATE = casestatevalue

Values that can be used for CASESTATE are: selected, excluded, label, marked, hidden. These values are not case sensitive.

The following SQL statements can be used (DataSet will be the name of your dataset):
SELECT * FROM DataSet WHERE CASESTATE_selected
     SELECT * FROM DataSet WHERE CASESTATE_excluded
     SELECT * FROM DataSet WHERE CASESTATE_marked
     SELECT * FROM DataSet WHERE CASESTATE_hidden
     SELECT * FROM DataSet WHERE CASESTATE_label
     SELECT * FROM DataSet WHERE CASESTATE(selected)
     SELECT * FROM DataSet WHERE CASESTATE(excluded)
     SELECT * FROM DataSet WHERE CASESTATE(marked)
     SELECT * FROM DataSet WHERE CASESTATE(hidden)
     SELECT * FROM DataSet WHERE CASESTATE(label)
     SELECT * FROM DataSet WHERE CASESTATE = selected
     SELECT * FROM DataSet WHERE CASESTATE = excluded
     SELECT * FROM DataSet WHERE CASESTATE = marked
     SELECT * FROM DataSet WHERE CASESTATE = hidden
     SELECT * FROM DataSet WHERE CASESTATE = label
     SELECT * FROM DataSet WHERE CASESTATE = label OR casestate = marked
     SELECT * FROM DataSet WHERE CASESTATE = hidden OR casestate = excluded