Data Directory Sample

About This Sample

This sample shows how to use the dataAreaPath configuration property in a HOCON configuration file, and how to call the Operator.getDataDirectory() method in the StreamBase Java Client API.

This sample's src/main/configurations folder contains the sbengine.conf file, whose purpose is to define a custom location for the data directory used by all StreamBase fragments that run from the same StreamBase engine. See the StreamBase Configuration Guide for more information regarding how to use configuration files.

The default value for the dataAreaPath property in the sbengine.conf file is a folder named engine-data-area. This folder is placed in the node directory for the node that runs this EventFlow fragment. When nodes are installed and run in StreamBase Studio, node directories are placed in the .nodes folder of the current Studio workspace. The default node directory name is nodename.clustername, where the node name is constructed from a pattern defined in StreamBase Preferences>Cluster or in a Run Configuration.

The default cluster name is your system login name, which you can also change in Studio's Preferences. The default data directory is further placed in a folder named fragments, and then in a folder named for the fully qualified name of the current EventFlow module, with periods replaced by underscores. Without a dataAreaPath name-value pair in a configuration file, this sample would write its data files into the .nodes/sample_javaoperator_datadir.sbuser/fragments/com_tibco_sb_sample_javaoperator_datadir_datadir0/engine-data-area/ folder of the current Studio workspace.

The datadir.sbapp EventFlow module uses two custom Java operators:

  • DataDirectoryFileWriter writes files as specified on the module's FileIn input stream into the data directory specified in the sbengine.conf file.

  • DataDirectoryObserver reads and lists the files that exist in the specified data directory.
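
As a rough illustration of what the writer operator does, the following plain-Java sketch writes a named file into a data directory and reports how many files the directory then holds. It is not the sample's operator source: the class and method names are hypothetical, and the directory is passed in as a parameter, whereas a real operator would obtain it from Operator.getDataDirectory(). The DDFWOperator_ prefix is the one this sample prepends to each file name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical stand-in for the file-writing logic; not the sample's source.
final class DataDirWriterSketch {

    // Prefix that this sample's writer operator prepends to each file name.
    static final String PREFIX = "DDFWOperator_";

    // Writes fileContent to <dataDir>/DDFWOperator_<fileName> and returns the
    // number of regular files now present in dataDir.
    // Assumption: in a real operator, dataDir would come from
    // Operator.getDataDirectory() rather than from a parameter.
    static long writeFile(Path dataDir, String fileName, String fileContent) {
        try {
            Files.createDirectories(dataDir);
            Path target = dataDir.resolve(PREFIX + fileName);
            Files.write(target, fileContent.getBytes(StandardCharsets.UTF_8));
            try (Stream<Path> entries = Files.list(dataDir)) {
                return entries.filter(Files::isRegularFile).count();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary folder stands in for the configured data directory.
        Path dataDir = Files.createTempDirectory("myDataDir");
        System.out.println(writeFile(dataDir, "a.txt", "hello"));
        System.out.println(writeFile(dataDir, "b.txt", "world"));
    }
}
```

Passing the directory as a parameter keeps the sketch runnable without the StreamBase libraries on the classpath.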

Specifying Data Directory Location

You can specify the dataAreaPath name-value pair with either a relative or absolute path. Relative paths are relative to the default location of engine-data-area as described above. If your configuration file specifies:

dataAreaPath = "myDataDir" 

then your files are stored in the same long path shown above, with myDataDir replacing the name engine-data-area.

If you specify: dataAreaPath = "../myDataDir" then the myDataDir folder is placed as a subfolder under fragments.

If you specify: dataAreaPath = "../../myDataDir" then myDataDir is placed at the root of your node directory, and so on.
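
The relative-path rules above can be checked with a short plain-Java sketch. It only imitates the resolution with java.nio.file and makes no claim about how StreamBase implements it. The class and method names are hypothetical, and the base folder is the fragment folder from the default path quoted earlier, expressed relative to the Studio workspace.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper that imitates how a relative dataAreaPath is resolved.
final class DataAreaPathSketch {

    // Folder that holds engine-data-area by default, relative to the workspace.
    static final Path FRAGMENT_DIR = Paths.get(
        ".nodes",
        "sample_javaoperator_datadir.sbuser",
        "fragments",
        "com_tibco_sb_sample_javaoperator_datadir_datadir0");

    // Resolves a dataAreaPath value against the fragment folder and removes
    // the ".." segments. Forward slashes are used for display on all platforms.
    static String resolve(String dataAreaPath) {
        Path resolved = FRAGMENT_DIR.resolve(dataAreaPath).normalize();
        return resolved.toString().replace('\\', '/');
    }

    public static void main(String[] args) {
        String[] values = {"myDataDir", "../myDataDir", "../../myDataDir"};
        for (String value : values) {
            System.out.println(value + " -> " + resolve(value));
        }
    }
}
```

Each additional ../ segment moves the data directory one folder closer to the workspace root: first into fragments, then into the node directory.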

This sample's configuration file shows:

dataAreaPath = "../../../../sample_javaoperator-datadir/myDataDir"

which places myDataDir at the root of this Studio project if it was loaded with the Import Samples and Community Content dialog. You may need to use Right-click>Refresh Project on this sample's project folder to see the newly created folder.

Use the following variation if you copied this sample's folder to another location or if you are running outside of Studio:

dataAreaPath = "../../../../javaoperator-datadir/myDataDir"

If you run this sample as part of a StreamBase application with the epadmin command on the command line, you specify a node directory name and location with the nodedirectory= option. If unspecified, the current directory of the epadmin command is used. For example, if you specify nodedirectory=~/tmp/nodedirs and start a node named A.sbuser, the default location for the data directory becomes:

~/tmp/nodedirs/A.sbuser/application/engines/default-engine-for-com.tibco.sb.sample.javaoperator-datadir/engine-data-area/

Finally, you can specify an absolute path such as dataAreaPath = "/tmp/myDataDir" or dataAreaPath = "C:/tmp/myDataDir".

However, any absolute path is likely to be machine-specific or operating system-specific, which limits the ability to run your EventFlow fragment in multiple nodes for High Availability purposes. If you need to use an absolute path, consider using a substitution variable with a default value, so that the data directory path can be specified at deployment time. Substitution variables are described in the StreamBase Configuration Guide.

For example:

dataAreaPath = "${DATADIR:-/tmp/myDataDir}"
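
To make the effect of the default value concrete, the following plain-Java sketch imitates the ${NAME:-default} form used above: a supplied value wins, and the default applies otherwise. It is an illustration of the semantics only. The real substitution is performed by StreamBase when it loads the configuration, and the class name, method name, and the map of supplied values are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical imitation of substitution variables with default values.
final class SubstitutionDefaultSketch {

    // Matches ${NAME} or ${NAME:-default}.
    private static final Pattern VAR =
        Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\\}");

    // Replaces each variable with its supplied value, or with its default
    // when no value is supplied. Variables with neither are left unchanged.
    static String expand(String value, Map<String, String> supplied) {
        Matcher m = VAR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String fallback = m.group(2);
            String replacement;
            if (supplied.containsKey(name)) {
                replacement = supplied.get(name);
            } else if (fallback != null) {
                replacement = fallback;
            } else {
                replacement = m.group();
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String setting = "${DATADIR:-/tmp/myDataDir}";
        // No value supplied: the default path is used.
        System.out.println(expand(setting, Collections.<String, String>emptyMap()));
        // Value supplied at deployment time: it overrides the default.
        System.out.println(expand(setting, Collections.singletonMap("DATADIR", "/var/data")));
    }
}
```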

Importing This Sample into StreamBase Studio

In StreamBase Studio, import this sample with the following steps:

  • From the top-level menu, select File>Import Samples and Community Content.

  • Enter data dir to narrow the list of options.

  • Select Specifying a custom data directory from the Extending StreamBase category.

  • Click OK.

StreamBase Studio creates a project for the sample.

Running This Sample in StreamBase Studio

To run this sample:

  1. In the Project Explorer view, open the sample you just loaded.

    If you see red marks on a project folder, wait a moment for the project to load its features.

    If the red marks do not resolve themselves after a minute, select the project, right-click, and select Maven>Update Project from the context menu.

  2. Open the src/main/eventflow/packageName folder.

  3. Open the datadir.sbapp file and click the Run button. This opens the SB Test/Debug perspective and starts the module.

  4. Select the Manual Input view. Select FileIn from the Input Stream list. Use FileIn to inject tuples that have two string fields, fileName and fileContent.

    The DataDirectoryFileWriter operator prepends DDFWOperator_ to the file name you specify, then writes a file with that name and the specified contents to the data directory. The operator outputs a tuple containing the number of files in the data directory and the most recent write in the data directory.

  5. Input a few file names, clicking Send Data after each one.

  6. Select the Trigger input stream and send a tuple with no fields, which forces the DataDirectoryObserver operator to read the data directory.

    This operator then emits the number of files in the data directory as well as a list of tuples. Each tuple contains a filename and either the entire first line or the first 50 characters of the corresponding file, whichever is shorter.

  7. When done, press F9 or click the Terminate EventFlow Fragment button.
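
The preview rule from step 6 (the first line or the first 50 characters, whichever is shorter) can be sketched in plain Java as follows. This is an illustrative approximation, not the DataDirectoryObserver source: the class and method names are hypothetical, files are assumed to be UTF-8 text, and the directory is passed in as a parameter where a real operator would obtain it from Operator.getDataDirectory().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the directory-listing logic; not the sample's source.
final class DataDirObserverSketch {

    static final int MAX_PREVIEW = 50;

    // Returns the first line of content, cut to 50 characters if it is longer.
    static String preview(String content) {
        int newline = content.indexOf('\n');
        String firstLine = newline >= 0 ? content.substring(0, newline) : content;
        if (firstLine.endsWith("\r")) {
            firstLine = firstLine.substring(0, firstLine.length() - 1);
        }
        return firstLine.length() <= MAX_PREVIEW
            ? firstLine
            : firstLine.substring(0, MAX_PREVIEW);
    }

    // Maps each regular file name in dataDir to its preview, sorted by name.
    static Map<String, String> observe(Path dataDir) {
        try {
            List<Path> files;
            try (Stream<Path> entries = Files.list(dataDir)) {
                files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
            }
            Map<String, String> result = new TreeMap<>();
            for (Path file : files) {
                String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
                result.put(file.getFileName().toString(), preview(content));
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary folder stands in for the configured data directory.
        Path dataDir = Files.createTempDirectory("myDataDir");
        Files.write(dataDir.resolve("DDFWOperator_a.txt"),
            "first line\nsecond line".getBytes(StandardCharsets.UTF_8));
        Map<String, String> listing = observe(dataDir);
        System.out.println(listing.size() + " file(s): " + listing);
    }
}
```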

Sample Location

When you load the sample into StreamBase Studio, Studio copies the sample project's files to your Studio workspace, which is normally part of your home directory, with full access rights.

Important

Load this sample in StreamBase Studio, and thereafter use the Studio workspace copy of the sample to run and test it, even when running from the command prompt.

Using the workspace copy of the sample avoids permission problems. The default workspace location for this sample is:

studio-workspace/sample_javaoperator-datadir

See Default Installation Directories for the default location of studio-workspace on your system.