HDFS CSV File Reader Input Adapter Sample

About This Sample

This sample demonstrates the use of the Spotfire Streaming CSV File Reader Adapter for the Apache Hadoop Distributed File System (HDFS).

The provided StreamBase module reads store sales data from CSV input files and computes the total sales for each storeID-productID combination. The HDFS CSV File Reader adapter instance named ReadFile1 reads from file1.csv, which contains sales data for storeA, while ReadFile2 reads from file2.csv, which contains sales data for storeX.
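The per-row totals can be pictured with a small Python sketch. It is not the sample's EventFlow logic, only an illustration under assumptions: each record is taken to be a (storeID, productID, amount) tuple, the rows are made up, and the function name running_totals is hypothetical. Because the sample's output shows the current total after each row is read, the sketch yields the updated total for the record's key instead of a single final sum.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Iterator, Optional, Tuple

# Hypothetical record layout: (storeID, productID, sale amount).
# productID may be None, mirroring records that ReadFile1 fills with null.
Record = Tuple[str, Optional[str], float]
Key = Tuple[str, Optional[str]]


def running_totals(records: Iterable[Record]) -> Iterator[Record]:
    """For each input record, yield its key together with the updated total for that key."""
    totals: DefaultDict[Key, float] = defaultdict(float)
    for store_id, product_id, amount in records:
        key = (store_id, product_id)
        totals[key] += amount
        yield store_id, product_id, totals[key]


if __name__ == "__main__":
    # Made-up rows for illustration only.
    sample = [
        ("storeA", "p1", 10.0),
        ("storeA", "p2", 5.0),
        ("storeA", "p1", 2.5),
        ("storeA", None, 4.0),
    ]
    for row in running_totals(sample):
        print(row)
```

In this sketch a None productID simply forms its own group key, so null-filled rows accumulate separately from rows that carry a productID.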

Both files contain records with missing productID values. ReadFile1 is configured to process such records, inserting null values for the unspecified productID fields. By contrast, ReadFile2 is configured to discard those records.
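The two policies can be sketched in Python as follows. The column order (storeID, productID, amount), the absence of a header row, the treatment of an empty field as a missing productID, and the name parse_sales are all assumptions for illustration; the real adapters are configured in StreamBase Studio, not through code like this.

```python
import csv
import io
import logging
from typing import Iterator, Optional, Tuple

log = logging.getLogger("csv_policy_sketch")

# Hypothetical record layout: (storeID, productID, sale amount).
Record = Tuple[str, Optional[str], float]


def parse_sales(text: str, discard_incomplete: bool) -> Iterator[Record]:
    """Parse 'storeID,productID,amount' lines (assumed layout, no header).

    An empty productID field is treated as missing. With
    discard_incomplete=False it becomes None (ReadFile1-style); with
    discard_incomplete=True the row is dropped and a warning is logged
    (ReadFile2-style).
    """
    reader = csv.reader(io.StringIO(text))
    for row in reader:
        if not row:
            continue  # ignore blank lines
        store_id, raw_product_id, amount = row
        product_id: Optional[str] = raw_product_id or None  # "" -> None
        if product_id is None and discard_incomplete:
            log.warning("discarding line %d: missing productID", reader.line_num)
            continue
        yield store_id, product_id, float(amount)


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    data = "storeX,p9,3.0\nstoreX,,7.5\n"  # made-up input
    print(list(parse_sales(data, discard_incomplete=False)))  # ReadFile1-style
    print(list(parse_sales(data, discard_incomplete=True)))   # ReadFile2-style
```

With this made-up input, the first row passes through unchanged under both settings, while the second row becomes ("storeX", None, 7.5) in ReadFile1 style and is dropped with a logged warning in ReadFile2 style.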

When you run the sample, look for output that shows the current total sales after each row is read. Because ReadFile2 is configured to discard incomplete records, no output is emitted for the storeX records that lack a productID value.

Initial Setup

The following files must be placed in HDFS before the sample can run:

  • file1.csv

  • file2.csv
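If the Hadoop command-line client is installed and configured for your cluster, the two files can be uploaded with the standard hdfs dfs -mkdir -p and hdfs dfs -put commands. The Python sketch below only builds and prints those commands unless dry_run is set to False; the target directory /user/sbuser/hdfscsvreader is a hypothetical placeholder, the helper names are illustrative, and the files are assumed to be in the current directory.

```python
import shlex
import subprocess
from typing import List, Sequence


def hdfs_upload_commands(local_files: Sequence[str], hdfs_dir: str) -> List[List[str]]:
    """Build the `hdfs dfs` commands that create hdfs_dir and copy each file into it."""
    commands = [["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]]
    for path in local_files:
        # -f overwrites an existing copy, so the upload can be re-run safely.
        commands.append(["hdfs", "dfs", "-put", "-f", path, hdfs_dir])
    return commands


def upload(local_files: Sequence[str], hdfs_dir: str, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them with the hdfs CLI."""
    for cmd in hdfs_upload_commands(local_files, hdfs_dir):
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical target directory; substitute your own HDFS location.
    upload(["file1.csv", "file2.csv"], "/user/sbuser/hdfscsvreader")
```

The resulting HDFS paths (here, /user/sbuser/hdfscsvreader/file1.csv and /user/sbuser/hdfscsvreader/file2.csv) are presumably what the HDFS_FILE_PATH1 and HDFS_FILE_PATH2 parameters should then point to.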

You must also open the csvreader-sample.sbapp file in the src/main/eventflow/packageName folder. Select the Parameters tab and edit the HDFS_FILE_PATH1, HDFS_FILE_PATH2, and HDFS_USER values to match your HDFS setup, so that the module can access the required files.

Importing This Sample into StreamBase Studio

In StreamBase Studio, import this sample with the following steps:

  • From the top-level menu, select File>Import Samples and Community Content.

  • Enter csv to narrow the list of options.

  • Select HDFS CSV file input adapter from the Large Data Storage and Analysis category.

  • Click Import Now.

StreamBase Studio creates a project for this sample.

Running This Sample in StreamBase Studio

  1. In the Project Explorer view, open the sample you just loaded.

    If you see red marks on a project folder, wait a moment for the project to load its features.

    If the red marks do not resolve themselves after a minute, select the project, right-click, and select Maven>Update Project from the context menu.

  2. Open the src/main/eventflow/packageName folder.

  3. Open the csvreader-sample.sbapp file and click the Run button. This opens the SB Test/Debug perspective and starts the module.

  4. The two CSV reader adapters start reading from their input files and sending tuples downstream. Look for the computed total sales values in the Output Streams view, and watch for line rejection warnings from ReadFile2 in the Console view.

  5. When done, press F9 or click the Terminate EventFlow Fragment button.

Sample Location

When you load the sample into StreamBase® Studio, Studio copies the sample project's files to your Studio workspace, which is normally part of your home directory, with full access rights.

Important

Load this sample in StreamBase® Studio, and thereafter use the Studio workspace copy of the sample to run and test it, even when running from the command prompt.

Using the workspace copy of the sample avoids permission problems. The default workspace location for this sample is:

studio-workspace/sample_adapter_embedded_hdfscsvreader

See Default Installation Directories for the default location of studio-workspace on your system.