This sample demonstrates the use of the TIBCO StreamBase® CSV File Reader Adapter for Apache Hadoop Distributed File System (HDFS).
The provided StreamBase module reads store sales data from CSV input files and computes total sales for storeID-productID combinations. The HDFS CSV File Reader adapter instance named ReadFile1 reads from file1.csv, which contains sales data for storeA, while ReadFile2 reads from file2.csv, which contains sales data for storeX.
Both files contain records with missing productID values. ReadFile1 is configured to process such records, inserting null values for the unspecified productID fields. By contrast, ReadFile2 is configured to discard those records.
When you run the sample, look for output that shows the current total sales for each row read. Since ReadFile2 is configured to discard incomplete records, no output is emitted for storeX with null productID values.
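The behavior described above can be approximated outside StreamBase with a minimal Python sketch. It is not the sample's EventFlow logic, only an illustration of the two handling modes and the running totals. The three-column layout (storeID, productID, sales amount), the sample rows, and the function names are assumptions made for illustration; the real file1.csv and file2.csv may be laid out differently, and the real adapters read concurrently rather than one file after the other.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real file1.csv and file2.csv layouts may differ.
FILE1 = "storeA,p1,10.0\nstoreA,,5.0\nstoreA,p1,2.5\n"
FILE2 = "storeX,p1,4.0\nstoreX,,7.0\nstoreX,p2,1.0\n"


def read_rows(text, discard_incomplete):
    """Yield (storeID, productID, amount); productID is None when the field is empty."""
    for store_id, product_id, amount in csv.reader(io.StringIO(text)):
        if product_id == "":
            if discard_incomplete:
                # Mirrors ReadFile2: the record is dropped before aggregation.
                continue
            # Mirrors ReadFile1: keep the record with a null productID.
            product_id = None
        yield store_id, product_id, float(amount)


def running_totals(rows):
    """Return one (storeID, productID, total_so_far) entry per input row."""
    totals = defaultdict(float)
    emitted = []
    for store_id, product_id, amount in rows:
        key = (store_id, product_id)
        totals[key] += amount
        emitted.append((store_id, product_id, totals[key]))
    return emitted


if __name__ == "__main__":
    rows = list(read_rows(FILE1, discard_incomplete=False))
    rows += read_rows(FILE2, discard_incomplete=True)
    for entry in running_totals(rows):
        print(entry)
```

With these hypothetical rows, the storeA record with the empty productID contributes to a total keyed on a null productID, while the corresponding storeX record never appears in the output.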
The following files must be placed on your HDFS file system before the sample can run:
- file1.csv
- file2.csv
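One way to copy the two files is with the Hadoop `hdfs dfs -put` command. The Python sketch below only assembles and prints those commands by default. It assumes a Hadoop client with the `hdfs` command on your PATH and that both CSV files are in the current directory. The destination directory /user/sbuser/samples and the function names are hypothetical, so substitute values that match your cluster.

```python
import shlex
import subprocess


def build_put_command(local_path, hdfs_dir):
    """Build an `hdfs dfs -put` command that copies one local file into an HDFS directory."""
    # -f overwrites a file of the same name if it already exists at the destination.
    return ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir]


def upload(local_paths, hdfs_dir, dry_run=True):
    """Print each command, and run it only when dry_run is False."""
    commands = [build_put_command(path, hdfs_dir) for path in local_paths]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # Raises CalledProcessError if the hdfs command exits with a non-zero status.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # /user/sbuser/samples is a hypothetical destination directory.
    upload(["file1.csv", "file2.csv"], "/user/sbuser/samples")
```

Running the script as written performs a dry run. Pass `dry_run=False` to execute the commands once the printed paths look right.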
You must also open the csvreader-sample.sbapp file, select the Parameters tab, and edit the HDFS_FILE_PATH1, HDFS_FILE_PATH2, and HDFS_USER values to match your HDFS setup so that the sample can access the required files. The .sbapp file is located in → →
In StreamBase Studio, import this sample with the following steps:
- From the top-level menu, select → .
- Type csv to narrow the list of options.
- Select hdfscsvreader from the Large Data Storage and Analysis category.
- Click .
  StreamBase Studio creates a project for this sample.
- In the Project Explorer, open the sample you just loaded.
- Open the src/main/eventflow folder.
- Open the package folder. Most samples contain a single package folder; if your sample contains more than one, open the top-level package folder.
- Open the named application file and click the Run button. This opens the SB Test/Debug perspective and starts the application.
  If you see red marks, wait a moment for the project in Studio to load its features.
  If the red marks do not resolve themselves in a moment, select the project and right-click → from the context menu.
- The two CSV reader adapters start reading from their input files and sending tuples downstream. Look for computed total sales values in the Output Streams view, and watch for line rejection warnings from ReadFile2 in the Console view.
- When done, press F9 or click the Stop Running Application button.
When you load the sample into StreamBase Studio, Studio copies the sample project's files to your Studio workspace, which is normally part of your home directory, with full access rights.
Important
Load this sample in StreamBase Studio, and thereafter use the Studio workspace copy of the sample to run and test it, even when running from the command prompt.
Using the workspace copy of the sample avoids permission problems. The default workspace location for this sample is:
studio-workspace/sample_adapter_embedded_hdfscsvreader
See Default Installation Directories for the default location of studio-workspace on your system.