This sample demonstrates the use of the TIBCO StreamBase® CSV File Reader for Apache Hadoop Distributed File System (HDFS).
The provided StreamBase module reads store sales data from CSV input files and computes total sales for storeID-productID combinations. The HDFS CSV File Reader adapter instance named ReadFile1 reads from file1.csv, which contains sales data for storeA, while ReadFile2 reads from file2.csv, containing sales data for storeX. Both files contain records with missing productID values. ReadFile1 is configured to process such records, inserting null values for the unspecified productID fields. By contrast, ReadFile2 is configured to discard those records.
When you run the sample, look for output that shows the current total sales for each row read. Since ReadFile2 is configured to discard incomplete records, no output is emitted for storeX with null productID values.
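The difference between the two adapter configurations can be illustrated outside StreamBase with a minimal Python sketch. It is not the adapter's implementation: the three-column layout (storeID, productID, sales), the sample rows, and the running_totals helper are assumptions made for illustration only, since the actual contents of file1.csv and file2.csv are not shown here.

```python
import csv
import io
from collections import defaultdict


def running_totals(csv_text, discard_incomplete):
    """Yield (storeID, productID, total) after each accepted row.

    Hypothetical layout: storeID,productID,sales with no header row.
    An empty productID field is treated as a missing value.
    """
    totals = defaultdict(float)
    for store_id, product_id, sales in csv.reader(io.StringIO(csv_text)):
        if product_id == "":
            if discard_incomplete:
                # ReadFile2-style behavior: skip the row entirely.
                continue
            # ReadFile1-style behavior: keep the row with a null productID.
            product_id = None
        key = (store_id, product_id)
        totals[key] += float(sales)
        yield store_id, product_id, totals[key]


# Invented sample data, not the contents of the real files.
file1 = "storeA,p1,10.0\nstoreA,,5.0\nstoreA,p1,2.5\n"
file2 = "storeX,p1,4.0\nstoreX,,9.0\n"

results1 = list(running_totals(file1, discard_incomplete=False))
results2 = list(running_totals(file2, discard_incomplete=True))
for row in results1 + results2:
    print(row)
```

In this sketch the storeA row with no productID still contributes a total under a null key, while the corresponding storeX row produces no output at all.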
The two files used in this sample, file1.csv and file2.csv, must be placed on your HDFS file system before the sample can run. You must also open csvreader-sample.sbapp, select the Parameters tab, and edit the HDFS_FILE_PATH1, HDFS_FILE_PATH2, and HDFS_USER values to match your HDFS setup so that the sample can access the required files.
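One way to stage the two files is with the standard hdfs dfs command-line client. The sketch below only builds the commands and prints them, running them on request. The target directory /user/sbuser/csvreader is a hypothetical placeholder, and the stage_commands and run helpers are illustrative, not part of the sample.

```python
import subprocess


def stage_commands(local_files, hdfs_dir):
    """Build the hdfs dfs commands that copy local files into hdfs_dir."""
    commands = [["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]]
    for path in local_files:
        commands.append(["hdfs", "dfs", "-put", path, hdfs_dir])
    return commands


def run(commands):
    """Execute each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Hypothetical target directory; substitute a path valid for your cluster.
commands = stage_commands(["file1.csv", "file2.csv"], "/user/sbuser/csvreader")
for cmd in commands:
    print(" ".join(cmd))
# Call run(commands) on a machine where the hdfs client is configured.
```

The locations the files end up in are what HDFS_FILE_PATH1 and HDFS_FILE_PATH2 would then need to refer to; the exact value format depends on your HDFS setup.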
- In the Package Explorer view, double-click to open the csvreader-sample.sbapp application. Make sure the application is the currently active tab in the EventFlow Editor.
- Click the Run button. This opens the SB Test/Debug perspective and starts the application.
- The two CSV reader adapters start reading from their input files and sending tuples downstream. Look for total sales computed values in the Application Output view, and watch for line rejection warnings from ReadFile2 in the Console view.
- When done, press F9 or click the Stop Running Application button.
This section describes how to run the sample in UNIX terminal windows or Windows command prompt windows. On Windows, be sure to use the StreamBase Command Prompt from the Start menu as described in the Test/Debug Guide, not the default command prompt.
- Open two terminal windows on UNIX, or two StreamBase Command Prompts on Windows. In each window, navigate to your workspace copy of the sample.
- In window 1, start a dequeuer in advance, using the -w option to specify waiting 10 seconds before attempting its dequeue connection:
    sbc -w 10000 deq
- In window 2, launch StreamBase Server running csvreader-sample.sbapp:
    sbd csvreader-sample.sbapp
- In window 1, observe total sales output tuples computed from the two input files. In window 2, look for line rejection warnings from ReadFile2.
- To stop the sample, enter Ctrl+C in window 1 to exit the dequeuer, then enter sbadmin shutdown.
In StreamBase Studio, import this sample with the following steps:
- From the top menu, select → .
- In the search field, type csv file to narrow the list of samples.
- Select CSV file input adapter from the StreamBase Standard Adapters category.
- Click OK.
StreamBase Studio creates a single project containing the sample files.
When you load the sample into StreamBase Studio, Studio copies the sample project's files to your Studio workspace, which is normally part of your home directory, with full access rights.
Important
Load this sample in StreamBase Studio, and thereafter use the Studio workspace copy of the sample to run and test it, even when running from the command prompt.
Using the workspace copy of the sample avoids the permission problems that can occur when trying to work with the initially installed location of the sample. The default workspace location for this sample is:
studio-workspace/sample_adapter_embedded_hdfscsvreader
See Default Installation Directories for the location of studio-workspace on your system.
In the default TIBCO StreamBase installation, this sample's files are initially installed in:
streambase-install-dir/sample/adapter/embedded/hdfscsvreader
See Default Installation Directories for the location of streambase-install-dir on your system. This location may require administrator privileges for write access, depending on your platform.