This sample demonstrates the use of the TIBCO StreamBase® Regular Expression File Reader for Apache Hadoop Distributed File System (HDFS) in a StreamBase application that processes a text log file, storing the information extracted from the log in a query table. The log file is typical of a server log: it contains information about user logins, as well as other, spurious information. The format of the log file means it is not particularly suited to direct use in an application. It does not, for example, consist of CSV records that are ready to be turned into tuples; it must be parsed first and useful information extracted. Because the log file is text- and line-oriented, it is well suited to parsing using regular expressions.
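As a rough illustration of the parsing step, the following Python sketch applies a regular expression with capture groups to a few log lines and keeps only the login records. The log line format, the pattern, and the field names are assumptions made for illustration; they are not taken from samplelog.txt or from the adapter's configuration.

```python
import re

# Hypothetical log format (an assumption, not the format of samplelog.txt):
#   2011-03-14 09:26:53 LOGIN user=joe ip=192.168.1.17
LOGIN_PATTERN = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"LOGIN user=(?P<username>\w+) ip=(?P<ip>\d{1,3}(?:\.\d{1,3}){3})$"
)


def parse_line(line):
    """Return a dict of captured fields, or None for non-login lines."""
    match = LOGIN_PATTERN.match(line.strip())
    return match.groupdict() if match else None


sample_lines = [
    "2011-03-14 09:26:53 LOGIN user=joe ip=192.168.1.17",
    "2011-03-14 09:27:10 INFO cache flushed",
    "2011-03-14 09:31:02 LOGIN user=fred ip=10.0.0.4",
]

# Lines that do not match the pattern are treated as spurious and skipped.
records = [r for r in map(parse_line, sample_lines) if r is not None]
for record in records:
    print(record["username"], record["time"], record["ip"])
```

Named capture groups keep the mapping from matched text to field names explicit, which is similar in spirit to turning each matching line into a tuple with named fields.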
Keep in mind that it is difficult to observe all log reading activities. The input adapter begins reading the input file and outputting tuples as soon as the application starts, before StreamBase Studio or external dequeuers have time to connect to the output of the reader. However, the application itself is fully functional, and all tuples read from the input file will be present in the query table.
Before running the sample, open the sample application, RegexReader.sbapp, select the Parameters tab, and edit the values to match your current HDFS setup and the location where you want to store the sample data.
The file used in the RegexReader.sbapp sample, samplelog.txt, must be placed in HDFS at the location you specified in the Parameters tab before the sample can run.
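One way to copy the file into place is the standard hdfs dfs command-line client. The Python sketch below only builds and prints the two commands; the target directory /user/streambase/samples is a hypothetical placeholder that should be replaced with the location set on the Parameters tab, and running the commands assumes a Hadoop client is installed and on the PATH.

```python
import os
import posixpath
import shlex
import subprocess


def hdfs_upload_commands(local_file, hdfs_dir):
    """Build the hdfs dfs commands that create hdfs_dir and copy local_file into it."""
    target = posixpath.join(hdfs_dir, os.path.basename(local_file))
    return [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        ["hdfs", "dfs", "-put", "-f", local_file, target],
    ]


def run_commands(commands):
    """Execute each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Hypothetical HDFS directory; use the value from the Parameters tab instead.
commands = hdfs_upload_commands("samplelog.txt", "/user/streambase/samples")
for command in commands:
    print(shlex.join(command))

# Uncomment to execute against a real cluster:
# run_commands(commands)
```

Printing the commands first makes it easy to check the target path against the Parameters tab before anything is written to the cluster.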
- In the Package Explorer, double-click RegexReader.sbapp to open the application. Make sure the application is the currently active tab in the EventFlow Editor.
- Click the Run button. This opens the SB Test/Debug perspective and starts the application.
- Select the Manual Input tab.
- Enter joe, fred, bob, or max for Username, and send the tuple.
- The Application Output pane shows the time and IP address for the last queried user, according to the log.
- When done, press F9 or click the Stop Running Application button.
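The lookup in the steps above can be pictured as a keyed table in which each username maps to one row. The Python sketch below is only a conceptual model of that behavior, not StreamBase code: the field names, the sample rows, and the rule that a later entry replaces an earlier one for the same user are all assumptions.

```python
class LoginTable:
    """Toy stand-in for a query table keyed by username."""

    def __init__(self):
        self._rows = {}

    def write(self, username, time, ip):
        # A later row for the same key overwrites the earlier one (assumed behavior).
        self._rows[username] = {"time": time, "ip": ip}

    def read(self, username):
        # Returns None when the user never appeared in the log.
        return self._rows.get(username)


table = LoginTable()
# Hypothetical rows, in the order they would be extracted from the log.
table.write("joe", "2011-03-14 09:26:53", "192.168.1.17")
table.write("bob", "2011-03-14 09:40:11", "10.0.0.9")
table.write("joe", "2011-03-14 11:02:45", "192.168.1.30")

for name in ("joe", "bob", "max"):
    print(name, table.read(name))
```

Because the table is filled as the log is read, a query sent later still finds the row even if the reader's own output was never observed.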
This section describes how to run the sample in UNIX terminal windows or Windows command prompt windows. On Windows, be sure to use the StreamBase Command Prompt from the Start menu as described in the Test/Debug Guide, not the default command prompt.
- Open several terminal windows on UNIX, or several StreamBase Command Prompts on Windows. In each window, navigate to the directory where the sample is installed, or to your workspace copy of the sample, as described below.
- Run:
  sbd RegexReader.sbapp
- In another window, run:
  sbc dequeue Results
- In another window, run:
  sbc enqueue Queries
- In the enqueue window, enter joe, fred, bob, or max. The result of your query should appear in the dequeue window.
- Type the following command to terminate the server and dequeuer:
  sbadmin shutdown
In StreamBase Studio, import this sample with the following steps:
- From the top menu, click → .
- Select this sample from the Embedded Adapters list.
- Click OK.
StreamBase Studio creates a single project containing the sample files.
When you load the sample into StreamBase Studio, Studio copies the sample project's files to your Studio workspace, which is normally part of your home directory, with full access rights.
Important
Load this sample in StreamBase Studio, and thereafter use the Studio workspace copy of the sample to run and test it, even when running from the command prompt.
Using the workspace copy of the sample avoids the permission problems that can occur when trying to work with the initially installed location of the sample. The default workspace location for this sample is:
studio-workspace/sample_adapter_embedded_hdfsregexreader
See Default Installation Directories for the location of studio-workspace on your system.
In the default TIBCO StreamBase installation, this sample's files are initially installed in:
streambase-install-dir/sample/adapter/embedded/hdfsregexreader
See Default Installation Directories for the location of streambase-install-dir on your system. This location may require administrator privileges for write access, depending on your platform.