Both StreamBase Server and StreamBase Studio are fully capable of processing and displaying Unicode character sets, but neither is configured to do so by default. To see Unicode characters correctly displayed in the Application Output view in Studio, you must configure both Server and Studio independently.
Follow the instructions on this page to enable Unicode support for Server, for Studio, and for StreamBase clients written with the StreamBase Client Library.
Java programmers writing any StreamBase extension that writes to a file (including a Java operator or adapter) must remember to set the Java system property file.encoding=UTF8 for complete Unicode support for such files. This is independent of the settings described below.
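As a complementary sketch, not part of any StreamBase API, extension code can also name UTF-8 explicitly when it writes a file, so the output does not depend on how the JVM was launched. The class and method names below are hypothetical, and the example uses only standard Java library calls:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: writes and reads text with an explicit UTF-8 charset.
class Utf8FileWriteSketch {

    // Write text to the target file as UTF-8, regardless of file.encoding.
    static void writeUtf8(Path target, String text) {
        try (BufferedWriter writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the whole file back, decoding it as UTF-8.
    static String readUtf8(Path source) {
        try {
            return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write text to a temporary file, read it back, and clean up.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("unicode-sketch", ".txt");
            try {
                writeUtf8(tmp, text);
                return readUtf8(tmp);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Shows the charset the JVM picked up at startup (influenced by file.encoding).
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println(roundTrip("caf\u00e9 \u65e5\u672c\u8a9e"));
    }
}
```

Printing the default charset is a quick way to confirm whether the file.encoding setting took effect in the JVM that runs the extension.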
Configure StreamBase Server to process Unicode characters in streams by setting the Java system property streambase.tuple-charset to UTF-8 for the JVM that runs the server. You can make this change in the server configuration file by adding a <sysproperty> child element of the <java-vm> element, as shown in the following example:
<java-vm>
  <sysproperty name="streambase.tuple-charset" value="UTF-8" />
</java-vm>
(Notice that the separator between streambase and tuple is a period, while the separator between tuple and charset is a hyphen.)
To enforce this change while running applications in Studio, the server configuration file must be named sbd.sbconf, and must be placed at the root of the Studio project, as described in How StreamBase Studio Uses Server Configuration Files.
Configure StreamBase Studio to process and display Unicode characters in streams by setting the Java system property streambase.tuple-charset to UTF-8 for the JVM that runs Studio. In this case, you must make the change using the environment variable STREAMBASE_STUDIO_VMARGS.
Important
Remember that the STREAMBASE_STUDIO_VMARGS variable overrides and replaces the default vmargs passed to Studio. If you use the variable for any purpose, you MUST include memory-setting values like the following:
STREAMBASE_STUDIO_VMARGS=-Xms512M -Xmx1024M
To use this environment variable correctly, set values for the default arguments -Xms and -Xmx, then add your new setting at the end.
You can optionally re-specify the default JVM garbage collection settings as described in Garbage Collection Policy Settings. See Java VM Memory Settings for a discussion of alternative settings.
Configure this environment variable globally for your system, or temporarily in the UNIX terminal or StreamBase Command Prompt environment from which you run the sbstudio command. Use a command like the following for Windows, typed as one line:
set STREAMBASE_STUDIO_VMARGS=-Xms512M -Xmx1024M -Dstreambase.tuple-charset=UTF-8
Use a line like the following example for the Bash shell in Linux:
export STREAMBASE_STUDIO_VMARGS="-Xms512M -Xmx1024M -Dstreambase.tuple-charset=UTF-8"
For complete Unicode support, you must configure both ends of any communication with StreamBase Server. This applies to StreamBase client applications written with any of the StreamBase Client Libraries.
For clients written with the StreamBase Java API, you must configure the JVM that runs your client code. You can do this in any of three ways:
- Set the streambase.tuple-charset system property to UTF-8 with System.setProperty() in your client code.
- Set the environment variable STREAMBASE_TUPLE_CHARSET to UTF-8 in the environment that runs your client application.
- Start the JVM that runs your client code with the -Dstreambase.tuple-charset=UTF-8 option.
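The first of these options can be sketched as follows. The class and method names are hypothetical, and the sketch assumes that the property should be set before any StreamBase client classes are used; the actual client calls are omitted because they require the StreamBase Client Library:

```java
// Hypothetical client bootstrap: sets the tuple charset before client work begins.
class TupleCharsetSetup {

    // Property name as given on this page (period, then hyphen).
    static final String TUPLE_CHARSET_PROPERTY = "streambase.tuple-charset";

    // Equivalent to starting the JVM with -Dstreambase.tuple-charset=UTF-8.
    static void enableUnicodeTuples() {
        System.setProperty(TUPLE_CHARSET_PROPERTY, "UTF-8");
    }

    public static void main(String[] args) {
        // Assumption: call this first, before creating any StreamBase client objects.
        enableUnicodeTuples();

        // StreamBase client setup (connecting, subscribing, and so on) would follow here.
        System.out.println(TUPLE_CHARSET_PROPERTY + "="
                + System.getProperty(TUPLE_CHARSET_PROPERTY));
    }
}
```

Setting the property in code keeps the configuration with the client, while the -D option or the environment variable leaves the client code unchanged.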
For clients written with the StreamBase C++, .NET, or Python APIs, you must set the environment variable STREAMBASE_TUPLE_CHARSET=UTF-8 in the environment that runs your client application.
You can perform expression language operations such as substr() on Unicode strings. Unicode strings on input streams are canonicalized to UTF-8 NFC (Normalization Form C, as described in Unicode Normalization Forms).
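To illustrate what NFC canonicalization means, the following minimal sketch uses the standard java.text.Normalizer class. It is not StreamBase code, and the class name is hypothetical. It shows a decomposed sequence (the letter e followed by a combining acute accent) being composed into a single code point:

```java
import java.text.Normalizer;

// Hypothetical illustration of Unicode Normalization Form C (NFC).
class NfcSketch {

    // Return the NFC form of the given string.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String decomposed = "e\u0301";      // 'e' + COMBINING ACUTE ACCENT
        String composed = toNfc(decomposed); // single code point U+00E9

        System.out.println(decomposed.length());        // prints 2
        System.out.println(composed.length());          // prints 1
        System.out.println(composed.equals("\u00e9"));  // prints true
    }
}
```

Because both spellings normalize to the same NFC string, comparisons on canonicalized input treat them as equal.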
With Unicode support enabled as described above, some of the expression language functions that deal with strings have different behavior than in the default configuration with Unicode disabled. For example, for some functions, characters are counted as a number of graphemes with Unicode enabled, but as a number of bytes with Unicode disabled. Each affected function is noted on the Expressions page.
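The difference between counting graphemes and counting bytes can be seen in a small standalone sketch. This is plain Java with a hypothetical class name, not the StreamBase expression language, and java.text.BreakIterator is used here only as an approximation of grapheme boundaries:

```java
import java.nio.charset.StandardCharsets;
import java.text.BreakIterator;

// Hypothetical illustration: the same string measured in graphemes and in UTF-8 bytes.
class LengthSketch {

    // Number of bytes in the UTF-8 encoding of the string.
    static int byteLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of user-perceived characters, as segmented by BreakIterator.
    static int graphemeLength(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String word = "na\u00efve"; // contains U+00EF, which takes two bytes in UTF-8
        System.out.println("graphemes=" + graphemeLength(word)
                + " bytes=" + byteLength(word)); // prints graphemes=5 bytes=6
    }
}
```

For pure ASCII strings the two counts agree, which is why the behavior change is visible only when non-ASCII characters are present.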
EventFlow applications are saved in an XML format that specifies UTF-8 encoding, which ensures that any Unicode strings you use in expressions in operators are preserved. In addition, all other files created and saved by StreamBase Studio, such as server configuration files, are saved as Unicode-compliant files with UTF-8 encoding.
The UTF-8 encoding of files created by StreamBase Studio occurs by default, and is independent of the Server, Studio, or client configuration settings described in the first three sections of this page.