Both StreamBase Server and StreamBase Studio are fully capable of processing and displaying Unicode character sets, but neither is configured to do so by default. To see Unicode characters correctly displayed in the Output Streams view in Studio, you must configure both Server and Studio independently.
Follow the instructions on this page to enable Unicode support for Server, Studio, and for StreamBase clients written with the StreamBase Client Library.
Java programmers writing any StreamBase extension that writes to a file (including a Java operator or adapter) must remember to set the Java system property file.encoding=UTF8 for complete Unicode support for such files. This is independent of the settings described below.
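As a complement to the file.encoding property, an extension can check which charset the JVM is using by default and can pass UTF-8 explicitly when it writes a file. The following is a minimal sketch in plain Java, not part of the StreamBase API; the class name FileEncodingCheck and its methods are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any StreamBase library.
public class FileEncodingCheck {

    // Returns the charset the JVM uses when code does not name one explicitly.
    static Charset defaultCharset() {
        return Charset.defaultCharset();
    }

    // Writes the text to a temporary file as UTF-8, reads it back as UTF-8,
    // and returns what was read. The explicit charset makes the result
    // independent of the JVM default.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("unicode-check", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharset());
        // Unicode escapes keep this source file ASCII-only.
        String sample = "Gr\u00fc\u00dfe, \u4e16\u754c";
        System.out.println("round trip ok   = " + sample.equals(roundTrip(sample)));
    }
}
```

Naming the charset explicitly protects only the code paths you control. Code that relies on the JVM default still depends on the file.encoding setting.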
Configure StreamBase Server to process Unicode characters in streams by setting the Java system property streambase.tuple-charset to UTF-8 for the JVM that runs the server. You can make this change in the server configuration file by adding a <sysproperty> child element of the <java-vm> element, as shown in the following example:
<java-vm>
    <sysproperty name="streambase.tuple-charset" value="UTF-8" />
</java-vm>
(Notice that the separator between streambase and tuple is a period, while the separator between tuple and charset is a hyphen.)
To enforce this change while running applications in Studio, the server configuration file must be named sbd.sbconf and must be placed at the root of the Studio project.
Configure StreamBase Studio to process and display Unicode characters in streams by setting the Java system property streambase.tuple-charset to UTF-8 for the JVM that runs Studio. In this case, you must make the change using the environment variable STREAMBASE_STUDIO_VMARGS.
Remember that the STREAMBASE_STUDIO_VMARGS variable overrides and replaces the default vmargs passed to Studio. If you use the variable for any purpose, you MUST include memory-setting values such as the -Xms and -Xmx values shown in the examples below.
To use this environment variable correctly, set values for the default arguments -Xms and -Xmx, then add your new setting at the end.
See Java VM Memory Settings for a discussion of alternative settings.
Configure this environment variable globally for your system, or temporarily in the UNIX terminal or StreamBase Command Prompt environment from which you run the sbstudio command. Use a command like the following for Windows, typed as one line:
set STREAMBASE_STUDIO_VMARGS=-Xms512M -Xmx1024M -Dstreambase.tuple-charset=UTF-8
Use a line like the following example for the Bash shell in Linux:
export STREAMBASE_STUDIO_VMARGS="-Xms512M -Xmx1024M -Dstreambase.tuple-charset=UTF-8"
For complete Unicode support, you must configure both ends of any communication with StreamBase Server. This applies to StreamBase client applications written with any of the StreamBase Client Libraries.
For clients written with the StreamBase Java API, you must configure the JVM that runs your client code. You can do this in any of three ways:
Set the streambase.tuple-charset system property to UTF-8 using System.setProperty() in your client code.
Set the environment variable STREAMBASE_TUPLE_CHARSET=UTF-8 in the environment that runs your client application.
Start the JVM that runs your client code with the -Dstreambase.tuple-charset=UTF-8 argument.
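The System.setProperty() option can be sketched as follows. This is a minimal illustration that uses only the standard Java library; the class name TupleCharsetSetup and the method useUtf8() are hypothetical. It assumes that the property must be set before any StreamBase client classes are used, which this page does not state explicitly.

```java
// Hypothetical helper, not part of the StreamBase Client Library.
public class TupleCharsetSetup {

    static final String PROPERTY = "streambase.tuple-charset";
    static final String ENV_VAR = "STREAMBASE_TUPLE_CHARSET";

    // Sets the tuple charset property for this JVM and returns the value
    // now in effect.
    static String useUtf8() {
        System.setProperty(PROPERTY, "UTF-8");
        return System.getProperty(PROPERTY);
    }

    public static void main(String[] args) {
        // Assumption: call this first, before creating any client connection.
        System.out.println(PROPERTY + " = " + useUtf8());

        // The environment variable is shown for comparison only. It is null
        // unless it was set in the environment that started this JVM.
        System.out.println(ENV_VAR + " = " + System.getenv(ENV_VAR));

        // Client connection code would follow here.
    }
}
```

Setting the property in code keeps the configuration with the client, while the environment variable and the JVM argument allow the same client binary to be configured at launch time.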
For clients written with the StreamBase C++, .NET, or Python APIs, you must set the environment variable STREAMBASE_TUPLE_CHARSET=UTF-8 in the environment that runs your client application.
You can perform expression language operations such as substr() on Unicode strings. Unicode strings on input streams are canonicalized to UTF-8 NFC (Normalization Form C, as described in Unicode Normalization Forms).
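To illustrate what NFC canonicalization does to a string, the following sketch uses the standard Java class java.text.Normalizer. It shows the normalization form itself, not StreamBase's internal implementation; the class name NfcExample is hypothetical.

```java
import java.text.Normalizer;

// Hypothetical illustration of Normalization Form C using the Java standard library.
public class NfcExample {

    // Returns the NFC form of the given string.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points.
        String decomposed = "e\u0301";
        // NFC composes the pair into the single code point U+00E9.
        String composed = toNfc(decomposed);

        System.out.println(decomposed.length());        // 2
        System.out.println(composed.length());          // 1
        System.out.println(composed.equals("\u00e9"));  // true
    }
}
```

Because both spellings of the character compare as equal after normalization, string comparisons on canonicalized input give consistent results regardless of how the sender encoded the accent.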
With Unicode support enabled as described above, some of the expression language functions that deal with strings have different behavior than in the default configuration with Unicode disabled. For example, for some functions, characters are counted as a number of graphemes with Unicode enabled, but as a number of bytes with Unicode disabled. Each affected function is noted on the Expressions page.
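The difference between counting graphemes and counting bytes can be seen in a short sketch that uses only the Java standard library. It approximates grapheme counting with java.text.BreakIterator and counts bytes in the UTF-8 encoding. This is an illustration of the two measures, not the StreamBase implementation, and the class name LengthComparison is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.text.BreakIterator;

// Hypothetical illustration of grapheme count versus UTF-8 byte count.
public class LengthComparison {

    // Number of bytes in the UTF-8 encoding of the string.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of user-perceived characters, as segmented by BreakIterator.
    static int graphemes(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String ascii = "abc";
        String cjk = "\u65e5\u672c\u8a9e"; // three CJK characters

        System.out.println(graphemes(ascii) + " graphemes, " + utf8Bytes(ascii) + " bytes"); // 3 graphemes, 3 bytes
        System.out.println(graphemes(cjk) + " graphemes, " + utf8Bytes(cjk) + " bytes");     // 3 graphemes, 9 bytes
    }
}
```

For ASCII-only strings the two counts agree, which is why the difference appears only when strings contain characters outside the ASCII range.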
EventFlow applications are saved in an XML format that specifies UTF-8 encoding, which ensures that any Unicode strings you use in expressions in operators are preserved. In addition, all other files created and saved by StreamBase Studio, such as server configuration files, are saved as Unicode-compliant files with UTF-8 encoding.
The UTF-8 encoding of files created by StreamBase Studio occurs by default, and is independent of the Server, Studio, or client configuration settings described in the first three sections of this page.