Dealing with Invalid XML Characters

The XML specification defines certain characters as invalid in an XML document. Even though all XML data is Unicode, not every Unicode character is legal.

The legal ranges are expressed by this condition:

(c == 0x9) || (c == 0xA) || (c == 0xD) || ((c >= 0x20) && (c <= 0xD7FF)) || ((c >= 0xE000) && (c <= 0xFFFD)) || ((c >= 0x10000) && (c <= 0x10FFFF))

where c is a Unicode code point.
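
The condition above can be sketched as a small check, which may be useful for testing input data before it reaches BusinessWorks. This is a minimal illustration only, not the product's implementation. The function names is_valid_xml_char and first_invalid_xml_char are hypothetical.

```python
def is_valid_xml_char(c: int) -> bool:
    """Return True if the code point c falls in one of the legal ranges."""
    return (
        c == 0x9
        or c == 0xA
        or c == 0xD
        or 0x20 <= c <= 0xD7FF
        or 0xE000 <= c <= 0xFFFD
        or 0x10000 <= c <= 0x10FFFF
    )


def first_invalid_xml_char(text: str) -> int:
    """Return the index of the first illegal character in text, or -1 if there is none."""
    for index, ch in enumerate(text):
        if not is_valid_xml_char(ord(ch)):
            return index
    return -1
```

For example, first_invalid_xml_char("ACME\x00CORP") returns 4, because the NUL character (0x0) at that position is outside every legal range.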

BusinessWorks 6.x adheres to this standard, and violations cause various errors.

  • Parse Copybook Data contains the Check String Values for Invalid XML Characters setting. When it is selected, the activity applies a validity check to the text data retrieved from the binary input. If the check fails, the activity produces a ParseCopybookDataException fault with msgCode TIBCO-BW-PALETTE-DATACONVERSION-500044.
    Note: If Ignore Invalid Items is selected, an invalid XML character will not cause an error. Instead, the text item containing the character will be excluded from the output.

    In Parse Copybook Data, the character validation applies only to items represented by the XSD type string. Other items undergo type-specific checks that are more restrictive. These checks are not affected by Check String Values for Invalid XML Characters.

  • Render Copybook Data contains the Fail on Invalid XML Characters setting. The setting only applies when Render Data as is set to String. If an offending character is detected, RenderActivityData is thrown with msgCode TIBCO-BW-PALETTE-DATACONVERSION-500044.
Warning: De-selecting Check String Values for Invalid XML Characters or Fail on Invalid XML Characters might result in a small performance gain in scenarios that involve a large amount of data. However, de-selecting these settings does not eliminate errors. The errors are likely to occur later in the application, where they are more difficult to track. Only de-select these settings if the application is guaranteed not to produce invalid characters.
Note: If your application might produce invalid characters, you can select XSD Type base64Binary for the copybook items at risk. However, you cannot use string XPath functions on those items.
Note: In Parse Copybook Data, you can reduce the number of errors by selecting Trim Whitespaces from String Values. The trimming is applied before the character validation check and removes some of the undesirable characters at the beginning and the end of the values.
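
The effect of the two notes above can be illustrated with a short sketch. It rests on an assumption: the exact trimming rule used by Trim Whitespaces from String Values is not stated here, so the sketch removes leading and trailing characters with a code point of 0x20 or lower, which is how Java's String.trim() behaves. The helper names trim_like_java and is_valid_xml_text are hypothetical, and the sample values are made up.

```python
import base64


def is_valid_xml_char(c: int) -> bool:
    """Return True if the code point c falls in one of the legal ranges."""
    return (
        c in (0x9, 0xA, 0xD)
        or 0x20 <= c <= 0xD7FF
        or 0xE000 <= c <= 0xFFFD
        or 0x10000 <= c <= 0x10FFFF
    )


def is_valid_xml_text(text: str) -> bool:
    """Return True if every character in text is a legal XML character."""
    return all(is_valid_xml_char(ord(ch)) for ch in text)


def trim_like_java(text: str) -> str:
    """Assumed trimming rule: drop leading and trailing characters <= 0x20."""
    start, end = 0, len(text)
    while start < end and ord(text[start]) <= 0x20:
        start += 1
    while end > start and ord(text[end - 1]) <= 0x20:
        end -= 1
    return text[start:end]


# A value padded with NUL characters: trimming removes the padding.
padded = "\x00\x00ACME CORP\x00"
padded_ok_after_trim = is_valid_xml_text(trim_like_java(padded))

# A value with an embedded NUL: trimming cannot help.
embedded = "ACME\x00CORP"
embedded_ok_after_trim = is_valid_xml_text(trim_like_java(embedded))

# Base64 text uses only letters, digits, '+', '/' and '=', so it is always legal XML.
encoded = base64.b64encode(embedded.encode("latin-1")).decode("ascii")
encoded_ok = is_valid_xml_text(encoded)
```

Under that assumption, the padded value passes validation after trimming, the value with the embedded NUL still fails, and the Base64 form of the same bytes passes. The cost of Base64 is that the content is no longer readable as text.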