Cloud Software Group, Inc. EBX®

Documentation > Administration Guide > Installation & configuration

Navigation modeDocumentation > Administration Guide > Installation & configuration

Performance and tuning

Environment

Memory management

Memory monitoring

Indications of EBX® load activity are provided by monitoring the underlying database, and also by the 'monitoring' logging category.

If the numbers for cleared and built objects remain high for a long time, this is an indication that EBX® is swapping on the application server. In that case, the memory allocated to the application server should be increased.

Garbage collector

Tuning the garbage collector can also benefit overall performance. This tuning should be adapted to the use case and specific Java Runtime Environment used.

CPU

The number of CPUs available for the application server must be defined considering the number of concurrent HTTP requests to be served, the complexity (CPU cost) of the implied tasks, and the background activities, including the Java garbage collection.

Large imports and more generally large transactions involving many creates, updates and deletes will be completed faster if the difference between the server load and the number of available processors allows the indexing to be efficiently run in parallel within the transaction. The persistence log category contains the following entries:

'Setting forced synchronous indexing to false (...)' to indicate that the indexing will be performed concurrently;
'Setting forced synchronous indexing to true (...)' to indicate that the indexing will not be performed concurrently.

The difference mentioned above is assessed every ten seconds and is computed using the methods getSystemLoadAverage() and getAvailableProcessors() in the Java class java.lang.management.OperatingSystemMXBean. Both numbers are written at the end of the log entry above.

Using the native LZ4 library

The LZ4 library is used to store data to and retrieve data from the database. To speed up data access, it is required to perform a ebx-lz4.jar native installation.

See Data compression library for more information.

Scanning on server startup

To speed up the web applications server startup, the JAR files scanner should be configured.

Database

Reorganizing database tables

As with any database, inserting and deleting large volumes of data may lead to fragmented data, which can deteriorate performance over time. To resolve the issue, reorganizing the impacted database tables is necessary. See Monitoring and cleanup of the relational database.

A specificity of EBX® is that creating dataspaces and snapshots adds new entries to tables GRS_DTR and GRS_SHR. When poor performance is experienced, it may be necessary to schedule a reorganization of these tables, for large repositories in which many dataspaces are created and deleted.

Data modeling

Aggregated lists

In a data model, when an element's cardinality constraint maxOccurs is greater than 1 and no osd:table is declared on this element, it is implemented as a Java List. This type of element is called an aggregated list, as opposed to a table.

It is important to consider that there is no specific optimization when accessing aggregated lists, in terms of iterations, user interface display, etc. Besides performance concerns, aggregated lists are limited with regard to many functionalities that are supported by tables. See tables introduction for a list of these features.

Attention

For the reasons stated above, aggregated lists should be used only for small volumes of simple data (one or two dozen records), with no advanced requirements for their identification, lookups, permissions, etc. For larger volumes of data (or more advanced functionalities), it is recommended to use osd:table declarations.

Inherited fields

It is possible in a data model to use inherited fields for an advanced inheritance based on relationships opposed to dataset inheritance.

As inherited fields cannot benefit from index optimizations they should be used with caution or avoided since they behave as computed fields to resolve their values. As a consequence all operations (querying, validation, data comparison in resolved mode, etc.) that will be performed on inherited fields won't be optimized and their performances can be strongly impacted.

Data validation

The internal validation framework will optimize the work required during successive requests to update the validation report of a dataset or a table. The incremental validation process behaves as follows:

The first call to a dataset or table validation report performs a full validation of the dataset or the table.
The next call to the validation report will compute the changes performed since the last validation. The validation report will be updated according to these changes.
Validation reports are stored persistently in the TIBCO EBX® repository. This reduces the amount of memory dedicated to validation reports when datasets have a large amount of validation messages. Also, validation reports are not lost when the application server restarts.
Validation reports can be reset manually in the user interface by an administrator user (this option is available from the validation report section in EBX®). As a consequence, resetting validation reports must be used with caution since associated datasets or tables will be fully revalidated during the next call to their validation reports.
See Adaptation.resetValidationReport for more information.

Certain constraints are systematically re-validated, even if no updates have occurred since the last validation. These are the constraints with unknown dependencies. An element has unknown dependencies if:

It specifies a programmatic constraint in the default unknown dependencies mode.
It declares a computed value, or it declares a dynamic facet that depends on an element that is itself a computed value.
It is an inherited field or it declares a dynamic facet that depends on a node that is itself an inherited field.

Note

It is possible to disable the validation of inherited fields to avoid performance issues when validating large tables. See inherited field for more information.

Consequently, on large tables, it is recommended to:

Avoid constraints with unknown dependencies (or at least to minimize the number of such constraints). For programmatic constraints, the developer is able to specify two alternative modes that drastically reduce incremental validation cost: local dependency mode and explicit dependencies. For more information, see Dependencies and validation.
To use constraints on tables instead of programmatic constraints defined at field level. Indeed, if a table defines constraints at field level, then the validation process will iterate over all the records to check if the value of the associated field complies with the constraint. Using constraints on tables gives the opportunity to execute optimized queries on the whole table. See below for recommendations on how to optimize such queries. However, there is a trade-off: A constraint on tables does not benefit from the incremental validation and will always be validated completely. If in general only a few records are created or deleted between validations, a programmatic constraint might be more performant. Benchmarks or a sampling profiler can help to make the decision.
Avoid the use of the facet pattern since its check is not optimized on large tables. That is, if a field defines this facet then the validation process will iterate over all the records to check if the value of the associated field complies with the specified pattern.

The following properties can be used to minimize the impact on performance when logging a validation report:

Set the ebx.validation.report.logContent property to false to avoid logging the individual validation messages. The validation report summary is still logged, including the message count. You can also avoid logging the validations started by the Java API completely. See ValidationSpec.setResultLogged for more information.
Though less effective than the previous recommendation, setting the ebx.validation.report.maxItemDisplayed property to a lower value than the default 100 reduces the amount of validation messages to log, thereby reducing the related computational work.

Accessing tables

Functionalities

Tables are commonly accessed through EBX® UI, data services and also through the Request and Query APIs. This access involves a unique set of functions, including a dynamic resolution process. This process behaves as follows:

Inheritance: Inheritance in the dataset tree takes into account records and values that are defined in the parent dataset, using a recursive process. Also, in a root dataset, a record can inherit some of its values from the data model default values, defined by the xs:default attribute.
Value computation: A node declared as an osd:function is always computed on the fly when the value is accessed. See ValueFunction.getValue.
Filtering: An XPath predicate, a programmatic filter, or a record-level permission rule requires a selection of records.
Sort: A sort of the resulting records can be performed.

Query on tables

Architecture and design

The EBX® engine manages persistent Lucene indexes to improve operation speeds on tables. This is done by:

Ensuring the index rebuild chain is optimized: Think of the index rebuild process as a chain. The chain is made up of a sequence of table revisions, or links, that comprise the table's history. Each of these links is the result of a snapshot or child dataspace creation. Where possible, an index rebuild skips links in the rebuild chain, reducing it to a maximum number of 5 links. This improves the rebuild duration and index structure through effective reuse of base indexes. If some links are skipped in the rebuild chain, the following message is logged in the persistence.log file: "Index rebuild chain size is" + <fullSize> + ", reducing to 5.
Please note that this process reduces rebuild times for intermediate snapshots; however, accessing skipped snapshots might take slightly longer, as their indexes must be rebuild incrementally from the nearest available point.
Dynamically assessing incremental rebuild viability: EBX® estimates the "distance" between table revisions to determine whether to perform an incremental, or a full index rebuild. A full rebuild is initiated when the difference between the base and target index is too large. This leads to faster rebuilds and more efficient indexing. Accessing a snapshot older than this breaking point can take longer, as a new rebuild chain must be computed from scratch. If the rebuild chain requires a full rebuild, the following message is logged in the persistence.log file: "Breaking revision chain at table revision " + aRevision.getId() + ": too different from the rebuild target."

Note

The above message is not logged when the first revision in the chain is rebuilt from scratch, or if table revisions are too minor to make the complexity of an incremental rebuild worth it.
Batching the processing for index rebuilds: Index rebuild operations group deletion and creation actions to optimize efficiency.
Multi-threading rebuilds: Multi-threading enhances rebuild performance through parallelism. You can control the number of threads used during index rebuilds. See Configuring multi-thread index builds for more details.

Attention

Faster access to tables is ensured if indexes are ready and maintained in the OS memory cache. As mentioned above, it is important for the OS to have enough space allocated.

Performance considerations

The query optimizer favors the use of indexes when computing a request result. If a query cannot take advantage of the indexes, it will be resolved in Java memory, and experience poor performance on large volumes. The following guidelines apply:

Attention

Only XPath predicates and SQL queries can benefit from index optimization.
Some fields and some datasets cannot be indexed, as described in section Limitations.
XPath predicates using the osd:label function cannot benefit from index optimization

If indexes have not yet been built, additional time is required to build and persist the indexes, on the first access to the table.

Accessing the table data blocks is required when the query cannot be computed against any index (whether for resolving a rule, filter or sort), as well as for building the index. If the table blocks are not present in memory, additional time is needed to fetch them from the database.

Apart from that, there are other considerations that can impact requests / queries performance:

If the same request / query is going to be executed multiple times with just a variation on the filter value, it is recommendable to use parameters in said filter:
- request.setXPathFilter("./field=$param");
- SELECT * FROM myTable WHERE field=?
If possible, use RequestResult.isSizeGreaterOrEqual rather than RequestResult.getSize

If possible, avoid RequestResult size-related calls. Especially avoid the unnecessary code pattern "check isEmpty before traversing" on a RequestResult:

try (RequestResult result = ...) {
    if (result.isEmpty())
        return; // unnecessary and inefficient!
    for (Adaptation record : result) {
        ...
    }
}
 
// simply do:
 
try (RequestResult result = ...) {
    for (Adaptation record : result) {
        ...
    }
}

It is possible to get information through the memory monitoring and request logging categories.

Accessing and modifying a table

The following access lead to poor performance, and must be avoided:

Access a table after a few modifications, repeatedly. It implies the index state to be refreshed after each modification. The cost of refreshing makes this pattern ineffective. Instead, perform a single query and apply the modification when browsing the results.
If there is an ongoing access to the same table, concurrently to the previous case, it prevents outdated index files to be deleted. As a consequence, the size of the index on disk increases, and the server may run out of disk space in extreme cases. When the concurrent access is closed, the index size is back to normal. This is usually a sign that a Request or a Query is not properly closed.
See also
- RequestResult.close
- QueryResult.close

Other operations on tables

The new records creations or record insertions depend on the primary key index. Thus, a creation becomes almost immediate if this index is already loaded.

Configuring multi-thread index builds

Use the ebx.index.threads property in the ebx.properties file to adjust the number of threads used during index rebuilds. This property defaults to a value of '2', which should be optimal for most production scenarios. Since increasing the thread count can be detrimental to concurrent activity on a server, you might want to restrict this practice to a test environment. However, if a product upgrade, migration, or significant changes to data models require extensive index rebuilds, consider the following during the deployment phase:

If you have a high-performance hardware configuration that allows a high concurrency level, you can target setting ebx.index.threads to a value of CPU count/2 up to a maximum of 10.
Additional threads are only started for large enough tables. The exact value is not fixed, but is generally larger than 200,000 records.

REST built-in and business objects

When using select operations with business objects, EBX® utilizes a temporary folder to handle large response contents. Administrators can set a size limit through the ebx.dataservices.rest.bo.maxResponseSizeInKB configuration parameter. Adjust this setting based on the server's available filesystem space dedicated to temporary content and considering the maximum number of request processing threads.

See ebx.dataservices.rest.bo.maxResponseSizeInKB and Setting temporary files directories for more information.

REST access to history table

The merge information in history table (the merge_info field) has a potentially high access cost. To improve performance and if the client code does not need this field, the includeMergeInfo parameter must be set to false.

See History for more information.

Setting a fetch size

In order to improve performance, a fetch size should be set according to the expected size of the result of the request on a table. If no fetch size is set, the default value will be used.

On a history table, the default value is assigned by the JDBC driver: 10 for Oracle and 0 for PostgreSQL.

Attention

On PostgreSQL, the default value of 0 instructs the JDBC driver to fetch the whole result set at once, which could lead to an OutOfMemoryError when retrieving large amounts of data. On the other hand, using fetchSize on PostgreSQL will invalidate server-side cursors at the end of the transaction. If, in the same thread, you first fetch a result set with a fetchsize, then execute a procedure that commits the transaction, then, accessing the next result will raise an exception.

Performance checklist for other Java customizations

While TIBCO EBX® is designed to support large volumes of data, several common factors can lead to poor performance. Addressing the key points discussed in this section will solve the usual performance bottlenecks.

Expensive programmatic extensions

For reference, the table below details the programmatic extensions that can be implemented.

Use case	Programmatic extensions that can be involved
Validation	programmatic constraints computed values
Table access	record-level permission rules programmatic filters
EBX® content display	computed values UI Components node-level permission rules
Data update	triggers

For large volumes of data, using algorithms of high computational complexity has a serious impact on performance. For example, the complexity of a constraint's algorithm is O(n ² ). If the data size is 100, the resulting cost is proportional to 10 000 (this generally produces an immediate result). However, if the data size is 10 000, the resulting cost will be proportional to 10 000 000.

Another reason for slow performance is calling external resources. Local caching usually solves this type of problem.

If one of the use cases above displays poor performance, it is recommended to track the problem, either by code analysis or by using a Java profiling tool.

Unnecessary index refresh

Refreshing a Lucene index takes time. It should be avoided whenever possible.

When does a refresh happen?

In the context of a transaction, an index refresh occurs when the table has been modified and one of the conditions below occurs:

For a lookup by primary key, the refresh is always triggered if the searched key has been "touched" (created, modified or deleted) in the current Procedure (or TableTrigger).
For a standard Query (or Request), an index refresh is always performed if the table has been modified in the current Procedure (or TableTrigger).

Coding recommendations

To avoid triggering a refresh through a lookup by primary key, the developer must register the Adaptation object returned from the last call to doCreateOccurrence or doModifyContent, and reuse this object instead of performing the lookup.
Avoid any lookup by primary key on a record that has been deleted in the current procedure.
In the case of a query triggering the refresh, the developer must ask the following question: can this query be avoided in my procedure?

Transaction threshold for mass updates

It is generally not advised to use a single transaction when the number of atomic updates in the transaction is beyond the order of 10 ⁵ . Large transactions require a lot of resources, in particular, memory, from EBX® and from the underlying database.

To reduce the transaction size, it is possible to:

Specify the property ebx.manager.import.commit.threshold. However, this property is only used for interactive archive imports performed from the EBX® user interface.
Explicitly specify a commit threshold inside the batch procedure.
Structurally limit the transaction scope by implementing Procedure for a part of the task and executing it as many times as necessary.

On the other hand, specifying a very small transaction size can also hinder performance, due to the persistent tasks that need to be done for each commit.

Note

If intermediate commits are a problem because transactional atomicity is no longer guaranteed, it is recommended to execute the mass update inside a dedicated dataspace. This dataspace will be created just before the mass update. If the update does not complete successfully, the dataspace must be closed, and the update reattempted after correcting the reason for the initial failure. If it succeeds, the dataspace can be safely merged into the original dataspace.

Triggers

If required, triggers can be deactivated using the method ProcedureContext.setTriggerActivation.

Directory integration

Authentication and permissions management involve the user and roles directory.

If a specific directory implementation is deployed and accesses an external directory, it can be useful to ensure that local caching is performed. In particular, one of the most frequently called methods is Directory.isUserInRole.

Documentation > Administration Guide > Installation & configuration

TIBCO EBX® Version 6.2.1. Copyright © 2001-2025. Cloud Software Group, Inc. All rights reserved.

All third party product and company names and third party marks mentioned in this document are the property of their respective owners and are mentioned for identification.

Performance and tuning

Environment

Memory management

CPU

Using the native LZ4 library

Scanning on server startup

Database

Reorganizing database tables

See also

Data modeling

Aggregated lists

Attention

Inherited fields

Data validation

Note

Accessing tables

Functionalities

Query on tables

Architecture and design

Note

Attention

Performance considerations

Attention

Accessing and modifying a table

See also

Other operations on tables

Configuring multi-thread index builds

REST built-in and business objects

REST access to history table

Setting a fetch size

Attention

See also

Performance checklist for other Java customizations

Expensive programmatic extensions

Unnecessary index refresh

Transaction threshold for mass updates

Note

Triggers

Directory integration