TIBCO EBX®

Performance guidelines

Basic performance checklist

While TIBCO EBX® is designed to support large volumes of data, several common factors can lead to poor performance. Addressing the key points discussed in this section will solve the usual performance bottlenecks.

Expensive programmatic extensions

For reference, programmatic extensions can be involved in the following use cases:

  • Validation

  • Table access

  • EBX® content display

  • Data update

For large volumes of data, using algorithms of high computational complexity has a serious impact on performance. For example, suppose the complexity of a constraint's algorithm is O(n²). If the data size is 100, the resulting cost is proportional to 10 000 (this generally produces an immediate result). However, if the data size is 10 000, the resulting cost will be proportional to 100 000 000.
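To make this concrete, here is a minimal, EBX®-independent Java illustration: a duplicate check written as a nested loop is O(n²), while a single pass over a hash set is O(n) and remains fast on large tables.

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    public final class DuplicateCheck {

        // O(n^2): compares every pair of values; cost explodes as data grows.
        static boolean hasDuplicatesQuadratic(List<String> values) {
            for (int i = 0; i < values.size(); i++) {
                for (int j = i + 1; j < values.size(); j++) {
                    if (values.get(i).equals(values.get(j))) {
                        return true;
                    }
                }
            }
            return false;
        }

        // O(n): a single set-based pass scales linearly with the data size.
        static boolean hasDuplicatesLinear(List<String> values) {
            Set<String> seen = new HashSet<>();
            for (String value : values) {
                if (!seen.add(value)) {
                    return true;
                }
            }
            return false;
        }
    }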

Another common cause of slow performance is the invocation of external resources. Local caching usually solves this type of problem.

If one of the use cases above displays poor performance, it is recommended to track down the cause, either by code analysis or by using a Java profiling tool.

Directory integration

Authentication and permissions management involve the user and roles directory.

If a specific directory implementation is deployed and accesses an external directory, it can be useful to ensure that local caching is performed. In particular, one of the most frequently called methods is Directory.isUserInRole.
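As an illustration, here is a minimal sketch of such local caching. It is keyed by plain strings to stay independent of the EBX® API classes, and the BiPredicate stands for the expensive external call (for example, an LDAP query) performed by the custom directory implementation.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.BiPredicate;

    public final class UserRoleCache {

        private final Map<String, Boolean> cache = new ConcurrentHashMap<>();
        private final BiPredicate<String, String> externalLookup; // the expensive external call

        public UserRoleCache(BiPredicate<String, String> externalLookup) {
            this.externalLookup = externalLookup;
        }

        // Queries the external directory only once per (user, role) pair.
        public boolean isUserInRole(String userId, String roleName) {
            return cache.computeIfAbsent(
                userId + "|" + roleName,
                key -> externalLookup.test(userId, roleName));
        }
    }

A production implementation would also define an expiry policy, so that role changes performed in the external directory are eventually reflected in EBX®.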

Aggregated lists

In a data model, when an element's cardinality constraint maxOccurs is greater than 1 and no osd:table is declared on this element, it is implemented as a Java List. This type of element is called an aggregated list, as opposed to a table.

It is important to consider that there is no specific optimization when accessing aggregated lists, in terms of iteration, user interface display, etc. Beyond performance concerns, aggregated lists are limited with regard to many functionalities that tables support. See the introduction to tables for a list of these features.

Attention

For the reasons stated above, aggregated lists should be used only for small volumes of simple data (one or two dozen records), with no advanced requirements for their identification, lookups, permissions, etc. For larger volumes of data (or more advanced functionalities), it is recommended to use osd:table declarations.
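For clarity, this is how an aggregated list is typically read through the API: the whole list is materialized as a plain Java List and iterated in memory, with no index involved. The path './items' is a placeholder for an actual aggregated list of your data model.

    import java.util.List;

    import com.onwbp.adaptation.Adaptation;
    import com.orchestranetworks.schema.Path;

    public final class AggregatedListAccess {

        // Reads and iterates an aggregated list; every access walks the
        // plain in-memory list, so the cost grows linearly with its size.
        @SuppressWarnings("unchecked")
        public static void dumpItems(Adaptation dataset) {
            List<String> items = (List<String>) dataset.getList(Path.parse("./items"));
            for (String item : items) {
                System.out.println(item);
            }
        }
    }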

Checklist for dataspace usage

Dataspaces are an invaluable tool for managing complex data life cycles. While this feature brings great flexibility, it also implies a certain overhead, which should be taken into consideration for optimizing usage patterns.

This section reviews the most common performance issues that can appear in the case of intensive use of many dataspaces containing large tables, and how to avoid them.

Note

Sometimes, the use of dataspaces is not strictly needed. As an extreme example, consider the case where every transaction triggers the following actions:

  1. A dataspace is created.

  2. The transaction modifies some data.

  3. The dataspace is merged, closed, then deleted.

In this case, no future references to the dataspace are needed, so using one to make isolated data modifications is unnecessary: a Procedure already provides sufficient isolation to avoid conflicts between concurrent operations. It would therefore be more efficient to perform the modifications directly in the target dataspace, and to eliminate the steps concerning branching and merging (see the sketch after this note).

For a developer-friendly analogy, referring to a source-code management tool (CVS, SVN, etc.): when you need to perform a simple modification impacting only a few files, it is probably sufficient to do so directly on the main branch. In fact, it would be neither practical nor sustainable, with regard to file tagging/copying, if every file modification involved branching the whole project, modifying the files, then merging the dedicated branch.
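To illustrate the direct approach, here is a minimal sketch of a Procedure that creates a record in the target dataspace without any branching or merging. The table is provided by the caller, and the path './name' is a placeholder for an actual field of your data model.

    import com.onwbp.adaptation.AdaptationTable;
    import com.orchestranetworks.schema.Path;
    import com.orchestranetworks.service.Procedure;
    import com.orchestranetworks.service.ProcedureContext;
    import com.orchestranetworks.service.ValueContextForUpdate;

    public final class DirectUpdateProcedure implements Procedure {

        private final AdaptationTable table; // target table, resolved by the caller

        public DirectUpdateProcedure(AdaptationTable table) {
            this.table = table;
        }

        @Override
        public void execute(ProcedureContext pContext) throws Exception {
            // Prepare the content of the new record.
            ValueContextForUpdate newRecord = pContext.getContextForNewOccurrence(table);
            newRecord.setValue("Some value", Path.parse("./name")); // placeholder field
            // Create the record; the whole procedure runs as a single transaction.
            pContext.doCreateOccurrence(newRecord, table);
        }
    }

Such a procedure is typically executed through ProgrammaticService.createService(session, targetDataspace); the transactional execution provides the isolation mentioned in the note above.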

Insufficient memory

When a table is accessed, the EBX® Java memory cache is used. It ensures much more efficient access to data that is already loaded in the cache. However, if there is not enough space for the working data, swaps between the Java heap space and the underlying database can heavily degrade overall performance.

Such an issue can be detected through the monitoring log file. If it occurs, the following actions can be considered:

Reorganizing database tables

As with any database, inserting and deleting large volumes of data may lead to fragmented data, which can degrade performance over time. To resolve the issue, it is necessary to reorganize the impacted database tables. See Monitoring and cleanup of the relational database.

A specificity of EBX® is that creating dataspaces and snapshots adds new entries to the tables GRS_DTR and GRS_SHR. For large repositories in which many dataspaces are created and deleted, it may be necessary to schedule a reorganization of these tables when poor performance is experienced.

Memory management

Monitoring

Indications of EBX® load activity are provided by monitoring the underlying database, and also by the 'monitoring' logging category.

If the numbers for cleared and built objects remain high for a long time, this is an indication that EBX® is swapping on the application server.

Tuning hardware resources

Disk space

The master data stored in the database is additionally indexed into persistent Lucene indexes, serving to accelerate all the queries issued by EBX®. This comes at the cost of additional storage space: a rule of thumb is to plan for 10 times the space occupied in the relational database.

Disk latency

In order to maintain good overall performance, it is particularly important for the disk storing the Lucene indexes to have low latency.

Memory allocated to the application server

Since the query engine retrieves the necessary information from persistent storage, the memory allocated to the Java Virtual Machine (usually specified by the -Xmx parameter) can be kept low. We recommend staying below 32 GB, which should fit all reasonable use cases and allows benefiting from the JVM's compressed ordinary object pointers (compressed oops) feature.

Tuning the garbage collector can also benefit overall performance. This tuning should be adapted to the use case and specific Java Runtime Environment used.

Memory allocated to the operating system

On the OS running the application server, it is important to leave sufficient room for the OS cache, letting it optimize access to the persistent Lucene indexes. Indeed, once these indexes have been loaded from the file system, the OS uses its memory cache to speed up subsequent accesses to the same data, avoiding a reload from disk every time. This is only possible if sufficient RAM has been left available for this purpose.

Validation performance

The internal validation framework optimizes the work required during successive requests to update the validation report of a dataset or a table. The incremental validation process behaves as follows:

Certain constraints are systematically re-validated, even if no updates have occurred since the last validation. These are the constraints with unknown dependencies. An element has unknown dependencies if:

Consequently, on large tables, it is recommended to:

Mass updates

Mass updates can involve several hundred thousand insertions, modifications and deletions. These updates are usually infrequent (typically initial data imports) or are performed non-interactively (nightly batches). Thus, performance for these updates is less critical than for frequent or interactive operations. However, as with classic batch processing, they raise certain specific issues.

Transaction threshold

It is generally not advised to use a single transaction when the number of atomic updates in the transaction exceeds the order of 10⁴. Large transactions require a lot of resources, in particular memory, from EBX® and from the underlying database.

To reduce the transaction size, it is possible to specify a commit threshold inside the batch procedure (ProcedureContext.setCommitThreshold), to explicitly call the method ProcedureContext.doCommit, or to split the work into several successive procedures.

On the other hand, specifying a very small transaction size can also hinder performance, due to the persistent tasks that need to be done for each commit.

Note

If intermediate commits are a problem because transactional atomicity is no longer guaranteed, it is recommended to execute the mass update inside a dedicated dataspace. This dataspace will be created just before the mass update. If the update does not complete successfully, the dataspace must be closed, and the update reattempted after correcting the reason for the initial failure. If it succeeds, the dataspace can be safely merged into the original dataspace.

Triggers

If required, triggers can be deactivated using the method ProcedureContext.setTriggerActivation.
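As an illustration, here is a sketch of a mass-update procedure combining both hints: intermediate commits through a commit threshold, and trigger deactivation. The setCommitThreshold call follows the ProcedureContext API mentioned above, and importSourceData is a placeholder for your own update logic.

    import com.orchestranetworks.service.Procedure;
    import com.orchestranetworks.service.ProcedureContext;

    public final class MassImportProcedure implements Procedure {

        @Override
        public void execute(ProcedureContext pContext) throws Exception {
            // Commit every 5 000 atomic updates instead of accumulating
            // a single huge transaction.
            pContext.setCommitThreshold(5000);
            // Skip trigger execution during the batch, as described above.
            pContext.setTriggerActivation(false);
            importSourceData(pContext);
        }

        private void importSourceData(ProcedureContext pContext) {
            // Placeholder: iterate over the source data and perform the
            // creations and updates, for example via doCreateOccurrence.
        }
    }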

Accessing tables

Functionalities

Tables are commonly accessed through the EBX® UI and data services, and also through the Request and Query APIs. Such access involves a unique set of functions, including a dynamic resolution process. This process behaves as follows:

Query on tables

Architecture and design

In order to improve the speed of operations on tables, persistent Lucene indexes are managed by the EBX® engine.

Attention

Faster access to tables is ensured if the indexes are ready and maintained in the OS memory cache. As mentioned above, it is important for the OS to have enough memory available for this purpose.

Performance considerations

The query optimizer favors the use of indexes when computing a request result. If a query cannot take advantage of the indexes, it is resolved in Java memory, and performance suffers on large volumes. The following guidelines apply:

Attention

  • Only XPath predicates and SQL queries can benefit from index optimization.

  • Some fields and some datasets cannot be indexed, as described in section Limitations.

  • XPath predicates on a multivalued field cannot benefit from index optimization, except for the osd:search function.

  • XPath predicates using the osd:label function cannot benefit from index optimization.

If the indexes have not yet been built, additional time is required on the first access to the table, in order to build and persist them.

Accessing the table data blocks is required when a query cannot be computed against any index (whether to resolve a rule, a filter or a sort), as well as when building an index. If the table blocks are not present in memory, additional time is needed to fetch them from the database.

It is possible to get information through the monitoring and request logging categories.
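As an illustration, here is a minimal sketch of a query using an XPath predicate, which the optimizer can resolve against the persistent indexes. The field './city' is a placeholder, and a real implementation should properly escape the string literal inserted into the predicate.

    import com.onwbp.adaptation.Adaptation;
    import com.onwbp.adaptation.AdaptationTable;
    import com.onwbp.adaptation.RequestResult;

    public final class TableQueries {

        // Returns the first record matching the predicate, or null if none matches.
        public static Adaptation findByCity(AdaptationTable table, String city) {
            String predicate = "./city = '" + city + "'"; // placeholder field
            RequestResult result = table.createRequestResult(predicate);
            try {
                return result.nextAdaptation();
            } finally {
                result.close(); // always release the resources held by the result
            }
        }
    }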

Accessing and modifying a table

The following access patterns lead to poor performance and must be avoided:

Other operations on tables

New record creations and insertions depend on the primary key index. Thus, a creation is almost immediate if this index is already loaded.

Setting a fetch size

In order to improve performance, a fetch size should be set according to the expected size of the result of a request on a table. If no fetch size is set, the default value is used.
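As an illustration, here is a minimal sketch, assuming a Request.setFetchSize method on the Request API referenced in this section (verify the exact signature against your EBX® version):

    import com.onwbp.adaptation.Adaptation;
    import com.onwbp.adaptation.AdaptationTable;
    import com.onwbp.adaptation.Request;
    import com.onwbp.adaptation.RequestResult;

    public final class FetchSizeExample {

        public static void iterate(AdaptationTable table) {
            Request request = table.createRequest();
            // The expected result size is around 1 000 records (assumption).
            request.setFetchSize(1000);
            RequestResult result = request.execute();
            try {
                for (Adaptation record; (record = result.nextAdaptation()) != null; ) {
                    // process the record
                }
            } finally {
                result.close();
            }
        }
    }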

Application server

Configuration

To speed up access to persisted data, performing the native installation of ebx-lz4.jar is required.

See Data compression library for more information.

Startup

To speed up the startup of the web application server, the JAR files scanner should be configured.
