Troubleshooting
ActiveMatrix Administrator
- The Runtime State of applications is Lost Contact or Unknown
- If the Runtime State column of applications is Lost Contact or Unknown, the connection to the Enterprise Message Service server acting as the notification server and Messaging Bus has been lost.
- Action History is stuck at In Progress
-
An Action History column stuck at In Progress could indicate that:
- One or more of the pending tasks in the dialog that displays when you click the Action History link have failed, most likely due to lost communication with the notification server. The tasks will not be re-queued even after the notification server starts up.
- A node involved in that action is unavailable. When the node becomes available, the action will execute and complete.
- Failure to reconnect to the notification server
- Restart the server if you see the following message after you try to reconnect to the notification server:
Refresh Status Cache action failed , caused by: com.tibco.tibems.qin.TibQinRecoveryException: Connection to the server is failed, caused by: Connection to the server is failed, caused by: Session is closed
- Notification Server URL needs to be changed manually
- When the configured notification server fails, add another available notification server manually to the notification.xml file in the TIBCO host configuration folder. This will enable the TIBCO host to restart. However, the Administration UI continues to display the old notification server URL. Use the following steps to correct it:
- Action History shows Paused Offline
- This means that actions in Administrator are queued while runtime objects are offline and are executed when the objects come back online.
- Recover from network outages or IP address changes
- The IP address of the machine on which the Administrator server is running could change due to DHCP reconfiguration if the machine is connected to a new network after being created. To recover from communication errors that can arise from the change in IP address:
- Stop all nodes managed by the SystemHost TIBCO host instance.
- Stop the SystemHost TIBCO host instance.
- If the machine on which the Administrator server is running also hosts the Enterprise Message Service server, restart the Enterprise Message Service server.
- Start the SystemHost TIBCO host instance.
- Reconnect to EMS Server after Restarting the QIN EMS Server
- Actions such as Deploy, Undeploy, Start, or Stop performed after a QIN EMS server crash result in Error Queing Task. After the QIN EMS server is restarted, go to Admin Configuration > Admin Server > Transport Configuration and click Reconnect to EMS Server to restore the Administration action function.
- Improve the Administrator UI response time
- Create an index on the TASK table to increase the Administrator UI response time.
For example, if you use Microsoft SQL Server, create the index with the statement CREATE INDEX index-name ON task (objectURI,queueURI).
- When updating the Qin server in a large-scale setup, the verifyHostsEligibility action times out after 6 minutes
- In a large-scale setup, if the following error occurs while updating the Qin server, increase the value of httpConnectionTimeout in remote_props.properties, for example httpConnectionTimeout=3600000. The error reports the currently configured value of 360,000 milliseconds.
[AMXAdminTask] at org.apache.tools.ant.launch.Launcher.run(Launcher.java:280)
[AMXAdminTask] at org.apache.tools.ant.launch.Launcher.main(Launcher.java:109)
[AMXAdminTask] 24 Jan 2019 20:44:27 ERROR - TIBCO-AMX-CLI-000019: Error invoking action editStatusTransport on StatusTransportDetails: TIBCO-AMX-ADMIN-012945: Connection to administrator server has timed out. Try increasing the http connection timeout. Current configured value is 360,000 milliseconds. Even though the connection to Administrator server has timed out, Administrator server will continue processing the request.
BUILD FAILED
C:\Test\administrator\3.4\samples\qin_build.xml:78: TIBCO-AMX-CLI-000042: Failed on error : 'TIBCO-AMX-CLI-000019: Error invoking action editStatusTransport on StatusTransportDetails: TIBCO-AMX-ADMIN-012945: Connection to administrator server has timed out. Try increasing the http connection timeout. Current configured value is 360,000 milliseconds. Even though the connection to Administrator server has timed out, Administrator server will continue processing the request.'
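The timeout in the error above is reported in milliseconds, which is easy to misread by a factor of ten. The following minimal sketch extracts the configured value from the message text and converts it to minutes, which makes the difference between 360,000 and 3600000 explicit. The function name is hypothetical and the pattern is based only on the message wording shown above.

```python
import re

def configured_timeout_ms(message):
    """Return the timeout (ms) reported in a TIBCO-AMX-ADMIN-012945 message, or None."""
    match = re.search(r"Current configured value is ([\d,]+) milliseconds", message)
    if match is None:
        return None
    # The message prints the number with thousands separators, e.g. "360,000".
    return int(match.group(1).replace(",", ""))

error = (
    "TIBCO-AMX-ADMIN-012945: Connection to administrator server has timed out. "
    "Try increasing the http connection timeout. "
    "Current configured value is 360,000 milliseconds."
)

current_ms = configured_timeout_ms(error)
print(current_ms / 60000)   # current timeout in minutes
print(3600000 / 60000)      # timeout in minutes with httpConnectionTimeout=3600000
```

The first value corresponds to the 6 minute timeout named in the heading; the second is the timeout after the change.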
Administrator Host instances
- tibcohost.exe does not start
-
If you see an exception while starting a TIBCO Host instance that looks like this:
C:\amx\tibcohost\1.0\instances\TibcoHostInstance\HPAInstance\bin> tibcohost
[TibcoHost - START] [INFO ] com.tibco.amf.hpa.tibcohost.runtime.TibcoHost - No running TibcoHost instance found on localhost.
[TibcoHostInstance] [ERROR] com.tibco.amf.hpa.tibcohost.runtime.TibcoHost - TIBCO-AMX-TIBCOHOST-RUNTIME-103: TibcoHost: TIBCO ActiveMatrix host pingz-t400_TibcoHostInstance failed to start. Cause com.tibco.tibems.qin.TibQinException: Connection to the server is failed.
Check your Enterprise Message Service server configuration, especially if you installed Enterprise Message Service on Windows.
2009-12-17 15:09:49.954 Storage Location: 'datastore'.
2009-12-17 15:09:49.954 Routing is disabled.
2009-12-17 15:09:49.954 Authorization is disabled.
2009-12-17 15:09:49.972 Accepting connections on tcp://pingz-t400:7222.
2009-12-17 15:09:49.972 Recovering state, please wait.
2009-12-17 15:09:49.975 Server is active.
2009-12-17 15:26:01.026 WARNING: [admin@pingz-t400]: create subscriber failed: not allowed to create dynamic topic [EMSGMS.UnboundHost_amxadmin.132ba2cc_1259ef65268_-80000a699217].
2009-12-17 15:26:01.564 WARNING: [admin@pingz-t400]: create subscriber failed: not allowed to create dynamic topic [EMSGMS.UnboundHost_amxadmin.132ba2cc_1259ef65268_-80000a699217].
2009-12-17 15:26:16.355 WARNING: [admin@pingz-t400]: create subscriber failed: not allowed to create dynamic topic [EMSGMS.UnboundHost_amxadmin.7f68b7a6_1259ef68ea8_-80000a699217].
2009-12-17 15:26:16.905 WARNING: [admin@pingz-t400]: create subscriber failed: not allowed to create dynamic topic [EMSGMS.UnboundHost_amxadmin.7f68b7a6_1259ef68ea8_-80000a699217].
2009-12-17 15:26:52.138 WARNING: [admin@pingz-t400]: create subscriber failed: not allowed to create dynamic topic [EMSGMS.UnboundHost_amxadmin.-5e8ec58d_1259ef71a70_-80000a699217].
2009-12-17 15:26:52.732 WARNING: [admin@pingz-t400]: create subscriber failed: not allowed to create dynamic topic [EMSGMS.UnboundHost_amxadmin.-5e8ec58d_1259ef71a70_-80000a699217].
In this case you likely have an invalid Enterprise Message Service configuration that was created automatically by the Enterprise Message Service installer on Windows. To fix this, run the Enterprise Message Service installer and replace the default ProgramData folder that the installer fills in with a valid folder. The installer does not create missing folders, so Enterprise Message Service does not work properly when the folder does not exist.
- Disable notifications for the host and the nodes
- To disable notifications for the host and the nodes, delete the CONFIG_HOME/tibcohost/Admin-enterpriseName-adminServerName/host/configuration/notification.xml file.
- Memory guidelines for the SystemNode for enterprises with a large number of nodes
- When many nodes restart at the same time, such as after a power failure, the SystemNode will be flooded with messages and will temporarily need increased heap memory to handle this load. The maximum heap size should be set to handle peak load. Giving a heap size of 3G (-Xmx3g) will accommodate simultaneous messages from around 400 nodes hosting user applications. If your enterprise has more nodes, then the maximum heap memory size should be appropriately increased.
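The guideline above gives a single data point: 3 GB for about 400 nodes. A rough first estimate for larger enterprises can be sketched by scaling that ratio. Linear scaling is an assumption made here for illustration, not something the guideline states, and the function name and its defaults are hypothetical. Treat the result as a starting point to validate under peak load.

```python
import math

def estimated_max_heap_gb(node_count, baseline_gb=3, baseline_nodes=400):
    """Scale the documented 3 GB per ~400 nodes ratio linearly (assumption).

    Never returns less than the documented baseline.
    """
    scaled = math.ceil(baseline_gb * node_count / baseline_nodes)
    return max(baseline_gb, scaled)

for nodes in (100, 400, 800, 1000):
    # The value would be used as -Xmx<value>g for the SystemNode.
    print(nodes, "nodes -> -Xmx%dg" % estimated_max_heap_gb(nodes))
```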
- TIBCO host shows erratic behavior after waking up from hibernation
-
Sometimes the tibcohost process has problems communicating with its nodes. This happens when the machine was hibernated or suspended and then woken up. The management connections do not always reinitialize properly, leaving the connection hanging. Only a restart can solve this issue, but tibcohost may not be able to properly shut down the node processes.
Another effect is that the connection to the notification server may not initialize properly after waking from hibernation. This is especially true when the machine wakes up in a different environment from the one in which it hibernated, for example hibernating in the office and waking up at home. In this case, the IP address changes upon wakeup, which causes communication problems with connections relying on the TCP/IP stack in Java. Avoid waking the machine in a different environment, or restart with the new IP address.
- Is TIBCO Host instance connected to the right node process?
-
With the problem described in the preceding section, it can happen that a node process sticks around long after control is returned to the TIBCO Host instance. If the instance is either restarted or it is told to start the node again, it may immediately connect to the older node process that is in the process of shutting down.
To verify that the TIBCO Host instance is connected to the correct node process, the instance prints the unique identifier of the node process when it connects successfully. Compare this UUID with the UUID that the node process prints in its log file at startup. Because the UUID is unique for every run, it is easy to verify that the connection is correct.
Node process log: [DEBUG] control.internal.FrameworkImpl - framework is starting with UUID 116295c6-adea-472d-9655-1d6e305a1959
TIBCO Host instance log: [DEBUG] ProxyImpl.AMXAdministratorNode - reached node AMXAdministratorNode_116295c6-adea-472d-9655-1d6e305a1959
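The UUID comparison can be automated for long log files. The following minimal sketch assumes that the log lines have the shape of the two samples shown here. The function names are hypothetical, and the patterns have not been checked against other product versions.

```python
import re

UUID = r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
NODE_START = re.compile(r"framework is starting with UUID (" + UUID + r")")
HOST_REACHED = re.compile(r"reached node (\S+)_(" + UUID + r")")

def last_node_uuid(node_log_lines):
    """UUID of the most recent node start found in the node log, or None."""
    found = None
    for line in node_log_lines:
        match = NODE_START.search(line)
        if match:
            found = match.group(1).lower()
    return found

def last_reached_uuid(host_log_lines, node_name):
    """UUID the host most recently reported reaching for node_name, or None."""
    found = None
    for line in host_log_lines:
        match = HOST_REACHED.search(line)
        if match and match.group(1) == node_name:
            found = match.group(2).lower()
    return found

def connected_to_current_process(node_log_lines, host_log_lines, node_name):
    """True only if both logs contain a UUID and the two UUIDs are equal."""
    node_uuid = last_node_uuid(node_log_lines)
    return node_uuid is not None and node_uuid == last_reached_uuid(host_log_lines, node_name)

node_log = ["[DEBUG] control.internal.FrameworkImpl - framework is starting with UUID "
            "116295c6-adea-472d-9655-1d6e305a1959"]
host_log = ["[DEBUG] ProxyImpl.AMXAdministratorNode - reached node "
            "AMXAdministratorNode_116295c6-adea-472d-9655-1d6e305a1959"]
print(connected_to_current_process(node_log, host_log, "AMXAdministratorNode"))
```

Only the last match in each log is compared, because every restart of the node process writes a new UUID.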
- TIBCO Host instance or node does not come up on remote systems
-
When you install a TIBCO Host instance and nodes on remote systems, make sure that they are properly connected over the network. The instance and the node try to reach the Enterprise Message Service server on the configured port (7222 by default), so that port must be open on the firewall. On Windows systems in particular, this port may be blocked by default.
The same problem will occur when the node is trying to reach Administrator. Make sure that the connector is configured on an interface that is reachable over the network and the port is unblocked on the firewall.
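Before digging into the logs, it can help to confirm from the remote machine that the port is reachable at all. The following minimal sketch attempts a plain TCP connection. The function name is hypothetical, and a successful connection only shows that the port is open, not that the server behind it is healthy.

```python
import socket

def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, port_reachable("ems-host.example.com", 7222) would test the default Enterprise Message Service port; the host name here is a placeholder. The same call with the Administrator host and connector port covers the second case described above.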
Nodes
- Node runs out of memory (Java heap space)
-
When this occurs, configure the node JVM to dump a snapshot of the heap by editing the .tra file of the node and adding the following argument to java.extended.properties:
-XX:HeapDumpPath=file
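where file is the name of the file in which the binary heap dump will be written. On HotSpot JVMs this option only sets the dump location; the dump is written on an out-of-memory error only if -XX:+HeapDumpOnOutOfMemoryError is also set. The dump file can then be analyzed offline by profiling tools. The .tra file of the node is located in the folder CONFIG_HOME/tibcohost/Admin-enterpriseName-adminServerName/nodes/nodeName/bin.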
- Node does not start
-
Look at the following places to analyze the problem:
- Check the log file of the node for exceptions
- Check the node-stdout.log file of the instance for exceptions and unusual error messages, which may indicate a problem
- Check the Equinox log file, which is always written to <nodename>/configuration/123....log. Every start of the node process produces a new version of the file. Check for exceptions.
!ENTRY com.tibco.trintiy.server.credentialserver.common 4 0 2009-05-21 11:06:05.186
!MESSAGE
!STACK 0
org.osgi.framework.BundleException: The activator com.tibco.trintiy.server.credentialserver.jmx.Activator for bundle com.tibco.trintiy.server.credentialserver.common is invalid
    at org.eclipse.osgi.framework.internal.core.AbstractBundle.loadBundleActivator(AbstractBundle.java:146)
    at org.eclipse.osgi.framework.internal.core.BundleContextImpl.start(BundleContextImpl.java:980)
    at org.eclipse.osgi.framework.internal.core.BundleHost.startWorker(BundleHost.java:346)
    at org.eclipse.osgi.framework.internal.core.AbstractBundle.resume(AbstractBundle.java:355)
    at org.eclipse.osgi.framework.internal.core.Framework.resumeBundle(Framework.java:1074)
    at org.eclipse.osgi.framework.internal.core.StartLevelManager.resumeBundles(StartLevelManager.java:616)
    at org.eclipse.osgi.framework.internal.core.StartLevelManager.incFWSL(StartLevelManager.java:508)
    at org.eclipse.osgi.framework.internal.core.StartLevelManager.doSetStartLevel(StartLevelManager.java:299)
    at org.eclipse.osgi.framework.internal.core.StartLevelManager.dispatchEvent(StartLevelManager.java:489)
    at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:211)
    at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.run(EventManager.java:321)
Caused by: java.lang.ClassNotFoundException: com.tibco.trintiy.server.credentialserver.jmx.Activator
    at org.eclipse.osgi.framework.internal.core.BundleLoader.findClassInternal(BundleLoader.java:483)
    at org.eclipse.osgi.framework.internal.core.BundleLoader.findClass(BundleLoader.java:399)
    at org.eclipse.osgi.framework.internal.core.BundleLoader.findClass(BundleLoader.java:387)
    at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:87)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at org.eclipse.osgi.framework.internal.core.BundleLoader.loadClass(BundleLoader.java:315)
    at org.eclipse.osgi.framework.internal.core.BundleHost.loadClass(BundleHost.java:227)
    at org.eclipse.osgi.framework.internal.core.AbstractBundle.loadBundleActivator(AbstractBundle.java:139)
- Node does not stop after the TIBCO Host instance stop -wait true has completed
-
Occasionally, you will find that it takes several minutes for the node processes to finally disappear. Unfortunately, this may or may not be a problem and requires a closer look almost every time. In most cases, it is a normal behavior and can be explained like this:
- The node process runs an OSGi framework. There are many concurrent activities in separate threads that interact during the shutdown sequence. These include Springframework Timers, Framework Event Dispatcher, Startlevel Thread, custom extenders from TIBCO and from customers.
- Each thread is competing for the same shared resources (CPU, IO). Depending on the overall load of the system (operating system), it may take some time for threads to be scheduled and proceed. Because of interdependencies, this may delay the overall shutdown sequence.
- During shutdown, the Activator.stop() method is called for every bundle if present. Any long running or CPU/IO intensive operation performed in that implementation stalls the overall shutdown procedure. Therefore, it is essential to keep this implementation short and quick.
- As a last item of work before ending the process, the OSGi framework (Equinox in our case) persists the current state of the runtime to the disk. This includes bundles and wiring information. Depending on the number of bundles in the runtime and the availability of IO cycles, this operation may take a long time (more than 1 minute) to complete. It is essential not to disrupt this procedure; otherwise the runtime state may get corrupted and the node may not come up and function as expected.
With all or most of the possible reasons for the delays listed above, there is still the possibility of a problem with the node itself. Any process that hangs around for an excessively long time (more than 5 minutes) should be examined carefully. To diagnose the issue, open the node log files and look at the end to see where the node may have gotten stuck. A typical run ends with statements similar to this:
11 Feb 2010 18:07:08,412 [Event Dispatcher] [DEBUG] control.internal.FrameworkImpl - com.tibco.commonlogging.cbe.model stopped
11 Feb 2010 18:07:08,412 [Framework - sync] [INFO ] control.internal.FrameworkImpl - Sync thread ends.
11 Feb 2010 18:07:08,413 [Bundle Shutdown] [DEBUG] control.internal.FrameworkImpl - removing node.lck
11 Feb 2010 18:07:08,482 [Bundle Shutdown] [INFO ] stdout - Restoring STDOUT
11 Feb 2010 18:07:08,482 [Bundle Shutdown] [INFO ] stdout - Restoring STDERR
11 Feb 2010 18:07:10,968 [shutdown thread] [INFO ] control.internal.FrameworkImpl - exiting process!
11 Feb 2010 18:07:10,971 [Shutdown] [INFO ] org.mortbay.log - Shutdown hook executing
11 Feb 2010 18:07:10,971 [ Shutdown] [INFO ] org.mortbay.log - Shutdown hook complete
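Checking the end of a large node log by hand is tedious. The following minimal sketch looks for two of the statements from the typical run above in the last lines of a log. The choice of markers, the tail length, and the function names are assumptions made for illustration; other versions may log different text.

```python
# Marker strings taken from the typical shutdown excerpt above (assumption:
# they appear in the same form in the log being checked).
SHUTDOWN_MARKERS = ("removing node.lck", "exiting process!")

def clean_shutdown(log_lines, tail=20):
    """True if every marker appears somewhere in the last `tail` lines."""
    recent = list(log_lines)[-tail:]
    return all(any(marker in line for line in recent) for marker in SHUTDOWN_MARKERS)

def last_statement(log_lines):
    """Last non-empty line, which hints at where a hung node got stuck."""
    for line in reversed(list(log_lines)):
        if line.strip():
            return line.strip()
    return None

sample = [
    "11 Feb 2010 18:07:08,412 [Framework - sync] [INFO ] control.internal.FrameworkImpl - Sync thread ends.",
    "11 Feb 2010 18:07:08,413 [Bundle Shutdown] [DEBUG] control.internal.FrameworkImpl - removing node.lck",
    "11 Feb 2010 18:07:10,968 [shutdown thread] [INFO ] control.internal.FrameworkImpl - exiting process!",
]
print(clean_shutdown(sample))
print(last_statement(sample[:1]))
```

If clean_shutdown returns False for a node that has been stopping for more than 5 minutes, the line returned by last_statement is a reasonable place to start the investigation.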
- Node cannot be removed
-
This problem only exists on Windows systems and has to do with file locking. If you see a message like this in the tibcohost.log file:
AMXAdminHost 26 Feb 2010 14:35:22,458 [Job_Executor10] [ERROR] com.tibco.amf.hpa.tibcohost.runtime.TibcoHostInstance - error removing node "node2": error preparing for delete by renaming C:\MatrixDevInstall\tibcohost\1.0\instances\TibcoHostInstance\Nodes\node2 to C:\MatrixDevInstall\tibcohost\1.0\instances\TibcoHostInstance\Nodes\node2.tmp0
then Java code is trying to delete a folder that another process (Windows Explorer, a text editor with a log file open, or even the node process itself) has locked. On Windows systems, those locks must be released before the node folder can be deleted.
The tool is very helpful in finding the processes that keep holding the lock.
- TIBCO host takes a long time to start up on Linux platforms
- This may happen intermittently and is not always reproducible. The pseudo-random number generator needs to be seeded with truly random bits. Reads from the /dev/random device wait until there is data to return, and when entropy is insufficient the wait can last a long time (many minutes). To confirm that the problem is due to seeding of the pseudo-random number generator, run kill -QUIT pid or kill -3 pid. The stack trace should include com.sun.SeedGenerator. For truly random seed bits, run the rngd daemon, which reads from a hardware device and inserts verified random entropy bits into /dev/random. If fast start is more important, switch to /dev/urandom, which does not wait for random bits but reuses already returned bits. Alternatives include:
- Add the line java.properties.java.security.egd=file:/dev/./urandom to tibcohost.tra.
The .tra file of the host is located in the folder CONFIG_HOME/tibcohost/Admin-enterpriseName-adminServerName/host/bin.
- Edit $JAVA_HOME/jre/lib/security/java.security and replace securerandom.source with securerandom.source=file:/dev/./urandom.
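In addition to the thread dump, the kernel's own entropy estimate can be read on Linux from /proc/sys/kernel/random/entropy_avail. The following minimal sketch reads that value. The function name is hypothetical, and how low a value must be to explain a slow start is not stated in this guide, so the number is only a hint to be read alongside the stack trace.

```python
from pathlib import Path

ENTROPY_FILE = "/proc/sys/kernel/random/entropy_avail"

def available_entropy(path=ENTROPY_FILE):
    """Return the kernel's entropy estimate in bits, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        # File missing (not Linux), unreadable, or not a number.
        return None

print(available_entropy())
```

A value near zero while the TIBCO host is still starting is consistent with the blocking reads from /dev/random described above.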
- Errors when starting a node in a replicated environment if an external URL is used for load balancing
- If an external port is used for load balancing during replication, using the Administrator UI add to the SystemNode and SystemNodeReplica a logging configuration named org.mortbay.log with a logging appender systemnode_root with the Level set to ERROR.
- Thread blocks are observed at java.security.SecureRandom with higher concurrency
- This is the behavior of SecureRandom when securerandom.source points to /dev/random and the entropy pool is empty. To work around it:
- Stop the node.
- Modify the files as follows:
Add the following property to the java.security file at TIBCO_HOME/tibcojre64/1.8.0/lib/security:
securerandom.source=file:/dev/./urandom
Add the following property to the node .tra file (appended to java.extended.properties):
-Djava.security.egd=file:/dev/./urandom
- Restart the node.
- Problem
-
TIBCO ActiveMatrix 3.1.5 (with Oracle database) setup is upgraded to TIBCO ActiveMatrix 3.3.0 and then to TIBCO ActiveMatrix 3.4.0 successfully. When the tibcohost is restarted, the following error occurs in the SystemNode log:
[ERROR] [com.tibco.amx.platform] com.tibco.governance.mcr.aggregator.runtime.core.GovernanceAggregator - TIBCO-OGS-MCR-888025: Error in MessageProcessTask
- Workaround
-
- In the ActiveMatrix Administrator UI, navigate to Shared Objects > Resource Templates.
- Select Resource Template Type filter as Teneo and click GovernanceTeneoSharedResource.
- Navigate to the General Tab and select Data Source as payloadJdbcSharedResource.
- In the Advanced tab, verify that the property sqlCaseStrategy with value=uppercase is present. If not, create the property.
- Save the changes, and reinstall the resource instance.
- Restart the tibcohost. The above error does not occur in the SystemNode log.
- Problem
- In the same setup mentioned in the above problem (after the above mentioned workaround is performed), when creating a new Node the following error occurs in the SystemNode log:
[ERROR] [] com.tibco.amf.admin.api.amx.application.impl.ApplicationServiceUtil - TIBCO-AMX-ADMIN-012258: error while getLog4jConfigInputStream java.lang.NullPointerException
- Workaround
- Before creating a new Node, copy the DefaultLogConfig.properties file from <TIBCO_HOME>\administrator\<version>\templates\ to <CONFIG_HOME>\admin\amxadmin\private\instanceOne.
Applications
- Application deployment failures caused by resource instance failures
- When deploying an application, ActiveMatrix Administrator automatically installs resource instances if there are resource templates scoped to the application. If the resource template installation fails, application deployment also fails. For example, an HTTP connector with a port conflict fails to start. To avoid HTTP Connector port conflicts, use substitution variables to assign a different port number to each node. Then uninstall the application and redeploy it.
- Application deployment fails with the "Invalid action URI "null" is specified" error
- This error appears when the SOAP Action in the Concrete portion of a WSDL is an invalid URI (for example, it contains a space). Regenerate the DAA and deploy the project to fix the error.
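A SOAP Action value can be checked for obviously illegal characters before the DAA is regenerated. The following minimal sketch flags characters that are outside the character set allowed by RFC 3986. It is a rough approximation written for illustration, not the validation that Administrator performs, and the function name is hypothetical.

```python
import string

# Characters RFC 3986 allows in a URI: unreserved, gen-delims, sub-delims,
# and "%" as the start of a percent-encoded octet.
ALLOWED_URI_CHARS = set(
    string.ascii_letters + string.digits + "-._~" + ":/?#[]@" + "!$&'()*+,;=" + "%"
)

def invalid_uri_chars(soap_action):
    """Return the sorted list of distinct characters not allowed in a URI."""
    return sorted({ch for ch in soap_action if ch not in ALLOWED_URI_CHARS})

print(invalid_uri_chars("http://example.com/get Order"))
print(invalid_uri_chars("http://example.com/getOrder"))
```

The first call reports the space that makes the action invalid; the second returns an empty list. The example URLs are placeholders.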
- Problem
- Enabling a custom feature that was created in TIBCO Business Studio of ActiveMatrix Service Grid 3.2.0 on a Runtime Node of ActiveMatrix Service Grid 3.4.0 throws the following exception in the Node logs:
ClassNotFoundException:javax.xml.bind.JAXBException
This happens because in TIBCO ActiveMatrix Service Grid 3.4.0 some Third Party Component Library (TPCL) plug-in JARs no longer export certain packages. For example, com.tibco.tpcl.javax.osgi.factories_1.1.0.002.jar does not export the following packages:
javax.xml.bind
javax.xml.datatype
javax.xml.parsers
javax.xml.stream
javax.xml.transform
javax.xml.validation
javax.xml.xpath
- Workaround
-
- In the manifest file of the custom feature, manually add the import of package javax.xml.bind in the "Imported Packages" section.
- Remove the Required Plug-ins entry com.tibco.tpcl.javax.osgi.factories (because the plug-ins bundled with TIBCO ActiveMatrix Service Grid 3.4.0 do not export javax.xml.bind).
- Rebuild the project and re-generate DAA in Business Studio of TIBCO ActiveMatrix Service Grid 3.4.0.
- Upload the new DAA to Administrator UI > Software Management, and enable it on the Runtime Node.
Resource Templates
- HTTP Connector Acceptor Thread Count changed from 1 to 20
- When the HTTP Connector is changed from Blocking IO Socket to Non-Blocking IO Socket using the Advanced tab, the acceptor thread count in the General tab automatically changes to 1. However, the HTTP Connector instance shows 20 threads when you check the threads in the node VM using jvisualvm or a similar tool.
Issue
- Create a new HTTP Connector resource template with Blocking IO Sockets with an instance.
- Set the Acceptor Thread Count to 20.
- Click Advanced tab.
- Check the Use Non-Blocking IO Sockets box and Save.
- Click Yes to reinstall the resource instance.
- Click the General tab.
Now, the Acceptor Thread Count is changed to 1 and the Save button is enabled.
- Check the thread in the node VM.
It shows 20 threads for the HTTP Connector instead of 1.
Workaround
- Users of KeyStore provider fail to detect KeyStore refreshes
- Users of KeyStore Provider such as Identity Provider, Trust Provider, and Mutual Identity Provider initialize at startup with credentials obtained from the KeyStore. However, they fail to detect future KeyStore refreshes. In order to avoid any service failures, perform the following procedure:
- Stop dependent services.
- Stop the Subject, Trust, and Mutual Identity providers that supply the credentials.
- Stop the KeyStore provider that supplies the KeyStore containing the credentials.
- Change the login credentials of the external system.
- Change the credentials in the ActiveMatrix Administrator's hosted KeyStore.
- Restart the KeyStore Credential and Subject, Trust, and Mutual Identity providers.
- Restart the dependent services.