Troubleshooting
ActiveMatrix Administrator
An Action History column stuck at In Progress could indicate that:
- One or more of the pending tasks in the dialog that displays when you click the Action History link have failed, most likely due to lost communication with the notification server. The tasks cannot be re-queued even after the notification server starts up.
- A node involved in that action is unavailable. When the node becomes available, the action is executed and completed.
Refresh Status Cache action failed , caused by: com.tibco.tibems.qin.TibQinRecoveryException: Connection to the server is failed, caused by: Connection to the server is failed, caused by: Session is closed
notification.xml file in the TIBCO host configuration folder. This enables the TIBCO host to restart. However, the Administration UI continues to display the old notification server URL. Use the following steps to correct it:
- Select Admin Configuration > Admin Server.
- Change the Notification Server URL to the one you added to the notification.xml file and click Save.
- Click Reconnect to EMS Server.
- Stop all nodes managed by the SystemHost TIBCO host instance.
- Stop the SystemHost TIBCO host instance.
- If the machine on which the Administrator server is running also hosts the Enterprise Message Service server, restart the Enterprise Message Service server.
- Start the SystemHost TIBCO host instance.
For example, if you are using Microsoft SQL Server, create the index using the statement:
CREATE INDEX index-name ON task (objectURI,queueURI)
verifyHostsEligibility action times out after 6 minutes.
httpConnectionTimeout=3600000 set in remote_props.properties.
[AMXAdminTask] at org.apache.tools.ant.launch.Launcher.run(Launcher.java:280)
[AMXAdminTask] at org.apache.tools.ant.launch.Launcher.main(Launcher.java:109)
[AMXAdminTask] 24 Jan 2019 20:44:27 ERROR - TIBCO-AMX-CLI-000019: Error invoking action editStatusTransport on StatusTransportDetails: TIBCO-AMX-ADMIN-012945: Connection to administrator server has timed out. Try increasing the http connection timeout. Current configured value is 360,000 milliseconds. Even though the connection to Administrator server has timed out, Administrator server will continue processing the request.
BUILD FAILED
C:\Test\administrator\3.4\samples\qin_build.xml:78: TIBCO-AMX-CLI-000042: Failed on error : 'TIBCO-AMX-CLI-000019: Error invoking action editStatusTransport on StatusTransportDetails: TIBCO-AMX-ADMIN-012945: Connection to administrator server has timed out. Try increasing the http connection timeout. Current configured value is 360,000 milliseconds. Even though the connection to Administrator server has timed out, Administrator server will continue processing the request.'
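The timeout values in this section are given in milliseconds, which makes them easy to misread: 360,000 ms is 6 minutes, while 3600000 ms is 60 minutes. The following is a minimal sketch, not part of the product, that pulls the configured value out of a TIBCO-AMX-ADMIN-012945 message and converts it to minutes. The function names are hypothetical and the message wording is assumed to match the excerpt above.

```python
import re
from typing import Optional

# Pattern assumed from the error text quoted in this section.
_TIMEOUT_RE = re.compile(r"Current configured value is ([\d,]+) milliseconds")


def configured_timeout_ms(message: str) -> Optional[int]:
    """Return the timeout reported in the error message, or None if absent."""
    match = _TIMEOUT_RE.search(message)
    if match is None:
        return None
    # The message formats the number with thousands separators (360,000).
    return int(match.group(1).replace(",", ""))


def ms_to_minutes(milliseconds: int) -> float:
    """Convert milliseconds to minutes."""
    return milliseconds / 60_000


error_text = (
    "TIBCO-AMX-ADMIN-012945: Connection to administrator server has timed out. "
    "Try increasing the http connection timeout. "
    "Current configured value is 360,000 milliseconds."
)

current_ms = configured_timeout_ms(error_text)
print(current_ms, "ms =", ms_to_minutes(current_ms), "minutes")  # 360000 ms = 6.0 minutes
print(3_600_000, "ms =", ms_to_minutes(3_600_000), "minutes")    # 3600000 ms = 60.0 minutes
```

Comparing the value reported in the message with the value you set shows whether the client actually picked up the setting you intended.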
Administrator Host instances
- Ensure tibcohost.tra is in the same folder.
- Ensure the Java classpath in the .tra file is updated for your environment. TIBCO host is automatically configured to use the JRE version that is installed with the product.
- Ensure your Java version is JRE 1.8.0.
If you see an exception while starting a TIBCO Host instance that looks like this:
C:\amx\tibcohost\1.0\instances\TibcoHostInstance\HPAInstance\bin> tibcohost
[TibcoHost - START] [INFO ] com.tibco.amf.hpa.tibcohost.runtime.TibcoHost - No running TibcoHost instance found on localhost.
[TibcoHostInstance] [ERROR] com.tibco.amf.hpa.tibcohost.runtime.TibcoHost - TIBCO-AMX-TIBCOHOST-RUNTIME-103: TibcoHost: TIBCO ActiveMatrix host pingz-t400_TibcoHostInstance failed to start. Cause com.tibco.tibems.qin.TibQinException: Connection to the server is failed.
Check your Enterprise Message Service server configuration, especially if you installed Enterprise Message Service on Windows.
2009-12-17 15:09:49.954 Storage Location: 'datastore'.
2009-12-17 15:09:49.954 Routing is disabled.
2009-12-17 15:09:49.954 Authorization is disabled.
2009-12-17 15:09:49.972 Accepting connections on tcp://pingz-t400:7222.
2009-12-17 15:09:49.972 Recovering state, please wait.
2009-12-17 15:09:49.975 Server is active.
2009-12-17 15:26:01.026 WARNING: [admin@pingz-t400]: create subscriber failed: not allowed to create dynamic topic [EMSGMS.UnboundHost_amxadmin.132ba2cc_1259ef65268_-80000a699217].
2009-12-17 15:26:01.564 WARNING: [admin@pingz-t400]: create subscriber failed: not allowed to create dynamic topic [EMSGMS.UnboundHost_amxadmin.132ba2cc_1259ef65268_-80000a699217].
2009-12-17 15:26:16.355 WARNING: [admin@pingz-t400]: create subscriber failed: not allowed to create dynamic topic [EMSGMS.UnboundHost_amxadmin.7f68b7a6_1259ef68ea8_-80000a699217].
2009-12-17 15:26:16.905 WARNING: [admin@pingz-t400]: create subscriber failed: not allowed to create dynamic topic [EMSGMS.UnboundHost_amxadmin.7f68b7a6_1259ef68ea8_-80000a699217].
2009-12-17 15:26:52.138 WARNING: [admin@pingz-t400]: create subscriber failed: not allowed to create dynamic topic [EMSGMS.UnboundHost_amxadmin.-5e8ec58d_1259ef71a70_-80000a699217].
2009-12-17 15:26:52.732 WARNING: [admin@pingz-t400]: create subscriber failed: not allowed to create dynamic topic [EMSGMS.UnboundHost_amxadmin.-5e8ec58d_1259ef71a70_-80000a699217].
In this case, you likely have an invalid Enterprise Message Service configuration that was created automatically by the Enterprise Message Service installer on Windows. The installer does not create missing folders, and therefore Enterprise Message Service does not work properly. To fix this, run the Enterprise Message Service installer and replace the installer-filled default ProgramData folder with a valid folder.
Sometimes the TIBCO host process has problems communicating with its nodes. This happens when the machine was hibernated or suspended and woken up afterwards. The management connections do not always reinitialize properly, leaving the connection 'hanging'. Only a restart can solve this issue, but TIBCO host may not be able to properly shut down the node processes.
A related effect is that the connection to the notification server may not initialize properly after waking from hibernation. This is especially true when the machine wakes up in a different environment from the one in which it was hibernated, for example, hibernated in the office and woken up at home. In this case, the IP address changes upon wake-up, which causes communication problems for connections that rely on the TCP/IP stack in Java. Avoid waking the machine in a different environment, or restart with the new IP address.
With the problem described in the preceding section, it can happen that a node process sticks around long after control is returned to the TIBCO Host instance. If the instance is either restarted or it is told to start the node again, it may immediately connect to the older node process that is in the process of shutting down.
So that you can verify that the TIBCO Host instance is connected to the correct node process, the instance prints the unique identifier (UUID) of the node process when it connects successfully. Compare this UUID to the UUID printed in the node process log file upon startup. Because the UUID is unique for every run, it is easy to verify that the connection is correct.
Node process log:
[DEBUG] control.internal.FrameworkImpl - framework is starting with UUID 116295c6-adea-472d-9655-1d6e305a1959
TIBCO Host instance log:
[DEBUG] ProxyImpl.AMXAdministratorNode - reached node AMXAdministratorNode_116295c6-adea-472d-9655-1d6e305a1959
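The comparison described above can be scripted. The following is a minimal sketch, assuming the log lines have the same shape as the two samples above (the phrases "framework is starting with UUID" and "reached node" are taken from those samples); the function names are hypothetical.

```python
import re
from typing import Optional

# Standard 8-4-4-4-12 hexadecimal UUID layout, as seen in the sample log lines.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def node_startup_uuid(node_log_line: str) -> Optional[str]:
    """UUID announced by the node process when its framework starts."""
    if "framework is starting with UUID" not in node_log_line:
        return None
    match = UUID_RE.search(node_log_line)
    return match.group(0).lower() if match else None


def host_reached_uuid(host_log_line: str) -> Optional[str]:
    """UUID of the node process that the TIBCO Host instance connected to."""
    if "reached node" not in host_log_line:
        return None
    match = UUID_RE.search(host_log_line)
    return match.group(0).lower() if match else None


node_line = (
    "[DEBUG] control.internal.FrameworkImpl - framework is starting with UUID "
    "116295c6-adea-472d-9655-1d6e305a1959"
)
host_line = (
    "[DEBUG] ProxyImpl.AMXAdministratorNode - reached node "
    "AMXAdministratorNode_116295c6-adea-472d-9655-1d6e305a1959"
)

same_process = node_startup_uuid(node_line) == host_reached_uuid(host_line)
print("connected to the current node process:", same_process)  # True
```

If the two values differ, the instance is most likely still attached to an older node process that has not finished shutting down.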
When installing a TIBCO Host instance and some nodes on remote systems, make sure that they are properly connected over the network. The instance and the node try to reach the Enterprise Message Service server on the configured port (7222 by default), so that port must be open on the firewall. On Windows systems in particular, this port may be blocked by default.
The same problem occurs when the node is trying to reach Administrator. Make sure that the connector is configured on an interface that is reachable over the network and the port is unblocked on the firewall.
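A quick way to rule out a blocked port is to attempt a plain TCP connection from the remote machine. The following is a minimal sketch using only the Python standard library; the function name is hypothetical, and a successful TCP connection only shows that the port is reachable, not that the server behind it is configured correctly.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP socket.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, calling port_reachable with the host name of your Enterprise Message Service server and port 7222 from the machine that runs the TIBCO Host instance returns False when a firewall drops or rejects the connection. The same check applies to the Administrator connector port.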
Nodes
When this occurs, configure the node JVM to dump a snapshot of the heap by editing the .tra file of the node and adding the following argument to java.extended.properties:
-XX:HeapDumpPath=file
where file is the name of the file in which the binary heap dump is written. The dump file can then be analyzed offline by profiling tools.
The .tra file of the node is located in the folder CONFIG_HOME/tibcohost/Admin-enterpriseName-adminServerName/nodes/nodeName/bin.
Look at the following places to analyze the problem:
- Check the log file of the node for exceptions
- Check the node-stdout.log file of the instance for exceptions and unusual error messages, which may indicate a problem
- Check the Equinox log file, which is always written to <nodename>/configuration/123....log. Every start of the node process produces a new version of the file. Check for exceptions.
!ENTRY com.tibco.trintiy.server.credentialserver.common 4 0 2009-05-21 11:06:05.186
!MESSAGE
!STACK 0
org.osgi.framework.BundleException: The activator com.tibco.trintiy.server.credentialserver.jmx.Activator for bundle com.tibco.trintiy.server.credentialserver.common is invalid
at org.eclipse.osgi.framework.internal.core.AbstractBundle.loadBundleActivator(AbstractBundle.java:146)
at org.eclipse.osgi.framework.internal.core.BundleContextImpl.start(BundleContextImpl.java:980)
at org.eclipse.osgi.framework.internal.core.BundleHost.startWorker(BundleHost.java:346)
at org.eclipse.osgi.framework.internal.core.AbstractBundle.resume(AbstractBundle.java:355)
at org.eclipse.osgi.framework.internal.core.Framework.resumeBundle(Framework.java:1074)
at org.eclipse.osgi.framework.internal.core.StartLevelManager.resumeBundles(StartLevelManager.java:616)
at org.eclipse.osgi.framework.internal.core.StartLevelManager.incFWSL(StartLevelManager.java:508)
at org.eclipse.osgi.framework.internal.core.StartLevelManager.doSetStartLevel(StartLevelManager.java:299)
at org.eclipse.osgi.framework.internal.core.StartLevelManager.dispatchEvent(StartLevelManager.java:489)
at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:211)
at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.run(EventManager.java:321)
Caused by: java.lang.ClassNotFoundException: com.tibco.trintiy.server.credentialserver.jmx.Activator
at org.eclipse.osgi.framework.internal.core.BundleLoader.findClassInternal(BundleLoader.java:483)
at org.eclipse.osgi.framework.internal.core.BundleLoader.findClass(BundleLoader.java:399)
at org.eclipse.osgi.framework.internal.core.BundleLoader.findClass(BundleLoader.java:387)
at org.eclipse.osgi.internal.baseadaptor.DefaultClassLoader.loadClass(DefaultClassLoader.java:87)
at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
at org.eclipse.osgi.framework.internal.core.BundleLoader.loadClass(BundleLoader.java:315)
at org.eclipse.osgi.framework.internal.core.BundleHost.loadClass(BundleHost.java:227)
at org.eclipse.osgi.framework.internal.core.AbstractBundle.loadBundleActivator(AbstractBundle.java:139)
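When the Equinox log is long, it can help to list only the failing bundle and the root cause of each entry. The following is a rough sketch, assuming the log file keeps each !ENTRY marker and each "Caused by:" line at the start of its own line, as in the entry above; the function name and the shortened sample text are illustrative only.

```python
from typing import List, Tuple


def root_causes(log_text: str) -> List[Tuple[str, str]]:
    """Return (bundle name, root cause) pairs found in Equinox log text."""
    findings = []
    current_bundle = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        if line.startswith("!ENTRY"):
            # The bundle symbolic name is the first token after the marker.
            parts = line.split()
            current_bundle = parts[1] if len(parts) > 1 else None
        elif line.startswith("Caused by:") and current_bundle is not None:
            findings.append((current_bundle, line[len("Caused by:"):].strip()))
    return findings


# Shortened sample in the shape of the entry shown above.
sample = """\
!ENTRY com.tibco.trintiy.server.credentialserver.common 4 0 2009-05-21 11:06:05.186
!MESSAGE
!STACK 0
org.osgi.framework.BundleException: The activator is invalid
at org.eclipse.osgi.framework.internal.core.AbstractBundle.loadBundleActivator(AbstractBundle.java:146)
Caused by: java.lang.ClassNotFoundException: com.tibco.trintiy.server.credentialserver.jmx.Activator
"""

for bundle, cause in root_causes(sample):
    print(bundle, "->", cause)
```

In the sample, the root cause is a ClassNotFoundException for the bundle activator, which indicates that the class is missing from the bundle or cannot be loaded.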
Occasionally, node processes take several minutes to finally disappear. This may or may not indicate a problem, and it requires a closer look almost every time. In most cases, it is normal behavior and can be explained as follows:
- The node process runs an OSGi framework. There are many concurrent activities in separate threads that interact during the shutdown sequence. These include Springframework Timers, Framework Event Dispatcher, Startlevel Thread, custom extenders from TIBCO and from customers.
- Each thread is competing for the same shared resources (CPU, IO). Depending on the overall load of the system (operating system), it may take some time for threads to be scheduled and proceed. Because of interdependencies, this may delay the overall shutdown sequence.
- During shutdown, the Activator.stop() method is called for every bundle if present. Any long-running or CPU/IO-intensive operation performed in that implementation stalls the overall shutdown procedure. Therefore, it is essential to keep this implementation short and quick.
- As a last item of work before ending the process, the OSGi framework (Equinox in our case) persists the current state of the runtime to the disk. This includes bundles and wiring information. Depending on the number of bundles in the runtime and the availability of IO cycles, this operation may take a long time (more than 1 minute) to complete. It is essential not to disrupt this procedure, or else the runtime state may get corrupted and the node may not come up and function as expected.
Even with the reasons for delays listed above, there is still the possibility of a problem with the node itself. Any process that hangs around for an excessively long time, that is, more than 5 minutes, should be examined carefully. To diagnose the issue, open the node log files and look at the end to see where the node may have gotten stuck. A typical run ends with statements similar to this:
11 Feb 2010 18:07:08,412 [Event Dispatcher] [DEBUG] control.internal.FrameworkImpl - com.tibco.commonlogging.cbe.model stopped
11 Feb 2010 18:07:08,412 [Framework - sync] [INFO ] control.internal.FrameworkImpl - Sync thread ends.
11 Feb 2010 18:07:08,413 [Bundle Shutdown] [DEBUG] control.internal.FrameworkImpl - removing node.lck
11 Feb 2010 18:07:08,482 [Bundle Shutdown] [INFO ] stdout - Restoring STDOUT
11 Feb 2010 18:07:08,482 [Bundle Shutdown] [INFO ] stdout - Restoring STDERR
11 Feb 2010 18:07:10,968 [shutdown thread] [INFO ] control.internal.FrameworkImpl - exiting process!
11 Feb 2010 18:07:10,971 [Shutdown] [INFO ] org.mortbay.log - Shutdown hook executing
11 Feb 2010 18:07:10,971 [ Shutdown] [INFO ] org.mortbay.log - Shutdown hook complete
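To check the end of a node log without reading it by hand, a short script can look for the final shutdown statement and report the last line written. The following is a minimal sketch, assuming that "exiting process!" marks a completed shutdown as in the sample above; the marker may differ between versions, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

# Marker assumed from the sample log; adjust it if your logs differ.
EXIT_MARKER = "exiting process!"


def shutdown_status(log_lines: List[str], tail: int = 50) -> Tuple[bool, Optional[str]]:
    """Inspect the last `tail` lines of a node log.

    Returns (completed, last_line): whether the exit marker was seen, and the
    last non-empty line, which shows where the node stopped logging.
    """
    recent = [line.rstrip() for line in log_lines[-tail:] if line.strip()]
    completed = any(EXIT_MARKER in line for line in recent)
    last_line = recent[-1] if recent else None
    return completed, last_line


sample_lines = [
    "11 Feb 2010 18:07:08,413 [Bundle Shutdown] [DEBUG] control.internal.FrameworkImpl - removing node.lck",
    "11 Feb 2010 18:07:10,968 [shutdown thread] [INFO ] control.internal.FrameworkImpl - exiting process!",
    "11 Feb 2010 18:07:10,971 [Shutdown] [INFO ] org.mortbay.log - Shutdown hook executing",
]

completed, last_line = shutdown_status(sample_lines)
print("shutdown completed:", completed)
print("last statement:", last_line)
```

If the marker is missing and the process is still running after several minutes, the last statement is the place to start investigating.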
This problem only exists on Windows systems and has to do with file locking. If you see a message like this in the tibcohost.log file:
AMXAdminHost 26 Feb 2010 14:35:22,458 [Job_Executor10] [ERROR] com.tibco.amf.hpa.tibcohost.runtime.TibcoHostInstance - error removing node "node2": error preparing for delete by renaming C:\MatrixDevInstall\tibcohost\1.0\instances\TibcoHostInstance\Nodes\node2 to C:\MatrixDevInstall\tibcohost\1.0\instances\TibcoHostInstance\Nodes\node2.tmp0
then the Java code tried to delete a folder on which another process (Windows Explorer, a text editor with a log file open, or even the node process) holds a lock. On Windows systems, those locks have to be released before the node folder can be deleted.
The tool is very helpful in finding the processes that keep holding the lock.
- Add the line java.properties.java.security.egd=file:/dev/./urandom to tibcohost.tra. The .tra file of the host is located in the folder CONFIG_HOME/tibcohost/Admin-enterpriseName-adminServerName/host/bin.
- Edit $JAVA_HOME/jre/lib/security/java.security and replace securerandom.source with securerandom.source=file:/dev/./urandom.
SystemNode and SystemNodeReplica a logging configuration named org.mortbay.log with a logging appender systemnode_root with the Level set to ERROR.
- Stop the node.
- Modify the files as mentioned below:
Add the following property to the java.security file at TIBCO_HOME/tibcojre64/1.8.0/lib/security:
securerandom.source=file:/dev/./urandom
Add the following property to the node .tra file (appended to java.extended.properties):
-Djava.security.egd=file:/dev/./urandom
- Restart the node.
A TIBCO ActiveMatrix 3.1.5 setup (with an Oracle database) is upgraded to TIBCO ActiveMatrix 3.3.0 and then to TIBCO ActiveMatrix 3.4.0 successfully. When the tibcohost is restarted, the following error occurs in the SystemNode log:
[ERROR] [com.tibco.amx.platform] com.tibco.governance.mcr.aggregator.runtime.core.GovernanceAggregator - TIBCO-OGS-MCR-888025: Error in MessageProcessTask
- In the ActiveMatrix Administrator UI, navigate to Shared Objects > Resource Templates.
- Select Resource Template Type filter as Teneo and click GovernanceTeneoSharedResource.
- Navigate to the General Tab and select Data Source as payloadJdbcSharedResource.
- In the Advanced tab, verify that the property sqlCaseStrategy with value=uppercase is present. If not, create the property.
- Save the changes, and reinstall the resource instance.
- Restart the tibcohost. The above error no longer occurs in the SystemNode log.
[ERROR] [] com.tibco.amf.admin.api.amx.application.impl.ApplicationServiceUtil - TIBCO-AMX-ADMIN-012258: error while getLog4jConfigInputStream java.lang.NullPointerException
Applications
ClassNotFoundException:javax.xml.bind.JAXBException
This occurs because, in TIBCO ActiveMatrix Service Grid 3.4.0, some Third Party Component Library (TPCL) plug-in JARs no longer export certain packages. For example, com.tibco.tpcl.javax.osgi.factories_1.1.0.002.jar does not export the following packages:
javax.xml.bind
javax.xml.datatype
javax.xml.parsers
javax.xml.stream
javax.xml.transform
javax.xml.validation
javax.xml.xpath
- In the manifest file of the custom feature, manually add the import of package javax.xml.bind in the "Imported Packages" section.
- Remove com.tibco.tpcl.javax.osgi.factories from the Required Plug-ins (because the plug-ins bundled with TIBCO ActiveMatrix Service Grid 3.4.0 do not export javax.xml.bind).
- Rebuild the project and regenerate the DAA in Business Studio of TIBCO ActiveMatrix Service Grid 3.4.0.
- Upload the new DAA to Administrator UI > Software Management, and enable it on the Runtime Node.
Resource Templates
Issue
- Create a new HTTP Connector resource template with Blocking IO Sockets with an instance.
- Set the Acceptor Thread Count to 20.
- Click Advanced tab.
- Check the Use Non-Blocking IO Sockets box and Save.
- Click Yes to reinstall the resource instance.
- Click the General tab.
Now, the Acceptor Thread Count is changed to 1 and the Save button is enabled.
- Check the thread in the node VM.
It shows 20 threads for the HTTP Connector instead of 1.
Workaround
- Click General and click Save.
- Click Yes to reinstall the resource instance.
The Acceptor Thread Count now shows 1 in the node VM for the HTTP Connector instance.
- Stop dependent services.
- Stop Subject, Trust, and Mutual Identity providers that supply the credentials.
- Stop KeyStore provider that supplies the KeyStore containing the credentials.
- Change login credentials of external system.
- Change the credentials in the ActiveMatrix Administrator's hosted KeyStore.
- Restart the KeyStore Credential and Subject, Trust, and Mutual Identity providers.
- Restart the dependent services.