Copyright © TIBCO Software Inc. All Rights Reserved



Troubleshooting EMS Multicast
Multicast deployment issues are often more difficult to resolve than similar unicast issues. Reasons for the additional difficulty include:
Older networking equipment was not designed with multicast deployment in mind. For example, some switches can only flood multicast traffic, and some routers lack modern multicast routing protocols.
Different equipment may solve the same problem in different ways. For example, some switches use IGMP snooping while others use CGMP.
Bandwidth is automatically shared equitably among competing unicast streams, but administrator intervention may be required to achieve desired multicast bandwidth sharing.
Troubleshooting Tips
This section gives troubleshooting tips to help you respond to difficulties you may experience with your multicast deployment.
General Tips
If you are experiencing problems with your deployment, begin with these practices:
Work "bottom-up": get the lowest layers of the network stack working first, then move up to the EMS configuration and the application.
Begin with the EMS server and trace your way through each switch and router to all receivers. Try moving your receiving application to the same hub as the server (not a switch or a router), and confirm that you have multicast connectivity. Once that works, move on to more complicated multicast networks.
Connectivity
EMS will detect multicast connectivity issues; it may take up to 64 seconds to detect a connectivity problem. These suggestions can help resolve issues with connectivity:
Verify that the network has good unicast connectivity between the sender and all receivers before tackling multicast connectivity problems.
Test your multicast application without enabling multicast in the EMS server to determine whether a more general topic or application configuration issue is preventing message reception. For example, the consumer may be consuming on the wrong topic.
Enable multicast and topic tracing in the server to ensure proper configuration, and to verify that messages are being multicast by the server.
Ensure that you are using the proper interface(s) in the server and the multicast daemon. On a multi-homed host, it is possible that the default interface cannot receive multicast data from the server.
Ensure that the channel's ttl is large enough for data to cross all of your switches and routers.
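Before involving EMS at all, the network path itself can be probed with a plain UDP multicast sender and receiver. The following is a minimal Python sketch under stated assumptions: the group address 239.1.2.3, port 5007, and all function names are hypothetical and chosen for illustration. It uses only standard socket calls and does not speak the EMS or multicast daemon protocol, so a successful probe shows only that switches and routers forward multicast between the two hosts.

```python
import ipaddress
import socket
import struct
from typing import Optional

GROUP = "239.1.2.3"  # hypothetical test group, not an EMS channel address
PORT = 5007          # hypothetical test port


def is_multicast_group(address: str) -> bool:
    """Return True if address is an IPv4 multicast address (224.0.0.0/4)."""
    ip = ipaddress.ip_address(address)
    return ip.version == 4 and ip.is_multicast


def build_membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack the 8-byte ip_mreq structure used with IP_ADD_MEMBERSHIP."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def probe_send(group: str = GROUP, port: int = PORT, ttl: int = 1,
               interface: str = "0.0.0.0",
               payload: bytes = b"multicast-probe") -> int:
    """Send one datagram to the group; return the number of bytes sent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        # TTL bounds how many routers the datagram may cross.
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("B", ttl))
        if interface != "0.0.0.0":
            # On a multi-homed host, pin the outgoing interface explicitly.
            sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                            socket.inet_aton(interface))
        return sock.sendto(payload, (group, port))
    finally:
        sock.close()


def probe_receive(group: str = GROUP, port: int = PORT,
                  interface: str = "0.0.0.0",
                  timeout: float = 5.0) -> Optional[bytes]:
    """Join the group and wait for one datagram; return it, or None on timeout."""
    if not is_multicast_group(group):
        raise ValueError(f"{group} is not an IPv4 multicast address")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind(("", port))
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                        build_membership_request(group, interface))
        sock.settimeout(timeout)
        try:
            payload, _sender = sock.recvfrom(1024)
            return payload
        except socket.timeout:
            return None
    finally:
        sock.close()
```

To use the sketch, call probe_receive() on the receiving host, then call probe_send() on the server host. Each router decrements the TTL, and a datagram sent with a TTL of 1 stays on the local subnet, so the ttl argument should be at least one more than the number of routers on the path. On a multi-homed host, pass the address of the interface you expect multicast traffic to use.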
Data Loss
These suggestions can help if you are experiencing data loss:
Enable and check statistics to see if data is being delivered and whether excessive loss is encountered. If loss is detected, decreasing the multicast channel's maxrate property may alleviate the situation.
Make sure that multicast streams are being generated with a time to live that is long enough for messages to reach their destination using the longest-possible path through the network.
If you see increased loss as multicast rates go up, look for routers or switches that might be configured to limit the broadcast rate. These limits generally apply to multicast traffic too. For example, Cisco Catalyst 5000 series switches can be configured with the set port broadcast command to limit broadcast/multicast traffic, either in packets per second or as a percentage of traffic.
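When checking statistics for loss, what matters is whether a cumulative loss counter, such as the rcv_losses value in the multicast statistics, keeps rising between samples. The sketch below is a hypothetical Python helper, not part of EMS: it assumes you have already collected successive readings of such a counter by whatever means you use to view statistics, and its function names are chosen only for illustration.

```python
from typing import List, Sequence


def counter_deltas(samples: Sequence[int]) -> List[int]:
    """Differences between consecutive samples of a cumulative counter."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def loss_is_growing(samples: Sequence[int], min_increase: int = 1) -> bool:
    """True if the counter rose by at least min_increase in every interval.

    Returns False when fewer than two samples are available.
    """
    deltas = counter_deltas(samples)
    return bool(deltas) and all(d >= min_increase for d in deltas)
```

For example, readings of 5, 9, and 20 give deltas of 4 and 11, so loss_is_growing reports sustained loss. If that pattern stops after you lower maxrate, the server was probably sending faster than the network or the receivers could sustain.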
Application and Multicast Daemon Errors and Warnings
You may find these tips useful if you are experiencing errors in the multicast daemon or client application:
Register a multicast exception listener in the receiving application. This provides the application with a way to detect, log, and handle multicast warnings and errors.
Note that multicast events are also logged at the client if client trace is enabled on the server, but that comes at a performance price and can cause other problems. For this reason, we do not recommend using client trace outside of debugging basic connectivity issues or as directed by TIBCO support.
Typically, when consumer creation fails for a consumer on a multicast-enabled topic, a message is written to the multicast daemon's log (or console) as well as to the server log. The failing call on the client also generates an appropriate exception or return code. After eliminating causes that are not related to multicast (security, general configuration), check the following:
When the multicast daemon detects excessive loss, the multicast connection exception IO Failed is generated in the application. Usually, this means that the server is sending too fast, and maxrate for the channel needs to be decreased. The multicast daemon will report an error, similar to the following:
2007-10-02 16:45:09.551 Multicast error: channel='mcast', Loss Detected, status=IO failed
You will also notice in the multicast statistics that the particular channel's rcv_losses are growing.
A multicast exception of TIBEMS_TIMEOUT, with a message similar to "Timeout reached which may indicate a configuration or hardware problem", indicates a lack of multicast connectivity. Unicast connectivity exists between the client and server and the multicast channel was set up, but multicast data cannot get from the server to the local multicast daemon. Note that this may take more than a minute to detect.
Start a subscriber listening to $sys.monitor.multicast.stats monitoring messages to receive multicast-related statistics.
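If you collect multicast daemon logs from several hosts, it can help to tally loss errors per channel. The following Python sketch assumes that error lines follow the layout of the single sample shown above; other daemon messages may be formatted differently, and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional

# Pattern modeled on the one sample line in this section; it is an
# assumption that other multicast error lines share this layout.
MULTICAST_ERROR_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"Multicast error: channel='(?P<channel>[^']*)', "
    r"(?P<event>[^,]+), status=(?P<status>.+)$"
)


def parse_multicast_error(line: str) -> Optional[Dict[str, str]]:
    """Return timestamp, channel, event, and status, or None if no match."""
    match = MULTICAST_ERROR_RE.match(line.strip())
    return match.groupdict() if match else None


def count_errors_by_channel(lines: Iterable[str]) -> Counter:
    """Count matching multicast error lines per channel name."""
    counts: Counter = Counter()
    for line in lines:
        parsed = parse_multicast_error(line)
        if parsed is not None:
            counts[parsed["channel"]] += 1
    return counts
```

A channel whose count keeps climbing across log files is a candidate for a lower maxrate, or for a closer look at the switches and routers on its path.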
Server Errors
In general, server errors are self-descriptive. Note that client errors may be returned to the server to be logged, which gives you a centralized place to look for multicast errors. However, these errors do not include minor loss on a particular client, or loss of messages from a client failover.
