This page lists various network-related issues that may be encountered with service discovery, how to diagnose them, and how to resolve them.
A node requires service discovery to function properly, at least locally on the machine where the node is installed.
If another program has the service discovery UDP port opened exclusively, node installation fails with a message similar to:
$ epadmin install node --nodename mynode.mycluster
[mynode.mycluster] Installing node
[mynode.mycluster] DEVELOPMENT executables
[mynode.mycluster] File shared memory
[mynode.mycluster] 4 concurrent allocation segments
[mynode.mycluster] Host name fig.local
[mynode.mycluster] Starting node services
[mynode.mycluster] Loading node configuration
[mynode.mycluster] Auditing node security
Service discovery verification failed: Could not start the discovery service listener
on port 54321, network error: SWSocket::initServer:
Call to 'bind' failed: Address already in use [errno:98].
Resolution: either choose another port for service discovery using the --discoveryport option (see epadmin-node(1)), or find and terminate the program that is using the port.
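To check from outside epadmin whether the discovery port is already held by another program, a bind probe can help. The following is a minimal Python sketch, not part of the product; probe_udp_port is a hypothetical helper name, and 54321 is simply the default port seen in the examples on this page. It reports the errno name that bind() returns, which corresponds to the errno shown in the epadmin error text (98 is EADDRINUSE on Linux).

```python
import errno
import socket
from typing import Optional, Tuple

DEFAULT_DISCOVERY_PORT = 54321  # default port shown in the examples on this page


def probe_udp_port(port: int, host: str = "0.0.0.0") -> Tuple[bool, Optional[str]]:
    """Try to bind a UDP socket to (host, port), then release it immediately.

    Returns (True, None) if the bind succeeded, otherwise (False, name),
    where name is the errno symbol reported by bind(), for example
    "EADDRINUSE" (port held by another program) or "EACCES" (privileged port).
    SO_REUSEADDR is deliberately not set, so any existing binding is reported.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
        return True, None
    except OSError as exc:
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        sock.close()
```

For example, probe_udp_port(DEFAULT_DISCOVERY_PORT) returning (False, "EADDRINUSE") matches the failure above. To identify the owning process, ss -ulpn on Linux or lsof -i UDP:54321 on Linux and macOS are typically used (seeing other users' processes may require root).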
If an invalid port number is specified for service discovery using the --discoveryport option (see epadmin-node(1)), node installation will fail with a message similar to:
$ epadmin install node --nodename mynode.mycluster --discoveryport 1
[mynode.mycluster] Installing node
[mynode.mycluster] DEVELOPMENT executables
[mynode.mycluster] File shared memory
[mynode.mycluster] 4 concurrent allocation segments
[mynode.mycluster] Host name fig.local
[mynode.mycluster] Starting node services
[mynode.mycluster] Loading node configuration
[mynode.mycluster] Auditing node security
Service discovery verification failed: Could not start the discovery service listener
on port 1, network error: SWSocket::initServer: Call to 'bind' failed:
Permission denied [errno:13].
Resolution: choose another unused UDP port in the range of 1024 to 65535.
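Ports below 1024 are privileged on most Unix-like systems, which is why binding port 1 fails with Permission denied for a non-root user. A deployment script could reject such values before calling epadmin. The following minimal Python sketch only validates the range given in the resolution; check_discovery_port is a hypothetical name, and the check does not detect a port that is already in use by another program.

```python
from typing import Optional

MIN_DISCOVERY_PORT = 1024   # lower bound from the resolution above
MAX_DISCOVERY_PORT = 65535  # highest valid UDP port number


def check_discovery_port(port: int) -> Optional[str]:
    """Return an error message if port is outside 1024-65535, else None."""
    if not MIN_DISCOVERY_PORT <= port <= MAX_DISCOVERY_PORT:
        return (f"discovery port {port} is outside the range "
                f"{MIN_DISCOVERY_PORT}-{MAX_DISCOVERY_PORT}")
    return None
```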
If the name of the node being installed (either the default name or one specified using --nodename) is already in use by another node using the same service discovery port, node installation fails with a message similar to:
$ epadmin install node --nodename mynode.mycluster
[mynode.mycluster] Installing node
install of node mynode.mycluster using discovery port 54321 failed: the service name is
already in use by service address fig.local:35883
Resolution: either stop and remove the other node, or choose a different node name. See epadmin-node(1).
Service discovery uses UDP broadcast packets for discovery requests, and unicast (socket-to-socket) UDP packets for responses. If these packets are filtered or dropped by the operating system, or by routers between the node and epadmin, discovery requests do not reach the discovery server running within the node, or its responses do not reach epadmin.
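The request/response pattern described above can be illustrated with plain UDP sockets. This is a minimal Python sketch under stated assumptions, not the product's implementation: the payloads are placeholders rather than the real discovery PDUs, and serve_one and discover are hypothetical names. A real discovery server would bind the wildcard address on the discovery port so that it receives broadcasts; the example below uses loopback only.

```python
import socket
import threading


def serve_one(sock: socket.socket, reply: bytes) -> None:
    """Server side: wait for one request and answer the sender directly."""
    _request, sender = sock.recvfrom(2048)
    # The response is a unicast datagram sent to the request's source
    # address and port, not another broadcast.
    sock.sendto(reply, sender)


def discover(target: str, port: int, request: bytes, timeout: float = 1.0) -> bytes:
    """Client side: send one request (possibly to a broadcast address) and
    wait up to `timeout` seconds for a reply; raises socket.timeout if none."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Required before sending to a broadcast address such as
        # 10.240.6.255 or 255.255.255.255.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.settimeout(timeout)
        sock.sendto(request, (target, port))
        response, _ = sock.recvfrom(2048)
        return response


if __name__ == "__main__":
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # ephemeral port, loopback only
    worker = threading.Thread(target=serve_one, args=(server, b"pong"))
    worker.start()
    print(discover("127.0.0.1", server.getsockname()[1], b"ping"))
    worker.join()
    server.close()
```

The sketch shows why filtering in either direction produces the same symptom: if the inbound request is dropped, the server never answers; if the unicast reply is dropped, the client times out. Either way the client sees no results.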
When UDP packets are being filtered or dropped on the machine where the node is installed, epadmin install node will fail with a message similar to:
$ epadmin install node --nodename mynode.mycluster
[mynode.mycluster] Installing node
[mynode.mycluster] DEVELOPMENT executables
[mynode.mycluster] File shared memory
[mynode.mycluster] 4 concurrent allocation segments
[mynode.mycluster] Host name fig.local
[mynode.mycluster] Starting node services
[mynode.mycluster] Loading node configuration
[mynode.mycluster] Auditing node security
Service discovery verification failed: Service discovery did not find any results
Resolution: ensure that UDP packets are not being filtered on the port being used by the discovery service.
During node installation, service discovery is verified locally, and a verification failure causes the installation to fail (see Node Installation Failures).
Service discovery verification can also be run as a stand-alone epadmin command, with or without any nodes installed, using the epadmin verify services command.
By default, the epadmin verify services command runs both a discovery server and a discovery client locally. The client makes a request and verifies that it receives the expected response.
$ epadmin verify services
Service discovery is functioning properly locally.
The verification server may be run independently of the client. Run the server in one terminal:
$ epadmin verify services --mode server
Service discovery server started. Interrupt to exit.
Note
The server does not return until interrupted.
In another terminal, run a verification client:
$ epadmin verify services --mode client
Service discovery is functioning properly locally.
Note
The verification client may be run multiple times using the same verification server.
To verify service discovery between two machines, start the server on one machine:
$ hostname
fig.local
$ epadmin verify services --mode server
Service discovery server started. Interrupt to exit.
Run the client on another machine:
$ hostname
mulberry.local
$ epadmin verify services --mode client
Service discovery is functioning properly locally.
The --discoveryport, --discoveryhosts, and --discoverytimeout global options are honored by the epadmin verify services command. See epadmin-globals(1).
The --debug global option also affects the verify services command; its use is shown in the next section.
The --debug global option enables the output of debug tracing for the service discovery verification server and client.
Start the verification server with the --debug global option:
$ epadmin --debug verify services --mode server
2018-05-23 15:02:19.159215|DSV|INFO |5214|discovery.cpp(288)|SWDiscovery::Discovery for
service x.y.zz.y, type test-type, address test-address, started on port 54321
Service discovery server started. Interrupt to exit.
The trace indicates that the server has successfully started listening on the default discovery port (54321) and is serving the test service x.y.zz.y (type test-type, address test-address).
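Each trace entry appears to be a pipe-separated record: a timestamp, a component tag (DSV), a level, a numeric field that looks like the process id, the source location, and the message. That layout is inferred from the examples on this page, not from a documented format. When traces are saved to files for comparison, a helper like the following minimal Python sketch can split an entry into fields; parse_trace_line and the field names are hypothetical, and it assumes each entry is on a single line (the examples here are wrapped for display).

```python
from typing import NamedTuple


class TraceLine(NamedTuple):
    timestamp: str
    component: str  # e.g. "DSV"
    level: str      # e.g. "INFO", "DEBUG"
    pid: int        # assumed to be the process id
    source: str     # e.g. "discovery.cpp(288)"
    message: str


def parse_trace_line(line: str) -> TraceLine:
    """Split one trace entry on its first five '|' separators."""
    parts = line.split("|", 5)
    if len(parts) != 6:
        raise ValueError(f"not a discovery trace line: {line!r}")
    ts, comp, level, pid, source, message = (p.strip() for p in parts)
    return TraceLine(ts, comp, level, int(pid), source, message)
```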
Run the verification client with the --debug global option:
$ epadmin --debug verify services --mode client
2018-05-23 15:07:41.095600|DSV|DEBUG|6225|client.cpp(351)|Client sending:
PDU:A5:2:DiscoverServicesRequest:3:10.240.6.255/6225/3/0:x.y.zz.y:test-type::
on 10.240.6.255:54321
2018-05-23 15:07:41.095656|DSV|DEBUG|6225|client.cpp(351)|Client sending:
PDU:A5:2:DiscoverServicesRequest:3:255.255.255.255/6225/4/0:x.y.zz.y:test-type::
on 255.255.255.255:54321
2018-05-23 15:07:41.095818|DSV|DEBUG|6225|client.cpp(674)|Client getResults matched
response: PDU:A5:2:DiscoverServicesResponse:
4:10.240.6.255/6225/3/0:x.y.zz.y:test-type:test-address: from 10.240.6.56:46180
Service discovery is functioning properly locally.
The trace shows the client sending two service discovery broadcast requests, looking for the service name (x.y.zz.y) and the service type (test-type). The first request is sent on the broadcast address for the current host name (in this case 10.240.6.255), and the second is sent to the limited broadcast address (255.255.255.255).
2018-05-23 15:07:41.095600|DSV|DEBUG|6225|client.cpp(351)|Client sending:
PDU:A5:2:DiscoverServicesRequest:3:10.240.6.255/6225/3/0:x.y.zz.y:test-type::
on 10.240.6.255:54321
2018-05-23 15:07:41.095656|DSV|DEBUG|6225|client.cpp(351)|Client sending:
PDU:A5:2:DiscoverServicesRequest:3:255.255.255.255/6225/4/0:x.y.zz.y:test-type::
on 255.255.255.255:54321
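A subnet (directed) broadcast address is the network address with all host bits set. Assuming the host 10.240.6.56 seen in these traces is on a /24 network (the prefix length is not shown on this page), the following minimal Python sketch reproduces the 10.240.6.255 address with the standard ipaddress module; subnet_broadcast is a hypothetical helper name.

```python
import ipaddress


def subnet_broadcast(interface: str) -> str:
    """Return the directed broadcast address for an 'address/prefix' string."""
    return str(ipaddress.ip_interface(interface).network.broadcast_address)


if __name__ == "__main__":
    print(subnet_broadcast("10.240.6.56/24"))  # assumed /24 prefix
```

In general, routers never forward packets sent to the limited broadcast address 255.255.255.255, and many do not forward directed broadcasts by default either, which is one reason discovery requests may fail to reach a node on another subnet.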
The client trace then shows that it received a response from the verification server:
2018-05-23 15:07:41.095818|DSV|DEBUG|6225|client.cpp(674)|Client getResults matched
response: PDU:A5:2:DiscoverServicesResponse:
4:10.240.6.255/6225/3/0:x.y.zz.y:test-type:test-address: from 10.240.6.56:46180
The verification server terminal shows the server receiving the two requests and sending responses to each of them:
2018-05-23 15:07:41.095708|DSV|DEBUG|6186|discovery.cpp(412)|Discovery test-address received:
PDU:A5:2:DiscoverServicesRequest:3:10.240.6.255/6225/3/0:x.y.zz.y:test-type:: from 10.240.6.56:38912
2018-05-23 15:07:41.095730|DSV|DEBUG|6186|util.cpp(365)|Discovery sending response to 10.240.6.56:38912 :
PDU:A5:2:DiscoverServicesResponse:
4:10.240.6.255/6225/3/0:x.y.zz.y:test-type:test-address:
2018-05-23 15:07:41.095793|DSV|DEBUG|6186|discovery.cpp(412)|Discovery test-address received:
PDU:A5:2:DiscoverServicesRequest:3:255.255.255.255/6225/4/0:x.y.zz.y:test-type:: from 10.240.6.56:47951
2018-05-23 15:07:41.095805|DSV|DEBUG|6186|util.cpp(365)|Discovery sending response to 10.240.6.56:47951 :
PDU:A5:2:DiscoverServicesResponse:
4:255.255.255.255/6225/4/0:x.y.zz.y:test-type:test-address:
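These traces suggest that each response echoes an identifier from the request it answers (for example, 10.240.6.255/6225/3/0 appears in both the first request and the first response), which is how the request/response pairs line up across the two terminals. The field layout used below is inferred from these examples only, not from a published format; parse_pdu, answers, and the field names are hypothetical. This minimal Python sketch parses a PDU string copied from a trace and checks whether a response belongs to a request.

```python
import re
from typing import NamedTuple


class Pdu(NamedTuple):
    kind: str          # e.g. "DiscoverServicesRequest" or "DiscoverServicesResponse"
    request_id: str    # e.g. "10.240.6.255/6225/3/0", echoed in the response
    service_name: str  # empty when the request does not name a service
    service_type: str
    address: str       # empty in requests


def parse_pdu(text: str) -> Pdu:
    """Parse a PDU copied from a trace (inferred layout, see above)."""
    # Undo whitespace introduced after ':' when long trace lines are wrapped.
    text = re.sub(r":\s+", ":", text.strip())
    # Split only the leading fields, so the trailing address may contain ':'.
    parts = text.split(":", 8)
    if len(parts) != 9 or parts[0] != "PDU":
        raise ValueError(f"unrecognized PDU: {text!r}")
    kind, request_id, name, stype, rest = parts[3], parts[5], parts[6], parts[7], parts[8]
    return Pdu(kind, request_id, name, stype, rest.rstrip(":"))


def answers(request: Pdu, response: Pdu) -> bool:
    """True if the response carries the request's identifier (heuristic)."""
    return response.request_id == request.request_id
```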
In the example above, the client returned successfully after receiving the first matching response, because its request contained a fully qualified node name (see the Service Names section of the Spotfire Streaming Architects Guide).
When the service discovery request does not specify a fully qualified node name, the client waits for the full discovery timeout period (default: 1 second; see --discoverytimeout in epadmin-globals(1)). The client also discards duplicate responses, as the following trace shows. It runs a standard epadmin display services command against the verification server still running from the previous example.
$ epadmin --debug display services --servicetype test-type
2018-05-23 18:30:09.331217|DSV|DEBUG|13417|client.cpp(351)|Client sending:
PDU:A5:2:DiscoverServicesRequest:2:10.240.6.255/13417/3/0::test-type::
on 10.240.6.255:54321
2018-05-23 18:30:09.331250|DSV|DEBUG|13417|client.cpp(351)|Client sending:
PDU:A5:2:DiscoverServicesRequest:2:255.255.255.255/13417/4/0::test-type::
on 255.255.255.255:54321
2018-05-23 18:30:09.331314|DSV|DEBUG|13417|client.cpp(674)|Client getResults matched
response: PDU:A5:2:DiscoverServicesResponse:
4:10.240.6.255/13417/3/0:x.y.zz.y:test-type:test-address: from 10.240.6.56:37287
2018-05-23 18:30:09.331375|DSV|DEBUG|13417|client.cpp(674)|Client getResults matched
response: PDU:A5:2:DiscoverServicesResponse:
4:255.255.255.255/13417/4/0:x.y.zz.y:test-type:test-address: from 10.240.6.56:46595
2018-05-23 18:30:09.331382|DSV|DEBUG|13417|results.cpp(300)|Discarding duplicate response
from x.y.zz.y:test-type:test-address
Service Name = x.y.zz.y
Service Type = test-type
Network Address = dtm://test-address
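The collection behavior described above (return after the first match for a fully qualified node name, otherwise wait for the full timeout, and keep one copy of identical results) can be modeled without sockets. This is a minimal Python simulation, assuming duplicates are identified by the service name, type, and address triple printed in the Discarding duplicate response trace; collect and its parameters are hypothetical, and the real client's internals are not documented on this page.

```python
from typing import Iterable, List, Set, Tuple

Service = Tuple[str, str, str]  # (service name, service type, network address)


def collect(arrivals: Iterable[Tuple[float, Service]],
            timeout: float = 1.0,
            stop_on_first: bool = False) -> Tuple[List[Service], float]:
    """Simulate result collection by a discovery client.

    arrivals: (seconds since the request was sent, service), in arrival order.
    stop_on_first: models a request for a fully qualified node name, where
    the client can return as soon as one result arrives.
    Returns the unique services found and the time at which the client returns.
    """
    seen: Set[Service] = set()
    results: List[Service] = []
    for at, service in arrivals:
        if at > timeout:
            break  # arrived after the discovery timeout; ignored
        if service in seen:
            continue  # duplicate, e.g. one server answering both broadcasts
        seen.add(service)
        results.append(service)
        if stop_on_first:
            return results, at
    return results, timeout
```

With the two identical responses from the trace above, collect returns one service after the full timeout, and with stop_on_first=True it returns at the first arrival.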
In cases where UDP packet filtering is suspected, debug tracing can be used to determine if the discovery server is receiving the requests and responding, and if the client is receiving the responses.
Start a verification discovery server, with debug tracing enabled:
$ epadmin --debug verify services --mode server
2018-05-24 09:45:52.884984|DSV|INFO |8504|discovery.cpp(288)|SWDiscovery::Discovery
for service x.y.zz.y, type test-type, address test-address, started on port 54321
Service discovery server started. Interrupt to exit.
In another terminal (on the same machine, or on another machine when debugging cross-machine service discovery), run a verification discovery client with debug tracing enabled. These traces show the client sending two discovery requests but not receiving any responses:
$ epadmin --debug verify services --mode client
2018-05-24 09:55:01.533096|DSV|DEBUG|8521|client.cpp(351)|Client sending:
PDU:A5:2:DiscoverServicesRequest:3:10.240.6.255/8521/3/0:x.y.zz.y:test-type::
on 10.240.6.255:54321
2018-05-24 09:55:01.533141|DSV|DEBUG|8521|client.cpp(351)|Client sending:
PDU:A5:2:DiscoverServicesRequest:3:255.255.255.255/8521/4/0:x.y.zz.y:test-type::
on 255.255.255.255:54321
Service discovery verification failed: Service discovery did not find any results
Nothing appears in the discovery server terminal, which shows that the server received neither request.
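The comparison of client and server traces can be scripted when both are captured to files. The following minimal Python sketch (locate_drop is a hypothetical name) looks for message fragments that appear in the debug traces on this page; it assumes those fragments stay the same, which is not guaranteed across releases.

```python
def locate_drop(client_trace: str, server_trace: str) -> str:
    """Classify where discovery packets were lost (heuristic, trace-text based)."""
    sent = "Client sending:" in client_trace        # client sent a request
    received = "received:" in server_trace          # server saw a request
    replied = "sending response" in server_trace    # server answered it
    matched = "matched response" in client_trace    # client got the answer
    if not sent:
        return "client did not send a request"
    if not received:
        return "request lost before reaching the server"
    if not replied:
        return "server received the request but did not respond"
    if not matched:
        return "response lost before reaching the client"
    return "request and response both delivered"
```

For the failing example above, the client trace contains Client sending: while the server trace is empty, so the sketch reports that the request was lost before reaching the server. That points at UDP filtering on the server machine or on the path between the two machines.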