Diagnosing Network Issues

A common cause of Service failure in a GridServer grid is misconfigured network communication between components. This section describes a few common issues to check in that situation. For more information about configuring GridServer for your network, see the GridServer Installation Guide.

Direct Data Transfer (DDT)

By default, DDT is enabled. When a Service creation or request is initiated on a Driver, the init data or request argument is kept on the Driver, and only a URL is sent to the Manager. When an Engine receives the request, it is sent the URL and downloads the data directly from the Driver. To enable DDT on the Driver, set DSDirectDataTransfer to true in the driver.properties file. Also by default, the Driver downloads the output data directly from the Engine. To enable this, set the Direct Data Transfer Enabled option to true in the Engine Configuration.

By default, the Driver uses an internal file server that listens on the next available port starting from the value of DSWebserverPort, which is set in the driver.properties file. By default, the Engine listens on port 27159. This port is set in the Engine Configuration.
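As a quick way to confirm what a Driver is configured to do, the following minimal sketch reads the two properties named above from the text of a driver.properties file. It assumes the file uses plain key=value lines; the sample values and the helper names parse_properties and ddt_summary are hypothetical and are not part of GridServer.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def ddt_summary(props):
    """Summarize the DDT-related Driver settings.

    DDT is treated as enabled when the property is absent, because the
    documentation states that DDT is enabled by default.
    """
    raw = props.get("DSDirectDataTransfer", "true")
    return {
        "ddt_enabled": raw.lower() == "true",
        "webserver_port": props.get("DSWebserverPort"),  # None if not set
    }


# Hypothetical excerpt; real files may contain many other properties.
sample = """
# driver.properties excerpt (illustrative values only)
DSDirectDataTransfer=false
DSWebserverPort=1420
"""
print(ddt_summary(parse_properties(sample)))
```

Keep in mind that the Driver's file server uses the next available port starting from DSWebserverPort, so the port actually in use can be higher than the configured value.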

If the Engine successfully reads the input data, a message like the following appears in the Engine log:

Fine: [TaskExecutor] Reading data from http://192.168.32.137:1420/ds-7466344146886677638/5.in

In the Driver log, a message like the following appears when the output data for a task is retrieved:

Fine: UrlByteSource Getting data from: http://10.126.209.12:27159/data//bapa101-0/ddt/ds-2491378560399007707/0.out

The most common problems with DDT are caused by firewalls. Use telnet machine port to test connections between components. If you are having problems with DDT, try temporarily disabling it and running in data transfer mode instead.
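Where telnet is not installed, the same connectivity test can be approximated with a short script. This is a minimal sketch that uses only the Python standard library; the function name can_connect and the 3-second timeout are assumptions chosen for illustration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout.

    A refused connection, an unreachable host, or a timeout all raise
    OSError (or a subclass) and are reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling can_connect("10.126.209.12", 27159) from the Driver machine would test the Engine file server address shown in the Driver log message above. A False result only shows that the port could not be reached. It does not distinguish between a firewall, a stopped process, or a wrong address.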

Data Transfer

Setting the DSDirectDataTransfer option in driver.properties to false causes the Driver to upload all input data to the Broker. You must also set Direct Data Transfer Enabled to false in your Engine configurations. The Driver then downloads the output data from the Broker. The data transfer settings for the Broker, Store Input to Disk and Store Output to Disk, are at Admin > System Admin > Manager Configuration > Services, under the Data Transfer heading.

Connection Issues

When using data transfer mode, messages like the following mean the Engine has lost connection to the Director:

Warning: [WebServerBridgePlugin] Failed ping attempt on http://161.2.27.160:27159/data/ping.html, java.lang.RuntimeException: java.net.ConnectException: Connection refused: connect

Make sure the Director is running and that the IP address in the log is the IP address of your Director. From your Engine machine, telnet to port 27159 of the Director. If the connection is refused, you might have a network problem. When using DDT, telnet between Driver and Engine to rule out network problems.
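To compare the address in the log with the address you expect, it can help to pull the host and port out of the warning. The sketch below does this for the message format shown above. The regular expression is inferred from that single sample, so treat it as an assumption and not as a specification of the log format. The name ping_target is hypothetical.

```python
import re
from urllib.parse import urlsplit

# Pattern inferred from the sample warning; other versions may log differently.
PING_FAILURE = re.compile(r"Failed ping attempt on (\S+?),")


def ping_target(log_line):
    """Return (host, port) from a failed-ping warning, or None if no match."""
    match = PING_FAILURE.search(log_line)
    if match is None:
        return None
    url = urlsplit(match.group(1))
    return url.hostname, url.port


sample = (
    "Warning: [WebServerBridgePlugin] Failed ping attempt on "
    "http://161.2.27.160:27159/data/ping.html, java.lang.RuntimeException: "
    "java.net.ConnectException: Connection refused: connect"
)
print(ping_target(sample))  # ('161.2.27.160', 27159)
```

The returned host and port are the values to use in the telnet check described above.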

Timeout Issues

If you get timeout messages such as the following, you might need to adjust the configuration for this client:

04/15/05 09:18:57.964 Warning: [global] Error reading from http://10.47.117.158:27159/data//2500-dklptt3z-0/ddt/4937299722762820807/0.out.z: java.io.IOException: Timed out reading http://10.47.117.158:27159/data//2500-dklptt3z-0/ddt/4937299722762820807/0.out.z

With Direct Data Transfer (DDT), the settings to adjust are at Admin > System Admin > Manager Configuration > Communication, under the Data Transfer heading. If the Driver times out while reading the output file from the Engine, increase the values under Driver Data Transfer. Also use telnet to check that you can reach the Engine’s file server on port 27159.

If you are not using DDT for Engines, or if you are using the .NET Driver, the relevant settings are under the HTTP Connections heading on the same page. For example, you might want to increase the Read Timeout setting.
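Before raising any timeout value, it can be useful to see whether the timeouts are concentrated on one Engine or spread across many. The following sketch counts "Timed out reading" messages per host and port in a block of log text. It assumes the message format of the sample above, and the name timeouts_by_endpoint is hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# \s+ also matches a line break, in case the URL wraps onto the next line.
TIMEOUT = re.compile(r"Timed out reading\s+(\S+)")


def timeouts_by_endpoint(log_text):
    """Count read timeouts in log_text, keyed by (host, port)."""
    counts = Counter()
    for match in TIMEOUT.finditer(log_text):
        url = urlsplit(match.group(1))
        counts[(url.hostname, url.port)] += 1
    return counts


sample = (
    "04/15/05 09:18:57.964 Warning: [global] Error reading from "
    "http://10.47.117.158:27159/data//2500-dklptt3z-0/ddt/4937299722762820807/0.out.z: "
    "java.io.IOException: Timed out reading\n"
    "http://10.47.117.158:27159/data//2500-dklptt3z-0/ddt/4937299722762820807/0.out.z\n"
)
for endpoint, count in timeouts_by_endpoint(sample).items():
    print(endpoint, count)
```

If a single endpoint accounts for most of the timeouts, checking the network path to that machine is likely a better first step than increasing the timeout for every client.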