Configuring a Caching HTTP Proxy Server
In a GridServer deployment where a Broker and its Engines are separated by a WAN, it can be inefficient to transfer the same data over the WAN from the Broker or the Clients to multiple Engines. One solution is to use an HTTP proxy server (such as Squid Web Cache) to cache the session’s init data, which every Engine that works on the session must download. You can specify a proxy server in an Engine configuration; the proxy server then caches the Service data for all other Engines that use the same proxy server.
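The saving comes from read-through caching: only the first request for a given URL crosses the WAN, and later requests from other Engines are served from the proxy on the LAN. The Python sketch below is a hypothetical, idealized model of that behavior, not GridServer or Squid code. The class name `CachingProxyModel` and the URL are illustrative assumptions, and the model ignores expiry, cache size limits, and concurrent requests.

```python
from typing import Callable, Dict


class CachingProxyModel:
    """Toy read-through cache standing in for an HTTP proxy on the Engines' LAN.

    Hypothetical model for illustration only; it ignores expiry, cache size
    limits, and concurrent requests.
    """

    def __init__(self, fetch_over_wan: Callable[[str], bytes]) -> None:
        self._fetch_over_wan = fetch_over_wan  # e.g. a download from the Broker
        self._store: Dict[str, bytes] = {}
        self.wan_fetches = 0  # number of times data crossed the WAN

    def get(self, url: str) -> bytes:
        if url not in self._store:
            # Cache miss: only the first request for a URL goes over the WAN.
            self.wan_fetches += 1
            self._store[url] = self._fetch_over_wan(url)
        # Served from the LAN cache.
        return self._store[url]


if __name__ == "__main__":
    proxy = CachingProxyModel(lambda url: b"session init data")
    for _engine in range(10):
        proxy.get("http://broker.example/session/42/init")  # hypothetical URL
    print(proxy.wan_fetches)  # 1
```

With ten Engines requesting the same init data, the model records a single WAN transfer instead of ten. Real proxies can deviate from this ideal, as described later in this section for concurrent downloads.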
To use a proxy server, such as Squid, for resource synchronization or data transfer, set the Proxy Host and Proxy Port parameters to the proxy server’s hostname and port. If the proxy server requires authentication, you must also set the Proxy Username and Proxy Password parameters.
Two additional properties determine which data is cached. Use Proxy for Data Transfer causes Engines to download any session’s init data through the HTTP proxy server. Use Proxy for Resource Synchronization causes Engines to use the HTTP proxy server for resource synchronization downloads.
If Engines are configured to use a proxy server and the proxy is not available, the Engine does not attempt to download the Service data using alternate connection parameters. It’s the administrator’s responsibility to make sure the proxy server is up and properly configured. The administrator must consider implementing DNS or IP failover if high availability is required.
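To make the interaction of these settings concrete, the following Python sketch shows how a generic HTTP client built on `urllib.request` might apply them. It is not the Engine’s implementation. The `EngineProxySettings` class, its field names, and the `uses_proxy` and `opener_for` helpers are hypothetical stand-ins for the Engine properties, and the sketch assumes the proxy uses HTTP Basic authentication. As in the behavior described above, there is no fallback to a direct connection when the proxy is unavailable.

```python
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class EngineProxySettings:
    """Hypothetical mirror of the Engine proxy properties (names are illustrative)."""
    proxy_host: Optional[str] = None
    proxy_port: Optional[int] = None
    proxy_username: Optional[str] = None
    proxy_password: Optional[str] = None
    use_proxy_for_data_transfer: bool = False
    use_proxy_for_resource_sync: bool = False


def uses_proxy(settings: EngineProxySettings, kind: str) -> bool:
    """kind is "data" (session init data) or "resource" (resource synchronization)."""
    if not settings.proxy_host or settings.proxy_port is None:
        return False
    if kind == "data":
        return settings.use_proxy_for_data_transfer
    if kind == "resource":
        return settings.use_proxy_for_resource_sync
    raise ValueError(f"unknown download kind: {kind!r}")


def opener_for(settings: EngineProxySettings, kind: str) -> urllib.request.OpenerDirector:
    if not uses_proxy(settings, kind):
        # An empty ProxyHandler ignores proxy environment variables: direct connection.
        return urllib.request.build_opener(urllib.request.ProxyHandler({}))
    proxy_url = f"http://{settings.proxy_host}:{settings.proxy_port}"
    # Only plain HTTP requests are routed through the proxy.
    handlers: list = [urllib.request.ProxyHandler({"http": proxy_url})]
    if settings.proxy_username:
        # Assumption: the proxy uses HTTP Basic authentication.
        passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
        passwords.add_password(None, proxy_url, settings.proxy_username,
                               settings.proxy_password or "")
        handlers.append(urllib.request.ProxyBasicAuthHandler(passwords))
    # No fallback: if the proxy is down, requests made with this opener fail
    # rather than retrying over a direct connection.
    return urllib.request.build_opener(*handlers)
```

For example, with `use_proxy_for_data_transfer=True` and `use_proxy_for_resource_sync=False`, `opener_for(settings, "data")` routes through the proxy while `opener_for(settings, "resource")` connects directly.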
Because HTTPS requests might not be cached properly by a proxy server, HTTPS is not supported in this configuration.
Token security for all resource downloads is not supported when using a proxy server. The proxy server is assumed to be on the same LAN as the Engines, and that LAN is assumed to be secure. Note that the proxy server cannot download a resource until one of the Engines provides a valid download token.
The following problems might occur when using caching proxy servers for resource synchronization:
• Engines can download stale copies of an updated resource from the proxy server. This occurs when the cache timeout is too long, or when a resource is updated shortly after the proxy server has cached it.
• The proxy server might not be able to serve a cached copy of an updated resource if multiple Engines start downloading the resource within a very short time span. This can occur with Squid even if the collapsed_forwarding parameter is enabled.
To address these issues, adjust the cache size so that the proxy server can cache large files. It is also recommended that you analyze the resource download pattern and the proxy server configuration to achieve optimal results. The important factors to consider are:
• Total size of the resources that must be downloaded
• Total size of the proxy cache and the maximum size per object
• Maximum age of a cached object
• Proxy server behavior for concurrent requests
• Configuring the HTTP proxy server to ignore no-cache headers. For example, the following refresh_pattern directive in the squid.conf file causes Squid to ignore no-cache headers for URLs that match the regular expression “^http://.*/livecluster/resourcesproxy”:
refresh_pattern ^http://.*/livecluster/resourcesproxy 0 20% 4320 ignore-no-cache
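The numbers in that line are the refresh_pattern min (0 minutes), percent (20%), and max (4320 minutes, or 3 days) values, which bound how long Squid treats a cached resource as fresh and therefore how long Engines can receive a stale copy. The Python sketch below is a simplified model of Squid’s documented freshness heuristic for responses without explicit expiry information. It is an approximation for reasoning about the maximum age of cached objects, not Squid’s code: it ignores Expires and Cache-Control headers and refresh_pattern options such as ignore-no-cache. The function name `is_fresh` and the example URL are hypothetical.

```python
import re
from typing import Optional

# Values copied from the refresh_pattern line above; min and max are in minutes.
RESOURCE_URL = re.compile(r"^http://.*/livecluster/resourcesproxy")
MIN_MINUTES, PERCENT, MAX_MINUTES = 0, 20, 4320
DAY = 86400  # seconds


def is_fresh(age_s: float, lastmod_age_s: Optional[float]) -> bool:
    """Simplified refresh_pattern heuristic for a response with no explicit expiry.

    age_s: seconds since the proxy received the response.
    lastmod_age_s: seconds between the resource's Last-Modified time and the time
        the proxy received it, or None if there was no Last-Modified header.
    """
    if age_s > MAX_MINUTES * 60:
        return False  # older than max: always stale
    if lastmod_age_s is not None and lastmod_age_s > 0:
        # Last-modified factor: fresh while age < PERCENT% of lastmod_age_s.
        return age_s < lastmod_age_s * PERCENT / 100
    return age_s < MIN_MINUTES * 60  # no Last-Modified: compare against min


print(bool(RESOURCE_URL.search(
    "http://broker.example:8000/livecluster/resourcesproxy/app.jar")))  # True
print(is_fresh(1 * DAY, 10 * DAY))  # True: 1 day < 20% of 10 days
print(is_fresh(4 * DAY, 30 * DAY))  # False: 4 days exceeds max (3 days)
```

Under this model, a resource last modified 10 days before it was cached stays fresh for 2 days. If it is updated during that window, Engines can receive the old copy until the cached entry becomes stale.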