Overview
In some application domains, response time is critically important. Several factors affect network latency, including network bandwidth conditions, hardware capabilities, multitasking, and messaging throughput patterns.
The latency assessment tool, rvlat, can help you understand the latency characteristics of your network. rvlat measures latency statistics and produces reports.
Message latency (as measured by rvlat) is the round-trip time between the client call that sends a request message to a server, and the message callback when the client receives a response from the server.
rvlat is an executable program that runs in two modes: as a requesting client or as a responding server. To use rvlat, you must run one instance of each mode.
Principles of Operation
The basic operation of rvlat is similar to the Rendezvous performance assessment software, even though it measures a different property of the network. An rvlat client process sends a run of messages to a server; the server replies to those messages; the client receives the replies and measures latency statistics. You can vary parameters such as message size, run length, batch size, and pause interval, all of which affect network latency. (For descriptions of these quantities, see Performance Assessment (rvperf).)
rvlat can measure multicast or broadcast latency. It does not measure point-to-point latency.
You can use rvlat to measure latency while communicating through Rendezvous local daemons and remote daemons.
Measuring Technique
rvlat measures the round-trip time for a request-reply message pair:
Procedure
1. The client timestamps its outbound request message.
2. The server responds to a request by immediately returning the same message to the client.
3. The client timestamps the inbound reply, and measures the difference between the two timestamps to obtain the round-trip time.
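To make the three steps concrete, the following Python sketch applies the same single-clock technique. It is an illustration only, not part of rvlat: a local UDP echo thread stands in for the responding server, and the host, port, payload size, and run length are arbitrary choices for the example.

    # Minimal sketch of the round-trip measurement described above (illustration
    # only, not rvlat): a local UDP echo thread stands in for the responding
    # server, and a single client clock timestamps each request and its reply.
    import socket
    import threading
    import time

    HOST, PORT = "127.0.0.1", 15469     # arbitrary address for this sketch
    PAYLOAD = b"x" * 100                # 100-byte message payload

    def echo_server(ready):
        # Server side: immediately return each request to its sender.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as srv:
            srv.bind((HOST, PORT))
            ready.set()
            while True:
                data, addr = srv.recvfrom(65536)
                srv.sendto(data, addr)

    def measure_round_trips(n):
        # Client side: timestamp the outbound request and the inbound reply.
        samples = []
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as cli:
            for _ in range(n):
                t_sent = time.perf_counter_ns()     # step 1
                cli.sendto(PAYLOAD, (HOST, PORT))
                cli.recvfrom(65536)                 # step 2: server echoes reply
                t_recv = time.perf_counter_ns()     # step 3
                samples.append(t_recv - t_sent)     # round-trip time in ns
        return samples

    if __name__ == "__main__":
        ready = threading.Event()
        threading.Thread(target=echo_server, args=(ready,), daemon=True).start()
        ready.wait()
        rtts = measure_round_trips(1000)
        print("min %d ns  max %d ns  mean %.0f ns"
              % (min(rtts), max(rtts), sum(rtts) / len(rtts)))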
Many applications that require low latency send messages in only one direction. However, clock synchronization between two computers is not precise enough to accurately measure one-way travel time. To avoid this difficulty, rvlat measures round-trip time using a single clock.
Nonetheless, measuring round-trip time can itself distort the results in several ways. For example, doubling the number of messages doubles the network bandwidth usage, and Rendezvous can behave differently under one-way versus two-way communication. The two computers might have different throughput capabilities. Timestamps, data computations, and data output at the client add overhead. Turn-around time at the server adds a small overhead.
Serial & Batch Modes
rvlat can measure round-trip time in two modes. To get a full understanding of your network’s latency characteristics, we recommend measuring latency in both modes. (A sketch contrasting the two send loops follows the mode descriptions below.)
• In serial mode (the default behavior), the client sends one request at a time. When the reply arrives, the client records the round-trip time. Only after processing the reply does the client send the next request message.
Serial mode can help you understand patterns of latency variation over time.
• In batch mode, the client sends a batch of messages as rapidly as possible, then pauses for a specified interval while gathering replies and measuring their round-trip times. When the interval elapses, the client sends another batch. The -batch parameter specifies batch mode, and requires a size argument (that is, the number of requests per batch).
Batch mode simulates high-throughput network conditions, which can produce different latency characteristics than low-throughput conditions.
When you specify batch mode, you may optionally specify vectored mode as well—which sends each batch as a vector of messages, using a single send call.
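The following Python sketch contrasts the two send loops. It illustrates only the pacing logic, under stated assumptions: request_reply is a hypothetical stand-in for a real request/reply exchange (a short random sleep simulates network travel plus server turnaround), and a thread pool merely simulates sending a batch without waiting for each individual reply.

    # Sketch of the serial and batch send loops (illustration only, not rvlat).
    import concurrent.futures
    import random
    import time

    def request_reply():
        # Hypothetical stand-in for one request/reply exchange; returns the
        # round-trip time in nanoseconds. A short random sleep simulates
        # network travel plus server turnaround.
        t_sent = time.perf_counter_ns()
        time.sleep(random.uniform(0.0001, 0.0005))
        return time.perf_counter_ns() - t_sent

    def serial_mode(total):
        # One request at a time; the next request is sent only after the reply
        # to the previous one has been processed.
        return [request_reply() for _ in range(total)]

    def batch_mode(total, batch_size, pause_s):
        # Send each batch as rapidly as possible (here, concurrently), then
        # pause for the specified interval while the replies are gathered.
        rtts = []
        with concurrent.futures.ThreadPoolExecutor(max_workers=batch_size) as pool:
            sent = 0
            while sent < total:
                n = min(batch_size, total - sent)
                futures = [pool.submit(request_reply) for _ in range(n)]
                sent += n
                time.sleep(pause_s)                       # pause interval
                rtts.extend(f.result() for f in futures)  # this batch's replies
        return rtts

    if __name__ == "__main__":
        print("serial:", len(serial_mode(50)), "samples")
        print("batch: ", len(batch_mode(50, batch_size=10, pause_s=0.01)), "samples")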
In serial mode and with small batches, the distorting factors are minimal. However, when the batch size is large, the distortion can be more noticeable.
You can reduce these distorting factors at large batch sizes by reducing the number of round-trip messages. The -sample parameter instructs the server to respond to only a subset of the request messages that it receives, using a probability-based sampling method.
You can use sampling to create high-throughput network conditions, while dramatically reducing the volume of data collected.
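A minimal sketch of that probability-based idea on the server side might look like the following Python fragment. The handler name, the reply callable, and the sample rate expressed as a fraction are assumptions for illustration; they are not the rvlat internals.

    # Sketch of probability-based reply sampling on the server side
    # (illustration only; handle_request and reply are hypothetical names).
    import random

    def handle_request(message, sample_rate, reply):
        # Reply to roughly sample_rate of the requests received, chosen at random.
        if random.random() < sample_rate:
            reply(message)      # echo the request back so the client can time it
        # otherwise the request is received but no reply is sent

    if __name__ == "__main__":
        replies = []
        for _ in range(1_000_000):
            handle_request(b"x" * 100, 0.005, replies.append)
        print(len(replies), "replies out of 1,000,000 requests")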
For example, consider a run of 1,000,000 requests with a message payload of 100 bytes each. Sending 1,000,000 requests but only 5000 replies (a 0.5% sample) represents a network bandwidth load of approximately 100,500,000 bytes. The 0.5% sample distorts the results less than a 100% sample, and collects far less data, yet the client still has enough data points to measure latency under high-throughput conditions.
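The arithmetic behind those figures is simple to reproduce:

    # Bandwidth load for the example above: every request carries 100 bytes,
    # and only the sampled replies add to the total.
    payload = 100                                   # bytes per message
    requests = 1_000_000
    replies = 5_000                                 # 0.5% of the requests
    print(requests * payload + replies * payload)   # 100500000 bytes
    print(replies / requests)                       # 0.005, that is, 0.5%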
However, this sampling technique can also miss important patterns in the data. For example, if latency spikes occur with regular periodicity, random sampling might miss some or all of those spikes.