Performance Measurement (Peer to Peer)
FTL has sample applications to measure:

- Latency: the time delay between message generation and delivery
- Throughput: the number of messages processed in a given time

The samples can be run with any of the transports, including shared memory, TCP, multicast, RUDP, and so on. By default, they run using the shared memory transport for best performance and least configuration. For more details, see the FTL Administration guide, Transport Protocol Types.
CPU Affinity Considerations
The applications discussed below may be run without considering CPU affinity. However, for best performance, the sender and receiver applications can be assigned to specific processors to improve cache performance and limit context switching.
To minimize latency, the latency sender and receiver are designed to receive messages from the transport and dispatch message callbacks from a single thread. This is accomplished through the use of inline event queues and publishers. For more information, see the FTL Development guide, Inline Mode. As a rough rule of thumb, the latency sender and receiver may be assigned to one processor each.
To maximize throughput, the throughput receiver is designed to receive messages from the transport on one thread, and dispatch message callbacks from a second thread. (This is accomplished through the use of a non-inline event queue.) The sender application has a dedicated sending thread. As a rough rule of thumb, the throughput sender and receiver may be assigned to two processors each.
In addition, when using the shared memory transport (which is the default in the sample realm configuration, tibrealm.json), the sender and receiver applications should be assigned to processors in a way that maximizes sharing of cache memory.
The following is just one example on Linux. Some experimentation will be required to achieve the best performance on a specific system.
taskset -c 1 ./tiblatrecv
taskset -c 2 ./tiblatsend
taskset -c 1,2 ./tibthrurecv
taskset -c 3,4 ./tibthrusend
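To choose `taskset` arguments that maximize cache sharing, it can help to inspect the CPU topology and confirm the affinity actually applied. The following is a sketch for Linux; output and core numbering vary by machine, and `$$` (the current shell) stands in here for the PID of a running sample application:

```shell
# Show which logical CPUs share a core, socket, and cache; CPUs that
# share a cache are good candidates for co-located sender/receiver.
lscpu -e=CPU,CORE,SOCKET,CACHE

# Report the current CPU affinity of a process. Replace $$ with the
# PID of tiblatrecv, tiblatsend, etc. to verify the taskset binding.
taskset -cp $$
```
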
Latency Test
The tiblatrecv and tiblatsend applications work together to provide a latency value calculated by averaging the time needed to send 5 million messages from the sender to the receiver.
Perform these steps:

- Open two terminals. Make sure you have run the setup command and started the FTL server. See the Command Reference. Note that the ftlstart script automatically loads the realm configuration needed for this test.
- From samples\bin\advanced in one terminal, run:
  - Linux and macOS:
    $ ./tiblatrecv localhost:8080
  - Windows:
    > tiblatrecv localhost:8080
- From samples\bin\advanced in the other terminal, run:
  - Linux and macOS:
    $ ./tiblatsend localhost:8080
  - Windows:
    > tiblatsend localhost:8080
The tiblatsend terminal displays a summary of the total elapsed message delivery time, the number of messages sent, and an average representing the per-message latency.
#
# tiblatsend
#
# TIBCO FTL Version <n.n.n>
Invoked as: tiblatsend
Calibrating tsc timer... done.
CPU speed estimated to be 2.20E+09 Hz
Sending 5000000 messages with payload size 16
Sampling latency every 5000000 messages.
Total time: 4.57E+00 sec. for 5000000 messages
One way latency: 456.62E-09 sec.
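The reported one-way figure is consistent with halving the per-message round-trip time: 4.57 s / (2 × 5,000,000) ≈ 457 ns. A minimal sketch of that arithmetic, using the numbers from the sample output above (treat the halving as an assumption about how tiblatsend derives its one-way value):

```shell
# Values taken from the sample run above; yours will differ.
total_sec=4.57
count=5000000

# One-way latency = total time / (2 * message count), assuming the
# total time covers round trips between tiblatsend and tiblatrecv.
awk -v t="$total_sec" -v n="$count" \
    'BEGIN { printf "one-way latency: %.0f ns\n", t / (2 * n) * 1e9 }'
```
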
You can use these options with syntax for your platform:

- --count to control the number of messages sent.
- --size to change the size of the messages sent.
- --help to see the complete set of command line options.
Try different values to see the impact on latency using your hardware.
Throughput Test
Throughput measures the number of messages processed between applications in a given time. Sending fewer, larger messages typically reduces the impact of per-message overhead on messaging throughput.
The tibthrurecv and tibthrusend applications work together to provide a throughput value by measuring the time needed to send 5 million messages from the sender to the receiver.
Perform these steps:

- Open two terminals and make sure you have run the setup command and started the FTL server. See the Command Reference, if necessary. Note that the ftlstart script automatically loads the realm configuration needed for this test.
- From samples\bin\advanced in one terminal, run:
  - Linux and macOS:
    $ ./tibthrurecv localhost:8080
  - Windows:
    > tibthrurecv localhost:8080
- From samples\bin\advanced in the other terminal, run:
  - Linux and macOS:
    $ ./tibthrusend localhost:8080
  - Windows:
    > tibthrusend localhost:8080
The tibthrusend terminal displays a summary of:

- The number of messages sent (together with the number of batches and the batch size, which can be set using the --batchsize command line option)
- The total send and receive times
- The aggregate message send and receive rates
Typical output from tibthrusend follows. Based on this run:

- The sender sent all 5 million messages in 2.92 seconds, at an average rate of about 1.71 million messages per second.
- The receiver received these messages in about the same amount of time, at an average rate of about 27.4 million bytes per second.
#
# tibthrusend
#
# TIBCO FTL Version <n.n.n>
#
Invoked as: tibthrusend http://localhost:8080
Calibrating tsc timer... done.
CPU speed estimated to be 2.20E+09 Hz
Sender Report:
Requested 5000000 messages.
Sending 5000000 messages (50000 batches of 100) with payload size 16
Sent 5.00E+06 messages in 2.92E+00 seconds. (1.71E+06 msgs/sec)
Receiver Report:
Received 5.00E+06 messages in 2.92E+00 seconds. (1.71E+06 messages per second)
Received 80.00E+06 bytes in 2.92E+00 seconds. (27.38E+06 bytes per second)
Messages per callback: min/max/avg/dev: 1.00E+00 / 100.00E+00 / 48.34E+00 / 29.37E+00
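The reported rates follow directly from the message count, payload size, and elapsed time: 5,000,000 / 2.92 s ≈ 1.71 × 10⁶ msgs/sec, and 5,000,000 × 16 bytes / 2.92 s ≈ 2.74 × 10⁷ bytes/sec. A sketch of that arithmetic using the figures from the run above:

```shell
# Values taken from the sample run above; yours will differ.
count=5000000
payload_bytes=16
elapsed_sec=2.92

# Message rate and byte rate, matching the units in the sender/receiver
# reports printed by tibthrusend.
awk -v n="$count" -v s="$payload_bytes" -v t="$elapsed_sec" 'BEGIN {
    printf "message rate: %.2e msgs/sec\n", n / t
    printf "byte rate:    %.2e bytes/sec\n", n * s / t
}'
```
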
Results depend on the capabilities and load of the system you are running on.
You can use these options with syntax for your platform:

- --count to control the number of messages sent using tibthrusend. Keep this value relatively large in order to get representative results.
- --size to specify the size of the messages in bytes. The default size of 16 bytes is relatively small and might not be representative of what your applications would be using.
- --help to see the complete set of command line options for both tibthrusend and tibthrurecv.
Try different values for --size to see its impact on throughput using your hardware.