Persistence Service Disk Capacity
The FTL Server configuration parameter max.disk.fraction monitors disk capacity and prevents a disk-full state, keeping the persistence cluster running when disk space is not available. For example, assume a disk size of 10 GB and the default max.disk.fraction of 0.95. The persistence service would stop accepting messages once disk usage reaches 9.5 GB.
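The cutoff is simple arithmetic: disk capacity multiplied by max.disk.fraction. A minimal Python sketch of the calculation from the example above (the variable names and the 10 GB figure are illustrative, not part of any FTL API):

```python
# Illustrative only: how the disk usage cutoff is derived.
GIB = 1024 ** 3

disk_capacity = 10 * GIB      # example: a 10 GB volume
max_disk_fraction = 0.95      # default value of max.disk.fraction

cutoff = disk_capacity * max_disk_fraction
print(f"Publishes stop at {cutoff / GIB:.2f} GB of total disk usage")  # 9.50 GB
```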
Considerations
- Persistence services must have disk persistence enabled to enforce max.disk.fraction.
- Only replicated stores are persisted to disk.
- Even with max.disk.fraction enabled, you still need to configure reasonable byte limits and message limits at the cluster level, the store level, or both. See max.disk.fraction in Persistence Service Configuration Parameters.
- The persistence service measures total disk usage, not just its own, and compares that to max.disk.fraction. For example, if several persistence services use the same disk, max.disk.fraction is compared to all disk usage across all persistence services and any other processes (see the sketch after this list).
- The disk volume with the persistence data directory should have the same disk capacity for each persistence service in a cluster. Each persistence service in a cluster should also use the same value of max.disk.fraction.
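Because the comparison is against total usage on the volume, a shared disk can hit the cutoff even if each individual service stores little. A minimal sketch of that volume-level check, using Python's shutil.disk_usage; the check is conceptual, and the directory paths are hypothetical (the real service performs its own measurement):

```python
import shutil

def over_limit(data_dir: str, max_disk_fraction: float) -> bool:
    """Return True if total usage on data_dir's volume exceeds the limit.

    Note: disk_usage reports usage for the whole volume, so writes by
    other persistence services or unrelated processes count too.
    """
    usage = shutil.disk_usage(data_dir)
    return usage.used >= usage.total * max_disk_fraction

# Two services sharing one volume see the same answer, because the
# measurement is per volume, not per process, e.g.:
#   over_limit("/var/ftl/store-a", 0.95)
#   over_limit("/var/ftl/store-b", 0.95)
```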
Values and Behavior
The max.disk.fraction default value is 0.95. Publish calls fail once the total disk usage approaches the max.disk.fraction setting multiplied by the capacity of the disk that contains the persistence data directory. The persistence service may go over or under the limit by a small amount. A best practice is to allow for some overage so the persistence service continues to process subscriber acknowledgments while the disk is nearly full. The default value of 0.95 should allow for sufficient overage in common scenarios. In high fan-out cases, where many subscribers must acknowledge the same message before it can be deleted, consider reducing max.disk.fraction.
The impact of publish failures due to the disk usage limit depends on the publisher mode. (Also see Publisher Mode.)
- If the publisher mode is store_confirm_send, the publish is retried automatically by the FTL client library for the publisher's retry duration. This allows some time for subscribers to consume messages and free disk space before the publish call returns an exception.
- If the publisher mode is store_send_noconfirm, there is no retry, and the call returns immediately with no exception.
Once enough disk space has been freed, the publish call will no longer fail.
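A conceptual model of the two behaviors, assuming a hypothetical try_publish callable (which fails while the disk is over the limit) and a retry duration in seconds; this mimics the client library's behavior rather than using the actual FTL API:

```python
import time

class PublishError(Exception):
    pass

def publish(msg, mode: str, try_publish, retry_duration: float = 30.0) -> bool:
    """Model of publish behavior at the disk usage limit.

    try_publish is a hypothetical callable returning True on success,
    False while the persistence service is rejecting messages.
    """
    if mode == "store_send_noconfirm":
        # No retry, no exception: the call returns immediately.
        try_publish(msg)
        return True

    # store_confirm_send: the client library retries for the
    # publisher's retry duration before raising an exception.
    deadline = time.monotonic() + retry_duration
    while time.monotonic() < deadline:
        if try_publish(msg):
            return True       # disk space was freed in time
        time.sleep(1.0)       # wait for subscribers to free space
    raise PublishError("disk usage limit still exceeded after retries")
```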
Values of max.disk.fraction less than 0 or greater than 1 are not allowed. A value of 0 disables the feature, which means there is no limit on disk usage. When max.disk.fraction is set to 0 and persistence services have disk persistence enabled, publish calls will not stop, and an overfull backlog may cause a system failure.
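A sketch of the allowed range and the special meaning of 0; the validation logic is illustrative, not the server's actual code:

```python
def validate_max_disk_fraction(value: float) -> None:
    """Reject values outside [0, 1]; 0 disables the disk usage limit."""
    if not 0 <= value <= 1:
        raise ValueError("max.disk.fraction must be between 0 and 1")
    if value == 0:
        print("warning: disk usage limit disabled; an overfull "
              "backlog may cause a system failure")
```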
Disk Space for Backup or Compaction
There must be enough disk space available to hold the current message backlog when backing up or compacting disk persistence files. When the persistence service is online, backup and compaction cannot run at the same time.
- The persistence service will consume additional disk space when compacting or backing up.
- If disk usage approaches the disk usage limit (obtained by multiplying disk capacity by max.disk.fraction), or the message backlog is large, the persistence service may abort a backup or compaction and log a warning.
- For manual compaction, the best practice is to perform compaction when the message backlog is small.
If storage limits are not configured, the persistence service may consume disk space up to max.disk.fraction for the message backlog. At this point it may be difficult to start a backup or compaction, even after the message backlog is consumed. In situations like this, the offline compaction tool can be used. See Compact Disk Persistence Files with Persistence Service Offline.
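Before triggering an online backup or compaction, it can help to verify that the volume still has room for another copy of the backlog. A minimal pre-check sketch, assuming a hypothetical backlog_bytes figure for the current backlog size (the real service makes this decision internally and may abort with a warning instead):

```python
import shutil

def can_run_backup(data_dir: str, backlog_bytes: int,
                   max_disk_fraction: float = 0.95) -> bool:
    """Return True if a backup/compaction copy of the backlog fits
    under the disk usage limit (capacity * max.disk.fraction)."""
    usage = shutil.disk_usage(data_dir)
    limit = usage.total * max_disk_fraction
    # Backup and compaction temporarily need roughly another copy
    # of the current backlog on the same volume.
    return usage.used + backlog_bytes <= limit
```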
The persistence service prioritizes avoiding disk-full errors over storage for pending messages, and it prioritizes storage for pending messages over backups and compactions.