Fail Back to the Original Site
After recovering from a disaster, you can use the disaster recovery feature to fail back to the primary site once it is available again. Before returning to the original primary site, make sure nothing from the earlier deployment is left there, including any lingering FTL processes and any data directories used by FTL. Clean up all processes and files as needed.
You can also use the disaster recovery feature to migrate FTL operations to a different site, even though no disaster has occurred.
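The cleanup requirement above can be checked before any servers are started. The following is a minimal sketch in Python, assuming the FTL data directories are known paths on the local host. The paths shown are hypothetical placeholders, not FTL defaults, and the sketch only looks for leftover files; it does not detect lingering FTL processes.

```python
from pathlib import Path


def leftover_paths(data_dirs):
    """Return the candidate paths that still hold leftover content."""
    leftovers = []
    for d in map(Path, data_dirs):
        # A missing path or an empty directory counts as clean.
        # Anything else (a non-empty directory or a stray file) is reported.
        if d.exists() and (not d.is_dir() or any(d.iterdir())):
            leftovers.append(d)
    return leftovers


if __name__ == "__main__":
    # Hypothetical locations; substitute the data directories your servers used.
    candidates = ["/var/lib/ftl/data", "/var/lib/ftl/realm"]
    for path in leftover_paths(candidates):
        print(f"not clean: {path}")
```

An empty result means the listed directories are safe to reuse; process cleanup still has to be confirmed separately.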
-
Start the FTL servers at the original primary site (the site to which FTL operations will be migrated).
You may use the following for reference:
-
samples/yaml/dr/tibftlserver_primary_failback.yaml
-
samples/yaml/dr-secure/tibftlserver_primary_failback.yaml
These FTL servers will act as disaster recovery servers for the FTL servers running at the disaster recovery site (which are acting as primary servers).
In each YAML configuration file, the drfor parameter must be defined, which is a list of addresses that the FTL server will use to connect to FTL servers at the primary site. For details, see FTL Server Configuration Parameters.
If authentication is required, configure the user and password parameters with credentials that the primary servers can use to authenticate themselves to the disaster recovery FTL servers. Ensure that this username is in the ftl-internal authorization group (according to the authentication service at the disaster recovery site). See FTL Server Authorization Groups.
If TLS security is required, distribute the keystore file and trust file to all FTL servers. See Enabling TLS for FTL Server.
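The parameter requirements above can be sanity-checked before a server is started. The following is a minimal sketch in Python, assuming the relevant settings have already been read from the YAML file into a plain dictionary. The flat dictionary layout and the check_dr_settings helper are illustrative assumptions, not the FTL configuration schema, and the example values are placeholders.

```python
def check_dr_settings(settings, auth_required=False):
    """Return a list of problems found in the DR-related settings."""
    problems = []
    drfor = settings.get("drfor")
    # drfor must be a non-empty list of non-empty address strings.
    if not isinstance(drfor, list) or not drfor:
        problems.append("drfor must be a non-empty list of addresses")
    elif not all(isinstance(a, str) and a.strip() for a in drfor):
        problems.append("drfor contains an empty or non-string address")
    if auth_required:
        # Both credentials are needed when authentication is enabled.
        for key in ("user", "password"):
            if not settings.get(key):
                problems.append(f"{key} is required when authentication is enabled")
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {"drfor": ["host1:8085", "host2:8085"], "user": "ftl-dr"}
    for problem in check_dr_settings(example, auth_required=True):
        print(problem)
```

The check only covers presence and basic shape. Whether the addresses are reachable and whether the user belongs to the ftl-internal group still has to be confirmed against the running servers and the authentication service.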
-
Enable DR connectivity at the disaster recovery site.
This will allow the FTL servers at the disaster recovery site to connect to the FTL servers at the primary site. No restart is required.
Issue the enable_dr REST command at the disaster recovery site, using the URLs of the primary FTL servers as the argument. From this point on, these URLs are persisted, even if FTL servers at the disaster recovery site are restarted.
Optionally, at some later time, add the drto URLs to the configuration file at the disaster recovery site. For details on enable_dr, see POST cluster. For details on drto, see FTL Server Configuration Parameters.
-
Update the realm configuration to re-enable DR replication.
You may use the following for reference:
-
samples/yaml/dr/dr-cluster-sample-dr-failback.json
-
samples/yaml/dr-secure/dr-cluster-sample-dr-failback.json
At the persistence cluster level, DR Enabled must be checked.
Deploy the updated realm definition.
-
Stop all client applications at the disaster recovery site.
All publisher and subscriber activity must be stopped so that no data is lost when the primary site is re-activated.
-
Verify replication of data to the primary site.
Use the administrative GUI to verify that the same number of messages is stored at the primary and disaster recovery sites.
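The comparison in this step can also be scripted once the message counts have been collected from each site. The following is a minimal sketch in Python, assuming the counts were noted from the administrative GUI into dictionaries keyed by store name. The function name and the store names used in the usage note are made up for illustration.

```python
def replication_gaps(primary_counts, dr_counts):
    """Map each store whose counts differ to its (primary, dr) pair."""
    gaps = {}
    for store in sorted(set(primary_counts) | set(dr_counts)):
        p = primary_counts.get(store)
        d = dr_counts.get(store)
        # A store missing at one site is reported with None for that site.
        if p != d:
            gaps[store] = (p, d)
    return gaps
```

For example, calling replication_gaps({"orders": 120}, {"orders": 118}) reports the orders store as out of step. An empty result means every store holds the same number of messages at both sites.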
-
Stop all FTL servers at the disaster recovery site.
At this point the disaster recovery site has been disabled.
-
Activate the FTL servers at the primary site.
Issue the activate_dr REST command at the primary site. From this point on, drfor in the primary site configuration file will be ignored, even if the FTL servers at the primary site restart.
Optionally, at some later time, remove drfor from the primary site configuration file. For details on activate_dr, see POST cluster. For details on drfor, see FTL Server Configuration Parameters.
-
Update the realm configuration to activate the persistence services at the primary site.
You may use the following for reference:
-
samples/yaml/dr/dr-cluster-sample-primary-activate.json
-
samples/yaml/dr-secure/dr-cluster-sample-primary-activate.json
In the persistence clusters grid of the FTL server GUI, change the primary set of each participating cluster so that the persistence services at the primary site become the primary set. Disable DR replication by clearing DR Enabled at the cluster level.
Deploy the modified realm definition.
-
Ensure that the persistence services at the primary site form a quorum and then direct application clients to the primary site.
At this point, all operations should be migrated to the primary site. The remainder of this procedure deals with configuring the disaster recovery site to function again as a backup.
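As a rough illustration of the quorum condition in this step, the following Python sketch assumes that a quorum means a strict majority of a cluster's configured persistence services are running and reachable. That definition and the service names in the usage note are assumptions for illustration, so check the persistence documentation for the exact rule.

```python
def has_quorum(configured, running):
    """True when more than half of the configured services are running."""
    configured = set(configured)
    if not configured:
        return False
    # Ignore anything reported as running that is not part of the cluster.
    alive = configured & set(running)
    # Strict majority: 2 of 3, 3 of 4, 3 of 5, and so on.
    return len(alive) > len(configured) // 2
```

With three configured services named p1, p2, and p3, has_quorum(["p1", "p2", "p3"], ["p1", "p3"]) is true, while a single running service is not enough.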
-
Clear all data at the disaster recovery site.
Before new FTL servers are started at the disaster recovery site, there must be no leftover FTL processes or FTL data directories there.
-
Start FTL servers at the disaster recovery site.
You may use the following for reference:
-
samples/yaml/dr/tibftlserver_dr.yaml
-
samples/yaml/dr-secure/tibftlserver_dr.yaml
In each YAML configuration file, the drfor parameter must be defined, which is a list of addresses that the FTL server will use to connect to FTL servers at the primary site. For details, see FTL Server Configuration Parameters.
If authentication is required, configure the user and password parameters with credentials that the disaster recovery servers can use to authenticate themselves to the primary FTL servers. Ensure that this username is in the ftl-internal authorization group (according to the authentication service at the primary site). See FTL Server Authorization Groups.
If TLS security is required, distribute the keystore file and trust file to all FTL servers. See Enabling TLS for FTL Server.
-
Enable DR connectivity at the primary site.
This will allow the FTL servers at the primary site to connect to the FTL servers at the disaster recovery site. No restart is required.
Issue the enable_dr REST command at the primary site, using the URLs of the disaster recovery FTL servers as the argument. From this point on, these URLs are persisted, even if FTL servers at the primary site are restarted.
Optionally, at some later time, add the drto URLs to the configuration file at the primary site. For details on enable_dr, see POST cluster. For details on drto, see FTL Server Configuration Parameters.
-
Update the realm configuration to re-enable DR replication.
You may use the following for reference:
-
samples/yaml/dr/dr-cluster-sample.json
-
samples/yaml/dr-secure/dr-cluster-sample.json
At the persistence cluster level, DR Enabled must be checked.
-
Verify replication of data to the disaster recovery site.
Use the administrative GUI to verify that the same number of messages is stored at the primary and disaster recovery sites.
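The order of the steps in this procedure matters: for example, the disaster recovery servers should not be stopped until clients are stopped and replication has been verified. The following is a minimal sketch in Python of a checklist that enforces that order for an operator runbook. The step labels are paraphrases of the steps above, and the Checklist class is a hypothetical helper, not part of FTL.

```python
FAILBACK_STEPS = [
    "start servers at primary site",
    "enable DR connectivity at DR site",
    "re-enable DR replication in realm",
    "stop clients at DR site",
    "verify replication to primary site",
    "stop servers at DR site",
    "activate servers at primary site",
    "activate persistence services at primary site",
    "confirm quorum and redirect clients",
    "clear data at DR site",
    "start servers at DR site",
    "enable DR connectivity at primary site",
    "re-enable DR replication toward DR site",
    "verify replication to DR site",
]


class Checklist:
    """Tracks progress and rejects steps completed out of order."""

    def __init__(self, steps):
        self._steps = list(steps)
        self._done = 0

    @property
    def remaining(self):
        return self._steps[self._done:]

    def complete(self, step):
        if self._done >= len(self._steps):
            raise ValueError("all steps are already complete")
        expected = self._steps[self._done]
        if step != expected:
            raise ValueError(f"expected {expected!r} before {step!r}")
        self._done += 1
```

Calling complete with a step other than the next expected one raises ValueError, which makes a skipped verification visible instead of silent.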