Planned Failback to the Active Satellite Site
At a high level, to fail back to the active satellite site you simulate a failover to site 3. To ensure that no messages are lost, you can issue the suspend command to site 4 at the appropriate time (see the steps below). The suspend command causes the persistence services at site 4 to stop accepting messages from both clients and routes. This allows site 4 to finish replicating all data to site 3 before site 3 is activated. Then, once DNS is remapped, site 3 picks up where site 4 left off, accepting pending messages from clients and routes.
FTL configuration must still be managed at site 1 (or site 2, if site 2 happens to be active at the time).
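The suspend command described above is issued through the REST API. The following is a minimal sketch of how such a request might be assembled in Python, not a definitive implementation. The URL path, the JSON body shape, the host name, and the cluster name are all hypothetical placeholders; take the real endpoint and payload from the POST cluster reference before sending anything.

```python
import json
import urllib.parse
import urllib.request


def build_suspend_request(base_url: str, cluster: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that carries a suspend command.

    ASSUMPTION: the "/persistence/clusters/<name>" path and the
    {"cmd": "suspend"} body are illustrative placeholders only.
    Check the POST cluster reference for the actual endpoint and payload.
    """
    # Quote the cluster name so unusual characters cannot break the URL.
    safe_cluster = urllib.parse.quote(cluster, safe="")
    url = f"{base_url.rstrip('/')}/persistence/clusters/{safe_cluster}"
    body = json.dumps({"cmd": "suspend"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical host and cluster name, for illustration only.
    req = build_suspend_request("http://site4-ftl.example.com:8080/api/v1", "my_cluster")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```

Building the request separately from sending it makes it easy to inspect the target URL and body before anything reaches site 4.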
-
Clear all data directories at site 3.
-
Start FTL servers at site 3. No change to the YAML file is needed.
Reference files:
-
samples/yaml/satellite-dr/tibftlserver_sat_primary.yaml
-
-
Update the realm configuration to re-enable DR replication for the affected persistence cluster. Verify DR replication of any pending messages (in the user interface).
Reference files:
-
samples/yaml/satellite-dr/satellite-dr-sample-satdr-failback.json
-
-
When ready for a planned failback, suspend messaging at site 4. Use the REST API (command suspend). For details, see POST cluster.
-
Wait for the standby persistence services at site 3 to report their status as suspended (in the user interface).
-
Shut down site 4.
-
Clients and servers need to reconnect to site 3. Remap the DNS or restart them.
-
Update the realm configuration to activate messaging at site 3. This requires two changes to the persistence cluster: disable disaster recovery replication, and make site 3's server set the active server set. (Disaster recovery replication is re-enabled later once site 4 is brought back.)
Reference files:
-
samples/yaml/satellite-dr/satellite-dr-sample-satpri-activate.json
-
-
Clear all data directories at site 4.
-
Start FTL servers at site 4. No change to the YAML file is needed.
Reference files:
-
samples/yaml/satellite-dr/tibftlserver_sat_dr.yaml
-
-
Update the realm configuration to re-enable disaster recovery replication for the affected persistence cluster. Verify disaster recovery replication of any pending messages (for example, in the user interface).
Reference files:
-
samples/yaml/satellite-dr/satellite-dr-sample.json
-
-
At this point, the system is ready for another failover.
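The step that waits for the persistence services at site 3 to report a suspended status can also be scripted instead of watched in the user interface. The sketch below shows one way to poll until every service reports suspended or a timeout expires. It assumes a caller-supplied `get_statuses` function (hypothetical) that returns one status string per service, and it assumes the status text is exactly `"suspended"`; how the statuses are actually retrieved is left out, because that depends on your monitoring setup.

```python
import time
from typing import Callable, Iterable


def wait_until_suspended(
    get_statuses: Callable[[], Iterable[str]],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every reported status is "suspended", or time out.

    ASSUMPTION: get_statuses is a hypothetical callback that returns the
    current status string of each persistence service being watched.
    Returns True when all services are suspended, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        statuses = list(get_statuses())
        # An empty result is treated as "not ready" rather than success.
        if statuses and all(s == "suspended" for s in statuses):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Only proceed to shutting down site 4 when the function returns True. A False result means the timeout expired before all services reported suspended, so replication may not have finished.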