Recovering after Disaster

Add these steps to your enterprise's comprehensive plan for switching business operations to the disaster recovery site. When a disaster disables the main site, administrators complete these steps as part of that plan.

Procedure

  1. Optional. Remap DNS addresses.
    If your disaster recovery plan includes remapping the DNS addresses of realm servers and persistence servers, complete this action first.
  2. Restart the remaining realm servers in this order.
    1. Stop all remaining realm servers.
    2. Restart the former disaster recovery realm server without the command-line parameter --drfor, so that this process becomes the new primary realm server.
      If authentication is required, supply the --server.user and --server.password parameters with credentials that the new primary server can use to authenticate itself to its affiliated servers.

      If satellites and backups require different authentication credentials, also supply the --server.authtobackup.user and --server.authtobackup.password parameters. The new primary server uses these credentials to authenticate itself to its backup server.

      Ensure that these user names are in the authorization group ftl-primary.

    3. Restart the backup servers at the disaster recovery site with unchanged parameter values.
    4. Restart the satellite servers.
      If needed, adjust the values of their --satelliteof parameters so that they connect to the new primary server. If your recovery plan remaps DNS addresses, you do not need to adjust these values.
  3. Activate the persistence servers at the disaster recovery site.
    1. Using the browser GUI of the new primary realm server, navigate to the clusters grid, and ensure that the Primary Set column is visible.
    2. Change the primary set of each participating cluster so that the persistence servers at the disaster recovery site become the primary set.
    3. After changing the primary set for all participating persistence clusters, deploy the new realm definition.
  4. Ensure that the persistence servers at the disaster recovery site form a quorum.
    Check the persistence clusters status table and its servers list sub-tables to verify that every participating cluster has formed a quorum and that all the servers in each cluster are synchronized.

    If a cluster cannot form a quorum, clients cannot connect to its servers. Consider forcing a quorum; see Before Forcing a Quorum.

  5. Direct application clients to servers at the disaster recovery site.
    Choose only one of these two alternatives:
    • If you remapped the DNS addresses of realm servers and persistence servers:
      1. Verify that the new DNS information has propagated.
      2. Verify that all clients automatically connect or reconnect to servers at the disaster recovery site, and are operating correctly.
    • Otherwise, when you restart all application clients, explicitly supply the locations of the new primary realm server or one of its satellites.
    The disaster recovery site is now the new active site of FTL operations.
  6. Arrange another disaster recovery site, to protect against a disaster at the newly active site.
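Step 2 changes only the command-line parameters of processes that already exist. The following minimal sketch illustrates that bookkeeping for teams that script their restarts. It is not a TIBCO FTL API. It assumes that each parameter appears either as "--flag value" or as "--flag=value", and that the disaster recovery server's existing arguments do not already contain the credential parameters. The helper names (remove_flag, credential_pair, new_primary_args, retarget_satellite) and the placeholder values are hypothetical. Only the parameter names come from the procedure above.

```python
# Hypothetical sketch (not a TIBCO FTL API): rewrite realm server command-line
# arguments for the restart sequence in step 2. Assumes each flag is written
# either as "--flag value" or as "--flag=value".
from typing import List, Optional


def remove_flag(args: List[str], flag: str) -> List[str]:
    """Return a copy of args with every occurrence of flag (and its value) removed."""
    result: List[str] = []
    skip_value = False
    for arg in args:
        if skip_value:
            skip_value = False  # this token was the value of the removed flag
            continue
        if arg == flag:
            skip_value = True  # drop the flag and the value that follows it
            continue
        if arg.startswith(flag + "="):
            continue  # drop the "--flag=value" form
        result.append(arg)
    return result


def credential_pair(user_flag: str, password_flag: str,
                    user: Optional[str], password: Optional[str]) -> List[str]:
    """Emit a user/password parameter pair, or nothing if neither is given."""
    if user is None and password is None:
        return []
    if user is None or password is None:
        raise ValueError(f"{user_flag} and {password_flag} must be supplied together")
    return [user_flag, user, password_flag, password]


def new_primary_args(dr_args: List[str],
                     user: Optional[str] = None,
                     password: Optional[str] = None,
                     backup_user: Optional[str] = None,
                     backup_password: Optional[str] = None) -> List[str]:
    """Step 2.2: omit --drfor and add the optional authentication parameters.

    Assumes dr_args does not already contain the credential parameters.
    """
    args = remove_flag(dr_args, "--drfor")
    args += credential_pair("--server.user", "--server.password", user, password)
    args += credential_pair("--server.authtobackup.user",
                            "--server.authtobackup.password",
                            backup_user, backup_password)
    return args


def retarget_satellite(sat_args: List[str], new_primary: str) -> List[str]:
    """Step 2.4: point --satelliteof at the new primary realm server."""
    return remove_flag(sat_args, "--satelliteof") + ["--satelliteof", new_primary]


if __name__ == "__main__":
    # "--some.option" and the host names are placeholders, not real FTL values.
    print(new_primary_args(["--some.option", "value", "--drfor", "primary-host:8080"],
                           user="admin", password="secret"))
    # ['--some.option', 'value', '--server.user', 'admin', '--server.password', 'secret']
    print(retarget_satellite(["--satelliteof=old-primary:8080"], "new-primary:8080"))
    # ['--satelliteof', 'new-primary:8080']
```

If your plan remaps DNS addresses, the satellites can keep their existing --satelliteof values, and you do not need retarget_satellite. Requiring the user and password to be supplied together catches the common mistake of passing only half of a credential pair.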
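The verification in step 4 can also be expressed as a simple rule: every participating cluster must report a quorum, and every server in it must be synchronized. The following sketch assumes that you have transcribed the status values from the persistence clusters status table and its servers list sub-tables into the hypothetical ServerStatus and ClusterStatus records shown here. It does not query FTL, and the record, function, cluster, and server names are illustrative only.

```python
# Hypothetical sketch (not a TIBCO FTL API): the status values are assumed to be
# transcribed from the persistence clusters status table and its servers list
# sub-tables in the browser GUI.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServerStatus:
    name: str
    synchronized: bool


@dataclass
class ClusterStatus:
    name: str
    has_quorum: bool
    servers: List[ServerStatus] = field(default_factory=list)


def recovery_problems(clusters: List[ClusterStatus]) -> Dict[str, List[str]]:
    """Map each participating cluster that is not ready to its list of issues.

    An empty result means every cluster has formed a quorum and all of its
    servers are synchronized, which is the condition step 4 asks you to verify.
    """
    problems: Dict[str, List[str]] = {}
    for cluster in clusters:
        issues: List[str] = []
        if not cluster.has_quorum:
            # Clients cannot connect to this cluster's servers without a quorum.
            issues.append("no quorum")
        issues += [f"{s.name} not synchronized"
                   for s in cluster.servers if not s.synchronized]
        if issues:
            problems[cluster.name] = issues
    return problems


if __name__ == "__main__":
    status = [
        ClusterStatus("cluster_a", True,
                      [ServerStatus("dr_p1", True), ServerStatus("dr_p2", True)]),
        ClusterStatus("cluster_b", False, [ServerStatus("dr_p3", False)]),
    ]
    print(recovery_problems(status))
    # {'cluster_b': ['no quorum', 'dr_p3 not synchronized']}
```

A non-empty result for a cluster that reports "no quorum" is the situation in which the procedure suggests considering a forced quorum.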