Multi-Site Active-Active DR Strategy in SITE Cloud
By Cloud Product Team • 9 min read • November 6, 2025

Implementing a Multi-Region Active/Active Disaster Recovery Strategy
In this post, you'll learn how to implement an Active/Active Disaster Recovery (DR) strategy to run your workload and serve requests concurrently from the Riyadh and Jeddah SITE Cloud Regions. This strategy is designed to ensure high availability and business continuity, allowing your workload to remain accessible and operational despite major outage events, such as natural disasters, systemic technical failures, or significant human error.
Multi-Site Active/Active Architecture
The architecture illustrates how to leverage SITE Cloud Regions as your active sites, establishing a multi-Region Active/Active deployment.
- Workloads: Each Region hosts a highly available (HA) workload stack spread across multiple Availability Zones (Z1, Z2). This design provides resilience against localized failures within a single site.
- Data Replication: Data is replicated continuously, either synchronously or asynchronously, from the Riyadh primary database to the Jeddah database instance.
- Backup and Recovery: A robust backup and recovery mechanism is configured in both Regions. This provides protection against logical disasters, such as data corruption or accidental deletion, enabling a point-in-time recovery (PITR) to the last known good state.
Figure 1: Architecture diagram for active-active multi-site 
Global Traffic Routing
Each regional application stack is designed to serve production traffic. The architecture utilizes the SITE Cloud Global Load Balancer (GLB), a highly available and scalable cloud Domain Name System (DNS) service, for traffic steering. The SITE Cloud GLB supports various routing policies, including:
Active-Active Configuration
Two or more sites are mapped to the Active-Active GLB configuration. Upon a DNS query, the GLB returns one of the configured IP addresses for the active endpoints. The selection is typically non-deterministic (e.g., random or round-robin without session persistence) across the sites. This configuration assumes identical workload setups (or symmetrical scaling) across all active sites, so each site must be sized to serve production traffic, which makes cost optimization a significant factor.
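The selection behaviour described above can be sketched in a few lines of Python. This is an illustrative model only, not the SITE Cloud GLB implementation: the Endpoint and ActiveActivePool names are hypothetical, and the IP addresses are placeholders taken from documentation-reserved ranges.

```python
import random
from dataclasses import dataclass


@dataclass
class Endpoint:
    region: str
    ip: str              # placeholder address from a documentation-reserved range
    healthy: bool = True


class ActiveActivePool:
    """Hypothetical model of an Active-Active pool; not the GLB's real API."""

    def __init__(self, endpoints):
        self._endpoints = list(endpoints)
        self._cursor = 0

    def _healthy(self):
        healthy = [e for e in self._endpoints if e.healthy]
        if not healthy:
            raise LookupError("no healthy endpoint available")
        return healthy

    def resolve_round_robin(self):
        """Rotate through healthy endpoints, with no session persistence."""
        healthy = self._healthy()
        answer = healthy[self._cursor % len(healthy)]
        self._cursor += 1
        return answer

    def resolve_random(self, rng=random):
        """Pick any healthy endpoint uniformly at random."""
        return rng.choice(self._healthy())


pool = ActiveActivePool([
    Endpoint("riyadh", "203.0.113.10"),
    Endpoint("jeddah", "198.51.100.10"),
])
answers = [pool.resolve_round_robin().region for _ in range(4)]
print(answers)  # ['riyadh', 'jeddah', 'riyadh', 'jeddah']
```

Because consecutive answers can point to different sites, any session state the application needs has to be reachable from both Regions.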
Active-Passive Configuration
Two or more sites are mapped to an Active-Passive setup. Traffic is only routed to the secondary (passive) site if the primary (active) site is deemed unhealthy or all its pool members are unavailable.
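As a minimal sketch of this rule, the following Python models a site as unhealthy when every member of its pool is down, and returns the first healthy site in priority order. The Site class and resolve_active_passive function are assumptions made for illustration and do not reflect the GLB's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    # One boolean per pool member: True means the member passed its health check.
    members_up: list = field(default_factory=list)

    def is_healthy(self):
        # A site with every pool member down is treated as unhealthy.
        return any(self.members_up)


def resolve_active_passive(sites_by_priority):
    """Return the first healthy site in priority order, or None if all are down."""
    for site in sites_by_priority:
        if site.is_healthy():
            return site
    return None


primary = Site("riyadh", [True, True])
secondary = Site("jeddah", [True, True])
print(resolve_active_passive([primary, secondary]).name)  # riyadh

primary.members_up = [False, False]  # simulate a full outage of the primary pool
print(resolve_active_passive([primary, secondary]).name)  # jeddah
```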
Active/Active Data Replication
SITE Cloud DBaaS-PostgreSQL utilizes its Cluster-to-Cluster (C2C) feature to support PostgreSQL replication. This typically involves a primary cluster replicating to one or more standby/secondary clusters.
Bi-directional Replication Considerations
While bi-directional replication can be configured for a multi-active architecture, it introduces significant complexity. Native PostgreSQL replication (including logical replication) lacks built-in automatic conflict resolution. Implementing bi-directional data flow requires meticulous application design to ensure disjoint write sets and prevent data conflicts (e.g., ensuring that a given record is written in only one Region).
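One way to keep write sets disjoint is to give every record a single owning Region and to forward writes that arrive elsewhere. The Python sketch below illustrates the idea under stated assumptions: the Region identifiers, the home_region and route_write functions, and the hash-based ownership rule are all hypothetical and are not part of the C2C feature.

```python
import hashlib

REGIONS = ("riyadh", "jeddah")  # assumed Region identifiers


def home_region(record_key: str) -> str:
    """Map a key to exactly one owning Region, stably across processes."""
    # hashlib is used because Python's built-in hash() is salted per process
    # for strings, so two application servers could disagree on ownership.
    digest = hashlib.sha256(record_key.encode("utf-8")).digest()
    return REGIONS[int.from_bytes(digest[:8], "big") % len(REGIONS)]


def route_write(record_key: str, local_region: str) -> str:
    """Decide whether this Region may apply the write or must forward it."""
    owner = home_region(record_key)
    if owner == local_region:
        return "apply-locally"
    return f"forward-to:{owner}"


for key in ("customer:1001", "customer:1002", "order:77"):
    print(key, "->", home_region(key))
```

Ownership could equally be assigned by tenant or by customer geography. What matters is that every application instance derives the same owner for the same record.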
Reads from Replicas
To optimize performance, read operations are typically distributed across one or more replica nodes (standbys/secondaries). This offloads read traffic from the primary instance, reducing its transaction load and enhancing overall system responsiveness, which is particularly beneficial for read-heavy applications.
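A minimal sketch of read/write splitting at the application layer follows. The ReadWriteRouter class and the connection strings are hypothetical placeholders. The statement classification is deliberately naive: it treats anything starting with SELECT as a read and does not handle cases such as SELECT ... FOR UPDATE or reads that must observe a write that has not yet reached the replica.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical router: writes go to the primary, reads rotate over replicas."""

    def __init__(self, primary_dsn, replica_dsns):
        self.primary_dsn = primary_dsn
        replicas = list(replica_dsns)
        # With no replicas configured, every statement falls back to the primary.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def dsn_for(self, sql, in_transaction=False):
        """Return the connection string that should run this statement."""
        is_read = sql.lstrip().lower().startswith("select")
        # Statements inside a transaction stay on the primary so they see
        # the transaction's own writes.
        if is_read and not in_transaction and self._replicas is not None:
            return next(self._replicas)
        return self.primary_dsn


router = ReadWriteRouter(
    "postgresql://primary.example.internal/app",       # placeholder DSNs
    ["postgresql://replica-1.example.internal/app",
     "postgresql://replica-2.example.internal/app"],
)
print(router.dsn_for("SELECT * FROM orders"))              # replica-1
print(router.dsn_for("INSERT INTO orders VALUES (1)"))     # primary
```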
Automated Failover
In an Active/Active multi-Region strategy, if a Region's workload becomes inoperable or reports a failure, the automated failover process will reroute traffic away from the impacted Region to the remaining healthy Region.
This is managed by the SITE Cloud Global Load Balancer (GLB) health checks. For the failover to execute rapidly and meet your Recovery Time Objective (RTO) targets, it is crucial to set a low Time-To-Live (TTL) value on the associated DNS records. A low TTL ensures that DNS resolvers refresh their cached records quickly and pick up the IP addresses of the remaining healthy endpoints.
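The relationship between health checks, TTL, and RTO can be approximated with simple arithmetic: traffic keeps reaching a failed site until the health check marks it down and cached DNS answers expire. The Python sketch below is a rough estimate only. The parameter names and values are illustrative assumptions rather than SITE Cloud settings, and the model ignores health-check timeouts, resolvers that do not honour TTLs, and client-side caching.

```python
def worst_case_failover_seconds(check_interval_s, failures_to_mark_down, dns_ttl_s):
    """Rough upper bound on how long clients may still be sent to a failed site."""
    # Time for the health check to declare the site unhealthy.
    detection_s = check_interval_s * failures_to_mark_down
    # A resolver that cached the old answer just before detection keeps it
    # for up to one full TTL.
    return detection_s + dns_ttl_s


def meets_rto(estimate_s, rto_s):
    """True when the estimated failover time fits within the RTO target."""
    return estimate_s <= rto_s


estimate = worst_case_failover_seconds(
    check_interval_s=10, failures_to_mark_down=3, dns_ttl_s=30
)
print(estimate)                        # 60
print(meets_rto(estimate, rto_s=120))  # True
```

In this example, raising the TTL from 30 to 300 seconds would push the estimate to 330 seconds and miss a 120-second RTO, which is why the TTL usually dominates the budget.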
Cloud Security Overview
Network Security
- North-South Traffic: All external (North-South) network traffic is inspected by Next-Generation Firewalls (NGFW), providing essential Layer-7 application-layer security.
- Web Application Firewall (WAF): A Cloud WAF is integrated with the Application Load Balancers to provide robust defense against common web exploits and vulnerabilities.
Endpoint Security
- Hardened Images: Only security-hardened Virtual Machine (VM) images are provided by default, reducing the attack surface.
- Microsegmentation: VMs are deployed with default microsegmentation, which offers Layer-7 firewalling capabilities that actively prevent lateral threat movement across the internal network.
- Threat Monitoring: VMs include built-in Endpoint Protection Platform (EPP) and Endpoint Detection and Response (EDR) security agents.
- 24/7 SOC & NOC: The EPP and EDR agents facilitate 24/7 threat monitoring and incident response by the SITE Cloud Security Operations Center (SOC) and Network Operations Center (NOC).
Conclusion and Trade-offs
The multi-site Active/Active strategy is the optimal choice for workloads demanding the quickest recovery time (lowest RTO) and the least data loss (lowest Recovery Point Objective, or RPO). A multi-Region implementation provides the maximum geographical separation and operational independence between sites, and also offers the advantage of low-latency access for a distributed user base. Trade-offs must be considered: the implementation, synchronization, and ongoing operation of this strategy, especially across multiple Regions, are typically more complex and significantly more expensive than simpler DR architectures.