Deploy on multiple availability zones (AZ)

During the deployment to hardware/VMs, it is important to spread all the database copies (Juju units) to different hardware servers, or even better, to the different availability zones (AZ). This will guarantee no shared service-critical components across the DB cluster (eliminate the case with all eggs in the same basket).

This guide will take you through deploying a PostgreSQL cluster on GCE using 3 available zones. All Juju units will be set up to sit in their dedicated zones only, which effectively guarantees database copy survival across all available AZs.

Prerequisites

  • A physical or virtual machine running Ubuntu 22.04+

  • Juju 3 (3.6+ is recommended)

  • A cloud service that supports and provides availability zones.

Note

Multi-availability zones are enabled by default on EC2/GCE and supported by LXD and MicroCloud.

Set up GCE on Google Cloud

Let’s deploy a PostgreSQL cluster on GKE (us-east4) using all 3 zones there (us-east4-a, us-east4-b, us-east4-c) and make sure all pods always sits in the dedicated zones only.

Caution

Creating the following GKE resources may cost you money - be sure to monitor your GCloud costs.

Log into Google Cloud and bootstrap GCE on Google Cloud:

gcloud auth login
gcloud iam service-accounts keys create sa-private-key.json  --iam-account=juju-gce-account@[your-gcloud-project-12345].iam.gserviceaccount.com
sudo mv sa-private-key.json /var/snap/juju/common/sa-private-key.json
sudo chmod a+r /var/snap/juju/common/sa-private-key.json

juju add-credential google
juju bootstrap google gce
juju add-model mymodel

Deploy PostgreSQL with Juju zones constraints

Juju provides support for availability zones using constraints.

The command below demonstrates how Juju automatically deploys Charmed PostgreSQL VM using Juju constraints:

juju deploy postgresql --channel 14/stable -n 3 \
  --constraints zones=us-east1-b,us-east1-c,us-east1-d

After a successful deployment, juju status will show an active application:

Model    Controller  Cloud/Region     Version    SLA          Timestamp
mymodel  gce         google/us-east1  3.5.4      unsupported  00:16:52+02:00

App         Version  Status  Scale  Charm       Channel    Rev  Exposed  Message
postgresql  14.12    active      3  postgresql  14/stable  468  no       

Unit           Workload  Agent  Machine  Public address  Ports     Message
postgresql/0   active    idle   0        34.148.44.51    5432/tcp  
postgresql/1   active    idle   1        34.23.202.220   5432/tcp  
postgresql/2*  active    idle   2        34.138.167.85   5432/tcp  Primary

Machine  State    Address        Inst id        Base          AZ          Message
0        started  34.148.44.51   juju-e7c0db-0  [email protected]  us-east1-d  RUNNING
1        started  34.23.202.220  juju-e7c0db-1  [email protected]  us-east1-c  RUNNING
2        started  34.138.167.85  juju-e7c0db-2  [email protected]  us-east1-b  RUNNING

and each unit/vm will sit in the separate AZ out of the box:

> gcloud compute instances list
NAME           ZONE        MACHINE_TYPE  PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
juju-a82dd9-0  us-east1-b  n1-highcpu-4               10.142.0.30  34.23.252.144  RUNNING  # Juju Controller
juju-e7c0db-2  us-east1-b  n2-highcpu-2               10.142.0.32  34.138.167.85  RUNNING  # postgresql/2
juju-e7c0db-1  us-east1-c  n2-highcpu-2               10.142.0.33  34.23.202.220  RUNNING  # postgresql/1
juju-e7c0db-0  us-east1-d  n2-highcpu-2               10.142.0.31  34.148.44.51   RUNNING  # postgresql/0

Simulation: A node gets lost

Let’s destroy a GCE node and recreate it using the same AZ:

> gcloud compute instances delete juju-e7c0db-1 
No zone specified. Using zone [us-east1-c] for instance: [juju-e7c0db-1].
The following instances will be deleted. Any attached disks configured to be auto-deleted will be deleted unless they are attached to any other instances or the `--keep-disks` flag is given and specifies them for keeping. Deleting a disk is 
irreversible and any data on the disk will be lost.
 - [juju-e7c0db-1] in [us-east1-c]

Do you want to continue (Y/n)?  Y

Deleted [https://www.googleapis.com/compute/v1/projects/data-platform-testing-354909/zones/us-east1-c/instances/juju-e7c0db-1].
Model    Controller  Cloud/Region     Version    SLA          Timestamp
mymodel  gce         google/us-east1  3.5.4      unsupported  00:25:14+02:00

App         Version  Status  Scale  Charm       Channel    Rev  Exposed  Message
postgresql  14.12    active    2/3  postgresql  14/stable  468  no       

Unit           Workload  Agent  Machine  Public address  Ports     Message
postgresql/0   active    idle   0        34.148.44.51    5432/tcp  
postgresql/1   unknown   lost   1        34.23.202.220   5432/tcp  agent lost, see 'juju show-status-log postgresql/1'
postgresql/2*  active    idle   2        34.138.167.85   5432/tcp  Primary

Machine  State    Address        Inst id        Base          AZ          Message
0        started  34.148.44.51   juju-e7c0db-0  [email protected]  us-east1-d  RUNNING
1        down     34.23.202.220  juju-e7c0db-1  [email protected]  us-east1-c  RUNNING
2        started  34.138.167.85  juju-e7c0db-2  [email protected]  us-east1-b  RUNNING

Here we should remove the no-longer available server/vm/GCE node and add a new one. Juju will create it in the same AZ us-east4-c:

> juju remove-unit postgresql/1 --force --no-wait
WARNING This command will perform the following actions:
will remove unit postgresql/1

Continue [y/N]? y

The command juju status shows the machines in a healthy state, but PostgreSQL HA recovery is necessary:

Model    Controller  Cloud/Region     Version    SLA          Timestamp
mymodel  gce         google/us-east1  3.5.4      unsupported  00:30:09+02:00

App         Version  Status  Scale  Charm       Channel    Rev  Exposed  Message
postgresql  14.12    active      2  postgresql  14/stable  468  no       

Unit           Workload  Agent  Machine  Public address  Ports     Message
postgresql/0   active    idle   0        34.148.44.51    5432/tcp  
postgresql/2*  active    idle   2        34.138.167.85   5432/tcp  Primary

Machine  State    Address        Inst id        Base          AZ          Message
0        started  34.148.44.51   juju-e7c0db-0  [email protected]  us-east1-d  RUNNING
2        started  34.138.167.85  juju-e7c0db-2  [email protected]  us-east1-b  RUNNING

Request Juju to add a new unit in the proper AZ:

juju add-unit postgresql -n 1

Juju uses the right AZ where the node is missing. Run juju status:

Model    Controller  Cloud/Region     Version    SLA          Timestamp
mymodel  gce         google/us-east1  3.5.4      unsupported  00:30:42+02:00

App         Version  Status  Scale  Charm       Channel    Rev  Exposed  Message
postgresql           active    2/3  postgresql  14/stable  468  no       

Unit           Workload  Agent       Machine  Public address  Ports     Message
postgresql/0   active    idle        0        34.148.44.51    5432/tcp  
postgresql/2*  active    idle        2        34.138.167.85   5432/tcp  Primary
postgresql/3   waiting   allocating  3                                  waiting for machine

Machine  State    Address        Inst id        Base          AZ          Message
0        started  34.148.44.51   juju-e7c0db-0  [email protected]  us-east1-d  RUNNING
2        started  34.138.167.85  juju-e7c0db-2  [email protected]  us-east1-b  RUNNING
3        pending                 juju-e7c0db-3  [email protected]  us-east1-c  starting

Remove GCE setup

Caution

Do not forget to remove your test setup - it can be costly!

Check the list of currently running GCE instances:

> gcloud compute instances list
NAME           ZONE        MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP    STATUS
juju-a82dd9-0  us-east1-b  n1-highcpu-4                10.142.0.30  34.23.252.144  RUNNING
juju-e7c0db-2  us-east1-b  n2-highcpu-2                10.142.0.32  34.138.167.85  RUNNING
juju-e7c0db-3  us-east1-c  n2d-highcpu-2               10.142.0.34  34.23.202.220  RUNNING
juju-e7c0db-0  us-east1-d  n2-highcpu-2                10.142.0.31  34.148.44.51   RUNNING

Request Juju to clean all GCE resources:

juju destroy-controller gce --no-prompt --force --destroy-all-models

Re-check that there are no running GCE instances left (it should be empty):

gcloud compute instances list