repmgr command reference sheet

All repmgr commands should be run as OS user postgres.
Note that all paths include the PostgreSQL major version number (11 or 12 in the examples below), so adjust it to match the local installation when copying and pasting commands.
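
One way to reduce copy/paste mistakes is to set the version once and build the paths from it. The sketch below is only an illustration: PG_VERSION, REPMGR_BIN, REPMGR_CONF and repmgr_cmd are hypothetical names, not part of repmgr, and the default of 11 is an assumption to change per host.

```shell
# Hypothetical helper variables; set PG_VERSION to the major version on this host.
PG_VERSION="${PG_VERSION:-11}"
REPMGR_BIN="/usr/pgsql-${PG_VERSION}/bin/repmgr"
REPMGR_CONF="/etc/repmgr/${PG_VERSION}/repmgr.conf"

# Hypothetical wrapper: run repmgr with the version-specific binary and config file.
repmgr_cmd() {
    "$REPMGR_BIN" -f "$REPMGR_CONF" "$@"
}

# Example (not executed here): repmgr_cmd cluster show
```

With this in place, the commands in this sheet shorten to repmgr_cmd followed by the subcommand.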

Status checks:

/usr/pgsql-11/bin/repmgr -f /etc/repmgr/11/repmgr.conf cluster show
/usr/pgsql-11/bin/repmgr -f /etc/repmgr/11/repmgr.conf daemon status
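
If both checks are run routinely, they can be wrapped in one function. This is a minimal sketch: check_cluster and repmgr_cmd are hypothetical names, and it assumes that repmgr returns a non-zero exit status when a check finds a problem, which should be confirmed for the installed repmgr version.

```shell
PG_VERSION="${PG_VERSION:-11}"   # assumption: adjust to the local major version

# Hypothetical wrapper around the full repmgr path and config file.
repmgr_cmd() {
    "/usr/pgsql-${PG_VERSION}/bin/repmgr" -f "/etc/repmgr/${PG_VERSION}/repmgr.conf" "$@"
}

# Hypothetical helper: run both status checks and return 1 if either one fails.
check_cluster() {
    local rc=0
    repmgr_cmd cluster show  || { echo "cluster show reported a problem" >&2; rc=1; }
    repmgr_cmd daemon status || { echo "daemon status reported a problem" >&2; rc=1; }
    return "$rc"
}

# Example (not executed here): check_cluster && echo "cluster looks healthy"
```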

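Pause automatic failover:

Use this when you want to perform maintenance on the master without a failover occurring. The pause command tells repmgrd to stop acting on failures; it does not stop replication itself. Run unpause once the maintenance is finished.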
/usr/pgsql-11/bin/repmgr -f /etc/repmgr/11/repmgr.conf daemon pause
/usr/pgsql-11/bin/repmgr -f /etc/repmgr/11/repmgr.conf daemon unpause
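
Forgetting to unpause leaves the cluster without automatic failover. The following sketch pauses, runs a maintenance command, and then unpauses even if the command fails. with_failover_paused and repmgr_cmd are hypothetical names introduced for illustration.

```shell
PG_VERSION="${PG_VERSION:-11}"   # assumption: adjust to the local major version

# Hypothetical wrapper around the full repmgr path and config file.
repmgr_cmd() {
    "/usr/pgsql-${PG_VERSION}/bin/repmgr" -f "/etc/repmgr/${PG_VERSION}/repmgr.conf" "$@"
}

# Hypothetical helper: pause repmgrd, run the given command, then always unpause.
# Returns the exit status of the maintenance command.
with_failover_paused() {
    repmgr_cmd daemon pause || return 1
    "$@"
    local rc=$?
    repmgr_cmd daemon unpause || echo "WARNING: unpause failed, run it manually" >&2
    return "$rc"
}

# Example (not executed here): with_failover_paused ./maintenance.sh
# where maintenance.sh is a hypothetical script.
```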

Manual switchover:

You may sometimes need to switch over within the same site (for example, to clear a large backlog of WAL files on a master that is holding a dead replication slot). Run the following command on the standby node you wish to promote to primary:
/usr/pgsql-<pg_version>/bin/repmgr -f /etc/repmgr/<pg_version>/repmgr.conf standby switchover --siblings-follow --force-rewind --dry-run
If the dry run succeeds, run the same command again without --dry-run:
/usr/pgsql-<pg_version>/bin/repmgr -f /etc/repmgr/<pg_version>/repmgr.conf standby switchover --siblings-follow --force-rewind
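
The dry run and the real switchover can be chained so that the real one only runs when the dry run succeeds. This is a sketch under the assumption that the dry run exits non-zero when its checks fail; safe_switchover and repmgr_cmd are hypothetical names.

```shell
PG_VERSION="${PG_VERSION:-12}"   # assumption: adjust to the local major version

# Hypothetical wrapper around the full repmgr path and config file.
repmgr_cmd() {
    "/usr/pgsql-${PG_VERSION}/bin/repmgr" -f "/etc/repmgr/${PG_VERSION}/repmgr.conf" "$@"
}

# Hypothetical helper: attempt the switchover only if the dry run passes.
safe_switchover() {
    if repmgr_cmd standby switchover --siblings-follow --force-rewind --dry-run; then
        repmgr_cmd standby switchover --siblings-follow --force-rewind
    else
        echo "dry run failed, switchover not attempted" >&2
        return 1
    fi
}

# Example (not executed here), on the standby to be promoted: safe_switchover
```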

Adding nodes back to a cluster:

There are two ways to add a failed primary back into the cluster as a slave.

Perform node rejoin
Perform a standby clone

Node rejoin:

Use this when a master has failed and needs to become a slave of the new master.
This will only work if the postgres instance was shut down gracefully. If it wasn’t, first attempt to start it and then stop it, with systemctl start postgresql-<pg_version>; systemctl stop postgresql-<pg_version>.
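
Whether the instance was shut down gracefully can be read from the control file with pg_controldata before attempting the rejoin. The sketch below assumes pg_controldata is installed next to repmgr under /usr/pgsql-<pg_version>/bin and that the data directory is the one used elsewhere in this sheet; parse_cluster_state and was_shut_down_cleanly are hypothetical helper names.

```shell
PG_VERSION="${PG_VERSION:-11}"   # assumption: adjust to the local major version
PGDATA_DIR="/var/lib/pgsql/${PG_VERSION}/data"

# Extract the value of the "Database cluster state" line from
# pg_controldata output supplied on stdin.
parse_cluster_state() {
    sed -n 's/^Database cluster state:[[:space:]]*//p'
}

# Hypothetical helper: succeed only if the control file records a clean shutdown.
# LC_ALL=C keeps the pg_controldata labels in English so the match works.
was_shut_down_cleanly() {
    local state
    state="$(LC_ALL=C "/usr/pgsql-${PG_VERSION}/bin/pg_controldata" "$PGDATA_DIR" | parse_cluster_state)"
    echo "cluster state: ${state:-unknown}"
    case "$state" in
        "shut down"|"shut down in recovery") return 0 ;;
        *) return 1 ;;
    esac
}

# Example (not executed here): was_shut_down_cleanly || echo "start and stop the instance first"
```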

/usr/pgsql-<pg_version>/bin/repmgr -f /etc/repmgr/<pg_version>/repmgr.conf node rejoin -d 'host=<DB DNS name of current master node> dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,pg_hba.conf,server.crt,server.key

Real-life example:

/usr/pgsql-11/bin/repmgr -f /etc/repmgr/11/repmgr.conf node rejoin -d 'host=hostname-prod-1.pgsql.azure.com dbname=repmgr user=repmgr' --force-rewind --config-files=postgresql.conf,pg_hba.conf,server.crt,server.key

You may also have to perform a standby register. (See below)
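
The rejoin can be rehearsed first. The sketch below builds the connection string from a host name and runs node rejoin with --dry-run before the real attempt. It assumes the installed repmgr version supports --dry-run for node rejoin; rejoin_node and repmgr_cmd are hypothetical names.

```shell
PG_VERSION="${PG_VERSION:-11}"   # assumption: adjust to the local major version

# Hypothetical wrapper around the full repmgr path and config file.
repmgr_cmd() {
    "/usr/pgsql-${PG_VERSION}/bin/repmgr" -f "/etc/repmgr/${PG_VERSION}/repmgr.conf" "$@"
}

# Hypothetical helper: rejoin this node to the primary given as $1,
# running a dry run first and stopping if it fails.
rejoin_node() {
    local primary_host="${1:-}"
    [ -n "$primary_host" ] || { echo "usage: rejoin_node <current primary host>" >&2; return 2; }
    local conninfo="host=${primary_host} dbname=repmgr user=repmgr"
    local config_files="postgresql.conf,pg_hba.conf,server.crt,server.key"
    repmgr_cmd node rejoin -d "$conninfo" --force-rewind --config-files="$config_files" --dry-run \
        && repmgr_cmd node rejoin -d "$conninfo" --force-rewind --config-files="$config_files"
}

# Example (not executed here): rejoin_node <DB DNS name of current master node>
```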

Standby clone:

Use this if:

A master node failed and needs to be added back to the cluster as a slave of the new master, but only if node rejoin didn’t work!
You want to add a new VM to the cluster as a slave

ssh <failed primary>
sudo su - postgres
cp /var/lib/pgsql/<pg_version>/data/{postgresql.conf,pg_hba.conf,server.crt,server.key} /tmp/
rm -rf /var/lib/pgsql/<pg_version>/data/
rm -rf /temp_tbs/*
/usr/pgsql-<pg_version>/bin/repmgr -f /etc/repmgr/<pg_version>/repmgr.conf standby clone -h <DB DNS name of current master node> -U repmgr -d repmgr -F --fast-checkpoint
cp /tmp/{postgresql.conf,pg_hba.conf,server.crt,server.key} /var/lib/pgsql/<pg_version>/data/
exit
sudo systemctl start postgresql-<pg_version>
sudo su - postgres
/usr/pgsql-<pg_version>/bin/repmgr -f /etc/repmgr/<pg_version>/repmgr.conf standby register --force
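
Because these steps delete the data directory, a mistyped or empty version number can be costly. The sketch below shows two optional guards: copying only the config files that exist, and refusing to delete a directory that does not look like a PostgreSQL data directory. copy_config_files and remove_data_dir are hypothetical helpers, and checking for a PG_VERSION file is an assumption about what a valid data directory contains.

```shell
# Files to preserve across the clone, as listed in the steps above.
CONFIG_FILES="postgresql.conf pg_hba.conf server.crt server.key"

# Hypothetical helper: copy whichever of CONFIG_FILES exist in directory $1 to directory $2.
copy_config_files() {
    local src="${1:-}" dest="${2:-}" f
    [ -d "$src" ] && [ -d "$dest" ] || { echo "usage: copy_config_files <src dir> <dest dir>" >&2; return 2; }
    for f in $CONFIG_FILES; do
        if [ -f "$src/$f" ]; then
            cp -p "$src/$f" "$dest/" || return 1
        fi
    done
}

# Hypothetical helper: delete $1 only if it contains a PG_VERSION file.
remove_data_dir() {
    local dir="${1:-}"
    [ -n "$dir" ] && [ -f "$dir/PG_VERSION" ] || { echo "refusing to remove '$dir': no PG_VERSION file" >&2; return 1; }
    rm -rf -- "$dir"
}

# Example (not executed here):
#   copy_config_files /var/lib/pgsql/<pg_version>/data /tmp && remove_data_dir /var/lib/pgsql/<pg_version>/data
```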

You may also have to restart repmgrd; check with systemctl status repmgr<pg_version>. You can also check with repmgr daemon status (using the full repmgr path, /usr/pgsql…).

Sometimes the clone will fail for no obvious reason. In that case, run the same standby clone command with --log-level debug --verbose added to get more information on why it is failing.
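
Debug output is long, so it helps to keep a copy in a file. A minimal sketch follows, assuming /tmp is an acceptable place for the log; clone_debug, repmgr_cmd and the default log path are hypothetical.

```shell
PG_VERSION="${PG_VERSION:-12}"   # assumption: adjust to the local major version

# Hypothetical wrapper around the full repmgr path and config file.
repmgr_cmd() {
    "/usr/pgsql-${PG_VERSION}/bin/repmgr" -f "/etc/repmgr/${PG_VERSION}/repmgr.conf" "$@"
}

# Hypothetical helper: rerun the clone against primary $1 with debug logging,
# save all output to log file $2, print it, and return repmgr's exit status.
clone_debug() {
    local primary_host="${1:-}"
    local logfile="${2:-/tmp/repmgr_clone_debug.log}"
    [ -n "$primary_host" ] || { echo "usage: clone_debug <current primary host> [logfile]" >&2; return 2; }
    local rc=0
    repmgr_cmd standby clone -h "$primary_host" -U repmgr -d repmgr -F --fast-checkpoint \
        --log-level debug --verbose >"$logfile" 2>&1 || rc=$?
    cat "$logfile"
    return "$rc"
}

# Example (not executed here): clone_debug <DB DNS name of current master node>
```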

Real-life example:

sudo su - postgres
cp /var/lib/pgsql/12/data/{postgresql.conf,pg_hba.conf} /tmp/
rm -rf /var/lib/pgsql/12/data/
exit
sudo su -
rm -rf /temp_tbs/*
exit
sudo su - postgres
/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf standby clone -h devnode.com -U repmgr -d repmgr -F --fast-checkpoint
cp /tmp/{postgresql.conf,pg_hba.conf} /var/lib/pgsql/12/data/
exit
sudo systemctl start postgresql-12
sudo su - postgres
/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf standby register --force
exit
sudo systemctl status repmgr12