Pacemaker and Corosync HA


In this guide we will set up an HA failover solution using Corosync and Pacemaker in an Active/Passive configuration.

Installation and Setup

Prerequisites

  • Hostnames must resolve on all nodes (via /etc/hosts or DNS)
  • NTP must be installed and configured on all nodes (see the example after the hosts file below)
cat /etc/hosts
10.0.1.10	ha1 server01
10.0.1.11	ha2 server02
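
For the NTP prerequisite, a minimal sketch on Debian/Ubuntu (the package name and time daemon are assumptions; use whatever your distribution provides):

apt-get install ntp
ntpq -p		# the peer list should show a synchronised time source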

Installation

We will install pacemaker; it should pull in corosync as a dependency. If corosync is not installed automatically, install it yourself (see the check below).

apt-get install pacemaker
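
To check whether corosync was pulled in as a dependency (and to install it explicitly if it was not), something like this should do on Debian/Ubuntu:

dpkg -l | grep corosync
apt-get install corosync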

Edit corosync.conf. The bindnetaddr is the network address, NOT the node's IP address. The mcastaddr is the default, which is fine.

cat /etc/corosync/corosync.conf
interface {
        # The following values need to be set based on your environment
        ringnumber: 0
        bindnetaddr: 10.0.1.0
        mcastaddr: 226.94.1.1
        mcastport: 5405
   }
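
As a quick sanity check of bindnetaddr: if the node's interface is configured as 10.0.1.10/24, the network address is 10.0.1.0. The address and prefix can be looked up with the ip tool (eth0 is just the interface name used in this guide):

ip -o -f inet addr show eth0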

We also want corosync to start pacemaker automatically; otherwise we would have to start pacemaker manually.
ver: 0 tells corosync to start pacemaker automatically. Setting it to 1 would require a manual start of pacemaker!

 cat /etc/corosync/corosync.conf
service {
 	# Load the Pacemaker Cluster Resource Manager
 	ver:       0
 	name:      pacemaker
}

Copy/paste the content of corosync.conf, or scp the file to the second node.

scp /etc/corosync/corosync.conf 10.0.1.11:/etc/corosync/corosync.conf

Make sure corosync starts at boot time.

cat /etc/default/corosync
# start corosync at boot [yes|no]
START=yes

Start corosync

/etc/init.d/corosync start

Check the status of the cluster
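
The status can be checked with crm status (crm_mon -1 gives a similar one-shot view); both commands are also used further down in this guide:

crm status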

Last updated: Fri Jun  9 11:02:55 2017          Last change: Wed Jun  7 14:26:06 2017 by root via cibadmin on server01
Stack: corosync
Current DC: server01 (version 1.1.14-70404b0) - partition with quorum
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ server01 ]

If you have not already done so, copy the config file to the second node, and make sure /etc/default/corosync is set to START=yes there as well.

scp /etc/corosync/corosync.conf server02:/etc/corosync/

Now on the second node, try to start corosync

/etc/init.d/corosync start

Check the status again. We should now see the second node joining. If this fails, check the firewall settings and the hosts file (the nodes must be able to resolve each other's names).
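
If the second node does not join, a couple of quick checks (a sketch using the hostnames and mcastport from this guide):

getent hosts server01 server02		# name resolution on both nodes
iptables -L -n | grep 5405		# look for firewall rules affecting the corosync mcastport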

crm_verify -L will report errors because STONITH is not configured. For this simple two-node setup we disable STONITH, tell the cluster to ignore loss of quorum, and then verify the configuration again:

crm configure property stonith-enabled=false
crm configure property no-quorum-policy=ignore
crm_verify -L

Now add a virtual IP to the cluster.

crm configure primitive VIP ocf:heartbeat:IPaddr2 params ip=10.0.1.100 nic=eth0 op monitor interval=10s

We have now added a VIP/floating IP. We can test it with a simple ping; the address should keep responding no matter which node currently holds it.
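
A quick sketch of the test, using the address and interface from this guide:

ping -c 3 10.0.1.100
ip addr show eth0 | grep 10.0.1.100	# shows the VIP on the node currently running the resource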

Adding Resources: Services

Now we are ready to add a service to the cluster. In this example we use the Postfix service (SMTP) that we want to fail over. Postfix must be installed on both nodes (see below).
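
A minimal sketch of the Postfix prerequisite on Debian/Ubuntu (run on both nodes):

apt-get install postfix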

crm configure primitive HA-postfix lsb:postfix op monitor interval=15s

Check the status.

crm status

As we have not linked the IP to the service yet, postfix could be running on server02 while the IP is on server01. We need to put them both in one resource group.

crm configure group HA-Group VIP HA-postfix

If we check the status again, we can see that the two resources are now running on the same server.

Online: [ server01 server02 ]

 Resource Group: HA-Group
     VIP	(ocf::heartbeat:IPaddr2):	Started server01
     HA-postfix	(lsb:postfix):	Started server01

Looks good!

If a resource fails for some reason (for example postfix crashes and cannot be started again), we want it to migrate to another server.
By default migration-threshold is not set (INFINITY), so the resource will never be migrated away because of failures.

After 3 failures the resource is migrated to another node, and the fail count expires after 60 seconds (failure-timeout), which allows the resource to automatically move back to this node later. The resource definition should end up looking like this:

primitive HA-postfix lsb:postfix \
        op monitor interval="15s" \
        meta target-role="Started" migration-threshold="3" failure-timeout=60s 
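
One way to apply these meta attributes with the crm shell is to edit the resource definition directly, or to set them individually (a sketch; exact syntax may vary with your crmsh version):

crm configure edit HA-postfix
# or set them one at a time:
crm resource meta HA-postfix set migration-threshold 3
crm resource meta HA-postfix set failure-timeout 60s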

Now we are DONE!

Some extra commands that might be useful when managing the cluster:

Deleting a resource

crm resource stop HA-XXXX
crm configure delete HA-XXXX

Where HA-XXXX is the name of the resource.

Migrate / Move Resource

crm_resource --resource HA-Group --move --node server02

View configuration

crm configure show

View status and fail counts

crm_mon -1 --fail