In this guide we will set up an HA failover solution using Corosync and Pacemaker, in an Active/Passive configuration.
Installation and Setup
- Hosts file entries or DNS resolution for all nodes
- NTP must be installed and configured on all nodes
10.0.1.10 ha1 server01
10.0.1.11 ha2 server02
We will install pacemaker, which should pull in corosync as a dependency; if it does not, install corosync as well.
[code]apt-get install pacemaker[/code]
Edit /etc/corosync/corosync.conf. The bind address (bindnetaddr) is the network address, NOT the node's own IP. The default mcastaddr is fine.
# The following values need to be set based on your environment
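To illustrate the difference between the network address and the node IP, the shell sketch below derives the value for bindnetaddr from a node's address. The function name network_address and the /24 prefix length are assumptions made for this example; substitute the prefix length of your own network.

[code]
#!/usr/bin/env bash
# Compute the IPv4 network address for a given IP and prefix length.
network_address() {
    local ip=$1 prefix=$2 a b c d
    # Split the dotted quad into its four octets.
    IFS=. read -r a b c d <<< "$ip"
    local addr=$(( (a << 24) | (b << 16) | (c << 8) | d ))
    # Build the netmask from the prefix length and clamp it to 32 bits.
    local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    local net=$(( addr & mask ))
    printf '%d.%d.%d.%d\n' \
        $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
        $(( (net >> 8) & 255 )) $(( net & 255 ))
}

# Assumed /24 network for the nodes in this guide.
echo "bindnetaddr: $(network_address 10.0.1.10 24)"
[/code]

With a /24 prefix, both 10.0.1.10 and 10.0.1.11 yield 10.0.1.0, so both nodes would use the same bindnetaddr value.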
We also want corosync to start pacemaker automatically; otherwise we will have to start pacemaker manually.
ver: 0 tells corosync to start pacemaker automatically. Setting it to 1 requires starting pacemaker manually!
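For reference, in a plugin-based setup like this one, the stanza that loads pacemaker usually looks something like the sketch below. This is an assumption based on the ver setting described above, so compare it with the corosync.conf shipped by your package before relying on it.

[code]service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    # 0 = corosync starts pacemaker, 1 = pacemaker is started separately
    ver: 0
}[/code]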
[code]cat /etc/corosync/corosync.conf
# Load the Pacemaker Cluster Resource Manager[/code]
Copy/paste the content of corosync.conf, or scp the file to the second node.
[code]scp /etc/corosync/corosync.conf 10.0.1.11:/etc/corosync/corosync.conf[/code]
Make sure corosync starts at boot time.
# start corosync at boot [yes|no]
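On Debian and Ubuntu packages of this era, the boot switch is typically the START variable in /etc/default/corosync (an assumption, since the file name is not shown above). A small sketch that flips it, using the hypothetical helper name enable_corosync_at_boot:

[code]
#!/usr/bin/env bash
# Set START=yes in the corosync defaults file.
# The default path is an assumption; pass another path as the first argument.
enable_corosync_at_boot() {
    local file=${1:-/etc/default/corosync}
    # Rewrite any existing START=... line in place (GNU sed -i).
    sed -i 's/^START=.*/START=yes/' "$file"
    # Show the resulting setting so the change can be verified.
    grep '^START=' "$file"
}
[/code]

Running enable_corosync_at_boot as root on both nodes should print START=yes if the file already contains a START line.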
Check the status of the cluster (for example with crm status):
[code]Last updated: Fri Jun 9 11:02:55 2017 Last change: Wed Jun 7 14:26:06 2017 by root via cibadmin on server01
Current DC: server01 (version 1.1.14-70404b0) - partition with quorum
2 Nodes configured, 2 expected votes
0 Resources configured.
Online: [ server01 ][/code]
Copy the config file to the second node
[code]scp /etc/corosync/corosync.conf server02:/etc/corosync/[/code]
Now, on the second node, start corosync (for example with service corosync start).
Check the status again. We should now see the second node joining. If this fails, check the firewall settings and the hosts file (the nodes must be able to resolve each other).
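At this point the cluster reports some warnings, typically because no STONITH (fencing) device is configured. For this test setup, disable STONITH and, since a two-node cluster loses quorum as soon as one node fails, tell the cluster to ignore quorum loss:
[code]crm configure property stonith-enabled=false
crm configure property no-quorum-policy=ignore[/code]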
Now add a virtual IP to the cluster.
[code]crm configure primitive VIP ocf:heartbeat:IPaddr2 params ip=10.0.1.100 nic=eth0 op monitor interval=10s[/code]
We should now have a VIP (floating IP). We can test it with a simple ping; it should respond when pinged from either node.
Adding Resources: Services
Now we are ready to add a service to our cluster. In this example we use a postfix service (SMTP) that we want to fail over. Postfix must be installed on both nodes.
[code]crm configure primitive HA-postfix lsb:postfix op monitor interval=15s[/code]
Check the status.
As we have not linked the IP to the service yet, postfix could be running on server02 while the IP is on server01. We need to put them both in one HA group so that they always run on the same node.
[code]crm configure group HA-Group VIP HA-postfix[/code]
If we check the status again, we can see that the two resources are now running on the same server.
[code]Online: [ server01 server02 ]
Resource Group: HA-Group
    VIP (ocf::heartbeat:IPaddr2): Started server01
    HA-postfix (lsb:postfix): Started server01[/code]
Looks good!
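One way to confirm that failover actually works is to put the active node into standby and watch the group move. The sketch below only prints the commands by default (DRY_RUN=1). The run and failover_test helper names are made up for this example, and it assumes that crm node standby and crm node online are available in your crmsh version.

[code]
#!/usr/bin/env bash
# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Push all resources off the given node, show the status, then bring it back.
failover_test() {
    local node=$1
    run crm node standby "$node"
    run crm status
    run crm node online "$node"
}

failover_test server01
[/code]

When run for real with DRY_RUN=0, HA-Group should end up on server02, although the cluster may need a few seconds before the status output reflects the move.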
If a resource fails for some reason (for example, postfix crashes and cannot be restarted), we want it to migrate to the other server.
By default, migration-threshold is not defined, which means infinity: the resource will never be migrated because of failures.
After 3 failures, migrate the resource to the other node, and expire the recorded failures after 60 seconds. This allows the resource to run on this node again.
[code]primitive HA-postfix lsb:postfix \
    op monitor interval="15s" \
    meta target-role="Started" migration-threshold="3" failure-timeout="60s"[/code]
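The snippet above is how the resource appears in the configuration (for example in crm configure edit). As an alternative sketch, the same meta attributes could be set from the command line with crm resource meta. The helper names run and set_failover_policy are hypothetical, and the commands are only printed unless DRY_RUN=0 is set.

[code]
#!/usr/bin/env bash
# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Set the failure threshold and the failure expiry time on one resource.
set_failover_policy() {
    local rsc=$1 threshold=$2 timeout=$3
    run crm resource meta "$rsc" set migration-threshold "$threshold"
    run crm resource meta "$rsc" set failure-timeout "$timeout"
}

set_failover_policy HA-postfix 3 60s
[/code]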
Now we are DONE!
Some extra commands that might be useful when managing the cluster:
Deleting a resource
[code]crm resource stop HA-XXXX
crm configure delete HA-XXXX[/code]
Where HA-XXXX is the name of the resource.
Migrate / Move Resource
[code]crm_resource --resource HA-Group --move --node server02[/code]
View the configuration
[code]crm configure show[/code]
View status and fail counts
[code]crm_mon -1 --failcounts[/code]