VCS Learning : Learn about Cluster Heartbeats

Cluster heartbeat connections are a critical component of a VCS cluster: they allow the cluster to carry out failover and failback operations as expected. In this post I will show some basic checks that help Unix admins confirm the health of the cluster heartbeats.

 
Once the hardware is in place and VCS is installed, we can perform the health checks below on the cluster heartbeats.
 
 
 
 
In this configuration, the network interfaces are referred to as follows:
 
e1 –  e1000g:0
e2 –  e1000g:1
e3 –  e1000g:2
e4 –  e1000g:3
 
Before the heartbeat cables are connected, the LLT status looks like this:
 
[Diagram: both nodes before the heartbeat cables are connected]
root@gurkulvcs1#/sbin/lltstat -nvv
LLT node information:
    Node                 State    Link  Status  Address
   * 0 gurkulvcs1        OPEN
                                  e1000g2   UP      08:00:27:2F:2B:00
                                  e1000g3   UP      08:00:27:89:5A:0E
     1 gurkulvcs2        CONNWAIT
                                  e1000g2   DOWN
                                  e1000g3   DOWN
 
root@gurkulvcs2#/sbin/lltstat -nvv
LLT node information:
    Node                 State    Link  Status  Address
     0 gurkulvcs1        CONNWAIT
                                  e1000g2   DOWN
                                  e1000g3   DOWN
   * 1 gurkulvcs2        OPEN
                                  e1000g2   UP      08:00:27:75:65:A5
                                  e1000g3   UP      08:00:27:03:A9:C7
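
If you would rather run this check from a script than read the output by eye, the lltstat listing is easy to parse. Below is a minimal Python sketch, assuming the output layout is exactly as shown above. The parse_lltstat helper and the SAMPLE text are illustrative only and are not part of VCS.

```python
import re

# Hypothetical helper (not part of VCS): parse `lltstat -nvv` text laid out as above.
NODE_RE = re.compile(r"^\s*(\*?)\s*(\d+)\s+(\S+)\s+(\S+)\s*$")
LINK_RE = re.compile(r"^\s*(\S+)\s+(UP|DOWN)(?:\s+(\S+))?\s*$")

def parse_lltstat(text):
    """Return {node: {"id", "local", "state", "links": {link: (status, mac or None)}}}."""
    nodes = {}
    current = None
    for line in text.splitlines():
        node = NODE_RE.match(line)
        if node:
            star, node_id, name, state = node.groups()
            # The starred entry is the node the command was run on.
            current = {"id": int(node_id), "local": star == "*", "state": state, "links": {}}
            nodes[name] = current
            continue
        link = LINK_RE.match(line)
        if link and current is not None:
            link_name, status, mac = link.groups()
            current["links"][link_name] = (status, mac)
    return nodes

# Output of gurkulvcs1 before the cables are connected, copied from above.
SAMPLE = """\
LLT node information:
    Node                 State    Link  Status  Address
   * 0 gurkulvcs1        OPEN
                                  e1000g2   UP      08:00:27:2F:2B:00
                                  e1000g3   UP      08:00:27:89:5A:0E
     1 gurkulvcs2        CONNWAIT
                                  e1000g2   DOWN
                                  e1000g3   DOWN
"""

for name, info in parse_lltstat(SAMPLE).items():
    down = [l for l, (status, _) in info["links"].items() if status == "DOWN"]
    print(name, info["state"], "links down:", down)
```

A peer in a state other than OPEN, or with any link listed as DOWN, is the thing to look for.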
 
Now we connect the heartbeat links as shown below. (Note: if the heartbeat cables run directly from one host to the other, we have to use crossover cables.)
 
 
 
[Diagram: heartbeat cables connected e3 to e3 and e4 to e4]
root@gurkulvcs1#/sbin/lltstat -nvv
LLT node information:
    Node                 State    Link  Status  Address
   * 0 gurkulvcs1        OPEN
                                  e1000g2   UP      08:00:27:2F:2B:00
                                  e1000g3   UP      08:00:27:89:5A:0E
     1 gurkulvcs2        OPEN
                                  e1000g2   UP      08:00:27:75:65:A5
                                  e1000g3   UP      08:00:27:03:A9:C7
 
 
 
 
root@gurkulvcs2#/sbin/lltstat -nvv
LLT node information:
    Node                 State    Link  Status  Address
     0 gurkulvcs1        OPEN
                                  e1000g2   UP      08:00:27:2F:2B:00
                                  e1000g3   UP      08:00:27:89:5A:0E
   * 1 gurkulvcs2        OPEN
                                  e1000g2   UP      08:00:27:75:65:A5
                                  e1000g3   UP      08:00:27:03:A9:C7
 
 
 
Note: Make sure each heartbeat cable is connected to the same type and number of interface on both sides. Example:

gurkulvcs1 : e1000g2   <-->  e1000g2 : gurkulvcs2
gurkulvcs1 : e1000g3   <-->  e1000g3 : gurkulvcs2
 
 
How do we detect when one of the heartbeat connections fails?

For example, in the diagram below, the link connected to e4 is missing.
 
 
[Diagram: the heartbeat cable between the e4 interfaces is missing]
 
Let’s check the llt status in this scenario:
 
 
root@gurkulvcs1#/sbin/lltstat -nvv
LLT node information:
    Node                 State    Link  Status  Address
   * 0 gurkulvcs1        OPEN
                                  e1000g2   UP      08:00:27:2F:2B:00
                                  e1000g3   UP      08:00:27:89:5A:0E
     1 gurkulvcs2        OPEN
                                  e1000g2   UP      08:00:27:75:65:A5
                                  e1000g3   DOWN
 
 
 
root@gurkulvcs2#/sbin/lltstat -nvv
LLT node information:
    Node                 State    Link  Status  Address
     0 gurkulvcs1        OPEN
                                  e1000g2   UP      08:00:27:2F:2B:00
                                  e1000g3   DOWN
   * 1 gurkulvcs2        OPEN
                                  e1000g2   UP      08:00:27:75:65:A5
                                  e1000g3   UP      08:00:27:03:A9:C7
 
If you look closely, lltstat on each node reports the other node's e4 interface (e1000g3) as DOWN. In this case it is easy to conclude that the link between the two e4 interfaces is faulty.
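
The same reasoning (a link is suspect when both nodes report the peer's end of it as DOWN) can be sketched in a few lines of Python. The dictionaries are typed in by hand from the two outputs above, and faulty_links is a hypothetical helper, not a VCS tool.

```python
def faulty_links(peer_links_seen_by_a, peer_links_seen_by_b):
    """Return links that BOTH nodes report as DOWN for their peer.

    Each argument maps link name -> status of the peer's end of that link,
    as printed by lltstat on one node.
    """
    down_a = {link for link, status in peer_links_seen_by_a.items() if status == "DOWN"}
    down_b = {link for link, status in peer_links_seen_by_b.items() if status == "DOWN"}
    return sorted(down_a & down_b)

# Copied by hand from the two outputs above (the non-starred node in each).
seen_by_gurkulvcs1 = {"e1000g2": "UP", "e1000g3": "DOWN"}
seen_by_gurkulvcs2 = {"e1000g2": "UP", "e1000g3": "DOWN"}

print(faulty_links(seen_by_gurkulvcs1, seen_by_gurkulvcs2))
```

Requiring agreement from both sides is a deliberate choice here: a link reported DOWN by only one node would need a closer look rather than an immediate cable swap.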
 
So far so good. But now consider a case where the cables are connected wrongly, as shown below.
 
 
[Diagram: heartbeat cables cross-connected between e3 and e4]
 
 
In our cluster configuration, the heartbeat links are defined as follows:
 
======================
root@gurkulvcs1#cat /etc/llttab
set-node gurkulvcs1
set-cluster 1
link e1000g2 /dev/e1000g:2 - ether - -
link e1000g3 /dev/e1000g:3 - ether - -
=====================
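
As a small illustration, the link definitions in this file can be pulled out with a few lines of Python. This is only a sketch that handles the three directives shown above; parse_llttab is a hypothetical helper, and any other directive a real llttab may contain is simply ignored.

```python
def parse_llttab(text):
    """Hypothetical helper: extract node name, cluster id and links from llttab text."""
    config = {"node": None, "cluster": None, "links": {}}
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        directive = fields[0]
        if directive == "set-node" and len(fields) >= 2:
            config["node"] = fields[1]
        elif directive == "set-cluster" and len(fields) >= 2:
            config["cluster"] = int(fields[1])
        elif directive == "link" and len(fields) >= 3:
            # fields[1] is the link tag, fields[2] the network device
            config["links"][fields[1]] = fields[2]
    return config

# Contents of /etc/llttab on gurkulvcs1, copied from above.
LLTTAB = """\
set-node gurkulvcs1
set-cluster 1
link e1000g2 /dev/e1000g:2 - ether - -
link e1000g3 /dev/e1000g:3 - ether - -
"""

print(parse_llttab(LLTTAB))
```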
 
The two link statements define our heartbeats as:

gurkulvcs1 : e1000g2 (e3)   <-->  e1000g2 (e3) : gurkulvcs2
gurkulvcs1 : e1000g3 (e4)   <-->  e1000g3 (e4) : gurkulvcs2

whereas the physical network cables are connected as:

gurkulvcs1 : e3   <-->  gurkulvcs2 : e4
gurkulvcs1 : e4   <-->  gurkulvcs2 : e3
 
 
 
In this scenario it is difficult to spot the cabling mistake from the lltstat output, because it shows both heartbeat links UP on both nodes:
 
root@gurkulvcs1#/sbin/lltstat -nvv
LLT node information:
    Node                 State    Link  Status  Address
   * 0 gurkulvcs1        OPEN
                                  e1000g2   UP      08:00:27:2F:2B:00
                                  e1000g3   UP      08:00:27:89:5A:0E
     1 gurkulvcs2        OPEN
                                  e1000g2   UP      08:00:27:03:A9:C7
                                  e1000g3   UP      08:00:27:75:65:A5
 
 
 
root@gurkulvcs2#/sbin/lltstat -nvv
LLT node information:
    Node                 State    Link  Status  Address
     0 gurkulvcs1        OPEN
                                  e1000g2   UP      08:00:27:89:5A:0E
                                  e1000g3   UP      08:00:27:2F:2B:00
   * 1 gurkulvcs2        OPEN
                                  e1000g2   UP      08:00:27:75:65:A5
                                  e1000g3   UP      08:00:27:03:A9:C7
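
There is one hint in these outputs, though: the MAC addresses listed for the remote node are swapped compared with what that node reports for its own links. Here is a minimal Python sketch of that comparison, assuming the Address column shows the MAC actually heard on each link (which is what the sample outputs suggest). find_crossed_links is a hypothetical helper, and the values are copied by hand from the outputs above.

```python
def find_crossed_links(peer_own_macs, macs_seen_locally):
    """Return {local link: peer link whose MAC shows up there} for mismatches only.

    peer_own_macs:     link -> MAC the peer reports for itself (its starred entry)
    macs_seen_locally: link -> MAC this node lists for the peer on that link
    """
    owner = {mac.upper(): link for link, mac in peer_own_macs.items()}
    crossed = {}
    for link, mac in macs_seen_locally.items():
        actual = owner.get(mac.upper())
        if actual is not None and actual != link:
            crossed[link] = actual
    return crossed

# gurkulvcs2's own entry (starred) in its lltstat output.
gurkulvcs2_own = {"e1000g2": "08:00:27:75:65:A5", "e1000g3": "08:00:27:03:A9:C7"}
# What gurkulvcs1 lists for gurkulvcs2 in the cross-connected case.
seen_from_gurkulvcs1 = {"e1000g2": "08:00:27:03:A9:C7", "e1000g3": "08:00:27:75:65:A5"}

print(find_crossed_links(gurkulvcs2_own, seen_from_gurkulvcs1))
```

A non-empty result is only a hint that the cables may be swapped; a physical test with dlpiping, described next, is what confirms it.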
 
 
Then how do we troubleshoot these wrong connections?
 
VCS ships with a built-in ping tool, dlpiping, for testing the physical connectivity of the cluster heartbeat links. The tool works in client-server mode: you start dlpiping in server mode on one node and run the client on the other node. The steps below explain how to use it.
 
 
Step 1: Get the MAC address of gurkulvcs1's e3 (i.e. e1000g2) interface, then start dlpiping in server mode on that interface.
 
root@gurkulvcs1#/opt/VRTSllt/getmac /dev/e1000g:2
/dev/e1000g:2   08:00:27:2F:2B:00
 
root@gurkulvcs1#/opt/VRTSllt/dlpiping -vs /dev/e1000g:2
dlpiping: opening network device: /dev/e1000g (unit 2)
dlpiping: binding ping SAP 0xf00e
 
Step 2: Start dlpiping in client mode on gurkulvcs2's e3 (i.e. e1000g2) interface, using the MAC address of the interface where the server was started (here, gurkulvcs1 : e1000g2 -> 08:00:27:2F:2B:00).


If the two interfaces are properly connected to each other, you will see the output below:
 
 
root@gurkulvcs2#/opt/VRTSllt/dlpiping -vc /dev/e1000g:2 08:00:27:2F:2B:00
dlpiping: opening network device: /dev/e1000g (unit 2)
dlpiping: binding ping SAP 0xf00e
dlpiping: sent a request to 08:00:27:2F:2B:00:0E:FFFFFFF0
dlpiping: received a packet from 08:00:27:2F:2B:00:0E:FFFFFFF0
08:00:27:2F:2B:00 is alive
root@gurkulvcs2#
 
If they are not connected to each other, you will see the output below:
 
root@gurkulvcs2#/opt/VRTSllt/dlpiping -vc /dev/e1000g:2 08:00:27:2F:2B:00
dlpiping: opening network device: /dev/e1000g (unit 2)
dlpiping: binding ping SAP 0xf00e
dlpiping: sent a request to 08:00:27:2F:2B:00:0E:FFFFFFF0
dlpiping: sent a request to 08:00:27:2F:2B:00:0E:FFFFFFF0
dlpiping: sent a request to 08:00:27:2F:2B:00:0E:FFFFFFF0
dlpiping: sent a request to 08:00:27:2F:2B:00:0E:FFFFFFF0
dlpiping: sent a request to 08:00:27:2F:2B:00:0E:FFFFFFF0
 
If you suspect a cross connection, use the other heartbeat interface to ping the dlpiping server interface. If the output shows a successful send and receive, as below, then you know the heartbeat cables are cross-connected.
 
 
root@gurkulvcs2#/opt/VRTSllt/dlpiping -vc /dev/e1000g:3 08:00:27:2F:2B:00
dlpiping: opening network device: /dev/e1000g (unit 3)
dlpiping: binding ping SAP 0xf00e
dlpiping: sent a request to 08:00:27:2F:2B:00:0E:FFFFFFF0
dlpiping: received a packet from 08:00:27:2F:2B:00:0E:FFFFFFF0
08:00:27:2F:2B:00 is alive
root@gurkulvcs2#
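
The outcomes of these client runs can be summarised as a small decision table. The Python sketch below only interprets output that has already been collected: it looks for the "is alive" line shown above and does not run dlpiping itself. The function names and the wording of the verdicts are illustrative, and the case where both interfaces get a reply is not covered by this post, so it is reported as ambiguous.

```python
def is_alive(dlpiping_output):
    """True if the client output contains the '<MAC> is alive' line shown above."""
    return any(line.strip().endswith(" is alive") for line in dlpiping_output.splitlines())

def diagnose(same_interface_alive, other_interface_alive):
    """Combine the result from the matching interface with the one from the other interface."""
    if same_interface_alive and other_interface_alive:
        return "ambiguous: both interfaces reach the server"
    if same_interface_alive:
        return "straight: cables connected as configured"
    if other_interface_alive:
        return "crossed: heartbeat cables are swapped"
    return "no reply: suspect the cable, port or NIC"

# Client output lines copied from the examples above.
REPLY = """\
dlpiping: sent a request to 08:00:27:2F:2B:00:0E:FFFFFFF0
dlpiping: received a packet from 08:00:27:2F:2B:00:0E:FFFFFFF0
08:00:27:2F:2B:00 is alive
"""
NO_REPLY = """\
dlpiping: sent a request to 08:00:27:2F:2B:00:0E:FFFFFFF0
dlpiping: sent a request to 08:00:27:2F:2B:00:0E:FFFFFFF0
"""

# Cross-connected case: e1000g:2 gets no reply, e1000g:3 does.
print(diagnose(is_alive(NO_REPLY), is_alive(REPLY)))
```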
 
 
 
 
Ramdev

I started unixadminschool.com (aka gurkulindia.com) in 2009 as my own personal reference blog, and later realized that my learnings might be helpful for other Unix admins if I managed my knowledge base in a more user-friendly format. The result is today's unixadminschool.com. You can connect with me at https://www.linkedin.com/in/unixadminschool/

17 Responses

  1. Kiran says:

    Ramdev ,

    Its really helpful in troubleshooting VCS heartbeat issue Thanks for nice post !!!
    Keep posting on good topic thank you very much !!!!

  2. Ramdev Ramdev says:

    Hi Kiran, Thanks for your regular feedback. Appreciate it.

  3. ramakrishna says:

    Hi Ram anna….

    Thank you for VERY GOOD POSTING….

  4. RAMA RAO VASIMALLA says:

    Hi RamDev superb post..

    Thank you.

  5. Kapil Kamboj says:

    Simple and very clear details.

    Thanks a lot Ramdev !

  6. sreenivas says:

    Hi Ram,

    Thanks for this nice post,

    I have some confusion about these outputs, please correct me if I am wrong:
    step2 : if those two interfaces connected properly with each other you will see below output and

    And if you suspect cross connection use the other heartbeat interface to ping the dlpiping server interface. And if the output shows sent / received success as below, then you know that the heartbeat cables connected wrongly

    Thanks in advance

  7. Ramdev Ramdev says:

    Hi Srinivas –

    actual connection:

    gurkulvcs1 : e3   < ----> gurkulvcs2 : e3
    gurkulvcs1 : e4   < ----> gurkulvcs2 : e4

    Initially I started the dlpiping server on "gurkulvcs1 : e3".

    test 1 : ping from gurkulvcs2 : e3 to gurkulvcs1 : e3. If it succeeds, the cables are connected straight, as expected.

    test 2 : ping from gurkulvcs2 : e3 to gurkulvcs1 : e3. If it fails, there are two possible causes: a faulty network cable, or a cross connection. The next step checks for a cross connection.

    test 3 : ping from gurkulvcs2 : e4 to gurkulvcs1 : e3. If it succeeds, the cables are crossed as below.

    gurkulvcs1 : e3   < ----> gurkulvcs2 : e4
    gurkulvcs1 : e4   < ----> gurkulvcs2 : e3

    hope that helps.

  8. Sreenivas says:

    Hi Ram, Thank your very much for your quick reply. Now clearly understood the concepts

  9. Rahul says:

    Hi All,

    I just want to know: is there any way to implement the same setup on a laptop (i.e. virtually), just for practice?

    please assist me on this.
    Thanks in Advance.

  10. syed says:

    Hello Ramdev,

    This site really helps even a novice to learn and understand the technology well. Really great work. Appreciate that!!!.

    My one piece of feedback is that there is no flow between the topics. It would be great to have next and previous article links.

    This site really helps a lot. :)

  11. V Kumar says:

    Hi Ram,
    Nice to see such good posts; Please keep posting any new information or updated ones;
    It can help whole IT community.

  12. V Kumar says:

    A query: if we are planning to configure such a heartbeat network on HP blades running in a c7000 enclosure, how can it be done?
    Please shed some light on this
