I thought it would be better to split the SRX clustering post into multiple posts, as my first post got pretty long! So here is part 2 😀
Let's dive straight in!
Having configured the cluster in my previous post, we will now see how the failover process works. I will be using two methods for failover testing:
i) A manual failover, where I will manually fail over the redundancy groups from node0 to node1
ii) An interface failover (hard failover), where I will shut down the ports on node0 and the cluster should fail over to node1
Pre-Testing Checks
[email protected]_SRX220_Top> show chassis cluster status
Cluster ID: 1
Node Priority Status Preempt Manual failover

Redundancy group: 0 , Failover count: 5
node0 100 primary no no
node1 1 secondary no no

Redundancy group: 1 , Failover count: 31
node0 100 primary yes no
node1 1 secondary yes no
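If you run these pre-checks often, it can help to script them. Below is a minimal Python sketch that parses the text of show chassis cluster status (however you capture it, for example by copying it from the session) and reports which node is primary for each redundancy group. parse_cluster_status and primary_of are hypothetical helper names of my own, not part of any Juniper tool, and the patterns assume the five-column layout shown above; Junos releases that print extra columns would need a different pattern.

```python
import re

# Header for each redundancy group, e.g. "Redundancy group: 0 , Failover count: 5"
RG_HEADER = re.compile(r"Redundancy group:\s*(\d+)\s*,\s*Failover count:\s*(\d+)")
# One node row: name, priority, status, preempt, manual failover.
# Status may contain a hyphen, as in "secondary-hold".
NODE_ROW = re.compile(r"\b(node\d+)\s+(\d+)\s+([a-z-]+)\s+(yes|no)\s+(yes|no)\b")


def parse_cluster_status(text):
    """Return {rg: {"failovers": int, "nodes": {node: {...}}}} from the CLI text."""
    groups = {}
    headers = list(RG_HEADER.finditer(text))
    for i, header in enumerate(headers):
        # A group's node rows run until the next header (or the end of the text).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        nodes = {}
        for name, prio, status, preempt, manual in NODE_ROW.findall(text[header.end():end]):
            nodes[name] = {
                "priority": int(prio),
                "status": status,
                "preempt": preempt == "yes",
                "manual_failover": manual == "yes",
            }
        groups[int(header.group(1))] = {"failovers": int(header.group(2)), "nodes": nodes}
    return groups


def primary_of(groups, rg):
    """Name of the node that is primary for redundancy group rg, or None."""
    for name, info in groups[rg]["nodes"].items():
        if info["status"] == "primary":
            return name
    return None


# The pre-test output from this post.
PRE_TEST = """
Cluster ID: 1
Node Priority Status Preempt Manual failover
Redundancy group: 0 , Failover count: 5
node0 100 primary no no
node1 1 secondary no no
Redundancy group: 1 , Failover count: 31
node0 100 primary yes no
node1 1 secondary yes no
"""

status = parse_cluster_status(PRE_TEST)
for rg in sorted(status):
    print(f"RG{rg}: primary is {primary_of(status, rg)}")
```

Matching node rows with a regular expression rather than splitting on fixed column widths means the same helper also copes with output that has lost its line breaks.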
[email protected]_SRX220_Top> show security flow session
node0:
--------------------------------------------------------------------------
Session ID: 5932, Policy name: ping/4, State: Active, Timeout: 2, Valid
In: 172.16.0.2/11 --> 192.168.0.2/10498;icmp, If: reth0.10, Pkts: 1, Bytes: 84
Out: 192.168.0.2/10498 --> 172.16.0.2/11;icmp, If: reth1.20, Pkts: 1, Bytes: 84

Session ID: 5933, Policy name: ping/5, State: Active, Timeout: 2, Valid
In: 192.168.0.2/3 --> 172.16.0.2/10500;icmp, If: reth1.20, Pkts: 1, Bytes: 84
Out: 172.16.0.2/10500 --> 192.168.0.2/3;icmp, If: reth0.10, Pkts: 1, Bytes: 84

Session ID: 5934, Policy name: ping/4, State: Active, Timeout: 2, Valid
In: 172.16.0.2/12 --> 192.168.0.2/10498;icmp, If: reth0.10, Pkts: 1, Bytes: 84
Out: 192.168.0.2/10498 --> 172.16.0.2/12;icmp, If: reth1.20, Pkts: 1, Bytes: 84

Session ID: 5935, Policy name: ping/5, State: Active, Timeout: 2, Valid
In: 192.168.0.2/4 --> 172.16.0.2/10500;icmp, If: reth1.20, Pkts: 1, Bytes: 84
Out: 172.16.0.2/10500 --> 192.168.0.2/4;icmp, If: reth0.10, Pkts: 1, Bytes: 84

Session ID: 5936, Policy name: ping/4, State: Active, Timeout: 4, Valid
In: 172.16.0.2/13 --> 192.168.0.2/10498;icmp, If: reth0.10, Pkts: 1, Bytes: 84
Out: 192.168.0.2/10498 --> 172.16.0.2/13;icmp, If: reth1.20, Pkts: 1, Bytes: 84

Session ID: 5937, Policy name: ping/5, State: Active, Timeout: 4, Valid
In: 192.168.0.2/5 --> 172.16.0.2/10500;icmp, If: reth1.20, Pkts: 1, Bytes: 84
Out: 172.16.0.2/10500 --> 192.168.0.2/5;icmp, If: reth0.10, Pkts: 1, Bytes: 84
Total sessions: 6

node1:
--------------------------------------------------------------------------
Total sessions: 0
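Before failing over, it is worth confirming that the active sessions really sit on node0. Here is a minimal Python sketch that pulls the per-node totals out of show security flow session, assuming the nodeN: banners and Total sessions: lines look as they do in the output above. sessions_per_node is a hypothetical helper name, and the sample is a trimmed excerpt of this post's output with most session entries elided.

```python
import re

# Trimmed excerpt of the output above (most session entries elided).
SAMPLE = """
node0:
--------------------------------------------------------------------------
Session ID: 5932, Policy name: ping/4, State: Active, Timeout: 2, Valid
In: 172.16.0.2/11 --> 192.168.0.2/10498;icmp, If: reth0.10, Pkts: 1, Bytes: 84
Out: 192.168.0.2/10498 --> 172.16.0.2/11;icmp, If: reth1.20, Pkts: 1, Bytes: 84
...
Total sessions: 6
node1:
--------------------------------------------------------------------------
Total sessions: 0
"""


def sessions_per_node(text):
    """Return {node: total_sessions} using each node's "Total sessions" line."""
    # Non-greedy match: from each "nodeN:" banner to the first total after it.
    pairs = re.findall(r"\b(node\d+):.*?Total sessions:\s*(\d+)", text, re.S)
    return {node: int(total) for node, total in pairs}


counts = sessions_per_node(SAMPLE)
print(counts)
```

Relying on the Total sessions: line rather than counting Session ID: entries keeps the check cheap even when a node is carrying thousands of sessions.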
Test A (Manual failover)
To perform a manual failover, you will need to run the command request chassis cluster failover redundancy-group {0|1} node {0|1}
[email protected]_SRX220_Top> request chassis cluster failover redundancy-group 0 node 1
node1:
--------------------------------------------------------------------------
Initiated manual failover for redundancy group 0

{primary:node0}
[email protected]_SRX220_Top> request chassis cluster failover redundancy-group 1 node 1
node1:
--------------------------------------------------------------------------
Initiated manual failover for redundancy group 1

{secondary-hold:node0}
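If you drive these tests from a script, it helps to build the command strings in one place and reject bad arguments before they ever reach the SRX. failover_command below is a hypothetical Python helper, not a Juniper API; it only formats the command used above and does not connect to the device.

```python
def failover_command(redundancy_group, node):
    """Build the manual failover command used in this post."""
    if not isinstance(redundancy_group, int) or redundancy_group < 0:
        raise ValueError("redundancy group must be a non-negative integer")
    if node not in (0, 1):
        # A chassis cluster has two nodes: node0 and node1.
        raise ValueError("node must be 0 or 1")
    return f"request chassis cluster failover redundancy-group {redundancy_group} node {node}"


# The two commands run above: move RG0, then RG1, over to node1.
for rg in (0, 1):
    print(failover_command(rg, 1))
```

How you send the resulting strings to the device (an SSH session, an automation library, or plain copy and paste) is left open on purpose.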
Once the commands have run, we can see that both redundancy groups have failed over to Node1, which now has the higher priority (a manual failover sets its priority to 255). We can also see that, for Redundancy Group 0, Node0 is now in secondary-hold status. Secondary-hold means the node is in a passive state and cannot yet be promoted to the active/primary state. The secondary-hold period lasts 5 minutes, so you will have to wait until this interval has passed before you can fail Redundancy Group 0 back over to Node0.
{secondary-hold:node0}
[email protected]_SRX220_Top> show chassis cluster status
Cluster ID: 1
Node Priority Status Preempt Manual failover

Redundancy group: 0 , Failover count: 6
node0 100 secondary-hold no yes
node1 255 primary no yes

Redundancy group: 1 , Failover count: 32
node0 100 secondary yes yes
node1 255 primary yes yes

{secondary-hold:node0}
After the 5-minute interval, you can see that Node0 has moved from secondary-hold to secondary:
[email protected]_SRX220_Top> show chassis cluster status
Cluster ID: 1
Node Priority Status Preempt Manual failover

Redundancy group: 0 , Failover count: 6
node0 100 secondary no yes
node1 255 primary no yes

Redundancy group: 1 , Failover count: 32
node0 100 secondary yes yes
node1 255 primary yes yes

{secondary:node0}
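The secondary-hold behaviour boils down to a timer that starts at the failover. The Python sketch below models just that timer, assuming the 5-minute interval seen here (as far as I know this matches the default hold-down interval for Redundancy Group 0, which is configurable). former_primary_state is a hypothetical name, the failover time is made up, and the model deliberately ignores everything else the cluster weighs, such as interface monitoring.

```python
from datetime import datetime, timedelta

# Assumption: a 300-second (5-minute) hold-down for redundancy group 0,
# matching the interval observed in this post.
RG0_HOLD_DOWN = timedelta(seconds=300)


def former_primary_state(failover_at, now, hold_down=RG0_HOLD_DOWN):
    """Simplified model: state of the old primary at time 'now' after a failover."""
    if now < failover_at + hold_down:
        return "secondary-hold"  # cannot be promoted to primary yet
    return "secondary"  # eligible to become primary again


failover_at = datetime(2000, 1, 1, 10, 0, 0)  # hypothetical failover time
for minutes in (1, 4, 5, 6):
    now = failover_at + timedelta(minutes=minutes)
    print(minutes, former_primary_state(failover_at, now))
```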
As the rolling pings show, only 3 out of 2138 packets were dropped in total, which ping reports as 0% packet loss. That is not a noticeable drop in traffic.
{master:0}
root> ping 172.16.0.2 routing-instance trust
--- 192.168.0.2 ping statistics ---
1071 packets transmitted, 1069 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.870/2.275/9.126/0.523 ms
---------------------------------------------------------------------------------
{master:0}
root> ping 172.16.0.2 routing-instance trust
--- 172.16.0.2 ping statistics ---
1067 packets transmitted, 1066 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.887/2.509/5.126/0.351 ms
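The arithmetic behind the loss figure is easy to check. This short Python sketch totals both ping runs above and compares the exact loss with a whole-number percentage; that ping truncates to a whole number is my assumption, based on the 0% it prints despite the dropped packets.

```python
def loss(transmitted, received):
    """Return (lost packets, exact loss %, loss % truncated to a whole number)."""
    lost = transmitted - received
    return lost, 100 * lost / transmitted, (100 * lost) // transmitted


# (transmitted, received) for the two rolling pings above
runs = [(1071, 1069), (1067, 1066)]
total_tx = sum(tx for tx, _ in runs)
total_rx = sum(rx for _, rx in runs)

lost, exact_pct, whole_pct = loss(total_tx, total_rx)
print(f"{lost} of {total_tx} packets lost: {exact_pct:.2f}% exact, {whole_pct}% as a whole number")
# prints: 3 of 2138 packets lost: 0.14% exact, 0% as a whole number
```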
Having failed over to Node1, we can clear the manual failover with the command request chassis cluster failover reset redundancy-group 1 (and likewise for redundancy group 0). This resets the nodes' priorities to their configured values. The same command can also be used if the device becomes unreachable or the redundancy group threshold reaches zero.
{secondary:node0}
[email protected]_SRX220_Top> request chassis cluster failover reset redundancy-group 1
node0:
--------------------------------------------------------------------------
No reset required for redundancy group 1.

node1:
--------------------------------------------------------------------------
Successfully reset manual failover for redundancy group 1

{secondary:node0}
[email protected]_SRX220_Top> request chassis cluster failover reset redundancy-group 0
node0:
--------------------------------------------------------------------------
No reset required for redundancy group 0.

node1:
--------------------------------------------------------------------------
Successfully reset manual failover for redundancy group 0
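The Manual failover column of show chassis cluster status tells you which redundancy groups still need a reset. As a minimal Python sketch, assuming you have already read that column into a dictionary (the values below come from the status output after the failover and after the reset in this post), reset_commands is a hypothetical helper that lists the reset commands still to run.

```python
def reset_commands(manual_failover):
    """manual_failover maps rg -> {node: bool} from the 'Manual failover' column.

    Returns one reset command per redundancy group that still shows a manual
    failover on either node, in ascending group order.
    """
    return [
        f"request chassis cluster failover reset redundancy-group {rg}"
        for rg in sorted(manual_failover)
        if any(manual_failover[rg].values())
    ]


after_failover = {0: {"node0": True, "node1": True}, 1: {"node0": True, "node1": True}}
after_reset = {0: {"node0": False, "node1": False}, 1: {"node0": False, "node1": False}}

print(reset_commands(after_failover))  # both groups still need a reset
print(reset_commands(after_reset))     # nothing left to reset
```

Ascending order is just a convenient default here; in the post I reset Redundancy Group 1 first and then Redundancy Group 0.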
As preempt is enabled on Redundancy Group 1, it automatically fails back to Node0:
[email protected]_SRX220_Top> show chassis cluster status
Cluster ID: 1
Node Priority Status Preempt Manual failover

Redundancy group: 0 , Failover count: 6
node0 100 secondary no no
node1 1 primary no no

Redundancy group: 1 , Failover count: 33
node0 100 primary yes no
node1 1 secondary yes no
Redundancy Group 0, on the other hand, does not support preempt, so you will need to do another manual failover to make Node0 the primary of the cluster again.
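The difference between the two groups after the reset can be captured in a toy model. The Python sketch below is a simplification under stated assumptions: with preempt, the node with the highest configured priority takes over; without it, the current primary keeps the role. RedundancyGroup and primary_after_reset are hypothetical names, and the model ignores hold-down timers, interface monitoring and anything else the real cluster considers.

```python
from dataclasses import dataclass


@dataclass
class RedundancyGroup:
    rg_id: int
    priorities: dict  # configured priority per node, e.g. {"node0": 100, "node1": 1}
    primary: str      # current primary before the reset
    preempt: bool

    def primary_after_reset(self):
        """Simplified model of who is primary once the manual failover is cleared."""
        if self.preempt:
            # With preempt, the node with the highest configured priority takes over.
            return max(self.priorities, key=self.priorities.get)
        # Without preempt, the current primary simply keeps the role.
        return self.primary


# Both groups are on node1 after the manual failover; configured priorities are 100/1.
rg0 = RedundancyGroup(0, {"node0": 100, "node1": 1}, primary="node1", preempt=False)
rg1 = RedundancyGroup(1, {"node0": 100, "node1": 1}, primary="node1", preempt=True)

for rg in (rg0, rg1):
    print(f"RG{rg.rg_id}: primary after reset = {rg.primary_after_reset()}")
```

The model reproduces what the status output shows after the reset: Redundancy Group 1 is back on node0, while Redundancy Group 0 stays on node1 until another manual failover.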
My next post will look at Test B, Interface Failover. See you on the other side 😀
Keeran Marquis
Superb
Thanks for sharing this Info. I have tried failover for reth1 which holds my untrust connection & it works good without any packet loss, however when I failover the reth0 I see complete loss for 30 secs. I tweaked the heartbeat to minimum but still no success. I am using srx 240’s not sure if this is base behaviour of the HA setup.
Wow! So glad I found your blog. I'm struggling to get my SRX340 cluster and EX3400 (Virtual Chassis) set up and running, and you've helped a bunch. I'm wondering if you would be willing to assist me further? I'm not a network guru, unfortunately, so I'm struggling to fully understand what exactly I'm doing. If you would be able to spare some time, that would be awesome. Let me know.
Thanks,
Jesse
Great and clear information. Thanks for sharing!