I thought it would be better to split the SRX clustering topic across multiple posts, as my first post got pretty long! So here is part 2 :D

Let's dive straight in!

Having configured the cluster in my previous post, we will now see how the failover process works. I will be using two methods for failover testing (a rough sketch of the commands behind each is shown just after this list):

  • A manual failover, where I will manually fail over redundancy group 1 from node0 to node1
  • An interface failover (hard failover), where I will shut down the ports on node0 and the cluster should fail over to node1
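
For the manual failover it is a single operational command, and for the interface failover one way to do it is to disable node0's reth child links from configuration mode. Something like the below, where ge-0/0/5 and ge-0/0/6 are just placeholder port names rather than the exact ports in my lab:

root@lab_SRX220_Top> request chassis cluster failover redundancy-group 1 node 1

root@lab_SRX220_Top# set interfaces ge-0/0/5 disable
root@lab_SRX220_Top# set interfaces ge-0/0/6 disable
root@lab_SRX220_Top# commit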

Pre-Testing Checks

Cluster Status

Before doing each test, I checked that the chassis cluster status was as expected, with Node0 as primary and Node1 as secondary:

root@lab_SRX220_Top> show chassis cluster status        
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 5
    node0                   100         primary        no       no  
    node1                   1           secondary      no       no  

Redundancy group: 1 , Failover count: 31
    node0                   100         primary        yes      no  
    node1                   1           secondary      yes      no

Cluster Flow Session

As this cluster is an Active/Standby setup, I wanted to check that all the traffic was flowing via Node0. I had started rolling pings between the trust and untrust zones in 2 separate windows, and as we can see, the flows are going through Node0 as expected:

root@lab_SRX220_Top> show security flow session    
node0:
--------------------------------------------------------------------------

Session ID: 5932, Policy name: ping/4, State: Active, Timeout: 2, Valid
  In: 172.16.0.2/11 --> 192.168.0.2/10498;icmp, If: reth0.10, Pkts: 1, Bytes: 84
  Out: 192.168.0.2/10498 --> 172.16.0.2/11;icmp, If: reth1.20, Pkts: 1, Bytes: 84

Session ID: 5933, Policy name: ping/5, State: Active, Timeout: 2, Valid
  In: 192.168.0.2/3 --> 172.16.0.2/10500;icmp, If: reth1.20, Pkts: 1, Bytes: 84
  Out: 172.16.0.2/10500 --> 192.168.0.2/3;icmp, If: reth0.10, Pkts: 1, Bytes: 84

Session ID: 5934, Policy name: ping/4, State: Active, Timeout: 2, Valid
  In: 172.16.0.2/12 --> 192.168.0.2/10498;icmp, If: reth0.10, Pkts: 1, Bytes: 84
  Out: 192.168.0.2/10498 --> 172.16.0.2/12;icmp, If: reth1.20, Pkts: 1, Bytes: 84

Session ID: 5935, Policy name: ping/5, State: Active, Timeout: 2, Valid
  In: 192.168.0.2/4 --> 172.16.0.2/10500;icmp, If: reth1.20, Pkts: 1, Bytes: 84
  Out: 172.16.0.2/10500 --> 192.168.0.2/4;icmp, If: reth0.10, Pkts: 1, Bytes: 84

Session ID: 5936, Policy name: ping/4, State: Active, Timeout: 4, Valid
  In: 172.16.0.2/13 --> 192.168.0.2/10498;icmp, If: reth0.10, Pkts: 1, Bytes: 84
  Out: 192.168.0.2/10498 --> 172.16.0.2/13;icmp, If: reth1.20, Pkts: 1, Bytes: 84

Session ID: 5937, Policy name: ping/5, State: Active, Timeout: 4, Valid
  In: 192.168.0.2/5 --> 172.16.0.2/10500;icmp, If: reth1.20, Pkts: 1, Bytes: 84
  Out: 172.16.0.2/10500 --> 192.168.0.2/5;icmp, If: reth0.10, Pkts: 1, Bytes: 84
Total sessions: 6

node1:
--------------------------------------------------------------------------
Total sessions: 0

Connectivity Verification

I will have rolling pings running between the trust and untrust zones in separate terminal windows; these will show if any packets are dropped during the failover.
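
For reference, the rolling pings are nothing fancy: just Junos pings left running with no count from the test box, along the lines of the below (the addresses come from the flow output above, and I am assuming the test box has routing instances named trust and untrust, one facing each side of the SRX):

{master:0}
root> ping 192.168.0.2 routing-instance trust

{master:0}
root> ping 172.16.0.2 routing-instance untrust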

Failover Groups

I will be failing over both redundancy groups 0 and 1

Test A (Manual failover)

To perform a manual failover, you will need to run the command request chassis cluster failover redundancy-group {0|1} node {0|1}:

root@lab_SRX220_Top> request chassis cluster failover redundancy-group 0 node 1    
node1:
--------------------------------------------------------------------------
Initiated manual failover for redundancy group 0

{primary:node0}
root@lab_SRX220_Top> request chassis cluster failover redundancy-group 1 node 1    
node1:
--------------------------------------------------------------------------
Initiated manual failover for redundancy group 1

{secondary-hold:node0}

Once the command has been run, we can see that both redundancy groups have failed over, as Node1 now holds the higher priority. We can also see that for Redundancy Group 0, Node0 is in the secondary-hold state. Secondary-hold means the node is passive and cannot be promoted to the active/primary state. The secondary-hold period lasts 5 minutes, which means you will have to wait until after this interval before you can fail Redundancy Group 0 back over to Node0:

{secondary-hold:node0}
root@lab_SRX220_Top> show chassis cluster status 
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 6
    node0                   100         secondary-hold no       yes 
    node1                   255         primary        no       yes 

Redundancy group: 1 , Failover count: 32
    node0                   100         secondary      yes      yes 
    node1                   255         primary        yes      yes 

{secondary-hold:node0}
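
As a quick side note, that 5 minute hold on Redundancy Group 0 is the default hold-down-interval. I left it alone for this lab, but if you ever needed a different value it can be tuned with something along these lines (300 seconds shown purely as an illustration):

root@lab_SRX220_Top# set chassis cluster redundancy-group 0 hold-down-interval 300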

After the 5 minute interval, you can see that Node0 has moved from secondary-hold to secondary:

root@lab_SRX220_Top> show chassis cluster status    
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 6
    node0                   100         secondary      no       yes 
    node1                   255         primary        no       yes 

Redundancy group: 1 , Failover count: 32
    node0                   100         secondary      yes      yes 
    node1                   255         primary        yes      yes 

{secondary:node0}

As we can see from the rolling pings, only 3 packets out of the 2138 sent were dropped (1069 of 1071 and 1066 of 1067 received), which rounds down to the 0% packet loss reported. Not a noticeable drop in traffic:

{master:0}
root> ping 192.168.0.2 routing-instance trust 

--- 192.168.0.2 ping statistics ---
1071 packets transmitted, 1069 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.870/2.275/9.126/0.523 ms

---------------------------------------------------------------------------------

{master:0}
root> ping 172.16.0.2 routing-instance untrust    

--- 172.16.0.2 ping statistics ---
1067 packets transmitted, 1066 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.887/2.509/5.126/0.351 ms

Having failed over to Node1, we can clear the manual failover by using the command request chassis cluster failover reset redundancy-group 1. This will reset the nodes' priorities to their configured values. This command can also be used if the device becomes unreachable or the redundancy group threshold reaches zero.

{secondary:node0}
root@lab_SRX220_Top> request chassis cluster failover reset redundancy-group 1    
node0:
--------------------------------------------------------------------------
No reset required for redundancy group 1.

node1:
--------------------------------------------------------------------------
Successfully reset manual failover for redundancy group 1

{secondary:node0}
root@lab_SRX220_Top> request chassis cluster failover reset redundancy-group 0    
node0:
--------------------------------------------------------------------------
No reset required for redundancy group 0.

node1:
--------------------------------------------------------------------------
Successfully reset manual failover for redundancy group 0

As we have preempt enabled on Redundancy Group 1, it will automatically fail back to Node0:

root@lab_SRX220_Top> show chassis cluster status   
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 6
    node0                   100         secondary      no       no  
    node1                   1           primary        no       no  

Redundancy group: 1 , Failover count: 33
    node0                   100         primary        yes      no  
    node1                   1           secondary      yes      no

Whereas with Redundancy Group 0, as you can't enable preempt, you will need to do another manual failover to get Node0 to become the primary of the cluster.
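
For completeness, the preempt behaviour on Redundancy Group 1 comes from a single statement set back in part 1, along the lines of the below (Redundancy Group 0 doesn't take the preempt statement, which is why the manual failover is needed):

root@lab_SRX220_Top# set chassis cluster redundancy-group 1 preempt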

Manual Failover Back to Node0

root@lab_SRX220_Top> request chassis cluster failover redundancy-group 0 node 0 
node0:
--------------------------------------------------------------------------
Initiated manual failover for redundancy group 0

{secondary:node0}                                   
root@lab_SRX220_Top> show chassis cluster status    
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 7
    node0                   255         primary        no       yes 
    node1                   1           secondary      no       yes 

Redundancy group: 1 , Failover count: 33
    node0                   100         primary        yes      no  
    node1                   1           secondary      yes      no
{primary:node0}
root@lab_SRX220_Top> request chassis cluster failover reset redundancy-group 0     
node0:
--------------------------------------------------------------------------
Successfully reset manual failover for redundancy group 0

node1:
--------------------------------------------------------------------------
No reset required for redundancy group 0.
{primary:node0}
root@lab_SRX220_Top> show chassis cluster status                                  
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 7
    node0                   100         primary        no       no  
    node1                   1           secondary      no       no  

Redundancy group: 1 , Failover count: 33
    node0                   100         primary        yes      no  
    node1                   1           secondary      yes      no

My next post will look at Test B, Interface Failover. See you on the other side :D
