
Configuring IPv4 DHCP Juniper SRX

Reading Time: 1 minute

After configuring a Dual Stacked DHCP server and DHCPv6 on a Juniper SRX, it’s only right that I do something on configuring DHCPv4 on a Juniper SRX.

This won’t be a long or detailed post, as the configuration is very much the same as my previous post on how to configure DHCPv6 on an SRX, and I’ve gone through quite a lot before about how DHCP works.

First, under the system services dhcp-local-server stanza, you will need to create a group and set a physical or logical interface that will have DHCP enabled:

root@SRX# show system services dhcp-local-server
group dhcpv4-group {
    interface vlan.3407;
}

Next, under the access address-assignment stanza, you will need to set the network, the DHCP range, and the IP address that the router will use within the DHCP pool. The propagate-settings statement takes configuration learned by the DHCP client on vlan.3407 when it is not otherwise specified, most importantly the name-server, which changes from ISP to ISP and is very important; without it, name resolution on the LAN won’t work.

root@SRX# show access
address-assignment {
    pool v4 {
        family inet {
            network 172.31.106.16/29;
            range v4-range {
                low 172.31.106.18;
                high 172.31.106.22;
            }
            dhcp-attributes {
                router {
                    172.31.106.17;
                }
                propagate-settings vlan.3407;
            }
        }
    }
}

This is all the configuration needed to have DHCPv4 on a Juniper SRX220. For troubleshooting DHCP, you will be able to use the commands below:

root@SRX> show dhcp ?
Possible completions:
  client               Show DHCP client information
  relay                Show DHCP relay information
  server               Show DHCP server information
  snooping             Show DHCP snooping information
  statistics           Show DHCP service statistics
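
Beyond the completions above, if you want a quick sketch of checking the pool in action, the bindings and server statistics views are the usual starting point (output omitted here, as it depends entirely on your clients):

root@SRX> show dhcp server binding
root@SRX> show dhcp server statistics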

As I said, this is a quick post :p

I have included the set commands used in my example below:

DHCP Set Commands
set system services dhcp-local-server group dhcpv4-group interface vlan.3407
set access address-assignment pool v4 family inet network 172.31.106.16/29
set access address-assignment pool v4 family inet range v4-range low 172.31.106.18
set access address-assignment pool v4 family inet range v4-range high 172.31.106.22
set access address-assignment pool v4 family inet dhcp-attributes router 172.31.106.17
set access address-assignment pool v4 family inet dhcp-attributes propagate-settings vlan.3407
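
If your ISP’s DNS servers are static, you could also set them explicitly under dhcp-attributes rather than relying on propagate-settings. A minimal sketch (8.8.8.8 is purely a placeholder, not from my lab):

set access address-assignment pool v4 family inet dhcp-attributes name-server 8.8.8.8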

Disabling a SRX Chassis Cluster

Reading Time: 1 minute

My final post on SRX Chassis Clustering! If you’ve been with me from the start, it has been emotional 😀 haha

If you want to disable the chassis cluster and have the SRX firewalls back as standalone devices, you will need to run the following command from operational mode:

{primary:node0}
root@lab_SRX220_Top> set chassis cluster disable reboot

(If you remember from the first post, this was the first command I used)

You will get a message saying the chassis cluster has been disabled and the device is going to reboot. Once the reboot has completed, you will have your SRX back as a standalone device!
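
If you want to double-check once the box is back up, running show chassis cluster status again should simply tell you that chassis clustering is not enabled (a quick sanity check rather than anything official):

root@lab_SRX220_Top> show chassis cluster status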

As straightforward as that!!!

I hope that you have enjoyed my series of posts on the SRX Chassis Clustering process. At the time of writing, this was the first time I had ever done chassis clustering! If you have any comments, questions or feedback, drop a comment as I’m all ears!

Cheers πŸ˜€

For greater insight and a more in-depth understanding of Chassis Clustering on the SRX Series, I would recommend having a read of the Juniper Security Device documentation.


Upgrading a SRX Chassis Cluster

Reading Time: 4 minutes

In my previous post, I successfully failed over the redundancy groups on the cluster using the Manual Failover and Interface Failure methods. This post will look into the methods that can be used when upgrading an SRX Chassis Cluster.

Testing Information
i) I had SCP’d the latest recommended version of Junos (12.1X44-D45.2) onto both Node0 and Node1. The package is located under the /var/tmp directory. You can get to this folder via the CLI: from operational mode, start shell then cd /var/tmp (see the sketch after this list)
ii) I will have rolling pings from trust <--> untrust zones in separate terminal windows, so I can see when the outage starts, and I will be timing its length
iii) All commands will be run from Node0, unless stated otherwise
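
As a rough sketch of getting the package on box, you can pull it straight onto each node with file copy (the server name and path here are purely illustrative, not from my lab):

root@lab_SRX220_Top> file copy scp://user@fileserver/images/junos-srxsme-12.1X44-D45.2-domestic.tgz /var/tmp/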

You have two methods of upgrading an SRX cluster:

Method A (Individual Node upgrades)

Disclaimer
Using this method of chassis cluster upgrade, there will be a SERVICE DISRUPTION of 3-5 minutes minimum. You will need to ensure that you have considered the business impact of this method of upgrade.

This method can be used for downgrading Junos as well as upgrading, and it has no Junos version limitation. With this method you are simply upgrading both individual nodes at the same time. As I have already uploaded the Junos image onto both nodes, I will need to run the command on BOTH Node0 and Node1 from operational mode:

{primary:node0}
root@lab_SRX220_Top> request system software add /var/tmp/junos-srxsme-12.1X44-D45.2-domestic.tgz
{secondary:node1}
root@lab_SRX220_Top> request system software add /var/tmp/junos-srxsme-12.1X44-D45.2-domestic.tgz

Once they have been added, you will need to reboot both Nodes simultaneously. You can use request system reboot node all from Node0

After the reboot, you will need to update the backup image of Junos on both Nodes, to have a consistent primary and backup image.
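
On the branch SRX platforms with dual-root partitioning, my understanding is that this is done with a snapshot onto the alternate slice, run on each node once you are happy with the new version (a sketch, so check the exact options on your platform):

root@lab_SRX220_Top> request system snapshot slice alternate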

Method B (In Service Software Upgrades)

Before I begin: Juniper has two types of in-service upgrade. The High-End Data Centre SRX models (SRX1400, SRX3400, SRX5600 and SRX5800) use In-Service Software Upgrade (ISSU), and the Small/Medium Branch SRX models (SRX100, SRX110, SRX220, SRX240 and SRX650) use In-Band Cluster Upgrade (ICU). Although the commands are near enough the same, the pre-upgrade requirements, service impacts and the minimum Junos firmware versions that support in-service upgrades are different.

As I’m using 2x SRX220H2 model firewalls, I will be upgrading via ICU. When I get the chance to upgrade a High-End SRX model, I will update the post with my findings :p

Even before you consider using the ISSU/ICU method, I am telling you (no recommendation here!!) to check the Juniper page Limitation on ISSU and ICU. The page will confirm which versions of Junos are supported by ISSU/ICU and (more importantly) which services are not supported by ISSU/ICU. In essence, you will need to check what services you are running on your SRX cluster to see if they are supported. If they are not supported, then you are told DO NOT perform an upgrade with this method.

With that out of the way, and if you have checked that your cluster is fully supported (firmware and services) by ISSU/ICU, you can proceed with the pre-checks 😀

Pre-Upgrade Checks ICU
1. Junos version: You will need to be running Junos version 11.2R2 as a minimum. This can be checked by running show version on both Nodes.
2. No-sync option: ICU is available with the no-sync option only. The no-sync option disables the flow state from syncing to the second node when it boots with the new Junos image.
3. Downgrade method: You CAN NOT use ICU to downgrade Junos to a version lower than 11.2R2.
4. Disk space: You will need to check the disk space available in /var/tmp on the SRX. From operational mode, start shell then enter the command df -h and you will see the disk space available (see the sketch after this list).
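
As a quick sketch of the disk-space check (the output itself will obviously differ per box), you can either drop into the shell or stay in the CLI and use show system storage:

root@lab_SRX220_Top> start shell
% df -h /var/tmp
% exit
root@lab_SRX220_Top> show system storage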

Having confirmed all the pre-checks are good, we can proceed with the upgrade. It is important to note that during an ICU there WILL BE A SERVICE DISRUPTION! It will be approximately 30 seconds with the no-sync option. During these 30 seconds traffic will be dropped and flow sessions will be lost. You will need to keep this in mind if you are doing this upgrade in hours, or if you need to keep a good record of your flow sessions for any reason.

To start the upgrade, we need to run request system software in-service-upgrade /path/to/package no-sync

{primary:node0}
root@lab_SRX220_Top> request system software in-service-upgrade /var/tmp/junos-srxsme-12.1X44-D45.2-domestic.tgz no-sync
ICU Console observations
It is important to note that during the ICU process you won’t need to do any manual reboots; all the reboots are automated within the process:

WARNING: in-service-upgrade shall reboot both the nodes
         in your cluster. Please ignore any subsequent 
         reboot request message
Once the process has started, Node1 is upgraded first:
ISSU: start downloading software package on secondary node
Pushing bundle to node1
{.......}
JUNOS 12.1X44-D45.2 will become active at next reboot
WARNING: A reboot is required to load this software correctly
WARNING:     Use the 'request system reboot' command
WARNING:         when software installation is complete
Saving state for rollback ...
ISSU: failover all redundancy-groups 1...n to primary node
Successfully reset all redundancy-groups priority back to configured priority.
Successfully reset all redundancy-groups priority back to configured priority.
Initiated manual failover for all redundancy-groups to node0
Redundancy-groups-0 will not failover and the primaryship remains unchanged.
ISSU: rebooting Secondary Node
Shutdown NOW!
[pid 13353]
ISSU: Waiting for secondary node node1 to reboot.
ISSU: node 1 went down
ISSU: Waiting for node 1 to come up
It takes a few minutes for Node0 to reboot after Node1 comes back online, so if you have a console connection to both SRXs, you will need to be patient and not abort the upgrade. If you have a rolling ping going to each node’s fxp0 interface, you will know when Node0 is about to reboot, as the pings to Node1 will start to return. Once Node1 is up and booted, Node0 will start to reboot.

ISSU: node 1 came up
ISSU: secondary node node1 booted up.
Shutdown NOW!
From hitting enter to having both firewalls upgraded, it took 22 minutes 45 seconds. Although the documentation said there would be an outage of 30 seconds, the rolling pings between trust <--> untrust show next to no packet loss: only 6 packets out of the 1600 transmitted weren’t received. (Saying that, for my testing I was unable to get live flow session information.)

root> ping 172.16.0.2 routing-instance trust 
--- 172.16.0.2 ping statistics ---
1600 packets transmitted, 1594 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.720/2.640/13.673/0.652 ms
--------------------------------------------------------------------------
root> ping 192.168.0.2 routing-instance untrust
--- 192.168.0.2 ping statistics ---
1600 packets transmitted, 1594 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.838/2.535/13.669/0.681 ms
To verify that the upgrade has been successful, we can run the command show version:

{secondary:node0}
root@lab_SRX220_Top> show version
node0:
--------------------------------------------------------------------------
Hostname: lab_SRX220_Top
Model: srx220h2
JUNOS Software Release [12.1X44-D45.2]

node1:
--------------------------------------------------------------------------
Hostname: lab_SRX220_Top
Model: srx220h2
JUNOS Software Release [12.1X44-D45.2]

And show chassis cluster status, to see that the chassis cluster status is as expected:

root@lab_SRX220_Top> show chassis cluster status
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 0
    node0                   100         secondary      no       no  
    node1                   1           primary        no       no  

Redundancy group: 1 , Failover count: 1
    node0                   100         primary        yes      no  
    node1                   1           secondary      yes      no 

We can see that we are running the upgraded version of Junos. As expected, Redundancy Group 0 is primary on Node1 and Redundancy Group 1 is primary on Node0. As discussed in my previous post, with preempt enabled, Redundancy Group 1 automatically fails back over to Node0 once it is available. We will have to do a manual failover of Redundancy Group 0 back to Node0 from Node1, and we will need to upgrade the backup image of Junos to have a consistent primary and backup image.

If you had a case where you had to abort the ICU process, you would need to run request system software abort in-service-upgrade on the primary node. It is important to note that if you do use the abort command, you will put the cluster into an inconsistent state, where the secondary node will be running a newer version of Junos than the primary node. To recover the cluster into a consistent state, you will need to do the following, all on the secondary node:

Recovering from an Inconsistent State
1. You will need to abort the upgrade: request system software abort in-service-upgrade
2. Roll back to the older version of Junos that is still on the primary node: request system software rollback node {node-id}
3. Perform a reboot of the node: request system reboot

**UPDATE 29/4/2015**
Luckily enough, as I was finishing up this series of posts, my colleague had finished working on the SRX1400 we have in our lab! So I was able to run testing of an ISSU upgrade on a High-End SRX Series device 😀 Happy Days!!!

SRX1400 testing differences
1. The SRX1400 doesn’t have any routing protocols configured, so I will not need to configure graceful restart.
2. I will be upgrading from 12.1X44-D40.2 to 12.1X46-D10.2
3. The topology will be the same, however the IP addressing will be different. Trust will be 192.168.13.0/24 and Untrust will be 172.31.13.0/24
Pre-Upgrade Checks ISSU
1. Junos version: You will need to check whether the version of Junos code supports ISSU. This can be checked by running show version on both Nodes. You will need to be using Junos version 9.6 or later.
2. Downgrade method: ISSU DOES NOT support firmware downgrades!
3. Routing: Juniper recommends that graceful restart for routing protocols be enabled before starting an ISSU (see the sketch after this list).
4. Redundancy groups: Manually fail over all redundancy groups so that only one node is active. (For my example, as I have an active/backup setup, you won’t need to change anything. However, if you have an active/active setup, you will need to make configuration changes.)
5. Redundancy Group 0: Once the upgrade has been completed, you will need to manually fail over Redundancy Group 0 back to Node0 (see Failover on SRX cluster pt1).
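
For the graceful restart point, the knob itself is a single statement under routing-options; a minimal sketch (commit not shown):

set routing-options graceful-restart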

To start the upgrade, all the redundancy groups first need to fail over to one active node. As I have an active/backup setup, all my redundancy groups are already on node0:

{primary:node0}
root@lab_be-rtr0-h3> show chassis cluster status
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 3
    node0                   100         primary        no       no  
    node1                   99          secondary      no       no  

Redundancy group: 1 , Failover count: 5
    node0                   100         primary        yes      no  
    node1                   99          secondary      yes      no

To begin the upgrade process, we need to run request system software in-service-upgrade /path/to/package reboot

Important note
Unlike with the ICU upgrade process, you have to enter the option reboot to confirm that you want a reboot afterwards. If you don’t use the reboot option, the command will fail. This only applies to the High-End SRX devices: SRX1400, SRX3400, SRX3600, SRX5600 and SRX5800.
ISSU Console observations
It does take quite a while from this point before any more output comes from the console on node0, so you will need to be patient.

Validation succeeded
failover all RG 1+ groups to node 0 
Initiated manual failover for all redundancy-groups to node0
Redundancy-groups-0 will not failover and the primaryship remains unchanged.
ISSU: Preparing Backup RE
Pushing bundle to node1
Once Node1 comes back up, you will see the output below:

ISSU: Backup RE Prepare Done
Waiting for node1 to reboot.
node1 booted up.
Waiting for node1 to become secondary
node1 became secondary.
Waiting for node1 to be ready for failover
ISSU: Preparing Daemons

It takes around 5-10 minutes before you see any more output to say the upgrade process is still going on! Again, you will need to be patient as this does take its time!

Secondary node1 ready for failover.
{.......}
Failing over all redundancy-groups to node1
ISSU: Preparing for Switchover
Initiated failover for all the redundancy groups to node1
Waiting for node1 take over all redundancy groups
From hitting enter to having both firewalls upgraded, it took 30 minutes 18 seconds. The rolling pings between trust <--> untrust show next to no packet loss: only 2 packets out of the 3639 transmitted weren’t received. (As before, unfortunately I was unable to get live flow session information.)

root> ping 172.31.13.2 routing-instance trust 
--- 172.31.13.2 ping statistics ---
1818 packets transmitted, 1817 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.769/3.080/44.226/3.536 ms
--------------------------------------------------------------------------
root> ping 192.168.13.2 routing-instance untrust 
--- 192.168.13.2 ping statistics ---
1821 packets transmitted, 1820 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.831/3.071/44.524/3.244 ms

To verify that the upgrade has been successful, we can run the command show version:

{secondary:node0}
root@lab_be-rtr0-h3> show version
node0:
--------------------------------------------------------------------------
Hostname: lab_be-rtr0-h3
Model: srx1400
JUNOS Software Release [12.1X46-D10.2]

node1:
--------------------------------------------------------------------------
Hostname: lab_be-rtr0-i3
Model: srx1400
JUNOS Software Release [12.1X46-D10.2]

And show chassis cluster status, to see that the chassis cluster status is as expected:

{secondary:node0}
root@lab_be-rtr0-h3> show chassis cluster status
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 0
    node0                   100         secondary      no       no  
    node1                   99          primary        no       no  

Redundancy group: 1 , Failover count: 1
    node0                   100         primary        yes      no  
    node1                   99          secondary      yes      no 

We can see that we are running the upgraded version of Junos. As expected, Redundancy Group 0 is primary on Node1 and Redundancy Group 1 is primary on Node0. As discussed in my previous post, with preempt enabled, Redundancy Group 1 automatically fails back over to Node0 once it is available. We will have to do a manual failover of Redundancy Group 0 back to Node0 from Node1, and we will need to upgrade the backup image of Junos to have a consistent primary and backup image.

Unexpected output
During the reboot and manual failover of Redundancy Group 0 back to Node0, I got the output below on my console terminal:

Message from root@lab_be-rtr0-h3 at Apr 29 12:26:40  ...
lab_be-rtr0-h3 node0.fpc1.pic0 PFEMAN: Shutting down , PFEMAN Resync aborted! No peer info on reconnect or master rebooted?  

Message from root@lab_be-rtr0-h3 at Apr 29 12:26:40  ...
lab_be-rtr0-h3 node0.cpp0 RDP: Remote side closed connection: rdp.(17825794:13321).(serverRouter:chassis)

root@lab_be-rtr0-i3> Apr 29 12:27:04 init: can not access /usr/sbin/ipmid: No such file or directory

Message from root@lab_be-rtr0-i3 at Apr 29 12:27:05  ...
lab_be-rtr0-i3 node1.cpp0 RDP: Remote side closed connection: rdp.(34603010:33793).(serverRouter:pfe) 

Message from root@lab_be-rtr0-i3 at Apr 29 12:27:05  ...
lab_be-rtr0-i3 node1.cpp0 RDP: Remote side closed connection: rdp.(34603010:33792).(serverRouter:chassis) 

Message from root@lab_be-rtr0-i3 at Apr 29 12:27:17  ...
lab_be-rtr0-i3 node1.cpp0 RDP: Remote side reset connection: rdp.(34603010:33794).(primaryRouter:1008) 

Message from root@lab_be-rtr0-i3 at Apr 29 12:27:18  ...
lab_be-rtr0-i3 node1.cpp0 RDP: Remote side reset connection: rdp.(34603010:33795).(primaryRouter:1007)

I raised this with Juniper and they sent me this article. The article confirms that the error messages are expected if you are connected via the console or the fxp0 interface: “The above mentioned messages, which are generated on the console session, states that the routing-engine [control plane(RG0)] has become active on the other node….These messages are due to the following syslog user configuration: system syslog user *.”

You can stop these messages by deactivating system syslog user *.
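
If you did want to silence them, a minimal sketch from configuration mode would be something along these lines (the quotes around the wildcard are my own habit, so check the exact syntax on your version):

root@lab_be-rtr0-h3# deactivate system syslog user "*"
root@lab_be-rtr0-h3# commit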

Note: Juniper recommends that you keep the ‘syslog user (any emergency)’ configuration and ignore these informational messages, as they might show certain useful information to the user.

Phew, that was a lot of work and quite a bit to take in there!! Time for a break (a drink or 6 lol)

My next post will be the last post in the SRX Chassis Cluster series (sad times 🙁). It will be a nice simple one on how to disable chassis clustering!


Juniper SRX Failover Testing Part 2

Reading Time: 2 minutes

Having completed a manual failover of the redundancy groups in my previous post, this test goes through what would happen if we had a link fault.

Test B (Interface Failure)

In my first post on creating an SRX cluster, I had configured Interface Monitoring. Interface monitoring can be used to trigger a failover in the event that the link status of a monitored interface goes down. For this test I will be disconnecting interface ge-0/0/1; once this has been disconnected, we should see redundancy group 1 fail over from Node0 to Node1.
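
For reference, interface monitoring is configured per redundancy group with a weight per interface, roughly as below. With the default redundancy group threshold of 255, a weight of 255 means a single monitored link failing is enough to trigger the failover:

set chassis cluster redundancy-group 1 interface-monitor ge-0/0/1 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/2 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-3/0/1 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-3/0/2 weight 255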

We will check the status of the cluster and the interfaces before proceeding:

{primary:node0}[edit]
root@lab_SRX220_Top# run show chassis cluster status
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 7
    node0                   100         primary        no       no  
    node1                   1           secondary      no       no  

Redundancy group: 1 , Failover count: 33
    node0                   100         primary        yes      no  
    node1                   1           secondary      yes      no
{primary:node0}
root@lab_SRX220_Top> show chassis cluster interfaces
Control link status: Up

Control interfaces: 
    Index   Interface        Status
    0       fxp1             Up    

Fabric link status: Up

Fabric interfaces: 
    Name    Child-interface    Status
                               (Physical/Monitored)
    fab0    ge-0/0/5           Up   / Up  
    fab0   
    fab1    ge-3/0/5           Up   / Up  
    fab1   

Redundant-ethernet Information:     
    Name         Status      Redundancy-group
    reth0        Up          1                
    reth1        Up          1                
   
Redundant-pseudo-interface Information:
    Name         Status      Redundancy-group
    lo0          Up          0                

Interface Monitoring:
    Interface         Weight    Status    Redundancy-group
    ge-0/0/2          255       Up        1   
    ge-3/0/2          255       Up        1   
    ge-3/0/1          255       Up        1   
    ge-0/0/1          255       Up        1   

{primary:node0}

As everything is as expected, I disconnected interface ge-0/0/1. By running the same show chassis cluster interfaces command, we are able to see that the link failure has been detected:

{primary:node0}[edit]
root@lab_SRX220_Top# run show chassis cluster interfaces
Control link status: Up

Control interfaces: 
    Index   Interface        Status
    0       fxp1             Up    

Fabric link status: Up

Fabric interfaces: 
    Name    Child-interface    Status
                               (Physical/Monitored)
    fab0    ge-0/0/5           Up   / Up  
    fab0   
    fab1    ge-3/0/5           Up   / Up  
    fab1   

Redundant-ethernet Information:     
    Name         Status      Redundancy-group
    reth0        Up          1                
    reth1        Up          1                
   
Redundant-pseudo-interface Information:
    Name         Status      Redundancy-group
    lo0          Up          0                

Interface Monitoring:
    Interface         Weight    Status    Redundancy-group
    ge-0/0/2          255       Up        1   
    ge-3/0/2          255       Up        1   
    ge-3/0/1          255       Up        1   
    ge-0/0/1          255       Down      1

And by running show chassis cluster status, we can see that redundancy group 1 has failed over from Node0 to Node1:

{primary:node0}[edit]
root@lab_SRX220_Top# run show chassis cluster status
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 7
    node0                   100         primary        no       no  
    node1                   1           secondary      no       no  

Redundancy group: 1 , Failover count: 34
    node0                   0           secondary      yes      no  
    node1                   1           primary        yes      no

As I have configured preempt on redundancy group 1, once the link (ge-0/0/1) is reconnected it will automatically fail back to Node0.
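
For reference, preempt is a single statement against the redundancy group, roughly:

set chassis cluster redundancy-group 1 preempt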

{primary:node0}[edit]
root@lab_SRX220_Top# run show chassis cluster interfaces
Control link status: Up

Control interfaces: 
    Index   Interface        Status
    0       fxp1             Up    

Fabric link status: Up

Fabric interfaces: 
    Name    Child-interface    Status
                               (Physical/Monitored)
    fab0    ge-0/0/5           Up   / Up  
    fab0   
    fab1    ge-3/0/5           Up   / Up  
    fab1   

Redundant-ethernet Information:     
    Name         Status      Redundancy-group
    reth0        Up          1                
    reth1        Up          1                
   
Redundant-pseudo-interface Information:
    Name         Status      Redundancy-group
    lo0          Up          0                

Interface Monitoring:
    Interface         Weight    Status    Redundancy-group
    ge-0/0/2          255       Up        1   
    ge-3/0/2          255       Up        1   
    ge-3/0/1          255       Up        1   
    ge-0/0/1          255       Up        1
{primary:node0}[edit]
root@lab_SRX220_Top# run show chassis cluster status
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 7
    node0                   100         primary        no       no  
    node1                   1           secondary      no       no  

Redundancy group: 1 , Failover count: 35
    node0                   100         primary        yes      no  
    node1                   1           secondary      yes      no

My next post in this series will demonstrate the methods of upgrading the Junos version on an SRX cluster.


Juniper SRX Failover Testing Part 1

Reading Time: 3 minutes

I thought that it would be better to split the SRX clustering posts into multiple parts, as my first post got pretty long! So here is part 2 😀

Let’s dive straight in!

Having configured the cluster in my previous post, we will see how the failover process works. The two methods I will be using for failover testing are:

i) A manual failover, where I will manually fail over redundancy group 1 from node0 to node1
ii) An interface failover (hard failover), where I will shut down the ports on node0 and the cluster should fail over to node1

Pre Testing Checks
Before doing each test, I checked that the status of the chassis cluster was as expected, with Node0 as primary and Node1 as secondary:

root@lab_SRX220_Top> show chassis cluster status
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 5
    node0                   100         primary        no       no  
    node1                   1           secondary      no       no  

Redundancy group: 1 , Failover count: 31
    node0                   100         primary        yes      no  
    node1                   1           secondary      yes      no
To check that all the traffic was flowing via Node0 (as this cluster is an Active/Standby setup), I started rolling pings from trust <--> untrust in 2 separate windows. As we can see, flows are going through Node0 as expected:

root@lab_SRX220_Top> show security flow session
node0:
--------------------------------------------------------------------------

Session ID: 5932, Policy name: ping/4, State: Active, Timeout: 2, Valid
  In: 172.16.0.2/11 --> 192.168.0.2/10498;icmp, If: reth0.10, Pkts: 1, Bytes: 84
  Out: 192.168.0.2/10498 --> 172.16.0.2/11;icmp, If: reth1.20, Pkts: 1, Bytes: 84

Session ID: 5933, Policy name: ping/5, State: Active, Timeout: 2, Valid
  In: 192.168.0.2/3 --> 172.16.0.2/10500;icmp, If: reth1.20, Pkts: 1, Bytes: 84
  Out: 172.16.0.2/10500 --> 192.168.0.2/3;icmp, If: reth0.10, Pkts: 1, Bytes: 84

Session ID: 5934, Policy name: ping/4, State: Active, Timeout: 2, Valid
  In: 172.16.0.2/12 --> 192.168.0.2/10498;icmp, If: reth0.10, Pkts: 1, Bytes: 84
  Out: 192.168.0.2/10498 --> 172.16.0.2/12;icmp, If: reth1.20, Pkts: 1, Bytes: 84

Session ID: 5935, Policy name: ping/5, State: Active, Timeout: 2, Valid
  In: 192.168.0.2/4 --> 172.16.0.2/10500;icmp, If: reth1.20, Pkts: 1, Bytes: 84
  Out: 172.16.0.2/10500 --> 192.168.0.2/4;icmp, If: reth0.10, Pkts: 1, Bytes: 84

Session ID: 5936, Policy name: ping/4, State: Active, Timeout: 4, Valid
  In: 172.16.0.2/13 --> 192.168.0.2/10498;icmp, If: reth0.10, Pkts: 1, Bytes: 84
  Out: 192.168.0.2/10498 --> 172.16.0.2/13;icmp, If: reth1.20, Pkts: 1, Bytes: 84

Session ID: 5937, Policy name: ping/5, State: Active, Timeout: 4, Valid
  In: 192.168.0.2/5 --> 172.16.0.2/10500;icmp, If: reth1.20, Pkts: 1, Bytes: 84
  Out: 172.16.0.2/10500 --> 192.168.0.2/5;icmp, If: reth0.10, Pkts: 1, Bytes: 84
Total sessions: 6

node1:
--------------------------------------------------------------------------
Total sessions: 0
I will have rolling pings from trust <--> untrust zones in separate terminal windows, which will show if any packets are dropped during the failover.
I will be failing over both redundancy groups 0 and 1.

Test A (Manual failover)

To perform a manual failover, you will need to run the command request chassis cluster failover redundancy-group {0|1} node {0|1}

root@lab_SRX220_Top> request chassis cluster failover redundancy-group 0 node 1
node1:
--------------------------------------------------------------------------
Initiated manual failover for redundancy group 0

{primary:node0}
root@lab_SRX220_Top> request chassis cluster failover redundancy-group 1 node 1
node1:
--------------------------------------------------------------------------
Initiated manual failover for redundancy group 1

{secondary-hold:node0}

Once the command has been run, we can see that both redundancy groups have failed over, as Node1 now has the higher priority. We can also see that for Redundancy Group 0, Node0 has the secondary-hold status. Secondary-hold is when the device is in a passive state and cannot be promoted to the active/primary state. The secondary-hold state has a 5-minute interval, which means you will have to wait until after this interval before you can fail Redundancy Group 0 back over to Node0.
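
The 5 minutes comes from the redundancy group hold-down-interval, which, as far as I know, defaults to 300 seconds for redundancy group 0. If you ever needed to tune it, the knob looks roughly like this (value in seconds; check what your Junos version supports):

set chassis cluster redundancy-group 0 hold-down-interval 300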

{secondary-hold:node0}
root@lab_SRX220_Top> show chassis cluster status
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 6
    node0                   100         secondary-hold no       yes 
    node1                   255         primary        no       yes 

Redundancy group: 1 , Failover count: 32
    node0                   100         secondary      yes      yes 
    node1                   255         primary        yes      yes 

{secondary-hold:node0}

After the 5-minute interval, you can see that Node0 has moved from secondary-hold to secondary:

root@lab_SRX220_Top> show chassis cluster status
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 6
    node0                   100         secondary      no       yes 
    node1                   255         primary        no       yes 

Redundancy group: 1 , Failover count: 32
    node0                   100         secondary      yes      yes 
    node1                   255         primary        yes      yes 

{secondary:node0}

As we can see from the rolling pings, in total only 3 packets out of the 2138 transmitted were dropped, with 0% packet loss reported. Not a noticeable drop in traffic:

{master:0}
root> ping 192.168.0.2 routing-instance untrust

--- 192.168.0.2 ping statistics ---
1071 packets transmitted, 1069 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.870/2.275/9.126/0.523 ms

---------------------------------------------------------------------------------

{master:0}
root> ping 172.16.0.2 routing-instance trust    

--- 172.16.0.2 ping statistics ---
1067 packets transmitted, 1066 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.887/2.509/5.126/0.351 ms

Having failed over to Node1, we can clear the manual failover by using the command request chassis cluster failover reset redundancy-group 1. This will reset the node’s priority back to the configured values. This command can also be used if the device becomes unreachable or the redundancy group threshold reaches zero.

{secondary:node0}
root@lab_SRX220_Top> request chassis cluster failover reset redundancy-group 1
node0:
--------------------------------------------------------------------------
No reset required for redundancy group 1.

node1:
--------------------------------------------------------------------------
Successfully reset manual failover for redundancy group 1

{secondary:node0}
root@lab_SRX220_Top> request chassis cluster failover reset redundancy-group 0
node0:
--------------------------------------------------------------------------
No reset required for redundancy group 0.

node1:
--------------------------------------------------------------------------
Successfully reset manual failover for redundancy group 0

As we have preempt on Redundancy Group 1, it will automatically fail back to Node0.

root@lab_SRX220_Top> show chassis cluster status
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 6
    node0                   100         secondary      no       no  
    node1                   1           primary        no       no  

Redundancy group: 1 , Failover count: 33
    node0                   100         primary        yes      no  
    node1                   1           secondary      yes      no

Whereas with Redundancy Group 0, as you can’t enable preempt, you will need to do another manual failover to get Node0 to become the primary of the cluster.

Manual Failover of Node0 output
root@lab_SRX220_Top> request chassis cluster failover redundancy-group 0 node 0
node0:
--------------------------------------------------------------------------
Initiated manual failover for redundancy group 0
{secondary:node0}                                   
root@lab_SRX220_Top> show chassis cluster status
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 7
    node0                   255         primary        no       yes 
    node1                   1           secondary      no       yes 

Redundancy group: 1 , Failover count: 33
    node0                   100         primary        yes      no  
    node1                   1           secondary      yes      no
{primary:node0}
root@lab_SRX220_Top> request chassis cluster failover reset redundancy-group 0
node0:
--------------------------------------------------------------------------
Successfully reset manual failover for redundancy group 0

node1:
--------------------------------------------------------------------------
No reset required for redundancy group 0.
{primary:node0}
root@lab_SRX220_Top> show chassis cluster status
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 7
    node0                   100         primary        no       no  
    node1                   1           secondary      no       no  

Redundancy group: 1 , Failover count: 33
    node0                   100         primary        yes      no  
    node1                   1           secondary      yes      no

My next post will look at Test B, Interface Failover. See you on the other side 😀
