Tag Archives: clustering

Juniper EX Virtual Chassis Part 2

Reading Time: 2 minutes

I’ve already written a post on how to create a Virtual Chassis using the 1/10Gb uplink modules. If you have a switch in production and want to add another switch for additional ports or redundancy, you can easily create a virtual chassis. This time I’ll be using the dedicated VC ports and cables, adding a new switch to a switch already in production.

I’ll be using the preprovisioned method. Before I do any virtual chassis configuration, I’ll need to enable some features on the master member to minimize failover times:

set system commit synchronize
set chassis redundancy graceful-switchover
set routing-options nonstop-routing
set ethernet-switching-options nonstop-bridging

Having added these features, we can now configure the preprovisioned virtual chassis on the master switch, which will become member 0. Because this is only a two-member VC, I’ve added the no-split-detection command as recommended by Juniper, and to help with failover times I’ve enabled fast-failover on all ge and xe ports.

set virtual-chassis preprovisioned
set virtual-chassis no-split-detection
set virtual-chassis member 0 role routing-engine
set virtual-chassis member 0 serial-number BP0214340104
set virtual-chassis member 1 role routing-engine
set virtual-chassis member 1 serial-number BP0215090120
set virtual-chassis fast-failover ge
set virtual-chassis fast-failover xe

For now, that’s everything on the master member. On the new switch (member 1), you need to clear all config from the switch and set the root password to allow you to commit your changes:

root> edit 
Entering configuration mode
 
{master:0}[edit]
root# delete 
This will delete the entire configuration
Delete everything under this level? [yes,no] (no) yes 
{master:0}[edit]
root# set system root-authentication plain-text-password    
New password:
Retype new password:
root# commit 
configuration check succeeds
commit complete

You need to ensure there are no past virtual chassis configurations; you can do this by dropping into the shell on the switch and removing anything in the /config/vchassis directory:

root> start shell 
[email protected]:RE:0% rm -rf /config/vchassis/*
[email protected]:RE:0% cd /config/vchassis/
[email protected]:RE:0% ls -la
total 8
drwxr-xr-x  2 root  wheel  512 Sep 13 07:26 .
drwxr-xr-x  5 root  wheel  512 Sep 13 06:57 ..
[email protected]:RE:0% exit
exit

Now you will need to power off the new (backup) member for at least a minute, to ensure that the existing switch is elected as master.

After the minute, patch the VC cables into the dedicated VCP ports at the back of each chassis and power on the backup switch. Once member 1 has booted, you will be able to verify the new member by running show virtual-chassis status:

[email protected]> show virtual-chassis status     
 
Preprovisioned Virtual Chassis
Virtual Chassis ID: f1a1.ca8e.bbba
Virtual Chassis Mode: Enabled
                                           Mstr           Mixed Neighbor List
Member ID  Status   Serial No    Model     prio  Role      Mode ID  Interface
0 (FPC 0)  Prsnt    BP0214340104 ex4200-48t 129  Master*      N  1  vcp-0      
                                                                 1  vcp-1      
1 (FPC 1)  Prsnt    BP0215090120 ex4200-48t 129  Backup       N  0  vcp-0      
                                                                 0  vcp-1  

And you can verify the health of the VCP ports by running show virtual-chassis vc-port:

[email protected]> show virtual-chassis vc-port    
fpc0:
--------------------------------------------------------------------------
Interface   Type              Trunk  Status       Speed        Neighbor
or                             ID                 (mbps)       ID  Interface
PIC / Port
vcp-0       Dedicated           1    Up           32000        1   vcp-0  
vcp-1       Dedicated           2    Up           32000        1   vcp-1  
 
fpc1:
--------------------------------------------------------------------------
Interface   Type              Trunk  Status       Speed        Neighbor
or                             ID                 (mbps)       ID  Interface
PIC / Port
vcp-0       Dedicated           1    Up           32000        0   vcp-0  
vcp-1       Dedicated           2    Up           32000        0   vcp-1  

Disabling an SRX Chassis Cluster

Reading Time: 1 minute

My final post on SRX Chassis Clustering; if you’ve been with me from the start, it has been emotional 😀 haha

If you want to disable chassis clustering and have the SRX firewalls back as standalone devices, you will need to run the following command from operational mode:

{primary:node0}
[email protected]_SRX220_Top> set chassis cluster disable reboot

(If you remember from the first post, this was the first command I used)

You will get a message saying the chassis cluster has been disabled and the device is going to reboot. Once the reboot has completed, your SRX will be back to being a standalone device!
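If you want to double-check, a quick sketch of verifying the cluster really is gone; on a standalone SRX this command should simply report that chassis cluster is not enabled:

show chassis cluster status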

As straightforward as that!!!

I hope that you have enjoyed my series of posts on the SRX Chassis Clustering process. At the time of writing, this was the first time I had ever done chassis clustering! If you have any comments, questions or feedback, drop a comment as I’m all ears!

Cheers 😀

For greater insight and a more in-depth understanding of Chassis Clustering on the SRX Series, I would recommend having a read of the Juniper Security Device documentation.


Upgrading an SRX Chassis Cluster

Reading Time: 4 minutes

In my previous post, I successfully failed over the redundancy groups on the cluster using the Manual Failover and Interface Failure methods. This post will look into the methods that can be used when upgrading an SRX Chassis Cluster.

Testing Information
i) I had SCP’d the latest recommended version of Junos (12.1X44-D45.2) onto both Node0 and Node1. The package is located under the /var/tmp directory. You can get to this directory via the CLI: from operational mode, start shell then cd /var/tmp
ii) I will have rolling pings running between the trust <--> untrust zones in separate terminal windows, so I can see when the outage starts and can time its length
iii) All commands will be run from Node0, unless stated otherwise

You have two methods of upgrading an SRX cluster:

Method A (Individual Node upgrades)

Disclaimer
Using this method of chassis cluster upgrade causes a SERVICE DISRUPTION of 3-5 minutes minimum. You will need to ensure that you have considered the business impact of this method of upgrade.

This method can be used for downgrading Junos as well as upgrading, and has no Junos version limitation. With this method you are simply upgrading both individual nodes at the same time. As I have already uploaded the Junos image onto both nodes, I will need to run the command on BOTH Node0 and Node1 from operational mode:

{primary:node0}
[email protected]_SRX220_Top> request system software add /var/tmp/junos-srxsme-12.1X44-D45.2-domestic.tgz
{secondary:node1}
[email protected]_SRX220_Top> request system software add /var/tmp/junos-srxsme-12.1X44-D45.2-domestic.tgz

Once the package has been added, you will need to reboot both nodes simultaneously. You can use request system reboot node all from Node0.

After the reboot, you will need to update the backup image of Junos on both Nodes, to have a consistent primary and backup image.
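As a hedged sketch of that last step: on branch SRX devices with dual-root partitioning, the backup image is normally refreshed with a system snapshot, run on each node from operational mode (the exact snapshot options can vary by platform and Junos version):

request system snapshot slice alternate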

Method B (In Service Software Upgrades)

Before I begin: with in-service updates, Juniper have two types of in-service upgrade. The high-end data centre SRX models (SRX1400, SRX3400, SRX5600 and SRX5800) use In-Service Software Upgrade (ISSU), and the small/medium branch SRX models (SRX100, SRX110, SRX220, SRX240 and SRX650) use In-Band Cluster Upgrade (ICU). Although the commands are near enough the same, the pre-upgrade requirements, service impacts and the minimum Junos version that supports in-service upgrades are different.

As I’m using 2x SRX220H2 model firewalls, I will be upgrading via ICU. When I get the chance to upgrade a high-end SRX model, I will update the post with my findings :p

Even before you consider using the ISSU/ICU method, I am telling you (no recommendation here!!) to check the Juniper page Limitation on ISSU and ICU. The page will confirm which versions of Junos are supported by ISSU/ICU and (more importantly) which services are not supported by ISSU/ICU. In essence, you will need to check what services you are running on your SRX cluster to see if they are supported. If they are not supported, then you are told DO NOT perform an upgrade with this method.

With that out of the way and if you have checked that your cluster is fully supported (firmware and service) by ISSU/ICU you can proceed with the pre-checks 😀

Pre-Upgrade Checks ICU
Junos Version: You will need to be running Junos version 11.2R2 as a minimum. This can be checked by running show version on both nodes.
No-sync option: ICU is available with the no-sync option only. The no-sync option disables the flow state from syncing to the secondary node when it boots with the new Junos image.
Downgrade Method?: You CANNOT use ICU to downgrade Junos to a version lower than 11.2R2.
Disk Space: You will need to check the disk space available under /var/tmp on the SRX. From operational mode, start shell then enter the command df -h to see the disk space available.
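As an aside, if you’d rather not drop into the shell for the disk-space check, show system storage from operational mode lists the usage of each file system (the /var or /cf/var partitions, depending on the platform):

show system storage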

Having confirmed all the pre-checks are good, we can proceed with the upgrade. It is important to note that during an ICU there WILL BE A SERVICE DISRUPTION! It will be approximately 30 seconds with the no-sync option. During these 30 seconds traffic will be dropped and flow sessions will be lost. You will need to keep this in mind if you are doing this upgrade in hours, or if you need to keep a good record of your flow sessions for any reason.

To start the upgrade, we need to run request system software in-service-upgrade /path/to/package no-sync

{primary:node0}
[email protected]_SRX220_Top> request system software in-service-upgrade /var/tmp/junos-srxsme-12.1X44-D45.2-domestic.tgz no-sync
ICU Console observations
Rebooting: It is important to note that during the ICU process you won’t need to do any manual reboots; all the reboots are automated within the process.

WARNING: in-service-upgrade shall reboot both the nodes
         in your cluster. Please ignore any subsequent 
         reboot request message
Upgrade Order: Once the process has started, Node1 is upgraded first:
ISSU: start downloading software package on secondary node
Pushing bundle to node1
{.......}
JUNOS 12.1X44-D45.2 will become active at next reboot
WARNING: A reboot is required to load this software correctly
WARNING:     Use the 'request system reboot' command
WARNING:         when software installation is complete
Saving state for rollback ...
ISSU: failover all redundancy-groups 1...n to primary node
Successfully reset all redundancy-groups priority back to configured priority.
Successfully reset all redundancy-groups priority back to configured priority.
Initiated manual failover for all redundancy-groups to node0
Redundancy-groups-0 will not failover and the primaryship remains unchanged.
ISSU: rebooting Secondary Node
Shutdown NOW!
[pid 13353]
ISSU: Waiting for secondary node node1 to reboot.
ISSU: node 1 went down
ISSU: Waiting for node 1 to come up
Node0 to Node1 failover process: It takes a few minutes for node0 to reboot after node1 comes back online; if you have a console connection on both SRXs, you will need to be patient rather than aborting the upgrade. If you have a rolling ping going to each node’s fxp0 interface, you will know when node0 is about to reboot, as pings to node1 will start returning. Once node1 is up and booted, node0 will start to reboot.

ISSU: node 1 came up
ISSU: secondary node node1 booted up.
Shutdown NOW!
End Host View Point: From hitting enter to having both firewalls upgraded took 22 minutes 45 seconds. Although the documentation said there would be an outage of around 30 seconds, the rolling ping between trust <--> untrust showed barely any loss: only 6 packets out of the 1600 transmitted weren’t received. (Saying that, for my testing I was unable to get live flow session information.)

root> ping 172.16.0.2 routing-instance trust 
--- 172.16.0.2 ping statistics ---
1600 packets transmitted, 1594 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.720/2.640/13.673/0.652 ms
--------------------------------------------------------------------------
root> ping 192.168.0.2 routing-instance untrust
--- 192.168.0.2 ping statistics ---
1600 packets transmitted, 1594 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.838/2.535/13.669/0.681 ms
To verify that the upgrade has been successful, we can run show version:

{secondary:node0}
[email protected]_SRX220_Top> show version 
node0:
--------------------------------------------------------------------------
Hostname: lab_SRX220_Top
Model: srx220h2
JUNOS Software Release [12.1X44-D45.2]

node1:
--------------------------------------------------------------------------
Hostname: lab_SRX220_Top
Model: srx220h2
JUNOS Software Release [12.1X44-D45.2]

And show chassis cluster status, to check that the chassis cluster status is as expected:

[email protected]_SRX220_Top> show chassis cluster status 
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 0
    node0                   100         secondary      no       no  
    node1                   1           primary        no       no  

Redundancy group: 1 , Failover count: 1
    node0                   100         primary        yes      no  
    node1                   1           secondary      yes      no 

We can see that we are running the upgraded version of Junos. As expected, Redundancy Group 0 is primary on Node1 and Redundancy Group 1 is primary on Node0. As discussed in my previous post, with preempt enabled Redundancy Group 1 automatically fails back over to Node0 once it is available. We will have to do a manual failover of Redundancy Group 0 back to Node0 from Node1, and we will need to upgrade the backup image of Junos to have consistent primary and backup images.
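For reference, a minimal sketch of that manual failover of redundancy group 0 back to node0, using the standard chassis cluster failover commands from operational mode (the reset afterwards clears the manual-failover flag):

request chassis cluster failover redundancy-group 0 node 0
request chassis cluster failover reset redundancy-group 0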

If you had a case where you had to abort the ICU process, you will need to run request system software abort in-service-upgrade on the primary node. It is important to note that if you do use the abort command, you will put the cluster into an inconsistent state, where the secondary node will be running a newer version of Junos than the primary node. To recover the cluster into a consistent state you will need to do the following, all on the secondary node:

Recovering from an Inconsistent State
1. You will need to abort the upgrade: request system software abort in-service-upgrade
2. Roll back to the older version of Junos that the primary node is running: request system software rollback node {node-id}
3. Perform a reboot of the node: request system reboot

**UPDATE 29/4/2015**
Luckily enough, as I was finishing up this series of posts, my colleague had finished working on the SRX1400 we have in our lab! So I was able to test an ISSU upgrade on a high-end SRX Series device 😀 Happy Days!!!

SRX1400 testing differences
1. As the SRX1400 isn’t running any routing protocols, I will not need to configure graceful restart.
2. I will be upgrading from 12.1X44-D40.2 to 12.1X46-D10.2
3. The topology will be the same, however the IP addressing will be different. Trust will be 192.168.13.0/24 and Untrust will be 172.31.13.0/24
Pre-Upgrade Checks ISSU
Junos Version: You will need to check that the version of Junos code supports ISSU. This can be checked by running show version on both nodes; you will need to be running Junos version 9.6 or later.
Downgrade Method?: ISSU DOES NOT support firmware downgrades!
Routing: Juniper recommend that graceful restart for routing protocols be enabled before starting an ISSU (a hedged example follows this list).
Redundancy Groups: Manually fail over all redundancy groups so that only one node is active. (For my example, as I have an active/backup setup, nothing needs to change; however, if you have an active/active setup, you will need to make configuration changes.)
Redundancy Group 0: Once the upgrade has been completed, you will need to manually fail over Redundancy Group 0 back to Node0 (see Failover on SRX cluster pt1).
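If your cluster is running routing protocols, a minimal sketch of the graceful restart recommendation, from configuration mode (my SRX1400 wasn’t running any protocols, so I skipped this):

set routing-options graceful-restart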

To start the upgrade, all the redundancy groups first need to be failed over to one active node. As I have an active/backup setup, all my redundancy groups are already on node0:

{primary:node0}
[email protected]_be-rtr0-h3> show chassis cluster status        
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 3
    node0                   100         primary        no       no  
    node1                   99          secondary      no       no  

Redundancy group: 1 , Failover count: 5
    node0                   100         primary        yes      no  
    node1                   99          secondary      yes      no

To begin the upgrade process, we need to run request system software in-service-upgrade /path/to/package reboot

Important note
Unlike with the ICU upgrade process, you have to enter the reboot option to confirm that you want a reboot afterwards. If you don’t use the reboot option, the command will fail. This only applies to the high-end SRX devices: SRX1400, SRX3400, SRX3600, SRX5600 and SRX5800.
ISSU Console observations
Patience needed: It takes quite a while from this point before more output comes from the console on node0, so you will need to be patient.

Validation succeeded
failover all RG 1+ groups to node 0 
Initiated manual failover for all redundancy-groups to node0
Redundancy-groups-0 will not failover and the primaryship remains unchanged.
ISSU: Preparing Backup RE
Pushing bundle to node1
Node1 Failover: Once node1 is up, you will see the output below:

ISSU: Backup RE Prepare Done
Waiting for node1 to reboot.
node1 booted up.
Waiting for node1 to become secondary
node1 became secondary.
Waiting for node1 to be ready for failover
ISSU: Preparing Daemons

It takes around 5-10 minutes before you see any more output to say the upgrade process is still going on! Again, you will need to be patient as this does take its time!

Secondary node1 ready for failover.
{.......}
Failing over all redundancy-groups to node1
ISSU: Preparing for Switchover
Initiated failover for all the redundancy groups to node1
Waiting for node1 take over all redundancy groups
End Host View Point: From hitting enter to having both firewalls upgraded took 30 minutes 18 seconds. The rolling ping between trust <--> untrust shows there was barely any loss: only 2 packets out of the 3639 transmitted weren’t received. (As before, unfortunately I was unable to get live flow session information.)

root> ping 172.31.13.2 routing-instance trust 
--- 172.31.13.2 ping statistics ---
1818 packets transmitted, 1817 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.769/3.080/44.226/3.536 ms
--------------------------------------------------------------------------
root> ping 192.168.13.2 routing-instance untrust 
--- 192.168.13.2 ping statistics ---
1821 packets transmitted, 1820 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.831/3.071/44.524/3.244 ms

To verify that the upgrade has been successful, we can run show version:

{secondary:node0}
[email protected]_be-rtr0-h3> show version 
node0:
--------------------------------------------------------------------------
Hostname: lab_be-rtr0-h3
Model: srx1400
JUNOS Software Release [12.1X46-D10.2]

node1:
--------------------------------------------------------------------------
Hostname: lab_be-rtr0-i3
Model: srx1400
JUNOS Software Release [12.1X46-D10.2]

And show chassis cluster status, to check that the chassis cluster status is as expected:

{secondary:node0}
[email protected]_be-rtr0-h3> show chassis cluster status 
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 0
    node0                   100         secondary      no       no  
    node1                   99          primary        no       no  

Redundancy group: 1 , Failover count: 1
    node0                   100         primary        yes      no  
    node1                   99          secondary      yes      no 

We can see that we are running the upgraded version of Junos. As expected, Redundancy Group 0 is primary on Node1 and Redundancy Group 1 is primary on Node0. As discussed in my previous post, with preempt enabled Redundancy Group 1 automatically fails back over to Node0 once it is available. We will have to do a manual failover of Redundancy Group 0 back to Node0 from Node1, and we will need to upgrade the backup image of Junos to have consistent primary and backup images.

Unexpected output
During the reboot and the manual failover of redundancy group 0 back to Node0, I got the output below on my console terminal:

Message from [email protected]_be-rtr0-h3 at Apr 29 12:26:40  ...
lab_be-rtr0-h3 node0.fpc1.pic0 PFEMAN: Shutting down , PFEMAN Resync aborted! No peer info on reconnect or master rebooted?  

Message from [email protected]_be-rtr0-h3 at Apr 29 12:26:40  ...
lab_be-rtr0-h3 node0.cpp0 RDP: Remote side closed connection: rdp.(17825794:13321).(serverRouter:chassis)

[email protected]_be-rtr0-i3> Apr 29 12:27:04 init: can not access /usr/sbin/ipmid: No such file or directory

Message from [email protected]_be-rtr0-i3 at Apr 29 12:27:05  ...
lab_be-rtr0-i3 node1.cpp0 RDP: Remote side closed connection: rdp.(34603010:33793).(serverRouter:pfe) 

Message from [email protected]_be-rtr0-i3 at Apr 29 12:27:05  ...
lab_be-rtr0-i3 node1.cpp0 RDP: Remote side closed connection: rdp.(34603010:33792).(serverRouter:chassis) 

Message from [email protected]_be-rtr0-i3 at Apr 29 12:27:17  ...
lab_be-rtr0-i3 node1.cpp0 RDP: Remote side reset connection: rdp.(34603010:33794).(primaryRouter:1008) 

Message from [email protected]_be-rtr0-i3 at Apr 29 12:27:18  ...
lab_be-rtr0-i3 node1.cpp0 RDP: Remote side reset connection: rdp.(34603010:33795).(primaryRouter:1007)

I raised this with Juniper and they sent me this article. The article confirms that the error messages are expected if you are connected via the console or fxp0 interface: “The above mentioned messages, which are generated on the console session, states that the routing-engine [control plane(RG0)] has become active on the other node….These messages are due to the following syslog user configuration: system syslog user *.”

You can stop this error by deactivating system syslog user *.

Note: Juniper recommend that you keep the syslog user * (‘any emergency’) configuration and ignore these informational messages, as they might show useful information to the user.
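If you did want to suppress these messages, a hedged sketch of deactivating that stanza from configuration mode (although, as above, Juniper recommend leaving it in place):

deactivate system syslog user "*"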

Phew that was a lot of work and quite a bit to take in there!! Time for a break, (a drink or 6 lol)

My next post will be the last post in the SRX Chassis Cluster series (sad times 🙁 ). It will be a nice simple one on how to disable chassis clustering!


Creating HA Juniper SRX Chassis Cluster

Reading Time: 6 minutes

This guide is for a clean clustering of 2 Juniper SRX Series firewalls

Topology

This is the topology that will be used in this series of posts on configuring, failing over and upgrading a High Availability (HA) Juniper SRX Chassis Cluster. The hardware used was: 2x Juniper SRX220H2 (brand new with factory-default settings) and 1x Juniper EX4200. As I’m using a single EX4200, I configured two routing-instances, “trust” and “untrust”. By using routing-instances, I’m able to have multiple routing tables on a single device without creating routing loops. Below are the physical and logical topology diagrams and the full configuration of the EX4200.

Physical Topology / Logical Topology (diagrams)

EX4200 Configuration:
set interfaces ge-0/0/0 description "SRX220 Bottom untrust interface"
set interfaces ge-0/0/0 enable
set interfaces ge-0/0/0 unit 0 family ethernet-switching port-mode trunk
set interfaces ge-0/0/0 unit 0 family ethernet-switching vlan members untrust

set interfaces ge-0/0/1 description "SRX220 Top untrust interface"
set interfaces ge-0/0/1 enable
set interfaces ge-0/0/1 unit 0 family ethernet-switching port-mode trunk
set interfaces ge-0/0/1 unit 0 family ethernet-switching vlan members untrust

set interfaces ge-0/0/2 description "SRX220 Bottom trust interface"
set interfaces ge-0/0/2 enable
set interfaces ge-0/0/2 unit 0 family ethernet-switching port-mode trunk
set interfaces ge-0/0/2 unit 0 family ethernet-switching vlan members trust

set interfaces ge-0/0/3 description "SRX220 Top trust interface"
set interfaces ge-0/0/3 enable
set interfaces ge-0/0/3 unit 0 family ethernet-switching port-mode trunk
set interfaces ge-0/0/3 unit 0 family ethernet-switching vlan members trust


set interfaces vlan unit 10 description untrust
set interfaces vlan unit 10 family inet address 172.16.0.2/24

set interfaces vlan unit 20 description trust
set interfaces vlan unit 20 family inet address 192.168.0.2/24

set routing-instances trust instance-type virtual-router
set routing-instances trust interface vlan.20
set routing-instances trust routing-options static route 172.16.0.0/24 next-hop 192.168.0.1

set routing-instances untrust instance-type virtual-router
set routing-instances untrust interface vlan.10
set routing-instances untrust routing-options static route 192.168.0.0/24 next-hop 172.16.0.1

set vlans trust vlan-id 20
set vlans trust l3-interface vlan.20

set vlans untrust vlan-id 10
set vlans untrust l3-interface vlan.10

Some of the pre-checks that will need to be done before you start:

Chassis Cluster: Remove any existing chassis cluster configuration. (You don’t need to do this on brand-new firewalls, but I do it anyway; better safe than sorry.) This is done from operational mode and will reboot the device.

set chassis cluster disable reboot
Hardware: Check that you are using the same hardware, as you can’t mix hardware models in a chassis cluster:

[email protected]_SRX220_Bottom> show chassis hardware        
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                CF4713AK0219      SRX220H2
Routing Engine   REV 04   750-048778   ACKS2263          RE-SRX220H2
FPC 0                                                    FPC
  PIC 0                                                  8x GE Base PIC
Power Supply 0
Junos Version: Check that you have the same version of Junos:

[email protected]_SRX220_Bottom> show version 
Hostname: lab_SRX220_Bottom
Model: srx220h2
JUNOS Software Release [12.1X44-D40.2]

Once you have confirmed that the hardware and software versions are the same, you can start with the chassis cluster.

Having confirmed that both SRX220s have an identical starting configuration, we can begin the clustering:

1. Physically connect the two devices together to create the control and fabric (data) links. Nodes in a cluster use these links to communicate with each other about cluster health, status and other traffic information. The control link is used to configure the nodes in the cluster, and the fabric (data) link allows session synchronization between the nodes. The control and fabric interfaces are hardware specific, so different models will use different ports. You can see each specific model’s control and fabric ports via the Juniper Knowledge Centre.

On the SRX220H for the Control link:

You will need to connect ge-0/0/7 on SRX A (node 0) to ge-0/0/7 on SRX B (node 1). On node 1, this interface will be renamed ge-3/0/7 once the chassis cluster has been formed.

On the SRX220H for the Fabric Link

You will need to connect ge-0/0/5 on node 0 to ge-0/0/5 on node 1. As with the control link, on node 1 this interface will be renamed ge-3/0/5 once the chassis cluster has been formed.

2. Next, we need to enable cluster mode. As with removing the chassis cluster configuration earlier, this will reboot the firewalls and will need to be done from operational mode.

set chassis cluster cluster-id 1 node 0 reboot
set chassis cluster cluster-id 1 node 1 reboot
Important Pre Check Notes
Notes:
a) The cluster ID needs to be the same on both firewalls; however, the node ID has to be different (0 or 1)
b) The commands above are run one per device: node 0 on the first firewall and node 1 on the second
c) Although you are given the option to pick a cluster ID from 0-15, using ID 0 is the same as disabling cluster mode. You will need to pick a number between 1 and 15; this has to do with how virtual MACs are calculated
We can verify that the chassis cluster was successful by running show chassis cluster status:

[email protected]_SRX220_Top> show chassis cluster status 
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   1           primary        no       no  
    node1                   1           secondary      no       no

Now that we have the chassis cluster completed, we can start with the configuration. We can do the entire configuration on the primary node0 and anything that is committed on the primary node0 will be copied onto the secondary node1

3. We set the management interfaces (fxp0) on each of the nodes. This will allow us to have remote SSH access to each node.

set groups node0 system host-name SRXA
set groups node0 interfaces fxp0 unit 0 family inet address 10.1.0.201/24
set groups node1 system host-name SRXB
set groups node1 interfaces fxp0 unit 0 family inet address 10.1.0.202/24
set apply-groups "${node}"
Device Management Note
Adding the command set apply-groups "${node}" is mandatory, as it ensures that node-specific configuration is only committed on that specific node.

4. Now it’s time to configure the fabric links in the cluster:

set interfaces fab0 fabric-options member-interfaces ge-0/0/5
set interfaces fab1 fabric-options member-interfaces ge-3/0/5

We can check the interfaces we have just committed:

[email protected]_SRX220_Top# run show chassis cluster interfaces 
Control link status: Up

Control interfaces: 
    Index   Interface        Status
    0       fxp1             Up    

Fabric link status: Up

Fabric interfaces: 
    Name    Child-interface    Status
                               (Physical/Monitored)
    fab0    ge-0/0/5           Up   / Up  
    fab0   
    fab1    ge-3/0/5           Up   / Up  
    fab1

5. Configure Redundancy Groups 0 and 1. The purpose of the redundancy groups is that in a failure situation the control plane (Routing Engine) can be failed over to the secondary node. In an HA cluster, redundancy group 0, by default, represents the control plane. The node that is primary for redundancy group 0 (in this example node0) will be the active Routing Engine (RE). The active RE is the master of the cluster; it is responsible for pushing any new configuration changes and controlling the data plane, and any changes that need to be made in the cluster will have to be done via the active RE. If node0 were to fail over, node1 would become the new active RE. Although you can only have one active RE node, a single node can be the primary node for a number of redundancy groups. Setting a higher priority on node0 ensures that node0 is the primary of both redundancy groups. Using preempt on redundancy group 1 means that if node0 fails and a failover to node1 occurs, then once node0 becomes available again it will automatically take back primary ownership of redundancy group 1.

set chassis cluster redundancy-group 0 node 0 priority 100
set chassis cluster redundancy-group 0 node 1 priority 1
set chassis cluster redundancy-group 1 node 0 priority 100
set chassis cluster redundancy-group 1 node 1 priority 1
set chassis cluster redundancy-group 1 preempt

6. The next step is to configure interface monitoring. This checks the health and physical status of each of the interfaces and can be used to trigger a failover in the event that the link status of an interface goes down. Interface monitoring has a failover threshold of 255; once the combined weight of failed monitored interfaces reaches this number, the redundancy group priority is changed to 0 for that node and the redundancy group fails over to the other node. As each interface below has a weight of 255, a single monitored interface failure will trigger a failover.

set chassis cluster redundancy-group 1 interface-monitor ge-0/0/1 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-3/0/1 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/2 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-3/0/2 weight 255
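Once committed, a quick sanity check (assuming you’re still in the same configuration session as step 4) is to re-run the interfaces command; the monitored interfaces and their weights should be listed in the output:

run show chassis cluster interfaces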

7. Setting the interfaces. With SRX, you need to set the redundant Ethernet (reth) interface count before you are able to assign physical interfaces. A reth interface is a logical aggregated interface that allows port bundling between the nodes. For this example, I will only need 2 reth interfaces (one for trust and one for untrust). Once the reth count has been applied, you will be able to assign the physical interfaces.

set chassis cluster reth-count 2
set interfaces ge-0/0/1 gigether-options redundant-parent reth1
set interfaces ge-3/0/1 gigether-options redundant-parent reth1
set interfaces ge-0/0/2 gigether-options redundant-parent reth0
set interfaces ge-3/0/2 gigether-options redundant-parent reth0
Reth Interface Note
It’s recommended that you only provision reth interfaces as you need them, to conserve resources on the firewall.

8. Similarly to aggregated Ethernet interfaces on the EX or MX Series, you do the entire configuration for the reth under the logical interface. You also need to define the interface’s redundancy group; as redundancy group 0 is the control plane, for this example both reth interfaces will be in redundancy group 1.

set interfaces reth0 vlan-tagging
set interfaces reth0 redundant-ether-options redundancy-group 1
set interfaces reth0 unit 10 description Untrust
set interfaces reth0 unit 10 vlan-id 10
set interfaces reth0 unit 10 family inet address 172.16.0.1/24

set interfaces reth1 vlan-tagging
set interfaces reth1 redundant-ether-options redundancy-group 1
set interfaces reth1 unit 20 description trust
set interfaces reth1 unit 20 vlan-id 20
set interfaces reth1 unit 20 family inet address 192.168.0.1/24
Interface Configuration Note
For my topology I used VLAN interfaces, so vlan-tagging had to be enabled and the downstream links were trunk interfaces. I also used routing-instances for the trust and untrust zones, as the global routing table is used for management of the device. I have added a diagram and configuration file of the testing setup.

To ensure that end-to-end connectivity was as expected, I created these security zones and security policies to get communication between the two reth interfaces. The zones and policies are very vanilla, as I just need to be able to ping across.

Zones and Policies
set security policies from-zone untrust to-zone trust policy ping match source-address any
set security policies from-zone untrust to-zone trust policy ping match destination-address any
set security policies from-zone untrust to-zone trust policy ping match application junos-icmp-all
set security policies from-zone untrust to-zone trust policy ping then permit

set security policies from-zone trust to-zone untrust policy ping match source-address any
set security policies from-zone trust to-zone untrust policy ping match destination-address any
set security policies from-zone trust to-zone untrust policy ping match application junos-icmp-all
set security policies from-zone trust to-zone untrust policy ping then permit

set security zones security-zone trust tcp-rst
set security zones security-zone trust host-inbound-traffic system-services all
set security zones security-zone trust interfaces reth1.20

set security zones security-zone untrust tcp-rst
set security zones security-zone untrust host-inbound-traffic system-services all
set security zones security-zone untrust interfaces reth0.10

set routing-instances Testing instance-type virtual-router
set routing-instances Testing interface reth0.10
set routing-instances Testing interface reth1.20

From my end device, I had end-to-end reachability

root> ping 172.16.0.2 routing-instance trust 
--- 172.16.0.2 ping statistics ---
31 packets transmitted, 31 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.851/1.964/2.273/0.105 ms

root> ping 192.168.0.2 routing-instance untrust
--- 192.168.0.2 ping statistics ---
30 packets transmitted, 30 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.842/1.971/2.675/0.163 ms

And from the firewall, I was able to see the pings going across as flow sessions

[email protected]_SRX220_Top> show security flow session    
node0:
--------------------------------------------------------------------------

Session ID: 621, Policy name: ping/5, State: Active, Timeout: 2, Valid
  In: 192.168.0.2/7 --> 172.16.0.2/6279;icmp, If: reth1.20, Pkts: 1, Bytes: 84
  Out: 172.16.0.2/6279 --> 192.168.0.2/7;icmp, If: reth0.10, Pkts: 1, Bytes: 84

Session ID: 622, Policy name: ping/4, State: Active, Timeout: 2, Valid
  In: 172.16.0.2/9 --> 192.168.0.2/6277;icmp, If: reth0.10, Pkts: 1, Bytes: 84
  Out: 192.168.0.2/6277 --> 172.16.0.2/9;icmp, If: reth1.20, Pkts: 1, Bytes: 84

Session ID: 623, Policy name: ping/5, State: Active, Timeout: 2, Valid
  In: 192.168.0.2/8 --> 172.16.0.2/6279;icmp, If: reth1.20, Pkts: 1, Bytes: 84
  Out: 172.16.0.2/6279 --> 192.168.0.2/8;icmp, If: reth0.10, Pkts: 1, Bytes: 84

Session ID: 624, Policy name: ping/4, State: Active, Timeout: 2, Valid
  In: 172.16.0.2/10 --> 192.168.0.2/6277;icmp, If: reth0.10, Pkts: 1, Bytes: 84
  Out: 192.168.0.2/6277 --> 172.16.0.2/10;icmp, If: reth1.20, Pkts: 1, Bytes: 84

Session ID: 625, Policy name: ping/5, State: Active, Timeout: 4, Valid
  In: 192.168.0.2/9 --> 172.16.0.2/6279;icmp, If: reth1.20, Pkts: 1, Bytes: 84
  Out: 172.16.0.2/6279 --> 192.168.0.2/9;icmp, If: reth0.10, Pkts: 1, Bytes: 84
Total sessions: 5

node1:
--------------------------------------------------------------------------
Total sessions: 0

Having now got the cluster up and working, it was time to get to some proper failover testing! In my next post I will note how that went, as this post is pretty long now haha

Useful Side Notes
I) Make sure there is NO configuration on ports ge-0/0/5 – 7. I had configured port ge-0/0/6, as I needed to SCP the correct version of Junos onto both firewalls, and as I had read that only ge-0/0/5 and ge-0/0/7 would be used, I assumed using ge-0/0/6 would be fine… This is why you should never assume. So if you need to upgrade Junos, upgrade the firewalls first and then delete all configuration under the interfaces stanza.
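A minimal sketch, assuming ge-0/0/6 was the only port with leftover configuration (from configuration mode, once the Junos upgrade is done):

delete interfaces ge-0/0/6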

II) Once the chassis cluster has been completed and you enter configuration mode, you will get this warning:

[email protected]_SRX220_Top> edit 
warning: Clustering enabled; using private edit
warning: uncommitted changes will be discarded on exit
Entering configuration mode

III) When doing a cluster reboot, I used the command request system reboot node all and, oddly, node0 rebooted as expected but node1 couldn’t be accessed via SSH. I tried to reboot it from node0 and got this:

[email protected]_SRX220_Top> request system reboot node 1    
error: Could not connect to node1 : No route to host
error: Unable to send command

Doing a chassis check, I saw that node1 was lost:

[email protected]_SRX220_Top> show chassis cluster status     
Cluster ID: 1 
Node                  Priority          Status    Preempt  Manual failover

Redundancy group: 0 , Failover count: 1
    node0                   1           primary        no       no  
    node1                   0           lost           n/a      n/a

Luckily, I had both SRXs connected via console, and when I checked I saw that node1 had got stuck in the bootloader process. I ran the “boot” command from the bootloader, which continued the boot process, and once the SRX had fully booted it re-synced with node0. Once everything was re-synced, I ran the command again to see if this is a common issue or just a one-off, and it looks like it was a one-off. This is something to note and could be a time saver if you are stuck wondering what to do.

IV) When doing connectivity tests I was able to ping from untrust -> trust; however, when I did a ping from trust -> untrust, packets were being dropped. After enabling traceoptions on security flows, I saw this message:

Apr 23 02:51:49 02:51:49.287041:CID-1:RT:  reth1.20:192.168.0.2/77->172.16.0.1/6141,1, icmp 8/0 
Apr 23 02:51:49 02:51:49.287041:CID-1:RT:  packet dropped,  policy deny.

I was under the assumption that when you create a security policy it is symmetrical; however, I was wrong: security policies are unidirectional. When I created a new policy from trust -> untrust, everything went as expected. (Probably a straightforward fix, and why I’m working more with firewalls, as this is still all new to me :p)
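For anyone wanting to reproduce the trace, a minimal sketch of enabling security flow traceoptions (the file name flow-trace is just an example, basic-datapath is the usual starting flag, and remember to remove the traceoptions once you’re done):

set security flow traceoptions file flow-trace
set security flow traceoptions flag basic-datapath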

Full Chassis Cluster SRX Configuration
set groups node0 system host-name SRXA
set groups node0 interfaces fxp0 unit 0 family inet address 10.1.0.201/24
set groups node1 system host-name SRXB
set groups node1 interfaces fxp0 unit 0 family inet address 10.1.0.202/24
set apply-groups "${node}"

set chassis cluster reth-count 2
set chassis cluster redundancy-group 0 node 0 priority 100
set chassis cluster redundancy-group 0 node 1 priority 1
set chassis cluster redundancy-group 1 node 0 priority 100
set chassis cluster redundancy-group 1 node 1 priority 1

set chassis cluster redundancy-group 1 interface-monitor ge-0/0/1 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-3/0/1 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-3/0/2 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/2 weight 255

set interfaces ge-0/0/1 description "trust interface to ge-0/0/3 EX4200"
set interfaces ge-0/0/1 gigether-options redundant-parent reth1
set interfaces ge-0/0/2 description "untrust interface to ge-0/0/1 EX4200"
set interfaces ge-0/0/2 gigether-options redundant-parent reth0
set interfaces ge-3/0/1 description "untrust interface to ge-0/0/0 EX4200"
set interfaces ge-3/0/1 gigether-options redundant-parent reth0
set interfaces ge-3/0/2 description "trust interface to ge-0/0/2 EX4200"
set interfaces ge-3/0/2 gigether-options redundant-parent reth1

set interfaces fab0 fabric-options member-interfaces ge-0/0/5
set interfaces fab1 fabric-options member-interfaces ge-3/0/5

set interfaces reth0 vlan-tagging
set interfaces reth0 redundant-ether-options redundancy-group 1
set interfaces reth0 unit 10 description Untrust
set interfaces reth0 unit 10 vlan-id 10
set interfaces reth0 unit 10 family inet address 172.16.0.1/24

set interfaces reth1 vlan-tagging
set interfaces reth1 redundant-ether-options redundancy-group 1
set interfaces reth1 unit 20 vlan-id 20
set interfaces reth1 unit 20 family inet address 192.168.0.1/24

set security forwarding-options family inet6 mode flow-based

set security policies from-zone untrust to-zone trust policy ping match source-address any
set security policies from-zone untrust to-zone trust policy ping match destination-address any
set security policies from-zone untrust to-zone trust policy ping match application junos-icmp-all
set security policies from-zone untrust to-zone trust policy ping then permit

set security policies from-zone trust to-zone untrust policy ping match source-address any
set security policies from-zone trust to-zone untrust policy ping match destination-address any
set security policies from-zone trust to-zone untrust policy ping match application junos-icmp-all
set security policies from-zone trust to-zone untrust policy ping then permit

set security zones security-zone trust tcp-rst
set security zones security-zone trust host-inbound-traffic system-services all
set security zones security-zone trust interfaces reth1.20

set security zones security-zone untrust tcp-rst
set security zones security-zone untrust host-inbound-traffic system-services all
set security zones security-zone untrust interfaces reth0.10

set routing-instances Testing instance-type virtual-router
set routing-instances Testing interface reth0.10
set routing-instances Testing interface reth1.20


How to enable IPv6 flow (or packet) mode on SRX

Reading Time: 1 minute

By default, IPv6 traffic is dropped by Juniper SRX Series firewalls. We can see this by running the show security flow status command:

[email protected]_SRX220_Top> show security flow status 
node0:
--------------------------------------------------------------------------
  Flow forwarding mode:
    Inet forwarding mode: flow based
    Inet6 forwarding mode: drop
    MPLS forwarding mode: drop
    ISO forwarding mode: drop
  Flow trace status
    Flow tracing status: off
  Flow session distribution
    Distribution mode: RR-based

node1:
--------------------------------------------------------------------------
  Flow forwarding mode:
    Inet forwarding mode: flow based
    Inet6 forwarding mode: drop
    MPLS forwarding mode: drop
    ISO forwarding mode: drop
  Flow trace status
    Flow tracing status: off
  Flow session distribution
    Distribution mode: RR-based

To allow flow-based or packet-based IPv6 traffic to pass through the SRX, you will need to run the command:

set security forwarding-options family inet6 mode (flow-based|packet-based)

Once this is committed, you will get a warning explaining that a reboot is needed for the change to be applied.

[email protected]_SRX220_Top# commit 
warning: You have enabled/disabled inet6 flow.
You must reboot the system for your change to take effect.
If you have deployed a cluster, be sure to reboot all nodes.
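For a cluster like the one in these posts, both nodes can be rebooted in one go from operational mode on the primary node, using the same command as in the upgrade post:

request system reboot node all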

After the reboot, we can confirm that IPv6 traffic will now be handled in flow (or packet) mode by checking show security flow status again:

[email protected]_SRX220_Top> show security flow status                
node0:
--------------------------------------------------------------------------
  Flow forwarding mode:
    Inet forwarding mode: flow based
    Inet6 forwarding mode: flow based
    MPLS forwarding mode: drop
    ISO forwarding mode: drop
  Flow trace status
    Flow tracing status: off
  Flow session distribution
    Distribution mode: RR-based

node1:
--------------------------------------------------------------------------
  Flow forwarding mode:
    Inet forwarding mode: flow based
    Inet6 forwarding mode: flow based
    MPLS forwarding mode: drop
    ISO forwarding mode: drop
  Flow trace status
    Flow tracing status: off
  Flow session distribution
    Distribution mode: RR-based