This guide covers a clean clustering of two Juniper SRX Series firewalls.
Topology
This is the topology that will be used in this series of posts on configuring, failing over and upgrading a High Availability (HA) Juniper SRX Chassis Cluster. The hardware used was 2x Juniper SRX220H2 (brand new with factory-default settings) and 1x Juniper EX4200. As I'm using a single EX4200, I configured two routing-instances, "trust" and "untrust". By using routing-instances I'm able to have multiple routing tables on a single device without creating routing loops. The full configuration of the EX4200 is below.


set interfaces ge-0/0/0 description "SRX220 Bottom untrust interface"
set interfaces ge-0/0/0 enable
set interfaces ge-0/0/0 unit 0 family ethernet-switching port-mode trunk
set interfaces ge-0/0/0 unit 0 family ethernet-switching vlan members untrust
set interfaces ge-0/0/1 description "SRX220 Top untrust interface"
set interfaces ge-0/0/1 enable
set interfaces ge-0/0/1 unit 0 family ethernet-switching port-mode trunk
set interfaces ge-0/0/1 unit 0 family ethernet-switching vlan members untrust
set interfaces ge-0/0/2 description "SRX220 Bottom trust interface"
set interfaces ge-0/0/2 enable
set interfaces ge-0/0/2 unit 0 family ethernet-switching port-mode trunk
set interfaces ge-0/0/2 unit 0 family ethernet-switching vlan members trust
set interfaces ge-0/0/3 description "SRX220 Top trust interface"
set interfaces ge-0/0/3 enable
set interfaces ge-0/0/3 unit 0 family ethernet-switching port-mode trunk
set interfaces ge-0/0/3 unit 0 family ethernet-switching vlan members trust
set interfaces vlan unit 10 description untrust
set interfaces vlan unit 10 family inet address 172.16.0.2/24
set interfaces vlan unit 20 description trust
set interfaces vlan unit 20 family inet address 192.168.0.2/24
set routing-instances trust instance-type virtual-router
set routing-instances trust interface vlan.20
set routing-instances trust routing-options static route 172.16.0.0/24 next-hop 192.168.0.1
set routing-instances untrust instance-type virtual-router
set routing-instances untrust interface vlan.10
set routing-instances untrust routing-options static route 192.168.0.0/24 next-hop 172.16.0.1
set vlans trust vlan-id 20
set vlans trust l3-interface vlan.20
set vlans untrust vlan-id 10
set vlans untrust l3-interface vlan.10
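Some pre-checks need to be done before you start. First, remove any existing chassis cluster configuration from both firewalls (this is run from operational mode and reboots the firewall), then confirm that the hardware and software versions match on both: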
set chassis cluster disable reboot
[email protected]_SRX220_Bottom> show chassis hardware
Hardware inventory:
Item             Version  Part number  Serial number     Description
Chassis                                CF4713AK0219      SRX220H2
Routing Engine   REV 04   750-048778   ACKS2263          RE-SRX220H2
FPC 0                                                    FPC
  PIC 0                                                  8x GE Base PIC
Power Supply 0

[email protected]_SRX220_Bottom> show version
Hostname: lab_SRX220_Bottom
Model: srx220h2
JUNOS Software Release [12.1X44-D40.2]
Once you have confirmed that both SRX220s have the same hardware, the same software version and an identical starting configuration, you can begin the clustering:
1. Physically connect the two devices together to create the control and fabric (data) links. The nodes in a cluster use these links to communicate with each other about cluster health, status and other traffic information. The control link is used to configure the nodes in the cluster, and the fabric (data) link allows session synchronization between nodes. The control and fabric interfaces are hardware specific, so different models will use different ports. You can see each model's control and fabric ports via the Juniper Knowledge Centre.
On the SRX220H for the Control link:
You will need to connect ge-0/0/7 on SRX A (node 0) to ge-0/0/7 on SRX B (node 1). On node 1 this port will change to ge-3/0/7 once the chassis cluster has been completed.
On the SRX220H for the Fabric link:
You will need to connect ge-0/0/5 on node 0 to ge-0/0/5 on node 1. As with the control link, on node 1 this interface will change to ge-3/0/5 once the chassis cluster has been completed.
2. Next, we need to enable cluster mode. As with removing the chassis cluster configuration earlier, this will reboot the firewalls and needs to be done from operational mode. Run the node 0 command on SRX A and the node 1 command on SRX B.
set chassis cluster cluster-id 1 node 0 reboot
set chassis cluster cluster-id 1 node 1 reboot
We can verify that the chassis cluster was successful by running:
[email protected]_SRX220_Top> show chassis cluster status
Cluster ID: 1
Node      Priority   Status      Preempt   Manual failover

Redundancy group: 0 , Failover count: 1
    node0     1      primary     no        no
    node1     1      secondary   no        no
Now that we have the chassis cluster completed, we can start with the configuration. We can do the entire configuration on the primary node0, and anything that is committed on the primary node0 will be copied onto the secondary node1.
3. We set the management interfaces (fxp0) on each of the nodes. This will allow us to have remote SSH access to each node.
set groups node0 system host-name SRXA
set groups node0 interfaces fxp0 unit 0 family inet address 10.1.0.201/24
set groups node1 system host-name SRXB
set groups node1 interfaces fxp0 unit 0 family inet address 10.1.0.202/24
set apply-groups "${node}"
4. Now, it's time to configure the Fabric links in the cluster:
set interfaces fab0 fabric-options member-interfaces ge-0/0/5
set interfaces fab1 fabric-options member-interfaces ge-3/0/5
We can check the interfaces we have just committed:
[email protected]_SRX220_Top# run show chassis cluster interfaces
Control link status: Up

Control interfaces:
    Index   Interface   Status
    0       fxp1        Up

Fabric link status: Up

Fabric interfaces:
    Name    Child-interface    Status
                               (Physical/Monitored)
    fab0    ge-0/0/5           Up   / Up
    fab0
    fab1    ge-3/0/5           Up   / Up
    fab1
5. Configure Redundancy Groups 0 and 1. The purpose of redundancy groups is that, in a failure situation, ownership can be failed over to the secondary node. In an HA cluster, redundancy group 0 represents the control plane (Routing Engine) by default. The node that is primary for redundancy group 0 (in this example node0) runs the active Routing Engine (RE). The active RE is master of the cluster; it is responsible for pushing any new configuration changes and controlling the data plane. Any changes that need to be made in the cluster have to be made via the active RE. If node0 were to fail over, node1 would become the new active RE. Although you can only have one active RE, a single node can be the primary node for a number of redundancy groups. Setting the priority higher on node0 ensures that node0 is the primary for both redundancy groups. Enabling preempt on redundancy group 1 means that if node0 fails and a failover to node1 occurs, node0 will automatically take back ownership of redundancy group 1 once it has recovered, because it has the higher priority. Preempt is not set on redundancy group 0 here, so the active RE does not move back automatically.
set chassis cluster redundancy-group 0 node 0 priority 100
set chassis cluster redundancy-group 0 node 1 priority 1
set chassis cluster redundancy-group 1 node 0 priority 100
set chassis cluster redundancy-group 1 node 1 priority 1
set chassis cluster redundancy-group 1 preempt
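To confirm that the priorities and the preempt setting have taken effect after the commit, the cluster status can be checked per redundancy group. This is a small sketch rather than one of the original steps; it assumes you are still in configuration mode on the primary node, hence the run prefix.

```
# Assumption: run from configuration mode on the primary node, after commit
run show chassis cluster status redundancy-group 0
run show chassis cluster status redundancy-group 1
```

Node0 should be listed with priority 100 and node1 with priority 1 in both groups, with Preempt showing yes only for redundancy group 1.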
6. The next step is to configure interface monitoring, which checks the health and physical status of each monitored interface. Interface monitoring can be used to trigger a failover when the link status of an interface goes down. By default each redundancy group has a threshold of 255. When a monitored interface goes down, its weight is subtracted from the threshold; once the threshold reaches 0, the redundancy group priority for that node is changed to 0 and the redundancy group fails over to the other node. With the weight of 255 used here, a single monitored interface failing is enough to trigger the failover.
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/1 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-3/0/1 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/2 weight 255
set chassis cluster redundancy-group 1 interface-monitor ge-3/0/2 weight 255
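To make the weight arithmetic concrete, here is a hypothetical variation that is not part of this lab's configuration. It assumes a design where losing one monitored link on a node should be tolerated and only a second failure should trigger the failover, so each interface is given a weight of 128 instead of 255.

```
# Hypothetical variation for illustration only; these would replace the weight 255 lines
# One link down on a node:  255 - 128 = 127  (above 0, no failover)
# Two links down on a node: 255 - 256 < 0    (priority set to 0, redundancy group fails over)
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/1 weight 128
set chassis cluster redundancy-group 1 interface-monitor ge-0/0/2 weight 128
set chassis cluster redundancy-group 1 interface-monitor ge-3/0/1 weight 128
set chassis cluster redundancy-group 1 interface-monitor ge-3/0/2 weight 128
```

In the topology used in this post each reth has only one member link per node, so losing a single link already takes that reth down on the node. That is why a weight of 255 is the sensible choice here and the lower weights are shown only to illustrate how the threshold is consumed.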
7. Setting the interfaces. With the SRX you need to set the Redundant Ethernet (reth) count before you are able to assign physical interfaces. The reth interface is a logical aggregated interface that allows port bundling between the nodes. For this example, I will only need 2 reth interfaces (1 for trust and 1 for untrust). Once the reth count has been applied, you will be able to assign the physical interfaces.
set chassis cluster reth-count 2
set interfaces ge-0/0/1 gigether-options redundant-parent reth1
set interfaces ge-3/0/1 gigether-options redundant-parent reth1
set interfaces ge-0/0/2 gigether-options redundant-parent reth0
set interfaces ge-3/0/2 gigether-options redundant-parent reth0
8. As with Aggregated Ethernet interfaces on EX or MX Series, you do the entire configuration for the reth under the logical interface. You also need to define the interface's redundancy group. As redundancy group 0 is the control plane, in this example both reth interfaces will be in redundancy group 1.
set interfaces reth0 vlan-tagging
set interfaces reth0 redundant-ether-options redundancy-group 1
set interfaces reth0 unit 10 description Untrust
set interfaces reth0 unit 10 vlan-id 10
set interfaces reth0 unit 10 family inet address 172.16.0.1/24
set interfaces reth1 vlan-tagging
set interfaces reth1 redundant-ether-options redundancy-group 1
set interfaces reth1 unit 20 description trust
set interfaces reth1 unit 20 vlan-id 20
set interfaces reth1 unit 20 family inet address 192.168.0.1/24
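After committing the reth configuration, it is worth checking that both reth interfaces came up and were picked up by the cluster. A minimal sketch, again assuming configuration mode on the primary node:

```
# Assumption: run from configuration mode on the primary node, after commit
run show chassis cluster interfaces
run show interfaces terse | match reth
```

The first command should now list reth0 and reth1 against redundancy group 1 along with the monitored interfaces, and the second should show reth0.10 and reth1.20 with the addresses configured above.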
To ensure that end-to-end connectivity was as expected, I created security zones and security policies to allow communication between the two reth interfaces. The zones and policies are very vanilla, as I just needed to be able to ping across from the EX4200 routing-instances.
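The zone and policy configuration itself is not listed in this post. A minimal sketch of what such a vanilla setup could look like is below. The zone names trust and untrust are assumptions; the policy name ping and the interface-to-zone mapping are taken from the flow session output, which shows reth1.20 on the 192.168.0.0/24 side and reth0.10 on the 172.16.0.0/24 side.

```
# Assumed zone names; reth1.20 is the trust side, reth0.10 the untrust side
# host-inbound-traffic only allows pinging the reth addresses themselves
set security zones security-zone trust interfaces reth1.20 host-inbound-traffic system-services ping
set security zones security-zone untrust interfaces reth0.10 host-inbound-traffic system-services ping
# Permit ICMP echo through the firewall in both directions
set security policies from-zone trust to-zone untrust policy ping match source-address any
set security policies from-zone trust to-zone untrust policy ping match destination-address any
set security policies from-zone trust to-zone untrust policy ping match application junos-icmp-ping
set security policies from-zone trust to-zone untrust policy ping then permit
set security policies from-zone untrust to-zone trust policy ping match source-address any
set security policies from-zone untrust to-zone trust policy ping match destination-address any
set security policies from-zone untrust to-zone trust policy ping match application junos-icmp-ping
set security policies from-zone untrust to-zone trust policy ping then permit
```

With policies along these lines in place, pings from each routing-instance on the EX4200 pass through the cluster: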
root> ping 172.16.0.2 routing-instance trust
--- 172.16.0.2 ping statistics ---
31 packets transmitted, 31 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.851/1.964/2.273/0.105 ms

root> ping 192.168.0.2 routing-instance untrust
--- 192.168.0.2 ping statistics ---
30 packets transmitted, 30 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.842/1.971/2.675/0.163 ms
And from the firewall, I was able to see the pings going across as flow sessions:
[email protected]_SRX220_Top> show security flow session
node0:
--------------------------------------------------------------------------

Session ID: 621, Policy name: ping/5, State: Active, Timeout: 2, Valid
  In: 192.168.0.2/7 --> 172.16.0.2/6279;icmp, If: reth1.20, Pkts: 1, Bytes: 84
  Out: 172.16.0.2/6279 --> 192.168.0.2/7;icmp, If: reth0.10, Pkts: 1, Bytes: 84

Session ID: 622, Policy name: ping/4, State: Active, Timeout: 2, Valid
  In: 172.16.0.2/9 --> 192.168.0.2/6277;icmp, If: reth0.10, Pkts: 1, Bytes: 84
  Out: 192.168.0.2/6277 --> 172.16.0.2/9;icmp, If: reth1.20, Pkts: 1, Bytes: 84

Session ID: 623, Policy name: ping/5, State: Active, Timeout: 2, Valid
  In: 192.168.0.2/8 --> 172.16.0.2/6279;icmp, If: reth1.20, Pkts: 1, Bytes: 84
  Out: 172.16.0.2/6279 --> 192.168.0.2/8;icmp, If: reth0.10, Pkts: 1, Bytes: 84

Session ID: 624, Policy name: ping/4, State: Active, Timeout: 2, Valid
  In: 172.16.0.2/10 --> 192.168.0.2/6277;icmp, If: reth0.10, Pkts: 1, Bytes: 84
  Out: 192.168.0.2/6277 --> 172.16.0.2/10;icmp, If: reth1.20, Pkts: 1, Bytes: 84

Session ID: 625, Policy name: ping/5, State: Active, Timeout: 4, Valid
  In: 192.168.0.2/9 --> 172.16.0.2/6279;icmp, If: reth1.20, Pkts: 1, Bytes: 84
  Out: 172.16.0.2/6279 --> 192.168.0.2/9;icmp, If: reth0.10, Pkts: 1, Bytes: 84
Total sessions: 5

node1:
--------------------------------------------------------------------------
Total sessions: 0
Having now got the cluster up and working, it was time to get to some proper failover testing! In my next post I will note how that went, as this post is pretty long now haha.
Keeran Marquis
Dear Keeran
What if we use two downstream links from the SRX to the EX for each VLAN? Should we also aggregate the links on the EX switch, or just use simple trunks and let the upstream cluster aggregate them by itself? Please post your configuration in that case also.
Waiting eagerly for your kind reply.
Regards.
Hi ausafali88
In my example I have 2 links to each member which are downstream to the switch. If you wanted to run VLANs down from the SRX to the EX, then you would need to enable vlan-tagging on the reth interface and create the sub-interfaces accordingly. On the switch side I personally wouldn't see the value of not having the links aggregated; however, you could have them as 2 separate trunk links, but it does depend on your environment. I hope that answers your question :)
Cheers
Keeran
Can the SRX reth interfaces be connected to Cisco switches instead? What configuration will be needed on the Cisco switch?
Hi Kenneth
Although I haven't tried it myself, I don't see why it should be an issue. From the SRX I'll assume you are running VLANs down? If so, you would just need to have those ports configured as trunks and set vlan-tagging on the reth interface in question.
Hope that helps
Cheers
Keeran
No problem with the Cisco switch. Now I am trying to trunk the VLANs over to another SRX100, and I am following your EX4200 configuration, but it's not working. Are there any additional settings needed on an SRX100 instead of an EX4200?
Do you have the SRX still in flow mode, or have you put it into packet mode? This could be one of the issues, as in flow mode it will still be a stateful device. If you're looking to use it as an end host like I was, then you should enable packet mode. Once in packet mode it will be a stateless router, and the configuration I was using should work.
Keeran
Hi Keeran,
Great post. Thanks for sharing.
How does the FW identify that ge-0/0/5 & ge-3/0/5 are control link connections? Is there any config needed for ge-0/0/5 & ge-3/0/5 (control link)?
Hi Keeran
Excellent work I must say, very well detailed!
I have a question for you if you don't mind:
You put all physical interfaces into the redundancy-group 1.
If node0 is primary and 0/0/1 goes down, node1 will become primary. Fine. Traffic can go through, using path: 3/0/1 – node1 – 3/0/2
Now, what happens if node0 is primary and 3/0/1 goes down?
1/ Node1 will become primary. Traffic can go through, using path: 1/0/1 – node0 – node1 – 3/0/2?
2/ Or is the SRX 'smart enough' not to switch over, because the new primary node (node1) would have a port down, and this would lead to extra inter-SRX traffic (as shown in the path in option 1 above)?
Indeed I have an issue with the redundancy on my SRXs and need to understand how this works.
Thanks!
Fabien
Hey Keeran,
Just thought I should drop a line and say what a good job you've done. I ran into some trouble with SRX HA, did a Google search, came across your blog and found all the answers I needed. You saved the day. An unsung hero. Thank you bro. Keep up the good work!!
Br,
Isaac
Cheers! I'm happy to help :)