At work we were planning a firmware upgrade of our Junos switches from 12.3 to 13.2X, and a few of them are Virtual Chassis (VC) stacks. The plan was to use the NSSU method so that we wouldn't have any downtime. However, when I tested it, I would kick off the NSSU and the backup member would upgrade, reboot and come up as expected:
{master:0}
[email protected]> ...p/jinstall-ex-4200-13.2X51-D35.3-domestic-signed.tgz
Chassis ISSU Check Done
[Dec 18 04:32:13]:ISSU: Validating Image
[Dec 18 04:32:41]:ISSU: Preparing Backup RE
[Dec 18 04:32:42]: Installing image on other FPC's along with the backup
[Dec 18 04:32:42]: Checking pending install on fpc1
[Dec 18 04:33:41]: Pushing bundle to fpc1
NOTICE: Validating configuration against mchassis-install.tgz.
NOTICE: Use the 'no-validate' option to skip this if desired.
WARNING: A reboot is required to install the software
WARNING: Use the 'request system reboot' command immediately
[Dec 18 04:34:42]: Completed install on fpc1
[Dec 18 04:34:53]: Backup upgrade done
[Dec 18 04:34:53]: Rebooting Backup RE
Rebooting fpc1
[Dec 18 04:34:54]:ISSU: Backup RE Prepare Done
[Dec 18 04:34:54]: Waiting for Backup RE reboot
After an hour of looking at this on the master, I consoled into the backup to see whether it had booted. It was up, so I clearly had an issue. I aborted the NSSU and checked to see what was going on; the backup member had upgraded and had connected with the master:
{master:0}
[email protected]> show version
fpc0:
--------------------------------------------------------------------------
Hostname: EX4200-A
Model: ex4200-48t
JUNOS Base OS boot [12.3R5.7]
JUNOS Base OS Software Suite [12.3R5.7]
JUNOS Kernel Software Suite [12.3R5.7]
JUNOS Crypto Software Suite [12.3R5.7]
JUNOS Online Documentation [12.3R5.7]
JUNOS Enterprise Software Suite [12.3R5.7]
JUNOS Packet Forwarding Engine Enterprise Software Suite [12.3R5.7]
JUNOS Routing Software Suite [12.3R5.7]
JUNOS Web Management [12.3R5.7]
JUNOS FIPS mode utilities [12.3R5.7]

fpc1:
--------------------------------------------------------------------------
Hostname: EX4200-A
Model: ex4200-48t
JUNOS EX Software Suite [13.2X51-D35.3]
JUNOS FIPS mode utilities [13.2X51-D35.3]
JUNOS Online Documentation [13.2X51-D35.3]
JUNOS EX 4200 Software Suite [13.2X51-D35.3]
JUNOS Web Management [13.2X51-D35.3]
I thought this was very odd, so I checked the logs to see if anything was out of the norm. The VCP ports had come up; however, the attempts to reach the backup member had timed out :/
It was Friday and I had a planned upgrade for the following week, so I didn't have the time to raise a JTAC case (which I should probably have done, but that could come later). With this in mind, I thought I should be able to manually fail over the Routing Engines and upgrade each member the same way without all of the magic of the NSSU.
Soooooooo this is what this post will be about: the success or failure of manually failing over a VC with minimal downtime 🙂
Let’s get cracking!
I was using 2x EX4200 with JUNOS 12.3R5.7; it's the same setup I had in my previous Virtual Chassis post. I used the preprovisioned method of stacking the switches, and had the following VC-specific configuration applied:
[email protected]# show routing-options
nonstop-routing;
static {
    route 0.0.0.0/0 {
        next-hop 10.1.0.1;
        no-readvertise;
    }
}

[email protected]# show chassis
redundancy {
    graceful-switchover;
}

[email protected]# show virtual-chassis
preprovisioned;
no-split-detection;
member 0 {
    role routing-engine;
    serial-number BP0214340104;
}
member 1 {
    role routing-engine;
    serial-number BP0215090120;
}
fast-failover {
    ge;
    xe;
}
It’s important to make sure you have nonstop-routing, graceful-switchover and no-split-detection configured. Without these you will most likely get a split-brain effect, and that’s not a good thing!
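A quick way to sanity-check a saved copy of the configuration for those three statements is sketched below. This is only a rough text check under my own assumptions: the function name `missing_statements` and the sample text are made up for illustration, and it does not notice statements that are present but deactivated.

```python
import re

# The three statements called out above as required before splitting the VC.
REQUIRED = ("nonstop-routing", "graceful-switchover", "no-split-detection")


def missing_statements(config_text: str) -> list[str]:
    """Return the required statements that never appear as 'name;' in the text."""
    # Collect every token that sits directly in front of a semicolon.
    found = set(re.findall(r"[\w-]+(?=;)", config_text))
    return [stmt for stmt in REQUIRED if stmt not in found]


# Hypothetical, trimmed-down config text used only for this example.
SAMPLE = """
routing-options { nonstop-routing; }
chassis { redundancy { graceful-switchover; } }
virtual-chassis { preprovisioned; no-split-detection; }
"""

if __name__ == "__main__":
    missing = missing_statements(SAMPLE)
    print("missing:", missing if missing else "nothing")
```

Feeding it the text of `show configuration` saved to a file should work the same way, since it only looks for the statement names followed by a semicolon.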
I’ve got a VM connected to both switches with an LACP bond configured:
[email protected]> show lldp neighbors
Local Interface    Parent Interface    Chassis Id          Port info    System Name
ge-0/0/2.0         ae1.0               00:0c:29:4f:26:bb   eth1         km-vm1
ge-1/0/2.0         ae1.0               00:0c:29:4f:26:bb   eth2         km-vm1
and I have the VM pinging its default gateway (192.31.1.1), which is the l3-interface on the switch:
[email protected]:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.31.1.1      0.0.0.0         UG    0      0        0 bond0
10.1.0.0        0.0.0.0         255.255.255.0   U     0      0        0 eth0
192.31.1.0      0.0.0.0         255.255.255.0   U     0      0        0 bond0
Now everything is sorted, let’s try some stuff!
As the VM is dual-connected to both members, I’ll shut down the data interface and the VCP ports of the backup switch, upgrade it and then do the same on the master switch. In essence, I’ll be breaking the VC to upgrade each switch individually. I’ll be running a continuous ping from the VM and will be able to see if any packets are dropped during this work.
I start with the backup member. I have to disable the data interface and break the virtual chassis by disabling the VCP ports. First I had to copy the Junos package from member 0 to member 1, as I’d have no access to member 0 once the virtual chassis had been broken.
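For reference, the plan can be laid out as an ordered checklist. The sketch below simply collects the commands used through the rest of this post in the order I ran them; the function name `build_runbook` is made up for illustration, the port names are the ones from my lab, and the mode labels are read off the prompts. It prints the list and does not talk to any switch.

```python
# Package path and port names are the ones used in this post's lab.
PACKAGE = "/tmp/jinstall-ex-4200-13.2X51-D35.3-domestic-signed.tgz"


def build_runbook(package: str = PACKAGE) -> list[tuple[str, str]]:
    """Return (CLI mode, command) pairs in the order they are applied."""
    upgrade = f"request system software add {package} validate reboot"
    return [
        # Stage the image on the backup while the VC is still intact.
        ("operational", f"file copy {package} fpc1:/tmp/"),
        # Take member 1 out of the traffic path, then split the VC.
        ("configuration", "deactivate interfaces ge-1/0/2"),
        ("operational", "request virtual-chassis vc-port set interface vcp-0 member 1 disable"),
        ("operational", "request virtual-chassis vc-port set interface vcp-1 member 1 disable"),
        ("operational", "request virtual-chassis vc-port set interface vcp-0 disable"),
        ("operational", "request virtual-chassis vc-port set interface vcp-1 disable"),
        # Upgrade member 1 from its console, then bring its port back.
        ("operational", upgrade),
        ("configuration", "activate interfaces ge-1/0/2"),
        # Repeat for member 0.
        ("configuration", "deactivate interfaces ge-0/0/2"),
        ("operational", upgrade),
        ("configuration", "activate interfaces ge-0/0/2"),
        # Rejoin the VC and refresh the backup partition.
        ("operational", "request virtual-chassis vc-port set interface vcp-0"),
        ("operational", "request virtual-chassis vc-port set interface vcp-1"),
        ("operational", "request system snapshot slice alternate all-members"),
    ]


if __name__ == "__main__":
    for number, (mode, command) in enumerate(build_runbook(), start=1):
        print(f"{number:2d}. [{mode}] {command}")
```

The configuration-mode entries still need a commit on the switch before they take effect.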
[email protected]> file copy /tmp/jinstall-ex-4200-13.2X51-D35.3-domestic-signed.tgz fpc1:/tmp/
This copies the package from member 0 to member 1. I confirmed it by entering the shell and checking the /tmp folder on member 1:
{backup:1}
[email protected]> start shell
[email protected]:BK:1% cd /tmp/
[email protected]:BK:1% ls -la
total 234744
drwxrwxrwt   3 root  wheel          512 Dec 18 14:57 .
drwxr-xr-x  23 root  wheel          512 Dec 18 04:06 ..
-rw-r--r--   1 root  wheel           92 Dec 18 12:13 .clnpkg.LCK
-rw-r--r--   1 root  wheel           92 Dec 18 12:13 .pkg.LCK
drwxrwxr-x   2 root  operator       512 Dec 18 12:10 .snap
-rw-r--r--   1 root  wheel    120120669 Dec 18 14:58 jinstall-ex-4200-13.2X51-D35.3-domestic-signed.tgz
-rw-r--r--   1 root  wheel          393 Dec 18 12:10 partitions.spec
[email protected]:BK:1% exit
Next, disable the member 1 port (in my case ge-1/0/2) with deactivate interfaces ge-1/0/2:
[email protected]# run show interfaces ge-1/0/2
Physical interface: ge-1/0/2, Administratively down, Physical link is Down
The server dropped 3 packets, which is acceptable to most; so far so good. Next I disabled the VCP ports on member 1 and member 0, and then consoled onto member 1.
[email protected]> request virtual-chassis vc-port set interface vcp-0 member 1 disable
[email protected]> request virtual-chassis vc-port set interface vcp-1 member 1 disable
[email protected]> request virtual-chassis vc-port set interface vcp-0 disable
[email protected]> request virtual-chassis vc-port set interface vcp-1 disable
Member 1 automatically took mastership and no longer sees member 0:
{master:1}
[email protected]> show virtual-chassis status
Preprovisioned Virtual Chassis
Virtual Chassis ID: e8a9.d27b.0f05
Virtual Chassis Mode: Enabled
                                               Mstr           Mixed  Neighbor List
Member ID  Status    Serial No     Model       prio  Role     Mode   ID  Interface
0 (FPC 0)  NotPrsnt  BP0214340104  ex4200-48t
1 (FPC 1)  Prsnt     BP0215090120  ex4200-48t  129   Master*  N
The server is still pinging along, so now we can upgrade the backup member as if it were a standalone device. We’ll run request system software add /tmp/jinstall-ex-4200-13.2X51-D35.3-domestic-signed.tgz validate reboot
Once member 1 had rebooted I had to wait a bit, as it was looking for the master (due to the preprovisioned config) and it initially booted as a linecard; however, it changed back to master after I entered operational mode.
Next I re-enabled the member 1 port with activate interfaces ge-1/0/2
To double-check that it was up, I checked the interface and the LLDP neighbors:
[email protected]> show interfaces ge-1/0/2
Physical interface: ge-1/0/2, Enabled, Physical link is Up

{master:1}
[email protected]> show lldp neighbors
Local Interface    Parent Interface    Chassis Id          Port info    System Name
ge-1/0/2.0         ae1.0               00:0c:29:4f:26:bb   eth2         km-vm1
Now disable the member 0 port (in my case ge-0/0/2) with deactivate interfaces ge-0/0/2
The server dropped 47 packets after the interface was disabled. This was most likely due to the convergence time for the LACP bond after the port went down, and this showed in the log messages.
With the server passing traffic over member 1, I could upgrade member 0 the same way as before: request system software add /tmp/jinstall-ex-4200-13.2X51-D35.3-domestic-signed.tgz validate reboot
As with member 1, it came back up after its reboot, but the switch took an age to find the master and just as long to commit the activation of interface ge-0/0/2! Extreme patience needed!
Confirmation that the link is up and that I have an LLDP neighbor:
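To pin down exactly where in a long ping run the drops happened, the gaps in the sequence numbers can be pulled out of the saved ping output. This is a minimal sketch, assuming the usual Linux ping reply lines that carry `icmp_seq=N` (some older builds print `icmp_req=N`); the function name `lost_runs` and the sample lines are made up for illustration.

```python
import re


def lost_runs(ping_lines):
    """Yield (first_missing_seq, count) for each run of missing sequence numbers."""
    previous = None
    for line in ping_lines:
        match = re.search(r"icmp_(?:seq|req)=(\d+)", line)
        if not match:
            continue  # banner, error or summary line
        seq = int(match.group(1))
        if previous is not None and seq > previous + 1:
            yield previous + 1, seq - previous - 1
        previous = seq


# Hypothetical capture: replies 1 and 2 arrive, then nothing until reply 50.
SAMPLE = [
    "64 bytes from 192.31.1.1: icmp_seq=1 ttl=64 time=0.9 ms",
    "64 bytes from 192.31.1.1: icmp_seq=2 ttl=64 time=1.1 ms",
    "64 bytes from 192.31.1.1: icmp_seq=50 ttl=64 time=1.0 ms",
]

if __name__ == "__main__":
    for first, count in lost_runs(SAMPLE):
        print(f"{count} replies missing starting at seq {first}")
```

Lining those gaps up against the timestamps in the switch logs makes it easier to tell which step caused which drop.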
{master:0}
[email protected]> show lldp neighbors
Local Interface    Parent Interface    Chassis Id          Port info    System Name
ge-0/0/2.0         ae1.0               00:0c:29:4f:26:bb   eth1         km-vm1

{master:0}
[email protected]> show interfaces ge-0/0/2
Physical interface: ge-0/0/2, Enabled, Physical link is Up
Both members are now on the same code, as expected:
{master:0}
[email protected]> show version
fpc0:
--------------------------------------------------------------------------
Hostname: EX4200-A
Model: ex4200-48t
JUNOS EX Software Suite [13.2X51-D35.3]
JUNOS FIPS mode utilities [13.2X51-D35.3]
JUNOS Online Documentation [13.2X51-D35.3]
JUNOS EX 4200 Software Suite [13.2X51-D35.3]
JUNOS Web Management [13.2X51-D35.3]

{master:1}
[email protected]> show version
fpc1:
--------------------------------------------------------------------------
Hostname: EX4200-A
Model: ex4200-48t
JUNOS EX Software Suite [13.2X51-D35.3]
JUNOS FIPS mode utilities [13.2X51-D35.3]
JUNOS Online Documentation [13.2X51-D35.3]
JUNOS EX 4200 Software Suite [13.2X51-D35.3]
JUNOS Web Management [13.2X51-D35.3]
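Eyeballing two long version listings is error-prone, so the comparison can be scripted against the pasted output. The sketch below is a rough text parse under my own assumptions: the function names are made up for illustration, and it relies only on the `fpcN:` headers and the release strings in square brackets seen in the output above.

```python
import re


def versions_by_member(show_version_text: str) -> dict[str, set[str]]:
    """Map each 'fpcN' header to the set of bracketed release strings under it."""
    versions: dict[str, set[str]] = {}
    member = None
    for match in re.finditer(r"(fpc\d+):|\[([^\]]+)\]", show_version_text):
        if match.group(1):
            member = match.group(1)
            versions.setdefault(member, set())
        elif member is not None:
            versions[member].add(match.group(2))
    return versions


def same_release(show_version_text: str) -> bool:
    """True only when every member reports exactly one common release."""
    releases = set().union(*versions_by_member(show_version_text).values())
    return len(releases) == 1


# Shortened copy of the half-upgraded state seen after the failed NSSU.
MIXED = """
fpc0: Hostname: EX4200-A JUNOS Base OS boot [12.3R5.7] JUNOS Web Management [12.3R5.7]
fpc1: Hostname: EX4200-A JUNOS EX Software Suite [13.2X51-D35.3]
"""

if __name__ == "__main__":
    print(versions_by_member(MIXED))
    print("same release:", same_release(MIXED))
```

Because it scans for patterns rather than lines, it works whether the output is pasted with or without its original line breaks.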
To get them joined together into the virtual chassis I enabled the VCP ports on member 0 and hoped this would bring them back together with no issues (He says!!!)
{master:0}
[email protected]> request virtual-chassis vc-port set interface vcp-0
{master:0}
[email protected]> request virtual-chassis vc-port set interface vcp-1
To finish off, I ran the command request system snapshot slice alternate all-members to make sure the backup partition image was consistent with the primary.
And finally everything is complete! I confirmed the virtual chassis, firmware version and LLDP neighbors, and upgraded the backup partition! Never forget to do this!
[email protected]> show virtual-chassis
Preprovisioned Virtual Chassis
Virtual Chassis ID: e8a9.d27b.0f05
Virtual Chassis Mode: Enabled
                                               Mstr           Mixed  Route  Neighbor List
Member ID  Status    Serial No     Model       prio  Role     Mode   Mode   ID  Interface
0 (FPC 0)  Prsnt     BP0214340104  ex4200-48t  129   Master*  N      VC     1   vcp-0
                                                                            1   vcp-1
1 (FPC 1)  Prsnt     BP0215090120  ex4200-48t  129   Backup   N      VC     0   vcp-0
                                                                            0   vcp-1

[email protected]> show version
fpc0:
--------------------------------------------------------------------------
Hostname: EX4200-A
Model: ex4200-48t
JUNOS EX Software Suite [13.2X51-D35.3]
JUNOS FIPS mode utilities [13.2X51-D35.3]
JUNOS Online Documentation [13.2X51-D35.3]
JUNOS EX 4200 Software Suite [13.2X51-D35.3]
JUNOS Web Management [13.2X51-D35.3]

fpc1:
--------------------------------------------------------------------------
Hostname: EX4200-A
Model: ex4200-48t
JUNOS EX Software Suite [13.2X51-D35.3]
JUNOS FIPS mode utilities [13.2X51-D35.3]
JUNOS Online Documentation [13.2X51-D35.3]
JUNOS EX 4200 Software Suite [13.2X51-D35.3]
JUNOS Web Management [13.2X51-D35.3]

[email protected]> show lldp neighbors
Local Interface    Parent Interface    Chassis Id          Port info    System Name
ge-0/0/2.0         ae1.0               00:0c:29:4f:26:bb   eth1         km-vm1
ge-1/0/2.0         ae1.0               00:0c:29:4f:26:bb   eth2         km-vm1

[email protected]> request system snapshot slice alternate
fpc0:
--------------------------------------------------------------------------
Formatting alternate root (/dev/da0s1a)...
Copying '/dev/da0s2a' to '/dev/da0s1a' .. (this may take a few minutes)
The following filesystems were archived: /

fpc1:
--------------------------------------------------------------------------
Formatting alternate root (/dev/da0s2a)...
Copying '/dev/da0s1a' to '/dev/da0s2a' .. (this may take a few minutes)
The following filesystems were archived: /
From the running pings:
--- 192.31.1.1 ping statistics ---
9365 packets transmitted, 9234 received, +42 errors, 1% packet loss, time 9377278ms
rtt min/avg/max/mdev = 0.771/1.162/11.807/0.370 ms, pipe 3
[email protected]:~$
There was 1% packet loss over the whole test (156 minutes), which works out as a 93.77 second outage (1% of the run time), which isn't too bad. Considering this was the first time I tried this method, I’ll be going over it again because it took far too long, but overall this method works!
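The 1% in the ping summary is printed as a whole number, so a slightly tighter estimate comes from counting the lost replies directly. The sketch below parses the summary line above and assumes the pings were evenly spaced (about one per second here); the function name `estimate_outage` is made up for illustration. On these numbers it gives 131 lost replies, roughly 1.4%, or about 131 seconds spread across all the steps rather than one single outage.

```python
import re

# Summary line copied from the ping statistics above.
SUMMARY = "9365 packets transmitted, 9234 received, +42 errors, 1% packet loss, time 9377278ms"


def estimate_outage(summary: str) -> tuple[int, float, float]:
    """Return (lost replies, loss percentage, estimated seconds without replies)."""
    match = re.search(
        r"(\d+) packets transmitted, (\d+) received.*?time (\d+)ms", summary
    )
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received, elapsed_ms = (int(group) for group in match.groups())
    lost = sent - received
    loss_pct = 100.0 * lost / sent
    # Assumption: evenly spaced pings, so each lost reply costs one interval.
    interval_s = elapsed_ms / sent / 1000.0
    return lost, loss_pct, lost * interval_s


if __name__ == "__main__":
    lost, loss_pct, seconds = estimate_outage(SUMMARY)
    print(f"{lost} lost, {loss_pct:.2f}% loss, about {seconds:.0f} s without replies")
```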
I also messed about with the different types of bonding methods available:
With the round-robin or bond-type 0, the switch was configured as two access ports and I saw high packet loss during the testing.
--- 192.31.1.1 ping statistics ---
6106 packets transmitted, 3125 received, 48% packet loss, time 6128448ms
rtt min/avg/max/mdev = 0.814/1.484/902.641/16.131 ms
This was due to the nature of the round-robin bonding method.
With the active-backup or bond-type 1, the switch was configured as two access ports and I saw no packet loss during the testing. A slight difference when using active-backup (as expected, to be honest) is that when you check the LLDP neighbors you’ll only see one interface up at a time.
This is due to the nature of the bond-type.
--- 192.31.1.1 ping statistics ---
2905 packets transmitted, 2892 received, 0% packet loss, time 2908023ms
rtt min/avg/max/mdev = 0.846/1.214/20.269/0.758 ms

[email protected]> show lldp neighbors
Local Interface    Parent Interface    Chassis Id          Port info                System Name
ge-0/0/2.0         -                   00:0c:29:4f:26:bb   eth1                     km-vm1
vme.0              -                   00:19:06:cd:8f:80   GigabitEthernet1/0/36    oob-sw0-10.lab
xe-0/1/0.0         ae0.0               78:fe:3d:46:2a:c0   xe-0/0/2.0               EX4500
Having got a method that worked, below are some of the methods I tried and failed on. Looking back, the two methods I used were never going to work; however, this is why you have a lab, and it’s always good to see things for yourself to see if you can troubleshoot your way out! With all that being said, I’ve actually picked up a few things I didn’t know, so this was a good exercise!
[email protected]> request system software rollback member 1 reboot
fpc1:
--------------------------------------------------------------------------
Junos version '12.3R5.7' will become active at next reboot
Rebooting ...
shutdown: [pid 1280] Shutdown NOW!
Then, once member 1 had rebooted, I checked to make sure it was present in the virtual chassis:
[email protected]> show virtual-chassis
Preprovisioned Virtual Chassis
Virtual Chassis ID: e8a9.d27b.0f05
Virtual Chassis Mode: Enabled
                                               Mstr           Mixed  Neighbor List
Member ID  Status    Serial No     Model       prio  Role     Mode   ID  Interface
0 (FPC 0)  Prsnt     BP0214340104  ex4200-48t  129   Master*  N      1   vcp-0
                                                                     1   vcp-1
1 (FPC 1)  Prsnt     BP0215090120  ex4200-48t  129   Backup   N      0   vcp-0
                                                                     0   vcp-1
Keeran Marquis
Hi, I read your post. Maybe you know about this message: ”Warning: configuration block ignored: unsupported platform (ex4550-32f)”. I configured two or three Juniper S&W for Virtual Chassis with this version of Junos: 13.2X51-D35.3, but when I configure the LACP/LAG, it is shown on the interfaces, and I don't know why. I would appreciate your comment. Many thanks.
Hi Jhon
I’m not 100% sure what you mean by S&W, unfortunately. I hadn’t seen that error myself before, but I’ve had a quick look on the Juniper KB page and found this topic that may be able to help you with your issue: http://kb.juniper.net/InfoCenter/index?page=content&id=KB23421&actp=search
What’s the configuration on your bonded interface? Have you checked to make sure it’s as it should be?