Here's the scenario: it’s 2am on a Saturday morning and you get that dreaded call from your NOC saying there has been a power outage at one of the DCs. Luckily, it’s only a single rack with a top-of-rack EX4200 that you’re worried about, and your company has remote hands, so you don’t have to leave your house! But you still need to hop online to check that everything is ok. You connect to your EX4200 through your terminal server over the Out-of-Band network, and you are greeted by this warning banner:

--- JUNOS 12.3R5.7 built 2013-12-18 01:32:43 UTC

***********************************************************************
**                                                                   **
**  WARNING: THIS DEVICE HAS BOOTED FROM THE BACKUP JUNOS IMAGE      **
**                                                                   **
**  It is possible that the primary copy of JUNOS failed to boot up  **
**  properly, and so this device has booted from the backup copy.    **
**                                                                   **
**  Please re-install JUNOS to recover the primary copy in case      **
**  it has been corrupted and if auto-snapshot feature is not        **
**  enabled.                                                         **
**                                                                   **
***********************************************************************

Oh lawd, you think, this is going to be a big issue. Luckily, it’s not as big an issue as you think, provided your backup Junos image is the same version as your primary image. (If it’s not, well, I don’t have a clue what you could do! But I will look into this and write a post haha.) You can actually correct this issue quite easily, but it will require a reboot.

This is normally caused by a non-graceful shutdown (see what I did there with the scenario :P), which can corrupt the primary partition, but really there could be a number of reasons that I won’t list!

This page will show how you can fix your primary partition, and after you reboot, how you can check you have booted off the correct partition.

First let’s check what partitions we have:

keeran@lab-EX4200> show system storage partitions  
fpc0:
--------------------------------------------------------------------------
Boot Media: internal (da0)
Active Partition: da0s1a
Backup Partition: da0s2a
Currently booted from: backup (da0s2a)

Partitions information:
  Partition  Size   Mountpoint
  s1a        183M   altroot   
  s2a        183M   /         
  s3d        369M   /var/tmp  
  s3e        123M   /var      
  s4d        62M    /config   
  s4e               unused (backup config)

As we can see, the switch is running off the backup partition. Let’s get this sorted now:

  • We need to make sure the primary and backup partitions are consistent with each other. Copying the slice we are currently running from over the alternate slice repairs the corrupted primary partition in the process.

request system snapshot media internal slice alternate

  • Now that the primary partition is sorted, we need to make sure we are running off it. With the command below, the switch will boot off the primary partition after the reboot, which also clears the system alarms.

request system reboot slice alternate media internal

The reboot can wait for an out-of-hours change, as you can run off your backup partition without an issue, but it’s not ideal.

  • Having rebooted, we need to verify that everything is ok:

keeran@lab-EX4200> show system storage partitions 
fpc0:
--------------------------------------------------------------------------
Boot Media: internal (da0)
Active Partition: da0s1a
Backup Partition: da0s2a
Currently booted from: active (da0s1a)

Partitions information:
  Partition  Size   Mountpoint
  s1a        183M   /         
  s2a        183M   altroot   
  s3d        369M   /var/tmp  
  s3e        123M   /var      
  s4d        62M    /config   
  s4e               unused (backup config)
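
If you have a few switches to check after a power event, the "Currently booted from" line is easy to test for in a script. Below is a minimal Python sketch that works on the text of `show system storage partitions`, however you collected it. The function names `booted_from` and `members_on_backup` are made up for this example, and I'm assuming a Virtual Chassis simply repeats the same block under each `fpcN:` header, which I have only checked against the single-member output above.

```python
import re

# Trimmed copy of the 'show system storage partitions' output from this post.
SAMPLE = """\
fpc0:
--------------------------------------------------------------------------
Boot Media: internal (da0)
Active Partition: da0s1a
Backup Partition: da0s2a
Currently booted from: backup (da0s2a)
"""


def booted_from(output):
    """Map each member (e.g. 'fpc0') to the (role, partition) it booted from."""
    result = {}
    member = None  # stays None if the output has no 'fpcN:' header
    for raw in output.splitlines():
        line = raw.strip()
        header = re.match(r"^(fpc\d+):$", line)
        if header:
            member = header.group(1)
            continue
        booted = re.match(r"^Currently booted from:\s*(\w+)\s*\((\w+)\)$", line)
        if booted:
            result[member] = (booted.group(1), booted.group(2))
    return result


def members_on_backup(output):
    """List the members that report booting from the backup slice."""
    return [m for m, (role, _) in booted_from(output).items() if role == "backup"]


if __name__ == "__main__":
    print(booted_from(SAMPLE))
    print(members_on_backup(SAMPLE))
```

An empty list from `members_on_backup` means every member in the output reports that it booted from the active slice.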

Boom sorted :D

This is why it is VERY IMPORTANT, whenever you do a Junos upgrade, to update your backup image as well.

UPDATE 20/4/2015

Went into the lab today and when I consoled onto a spare EX4200 I was greeted with:

--- JUNOS 11.4R1.6 built 2011-11-15 11:14:01 UTC

***********************************************************************
**                                                                   **
**  WARNING: THIS DEVICE HAS BOOTED FROM THE BACKUP JUNOS IMAGE      **
**                                                                   **
**  It is possible that the primary copy of JUNOS failed to boot up  **
**  properly, and so this device has booted from the backup copy.    **
**                                                                   **
**  Please re-install JUNOS to recover the primary copy in case      **
**  it has been corrupted.                                           **
**                                                                   **
***********************************************************************

I did a snapshot check and saw that the backup image hadn’t been upgraded:

root@EX4200> show system snapshot media internal 
Information for snapshot on       internal (/dev/da0s1a) (backup)
Creation date: Nov 15 13:40:51 2011
JUNOS version on snapshot:
  jbase  : 11.4R1.6
  jcrypto-ex: 11.4R1.6
  jdocs-ex: 11.4R1.6
  jkernel-ex: 11.4R1.6
  jroute-ex: 11.4R1.6
  jswitch-ex: 11.4R1.6
  jweb-ex: 11.4R1.6
  jpfe-ex42x: 11.4R1.6
Information for snapshot on       internal (/dev/da0s2a) (primary)
Creation date: Dec 18 04:06:12 2013
JUNOS version on snapshot:
  jbase  : ex-12.3R5.7
  jcrypto-ex: 12.3R5.7
  jdocs-ex: 12.3R5.7
  jkernel-ex: 12.3R5.7
  jroute-ex: 12.3R5.7
  jswitch-ex: 12.3R5.7
  jweb-ex: 12.3R5.7

I started to mess about to see if there was any way of fixing the primary without essentially doing a system downgrade (as the backup image was an older one). To cut a long story short, I couldn’t :(

Not through reading the Juniper documentation, not through googling, and not through random troubleshooting while hoping for the best!

So always keep in mind: when you’re doing a firmware upgrade, make sure you upgrade the backup image too, or you will have to downgrade and then re-upgrade. That will be a major issue to explain to the business during an unexpected outage.
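
A quick way to catch this before it bites is to compare the versions on both slices right after an upgrade. Here is a rough Python sketch that parses the text of `show system snapshot media internal` and checks whether one package has the same version on both slices. The helper names `snapshot_versions` and `slices_match` are hypothetical, and comparing on `jkernel-ex` is my own choice (note that `jbase` shows up as `ex-12.3R5.7` in the output above, so it is not a clean string to compare). The primary/backup labels are taken exactly as the switch prints them.

```python
import re

# Trimmed copy of the 'show system snapshot media internal' output from this post.
SAMPLE = """\
Information for snapshot on       internal (/dev/da0s1a) (backup)
Creation date: Nov 15 13:40:51 2011
JUNOS version on snapshot:
  jbase  : 11.4R1.6
  jkernel-ex: 11.4R1.6
Information for snapshot on       internal (/dev/da0s2a) (primary)
Creation date: Dec 18 04:06:12 2013
JUNOS version on snapshot:
  jbase  : ex-12.3R5.7
  jkernel-ex: 12.3R5.7
"""


def snapshot_versions(output):
    """Map each slice label ('primary'/'backup') to {package: version}."""
    slices = {}
    current = None
    for line in output.splitlines():
        header = re.match(
            r"^Information for snapshot on\s+\S+\s+\((\S+)\)\s+\((\w+)\)", line
        )
        if header:
            current = slices.setdefault(header.group(2), {})
            continue
        # Package lines are indented, e.g. '  jkernel-ex: 12.3R5.7'
        package = re.match(r"^\s+(\S+)\s*:\s*(\S+)\s*$", line)
        if package and current is not None:
            current[package.group(1)] = package.group(2)
    return slices


def slices_match(output, package="jkernel-ex"):
    """True only if both slices list the same version for the given package."""
    slices = snapshot_versions(output)
    primary = slices.get("primary", {}).get(package)
    backup = slices.get("backup", {}).get(package)
    return primary is not None and primary == backup


if __name__ == "__main__":
    print(snapshot_versions(SAMPLE))
    print(slices_match(SAMPLE))
```

If `slices_match` comes back false after an upgrade, that is your cue to snapshot to the alternate slice while the switch is still healthy.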
