SUMMARY:

This article describes what happens when an EX switch boots from the backup root partition after file system corruption occurs on the primary root partition.

PROBLEM OR GOAL:

EX switches running Junos OS Release 10.4R3 or later have added resiliency through the “resilient dual-root partition” feature: if the switch detects corruption on the primary root file system, it boots from the alternate root partition.

When this occurs, you are notified in two ways: by an alarm and by a warning banner.

Alarm:

The following alarm message is generated:

user@switch> show chassis alarms
1 alarms currently active
Alarm time Class Description
2011-02-17 05:48:49 PST Minor Host 0 Boot from backup root
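
If you monitor many switches, this alarm can also be detected by a script. The following is a minimal Python sketch, assuming the plain-text layout of the sample show chassis alarms output above (a timestamp, a time zone, an alarm class and a description on each row). The function name find_backup_root_alarm is hypothetical and is not part of any Juniper tool; real output may differ in spacing or wording.

```python
import re
from datetime import datetime

# Matches an alarm row such as:
#   2011-02-17 05:48:49 PST Minor Host 0 Boot from backup root
# Assumption: the row layout is taken from the sample output above,
# not from a formal Junos output specification.
ALARM_ROW = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+(?P<tz>\S+)\s+"
    r"(?P<cls>\S+)\s+(?P<desc>.+?)\s*$"
)


def find_backup_root_alarm(output: str):
    """Return (timestamp, time zone) of the 'Boot from backup root' alarm, or None."""
    for line in output.splitlines():
        m = ALARM_ROW.match(line.strip())
        if m and m.group("desc").endswith("Boot from backup root"):
            ts = datetime.strptime(m.group("date"), "%Y-%m-%d %H:%M:%S")
            return ts, m.group("tz")
    return None


sample = """user@switch> show chassis alarms
1 alarms currently active
Alarm time Class Description
2011-02-17 05:48:49 PST Minor Host 0 Boot from backup root"""

print(find_backup_root_alarm(sample))
# (datetime.datetime(2011, 2, 17, 5, 48, 49), 'PST')
```

The time zone is returned separately rather than converted, because the sketch makes no assumption about how the switch's local time zone maps to UTC.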


Warning:

****************************************************************************************
** **
** WARNING: THIS DEVICE HAS BOOTED FROM THE BACKUP JUNOS IMAGE **
** **
** It is possible that the primary copy of JUNOS failed to boot up **
** properly, and so this device has booted from the backup copy. **
** **
** Please re-install JUNOS to recover the primary copy in case **
** it has been corrupted. **
** **
****************************************************************************************

CAUSE:

The file system most likely became corrupted because of a sudden power loss or an ungraceful shutdown of the EX switch.

SOLUTION:

Repairing the primary partition when it is corrupted:

  • When the switch detects corruption on the primary partition, it boots from the backup partition, which then becomes the active partition. On every subsequent reboot, the switch boots from the current active partition.
  • You can repair the primary partition without any downtime by running request system snapshot media internal slice alternate. No reboot is required after running this command; however, the alarm and the warning banner continue to be displayed.

Note: As long as both partitions are healthy, the switch can run from either of them without issue. You only need to ensure that both partitions remain healthy so that, if file corruption occurs, the switch can fail over transparently from one partition to the other.

Verification:

To verify that the primary partition has been rebuilt, run one of the following show commands. The same commands also show which partition is currently active.

show system storage partitions

Sample output:

root> show system storage partitions
fpc0:
--------------------------------------------------------------------------
Boot Media: internal (da0)
Active Partition: da0s1a
Backup Partition: da0s2a <-- this is the backup slice
Currently booted from: backup (da0s2a) <-- shows that the switch booted from that slice

Partitions information:
Partition  Size  Mountpoint
s1a        184M  altroot
s2a        184M  /
s3d        369M  /var/tmp
s3e        123M  /var
s4d        62M   /config
s4e        unused (backup config)
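
The fields that matter for this issue are Active Partition, Backup Partition and Currently booted from. The following minimal Python sketch extracts them, assuming the field names and layout of the sample output above; parse_storage_partitions is a hypothetical helper, and it also discards the "<--" annotations used in this article.

```python
import re


def parse_storage_partitions(output: str) -> dict:
    """Extract boot-related fields from 'show system storage partitions' text.

    Assumption: field names and layout match the sample output in this article.
    """
    info = {}
    for line in output.splitlines():
        # Drop the explanatory "<--" annotations added in this article.
        line = line.split("<--")[0].strip()
        if line.startswith("Active Partition:"):
            info["active"] = line.split(":", 1)[1].strip()
        elif line.startswith("Backup Partition:"):
            info["backup"] = line.split(":", 1)[1].strip()
        elif line.startswith("Currently booted from:"):
            # e.g. "Currently booted from: backup (da0s2a)"
            m = re.match(r"Currently booted from:\s*(\w+)\s*\(([^)]+)\)", line)
            if m:
                info["booted_role"], info["booted_device"] = m.group(1), m.group(2)
    return info


sample = """fpc0:
Boot Media: internal (da0)
Active Partition: da0s1a
Backup Partition: da0s2a <-- this is the backup slice
Currently booted from: backup (da0s2a) <-- shows that the switch booted from that slice"""

state = parse_storage_partitions(sample)
print(state["booted_role"], state["booted_device"])  # backup da0s2a
```

A booted_role of backup corresponds to the condition that raises the alarm described above.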

OR

show system snapshot media internal

Sample output:

root> show system snapshot media internal
Information for snapshot on internal (/dev/da0s1a) (primary)
Creation date: Feb 24 11:32:07 2012
JUNOS version on snapshot:
jbase : 10.4I20120224_1123_bshekar
jcrypto-ex: 10.4I20120224_1123_bshekar
jdocs-ex: 10.4I20120224_1123_bshekar
jkernel-ex: 10.4I20120224_1123_bshekar
jroute-ex: 10.4I20120224_1123_bshekar
jswitch-ex: 10.4I20120224_1123_bshekar
jweb-ex: 10.4I20120224_1123_bshekar
jpfe-ex42x: 10.4I20120224_1123_bshekar
Information for snapshot on internal (/dev/da0s2a) (backup) <-- information for the slice (partition) the switch booted from, including the date its file system was created
Creation date: Feb 14 05:42:42 2012    <-- if this date is earlier than the alarm date, take a new snapshot (it is a good way to confirm ...)
JUNOS version on snapshot:
jbase : 11.2-20120214.0
jcrypto-ex: 11.2-20120214.0
jdocs-ex: 11.2-20120214.0
jkernel-ex: 11.2-20120214.0
jroute-ex: 11.2-20120214.0
jswitch-ex: 11.2-20120214.0
jweb-ex: 11.2-20120214.0
jpfe-ex42x: 11.2-20120214.0
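
The creation-date check noted in the sample output can be scripted. The following minimal Python sketch, with hypothetical helper names, reads each slice's Creation date and flags every slice whose snapshot predates the boot-from-backup alarm. It assumes the layout of the sample output above and that the creation dates and the alarm time are in the same time zone (the alarm in step 3 below is in UTC, while the snapshot output shows no zone). The alarm time used here comes from a different session than this sample output, so the result is illustrative only.

```python
import re
from datetime import datetime


def snapshot_dates(output: str) -> dict:
    """Map each snapshot device (for example /dev/da0s1a) to its creation date."""
    dates, device = {}, None
    for line in output.splitlines():
        line = line.split("<--")[0].strip()  # ignore article annotations
        m = re.match(r"Information for snapshot on \S+ \(([^)]+)\)", line)
        if m:
            device = m.group(1)
        elif line.startswith("Creation date:") and device:
            text = line.split(":", 1)[1].strip()  # e.g. "Feb 24 11:32:07 2012"
            dates[device] = datetime.strptime(text, "%b %d %H:%M:%S %Y")
    return dates


def needs_snapshot(dates: dict, alarm_time: datetime) -> list:
    """Return devices whose snapshot was created before the alarm (same zone assumed)."""
    return sorted(dev for dev, created in dates.items() if created < alarm_time)


sample = """Information for snapshot on internal (/dev/da0s1a) (primary)
Creation date: Feb 24 11:32:07 2012
JUNOS version on snapshot:
jbase : 10.4I20120224_1123_bshekar
Information for snapshot on internal (/dev/da0s2a) (backup)
Creation date: Feb 14 05:42:42 2012
JUNOS version on snapshot:
jbase : 11.2-20120214.0"""

dates = snapshot_dates(sample)
alarm = datetime(2012, 3, 2, 13, 1, 3)  # alarm time from step 3 below
print(needs_snapshot(dates, alarm))  # ['/dev/da0s1a', '/dev/da0s2a']
```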

To return to the primary partition, use the request system reboot slice alternate media internal command. If you do not use this command, the switch continues to boot from the backup partition, which is now the active partition, on subsequent reboots.

Otherwise, the switch automatically boots from the primary partition again (which then becomes the active partition) only if the backup partition becomes corrupted. Whenever the primary partition becomes corrupted, you receive the alarm described above.
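
The boot-selection behavior described in this section can be summarized as a small state model. The following Python sketch is hypothetical and simplified, not Junos code: it treats slice s1 as the primary root and s2 as the backup root, assumes at least one slice is always healthy, and assumes that the alarm reflects booting from the backup slice (consistent with the alarm persisting after a repair and disappearing after a reboot onto the primary). Its next_boot attribute is the model's notion of the partition used on the next plain reboot; it is not the Active Partition field shown in the CLI output.

```python
from dataclasses import dataclass, field


@dataclass
class DualRootModel:
    """Hypothetical model of resilient dual-root boot selection (not Junos code)."""

    healthy: dict = field(default_factory=lambda: {"s1": True, "s2": True})
    next_boot: str = "s1"    # slice used on the next plain reboot
    booted_from: str = "s1"  # slice the running system came up from

    @staticmethod
    def other(slice_name: str) -> str:
        return "s2" if slice_name == "s1" else "s1"

    def corrupt(self, slice_name: str) -> None:
        self.healthy[slice_name] = False

    def reboot(self, slice_alternate: bool = False) -> None:
        # Models "request system reboot slice alternate": target the other slice.
        target = self.other(self.next_boot) if slice_alternate else self.next_boot
        # Resilient dual-root: fall back to the other slice if the target is corrupt
        # (this sketch assumes at least one slice is healthy).
        if not self.healthy[target]:
            target = self.other(target)
        self.next_boot = self.booted_from = target

    def snapshot_alternate(self) -> None:
        # Models "request system snapshot media internal slice alternate":
        # copy the running root to the slice not booted from; no reboot.
        self.healthy[self.other(self.booted_from)] = True

    @property
    def backup_root_alarm(self) -> bool:
        # Assumption: the alarm is tied to having booted from the backup slice.
        return self.booted_from == "s2"


sw = DualRootModel()
sw.corrupt("s1")
sw.reboot()                      # primary is corrupt, so the switch comes up on s2
sw.reboot()                      # plain reboots keep using s2
sw.snapshot_alternate()          # repairs s1, but the alarm is still raised
sw.reboot(slice_alternate=True)  # boot back onto the primary
print(sw.booted_from, sw.backup_root_alarm)  # s1 False
```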

Note: This alarm is not cleared even after you repair the primary partition. Its purpose is to inform users that the device has booted from the backup partition, so that the administrator can take the necessary action to repair the primary partition.

Step-by-step recovery procedure for this situation:

  1. Copy the Junos image from the backup partition to the primary partition by using the following snapshot command:

    request system snapshot media internal slice alternate

    Note: This step ensures that you have consistent images on both the primary and backup partitions.

  2. The command in step 1 repairs the alternate partition without requiring a reboot. You can verify both partitions by using the following command:

    show system storage partitions

  3. The command used in step 1 only repairs the partition; it does not clear the alarm. You will therefore still see the following alarm:

    root> show system alarms
    2 alarms currently active
    Alarm time Class Description
    2012-03-02 13:01:03 UTC Minor Host 0 Boot from backup root <-- shows date stamp of alarm

  4. To clear the above alarm, use the following command so that the switch boots from the primary partition:

    request system reboot slice alternate media internal

    After this command is executed, the system reboots from the primary partition, and the alarm and the warning banner are no longer displayed.

  5. Verify the Junos image installed on each slice by using the following commands:

    user@switch> show system snapshot media internal slice 1
    user@switch> show system snapshot media internal slice 2
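
The decision points in the procedure above can be condensed into a helper that suggests the next commands to run. The following is a hypothetical Python sketch, not a Juniper tool: its inputs are assumed to come from parsing the show command outputs, the "snapshot older than alarm" rule follows the annotation on the sample snapshot output, and both times are assumed to be in the same time zone.

```python
from datetime import datetime
from typing import List, Optional


def next_recovery_step(booted_from_backup: bool,
                       alarm_time: Optional[datetime],
                       primary_snapshot_created: datetime) -> List[str]:
    """Suggest the next CLI commands from the step-by-step procedure (hypothetical)."""
    if not booted_from_backup:
        # Step 5: running from the primary again, so verify each slice.
        return ["show system snapshot media internal slice 1",
                "show system snapshot media internal slice 2"]
    if alarm_time is not None and primary_snapshot_created < alarm_time:
        # Steps 1-2: primary not yet re-copied since the boot-from-backup event.
        return ["request system snapshot media internal slice alternate",
                "show system storage partitions"]
    # Step 4: primary already repaired; reboot onto it to clear the alarm.
    return ["request system reboot slice alternate media internal"]


alarm = datetime(2012, 3, 2, 13, 1, 3)
print(next_recovery_step(True, alarm, datetime(2012, 2, 24, 11, 32, 7))[0])
# request system snapshot media internal slice alternate
```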