In my last post I promised to write some more on the details of patching, using patch set update 4 (i.e. the January 2012 patch set update 13113092) as an example. So let’s get started on patching the infrastructure by looking at the updates for the InfiniBand gateway switches.
I will demonstrate that these switches can be upgraded in a rolling fashion, without interrupting network services (apart from a few seconds) and while keeping the Exalogic online!
The first thing to note is that infrastructure patching is done as user root, not weblogic. After unzipping the el_infrastructure_10022.zip file (see my previous post on patching) we find the following:
[root@xxxxcn1 ~]# cd /u01/common/patches/todo/13113092/Infrastructure/184.108.40.206.2/
[root@xxxxcn1 220.127.116.11.2]$ ls
BaseImage NM2-36p NM2-GW one-command README.html README.txt
The first thing to do is some careful preparation: read the provided README.html file thoroughly and check My Oracle Support (MOS) for additional information such as upgrade advisors, e.g. the “Exalogic January 2012 PSU Infrastructure Upgrade Guide [ID 1392684.1]”. Be sure to also check the “known issues” document for your PSU.
Then we check the installed versions to see whether a component update is needed at all: since the patch set is cumulative, some of the updates may already have been applied earlier.
The README.html file for the infrastructure part says:
If you are running either v18.104.22.168.0 or v22.214.171.124.0 of Exalogic Infrastructure, you must apply all the infrastructure patches/upgrades included in this PSU in the following order:
a. InfiniBand Gateway Switch (NM2-GW)
b. InfiniBand Switch 36 (NM2-36p)
c. ZFS Storage Appliance (ZFS_Storage_7320)
ii. ILOM on the storage head
d. Base Image v126.96.36.199.2 (rolling update, node at a time)
2. Exalogic Configuration Utility (ECU, previously called one-command)
In summary, the order of patching is: first the network switches, then the storage appliance, then the OS on the compute nodes. Since we have a quarter rack configuration, there is no NM2-36p switch installed, so we don’t have to update it. We only have to update the two NM2-GW switches in our rack.
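The ordering rule above is simple enough to express in a few lines of code. The following Python sketch is only an illustration under assumptions: the component labels follow the README list, while `README_ORDER`, `patch_plan` and the quarter-rack inventory are hypothetical names made up for this example.

```python
# Hypothetical sketch: derive the patch sequence from the README order,
# skipping components that are not installed in a given rack.

# Order as listed in the PSU README (infrastructure first, then the ECU).
README_ORDER = [
    "NM2-GW",            # InfiniBand Gateway Switch
    "NM2-36p",           # InfiniBand Switch 36
    "ZFS_Storage_7320",  # ZFS Storage Appliance (incl. ILOM on the storage head)
    "BaseImage",         # rolling update, one compute node at a time
    "ECU",               # Exalogic Configuration Utility (formerly one-command)
]


def patch_plan(installed):
    """Return the installed components in the order they must be patched."""
    return [component for component in README_ORDER if component in installed]


if __name__ == "__main__":
    # Assumed inventory of a quarter rack: no NM2-36p switch present.
    quarter_rack = {"NM2-GW", "ZFS_Storage_7320", "BaseImage", "ECU"}
    print(patch_plan(quarter_rack))
    # ['NM2-GW', 'ZFS_Storage_7320', 'BaseImage', 'ECU']
```

Filtering against the fixed README list keeps the relative order intact, which is why the missing NM2-36p switch can simply be skipped.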
Now, we first check the current software versions for the IB gateway switches. The README says the following:
This section contains instructions on upgrading NM2-GW InfiniBand Gateway switches in an Exalogic rack from version 1.1.2-3 (factory default on Exalogic X2-2 racks shipped with either v188.8.131.52.0 or v184.108.40.206.0 of the Exalogic Base Image) to version 1.3.2-1.
After logging in as root, we can use the version command to check the software version:
[root@xxxxgw1 ~]# version
SUN DCS gw version: 1.3.2-1
Build time: Feb 17 2011 10:02:40
FPGA version: 0x33
SP board info:
Manufacturing Date: 2010.12.30
Serial Number: "NCD600077"
Hardware Revision: 0x0006
Firmware Revision: 0x0000
BIOS version: SUN0R100
BIOS date: 06/22/2010
[root@xxxxgw2 ~]# version
SUN DCS gw version: 1.3.2-1
Build time: Feb 17 2011 10:02:40
FPGA version: 0x33
SP board info:
Manufacturing Date: 2010.12.31
Serial Number: "NCD600233"
Hardware Revision: 0x0006
Firmware Revision: 0x0000
BIOS version: SUN0R100
BIOS date: 06/22/2010
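Because the PSU is cumulative, deciding whether to apply the switch update comes down to comparing the installed version with the required one. A minimal Python sketch, assuming the `version` output format shown above and version strings of the form `X.Y.Z-N`; `parse_gw_version` and `needs_upgrade` are hypothetical helpers, not part of the switch software:

```python
import re

# Hypothetical helpers: parse the "SUN DCS gw version" line printed by the
# gateway switch `version` command and decide whether an upgrade is needed.
VERSION_RE = re.compile(r"SUN DCS gw version:\s*(\d+)\.(\d+)\.(\d+)-(\d+)")


def parse_gw_version(output):
    """Return the installed version as a tuple, e.g. (1, 3, 2, 1)."""
    match = VERSION_RE.search(output)
    if match is None:
        raise ValueError("no 'SUN DCS gw version' line found")
    return tuple(int(part) for part in match.groups())


def needs_upgrade(output, required="1.3.2-1"):
    """True if the installed version is older than the required one."""
    release, build = required.split("-")
    wanted = tuple(int(part) for part in release.split(".")) + (int(build),)
    # Tuples compare element by element, so (1, 3, 2, 1) < (2, 0, 4, 1).
    return parse_gw_version(output) < wanted


if __name__ == "__main__":
    gw1 = "SUN DCS gw version: 1.3.2-1\nBuild time: Feb 17 2011 10:02:40"
    print(needs_upgrade(gw1))             # False: already at 1.3.2-1
    print(needs_upgrade(gw1, "2.0.4-1"))  # True: 2.0.4-1 is newer
```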
As it turns out, this particular patch set update is not well suited to demonstrating an InfiniBand gateway switch upgrade in our case, because we had already reached the required patch level (1.3.2-1) when applying the October 2011 patch set update 12825625. Instead, I will use the upgrade to version 220.127.116.11.0 (patch 13795376) as the example here. For this update, the InfiniBand gateway switches have to be upgraded to SUN DCS version 2.0.4-1.
First we have to perform a number of prerequisite checks, which I will not cover here (but which are important to make sure the update goes through flawlessly). Then we upgrade the two gateway switches in a rolling fashion, so that we don’t interrupt network services and users and applications can keep working. We do this by first upgrading the switch that is not currently the master switch, i.e. the one not running the master Subnet Manager. Let’s find out which of the two has this role:
[root@xxxxgw1 ~]# getmaster
Local SM enabled and running
20120117 10:03:08 Master SubnetManager on sm lid 27 sm guid 0x2128be561ac0a0 : SUN IB QDR GW switch xxxxgw2
[root@xxxxgw2 ~]# getmaster
Local SM enabled and running
20120117 10:03:20 Master SubnetManager on sm lid 27 sm guid 0x2128be561ac0a0 : SUN IB QDR GW switch xxxxgw2
OK, gateway number 2 (GW02) is the master switch at present. That means we should upgrade the GW01 switch first, have them switch roles and then upgrade GW02 to finish up.
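The same reasoning can be automated by reading the master switch name from the `getmaster` output and upgrading every other gateway first. A Python sketch, assuming the output format shown above; the `MASTER_RE` pattern and the helper names are my own and purely hypothetical:

```python
import re

# Hypothetical helpers: find the switch running the master Subnet Manager in
# `getmaster` output and derive a rolling-upgrade order (master last).
MASTER_RE = re.compile(
    r"Master SubnetManager on sm lid \d+ sm guid \S+\s*:\s*"
    r"SUN IB QDR GW switch (\S+)"
)


def master_switch(getmaster_output):
    """Return the host name of the current master switch."""
    match = MASTER_RE.search(getmaster_output)
    if match is None:
        raise ValueError("no 'Master SubnetManager' line found")
    return match.group(1)


def upgrade_order(switches, getmaster_output):
    """Non-master switches first, the current master last."""
    master = master_switch(getmaster_output)
    if master not in switches:
        raise ValueError(f"master {master!r} not in {switches}")
    return [s for s in switches if s != master] + [master]


if __name__ == "__main__":
    output = (
        "Local SM enabled and running\n"
        "20120117 10:03:08 Master SubnetManager on sm lid 27 "
        "sm guid 0x2128be561ac0a0 : SUN IB QDR GW switch xxxxgw2"
    )
    print(upgrade_order(["xxxxgw1", "xxxxgw2"], output))
    # ['xxxxgw1', 'xxxxgw2']
```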
The README for the 18.104.22.168.0 upgrade states the following (very similar to the README for the January 2012 PSU, but a little more elaborate). The patch file is loaded via FTP from the Exalogic storage, where we set up an FTP user called patcher for this purpose in advance.
To upgrade the secondary NM2-GW switches, complete the following steps:
1. Switch to the ILOM shell by running the spsh command on the command line:
2. Ensure that you have created the patches share in the ZFS storage appliance, and enabled the FTP service on the share with the permission for root access, as described in the top-level README file, which is included in the upgrade kit.
Load the firmware upgrade package using the command:
-> load -source ftp://root:<root_password>@<storage_host>//<path_to_NM2-GW_fw_upgrade_binaries_on_patches_share>/sundcs_gw_repository_2.0.4_1.pkg
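Since the ILOM `load` command turns out to reject host names (see the error further down), it can help to assemble and validate the URL beforehand. A hypothetical Python helper, assuming an IPv4 storage address and credentials without URL-reserved characters; the IP address and path in the example are placeholders, not values from our rack:

```python
import ipaddress

# Hypothetical helper: build the URL for the ILOM `load -source` command.
# ILOM answered "URL should specify IP Address. Hostname is not supported."
# when given a host name, so the storage address must be a literal IPv4 address.


def ilom_ftp_url(user, password, storage_ip, path):
    """Return ftp://user:password@ip//path, rejecting host names."""
    # Raises ipaddress.AddressValueError (a ValueError) for host names.
    ipaddress.IPv4Address(storage_ip)
    for value in (user, password):
        # These characters would make the user:password@host part ambiguous.
        if any(ch in value for ch in ":@/"):
            raise ValueError("credentials must not contain ':', '@' or '/'")
    # Double slash after the host, as in the README template.
    return f"ftp://{user}:{password}@{storage_ip}//{path.lstrip('/')}"


if __name__ == "__main__":
    # Placeholder address (documentation range) and path, not our real values.
    print(ilom_ftp_url("patcher", "mypassword", "192.0.2.10",
                       "/export/patches/sundcs_gw_repository_2.0.4_1.pkg"))
    # ftp://patcher:mypassword@192.0.2.10//export/patches/sundcs_gw_repository_2.0.4_1.pkg
```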
OK, easy enough, let’s do that:
[root@xxxxgw1 ~]# spsh
Oracle(R) Integrated Lights Out Manager
Version ILOM 3.0 r47111
Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
-> load -source ftp://patcher@xxxxsn-priv//export/common/patches/todo/13795376/Infrastructure/22.214.171.124.0/NM2-GW/2.0.4-1/sundcs_gw_repository_2.0.4_1.pkg
Error: URL should specify IP Address. Hostname is not supported.
Firmware image update failed.
load: Command Failed
Hmm, I guess we should use the IP address of the storage instead of its name. I also found that the password has to be supplied directly in the URL, so we try again, and this time it goes through:
-> load -source ftp://patcher:mypassword@<ZFS storage VIP address>//export/
sundcs_gw_repository_2.0.4_1.pkg
Downloading firmware image. This will take few minutes.
NOTE: Firmware upgrade will upgrade firmware on SUN DCS gw Kontron module, I4 and BridgeX. Upgrade takes few minutes to complete. ILOM will enter a special mode to load new firmware. No other tasks should be performed in ILOM until the firmware upgrade is complete.
Are you sure you want to load the specified file (y/n)? y
Setting up environment for firmware upgrade. This will take few minutes.
Starting SUN DCS gw FW update
==========================
Performing operation: I4 A
==========================
I4 fw upgrade from 7.3.0(INI:1) to 7.4.0(INI:1):
Upgrade started...
Upgrade completed.
INFO: I4 fw upgrade from 7.3.0(INI:1) to 7.4.0(INI:1) succeeded
==========================
Performing operation: BX A
==========================
BX fw upgrade from 8.3.3166(INI:4) to 8.4.2740(INI:5):
Upgrade started...
Upgrade completed.
INFO: BX fw upgrade from 8.3.3166(INI:4) to 8.4.2740(INI:5) succeeded
==========================
Performing operation: BX B
==========================
BX fw upgrade from 8.3.3166(INI:4) to 8.4.2740(INI:5):
Upgrade started...
Upgrade completed.
INFO: BX fw upgrade from 8.3.3166(INI:4) to 8.4.2740(INI:5) succeeded
===========================
Summary of Firmware update
===========================
I4 status : FW UPDATE - SUCCESS
I4 update succeeded on : A
I4 already up-to-date on : none
I4 update failed on : none
BX status : FW UPDATE - SUCCESS
BX update succeeded on : A, B
BX already up-to-date on : none
BX update failed on : none
=========================================
Performing operation: SUN DCS gw firmware update
=========================================
SUN DCS gw Kontron module fw upgrade from 1.3.2-1 to 2.0.4-1:
Please reboot the system to enable firmware update of Kontron module. The download of the Kontron firmware image happens during reboot. After system reboot, Kontron FW update progress can be monitored in browser using URL [http://GWsystem] OR at OS command line prompt by using command [telnet GWsystem 1234] where GWsystem is the hostname or IP address of SUN DCS GW.
Firmware update is complete.
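To check the result without reading through the whole log, the summary block can be scanned for anything other than success. A Python sketch, assuming the summary is captured one field per line exactly as printed above; `firmware_problems` is a hypothetical helper:

```python
import re

# Hypothetical helper: scan the "Summary of Firmware update" block printed by
# the ILOM `load` command and list anything that did not succeed.
STATUS_RE = re.compile(r"^(\w+) status\s*:\s*(.+)$", re.MULTILINE)
FAILED_RE = re.compile(r"^(\w+) update failed on\s*:\s*(.+)$", re.MULTILINE)


def firmware_problems(summary):
    """Return a list of problems; an empty list means every update succeeded."""
    problems = []
    for component, status in STATUS_RE.findall(summary):
        if status.strip() != "FW UPDATE - SUCCESS":
            problems.append(f"{component}: {status.strip()}")
    for component, targets in FAILED_RE.findall(summary):
        if targets.strip() != "none":
            problems.append(f"{component} failed on {targets.strip()}")
    return problems


if __name__ == "__main__":
    summary = """I4 status : FW UPDATE - SUCCESS
I4 update succeeded on : A
I4 already up-to-date on : none
I4 update failed on : none
BX status : FW UPDATE - SUCCESS
BX update succeeded on : A, B
BX already up-to-date on : none
BX update failed on : none"""
    print(firmware_problems(summary))  # []
```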
OK, that worked fine. Now exit the service processor shell and reboot the switch:
-> exit
[root@xxxxgw1 ~]# reboot -n
Broadcast message from root (pts/0) (Tue Mar 20 10:55:25 2012):
The system is going down for reboot NOW!
[root@xxxxgw1 ~]# Connection to xxxxgw1.qualogy.com closed by remote host.
Connection to xxxxgw1.qualogy.com closed.
Wait a bit for the GW01 switch to come back up, then log back in to verify the upgrade and check the version:
% ssh root@xxxxgw1.qualogy.com
root@xxxxgw1.qualogy.com's password:
Last login: Tue Mar 20 09:22:49 2012 from 192.168.110.219
FW upgrade completed successfully on Tue Mar 20 11:02:32 CET 2012.
Please run the "fwverify" CLI command to verify the new image.
This message will be cleared on next reboot.
You are now logged in to the root shell.
It is recommended to use ILOM shell instead of root shell.
All usage should be restricted to documented commands and documented config files.
To view the list of documented commands, use "help" at linux prompt.
[root@xxxxgw1 ~]# fwverify
Checking all present packages: .................................................................................................................................................. OK
Checking if any packages are missing: .......................................................................................................................................... OK
Verifying installed files: ............................................................................................................................................ OK
[root@xxxxgw1 ~]# version
SUN DCS gw version: 2.0.4-1
Build time: Oct 17 2011 10:04:07
FPGA version: 0x33
SP board info:
Manufacturing Date: 2010.12.30
Serial Number: "NCD600077"
Hardware Revision: 0x0006
Firmware Revision: 0x0000
BIOS version: SUN0R100
BIOS date: 06/22/2010
OK, done! There’s more checking to do, but I’ll skip it here for both clarity and brevity.
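Part of that checking can be scripted. The following hypothetical Python sketch, assuming the `fwverify` and `version` output formats shown above, treats the upgrade as verified only if every `fwverify` step ends in OK and the expected release is reported; the helper names are made up, and treating any non-OK word as a failure is my own assumption:

```python
import re

# Hypothetical helpers: check `fwverify` and `version` output after the reboot.
# Each fwverify step looks like "Checking ...: ....... OK"; the progress dots
# may wrap over several lines, hence the [.\s]* part of the pattern.
STEP_RE = re.compile(r"((?:Checking|Verifying)[^:]*):[.\s]*(\w+)")


def fwverify_ok(output):
    """True if at least one step was found and every step ended in OK."""
    steps = STEP_RE.findall(output)
    return bool(steps) and all(result == "OK" for _, result in steps)


def upgrade_verified(fwverify_output, version_output, expected="2.0.4-1"):
    """Combine the fwverify result with the reported SUN DCS gw version."""
    return (fwverify_ok(fwverify_output)
            and f"SUN DCS gw version: {expected}" in version_output)


if __name__ == "__main__":
    fwverify = ("Checking all present packages: ......\n...... OK\n"
                "Checking if any packages are missing: ...... OK\n"
                "Verifying installed files: ......\n.... OK\n")
    version = "SUN DCS gw version: 2.0.4-1\nBuild time: Oct 17 2011 10:04:07"
    print(upgrade_verified(fwverify, version))  # True
```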
Now that we have successfully upgraded GW01, we can make it the master switch so that GW02 is freed from network control duty and can be upgraded as well. We do this by temporarily disabling the Subnet Manager on GW02, forcing a switchover:
[root@xxxxgw2 ~]# disablesm
Stopping partitiond daemon. [ OK ]
Stopping IB Subnet Manager..-. [ OK ]
Check on both GW01 and GW02 after waiting a few seconds:
[root@xxxxgw1 ~]# getmaster
Local SM enabled and running
20120320 10:47:30 Master SubnetManager on sm lid 12 sm guid 0x2128be529ac0a0 : SUN IB QDR GW switch xxxxgw1 192.168.110.250
[root@xxxxgw2 ~]# getmaster
Local SM not enabled
20120320 10:47:39 Master SubnetManager on sm lid 12 sm guid 0x2128be529ac0a0 : SUN IB QDR GW switch xxxxgw1 192.168.110.250
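Before touching GW02, it is worth confirming that all gateways agree on the new master and that GW02’s local Subnet Manager is indeed stopped. A Python sketch, assuming the `getmaster` output format shown above; `agreed_master` and `local_sm_running` are hypothetical helpers:

```python
import re

# Hypothetical helpers for the switchover check: every gateway should report
# the same master, and the switch where `disablesm` was run should report its
# local Subnet Manager as not enabled.
MASTER_RE = re.compile(r"Master SubnetManager on .*?SUN IB QDR GW switch (\S+)",
                       re.DOTALL)


def agreed_master(outputs):
    """Return the master name if all getmaster outputs agree, else raise."""
    masters = set()
    for host, text in outputs.items():
        match = MASTER_RE.search(text)
        if match is None:
            raise ValueError(f"{host}: no 'Master SubnetManager' line")
        masters.add(match.group(1))
    if len(masters) != 1:
        raise ValueError(f"switches disagree on the master: {sorted(masters)}")
    return masters.pop()


def local_sm_running(text):
    """True if getmaster reports that the local Subnet Manager is running."""
    return "Local SM enabled and running" in text


if __name__ == "__main__":
    line = ("20120320 10:47:30 Master SubnetManager on sm lid 12 sm guid "
            "0x2128be529ac0a0 : SUN IB QDR GW switch xxxxgw1 192.168.110.250")
    outputs = {
        "xxxxgw1": "Local SM enabled and running\n" + line,
        "xxxxgw2": "Local SM not enabled\n" + line,
    }
    print(agreed_master(outputs))                # xxxxgw1
    print(local_sm_running(outputs["xxxxgw2"]))  # False
```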
So GW01 has now become the master switch, and we can upgrade GW02 in the same way. After completing the upgrade of GW02 and checking the version, we should make sure the Subnet Manager is re-enabled on GW02, so that it can again watch GW01’s back and quickly take over control if the need arises.
[root@xxxxgw2 ~]# enablesm
Starting IB Subnet Manager. [ OK ]
Starting partitiond daemon. [ OK ]
Cool, we have in fact performed a rolling upgrade of the NM2-GW switches, and the Exalogic stayed online while we upgraded them one after the other!
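For reference, the whole sequence for a pair of gateway switches can be written down as a plan. This Python sketch only builds a list of (switch, action) pairs using the commands from this post; it does not connect to any switch, `rolling_upgrade_plan` is a hypothetical name, and `ftp://...` stands in for the full load URL used above:

```python
# Hypothetical sketch: the rolling-upgrade sequence for two gateway switches
# as (switch, action) pairs. It only builds a plan; nothing is executed.


def rolling_upgrade_plan(switches, master):
    """Upgrade the standby first, move the master role, then upgrade the old master."""
    if len(switches) != 2 or master not in switches:
        raise ValueError("expected two switches, one of which is the master")
    standby = next(s for s in switches if s != master)
    return [
        (standby, "spsh, then: load -source ftp://..."),  # firmware upgrade
        (standby, "reboot -n"),
        (standby, "fwverify; version"),                   # check the new image
        (master, "disablesm"),                            # force the switchover
        (standby, "getmaster"),                           # confirm the new master
        (master, "spsh, then: load -source ftp://..."),
        (master, "reboot -n"),
        (master, "fwverify; version"),
        (master, "enablesm"),                             # standby SM back on duty
    ]


if __name__ == "__main__":
    for switch, action in rolling_upgrade_plan(["xxxxgw1", "xxxxgw2"], "xxxxgw2"):
        print(f"{switch}: {action}")
```

Keeping the plan as plain data makes it easy to review before touching the switches; the master role only moves once the standby has been upgraded and verified.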
Note: usually there are some small post-upgrade steps to do, which I will not cover here.
Next time, we will have a look at how the ZFS 7320 storage appliance can be upgraded in a similar fashion, using the same rolling upgrade principle.