Patching Oracle Exalogic – updating Linux on the compute nodes part I

Patching Exalogic part 4a

Before we dive further into the brave new world of virtualization on the Exalogic, I thought I’d finish up my series of posts on patching. My previous posts on this subject detailed upgrading first the network components (Infiniband Gateway Switches, indicated in green) and then the storage component (ZFS 7320 appliance, in blue).

This post will deal with patching the operating system on the modified Sun Fire X4170 M2 servers (in red), dubbed compute nodes in Exalogic terminology. In our case the OS is Oracle Linux.

As before with the storage and network patching, I will demonstrate that patching of the compute nodes can be done in a rolling fashion, maintaining application availability during the upgrade, provided that your application is deployed in a redundant (HA) fashion, for example in a WebLogic cluster spread over more than one physical node.

As an example, I will take the installation of the Exalogic 2.0.0.0.1 patchset (patch 13569004), the quarterly upgrade from April 2012. After unpacking the patch and thoroughly examining the README and the corresponding upgrade advisor document on MOS, we can get to work. First we have to apply the network and storage patches for the infrastructure, as I have described before. With those in place, we can start on the compute nodes, which are the focus of this post.

You can patch one compute node at a time, or multiple nodes in parallel. The patch script facilitates the latter as well, through the use of Exalogic’s Distributed Command Line Interface (DCLI). It’s always prudent to patch one compute node (with some less critical deployments if possible) first as a test case and carefully evaluate the results before moving on. If you are patching in a rolling fashion you would probably not patch more than a few nodes at a time anyway.
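For instance, once the patch bits are staged locally on a group of nodes (see the preparation steps below), you could in principle kick off the patch script on all of them at once via a DCLI group file, in the same style as the version check shown further down. The wrapper script and group file names below are hypothetical, and the README’s own instructions for DCLI-based patching take precedence over this sketch:

[root@xxxxexa01 scripts]# cat run_ebi_patch.scl
cd /opt/baseimage_patch/scripts; ./ebi_patch.sh
[root@xxxxexa01 scripts]# dcli -t -g patch_group.lst -x run_ebi_patch.scl

In this post, however, I will demonstrate the procedure on a single node.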

1. Rolling patch procedure

So, basically the rolling patch procedure should look something like this:

1. Do all preparatory steps that can be done without impacting running application services.

2. Stop application services in one half of your application cluster and make sure users are redirected/failed over to the other half of the cluster.

3. Upgrade the node(s) underlying the first half of your application cluster (the half you just took out of service). Check whether all went well.

4. Restart the application services on the freshly patched node(s) and check that they have rejoined the cluster. Now take the other half of your cluster nodes out of service.

5. Upgrade the other half of your cluster in the same fashion and restart the application services on the second half, etc.

So, let’s try this out for ourselves now…

2. Preinstallation checks

First we check whether we are presently on the correct version of the Exalogic Base Image; the minimum version for a 2.0.0.0.x patchset is of course the 2.0.0.0.0 release. We can check this by running the “imageinfo” command across all eight nodes in our quarter rack configuration. This is most easily done via a simple DCLI script:


[root@xxxxexa01 scripts]# cat check_imageinfo.scl
imageinfo  | head -1


[root@xxxxexa01 scripts]# dcli -t -g allnodes.lst -x check_imageinfo.scl
Target nodes: ['xxxxexacn01', 'xxxxexacn02', 'xxxxexacn03', 'xxxxexacn04',
'xxxxexacn05', 'xxxxexacn06', 'xxxxexacn07', 'xxxxexacn08']
xxxxexacn01: Exalogic 2.0.0.0.0 (build:r213841)
xxxxexacn02: Exalogic 2.0.0.0.0 (build:r213841)
xxxxexacn03: Exalogic 2.0.0.0.0 (build:r213841)
xxxxexacn04: Exalogic 2.0.0.0.0 (build:r213841)
xxxxexacn05: Exalogic 2.0.0.0.0 (build:r213841)
xxxxexacn06: Exalogic 2.0.0.0.0 (build:r213841)
xxxxexacn07: Exalogic 2.0.0.0.0 (build:r213841)
xxxxexacn08: Exalogic 2.0.0.0.0 (build:r213841)

The allnodes.lst file contains the list of nodes we want to check, which in this case is all of them.
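In case you are wondering, such a node list file is simply a flat text file with one compute node hostname per line, along these lines:

[root@xxxxexa01 scripts]# cat allnodes.lst
xxxxexacn01
xxxxexacn02
...
xxxxexacn08

Now that we have verified that all nodes are on the correct base image version, we can proceed with the patching process.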

3. Patch preparation

The README tells us how to put things in place before starting the actual patching process and shutting down the application services on the nodes, thus minimizing downtime for each (set of) node(s):


“Copy Base Image patch content from the patches share location to local disk on the compute node; create local node location if required. For example:

[root@compute-node ~]# cp -R /patches/13569004/Infrastructure/2.0.0.0.1/BaseImage/2.0.0.0.1/* /opt/baseimage_patch
The step given above is required for two reasons:
- The process of patching involves intermediate loss of network connectivity over Infiniband, so it is important that the base image patch bits are available on local disk.
- The patching process creates intermediate files and logs in the current directory, so if done from the shared location, there is potential for conflicts and overwriting of files.”

Step 1. OK, let’s set this up for compute node 8, which we will patch first in our example. We clean up any previous patch files first, just to be sure.

[root@xxxxexa08 ~]# rm -rf /opt/baseimage_patch/*

[root@xxxxexa08 ~]# cp -R /u01/common/patches/todo/13569004/Infrastructure/2.0.0.0.1/BaseImage/2.0.0.0.1/* /opt/baseimage_patch/.

[root@xxxxexa08 ~]# cd /opt/baseimage_patch

Step 2. Now we stop all application services on node 8 and verify that users and processes have failed over to node 7, where the other half of our cluster resides.
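Exactly how you do this depends on your deployment. For a WebLogic cluster it typically boils down to stopping the managed servers running on this node, for example with the standard domain stop script (the domain path, server name, admin URL and OS user below are purely hypothetical):

[oracle@xxxxexa08 ~]$ /u01/domains/mydomain/bin/stopManagedWebLogic.sh managed_server_8 t3://adminvip:7001

Afterwards, verify in the WebLogic console (or at your load balancer) that user sessions have indeed failed over to the instances on node 7.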

Step 3. As a precaution and to save on downtime, we should unmount any filesystems mounted over NFS. We don’t want any stray user processes blocking unmount commands during the reboots later on, which could significantly slow down or even frustrate our patch job in the next step.
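If you want to check up front whether any processes are still holding files open on a particular NFS mount, fuser will tell you (the mount point below is just an example):

[root@xxxxexa08 ~]# fuser -vm /u01/products/Middleware11gPS3

With that out of the way, unmount all NFS filesystems in one go: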


[root@xxxxexa08 baseimage_patch]# umount -avt nfs
mount: trying 192.168.10.30 prog 100005 vers 3 prot tcp port 51606
xxxxexasn-priv:/export/ExalogicDemo1/otd umounted
mount: trying 192.168.10.30 prog 100005 vers 3 prot tcp port 51606
xxxxexasn-priv:/export/ExalogicDemo1/oradata umounted
mount: trying 192.168.10.30 prog 100005 vers 3 prot tcp port 51606
...
...
mount: trying 192.168.10.30 prog 100005 vers 3 prot tcp port 51606
umount: /u01/products/Middleware11gPS3: device is busy
mount: trying 192.168.10.30 prog 100005 vers 3 prot tcp port 51606
xxxxexasn-priv:/export/common/patches umounted
mount: trying 192.168.10.30 prog 100005 vers 3 prot tcp port 51606
xxxxexasn-priv:/export/common/general umounted

Now check whether there are any NFS filesystems left mounted… looks like we might have an issue!


[root@xxxxexa08 baseimage_patch]# mount -lt nfs
xxxxexasn-priv:/export/products/Middleware11gPS3 on
/u01/products/Middleware11gPS3 type nfs
(rw,bg,hard,nointr,rsize=131072,wsize=131072,tcp,nfsvers=3,
addr=192.168.10.30)
xxxxexasn-priv:/export/ACSExalogicSystem/nodemgrs on
/u01/ACSExalogicSystem/nodemgrs type nfs
(rw,bg,hard,nointr,rsize=131072,wsize=131072,tcp,nfsvers=3,
addr=192.168.10.30)

Oops, got it: I forgot to shut down the WebLogic Node Manager on this node! That can be fixed quickly enough.
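How you stop the Node Manager depends on how it was started on your system; if there is no stop script in place, a blunt but effective (purely illustrative) option is to kill the Java process directly:

[root@xxxxexa08 ~]# pkill -f weblogic.NodeManager

With the Node Manager stopped, we retry the unmount: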


[root@xxxxexa08 baseimage_patch]# umount -avt nfs
mount: trying 192.168.10.30 prog 100005 vers 3 prot tcp port 51606
xxxxexasn-priv:/export/ACSExalogicSystem/nodemgrs umounted
mount: trying 192.168.10.30 prog 100005 vers 3 prot tcp port 51606
xxxxexasn-priv:/export/products/Middleware11gPS3 umounted
[root@xxxxexa08 baseimage_patch]# mount -lt nfs

OK, no NFS filesystems left mounted. We could start patching now, but to speed things up a bit more it’s (my personal) good practice to minimize mount/unmount times during the patch process by temporarily stripping unneeded NFS entries out of /etc/fstab. Make sure you make a good backup of the original /etc/fstab file, as you need to restore it after the patch has completed. Also, as a precaution, don’t take out the /u01/common/general entry, as the patch files reside there (even though we made a local copy). I’ve had some problems before when I removed it while patching multiple nodes in parallel, so leave it in.
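A minimal sketch of this fstab trimming could look like the following (illustrative only: the sed expression assumes the NFS entries contain the string “nfs” and that the common/general share is the only one you want to keep; check the result by hand before rebooting):

[root@xxxxexa08 ~]# cp -p /etc/fstab /etc/fstab.pre_patch
[root@xxxxexa08 ~]# sed -i '/nfs/{/common\/general/!s/^/#/}' /etc/fstab
[root@xxxxexa08 ~]# grep nfs /etc/fstab

After the patch has completed, restore the original file from /etc/fstab.pre_patch.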

4. Patch execution

Now that we have a minimal set of entries in our /etc/fstab file, we should have a pretty speedy patch procedure. Since the patch installation involves at least two reboots, it’s handy to follow the proceedings across reboots by logging on to the console via the Integrated Lights Out Manager (ILOM) interface in another session (as is mentioned later on in the README).


JNs-MBP3-QA-2:~ jnwerk$ ssh root@xxxxexacn08-c.qualogy.com
Password:

Oracle(R) Integrated Lights Out Manager

Version 3.0.16.10.a r68533

Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.

-> start /SP/console
Are you sure you want to start /SP/console (y/n)? y
 
Serial console started.  To stop, type ESC (

From this ILOM console session you can follow what goes on through reboots as well. We can now start the patch script.

Step 4. Start the patch script ebi_patch.sh from /opt/baseimage_patch/scripts

[root@xxxxexa08 baseimage_patch]# cd scripts ;  ./ebi_patch.sh
INFO: Wed Jul  4 13:45:27 CEST 2012: Compute Node Image Version Found: 2.0.0.0.0
INFO: Wed Jul  4 13:45:27 CEST 2012: Patch state file not found; creating file
INFO: Wed Jul  4 13:45:27 CEST 2012: Preparing to update kernel...
INFO: Wed Jul  4 13:45:27 CEST 2012: Backing up configuration files
INFO: Wed Jul  4 13:45:27 CEST 2012: Done backing up configuration files
INFO: Wed Jul  4 13:45:27 CEST 2012: Uninstalling infinibus
INFO: Wed Jul  4 13:45:27 CEST 2012: Done Uninstalling infinibus
INFO: Wed Jul  4 13:45:27 CEST 2012: Uninstalling OFED_IOV
warning: /etc/libsdp.conf saved as /etc/libsdp.conf.rpmsave
warning: /etc/infiniband/openib.conf saved as /etc/infiniband/openib.conf.rpmsave
INFO: Wed Jul  4 13:45:41 CEST 2012: Done uninstalling OFED_IOV
INFO: Wed Jul  4 13:45:41 CEST 2012: Uninstalling OFA
INFO: Wed Jul  4 13:45:41 CEST 2012: Done uninstalling OFA
INFO: Wed Jul  4 13:45:41 CEST 2012: Updating kernel
warning: ../OS/OracleLinux_5.6/Kernel/2.6.32-200.21.2.el5uek
/kernel-uek-2.6.32-200.21.2.el5uek.x86_64.rpm: Header V3 DSA signature: NOKEY, key ID 1e5e0159
WARNING: No module ehci-hcd found for kernel 2.6.32-200.21.2.el5uek, continuing anyway
WARNING: No module ohci-hcd found for kernel 2.6.32-200.21.2.el5uek, continuing anyway
WARNING: No module uhci-hcd found for kernel 2.6.32-200.21.2.el5uek, continuing anyway
rmdir: /lib/modules/2.6.32-200.21.1.el5uek/updates/dkms: No such file or directory
INFO: Wed Jul  4 13:45:57 CEST 2012: Done updating kernel
INFO: Wed Jul  4 13:45:57 CEST 2012: Updating grub.conf
INFO: Wed Jul  4 13:45:57 CEST 2012: Done updating grub.conf
INFO: Wed Jul  4 13:45:57 CEST 2012: Kernel update done on compute node.
INFO: Wed Jul  4 13:45:57 CEST 2012: IMPORTANT: REBOOTING NOW.
This script will AUTO-RUN ONCE after reboot.
Broadcast message from root (pts/0) (Wed Jul  4 13:45:57 2012):
The system is going down for reboot NOW!
[root@xxxxexa08 baseimage_patch]# Connection to xxxxexa08 closed by remote host.
Connection to xxxxexa08 closed.

The README says the following about this step:

“Once the script completes execution, the node will reboot with the updated kernel, and will auto-reboot again to apply patches/upgrades to other components on the compute node. Logs will be available in ebi_20001.log and ebi_dcli.log files in the scripts directory.”

Note that in between reboots, there usually is no Infiniband connectivity if the OFED drivers are upgraded.
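Once the node is reachable again you can also inspect the log files mentioned in the README to see how far the process has come, for example:

[root@xxxxexa08 scripts]# tail -f ebi_20001.log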

Next time

In part two of this post we will check if all went OK, finish the patching procedure for this node and complete the rolling upgrade procedure for our Exalogic Compute nodes.

ABOUT THE AUTHOR

Jos Nijhoff is an experienced Application Infrastructure consultant at Qualogy. Currently he plays a key role as technical presales for Qualogy's exclusive Exalogic partnership with Oracle for the Benelux area. Thus he keeps in close contact with Oracle presales and partner services, but maintains an independent view. He gives technical guidance and designs, reviews, manages and updates the application infrastructure before, during and after the rollout of new and existing Oracle (Fusion) Applications & Fusion Middleware implementations. Jos is also familiar with subjects like high availability, disaster recovery scenarios, virtualization, performance analysis, data security, and identity management integration with respect to Oracle applications.

2 Comments on Patching Oracle Exalogic – updating Linux on the compute nodes part I

  1. Tony van Esch wrote:

    Hi Jos,
    just wondering whether live patching with ksplice is an option. Then there would be no need for reboots.

    regards, Tony

    • Jos Nijhoff wrote:

      Hi Tony,

      Thank you, that’s a good question; I’ll pass it on to the Exalogic engineering team. Note that the patch process now also does things like updating the ILOM firmware via ipmiflash, so introducing ksplice might not entirely eliminate reboots. But since Oracle touts ksplice quite a bit, they should indeed introduce it on their engineered systems as well, I agree.

      Regards,
      Jos Nijhoff
