Investigating CVU Failures

25 October 2010

This week we had a interesting problem with a new two-node cluster. Oracle 11.2 includes an embedded version of the Cluster Verification Utility (CVU) which is executed at the end of the interview process. Since the CVU can fix many problems that it encounters without restarting the installer, I have got out of the habit of running the stand-alone CVU prior to an installation.  This time the OUI failed reporting that shared disks were not sharable. In this post I will describe the investigative process that we followed; the process is much more instructive than the outcome. In the following notes I have assumed that the unsharable multipath device was /dev/mapper/data (partition /dev/mapper/datap1). The servers are server14 and server15.

The first step was to reproduce the problem using the standalone version of the CVU. In Oracle 11.2 the runcluvfy.sh script is in the same directory as the  runInstaller script. We ran this as follows

./runcluvfy.sh comp ssa -n server14,server15  -s /dev/mapper/datap1

This utility produced the following output:

Verifying shared storage accessibility
 Checking shared storage accessibility...
 "/dev/mapper/datap1" is not shared
 Shared storage check failed on nodes "server15,server14"
Verification of shared storage accessibility was unsuccessful on all the specified nodes.

We tried appending the -verbose parameter, but this did not produce any more informative output.

The next step was to enable CVU trace. On Linux this can be enabled using the CV_TRACELOC parameter. For example:

mkdir /tmp/cvutrace
export CV_TRACELOC=/tmp/cvutrace

The environment variable specifies that trace should be written to a file called cvutrace.log.0 in the /tmp/cvutrace directory.

The trace output allowed us to identify exactly where the failure was occuring, but not the actual cause. By searching the output around the failure we discovered that it was occuring following a call to exectask.

Exectask is an executable that is copied to the /tmp/CVU_11.2.0.2.0_grid directory. It appears to contain all of the port specific system calls required by the CVU. It takes around 20 options and each option returns an XML document containing output and status/error messages. From the CVU trace we determined that exectask was being called with the following parameters:

/tmp/CVU_11.2.0.2.0_grid/exectask -getstinfo 
    -getdiskinfo /dev/mapper/data%/dev/mapper/datap1

The CVU is calling exectask with the -getstinfo (get storage information) parameter and the -getdiskinfo (get disk information) subparameter.  Note that the format of the disk name is device%partition.

Executed stand-alone, we observed that exectask returned different XML documents for each node.

On the first node:

<CV_VAL>
  <disk>
    <disk_name>/dev/mapper/data</diskname>
    <disksignature>PAFUCC29SZ4028</disk_signature>
    <NUMPARTS>0</NUMPARTS>
    <disk_state>0</disk_state>
    <disk_size>5362849792</disk_size>
    <disk_owner>grid</disk_owner>
    <disk_group>asmadmin</disk_group>
    <disk_permissions>0660</disk_permissions>
  </disk>
</CVVAL>
<CV_VRES>0</CV_VRES>
<CV_LOG>Exectask:getDiskInfo success</CV_LOG>
<CV_ERES>0</CV_ERES>

On the second node:

<CV_VAL>
  <disk>
    <disk_name>/dev/mapper/data</diskname>
    <disksignature>PAFUCC29SZ4064</disk_signature>
    <NUMPARTS>0</NUMPARTS>
    <disk_state>0</disk_state>
    <disk_size>5362849792</disk_size>
    <disk_owner>grid</disk_owner>
    <disk_group>asmadmin</disk_group>
    <disk_permissions>0660</disk_permissions>
  </disk>
</CVVAL>
<CV_VRES>0</CV_VRES>
<CV_LOG>Exectask:getDiskInfo success</CV_LOG>
<CV_ERES>0</CV_ERES>

The output has been reformatted to improve readability. On closer inspection, the only difference is the disk signature field.

We used the multpath utility to determine the device names of the SCSI disks (there are probably lots of ways to achieve this)

[root@server14]# multipath -ll /dev/mapper/data
data {36001438005dedd1b0000300002260000} dm-4 HP,HSV400
[size=5.0G][features=0][hwhandler=0][rw]
\_ round-robin 0 [prio=0][active]
\_ 1:0:0:5 sda 8:64 [actve][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 1:0:1:5 sdf 8:144 [actve][ready]

In the above example the devices are /dev/sda and /dev/sdf. We will use /dev/sda.

We then used the scsi_id utility, first to verify the WWID. The default output is:

[root@server14]# /sbin/scsi_id -g -u -s /block/sda
36001438005dedd1b0000300002260000

The -p option of scsi_id specifies the SCSI field returned in the output. The names are not very user-friendly. The default is 0×83. So the following command is identical to the above:

[root@server14]#/sbin/scsi_id -p 0x83 -g -u -s /block/sda
36001438005dedd1b0000300002260000

In this case we are not interested in the WWID, but in the disk identifier. For an HP EVA SAN this turns out to be derived from the controller ID and is stored in SCSI field 0×80. We can use the following command to obtain this value:

[root@server14]# /sbin/scsi_id -p 0x80 -g -u -s /block/sda
SHV_HSV400_PAFUCC29SZ4028

If we execute the same command on the second node we get the following output:

[root@server15]# /sbin/scsi_id -p 0x80 -g -u -s /block/sda
SHV_HSV400_PAFUCC29SZ4064

So the disk identifiers reported by /sbin/scsi_id are different. If the storage is shared they should be identical.

At this point we have proved that this problem is at storage/operating system level and we can reproduce it without Oracle.

After drawing a diagram of the storage we concluded that the problem was with one of the fabric switches, so we disabled the offender. The scsi_id command now returned the same output for both nodes:

[root@server14]# /sbin/scsi_id -p 0x80 -g -u -s /block/sda
SHV_HSV400_PAFUCC29SZ4028
[root@server15]# /sbin/scsi_id -p 0x80 -g -u -s /block/sda
SHV_HSV400_PAFUCC29SZ4028

We could now retry the CVU with more confidence:

./runcluvfy.sh comp ssa -n server14,server15 -s /dev/mapper/data
Verifying shared storage accessibility
Checking shared storage accessibility...
"/dev/mapper/EOrac1_CRS1p1" shared
Shared storage check was successful on nodes "server15,server14"
Verification of shared storage accessibility was successful

Following the successful execution of the CVU, the Grid Infrastructure installation proceeded without any further problems.


Oracle 11.1 Enterprise Manager Grid Control

21 October 2010

It took us three days, but we finally managed to install the new 11.1 EM Grid Control. To be fair to Oracle once it is installed it works pretty well. However, in my opinion installation is more complex than previous versions.

Our main goal was to verify that, whilst not explicitly supported,  Oracle 11.2.0.2 Grid Infrastructure, Database and ASM was compatible with the latest version of Grid Control which was released around March 2010 (and therefore pre-dates 11.2.0.2 by about six months).

The Oracle 11.1 Grid Control installation is a bit more labour intensive than its predecessors. Previous versions optionally installed a management repository database and a application server; in this version you must pre-create the database and install the application server yourself.

We tested the installation on Linux x86-64. 

The first step is to install a JDK. The only supported JDK is 1.6 version 18.  It took some time to track this down as the current version is 22 so you need to search the archives. Remember that Oracle now owns Java, so the JDK is downloaded from www.oracle.com rather than a Sun site.

The next step is to install a WebLogic server; only 10.3.2 is supported (as the documentation says, not 10.3.1 or 10.3.3). There is no operating system specific version of this product for Linux x86-64; instead download the generic version.

Next is the worst bit; you need to patch the WebLogic server with patch WDJ7 which is ONLY available through My Oracle Support. So if you don’t have a support contract, you cannot fully install Enterprise Manager. Whilst I can understand why Oracle are not keen on unsupported systems, the downside is that EM is going to be less well tested in the user community and ultimately will probably see a lower take-up amongst the customer base as a result.

Even if you have a support contract, you may still have problems with the WebServer patch. It can only be installed using the Smart Update tool (a fancy GUI product for WebLogic which is similar in many ways to OPatch). By default Smart Update will connect to MOS and download the required patches. If you don’t have a direct internet connection this may cause you some issues because althoough Smart Update can install previously downloaded patches, WDJ7 cannot be downloaded directly from MOS. You have two options; the first is to install Smart Update on another server that DOES have a direct internet connection and to download the patch from MOS using the tool; the second is to raise an SR. We did the latter and to be fair we had the patch within an hour. Of course this is a crazy use of Oracle support time; don’t be surprised if MOS incident resolution rates improve dramatically by the time they have sent the WDJ7 patch to 100,000 customers individually….

Next step is to install RDBMS software for the management repository database. We used 11.2.0.1; we were not quite brave enough to try 11.2.0.2 though it would probably have worked. We used DBCA to create a database; there are no restrictions on database name. You can create a RAC database or use Data Guard to provide HA; in the event we did neither as we are using a VMWare virtual machine and the entire server is replicated to another site.  The main complaint here is that the documentation is a bit muddled; it took a while to figure out what was required. For example hidden deep in the documentation it states that the management repository needs the Partitioning Option and therefore you must install Enterprise Edition. Why not say this up front?

Once the database has been created you are ready to install Grid Control. To be fair, now the database and WebLogic are installed first, there is a much better chance of the Grid Control succeeding first time and in fact it did for us. It took around an hour on our VM. If you have installed Oracle 10.2 Enterprise Manager you will remember there were about 30 agents that ran after the root script had been executed; in this release there are only about half a dozen agents though a couple took 10-15 minutes each. Go to lunch or leave it running overnight…

We are using secure http (https); the default port number is 7799 and you need to login as SYSMAN initially

Although an EM agent was installed on the local machine, EM Grid Control could not access the management repository database until we entered and saved its credentials in EM.

Prior to configuring remote targets it is necessary to install the EM Management agent on remote nodes; I will cover this in a separate post. After installing the Management agent, it is still necessary to enter credentials for each database in EM before you can monitor it. The agent automatically registers nodes, ASM instances and listeners.

We could not enter credentials for the standby database until we switched over (using DGMGRL); further investigation is required to know if this is user-error or a feature.

We had already configured the Data Guard Broker. EM discovered this automatically on both the physical and standby databases. Once credentials for both databases had been entered, we could use EM to switchover and switchback without any additional configuration.

So, apart from the WebLogic patch issue, I was quite impressed with 11g EM Grid Control. Don’t expect to install it first time in a couple of hours though.


Linux: Listing users belonging to a specific group

20 October 2010

I have a discovery script that I run on Linux servers to collect configuration files and other information. I use the resulting output to write reports offline.  The script also collects the .bash_profile and ssh configuration files for each Oracle user (which vary from site to site). Up to now I have been customizing the scripts on site, but I have now found a better way to identify Oracle users.

In most environments, all Oracle users (grid, oracle etc) belong to the oinstall operating system group. Therefore all I need is a list of users belonging to this group. I could use awk or perl to extract this information, but I would need to inspect both /etc/passwd and /etc/group which seems a lot of effort.

However, there is a utility called lid (/usr/sbin/lid) in RHEL5 and OEL5 (at least) which can be used for this purpose.

By default lid displays infromation about the groups to which a specific user belongs. It can also display a list of users belonging to a specific group (which is the functionality that I require).

lid takes a couple of options; -g to list users in a specified group and -n to output names only (no UIDs or GIDs).

So I can obtain my list of Oracle users using:

/usr/sbin/lid -g oinstall -n

For example:

[oracle@server14'] $ /usr/sbin/lid -g oinstall -n
grid
oracle
oraclea
oracleb
oraclec

I can use the resulting list to identify files to be collected.


IP Host Names

19 October 2010

I have never seen a definition of an IP host name before, but today I found one in the Oracle Enterprise Manager Grid Control Basic Installation Guide which says:

“According to RFC 952, the following are the assumptions: A”name” (Net, Host, Gateway or Domain name) is a text string up to 24 characters drawn from the alphabet (A-Z), digits (0-9), minus sign (-) and period (.). Note that periods are only allowed when they serve to delimit components of the “domain style names”. No blank or space characters are permitted as part of a name. No distinction is made between upper and lower case. The first character must be an alpha character.”

So, maximum of 24 alphanumeric characters plus hyphens. No underscores or spaces – useful to know when devising naming conventions.


Oracle 11gR2 – Data Guard Broker

18 October 2010

I have installed Data Guard for about 30 customers over the past five or six years, but so far all have chosen to use SQL*Plus to perform standby role management.

This week I have been working on an Oracle 11.2.0.2 single-instance standby configuration which has been an ideal opportunity to become reaquainted with the Data Guard Broker. I have to say I was impressed…

But I probably cheated; prior to configuring the Broker, I spent the best part of a day building and configuring a physical standby database and then testing redo transport, redo apply, switchover and switchback using the traditional SQL*Plus commands. Before I even started to look at the Broker, I had fixed the usual handful of stupid mistakes, mainly typos, in the configuration.

However, given a solid foundation, implementing the Broker was almost trivial. This post assumes that the db_name is PROD and that the db_unique_names are PROD and STBY.  These names follow my customers existing standards; where possible I prefer to use geographic names for the DB_UNIQUE_NAME parameter as I think it is easier to visualize the configuration during role management.

Configure LISTENER.ORA

First, on each node add a static entry to the LISTENER.ORA file. For example:

SID_LIST_LISTENER =
 (SID_LIST =
   (SID_DESC =
     (GLOBAL_DBNAME = PROD_DGMGRL)
     (ORACLE_HOME= /u01/app/oracle/product/11.2.0/dbhome_1)
     (SID_NAME = PROD)
   )
  )

The Data Guard Broker defaults to static service names with the DGMGRL suffix. It is possible to override the default service names, but in this case I chose the lazy option and stuck with the Oracle defaults.

Note that listener.ora will be different on each node, so don’t copy it around.

Configure TNSNAMES.ORA

We also specified entries for both the primary and standby databases in $ORACLE_HOME/network/admin/tnsnames.ora. For example:

PROD =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL=TCP)(HOST=server14.juliandyke.com)(PORT=1521))
   (CONNECT_DATA = 
       (SERVER = DEDICATED)
       (SERVICE_NAME = PROD)
   )
)
STBY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL=TCP)(HOST=server15.juliandyke.com)(PORT=1521))
   (CONNECT_DATA = 
       (SERVER = DEDICATED)
       (SERVICE_NAME = STBY)
   )
)

In this example, the PROD database will run on server14 and the STBY database will run on server15.

Enable Data Guard Broker

Next enable the Data Guard Broker for each database (primary and standby) in the configuration by setting the DG_BROKER_START parameter to TRUE.

ALTER SYSTEM SET dg_broker_start = TRUE;

The Data Guard Broker maintains two copies of its configurations. The locations of these files are specified by the DG_BROKER_CONFIG_FILE1 and DG_BROKER_CONFIG_FILE2 parameters. For a single instance database, the default values are:

  • $ORACLE_HOME/dbs/dr1<db_unique_name>.dat
  • $ORACLE_HOME/dbs/dr2<db_unique_name>.dat

The defaults are probably adequate for a single instance database; in a RAC environment the defaults should be modified to specify a location in a shared file system or ASM diskgroup(s).

Start DGMGRL

On any node in the configuration, start the Data Guard Broker command utility (DGMGRL) and connect as the SYS user.

[oracle@server14]$ dgmgrl
DGMGRL> CONNECT SYS
Password: <Enter Password>

Note that it is not necessary to specify the SYSDBA privilege as this is assumed by DGMGRL. 

Create Data Guard Broker Configuration 

Create a configuration. For example:

DGMGRL> CREATE CONFIGURATION DG1' AS
> PRIMARY DATABASE IS 'PROD'
> CONNECT IDENTIFIER IS 'PROD';
Configuration "DG1" created with primary database "PROD"

You will need to specify a configuration name and the unique name of the primary database.

Add standby database(s)

Add all standby databases to the configuration. In this example we just have one physical standby database:

 DGMGRL> ADD DATABASE ‘STBY’ AS
> CONNECT IDENTIFIER IS ‘STBY’;
Database ‘STBY’ added

Enable the configuration

By default the Data Guard configuration is disabled. Enable the configuration as follows:

DGMGRL> ENABLE CONFIGURATION
Enabled.

Switchover to standby

As we started with a working Data Guard configuration, it should be possible to switchover to the standby database at this stage. Switchover is achieved using a single command e.g.:

DGMGRL> SWITCHOVER TO 'STBY'
Performing switchover NOW, please wait...
New primary database "STBY" is opening...
Operation requires shutdown of instance "PROD" on database "PROD"
Shutting down instance "PROD"...
ORACLE instance shut down.
Operation requires startup of instance "PROD" on database "PROD"
Starting instance "PROD"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "STBY"

Switchover worked first time for us. However, we have experienced problems with the restart of the new standby (old primary) in subsequent tests. As these problems are intermittent we think they may be due to timing issues; the servers have new Intel 6-core processors.

Switchback

Switchback is very similar to switchover.

DGMGRL> SWITCHOVER TO 'PROD'
Performing switchover NOW, please wait...
New primary database "PROD" is opening...
Operation requires shutdown of instance "STBY" on database "STBY"
Shutting down instance "STBY"...
ORACLE instance shut down.
Operation requires startup of instance "STBY" on database "STBY"
Starting instance "STBY"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "PROD"

As with switchover we have seen occasional failures with switchback which again we think are timing-related.

Disabling Configuration

The configuration can be disabled at any time using:

DGMGRL> DISABLE CONFIGURATION
Disabled.

Removing the configuration

The configuration can be removed using:

DGMGRL> REMOVE CONFIGURATION
Removed configuration

We have removed the Data Guard Broker configuration a couple of times to fix problems with underlying parameters; it is possible to edit properties within DGMGRL that update parameters, but we wanted to ensure that the underlying configuration was also correct in case the Broker was subsequently deconfigured.

Checking the Data Guard Configuration

You can check the Data Guard configuration in DGMGRL using the SHOW CONFIGURATION command which has both default and verbose modes. For example:

DGMGRL> SHOW CONFIGURATION
 Configuration - DG1
  Protection Mode: MaxAvailability
  Databases:
    PROD - Primary database
    STBY - Physical standby database
  Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS

The verbose option includes some additional properties:

DGMGRL> SHOW CONFIGURATION VERBOSE
Configuration - DG1 
  Protection Mode: MaxAvailability
  Databases:
    PROD - Primary database
    STBY - Physical standby database   
  Properties:
    FastStartFailoverThreshold      = '30'
    OperationTimeout                = '30'
    FastStartFailoverLagLimit       = '30'
    CommunicationTimeout            = '180'
    FastStartFailoverAutoReinstate  = 'TRUE'
    FastStartFailoverPmyShutdown    = 'TRUE'
    BystandersFollowRoleChange      = 'ALL'
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS

Checking the Primary Database Configuration

The primary database configuration can be checked using the SHOW DATABASE command. This command also has a default and verbose mode:

DGMGRL> SHOW DATABASE 'PROD' 
Database - PROD

  Role:            PRIMARY
  Intended State:  TRANSPORT-ON
  Instance(s):
    PROD
Database Status:
SUCCESS

 The verbose mode includes properties for the database, many of which reflect the underlying initialization parameters:

DGMGRL> SHOW DATABASE VERBOSE 'PROD'
 Database - PROD
   Role:            PRIMARY
   Intended State:  TRANSPORT-ON
  Instance(s):
    PROD
 Properties:
  DGConnectIdentifier            = 'PROD'
  ObserverConnectIdentifier      = ''
  LogXptMode                     = 'SYNC'
  DelayMins                      = '0'
  Binding                        = 'OPTIONAL'
  MaxFailure                     = '0'
  MaxConnections                 = '1'
  ReopenSecs                     = '300'
  NetTimeout                     = '30'
  RedoCompression                = 'DISABLE'
  LogShipping                    = 'ON'
  PreferredApplyInstance         = ''
  ApplyInstanceTimeout           = '0'
  ApplyParallel                  = 'AUTO'
  StandbyFileManagement          = 'AUTO'
  ArchiveLagTarget               = '0'
  LogArchiveMaxProcesses         = '5'
  LogArchiveMinSucceedDest       = '1'
  DbFileNameConvert              = '+DATA2, +DATA1'
  LogFileNameConvert             = '+FRA2, +FRA1'
  FastStartFailoverTarget        = ''
  InconsistentProperties         = '(monitor)'
  InconsistentLogXptProps        = '(monitor)'
  SendQEntries                   = '(monitor)'
  RecvQEntries                   = '(monitor)'
  SidName                        = 'PROD'
  StaticConnectIdentifier        =
  '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.227.27.1)
   (PORT=1521))(CONNECT_DATA=(SERVICE_NAME=PROD_DGMGRL)
  (INSTANCE_NAME=PROD)(SERVER=DEDICATED)))'
  StandbyArchiveLocation         = '+FRA1'
  AlternateLocation              = ''
  LogArchiveTrace                = '0'
  LogArchiveFormat               = '%t_%s_%r.dbf'
  TopWaitEvents                  = '(monitor)'
Database Status:
SUCCESS

Checking the Physical Standby Database Configuration

The physical standby database configuration can also be checked using the SHOW DATABASE command. Again this command has a default and verbose mode:

DGMGRL> SHOW DATABASE 'STBY'
Database - STBY 
  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds
  Real Time Query: OFF
  Instance(s):
    STBY
Database Status:
SUCCESS

I can never remember the names of the Data Guard dynamic performance views so I really like the Transport Lag and Apply Lag fields in the above output.

Again the verbose mode includes properties for the standby database:

DGMGRL> SHOW DATABASE VERBOSE 'STBY'
Database - STBY 
  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   0 seconds
  Apply Lag:       0 seconds
  Real Time Query: OFF
  Instance(s):
    STBY 
Properties:
  DGConnectIdentifier            = 'STBY'
  ObserverConnectIdentifier      = ''
  LogXptMode                     = 'SYNC'
  DelayMins                      = '0'
  Binding                        = 'OPTIONAL'
  MaxFailure                     = '0'
  MaxConnections                 = '1'
  ReopenSecs                     = '300'
  NetTimeout                     = '30'
  RedoCompression                = 'DISABLE'
  LogShipping                    = 'ON'
  PreferredApplyInstance         = ''
  ApplyInstanceTimeout           = '0'
  ApplyParallel                  = 'AUTO'
  StandbyFileManagement          = 'AUTO'
  ArchiveLagTarget               = '0'
  LogArchiveMaxProcesses         = '5'
  LogArchiveMinSucceedDest       = '1'
  DbFileNameConvert              = '+DATA1, +DATA2'
  LogFileNameConvert             = '+FRA1, +FRA2'
  FastStartFailoverTarget        = ''
  InconsistentProperties         = '(monitor)'
  InconsistentLogXptProps        = '(monitor)'
  SendQEntries                   = '(monitor)
  RecvQEntries                   = '(monitor)'
  SidName                        = 'PROD'
  StaticConnectIdentifier        =
  '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=10.221.27.2)
  (PORT=1521))(CONNECT_DATA=(SERVICE_NAME=STBY_DGMGRL)
  (INSTANCE_NAME=STBY)(SERVER=DEDICATED)))'
  StandbyArchiveLocation         = '+FRA2'
  AlternateLocation              = ''
  LogArchiveTrace                = '0'
  LogArchiveFormat               = '%t_%s_%r.dbf'
  TopWaitEvents                  = '(monitor)'
 Database Status:
SUCCESS

V$SQL_FEATURE

16 October 2010

Continuing my investigations into dynamic performance views that are new in Oracle 11.2.0.1, this week I have been looking at V$SQL_FEATURE. This view contains a list of 497 optimizer features.

V$SQL_FEATURE contains three columns; SQL_FEATURE which is the name of the feature, DESCRIPTION which contains a short description and PROPERTY which is only set to a non-zero value for nine features, all of which appear to be query transformations.

On closer inspection, the majority of features listed in V$SQL_FEATURE are actually bug fixes. In Oracle 11.2.0.1 a total of 407 fixes are listed. This leaves us with 90 actual features.

V$SQL_FEATURE has two related dynamic performance views:

  • V$SQL_FEATURE_DEPENDENCY lists dependencies between features. This view contains two columns, SQL_FEATURE (the name of the feature) and DEPEND_ON (the name of the feature on which it depends). Only one feature (QKSFM_OR_EXPAND) is listed as having a dependency (QKSFM_JPPD). I suspect this view has not been fully implemented in Oracle 11.2.0.1
  • V$SQL_FEATURE_HIERARCHY lists the hierarchy of features.  This view contains two columns, SQL_FEATURE (the name of the feature) and PARENT_ID (the name of any parent). In Oracle 11.2.0.1 this view contains 501 rows. The PARENT_ID is NULL for 414 rows including all of the bug fixes. Surprisingly four of the features have two parents (QKSFM_UNNEST, QKSFM_JPPD, QKSFM_CVM and QKSFM_CBQT), the remaining 83 features just have one parent.

In summary these are not the most useful set of dynamic performance views for DBAs, but they will definitely be of interest to students and researchers of the optimizer.

A list of the optimizer features reported by V$SQL_FEATURE in Oracle 11.2.0.1 (excluding the bug fixes) is available on my website here


Oracle 11gR2 – Active Data Guard

14 October 2010

Active Data Guard allows a standby database to be opened for read-only access whilst redo is still being applied. For some applications Active Data Guard can represent a more efficient use of Oracle licenses on the standby database.  However, this benefit is offset to a certain extent by the fact that Active Data Guard is available on Enterprise Edition only and is cost option which must be licensed on both the primary and standby database.

Several of my customers are currently using Active Data Guard; in general they are very happy with it. A few others have discovered that it is very easy to inadvertently enable Active Data Guard. This is not desirable or advisable as Oracle have instigated licence audits with a large number of UK customers over the past couple of years.

To determine whether a standby database is using Active Data Guard use the following query:

SELECT database_role, open_mode FROM v$database;

For example:

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY READ ONLY WITH APPLY

If you start a database in SQL*Plus using the STARTUP command and then invoke managed recovery, the Active Data Guard will be enabled. For example:

[oracle@server14]$ sqlplus / as sysdba
SQL> STARTUP
ORACLE instance started.
Total System Global Area 6497189888 bytes
Fixed Size 2238672 bytes
Variable Size 3372222256 bytes
Database Buffers 3103784960 bytes
Redo Buffers 18944000 bytes
Database mounted
Database opened

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY READ ONLY

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
USING CURRENT LOGFILE
WITH SESSION SHUTDOWN;

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY READ ONLY WITH APPLY

However, if the database is started in SQL*Plus using the STARTUP MOUNT command and then managed recovery is invoked, Active Data Guard will not be enabled.

[oracle@server14]$ sqlplus / as sysdba
SQL> STARTUP MOUNT
ORACLE instance started.
Total System Global Area 6497189888 bytes
Fixed Size 2238672 bytes
Variable Size 3372222256 bytes
Database Buffers 3103784960 bytes
Redo Buffers 18944000 bytes
Database mounted
Database opened

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY MOUNTED

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
USING CURRENT LOGFILE
WITH SESSION SHUTDOWN;

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY MOUNTED

In the database has been started in SQL*Plus using STARTUP MOUNT and the database is subsequently opened read only, then invoking managed recovery will enable Active Data Guard. For example:

[oracle@server14]$ sqlplus / as sysdba
SQL> STARTUP MOUNT
ORACLE instance started.
Total System Global Area 6497189888 bytes
Fixed Size 2238672 bytes
Variable Size 3372222256 bytes
Database Buffers 3103784960 bytes
Redo Buffers 18944000 bytes
Database mounted
Database opened

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY MOUNTED

SQL> ALTER DATABASE OPEN READ ONLY;

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY READ ONLY

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
USING CURRENT LOGFILE
WITH SESSION SHUTDOWN;

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY READ ONLY WITH APPLY

Of course not all databases are started using SQL*Plus.

If you start the database using SRVCTL then the default open mode can be specified in the OCR.

You can check the default open mode for a database using SRVCTL CONFIG DATABASE. For example if the database is called PROD:

[oracle@server14]$ srvctl config database -d PROD
Database unique name: PROD
Database name: PROD
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile: +DATA1/PROD/spfilePROD.ora
Domain:
Start options: open
Stop options: immediate
Database role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
Server pools: PROD
Disk Groups: DATA1, FRA1
Mount point paths:
Services:
Type: SINGLE
Database is administrator managed

In the above example, if the PROD database is started using SRVCTL then the database will be opened in read-only mode. For example:

[oracle@server14]$ srvctl start database -d PROD
[oracle@server14]$ sqlplus / as sysdba

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY READ ONLY

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
USING CURRENT LOGFILE
WITH SESSION SHUTDOWN;

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY READ ONLY WITH APPLY

The default start mode can be modified in the OCR using the SRVCTL MODIFY DATABASE command.

For example:

[oracle@server14]$ srvctl modify database -d PROD -s mount

The database configuration is updated as follows:

[oracle@server14]$ srvctl config database -d PROD
Database unique name: PROD
Database name: PROD
Oracle home: /u01/app/oracle/product/11.2.0/dbhome_1
Oracle user: oracle
Spfile: +DATA1/PROD/spfilePROD.ora
Domain:
Start options: mount
Stop options: immediate
Database role: PHYSICAL_STANDBY
Management policy: AUTOMATIC
Server pools: PROD
Disk Groups: DATA1, FRA1
Mount point paths:
Services:
Type: SINGLE
Database is administrator managed

When the default start mode is set to mount, Active Data Guard will not be enabled when managed recovery is invoked. For example:

[oracle@server14]$ srvctl start database -d PROD
[oracle@server14]$ sqlplus / as sysdba

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY MOUNTED

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
USING CURRENT LOGFILE
WITH SESSION SHUTDOWN;

SQL> SELECT database_role, open_mode FROM v$database;

DATABASE_ROLE    OPEN_MODE
---------------- --------------------
PHYSICAL STANDBY MOUNTED

You can also specify the start mode as a parameter to the SRVCTL START DATABASE command

For example:

[oracle@server14] srvctl start database -d PROD -o open
[oracle@server14] srvctl start database -d PROD -o mount

Take care when performing a switchover or switchback that the OCR is updated as part of the procedure.


Configuring an X environment for Grid Infrastructure / RAC installation using VNC

12 October 2010

The Oracle Universal Installer requires an X environment to run on Linux servers. Several products are available to support this environment; for the last couple of years I have been using VNC.  In Oracle 11.2. and above the configuration of VNC is slightly more complex if you follow the recommendation to install Grid Infrastructure under a different operating system user (grid) to the Oracle RAC software (typically oracle). 

VNC is a free client/server product in which a VNC server session runs on the server and a VNC Viewer runs on the client.

 On the server, VNC is supplied with the operating system in Red Hat and Oracle Enterprise Linux distributions. For OEL5R5 the VNC server RPM is vnc-server-4.1.2-14.el5_3.1. In Linux VNC is implemented as a service. You can check if it is currently installed using:

[root@server14]# rpm -q vnc-server

If the package is not installed then install it from the Linux distribution. For example:

[root@server14]# rpm -ivh vnc-server-4.1.2-14.el5_3.1.rpm

Check if the vncserver service is currently running. For example:

[root@server14]# service vncserver status
Xvnc is stopped

If the vncserver service is not currently running then it can be started using:

[root@server14]# service vncserver start

If you receive the following message then configuration of VNC on the server is required.

Starting VNC server: no displays configured     [ OK ]

To configure VNC on the server, login as root and add the following lines to /etc/sysconfig/vncservers

VNCSERVERS="1:grid 2:oracle
VNCSERVERARGS[1]="-geometry 1024x768"
VNCSERVERARGS[2]="-geometry 1024x768"

Set the geometry to reflect the display size available on the client. Note that the server number (in this case 1) must be specified when connecting from the client. For example server14:1

Login as the grid user and run vncpasswd:

[grid@server14]$ vncpasswd
Password: <enter password>
Verify: <enter password again>

Note that the password must be a minimum of six characters in length

Login as the oracle user and run vncpasswd:

[oracle@server14]$ vncpasswd
Password: <enter password>
Verify: <enter password again>

Login as root and start the VNC service:

[root@server14]# service vncserver start

Starting VNC server: 1: grid xauth: creating new authority file /home/grid/.XAuthority

New 'server14.example.com:1 (grid)' desktop is server14.example.com:1

Creating default startup script /home/grid/.vnc/xstartup
Starting applications specified in  /home/grid/.vnc/xstartup
Log file is /home/grid/.vnc/london1.example.com:1.log

Starting VNC server: 2: oracle xauth: creating new authority file /home/oracle/.XAuthority

New 'server14.example.com:2 (oracle)' desktop is server14.example.com:2

Creating default startup script /home/oracle/.vnc/xstartup
Starting applications specified in  /home/oracle/.vnc/xstartup
Log file is /home/oracle/.vnc/server14.example.com:2.log

Edit /home/grid/.vnc/xstartup and uncomment the following two lines:

unset SESSION_MANAGER
exec /etc/X11/xinit/xinitrc

Edit /home/oracle/.vnc/xstartup and uncomment the following two lines:

unset SESSION_MANAGER
exec /etc/X11/xinit/xinitrc

Login as root and restart the vncserver service:

[root@server14]# service vncserver restart

The vncserver service can be permanently enabled using:

[root@server14]# chkconfig vncserver on

The above command will start vncserver at run levels 2,3,4 and 5

For the client, VNC can be downloaded from a number of sources such as http://www.realvnc.com. At the time of this writing the current version was 4.1.3. Versions are available for Windows, Linux and several order UNIX-based operating systems.

For example for Windows based clients VNC can be downloaded as either a ZIP file or an executable file. In this example the zip file was called vnc-4_1_3-x86_win32.zip and contained a single executable file called vnc-4_1_3-x86_win32.exe. This executable runs the VNC Setup Wizard which allows you to install the VNC server, viewer or both.

In this example VNC server has been added to the Windows menus

Start > Programs > RealVNC > VNC Viewer 4 > Run VNC Viewer

Enter the server name when prompted e.g. server14:1 for the grid user or server14:2 for the oracle user. Enter the vnc password for the grid/oracle user.


Upgrading Oracle 11.2.0.1 RAC Clusters to 11.2.0.2 on Linux x86-64

10 October 2010

This post discusses an issue which you may hit if attempting to upgrade an Oracle 11.2.0.1 Linux x86-64 RAC cluster to Oracle 11.2.0.2.

The Grid Infrastructure upgrade must be downloaded from MOS (Patch Number 10098816) This download consists of seven zip files; the Grid Infrastructure file is called:

p10098816_112020_Linux-x86-64_3of7zip

Unzip the file into a staging directory e.g. /home/grid/stage; a subdirectory called grid will be created. Start the upgrade using

[grid@server14] $ cd /home/grid/stage/grid
[grid@server14] $ ./runInstaller

The OUI should detect that Grid Infrastructure has already been installed. The upgrade interview is very similar to installation, though obviously there are fewer options as many decisions cannot be changed without a complete reinstallation.

When you reach the Summary page, press Finish to end the interview process and to start the install.  The installer will copy files to the local node, relink them and distribute the relinked executables to the remote node(s). The installer will then prompt you to run the rootupgrade.sh script as the root user.

This is where we hit a problem. When we ran the rootupgrade.sh script on the first node it failed with the following error:

[root@server14]# /u01/app/11.2.0.2/grid/rootupgrade.sh
Running Oracle 11g root script...
The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/11.2.0.2/grid  Enter the full pathname of the local bin directory:
[/usr/local/bin]:
The contents of "dbhome" have not changed.
No need to overwrite.
The file "oraenv" already exists in /usr/local/bin. 
Overwrite it? (y/n) [n]:
The file "coraenv" already exists in /usr/local/bin. 
Overwrite it? (y/n) [n]:  
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file:
/u01/app/11.2.0.2/grid/crs/install/crsconfig_params
Creating trace directory
Failed to add (property/value):('OLD_OCR_ID/'-1') for
checkpoint:ROOTCRS_OLDHOMEINFO.Error code is 256
The fixes for bug 9413827 are not present in the 11.2.0.1
crs home
Apply the patches for these bugs in the 11.2.0.1 crs home
and then run rootupgrade.sh
/u01/app/11.2.0.2/grid/perl/bin/perl
  -I/u01/app/11.2.0.2/grid/perl/lib
  -I/u01/app/11.2.0.2/grid/crs/install
  /u01/app/11.2.0.2/grid/crs/install/rootcrs.pl execution failed

Although the script is requesting fixes for 9413827, searches for this bug on MOS reveal that it is actually included in bug 9655006 which turns out to be the 11.2.0.1.2 PSU (not 11.2.0.1.1 – thanks Jason) which must be downloaded from MOS. The download file is

p9655006_112010_Linux-x86-64.zip

This zip file actually contains two patches, 9655006 and 9654983. We learned through bitter experience that the file must be unzipped into an EMPTY directory. PSUs are normally installed using the new opatch auto option which attempts to install all patches it encounters in the specified directory – if it sees any other directories it will assume they are patches and may fail with a fatal error if they are not.

For example if the PSU has been unzipped into the directory /home/grid/PSU, then the command to install the PSU might be:

/u01/app/11.2.0/grid/OPatch/opatch auto /home/grid/stage
  -oh /u01/app/11.2.0/grid

where the -oh parameter specifies the Oracle home for the Grid Infrastructure (in this case /u01/app/11.2.0/grid)

Prior to installing the PSU, check that you have the correct version of opatch (to include the auto option) which should be 11.2.0.1.3 or ablve. If you currently have Oracle 11.2.0.1, then opatch may be out of date.
Check using:

[grid@server14] $cd /u01/app/11.2.0/grid/OPatch
[grid@server14]$ opatch version
Invoking OPatch 11.1.0.6.6
OPatch Version: 11.1.0.6.6
OPatch succeeded

Download opatch from MOS. We used patch 6880880 and downloaded the following file:

p6880880_112000_Linux-x86-64.zip

We downloaded the file into the Grid Infrastructure home. When you unzip this file it overwrites the files in the existing OPatch directory. After unzipping the file the opatch utillity should have been upgrading. You can verify this using:

[grid@server14] $cd /u01/app/11.2.0/grid/OPatch
[grid@server14]$ opatch version
Invoking OPatch 11.2.0.1.3
OPatch Version: 11.2.0.1.3
OPatch succeeded

As discussed in other posts, we subsequently hit further issues when we tried to execute the rootupgrade.sh script on the second node.  These issues still being investigated at the time of writing appear to be connected with the new HAIP feature introduced in the Oracle 11.2.0.2 patch set.


Oracle 11.2.0.2 Patch Set

10 October 2010

The Oracle 11.2.0.2 patch set was released in mid-September 2010, just before Oracle OpenWorld 2010 in San Francisco.

This patch set has some significant differences to its predecessors. Most importantly Oracle 11.2.0.2 is a full installation of the Oracle software. In the past patch sets have contained a set of files that have replaced existing files in an existing Oracle home. This patch set is a full installation that replaces an entire existing installation. A consequence of this change is that Oracle 11.2.0.2 can be installed directly on a new server; it is no longer necessary to install the base release (Oracle 11.2.0.1) and then to apply the patch set.

Unlike previous versions there is new functionality in this release. Oracle Corporation is claiming to 34 new features in the following product areas:

  • General improvements
  • ASM Cluster File System (ACFS)
  • Quality of Service management
  • Database Replay
  • Management

It is also worth noting that the documentation has been updated for this release. It is always worth checking the latest documentation, even if you are using older versions as it often contains corrections and additions. The latest documentation can always be found at http://tahiti.oracle.com. Worth checking out is the New Features Guide (E17128-03) and the What’s New section at the front of the remaining manuals.

This article covers upgrade from Oracle 11.2.0.1 to Oracle 11.2.0.2 of a two-node Linux x86-64 cluster at a customer site. The database was not yet in production but we wanted to upgrade the Oracle software prior to user acceptance testing.

Oracle 11.2.0.1 is available in the download area of http://www.oracle.com. However, Oracle 11.2.0.2 can only be downloaded from My Oracle Support so you will need a CSI number to obtain it. The patch number is 10098816. Additional information about the upgrade can be found in Note 1189783.1 – Important Changes to Oracle Database Patch Sets Starting with 11.2.0.2. Also worth reviewing is Note 1179474.1 – Known Issues specific to the 11.2.0.2 Patch Set.

At the time of writing (early October 2010), Oracle 11.2.0.2 was available for:

  • Linux x86 32-bit
  • Linux x86-64 64-bit
  • Solaris SPARC 64-bit
  • Solaris x86-64 64bit
  • Solaris SPARC 32-bit (client-only)
  • Solaris x86 32-bit (client-only)

 Oracle 11.2.0.2 patch sets for AIX, HP/UX and Windows are still awaited.

The Linux x86-64 patch set consists of seven files:

p10098816_112020_Linux-x86-64_1of7zip 1247MB Database 1 of 2
p10098816_112020_Linux-x86-64_2of7zip 1001MB Database 2 of 2
p10098816_112020_Linux-x86-64_3of7zip  815MB Grid Infrastructure
p10098816_112020_Linux-x86-64_4of7zip  612MB Client
p10098816_112020_Linux-x86-64_5of7zip  563MB Gateways
p10098816_112020_Linux-x86-64_6of7zip  458MB Examples
p10098816_112020_Linux-x86-64_7of7zip  105MB Deinstaller

Oracle Grid Infrastructure is an out of place upgrade. This means that a new Oracle home will be created by the OUI for the Oracle Grid Infrastructure binaries. At the end of the upgrade you will be prompted to run the rootupgrade.sh script as the root user on each node in the cluster. The rootupgrade.sh script copies all configuration files etc from the old home to the new home.

 The Oracle RAC/RDBMS software can be upgraded either out-of-place or in-place. Oracle recommends that an out-of-place upgrade is performed.

The first step in the Grid Infrastructure OUI installation is now the Download Software Updates page. This screen presents three options to download recommended patches including patchset updates (PSUs) and one-offs 

  • Use My Oracle Support credentials for download (a username and password is required)
  • Use pre-downloaded software updates
  • Skip software updates

If you wish to download these updates, you may need to configure a proxy internet connection for the server or to download them to another server and copy them across your internal network.


Follow

Get every new post delivered to your Inbox.

Join 32 other followers