 
You may need to perform a storage cell rescue in the following situations:

  • Improper Battery Replacement
  • Improper Card Seating
  • Card Damage During Battery Replacement
  • Corrupted Root File System

In this article we will demonstrate the step-by-step process to rescue an Exadata storage cell.
 
Open a browser and enter the ILOM hostname or IP address of the Storage cell you want to rescue
https://dm01cel02-ilom.netsoftmate.com
 
Enter the root credentials

 
On the left pane under “Remote Control”, click “Redirection”. Select “Use video redirection” and click “Launch Remote Console” button

 
Click OK
 
 Click OK

 
Click Continue

 
Click Run

 
Click Continue (not recommended)

 
The ILOM video console shows that the root file system cannot be mounted because of corruption, and that the server will reboot again in 60 seconds

 
On the left pane under “Host Management” click on “Power Control”. From the drop down list Select “Power Cycle”

 
Click Save

 
Click OK

 
Rebooting in progress

 
Server is now rebooting

 
 
Immediately press Ctrl+S on keyboard 

 
Select the “CELL_USB_BOOT_CELLBOOT_usb_in_rescue_mode” option

 
At this point, we have to continue the rescue process using the ILOM serial console

 
As root, ssh to the storage cell ILOM and start the serial console

 
Enter r and hit return

 
Enter y and hit return

 
Enter the rescue password sos1exadata. Enter n and hit return

 
Enter the root user password 

 
We are now in rescue mode. At this point, check that there are no file system issues and fix any other problems you find. Consult Oracle Support if required
 
Reboot the server again to complete the rescue process

 
Hit return

 
The server is powered off

 
Power on the server using web ILOM as shown below

 
Rescue process is completed and we got the root login prompt

 
 
Login to the server as root user and perform the post rescue steps

  
Verify the image version of the storage cell

 
 
Post Storage Cell Rescue steps:
 
[root@dm01cel02 ~]# imageinfo

Kernel version: 4.1.12-94.8.4.el6uek.x86_64 #2 SMP Sat May 5 16:14:51 PDT 2018 x86_64
Cell version: OSS_18.1.7.0.0AUG_LINUX.X64_180821
Cell rpm version: cell-18.1.7.0.0_LINUX.X64_180821-1.x86_64

Active image version: 18.1.7.0.0.180821
Active image kernel version: 4.1.12-94.8.4.el6uek
Active image activated: 2019-03-17 03:27:41 -0500
Active image status: success
Active system partition on device: /dev/md5
Active software partition on device: /dev/md7

Cell boot usb partition: /dev/sdm1
Cell boot usb version: 18.1.7.0.0.180821

Inactive image version: undefined
Rollback to the inactive partitions: Impossible
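The image check above can also be scripted. The following is a minimal sketch, assuming the imageinfo output keeps the "Active image status:" line shown above; the function name active_image_status is a hypothetical helper, not part of the Exadata tooling.

```shell
#!/bin/sh
# Hypothetical helper: reads imageinfo output on stdin and prints
# the value of the "Active image status" line (for example "success").
active_image_status() {
  awk -F': *' '/^Active image status:/ { print $2 }'
}

# Example usage on the storage cell (assumes imageinfo is in PATH):
#   imageinfo | active_image_status
```

Anything other than "success" is a reason to stop and investigate before touching the cell disks.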


CellCLI> import celldisk all force
No cell disks qualified for this import operation

CellCLI> list physicaldisk
         12:0            PST0XV          normal
         12:1            PZNDSV          normal
         12:2            PT5Z4V          normal
         12:3            PU3XLV          normal
         12:4            PYAKLV          normal
         12:5            PV828V          normal
         12:6            PZE5NV          normal
         12:7            PYV0YV          normal
         12:8            PZKUXV          normal
         12:9            PYD86V          normal
         12:10           PZL15V          normal
         12:11           PZPLAV          normal
         FLASH_1_1       S2T7NCAHA00958  normal
         FLASH_2_1       S2T7NCAHA00986  normal
         FLASH_4_1       S2T7NCAHA00956  normal
         FLASH_5_1       S2T7NCAHA00947  normal

CellCLI> list celldisk
         CD_00_dm01cel02        normal
         CD_01_dm01cel02        normal
         CD_02_dm01cel02        normal
         CD_03_dm01cel02        normal
         CD_04_dm01cel02        normal
         CD_05_dm01cel02        normal
         CD_06_dm01cel02        normal
         CD_07_dm01cel02        normal
         CD_08_dm01cel02        normal
         CD_09_dm01cel02        normal
         CD_10_dm01cel02        normal
         CD_11_dm01cel02        normal
         FD_00_dm01cel02        normal
         FD_01_dm01cel02        normal
         FD_02_dm01cel02        normal
         FD_03_dm01cel02        normal

CellCLI> list griddisk
         DATA_DM01_CD_00_dm01cel02     active
         DATA_DM01_CD_01_dm01cel02     active
         DATA_DM01_CD_02_dm01cel02     active
         DATA_DM01_CD_03_dm01cel02     active
         DATA_DM01_CD_04_dm01cel02     active
         DATA_DM01_CD_05_dm01cel02     active
         DATA_DM01_CD_06_dm01cel02     active
         DATA_DM01_CD_07_dm01cel02     active
         DATA_DM01_CD_08_dm01cel02     active
         DATA_DM01_CD_09_dm01cel02     active
         DATA_DM01_CD_10_dm01cel02     active
         DATA_DM01_CD_11_dm01cel02     active
         DBFS_DG_CD_02_dm01cel02       active
         DBFS_DG_CD_03_dm01cel02       active
         DBFS_DG_CD_04_dm01cel02       active
         DBFS_DG_CD_05_dm01cel02       active
         DBFS_DG_CD_06_dm01cel02       active
         DBFS_DG_CD_07_dm01cel02       active
         DBFS_DG_CD_08_dm01cel02       active
         DBFS_DG_CD_09_dm01cel02       active
         DBFS_DG_CD_10_dm01cel02       active
         DBFS_DG_CD_11_dm01cel02       active
         RECO_DM01_CD_00_dm01cel02     active
         RECO_DM01_CD_01_dm01cel02     active
         RECO_DM01_CD_02_dm01cel02     active
         RECO_DM01_CD_03_dm01cel02     active
         RECO_DM01_CD_04_dm01cel02     active
         RECO_DM01_CD_05_dm01cel02     active
         RECO_DM01_CD_06_dm01cel02     active
         RECO_DM01_CD_07_dm01cel02     active
         RECO_DM01_CD_08_dm01cel02     active
         RECO_DM01_CD_09_dm01cel02     active
         RECO_DM01_CD_10_dm01cel02     active
         RECO_DM01_CD_11_dm01cel02     active
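With this many grid disks it is easy to miss one that is not active. The sketch below filters the list griddisk output for any disk whose status column is not "active". The function name inactive_griddisks is a hypothetical helper, and the two-column layout is assumed to match the output shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads "cellcli -e list griddisk" output on stdin
# and prints the name of every grid disk whose status is not "active".
inactive_griddisks() {
  awk 'NF >= 2 && $2 != "active" { print $1 }'
}

# Example usage on the storage cell:
#   cellcli -e list griddisk | inactive_griddisks
```

No output means every grid disk on the cell is active.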


[root@dm01cel02 ~]# cellcli -e list flashcache detail
         name:                   dm01cel02_FLASHCACHE
         cellDisk:               FD_03_dm01cel02,FD_01_dm01cel02,FD_02_dm01cel02,FD_00_dm01cel02
         creationTime:           2019-03-17T03:19:43-05:00
         degradedCelldisks:
         effectiveCacheSize:     11.64312744140625T
         id:                     574c3bd1-7a35-42ba-a03b-75f3a93edac7
         size:                   11.64312744140625T
         status:                 normal

[root@dm01cel02 ~]# cellcli -e list flashlog detail
         name:                   dm01cel02_FLASHLOG
         cellDisk:               FD_03_dm01cel02,FD_00_dm01cel02,FD_01_dm01cel02,FD_02_dm01cel02
         creationTime:           2019-03-17T03:19:43-05:00
         degradedCelldisks:
         effectiveSize:          512M
         efficiency:             100.0
         id:                     73cd8288-c6d8-42c3-95a1-97ce287cf7d0
         size:                   512M
         status:                 normal
 
SQL> select a.name,b.path,b.state,b.mode_status,b.failgroup
    from v$asm_diskgroup a, v$asm_disk b
    where a.group_number=b.group_number
    and b.failgroup='dm01cel02'
    order by 2,1;

no rows selected


SQL> alter diskgroup DBFS_DG add disk 'o/192.168.1.1;192.168.1.2/DBFS_DG_*_dm01cel02' force;

Diskgroup altered.

 
SQL> alter diskgroup DATA_DM01 add disk 'o/192.168.1.1;192.168.1.2/DATA_DM01_*_dm01cel02' force;

Diskgroup altered.

 
SQL> alter diskgroup RECO_DM01 add disk 'o/192.168.1.1;192.168.1.2/RECO_DM01_*_dm01cel02' force;

Diskgroup altered.

 
SQL> select * from v$asm_operation;

GROUP_NUMBER OPERA STAT      POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
           1 REBAL RUN           4          4     204367    3521267      13041         254
           3 REBAL WAIT          4


 
SQL> select * from v$asm_operation;

no rows selected
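Rather than re-running the query by hand, the wait for the rebalance to finish can be scripted. This is a minimal sketch, assuming it runs as the Grid Infrastructure owner with ORACLE_HOME and ORACLE_SID pointing at the local ASM instance. The function names wait_for_rebalance and count_asm_operations are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: prints the number of rows in v$asm_operation.
# Assumes sqlplus is in PATH and the environment points at the ASM instance.
count_asm_operations() {
  sqlplus -s "/ as sysasm" <<'EOF' | tr -d '[:space:]'
set heading off feedback off pagesize 0
select count(*) from v$asm_operation;
EOF
}

# Hypothetical helper: polls until no ASM operation is left.
# $1 is the polling interval in seconds (default 60).
wait_for_rebalance() {
  interval="${1:-60}"
  while :; do
    rows="$(count_asm_operations)"
    case "$rows" in
      ''|*[!0-9]*)
        echo "unexpected output from ASM query: $rows" >&2
        return 1
        ;;
    esac
    [ "$rows" -eq 0 ] && return 0
    echo "rebalance still running ($rows operation(s)); checking again in ${interval}s"
    sleep "$interval"
  done
}

# Example usage:
#   wait_for_rebalance 120 && echo "rebalance complete"
```

The loop stops with an error if the query returns anything other than a number, so a failed login is not mistaken for a finished rebalance.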


SQL> col path for a70
SQL> set lines 200
SQL> set pages 200
SQL> select a.name,b.path,b.state,b.mode_status,b.failgroup
    from v$asm_diskgroup a, v$asm_disk b
    where a.group_number=b.group_number
    and b.failgroup='dm01cel02'
    order by 2,1;

NAME                           PATH                                                                   STATE    MODE_ST FAILGROUP
------------------------------ ---------------------------------------------------------------------- -------- ------- ------------------------------
DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_00_dm01cel02              NORMAL   ONLINE  dm01cel02
DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_01_dm01cel02              NORMAL   ONLINE  dm01cel02
DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_02_dm01cel02              NORMAL   ONLINE  dm01cel02
DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_03_dm01cel02              NORMAL   ONLINE  dm01cel02
DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_04_dm01cel02              NORMAL   ONLINE  dm01cel02
DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_05_dm01cel02              NORMAL   ONLINE  dm01cel02
DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_06_dm01cel02              NORMAL   ONLINE  dm01cel02
DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_07_dm01cel02              NORMAL   ONLINE  dm01cel02
DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_08_dm01cel02              NORMAL   ONLINE  dm01cel02
DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_09_dm01cel02              NORMAL   ONLINE  dm01cel02
DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_10_dm01cel02              NORMAL   ONLINE  dm01cel02
DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_11_dm01cel02              NORMAL   ONLINE  dm01cel02
DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_02_dm01cel02                 NORMAL   ONLINE  dm01cel02
DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_03_dm01cel02                 NORMAL   ONLINE  dm01cel02
DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_04_dm01cel02                 NORMAL   ONLINE  dm01cel02
DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_05_dm01cel02                 NORMAL   ONLINE  dm01cel02
DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_06_dm01cel02                 NORMAL   ONLINE  dm01cel02
DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_07_dm01cel02                 NORMAL   ONLINE  dm01cel02
DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_08_dm01cel02                 NORMAL   ONLINE  dm01cel02
DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_09_dm01cel02                 NORMAL   ONLINE  dm01cel02
DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_10_dm01cel02                 NORMAL   ONLINE  dm01cel02
DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_11_dm01cel02                 NORMAL   ONLINE  dm01cel02
RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_00_dm01cel02              NORMAL   ONLINE  dm01cel02
RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_01_dm01cel02              NORMAL   ONLINE  dm01cel02
RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_02_dm01cel02              NORMAL   ONLINE  dm01cel02
RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_03_dm01cel02              NORMAL   ONLINE  dm01cel02
RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_04_dm01cel02              NORMAL   ONLINE  dm01cel02
RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_05_dm01cel02              NORMAL   ONLINE  dm01cel02
RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_06_dm01cel02              NORMAL   ONLINE  dm01cel02
RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_07_dm01cel02              NORMAL   ONLINE  dm01cel02
RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_08_dm01cel02              NORMAL   ONLINE  dm01cel02
RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_09_dm01cel02              NORMAL   ONLINE  dm01cel02
RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_10_dm01cel02              NORMAL   ONLINE  dm01cel02
RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_11_dm01cel02              NORMAL   ONLINE  dm01cel02

34 rows selected.

 
 
Conclusion
 
In this article we have demonstrated the step-by-step procedure to perform a storage cell rescue. You may have to rescue a storage cell for several reasons, such as a corrupted root file system, a kernel panic, or a server that reboots continuously. With the help of the CELLBOOT USB, the rescue can be performed easily.
 

Oracle released the Exachk 18c utility on May 18th, 2018. Let’s quickly check whether running Exachk 18c differs from running Exachk 12c.

Download latest Exachk 18c utility from MOS note:
Oracle Exadata Database Machine exachk or HealthCheck (Doc ID 1070954.1)

Changes in Exachk 18.2 can be found at:
https://docs.oracle.com/cd/E96145_01/OEXUG/changes-in-this-release-18-2-0.htm#OEXUG-GUID-88FCFBC6-C647-47D3-898C-F4C712117B8B

Steps to Execute Exachk 18c on Exadata Database Machine


Download the latest Exachk from MOS note. Here I am downloading Exachk 18c.

Download Completed

Using WinSCP copy the exachk.zip file to Exadata Compute node



Copy completed. List the Exachk file on Compute node

Unzip the Exachk zip file

Verify Exachk version

Execute Exachk Health by running the following command
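The unzip, version check, and health check steps can be sketched as the commands below. This is a minimal sketch under assumptions: the target directory ./exachk and the helper names run and exachk_steps are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set. The -v option prints the Exachk version and -a runs all checks.

```shell
#!/bin/sh
# Hypothetical helper: prints the command in dry-run mode (the default),
# runs it for real when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: unzip Exachk, show its version, run all checks.
# $1 = zip file copied to the compute node, $2 = target directory (assumed).
exachk_steps() {
  zipfile="${1:-exachk.zip}"
  workdir="${2:-./exachk}"
  run unzip -o "$zipfile" -d "$workdir"
  run "$workdir/exachk" -v
  run "$workdir/exachk" -a
}

# Example usage as root on the compute node:
#   DRY_RUN=0 exachk_steps exachk.zip /root/exachk
```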

Exachk execution completed

Review the Exachk report and take necessary action



Conclusion
In this article we have learned how to run an Oracle Exadata Database Machine health check using Exachk 18c. Using Exachk 18c is no different from its previous releases.


On Exadata Database Machine, you can configure the following compute node and storage cell attributes so that the database servers and storage cells send notifications about alerts:
  • smtpServer
  • smtpFrom
  • smtpFromAddr
  • smtpToAddr
  • snmpSubscriber
  • notificationMethod
  • notificationPolicy

In this article we will demonstrate how to set up the database servers and storage cells to send notifications about alerts.

Compute Nodes:


Configure SMTP email notification for alerts on the compute nodes. This can be accomplished using the dbmcli alter dbserver command


# Compute node 1


DBMCLI>alter dbserver smtpFrom='Exadata - dm01db01'
DBMCLI>alter dbserver smtpFromAddr='dbmadmin@dm01db01.netsoftmate.com'
DBMCLI>alter dbserver smtpToAddr='oradba@netsoftmate.com'
DBMCLI>alter dbserver smtpServer='smtp.server'
DBMCLI>alter dbserver snmpSubscriber=((host=192.168.10.1,port=162,community=public,type=ASR))
DBMCLI>alter dbserver notificationPolicy='critical,warning,clear'
DBMCLI>alter dbserver notificationMethod='mail,snmp'
DBMCLI>alter dbserver validate mail


Or you can use the following command


DBMCLI>alter dbserver smtpFrom='Exadata - dm01db01', smtpFromAddr='dbmadmin@dm01db01.netsoftmate.com', smtpToAddr='oradba@netsoftmate.com', smtpServer='smtp.server', snmpSubscriber=((host=192.168.10.1,port=162,community=public,type=ASR)), notificationPolicy='critical,warning,clear', notificationMethod='mail,snmp'


DBMCLI>alter dbserver validate mail


*** Repeat the above step for all the Compute nodes in the cluster.


# verify


# dcli -g ~/dbs_group -l root "dbmcli -e 'list dbserver detail' | grep smtpFrom"
# dcli -g ~/dbs_group -l root "dbmcli -e 'list dbserver detail' | grep smtpFromAddr"
# dcli -g ~/dbs_group -l root "dbmcli -e 'list dbserver detail' | grep smtpToAddr"
# dcli -g ~/dbs_group -l root "dbmcli -e 'list dbserver detail' | grep smtpServer"
# dcli -g ~/dbs_group -l root "dbmcli -e 'list dbserver detail' | grep notificationMethod"


or use the following command


# dcli -g ~/dbs_group -l root "dbmcli -e 'list dbserver detail' | egrep '(smtpFrom|smtpFromAddr|smtpToAddr|smtpServer|notificationMethod)'"
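Note that a plain grep for smtpFrom also matches smtpFromAddr, so the grep output alone does not prove that each attribute is set. The sketch below reports which notification attributes are missing or empty in the detail output of a single node. The function name missing_notification_attrs is a hypothetical helper, and the "attribute: value" layout is assumed to match the detail listings produced by dbmcli and cellcli.

```shell
#!/bin/sh
# Hypothetical helper: reads "list dbserver detail" or "list cell detail"
# output on stdin and prints every notification attribute that is
# missing or has an empty value.
missing_notification_attrs() {
  detail="$(cat)"
  for attr in smtpServer smtpFrom smtpFromAddr smtpToAddr \
              notificationMethod notificationPolicy; do
    # Match the exact attribute name followed by a non-empty value.
    printf '%s\n' "$detail" |
      grep -Eq "^[[:space:]]*${attr}:[[:space:]]*[^[:space:]]" || echo "$attr"
  done
}

# Example usage on one compute node or storage cell:
#   dbmcli -e list dbserver detail | missing_notification_attrs
#   cellcli -e list cell detail | missing_notification_attrs
```

The pattern is anchored to the start of the line, so it is meant for output taken directly on the node rather than dcli output, which prefixes each line with the host name.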




Storage Cells:

Configure SMTP email notification for alerts on the storage cells. This can be accomplished using the cellcli alter cell command

# Storage Cell 01


CELLCLI>alter cell smtpFrom='Exadata - dm01cel01'
CELLCLI>alter cell smtpFromAddr='celladmin@dm01cel01.netsoftmate.com'
CELLCLI>alter cell smtpToAddr='oradba@netsoftmate.com'
CELLCLI>alter cell smtpServer='smtp.server'
CELLCLI>alter cell snmpSubscriber=((host=192.168.10.1,port=162,community=public,type=ASR))
CELLCLI>alter cell notificationPolicy='critical,warning,clear'
CELLCLI>alter cell notificationMethod='mail,snmp'
CELLCLI>alter cell validate mail


or you can also use the following command


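CELLCLI>alter cell smtpFrom='Exadata - dm01cel01', smtpFromAddr='celladmin@dm01cel01.netsoftmate.com', smtpToAddr='oradba@netsoftmate.com', smtpServer='smtp.server', snmpSubscriber=((host=192.168.10.1,port=162,community=public,type=ASR)), notificationPolicy='critical,warning,clear', notificationMethod='mail,snmp'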


CELLCLI>alter cell validate mail

# Verify

# dcli -g ~/cell_group -l root "cellcli -e 'list cell detail' | grep smtpFrom"
# dcli -g ~/cell_group -l root "cellcli -e 'list cell detail' | grep smtpFromAddr"
# dcli -g ~/cell_group -l root "cellcli -e 'list cell detail' | grep smtpToAddr"
# dcli -g ~/cell_group -l root "cellcli -e 'list cell detail' | grep smtpServer"
# dcli -g ~/cell_group -l root "cellcli -e 'list cell detail' | grep notificationMethod"


or you can use the following command


# dcli -g ~/cell_group -l root "cellcli -e 'list cell detail' | egrep '(smtpFrom|smtpFromAddr|smtpToAddr|smtpServer|notificationMethod)'"


*** Repeat the above step for all the Storage Cells in the cluster.



Conclusion

In this article we have learned how to configure SMTP email notification for alerts on compute nodes and storage cells.




While I was patching Exadata storage cells, the patching failed because the internal USB drive on one storage cell had failed.

Oracle uses the internal USB drive to back up the Exadata storage cell automatically, so we don’t have to back up the storage cell manually.

In this article I will demonstrate how to replace a failed USB drive in an Exadata storage cell

  • You will receive an automated smtp alert (if configured) similar to below.



  • You can also use the following command to check for USB drive failure

[root@dm01cel01 ~]# cellcli -e list alerthistory
         1_1     2018-04-10T18:25:42-05:00       warning         "Internal USB status is not present.  Affected USB Slots : 0"

  • You can also use the following ILOM command to check for USB drive failure

[root@dm01cel01 ~]# ssh dm01cel01-ilom
Password: *******


Oracle(R) Integrated Lights Out Manager



Version 3.2.10.22.a r121524



Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.


Warning: HTTPS certificate is set to factory default.



Hostname: dm01cel01-ilom



-> show /SYS/MB/USB0


  • Open an SR with Oracle if ASR has not already generated one
  • Upload the sundiag.sh output and an ILOM snapshot to the SR for investigation
  • Oracle confirms that the USB drive is faulted
  • Oracle opens a Field task
  • Oracle dispatch team contacts the SR owner with the hardware dispatch details
  • Confirm the Hardware replacement schedule over email and/or SR
  • Schedule the Hardware replacement
  • Oracle FE arrives at the data center with the new USB drive
  • Shut down the storage cell by following the steps in the MOS note below

Steps to shut down or reboot an Exadata storage cell without affecting ASM (Doc ID 1188080.1)
  • Oracle FE replaces the faulty USB drive and powers up the storage cell
  • Confirm that the USB drive is good

-> show /SYS/MB/USB0


 /SYS/MB/USB0

    Targets:


    Properties:

        type = USB Port
        fault_state = OK
        clear_fault_action = (none)


    Commands:

        cd
        set
        show


->



[root@dm01cel01 ~]# cellcli -e list alerthistory

         1_2     2018-04-11T02:45:49-05:00       clear           "Internal USB status is back to normal.  Affected USB Slots : 0"

  • You will receive an automated smtp alert (if configured) similar to below that the USB status is back to normal




Conclusion

In this article we have learned how to replace a faulty USB drive in an Exadata storage cell. Oracle uses the USB drive to back up the Exadata storage cell automatically, so we don’t have to back up the storage cell manually.
