Tag: Exadata Storage Cell

  • Step By Step Exadata Storage Cell Rescue Process

     
    A storage cell rescue is typically needed in the following situations:

    • Improper Battery Replacement
    • Improper Card Seating
    • Card Damage During Battery Replacement
    • Corrupted Root File System
    In this article we will demonstrate the step-by-step process to rescue an Exadata storage cell.
     
    Open a browser and enter the ILOM hostname or IP address of the Storage cell you want to rescue
    https://dm01cel02-ilom.netsoftmate.com
     
    Enter the root credentials

     
    On the left pane under “Remote Control”, click “Redirection”. Select “Use video redirection” and click “Launch Remote Console” button

     
    Click OK
     
     Click OK

     
    Click Continue

     
    Click Run

     
    Click Continue (not recommended)

     
    From the ILOM video console we can see that the root file system can’t be mounted because of corruption, and that the server will reboot again in 60 seconds

     
    On the left pane under “Host Management” click on “Power Control”. From the drop down list Select “Power Cycle”

     
    Click Save

     
    Click OK

     
    Rebooting in progress

     
    Server is now rebooting

     
     
    Immediately press Ctrl+S on keyboard 

     
    Select the “CELL_USB_BOOT_CELLBOOT_usb_in_rescue_mode” entry

     
    At this point, we have to continue the rescue process using the ILOM serial console

     
    As root, ssh to the storage cell ILOM and start the serial console (start /SP/console)

     
    Enter r and hit return

     
    Enter y and hit return

     
    Enter the rescue password sos1exadata. Enter n and hit return

     
    Enter the root user password 

     
    We are now in rescue mode. At this point, check that there are no file system issues and fix any other issues you may have. Consult Oracle Support if required
     
    Reboot the server again to complete the rescue process

     
    Hit return

     
    The server is powered off

     
    Power on the server using web ILOM as shown below

     
    The rescue process is complete and we get the root login prompt

     
     
    Login to the server as root user and perform the post rescue steps

      
    Verify the image version of the storage cell

     
     
    Post Storage Cell Rescue steps:
     
    [root@dm01cel02 ~]# imageinfo

    Kernel version: 4.1.12-94.8.4.el6uek.x86_64 #2 SMP Sat May 5 16:14:51 PDT 2018 x86_64
    Cell version: OSS_18.1.7.0.0AUG_LINUX.X64_180821
    Cell rpm version: cell-18.1.7.0.0_LINUX.X64_180821-1.x86_64

    Active image version: 18.1.7.0.0.180821
    Active image kernel version: 4.1.12-94.8.4.el6uek
    Active image activated: 2019-03-17 03:27:41 -0500
    Active image status: success
    Active system partition on device: /dev/md5
    Active software partition on device: /dev/md7

    Cell boot usb partition: /dev/sdm1
    Cell boot usb version: 18.1.7.0.0.180821

    Inactive image version: undefined
    Rollback to the inactive partitions: Impossible
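
    The image check above can also be scripted. The following is a minimal sketch, assuming the `imageinfo` output keeps the "Active image status: ..." line format shown above; the function name `image_status_ok` is hypothetical and not part of any Oracle tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `imageinfo` output on stdin and returns 0 only
# if the "Active image status" line reports "success".
image_status_ok() {
    awk -F': *' '
        /^[[:space:]]*Active image status:/ {
            status = $2
            sub(/[[:space:]]+$/, "", status)   # drop trailing whitespace
        }
        END { exit (status == "success" ? 0 : 1) }
    '
}
```

    On the cell it could be used as `imageinfo | image_status_ok && echo "image OK"`.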

    CellCLI> import celldisk all force
    No cell disks qualified for this import operation

    CellCLI> list physicaldisk
             12:0            PST0XV          normal
             12:1            PZNDSV          normal
             12:2            PT5Z4V          normal
             12:3            PU3XLV          normal
             12:4            PYAKLV          normal
             12:5            PV828V          normal
             12:6            PZE5NV          normal
             12:7            PYV0YV          normal
             12:8            PZKUXV          normal
             12:9            PYD86V          normal
             12:10           PZL15V          normal
             12:11           PZPLAV          normal
             FLASH_1_1       S2T7NCAHA00958  normal
             FLASH_2_1       S2T7NCAHA00986  normal
             FLASH_4_1       S2T7NCAHA00956  normal
             FLASH_5_1       S2T7NCAHA00947  normal

    CellCLI> list celldisk
             CD_00_dm01cel02        normal
             CD_01_dm01cel02        normal
             CD_02_dm01cel02        normal
             CD_03_dm01cel02        normal
             CD_04_dm01cel02        normal
             CD_05_dm01cel02        normal
             CD_06_dm01cel02        normal
             CD_07_dm01cel02        normal
             CD_08_dm01cel02        normal
             CD_09_dm01cel02        normal
             CD_10_dm01cel02        normal
             CD_11_dm01cel02        normal
             FD_00_dm01cel02        normal
             FD_01_dm01cel02        normal
             FD_02_dm01cel02        normal
             FD_03_dm01cel02        normal

    CellCLI> list griddisk
             DATA_DM01_CD_00_dm01cel02     active
             DATA_DM01_CD_01_dm01cel02     active
             DATA_DM01_CD_02_dm01cel02     active
             DATA_DM01_CD_03_dm01cel02     active
             DATA_DM01_CD_04_dm01cel02     active
             DATA_DM01_CD_05_dm01cel02     active
             DATA_DM01_CD_06_dm01cel02     active
             DATA_DM01_CD_07_dm01cel02     active
             DATA_DM01_CD_08_dm01cel02     active
             DATA_DM01_CD_09_dm01cel02     active
             DATA_DM01_CD_10_dm01cel02     active
             DATA_DM01_CD_11_dm01cel02     active
             DBFS_DG_CD_02_dm01cel02       active
             DBFS_DG_CD_03_dm01cel02       active
             DBFS_DG_CD_04_dm01cel02       active
             DBFS_DG_CD_05_dm01cel02       active
             DBFS_DG_CD_06_dm01cel02       active
             DBFS_DG_CD_07_dm01cel02       active
             DBFS_DG_CD_08_dm01cel02       active
             DBFS_DG_CD_09_dm01cel02       active
             DBFS_DG_CD_10_dm01cel02       active
             DBFS_DG_CD_11_dm01cel02       active
             RECO_DM01_CD_00_dm01cel02     active
             RECO_DM01_CD_01_dm01cel02     active
             RECO_DM01_CD_02_dm01cel02     active
             RECO_DM01_CD_03_dm01cel02     active
             RECO_DM01_CD_04_dm01cel02     active
             RECO_DM01_CD_05_dm01cel02     active
             RECO_DM01_CD_06_dm01cel02     active
             RECO_DM01_CD_07_dm01cel02     active
             RECO_DM01_CD_08_dm01cel02     active
             RECO_DM01_CD_09_dm01cel02     active
             RECO_DM01_CD_10_dm01cel02     active
             RECO_DM01_CD_11_dm01cel02     active

    [root@dm01cel02 ~]# cellcli -e list flashcache detail
             name:                   dm01cel02_FLASHCACHE
             cellDisk:               FD_03_dm01cel02,FD_01_dm01cel02,FD_02_dm01cel02,FD_00_dm01cel02
             creationTime:           2019-03-17T03:19:43-05:00
             degradedCelldisks:
             effectiveCacheSize:     11.64312744140625T
             id:                     574c3bd1-7a35-42ba-a03b-75f3a93edac7
             size:                   11.64312744140625T
             status:                 normal

    [root@dm01cel02 ~]# cellcli -e list flashlog detail
             name:                   dm01cel02_FLASHLOG
             cellDisk:               FD_03_dm01cel02,FD_00_dm01cel02,FD_01_dm01cel02,FD_02_dm01cel02
             creationTime:           2019-03-17T03:19:43-05:00
             degradedCelldisks:
             effectiveSize:          512M
             efficiency:             100.0
             id:                     73cd8288-c6d8-42c3-95a1-97ce287cf7d0
             size:                   512M
             status:                 normal
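
    Scanning long CellCLI listings by eye is error-prone. Here is a minimal sketch that counts the rows whose last column is not the expected state, assuming the default (non-detail) `list` output shown above, where the status is the last column. The function name `count_not_in_state` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads CellCLI `list ...` output on stdin and prints
# the number of rows whose last column differs from the expected state.
count_not_in_state() {
    local expected="$1"
    awk -v want="$expected" '
        NF >= 2 && $NF != want { bad++ }   # skip blank or single-field lines
        END { print bad + 0 }              # print 0 when nothing was flagged
    '
}
```

    For example, `cellcli -e list griddisk | count_not_in_state active` and `cellcli -e list celldisk | count_not_in_state normal` should both print 0 on a healthy cell.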

     
    SQL> select a.name,b.path,b.state,b.mode_status,b.failgroup
        from v$asm_diskgroup a, v$asm_disk b
        where a.group_number=b.group_number
        and b.failgroup='dm01cel02'
        order by 2,1;

    no rows selected

    SQL> alter diskgroup DBFS_DG add disk 'o/192.168.1.1;192.168.1.2/DBFS_DG_*_dm01cel02' force;

    Diskgroup altered.

     

    SQL> alter diskgroup DATA_DM01 add disk 'o/192.168.1.1;192.168.1.2/DATA_DM01_*_dm01cel02' force;

    Diskgroup altered.

     

    SQL> alter diskgroup RECO_DM01 add disk 'o/192.168.1.1;192.168.1.2/RECO_DM01_*_dm01cel02' force;

    Diskgroup altered.
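
    The three statements above follow one pattern, so they can be generated rather than typed. This is a minimal sketch, assuming (as in this environment) that each grid disk prefix equals its disk group name; the function name `gen_add_disk_sql` is hypothetical, and the generated SQL should be reviewed before it is run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints one "alter diskgroup ... add disk ... force"
# statement per disk group for a given cell.
#   $1  = cell name, e.g. dm01cel02
#   $2  = cell IP list as used in the disk path, e.g. '192.168.1.1;192.168.1.2'
#   $3+ = disk group names (assumed to match the grid disk prefixes)
gen_add_disk_sql() {
    local cell="$1" ips="$2" dg
    shift 2
    for dg in "$@"; do
        printf "alter diskgroup %s add disk 'o/%s/%s_*_%s' force;\n" \
            "$dg" "$ips" "$dg" "$cell"
    done
}
```

    For example, `gen_add_disk_sql dm01cel02 '192.168.1.1;192.168.1.2' DBFS_DG DATA_DM01 RECO_DM01` prints the three statements used in this article.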


     
    SQL> select * from v$asm_operation;

    GROUP_NUMBER OPERA STAT      POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
    ------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
               1 REBAL RUN           4          4     204367    3521267      13041         254
               3 REBAL WAIT          4
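
    The SOFAR and EST_WORK columns are both counts of allocation units, so their ratio gives a rough view of rebalance progress. A minimal sketch of that arithmetic follows; the function name `rebalance_pct` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints rebalance progress as a percentage.
#   $1 = SOFAR     (allocation units moved so far)
#   $2 = EST_WORK  (estimated allocation units to move in total)
rebalance_pct() {
    awk -v sofar="$1" -v est="$2" 'BEGIN {
        if (est > 0) printf "%.1f\n", 100 * sofar / est
        else         print "n/a"     # no estimate available yet
    }'
}
```

    With the values from the output above, `rebalance_pct 204367 3521267` prints 5.8, which is consistent with the roughly 254 minutes still estimated.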

     

    SQL> select * from v$asm_operation;

    no rows selected

    SQL> col path for a70
    SQL> set lines 200
    SQL> set pages 200
    SQL> select a.name,b.path,b.state,b.mode_status,b.failgroup
        from v$asm_diskgroup a, v$asm_disk b
        where a.group_number=b.group_number
        and b.failgroup='dm01cel02'
        order by 2,1;

    NAME                           PATH                                                                   STATE    MODE_ST FAILGROUP
    ------------------------------ ---------------------------------------------------------------------- -------- ------- ------------------------------
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_00_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_01_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_02_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_03_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_04_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_05_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_06_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_07_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_08_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_09_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_10_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_11_dm01cel02              NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_02_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_03_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_04_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_05_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_06_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_07_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_08_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_09_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_10_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_11_dm01cel02                 NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_00_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_01_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_02_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_03_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_04_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_05_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_06_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_07_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_08_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_09_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_10_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_11_dm01cel02              NORMAL   ONLINE  dm01cel02

    34 rows selected.
     

     
    Conclusion
     
    In this article we have demonstrated the step-by-step procedure to rescue a storage cell. A rescue may be needed for several reasons, such as a corrupted root file system, a kernel panic, or a server that reboots continuously. With the CELLBOOT USB, the storage cell rescue is straightforward.
     
  • Oracle Exadata Database Machine Health Check – Exachk 18c

    Oracle released the Exachk 18c utility on May 18th, 2018. Let’s quickly check whether Exachk 18c differs from Exachk 12c or works the same way.


    Download latest Exachk 18c utility from MOS note:
    Oracle Exadata Database Machine exachk or HealthCheck (Doc ID 1070954.1)


    Changes in Exachk 18.2 can be found at:
    https://docs.oracle.com/cd/E96145_01/OEXUG/changes-in-this-release-18-2-0.htm#OEXUG-GUID-88FCFBC6-C647-47D3-898C-F4C712117B8B


    Steps to Execute Exachk 18c on Exadata Database Machine



    Download the latest Exachk from MOS note. Here I am downloading Exachk 18c.

    Download Completed

    Using WinSCP copy the exachk.zip file to Exadata Compute node

    Copy completed. List the Exachk file on Compute node

    Unzip the Exachk zip file

    Verify Exachk version

    Execute the Exachk health check by running the following command

    Exachk execution completed

    Review the Exachk report and take necessary action


    Conclusion
    In this article we have learned how to execute the Oracle Exadata Database Machine health check using Exachk 18c. Using Exachk 18c is no different from using its previous releases.

  • Exadata Pocket Reference

    Here is the Link to the Exadata Pocket Reference. Click on the Link to Download the file.

    Exadata Pocket Reference

  • Exadata – Configure Compute Node and Storage Cell SMTP Email Notification

    On Exadata Database Machine you can configure the following compute node and storage cell attributes to set up the database servers and storage cells to send notifications about alerts.
    • smtpServer
    • smtpFrom
    • smtpFromAddr
    • smtpToAddr
    • snmpSubscriber
    • notificationMethod
    • notificationPolicy

    In this article we will demonstrate how to set up the database servers and storage cells to send notifications about alerts.

    Compute Nodes:




    Configure SMTP email notification for alerts on the compute nodes. This can be done with the dbmcli alter dbserver command.




    # Compute node 1

    DBMCLI>alter dbserver smtpFrom='Exadata - dm01db01'
    DBMCLI>alter dbserver smtpFromAddr='dbmadmin@dm01db01.netsoftmate.com'
    DBMCLI>alter dbserver smtpToAddr='oradba@netsoftmate.com'
    DBMCLI>alter dbserver smtpServer='smtp.server'
    DBMCLI>alter dbserver snmpSubscriber=((host=192.168.10.1,port=162,community=public,type=ASR))
    DBMCLI>alter dbserver notificationPolicy='critical,warning,clear'
    DBMCLI>alter dbserver notificationMethod='mail,snmp'
    DBMCLI>alter dbserver validate mail

    Or you can use the following command

    DBMCLI>alter dbserver smtpFrom='Exadata - dm01db01', smtpFromAddr='dbmadmin@dm01db01.netsoftmate.com', smtpToAddr='oradba@netsoftmate.com', smtpServer='smtp.server', snmpSubscriber=((host=192.168.10.1,port=162,community=public,type=ASR)), notificationPolicy='critical,warning,clear', notificationMethod='mail,snmp'

    DBMCLI>alter dbserver validate mail




    *** Repeat the above step for all the Compute nodes in the cluster.

    # verify

    # dcli -g ~/dbs_group -l root "dbmcli -e 'list dbserver detail' | grep smtpFrom"
    # dcli -g ~/dbs_group -l root "dbmcli -e 'list dbserver detail' | grep smtpFromAddr"
    # dcli -g ~/dbs_group -l root "dbmcli -e 'list dbserver detail' | grep smtpToAddr"
    # dcli -g ~/dbs_group -l root "dbmcli -e 'list dbserver detail' | grep smtpServer"
    # dcli -g ~/dbs_group -l root "dbmcli -e 'list dbserver detail' | grep notificationMethod"

    or use the following command




    # dcli -g ~/dbs_group -l root "dbmcli -e 'list dbserver detail' | egrep '(smtpFrom|smtpFromAddr|smtpToAddr|smtpServer|notificationMethod)'"
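
    A plain grep shows the values but does not flag an attribute that is missing, and `grep smtpFrom` also matches smtpFromAddr. The sketch below reports which attributes are absent or empty, assuming the `name:   value` layout of the `list dbserver detail` output; the function name `missing_notification_attrs` and the choice of required attributes are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `list dbserver detail` (or `list cell detail`)
# output on stdin and prints each required attribute that is missing or empty.
missing_notification_attrs() {
    local detail attr
    detail="$(cat)"
    for attr in smtpServer smtpFromAddr smtpToAddr notificationMethod notificationPolicy; do
        # Match "<attr>:" at the start of a line, followed by a non-blank value.
        if ! grep -Eq "^[[:space:]]*${attr}:[[:space:]]*[^[:space:]]" <<< "$detail"; then
            echo "$attr"
        fi
    done
}
```

    On a compute node it could be run as `dbmcli -e 'list dbserver detail' | missing_notification_attrs`; no output means every listed attribute has a value.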







    Storage Cells:



    Configure SMTP email notification for alerts on the storage cells. This can be done with the cellcli alter cell command.



    # Storage Cell 01

    CELLCLI>alter cell smtpFrom='Exadata - dm01cel01'
    CELLCLI>alter cell smtpFromAddr='celladmin@dm01cel01.netsoftmate.com'
    CELLCLI>alter cell smtpToAddr='oradba@netsoftmate.com'
    CELLCLI>alter cell smtpServer='smtp.server'
    CELLCLI>alter cell snmpSubscriber=((host=192.168.10.1,port=162,community=public,type=ASR))
    CELLCLI>alter cell notificationPolicy='critical,warning,clear'
    CELLCLI>alter cell notificationMethod='mail,snmp'
    CELLCLI>alter cell validate mail

    or you can also use the following command




    CELLCLI>alter cell smtpFrom='Exadata - dm01cel01', smtpFromAddr='celladmin@dm01cel01.netsoftmate.com', smtpToAddr='oradba@netsoftmate.com', smtpServer='smtp.server', notificationMethod='mail,snmp'

    CELLCLI>alter cell validate mail



    # Verify



    # dcli -g ~/cell_group -l root "cellcli -e 'list cell detail' | grep smtpFrom"
    # dcli -g ~/cell_group -l root "cellcli -e 'list cell detail' | grep smtpFromAddr"
    # dcli -g ~/cell_group -l root "cellcli -e 'list cell detail' | grep smtpToAddr"
    # dcli -g ~/cell_group -l root "cellcli -e 'list cell detail' | grep smtpServer"
    # dcli -g ~/cell_group -l root "cellcli -e 'list cell detail' | grep notificationMethod"

    or you can use the following command




    # dcli -g ~/cell_group -l root "cellcli -e 'list cell detail' | egrep '(smtpFrom|smtpFromAddr|smtpToAddr|smtpServer|notificationMethod)'"





    *** Repeat the above step for all the Storage Cells in the cluster.





    Conclusion





    In this article we have learned how to Configure Compute nodes and Storage Cell SMTP email notification for alerts.



  • Exadata – Replace Failed Internal USB Drive on Exadata Storage Cell

    While working on Exadata storage cell patching, the patching failed because of a failed internal USB drive on a storage cell.
    Oracle uses the internal USB drive to back up the Exadata storage cell automatically, so we don’t have to back up the storage cell manually.
    In this article I will demonstrate how to replace a failed USB drive on an Exadata storage cell.



    • You will receive an automated smtp alert (if configured) similar to below.
    • You can also use the following command to check for USB drive failure
    [root@dm01cel01 ~]# cellcli -e list alerthistory
             1_1     2018-04-10T18:25:42-05:00       warning         "Internal USB status is not present.  Affected USB Slots : 0"
    • You can also use the following ILOM command to check for USB drive failure
    [root@dm01cel01 ~]# ssh dm01cel01-ilom
    Password: *******
    Oracle(R) Integrated Lights Out Manager



    Version 3.2.10.22.a r121524



    Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.

    Warning: HTTPS certificate is set to factory default.



    Hostname: dm01cel01-ilom



    -> show /SYS/MB/USB0



    • Open an SR with Oracle if ASR has not already generated one
    • Upload the sundiag.sh output and an ILOM snapshot to the SR for investigation
    • Oracle confirms that the USB drive is faulted
    • Oracle opens a Field task
    • Oracle dispatch team contacts the SR owner with the hardware dispatch details
    • Confirm the Hardware replacement schedule over email and/or SR
    • Schedule the Hardware replacement
    • Oracle FE arrives at the data center with the new USB drive
    • Shut down the storage cell by following the steps from the MOS note below
    Steps to shut down or reboot an Exadata storage cell without affecting ASM (Doc ID 1188080.1)
    • Oracle FE replaces the faulty USB drive and powers up the storage cell
    • Confirm that the USB drive is good
    -> show /SYS/MB/USB0
     /SYS/MB/USB0



        Targets:
        Properties:



            type = USB Port
            fault_state = OK
            clear_fault_action = (none)
        Commands:



            cd
            set
            show
    ->



    [root@dm01cel01 ~]# cellcli -e list alerthistory



             1_2     2018-04-11T02:45:49-05:00       clear           "Internal USB status is back to normal.  Affected USB Slots : 0"
    • You will receive an automated smtp alert (if configured) similar to below that the USB status is back to normal
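
    The alert history check can be narrowed to the USB drive. Below is a minimal sketch, assuming the `list alerthistory` layout shown above (name, timestamp, severity, message) and that entries are listed oldest first; the function name `latest_usb_alert_severity` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `cellcli -e list alerthistory` output on stdin
# and prints the severity of the last "Internal USB" alert.
# Returns non-zero when no such alert is present.
latest_usb_alert_severity() {
    awk '
        /Internal USB/ { sev = $3 }    # third column is the severity
        END { if (sev != "") print sev; else exit 1 }
    '
}
```

    After a successful replacement, `cellcli -e list alerthistory | latest_usb_alert_severity` would be expected to print clear.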



    Conclusion
    In this article we have learned how to replace a faulty USB drive in an Exadata storage cell. Oracle uses the internal USB drive to back up the storage cell automatically, so we don’t have to back it up manually.