Tag: Exadata Storage Server

  • Step By Step Exadata Storage Cell Rescue Process

     
    You may need to perform a storage cell rescue in situations such as the following:

    • Improper Battery Replacement
    • Improper Card Seating
    • Card Damage During Battery Replacement
    • Corrupted Root File System
    In this article we will demonstrate the step-by-step process to rescue an Exadata storage cell.
     
    Open a browser and enter the ILOM hostname or IP address of the storage cell you want to rescue:
    https://dm01cel02-ilom.netsoftmate.com
     
    Enter the root credentials

     
    On the left pane under “Remote Control”, click “Redirection”. Select “Use video redirection” and click “Launch Remote Console” button

     
    Click OK
     
     Click OK

     
    Click Continue

     
    Click Run

     
    Click Continue (not recommended)

     
    From the ILOM video console we can see that the root file system can’t be mounted due to corruption, and the server will reboot again in 60 seconds

     
    On the left pane under “Host Management” click on “Power Control”. From the drop down list Select “Power Cycle”

     
    Click Save

     
    Click OK

     
    Rebooting in progress

     
    Server is now rebooting

     
     
    Immediately press Ctrl+S on keyboard 

     
    Select the “CELL_USB_BOOT_CELLBOOT_usb_in_rescue_mode” option

     
    At this point, we have to continue the rescue process using the ILOM serial console

     
    As root, ssh to the storage cell ILOM and start the serial console

     
    Enter r and hit return

     
    Enter y and hit return

     
    Enter the rescue password sos1exadata. Enter n and hit return

     
    Enter the root user password 

     
    We are now in rescue mode. At this point, check that there are no file system issues and fix any other issues you may have. Consult Oracle Support if required
     
    Reboot the server again to complete the rescue process

     
    Hit return

     
    The server is powered off

     
    Power on the server using web ILOM as shown below

     
    Rescue process is completed and we got the root login prompt

     
     
    Login to the server as root user and perform the post rescue steps

      
    Verify the image version of the storage cell

     
     
    Post Storage Cell Rescue steps:
     
    [root@dm01cel02 ~]# imageinfo

    Kernel version: 4.1.12-94.8.4.el6uek.x86_64 #2 SMP Sat May 5 16:14:51 PDT 2018 x86_64
    Cell version: OSS_18.1.7.0.0AUG_LINUX.X64_180821
    Cell rpm version: cell-18.1.7.0.0_LINUX.X64_180821-1.x86_64

    Active image version: 18.1.7.0.0.180821
    Active image kernel version: 4.1.12-94.8.4.el6uek
    Active image activated: 2019-03-17 03:27:41 -0500
    Active image status: success
    Active system partition on device: /dev/md5
    Active software partition on device: /dev/md7

    Cell boot usb partition: /dev/sdm1
    Cell boot usb version: 18.1.7.0.0.180821

    Inactive image version: undefined
    Rollback to the inactive partitions: Impossible

    CellCLI> import celldisk all force
    No cell disks qualified for this import operation

    CellCLI> list physicaldisk
             12:0            PST0XV          normal
             12:1            PZNDSV          normal
             12:2            PT5Z4V          normal
             12:3            PU3XLV          normal
             12:4            PYAKLV          normal
             12:5            PV828V          normal
             12:6            PZE5NV          normal
             12:7            PYV0YV          normal
             12:8            PZKUXV          normal
             12:9            PYD86V          normal
             12:10           PZL15V          normal
             12:11           PZPLAV          normal
             FLASH_1_1       S2T7NCAHA00958  normal
             FLASH_2_1       S2T7NCAHA00986  normal
             FLASH_4_1       S2T7NCAHA00956  normal
             FLASH_5_1       S2T7NCAHA00947  normal

    CellCLI> list celldisk
             CD_00_dm01cel02        normal
             CD_01_dm01cel02        normal
             CD_02_dm01cel02        normal
             CD_03_dm01cel02        normal
             CD_04_dm01cel02        normal
             CD_05_dm01cel02        normal
             CD_06_dm01cel02        normal
             CD_07_dm01cel02        normal
             CD_08_dm01cel02        normal
             CD_09_dm01cel02        normal
             CD_10_dm01cel02        normal
             CD_11_dm01cel02        normal
             FD_00_dm01cel02        normal
             FD_01_dm01cel02        normal
             FD_02_dm01cel02        normal
             FD_03_dm01cel02        normal

    CellCLI> list griddisk
             DATA_DM01_CD_00_dm01cel02     active
             DATA_DM01_CD_01_dm01cel02     active
             DATA_DM01_CD_02_dm01cel02     active
             DATA_DM01_CD_03_dm01cel02     active
             DATA_DM01_CD_04_dm01cel02     active
             DATA_DM01_CD_05_dm01cel02     active
             DATA_DM01_CD_06_dm01cel02     active
             DATA_DM01_CD_07_dm01cel02     active
             DATA_DM01_CD_08_dm01cel02     active
             DATA_DM01_CD_09_dm01cel02     active
             DATA_DM01_CD_10_dm01cel02     active
             DATA_DM01_CD_11_dm01cel02     active
             DBFS_DG_CD_02_dm01cel02       active
             DBFS_DG_CD_03_dm01cel02       active
             DBFS_DG_CD_04_dm01cel02       active
             DBFS_DG_CD_05_dm01cel02       active
             DBFS_DG_CD_06_dm01cel02       active
             DBFS_DG_CD_07_dm01cel02       active
             DBFS_DG_CD_08_dm01cel02       active
             DBFS_DG_CD_09_dm01cel02       active
             DBFS_DG_CD_10_dm01cel02       active
             DBFS_DG_CD_11_dm01cel02       active
             RECO_DM01_CD_00_dm01cel02     active
             RECO_DM01_CD_01_dm01cel02     active
             RECO_DM01_CD_02_dm01cel02     active
             RECO_DM01_CD_03_dm01cel02     active
             RECO_DM01_CD_04_dm01cel02     active
             RECO_DM01_CD_05_dm01cel02     active
             RECO_DM01_CD_06_dm01cel02     active
             RECO_DM01_CD_07_dm01cel02     active
             RECO_DM01_CD_08_dm01cel02     active
             RECO_DM01_CD_09_dm01cel02     active
             RECO_DM01_CD_10_dm01cel02     active
             RECO_DM01_CD_11_dm01cel02     active

    [root@dm01cel02 ~]# cellcli -e list flashcache detail
             name:                   dm01cel02_FLASHCACHE
             cellDisk:               FD_03_dm01cel02,FD_01_dm01cel02,FD_02_dm01cel02,FD_00_dm01cel02
             creationTime:           2019-03-17T03:19:43-05:00
             degradedCelldisks:
             effectiveCacheSize:     11.64312744140625T
             id:                     574c3bd1-7a35-42ba-a03b-75f3a93edac7
             size:                   11.64312744140625T
             status:                 normal

    [root@dm01cel02 ~]# cellcli -e list flashlog detail
             name:                   dm01cel02_FLASHLOG
             cellDisk:               FD_03_dm01cel02,FD_00_dm01cel02,FD_01_dm01cel02,FD_02_dm01cel02
             creationTime:           2019-03-17T03:19:43-05:00
             degradedCelldisks:
             effectiveSize:          512M
             efficiency:             100.0
             id:                     73cd8288-c6d8-42c3-95a1-97ce287cf7d0
             size:                   512M
             status:                 normal

     
    SQL> select a.name,b.path,b.state,b.mode_status,b.failgroup
        from v$asm_diskgroup a, v$asm_disk b
        where a.group_number=b.group_number
        and b.failgroup='dm01cel02'
        order by 2,1;

    no rows selected

    SQL> alter diskgroup DBFS_DG add disk 'o/192.168.1.1;192.168.1.2/DBFS_DG_*_dm01cel02' force;

    Diskgroup altered.

    SQL> alter diskgroup DATA_DM01 add disk 'o/192.168.1.1;192.168.1.2/DATA_DM01_*_dm01cel02' force;

    Diskgroup altered.

    SQL> alter diskgroup RECO_DM01 add disk 'o/192.168.1.1;192.168.1.2/RECO_DM01_*_dm01cel02' force;

    Diskgroup altered.


     
    SQL> select * from v$asm_operation;

    GROUP_NUMBER OPERA STAT      POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
    ------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
               1 REBAL RUN           4          4     204367    3521267      13041         254
               3 REBAL WAIT          4

     

    SQL> select * from v$asm_operation;

    no rows selected
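
The EST_MINUTES value in the rebalance output above can be reproduced from the other columns. The following is a minimal sketch, assuming EST_MINUTES is the remaining work (EST_WORK minus SOFAR) divided by EST_RATE and truncated to whole minutes; that assumption matches the figures shown in this article, but it is not taken from Oracle documentation. The function name is hypothetical.

```python
def estimated_minutes(sofar: int, est_work: int, est_rate: int) -> int:
    """Estimate remaining rebalance minutes from V$ASM_OPERATION columns.

    Assumption: EST_MINUTES = (EST_WORK - SOFAR) / EST_RATE, truncated.
    """
    if est_rate <= 0:
        raise ValueError("EST_RATE must be positive")
    remaining_units = est_work - sofar
    return remaining_units // est_rate


# Values from the REBAL RUN row shown above (group 1, power 4).
print(estimated_minutes(sofar=204367, est_work=3521267, est_rate=13041))  # 254
```

The estimate changes as the rebalance progresses, so it is only a rough guide to when v$asm_operation will return no rows.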

    SQL> col path for a70
    SQL> set lines 200
    SQL> set pages 200
    SQL> select a.name,b.path,b.state,b.mode_status,b.failgroup
        from v$asm_diskgroup a, v$asm_disk b
        where a.group_number=b.group_number
        and b.failgroup='dm01cel02'
        order by 2,1;

    NAME                           PATH                                                                   STATE    MODE_ST FAILGROUP
    ------------------------------ ---------------------------------------------------------------------- -------- ------- ------------------------------
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_00_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_01_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_02_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_03_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_04_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_05_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_06_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_07_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_08_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_09_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_10_dm01cel02              NORMAL   ONLINE  dm01cel02
    DATA_DM01                     o/192.168.1.1;192.168.1.2/DATA_DM01_CD_11_dm01cel02              NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_02_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_03_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_04_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_05_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_06_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_07_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_08_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_09_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_10_dm01cel02                 NORMAL   ONLINE  dm01cel02
    DBFS_DG                        o/192.168.1.1;192.168.1.2/DBFS_DG_CD_11_dm01cel02                 NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_00_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_01_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_02_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_03_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_04_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_05_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_06_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_07_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_08_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_09_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_10_dm01cel02              NORMAL   ONLINE  dm01cel02
    RECO_DM01                     o/192.168.1.1;192.168.1.2/RECO_DM01_CD_11_dm01cel02              NORMAL   ONLINE  dm01cel02

    34 rows selected.
     

     
    Conclusion
     
    In this article we have demonstrated the step-by-step procedure to perform a storage cell rescue. You may have to perform a storage cell rescue for several reasons, such as a corrupted root file system, a kernel panic, or a server that reboots continuously. With the help of the CELLBOOT USB, the storage cell rescue is straightforward.
     
  • Create Cell Disk and Grid Disk Manually on Exadata

    We had a failed hard disk on an Exadata X6-2 storage cell, so we scheduled an Oracle Field Engineer to replace it. The engineer came onsite and replaced the faulty hard disk. After the replacement we found that the physical disk and LUN had been created successfully, but the cell disk and grid disks had not been created automatically. Normally, when a hard disk is replaced, the LUN, cell disk and grid disks are created automatically and the grid disks are added to the ASM disk groups without any manual intervention. In some odd cases this does not happen, and you must manually create the cell disk, create the grid disks with the proper sizes and add them to the ASM disk groups.


    In this article we will demonstrate how to create the Cell disk, Grid disks manually and add them to the respective ASM Disk Group.


    Environment

    • Exadata X6-2 Elastic Configuration
    • 4 Compute nodes and 6 Storage cells
    • Hard Disk Size: 8TB
    • 3 ASM Disk Group: DATA, RECO & DBFS_DG
    • Total Number of Grid disks: DATA – 72, RECO – 72 & DBFS_DG – 60

    Here the disk in location 8:5 was bad and was replaced.


    Before Replacing Hard Disk:


    CellCLI> list physicaldisk
             8:0             PYJZKV                  normal
             8:1             PMU3LV                  normal
             8:2             P1Y2KV                  normal
             8:3             PYH48V                  normal
             8:4             PY7MAV                  normal
             8:5             PPZ47V                  not present
             8:6             PEJKHR                  normal
             8:7             PY4XSV                  normal
             8:8             PYL00V                  normal
             8:9             PV5RGV                  normal
             8:10            PSU26V                  normal
             8:11            PY522V                  normal
             FLASH_1_1       CVMD522500AG1P6NGN      normal
             FLASH_2_1       CVMD522401AC1P6NGN      normal
             FLASH_4_1       CVMD522500AC1P6NGN      normal
             FLASH_5_1       CVMD5230000Y1P6NGN      normal


    CellCLI> list lun
             0_0     0_0     normal
             0_1     0_1     normal
             0_2     0_2     normal
             0_3     0_3     normal
             0_4     0_4     normal
             0_5     0_5     not present
             0_6     0_6     normal
             0_7     0_7     normal
             0_8     0_8     normal
             0_9     0_9     normal
             0_10    0_10    normal
             0_11    0_11    normal
             1_1     1_1     normal
             2_1     2_1     normal
             4_1     4_1     normal
             5_1     5_1     normal


    After replacing Hard Disk:


    CellCLI> list physicaldisk
             8:0             PYJZKV                  normal
             8:1             PMU3LV                  normal
             8:2             P1Y2KV                  normal
             8:3             PYH48V                  normal
             8:4             PY7MAV                  normal
             8:5             PPZ47V                  normal
             8:6             PEJKHR                  normal
             8:7             PY4XSV                  normal
             8:8             PYL00V                  normal
             8:9             PV5RGV                  normal
             8:10            PSU26V                  normal
             8:11            PY522V                  normal
             FLASH_1_1       CVMD522500AG1P6NGN      normal
             FLASH_2_1       CVMD522401AC1P6NGN      normal
             FLASH_4_1       CVMD522500AC1P6NGN      normal
             FLASH_5_1       CVMD5230000Y1P6NGN      normal


    CellCLI> list lun
             0_0     0_0     normal
             0_1     0_1     normal
             0_2     0_2     normal
             0_3     0_3     normal
             0_4     0_4     normal
             0_5     0_5     normal
             0_6     0_6     normal
             0_7     0_7     normal
             0_8     0_8     normal
             0_9     0_9     normal
             0_10    0_10    normal
             0_11    0_11    normal
             1_1     1_1     normal
             2_1     2_1     normal
             4_1     4_1     normal
             5_1     5_1     normal


    [root@dm01cel03 ~]# cellcli -e list physicaldisk 8:5 detail
             name:                   8:5
             deviceId:               21
             deviceName:             /dev/sdf
             diskType:               HardDisk
             enclosureDeviceId:      8
             errOtherCount:          0
             luns:                   0_5
             makeModel:              “HGST    H7280A520SUN8.0T”
             physicalFirmware:       PD51
             physicalInsertTime:     2018-05-18T10:52:29-05:00
             physicalInterface:      sas
             physicalSerial:         PPZ47V
             physicalSize:           7.1536639072000980377197265625T
             slotNumber:             5
             status:                 normal


    [root@dm01cel03 ~]# cellcli -e list celldisk where lun=0_5 detail




    [root@dm01cel03 ~]# cellcli -e list griddisk where cellDisk=CD_05_dm01cel03 attributes name,status
    DATA_CD_05_dm01cel03 not present
    DBFS_DG_CD_05_dm01cel03 not present
    RECO_CD_05_dm01cel03 not present


    [root@dm01cel03 ~]# cellcli -e list griddisk where celldisk=CD_05_dm01cel03 detail
             name:                   DATA_CD_05_dm01cel03
             availableTo:
             cachingPolicy:          default
             cellDisk:               CD_05_dm01cel03
             comment:                “Cluster dm01-cluster diskgroup DATA”
             creationTime:           2016-03-29T20:25:56-05:00
             diskType:               HardDisk
             errorCount:             0
             id:                     db221d77-25b0-4f9e-af6f-95e1c3134af5
             size:                   5.6953125T
             status:                 not present


             name:                   DBFS_DG_CD_05_dm01cel03
             availableTo:
             cachingPolicy:          default
             cellDisk:               CD_05_dm01cel03
             comment:                “Cluster dm01-cluster diskgroup DBFS_DG”
             creationTime:           2016-03-29T20:25:53-05:00
             diskType:               HardDisk
             errorCount:             0
             id:                     216fbec9-6ed4-4ef6-a0d4-d09517906fd5
             size:                   33.796875G
             status:                 not present


             name:                   RECO_CD_05_dm01cel03
             availableTo:
             cachingPolicy:          none
             cellDisk:               CD_05_dm01cel03
             comment:                “Cluster dm01-cluster diskgroup RECO”
             creationTime:           2016-03-29T20:25:58-05:00
             diskType:               HardDisk
             errorCount:             0
             id:                     e8ca6943-0ddd-48ab-b890-e14bbf4e591c
             size:                   1.42388916015625T
             status:                 not present


    We can clearly see that the grid disks on cell disk CD_05_dm01cel03 are not present, so we have to create them manually.
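
On a cell with dozens of grid disks, it can help to filter the status listing rather than read it by eye. The following is a minimal sketch that parses name and status output like the listing above and reports every grid disk whose status is not "active". The function names are hypothetical, and the sample text is copied from the listings in this article.

```python
def parse_griddisk_status(output: str) -> dict:
    """Map grid disk name -> status from 'list griddisk' style output.

    The status may contain a space ("not present"), so only the first
    run of whitespace is used as the separator.
    """
    statuses = {}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            name, status = parts
            statuses[name] = status.strip()
    return statuses


def not_active(statuses: dict) -> list:
    """Return the names of grid disks whose status is not 'active'."""
    return sorted(name for name, status in statuses.items() if status != "active")


sample = """\
DATA_CD_04_dm01cel03 active
DATA_CD_05_dm01cel03 not present
DBFS_DG_CD_05_dm01cel03 not present
RECO_CD_05_dm01cel03 not present
"""

print(not_active(parse_griddisk_status(sample)))
# ['DATA_CD_05_dm01cel03', 'DBFS_DG_CD_05_dm01cel03', 'RECO_CD_05_dm01cel03']
```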


    Steps to create Celldisk, Griddisks and add them to ASM Disk Group


    • List Cell Disks

    [root@dm01cel03 ~]# cellcli -e list celldisk
             CD_00_dm01cel03         normal
             CD_01_dm01cel03         normal
             CD_02_dm01cel03         normal
             CD_03_dm01cel03         normal
             CD_04_dm01cel03         normal
             CD_05_dm01cel03         not present
             CD_06_dm01cel03         normal
             CD_07_dm01cel03         normal
             CD_08_dm01cel03         normal
             CD_09_dm01cel03         normal
             CD_10_dm01cel03         normal
             CD_11_dm01cel03         normal
             FD_00_dm01cel03         normal
             FD_01_dm01cel03         normal
             FD_02_dm01cel03         normal
             FD_03_dm01cel03         normal

    • List Grid Disks

    [root@dm01cel03 ~]# cellcli -e list griddisk
             DATA_CD_00_dm01cel03       active
             DATA_CD_01_dm01cel03       active
             DATA_CD_02_dm01cel03       active
             DATA_CD_03_dm01cel03       active
             DATA_CD_04_dm01cel03       active
             DATA_CD_05_dm01cel03       not present
             DATA_CD_06_dm01cel03       active
             DATA_CD_07_dm01cel03       active
             DATA_CD_08_dm01cel03       active
             DATA_CD_09_dm01cel03       active
             DATA_CD_10_dm01cel03       active
             DATA_CD_11_dm01cel03       active
             DBFS_DG_CD_02_dm01cel03    active
             DBFS_DG_CD_03_dm01cel03    active
             DBFS_DG_CD_04_dm01cel03    active
             DBFS_DG_CD_05_dm01cel03    not present
             DBFS_DG_CD_06_dm01cel03    active
             DBFS_DG_CD_07_dm01cel03    active
             DBFS_DG_CD_08_dm01cel03    active
             DBFS_DG_CD_09_dm01cel03    active
             DBFS_DG_CD_10_dm01cel03    active
             DBFS_DG_CD_11_dm01cel03    active
             RECO_CD_00_dm01cel03       active
             RECO_CD_01_dm01cel03       active
             RECO_CD_02_dm01cel03       active
             RECO_CD_03_dm01cel03       active
             RECO_CD_04_dm01cel03       active
             RECO_CD_05_dm01cel03       not present
             RECO_CD_06_dm01cel03       active
             RECO_CD_07_dm01cel03       active
             RECO_CD_08_dm01cel03       active
             RECO_CD_09_dm01cel03       active
             RECO_CD_10_dm01cel03       active
             RECO_CD_11_dm01cel03       active

    • List Physical Disk details

    [root@dm01cel03 ~]# cellcli -e list physicaldisk where physicalSerial=PPZ47V detail
             name:                   8:5
             deviceId:               21
             deviceName:             /dev/sdf
             diskType:               HardDisk
             enclosureDeviceId:      8
             errOtherCount:          0
             luns:                   0_5
             makeModel:              “HGST    H7280A520SUN8.0T”
             physicalFirmware:       PD51
             physicalInsertTime:     2018-05-18T10:52:29-05:00
             physicalInterface:      sas
             physicalSerial:         PPZ47V
             physicalSize:           7.1536639072000980377197265625T
             slotNumber:             5
             status:                 normal

    • Let’s try to create the Cell Disk

    [root@dm01cel03 ~]# cellcli -e create celldisk CD_05_dm01cel03 lun=0_5


    CELL-02526: Pre-existing cell disk: CD_05_dm01cel03


    It says the Cell Disk already exists.

    • Let’s try to create the Grid Disk. To create the Grid Disk with proper size, get the Grid Disk size from a good Cell Disk as shown below.

    [root@dm01cel03 ~]# cellcli -e list griddisk where celldisk=CD_07_dm01cel03 attributes name,size,offset
             DATA_CD_07_dm01cel03       5.6953125T              32M
             DBFS_DG_CD_07_dm01cel03         33.796875G         7.1192474365234375T
             RECO_CD_07_dm01cel03       1.42388916015625T       5.6953582763671875T
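
The offsets from the good cell disk also indicate the order in which the grid disks should be recreated. The following is a minimal sketch, assuming that CellCLI places a grid disk created without an explicit offset at the lowest free offset on the cell disk; under that assumption, recreating the grid disks in ascending offset order reproduces the original layout. The helper names are hypothetical, and the values are copied from the listing above.

```python
# Multipliers to convert CellCLI size suffixes to megabytes.
UNIT_MB = {"M": 1, "G": 1024, "T": 1024 * 1024}


def to_mb(value: str) -> float:
    """Convert a CellCLI size such as '33.796875G' to megabytes."""
    return float(value[:-1]) * UNIT_MB[value[-1].upper()]


def creation_order(griddisks):
    """Return grid disk names sorted by ascending offset."""
    ordered = sorted(griddisks, key=lambda disk: to_mb(disk[2]))
    return [name for name, _size, _offset in ordered]


# (name, size, offset) as reported for the healthy cell disk CD_07_dm01cel03.
good_disk = [
    ("DATA_CD_07_dm01cel03", "5.6953125T", "32M"),
    ("DBFS_DG_CD_07_dm01cel03", "33.796875G", "7.1192474365234375T"),
    ("RECO_CD_07_dm01cel03", "1.42388916015625T", "5.6953582763671875T"),
]

for name in creation_order(good_disk):
    print(name)
# DATA_CD_07_dm01cel03
# RECO_CD_07_dm01cel03
# DBFS_DG_CD_07_dm01cel03
```

This gives the order DATA, RECO, DBFS_DG, which is the order used in the create griddisk steps of this article.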

    • Now create the Grid Disk

    [root@dm01cel03 ~]# cellcli -e create griddisk DATA_CD_05_dm01cel03 celldisk=CD_05_dm01cel03,size=5.6953125T


    CELL-02701: Cannot create grid disk on cell disk CD_05_dm01cel03 because its status is not normal.


    Looks like we can’t create the Grid Disk. We will now drop the Cell Disk and recreate it.

    • Drop Cell Disk

    CellCLI> drop celldisk CD_05_dm01cel03 force
    CellDisk CD_05_dm01cel03 successfully dropped

    • Create Cell Disk

    CellCLI> create celldisk CD_05_dm01cel03 lun=0_5
    CellDisk CD_05_dm01cel03 successfully created

    • Create Grid Disks with proper sizes

    CellCLI> create griddisk DATA_CD_05_dm01cel03 celldisk=CD_05_dm01cel03,size=5.6953125T
    GridDisk DATA_CD_05_dm01cel03 successfully created


    CellCLI> create griddisk RECO_CD_05_dm01cel03 celldisk=CD_05_dm01cel03,size=1.42388916015625T
    GridDisk RECO_CD_05_dm01cel03 successfully created


    CellCLI> create griddisk DBFS_DG_CD_05_dm01cel03 celldisk=CD_05_dm01cel03,size=33.796875G
    GridDisk DBFS_DG_CD_05_dm01cel03 successfully created

    • List Grid Disks

    CellCLI> list griddisk where celldisk=CD_05_dm01cel03 attributes name,size,offset
             DATA_CD_05_dm01cel03       5.6953125T              32M
             DBFS_DG_CD_05_dm01cel03         33.796875G              7.1192474365234375T
             RECO_CD_05_dm01cel03       1.42388916015625T       5.6953582763671875T


    CellCLI> list griddisk
             DATA_CD_00_dm01cel03       active
             DATA_CD_01_dm01cel03       active
             DATA_CD_02_dm01cel03       active
             DATA_CD_03_dm01cel03       active
             DATA_CD_04_dm01cel03       active
             DATA_CD_05_dm01cel03       active
             DATA_CD_06_dm01cel03       active
             DATA_CD_07_dm01cel03       active
             DATA_CD_08_dm01cel03       active
             DATA_CD_09_dm01cel03       active
             DATA_CD_10_dm01cel03       active
             DATA_CD_11_dm01cel03       active
             DBFS_DG_CD_02_dm01cel03    active
             DBFS_DG_CD_03_dm01cel03    active
             DBFS_DG_CD_04_dm01cel03    active
             DBFS_DG_CD_05_dm01cel03    active
             DBFS_DG_CD_06_dm01cel03    active
             DBFS_DG_CD_07_dm01cel03    active
             DBFS_DG_CD_08_dm01cel03    active
             DBFS_DG_CD_09_dm01cel03    active
             DBFS_DG_CD_10_dm01cel03    active
             DBFS_DG_CD_11_dm01cel03    active
             RECO_CD_00_dm01cel03       active
             RECO_CD_01_dm01cel03       active
             RECO_CD_02_dm01cel03       active
             RECO_CD_03_dm01cel03       active
             RECO_CD_04_dm01cel03       active
             RECO_CD_05_dm01cel03       active
             RECO_CD_06_dm01cel03       active
             RECO_CD_07_dm01cel03       active
             RECO_CD_08_dm01cel03       active
             RECO_CD_09_dm01cel03       active
             RECO_CD_10_dm01cel03       active
             RECO_CD_11_dm01cel03       active


    The grid disks now show as active. We can go ahead and add them to the ASM disk groups manually by connecting to the ASM instance.


    • Log into +ASM1 instance and add the new disk.  Set the rebalance power higher (11) to perform faster rebalance operation.

    dm01db01-orcldb1 {/home/oracle}:. oraenv
    ORACLE_SID = [orcldb1] ? +ASM1
    The Oracle base remains unchanged with value /u01/app/oracle
    dm01db01-+ASM1 {/home/oracle}:sqlplus / as sysasm


    SQL*Plus: Release 11.2.0.4.0 Production on Wed May 23 09:30:13 2018
    Copyright (c) 1982, 2013, Oracle.  All rights reserved.


    Connected to:
    Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 – 64bit Production
    With the Real Application Clusters and Automatic Storage Management options


    SQL> alter diskgroup DATA add failgroup dm01CEL03 disk 'o/192.168.10.1;192.168.10.2/DATA_CD_05_dm01cel03' name DATA_CD_05_dm01cel03 rebalance power 11;


    Diskgroup altered.


    SQL> alter diskgroup RECO add failgroup dm01CEL03 disk 'o/192.168.10.1;192.168.10.2/RECO_CD_05_dm01cel03' name RECO_CD_05_dm01cel03 rebalance power 11;


    Diskgroup altered.


    SQL> alter diskgroup DBFS_DG add failgroup dm01CEL03 disk 'o/192.168.10.1;192.168.10.2/DBFS_DG_CD_05_dm01cel03' name DBFS_DG_CD_05_dm01cel03 rebalance power 11;


    Diskgroup altered.


    SQL> select a.name,a.total_mb,a.free_mb,a.type,
        decode(a.type,'NORMAL',a.total_mb/2,'HIGH',a.total_mb/3) avail_mb,
        decode(a.type,'NORMAL',a.free_mb/2,'HIGH',a.free_mb/3) usable_mb,
        count(b.path) cell_disks  from v$asm_diskgroup a, v$asm_disk b
        where a.group_number=b.group_number group by a.name,a.total_mb,a.free_mb,a.type,
        decode(a.type,'NORMAL',a.total_mb/2,'HIGH',a.total_mb/3) ,
        decode(a.type,'NORMAL',a.free_mb/2,'HIGH',a.free_mb/3)
       order by 2,1;


                   Total MB    Free MB          Total MB    Free MB
    Disk Group          Raw        Raw TYPE       Usable     Usable     CELL_DISKS
    ------------ ---------- ---------- ------ ---------- ---------- ----------
    DBFS_DG    2076480    2074688 NORMAL    1038240    1037344         60
    RECO     107500032   57573496 HIGH     35833344   19191165         72
    DATA     429981696  282905064 HIGH    143327232   94301688         72
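
The usable figures in this output follow directly from the decode expressions in the query: raw space is divided by 2 for NORMAL redundancy and by 3 for HIGH redundancy. The following is a minimal sketch of that arithmetic using the values shown above. The function name is hypothetical, and the calculation is the same simplification the query makes: it does not account for the free space ASM needs to restore redundancy after a failure.

```python
# Number of mirror copies ASM keeps per redundancy type.
REDUNDANCY_COPIES = {"NORMAL": 2, "HIGH": 3}


def usable_mb(raw_mb: int, redundancy: str) -> int:
    """Divide raw megabytes by the mirror count, truncated to whole MB."""
    return raw_mb // REDUNDANCY_COPIES[redundancy]


# (disk group, total_mb, free_mb, type) copied from the query output above.
rows = [
    ("DBFS_DG", 2076480, 2074688, "NORMAL"),
    ("RECO", 107500032, 57573496, "HIGH"),
    ("DATA", 429981696, 282905064, "HIGH"),
]

for name, total_mb, free_mb, redundancy in rows:
    print(name, usable_mb(total_mb, redundancy), usable_mb(free_mb, redundancy))
# DBFS_DG 1038240 1037344
# RECO 35833344 19191165
# DATA 143327232 94301688
```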


    SQL> select * from v$asm_operation;


    GROUP_NUMBER OPERA STAT      POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
    ------------ ----- ---- ---------- ---------- ---------- ---------- ---------- ----------- --------------------------------------------
               1 REBAL RUN          11         11      85992    6697959      11260         587
               3 REBAL WAIT         11




    SQL> select * from gv$asm_operation;


    no rows selected




    Conclusion


    In this article we have learned how to create the Celldisk, Griddisks and add the newly created Griddisks to ASM Disk Group. When a hard disk is replaced, the lun, celldisk and griddisks are created automatically and griddisks are added to ASM disk group for you without any manual intervention. In some cases, if the Celldisk and grid disks are not created automatically, then you must manually create them and add them to the ASM disk group.

  • Exadata – Replace Failed Internal USB Drive on Exadata Storage Cell

    While we were patching Exadata storage cells, the patching failed because of a failed internal USB drive on one storage cell.
    Oracle uses the internal USB drive to back up the Exadata storage cell automatically, so we don’t have to back up the storage cell manually.
    In this article I will demonstrate how to replace a failed USB drive on an Exadata storage cell.



    • You will receive an automated smtp alert (if configured) similar to below.
    • You can also use the following command to check for USB drive failure
    [root@dm01cel01 ~]# cellcli -e list alerthistory
             1_1     2018-04-10T18:25:42-05:00       warning         “Internal USB status is not present.  Affected USB Slots : 0”
    • You can also use the following ILOM command to check for USB drive failure
    [root@dm01cel01 ~]# ssh dm01cel01-ilom
    Password: *******
    Oracle(R) Integrated Lights Out Manager

    Version 3.2.10.22.a r121524

    Copyright (c) 2017, Oracle and/or its affiliates. All rights reserved.

    Warning: HTTPS certificate is set to factory default.

    Hostname: dm01cel01-ilom

    -> show /SYS/MB/USB0

    • Open an SR with Oracle if ASR has not already generated one
    • Upload the sundiag.sh output and an ILOM snapshot to the SR for investigation
    • Oracle confirms that the USB drive is faulted
    • Oracle opens a Field task
    • Oracle dispatch team contacts the SR owner with the hardware dispatch details
    • Confirm the Hardware replacement schedule over email and/or SR
    • Schedule the Hardware replacement
    • Oracle FE arrives at the data center with the new USB drive
    • Shutdown the storage cell by following the steps from the MOS below
    Steps to shut down or reboot an Exadata storage cell without affecting ASM (Doc ID 1188080.1)
    • Oracle FE replaces the faulty USB drive and powers up the storage cell
    • Confirm that the USB drive is good
    -> show /SYS/MB/USB0
     /SYS/MB/USB0
        Targets:

        Properties:
            type = USB Port
            fault_state = OK
            clear_fault_action = (none)

        Commands:
            cd
            set
            show

    ->



    [root@dm01cel01 ~]# cellcli -e list alerthistory
             1_2     2018-04-11T02:45:49-05:00       clear           “Internal USB status is back to normal.  Affected USB Slots : 0”
    • You will receive an automated smtp alert (if configured) similar to below that the USB status is back to normal
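
The two alerthistory entries shown in this article (1_1 with severity warning, then 1_2 with severity clear) can also be checked programmatically. The following is a minimal sketch, assuming alert names have the form `<alert id>_<sequence>` so that the entry with the highest sequence number for an id reflects its current state. The function name and the regular expression are hypothetical, and the sample lines are copied from the output above.

```python
import re

# name (id_sequence), timestamp, severity, then the quoted message.
ALERT_LINE = re.compile(r"^\s*(\d+)_(\d+)\s+(\S+)\s+(\S+)\s+(.*\S)\s*$")


def latest_severity(output: str) -> dict:
    """Map alert id -> severity of its highest-numbered sequence entry."""
    latest = {}
    for line in output.splitlines():
        match = ALERT_LINE.match(line)
        if not match:
            continue
        alert_id, seq, _timestamp, severity, _message = match.groups()
        seq = int(seq)
        if alert_id not in latest or seq > latest[alert_id][0]:
            latest[alert_id] = (seq, severity)
    return {alert_id: entry[1] for alert_id, entry in latest.items()}


sample = """
         1_1     2018-04-10T18:25:42-05:00       warning         “Internal USB status is not present.  Affected USB Slots : 0”
         1_2     2018-04-11T02:45:49-05:00       clear           “Internal USB status is back to normal.  Affected USB Slots : 0”
"""

print(latest_severity(sample))  # {'1': 'clear'}
```

Any alert id whose latest severity is not "clear" would still need attention.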



    Conclusion
    In this article we have learned how to replace a faulty USB drive in an Exadata storage cell. Oracle uses the USB drive to back up the storage cell automatically, so we don’t have to back it up manually.