Thursday, December 30, 2004

sgi shutdown command

Sun:

init 0

SGI:

shutdown -i0 -g0 -y -p

(0 is a zero, not the letter o)
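
Flag breakdown, from memory (verify against shutdown(1M) on your IRIX release):

-i0   go to init state 0 (halt)
-g0   grace period of zero seconds before the shutdown starts
-y    assume yes at the confirmation prompts
-p    power the machine off after the halt, if the hardware supports it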

SGI: you can also shut down from the desktop.
Be sure to check the shutdown order from the Octane to the other Octane2 against the sequence map before powering anything down.

Tuesday, December 21, 2004

Friday inspection to check array record

Last Friday we worked on the TP9100 failure problem. Below is a brief report.

1. Problem description
As the customer reported, a TP9100 was failing with a disk error (amber LED) and the filesystems on the device were not available. After power-cycling the host system (an Octane) and the TP9100, all the TP9100 disks showed amber LEDs and the filesystems were still not available.

2. Work Procedure

2.1 On site, checked the TP9100 with the management tool, WAM. Some RAID disks were marked offline, some were unconfigured, and the LUN was in a "not usable" state.
2.2 Power-cycled the TP9100. All disks then showed green LEDs and the RAID controller was OK, but the LUN / filesystem was still unavailable.
2.3 Rebuilt the COD (configuration-on-disk) information on the RAID device. The LUN was rebuilt and the filesystem could be mounted as normal, but xfs_repair and xfs_check failed with fatal errors that caused a system panic and hang (see the example invocations after this list). At this stage I can say that all data on the TP9100 was corrupted; the RAID 5 could not be recovered.
2.4 Re-created the filesystems with mkfs; they then mounted OK.
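
Roughly what the checks in 2.3 look like (a sketch: the device name follows the dks3d0s7 example used later in these notes, the filesystem is unmounted first, and the tools are pointed at the raw device):

# umount /local/lfs1
# xfs_check /dev/rdsk/dks3d0s7
# xfs_repair /dev/rdsk/dks3d0s7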

3. Review

A RAID 5 device has fault tolerance: it can survive one disk failure without interrupting regular access, and a hot-spare disk gives you extra protection. At the CACT Group site there are 8 disks in the TP9100: 6 disks were combined into one RAID 5 device with 2 LUNs built on it, one disk was assigned as the hot spare, and the last disk was unused, marked as 'not stable'. Now we know that, unfortunately, the hot-spare disk was itself unstable or defective; in the GAM tool you can see that this drive's capacity is 0 MB. So, with two disks failed, the RAID 5 device was corrupted and the data on it was lost. In the end we had to re-create the filesystems to make the array usable again.
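
To put numbers on it (generic, since I did not note the drive size): RAID 5 across n member disks of size S gives a usable capacity of (n - 1) x S, so the 6 disks here stored 5 disks' worth of data plus distributed parity. The array survives exactly one member failure, after which the failed disk is rebuilt onto the hot spare; with the spare already dead (0 MB) there was no rebuild target, so the next failure was fatal to the whole RAID 5.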

4. Recommendations

4.1 Watch the temperature in your machine room. The operating range for the TP9100 is 10 °C to 40 °C, but keep in mind that hard disks are far more likely to fail at high temperatures.
4.2 Check the system logs regularly, not only on the Octane but also on the TP9100; even a visual inspection is good for catching hardware failures (see the log-check sketch after this list).
4.3 Replace the failing disk.
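
A minimal log check on the IRIX side (a sketch; the pattern is just my guess at what disk trouble looks like in SYSLOG):

# grep -i -e dks -e scsi -e error /var/adm/SYSLOG | tail -50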

SGI disk array manager

# cd /opt/mylex/wam

# ./run_wam &

This starts the Web Array Manager (WAM).


1. Display the logical disks
# df -k
2. Unmount the filesystem
# umount /dev/dsk/dks3d0s7
3. Show all devices
# hinv
Check controller 3.
4. Delete the LUNs and rebuild them (procedures in the mail below).
Make sure in step 8 of the building procedure: 30000 MB RAID 5, initialized, with write cache.
5. Partition the new LUNs
# hinv
# fx
ctlr# = (0) 3 <enter>
drive# = (1) 0 <enter>
lun# = (0) <enter>    (change to 1 for the next logical disk)
r for repartition
o for optiondrive (whole disk as the option partition)
Continue? yes <enter>
.. back to the upper menu
..
Ctrl+C to break out
6. Make the filesystems
# ls /dev/dsk
# mkfs /dev/dsk/dks3d0s7
7. Mount the logical disks
# mount -t xfs /dev/dsk/dks3d0s7 /local/lfs1
# mount -t xfs /dev/dsk/dks3d1s7 /local/lfs2
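
To make the mounts persist across reboots, the /etc/fstab entries would look something like this (a sketch; the raw= option and the lfs2 mount point are my assumptions, so check fstab(4) on your IRIX release before copying):

/dev/dsk/dks3d0s7  /local/lfs1  xfs  rw,raw=/dev/rdsk/dks3d0s7  0 0
/dev/dsk/dks3d1s7  /local/lfs2  xfs  rw,raw=/dev/rdsk/dks3d1s7  0 0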


-----Original Message-----
From: alog [mailto:alog@sgi.com]
Sent: Monday, December 20, 2004 5:57 PM
To: Wang,Dali
Subject: Deleting & building LUNs.
FYI.
Deleting LUNs
1) Right mouse click on the controller that is listed in the
system view.
2) Left mouse click on configure.
3) Select the custom and edit configuration.
4) Select the logical drive (next button).
5) Select delete last until you reach the required LUN.
6) Select the drive spares menu (next button).
7) Select next until the LUN mapping window appears.
Select OK.
A warning appears that states you are about to delete a LUN.
Select yes.
When a second window appears, select yes and OK; the
controller reboots.

Building LUNs
1) Right mouse click on the controller listed in system view.
2) Left mouse click on configure.
3) Select custom and new configuration.
This displays a window that lists the drives that are
available to configure.
4) Click on new array. This displays a window where you can
select the drives.
5) Click on the drives that you want in each array.
6) When you have the desired number of drives selected, click
new array and repeat Steps 4 and 5 for all arrays.
7) When you complete the selection, enter NEXT. The logical drive
window will appear on the screen.
8) From the logical drive window, select the RAID level and
stripe-block size; then initialize and write the cache
(if desired).
Enter add; the name increments by one.
Select the next RAID type; then enter add.
Repeat this process until all RAIDs have been added.
9) Enter next; the spare drive window appears on the screen.
Click on each drive that you want to add as a hot spare.
10) Click next; the LUN mapping window appears on the screen.
Click on set sequential.
11) Set Topology to Multiport; then click on OK.
12) The controller resets and a window appears that displays
the LUN percentage that is being bound.
Note: Step 12 takes a long time because parity is built across
each LUN.
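
Once the controller is back up, I would verify from the IRIX side before repartitioning (a sketch reusing commands from the steps above):

# hinv -c disk    (the new LUNs should appear on controller 3)
# fx              (then repartition as in step 5 above)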