So the POWER6 (which the Apple Network Server 500 is subbing in for) did indeed blow its system backplane. Unfortunately it appears to have taken the RAID 5 array with it. There is data still in the auxiliary battery-backed cache, but the cache directory is apparently hosed and SRN 2505-9050 popped up in the logs, indicating it does not recognize the cache data as belonging to the array. The associated MAP 3131 to resolve it strongly suggests I'm more hosed than a fire truck at a gay pride parade, with ugly words like "data loss" and "delete then recreate array."
Diagnostics shows the disks are there, and recognized as belonging to the array, but the array itself is listed as Failed.
My IBM tech friends aren't sure what to do with it either, but I thought I'd ask here in case anyone is a POWER Systems god. The system was relatively quiescent the day it went bad, so there shouldn't be a LOT in that write cache, mostly log files (the 57B7/8 card pair has a modest 175MB of write cache). This is AIX, so these are all JFS2 file systems.
The way I figure it, /, /usr and /opt haven't seen much action since Obama's first term. They should be clean of writes; there shouldn't be anything in the write cache for those partitions. /tmp is expendable. /home is backed up daily, so I can restore it. The logs in /var are almost certainly toast, but the database hasn't been written to in several weeks, mail is backed up several times a day, and the web server and gopher server are backed up daily and weekly respectively. I don't care about the journal volume or the paging volume, since I already assume those are fried.
So, given this, my thought is to just reclaim the write cache (wiping whatever is in it), then fsck and hope for the best. The drives still organize themselves into an array, just a failed one. The journaling should keep the file systems in a sane state, even if I've lost some writes.
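For what it's worth, here is that triage written out as a quick Python sketch. Nothing in it touches the machine. The category names and action strings are just my own labels for the reasoning above (they are not AIX commands), and the "journal volume" and "paging volume" keys are descriptive placeholders, not real device names.

```python
# Triage table for the recovery plan described above.
# Categories and actions are illustrative labels only; no disk is touched.
CLEAN = "clean"            # no recent writes expected in the write cache
EXPENDABLE = "expendable"  # contents can simply be thrown away
BACKED_UP = "backed_up"    # can be restored from a recent backup
LOSSY = "lossy"            # some recent data assumed lost

VOLUMES = {
    "/": CLEAN,
    "/usr": CLEAN,
    "/opt": CLEAN,
    "/tmp": EXPENDABLE,
    "/home": BACKED_UP,
    "/var": LOSSY,
    "journal volume": EXPENDABLE,  # placeholder name, assumed fried
    "paging volume": EXPENDABLE,   # placeholder name, assumed fried
}

ACTIONS = {
    CLEAN: "fsck",
    EXPENDABLE: "recreate",
    BACKED_UP: "fsck, restore from backup if it fails",
    LOSSY: "fsck, accept missing log entries",
}


def action_for(volume):
    """Return the planned action label for one volume."""
    return ACTIONS[VOLUMES[volume]]


def plan():
    """Return (volume, action) pairs in the order listed above."""
    return [(vol, action_for(vol)) for vol in VOLUMES]


if __name__ == "__main__":
    for vol, act in plan():
        print(f"{vol:16} {act}")
```

If fsck fails on anything in the "clean" group, the whole plan falls over and it's back to a rebuild from backups.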
Or do you think I'll be rebuilding the server from scratch and backups?
smit happens.
bigred, 900MHz R16K, 4GB RAM, V12 DCD, 6.5.30
indy, 150MHz R4400SC, 256MB RAM, XL24, 6.5.10
purplehaze, R10000, Solid IMPACT
probably posted from bruce, Quad 2.5GHz PowerPC 970MP, 16GB RAM, Mac OS X 10.4.11
plus IBM POWER6 p520 * Apple Network Server 500 * HP C8000 * BeBox * Solbourne S3000 * Commodore 128 * many more...