* rebuild tree on 500 GB partition (resend)
@ 2002-10-29 18:21 Todd Lyons
2002-10-31 8:05 ` Oleg Drokin
From: Todd Lyons @ 2002-10-29 18:21 UTC
To: reiserfs-list
I am sending this again because the first one seems to have been
dropped.
Due to an apparent (partial) hardware failure on a 500 Gig RAID storage
unit, we're starting a rebuild-tree on a single 500 Gig partition. This
is the last hope before a restore from tape :(.
The sequence of events getting to this point is a little fubar, but let
me recap:
1) Site has one 500 Gig unit in service with 8 drives in a RAID 5 array,
no hot spares, hereafter referred to as Unit A. Site has another 500
Gig unit with 8 drives in a RAID 5 array, no hot spares, hereafter
referred to as Unit B. Both have a SCSI UltraWide 160 interface that
plugs into the host computer.
2) Drive 3 died on Unit A. Running in degraded array mode, i.e. no fault
tolerance.
3) Hotswap of a spare blank drive failed due to a firmware bug (known
issue due to age of unit). Still running ok in degraded array mode.
4) End of the day: power everything down, replace bad drive with new
drive, power up. Unit still doesn't see the new drive, but is still
running ok in degraded array mode.
5) As a test, pull all drives from Unit A and insert in Unit B. New
drive 3 still rejected.
6) Put all drives back in Unit A. Both units now in original
configuration.
7) Plan is to connect both units to the SCSI bus and copy from Unit A to
Unit B.
8) Re-init Unit B array. Took 3 hours.
9) Unit A is set to ID 3 on the SCSI bus. Unit B was at 3, changed it
to ID 4. Disabled termination on Unit A and connected Unit B using a
cable. Powered up. Got parity errors with both connected. Unable to
mount second unit.
10) Power down. Disconnect Unit A. Only connected to Unit B. Make
single 500 Gig partition. Format with reiserfs.
11) Unit A won't power up. Reports configuration error. Something is
screwed in its internal configuration.
12) Pull all drives from Unit A and insert in Unit B. Powers up, says
all drives are ok. Mount the partition, but getting permission denied
when trying to list directory contents or cd into directories. At this
point, it becomes apparent that fs corruption is present.
13) reiserfsprogs-3.0xj (2001) or something like that was installed.
Would segfault with reiserfsck.
14) Download reiserfsprogs-3.6.4, compile, install. reiserfsck said to
use rebuild-tree.
15) Started rebuild-tree, logging to a log file. Will send log file to
the list if requested.
It seems to me that somewhere during step 9, Unit A started corrupting
data on the disks, causing the need for rebuild-tree.
I'm not mentioning any brand names because I don't feel it's fair at
this point. Both units have performed flawlessly up to this point.
UPDATE:
The rebuild tree completed. Our results were nearly identical to the
other gentleman's results, about 8.5 hours. It seems to have recovered
everything that our users can see. In addition, there is 3.454 Gigs
worth of files in lost+found which we are culling through right now.
The log file is a couple of megs long, so it's not useful to post it to
the list, but I can make it available for download somewhere if need be.
Let me know.
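For anyone else culling through a large lost+found after a rebuild-tree,
here is a minimal sketch of one way to tally what landed there. It is an
illustration only, not something used in the recovery above: the function
name summarize_tree and the default path "lost+found" are assumptions, and
it only reports sizes, it does not identify or move anything.

```python
import os
import stat
import sys


def summarize_tree(root):
    """Walk root and return (file_count, total_bytes, sizes).

    sizes is a list of (size_in_bytes, path) for regular files,
    sorted largest first.
    """
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # unreadable entry; skip it rather than abort
            if stat.S_ISREG(st.st_mode):
                sizes.append((st.st_size, path))
    sizes.sort(reverse=True)
    total = sum(size for size, _path in sizes)
    return len(sizes), total, sizes


if __name__ == "__main__":
    # Hypothetical default; pass the real mount point's lost+found instead.
    root = sys.argv[1] if len(sys.argv) > 1 else "lost+found"
    count, total, sizes = summarize_tree(root)
    print(f"{count} files, {total} bytes under {root}")
    for size, path in sizes[:20]:  # show the 20 largest files first
        print(f"{size:>14d}  {path}")
```

Sorting largest first is just a guess at a useful order: the big files
account for most of the space, so they are worth identifying first.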
The difference in quality between the old 3.0x series and the current
3.6.4 is AMAZING. The new stuff is very well written and performs
tremendously.
--
Blue skies... Todd http://www.mrball.net
Public key: http://www.mrball.net/todd.asc
Signing an email is like wearing underwear.
You don't have to, but it's a really good idea to do it.
Linux kernel 2.5.44 2 users, load average: 0.00, 0.00, 0.00
* Re: rebuild tree on 500 GB partition (resend)
From: Oleg Drokin @ 2002-10-31 8:05 UTC
To: reiserfs-list
Hello!
On Tue, Oct 29, 2002 at 10:21:38AM -0800, Todd Lyons wrote:
> The rebuild tree completed. Our results were nearly identical to the
> other gentleman's results, about 8.5 hours. It seems to have recovered
> everything that our users can see. In addition, there is 3.454 Gigs
Good to hear that.
> The difference in quality between the old 3.0x series and the current
> 3.6.4 is AMAZING. The new stuff is very well written and performs
> tremendously.
Thank you.
And thank you for the report, too.
Bye,
Oleg