From: Todd Lyons <todd@mrball.net>
To: reiserfs-list@namesys.com
Subject: rebuild tree on 500 GB partition (resend)
Date: Tue, 29 Oct 2002 10:21:38 -0800
Message-ID: <20021029182138.GB9168@mrball.net>

I am sending this again because the first one seems to have been
dropped.

Due to an apparent (partial) hardware failure on a 500 Gig RAID storage
unit, we're starting a rebuild-tree on a single 500 Gig partition.  This
is the last hope before a restore from tape :(.

The sequence of events getting to this point is a little fubar, but let
me recap:

1) Site has one 500 Gig unit in service with 8 drives in a RAID 5 array,
no hot spares, hereafter referred to as Unit A.  Site has another 500
Gig unit with 8 drives in a RAID 5 array, no hot spares, hereafter
referred to as Unit B.  Both have a SCSI UltraWide 160 interface that
plugs into the host computer.
2) Drive 3 died on Unit A.  Running in degraded array mode, i.e. no
fault tolerance.
3) Hotswap of a spare blank drive failed due to a firmware bug (known
issue due to age of unit).  Still running ok in degraded array mode.
4) End of the day: power everything down, replace bad drive with new
drive, power up.  Unit still doesn't see the new drive, but is still
running ok in degraded array mode.
5) As a test, pull all drives from Unit A and insert in Unit B.  New
drive 3 still rejected.
6) Put all drives back in Unit A.  Both units now in original
configuration.
7) Plan is to connect both units to the SCSI bus and copy from Unit A to
Unit B.
8) Re-init Unit B array.  Took 3 hours.
9) Unit A is set to ID 3 on the SCSI bus.  Unit B was at 3, changed it
to ID 4.  Disabled termination on Unit A and connected Unit B using a
cable.  Powered up.  Got parity errors with both connected.  Unable to
mount second unit.
10) Power down.  Disconnect Unit A.  Only connected to Unit B.  Make
single 500 Gig partition.  Format with reiserfs.
11) Unit A won't power up.  Reports configuration error.  Something is
screwed in its internal configuration.
12) Pull all drives from Unit A and insert in Unit B.  Powers up, says
all drives are ok.  Mount the partition, but getting permission denied
when trying to list directory contents or cd into directories.  At this
point, it becomes apparent that fs corruption is present.
13) reiserfsprogs-3.0xj (2001) or something like that was installed.
Its reiserfsck would segfault.
14) Download reiserfsprogs-3.6.4, compile, install.  reiserfsck said to
use rebuild-tree.
15) Started rebuild-tree, logging to a log file.  Will send log file to
the list if requested.
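
For anyone wondering why step 2 means "no fault tolerance", here is a
minimal Python sketch of single-parity (RAID 5 style) reconstruction.
It is an illustration only, not how either unit's firmware works: the
block contents are made up, the helper name xor_blocks is hypothetical,
and real RAID 5 rotates parity across the drives rather than keeping it
on one.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe across 8 drives: 7 data blocks plus 1 parity block.
# (Made-up contents; parity kept on the last drive for simplicity.)
data = [bytes([i] * 4) for i in range(1, 8)]
parity = xor_blocks(data)
stripe = data + [parity]

# Degraded mode: "drive 3" (index 2) is gone. Its block can still be
# recomputed from the 7 surviving blocks.
survivors = [blk for i, blk in enumerate(stripe) if i != 2]
rebuilt = xor_blocks(survivors)
assert rebuilt == stripe[2]

# Lose a second drive (index 5) while degraded: the XOR of the remaining
# 6 blocks only gives the XOR of the two missing blocks, so neither one
# can be recovered on its own.
two_lost = [blk for i, blk in enumerate(stripe) if i not in (2, 5)]
mixed = xor_blocks(two_lost)
assert mixed != stripe[2] and mixed != stripe[5]
```

So a degraded array keeps serving data, but any further failure or bad
write during that window cannot be corrected from parity.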

It seems to me that somewhere during step 9, Unit A started corrupting
data on the disks, causing the need for rebuild-tree.

I'm not mentioning any brand names because I don't feel it's fair at
this point.  Both units have performed flawlessly up to this point.

UPDATE:
The rebuild-tree completed.  Our run time was nearly identical to the
other gentleman's, about 8.5 hours.  It seems to have recovered
everything that our users can see.  In addition, there are 3.454 Gigs
worth of files in lost+found which we are combing through right now.
The log file is a couple of megs long, so it's not useful to post it to
the list, but I can make it available for download somewhere if need be.
Let me know.
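
For a rough sense of scale, the run time can be turned into an average
rate.  This is a back-of-the-envelope sketch only: it assumes decimal
gigabytes and treats the run as if the whole 500 Gig partition were
read exactly once, which is not necessarily how rebuild-tree spends its
time.

```python
# Assumptions (not measurements): 500 decimal GB, read once, 8.5 hours.
partition_bytes = 500 * 10**9
elapsed_seconds = 8.5 * 3600

rate_mb_s = partition_bytes / elapsed_seconds / 10**6
print(f"average rate: {rate_mb_s:.1f} MB/s")  # prints "average rate: 16.3 MB/s"
```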

The difference in quality between the old 3.0x series and the current
3.6.4 is AMAZING.  The new stuff is very well written and performs
tremendously well.
-- 
Blue skies...	Todd                   http://www.mrball.net
        Public key:  http://www.mrball.net/todd.asc
        Signing an email is like wearing underwear.  
  You don't have to, but it's a really good idea to do it.
   Linux kernel 2.5.44   2 users,  load average: 0.00, 0.00, 0.00
