From: Harry Mangalam <hjm@tacgi.com>
To: linux-raid@vger.kernel.org
Subject: problems with 3ware 8506-8 post-disk failure
Date: Mon, 16 May 2005 15:19:06 -0700
Message-ID: <200505161519.06783.hjm@tacgi.com>
In-Reply-To: <1116257122.15450.6.camel@langvan2.homenetwork>
Scenario:
dual opteron/4G/Ubuntu pure 64bit SMP / OS on separate IDE drive, 3ware
8506-8port driving 8x WD2500JD disks in Chenbro hotswap cages as RAID5,
config'ed as both reiserfs (pre-catastrophe) and ext3 (post-catastrophe).
I'm responsible for getting this system up (done) and reliable (not done).
The short version is that it ran well for a few weeks until we discovered on a
reboot that a disk had silently failed, degrading the RAID5. In trying to
repair that failure, 3ware's 3dm2 software indicated that it was
repairing the array, but failed to do so, causing the loss of the entire
array. I tried to rescue the data with reiserfs's fsck but was only able to
recover individual chunks. Since most of the info was huge binary files and
most of it was backed up elsewhere, we decided not to attempt to rescue
anything and we re-formatted with ext3, supposedly bc it was considered more
reliable and better suited for large files. After that, the raid stayed up
for a day or so and I loaded it down with huge disk i/o, trying to see what
would happen. The same port / disk # failed again (tho at least this time
the SW notified us), and it seems pretty suspicious that it's the same port
number failing.
I played around with the motherboard Silicon Image 4port SATA controller and
sw raid (via mdadm) for a while and found that after a certain amount of
futzing, it looked not too bad, but the amount of futzing made me a bit
nervous, especially since someone else is going to have to care for it. The
speed of the SW RAID was about 10-20% better than the 3ware by bonnie++, but
I liked the idea of having the RAID look like one big SCSI disk. So I went for
the 3ware.
I'll detail the complete catastrophe later (already written up in large chunks
- just have to remove some inflammatory language before posting), but my
question to the group is what people think of 3ware's support. The common
opinion on 3ware seems to be that it's great that they support Linux and the
HW works fine (also my experience), but my opinion has been shaded
considerably by what happens when a RAID fails - when you really DO need to
recover and you need a straightforward path to do so.
In short, I've found 3ware support for recovery procedures to be hard to find
(via google for example and also on their website), hard to understand
because of some peculiar nomenclature, and sometimes misleading due to
oddities of their software.
Is this just my experience, or is this a widely held view? I realize that I'm
talking to a group that seems to be heavily weighted towards SW RAID, but
maybe it's just me. If anyone can compare recovery paths between the 2 (SW
vs 3ware HW) I'd be very happy to hear the stories. Given this recent
experience, I'm re-evaluating whether I should switch back and go SW RAID,
especially given another large catastrophe involving 3ware controllers on
campus.
Have people found that the Chenbro hotswap cages are a contributing factor to
RAID failure? That's what one 3ware person indicated.
--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm@tacgi.com
<<plain text preferred>>