Re: RAID5 with 2 drive failure at the same time


From: Christoph Nelles <evilazrael@evilazrael.de>
To: linux-raid@vger.kernel.org
Subject: Re: RAID5 with 2 drive failure at the same time
Date: Sun, 10 Feb 2013 21:48:51 +0100
Message-ID: <511807B3.3070105@evilazrael.de>
In-Reply-To: <20130203215934.GB18805@cthulhu.home.robinhill.me.uk>

Hello ML,

Thanks, Chris, Phil & Robin. You helped me a lot.

After replacing the Marvell controller with an LSI SAS2008-based
controller (IBM M1015 flashed to 9211-IT), the RAID was rebuilt
successfully and is running clean and stable. So the cause of the
problems was one HDD with UREs plus the unstable Marvell controller. My
next steps are moving to RAID6 with a bigger chunk size and scrubbing
the RAID periodically.
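
A minimal sketch of what triggering such a scrub could look like, assuming
the array is md0 and the kernel exposes the usual md sysfs files
(sync_action and mismatch_cnt). The function names and the SYSFS_ROOT
override are hypothetical, added only so the helpers can be dry-run
against a scratch directory:

```shell
# Hypothetical helpers; SYSFS_ROOT defaults to /sys and exists only for dry runs.

# Ask md to read every stripe and verify parity (a "check" scrub).
md_scrub_start() {
    dev=$1
    root=${SYSFS_ROOT:-/sys}
    echo check > "$root/block/$dev/md/sync_action"
}

# Print the mismatch count that md recorded during the last check.
md_mismatch_count() {
    dev=$1
    root=${SYSFS_ROOT:-/sys}
    cat "$root/block/$dev/md/mismatch_cnt"
}
```

Run as root, `md_scrub_start md0` would start the check, and
`md_mismatch_count md0` could be read once it has finished. Scheduling
(for example a monthly cron job) is left out of this sketch.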

I have one last question. I am wondering why reading a huge file from
the XFS filesystem on the array is faster than reading the raw md0
device. Does anybody have an explanation for that?

9-drive RAID5, chunk size 64 KiB, XFS filesystem not optimized:
# echo 3 > /proc/sys/vm/drop_caches
# dd if=dummy.file of=/dev/null bs=1M count=100k
102400+0 records in
102400+0 records out
107374182400 bytes (107 GB) copied, 211.467 s, 508 MB/s

# echo 3 > /proc/sys/vm/drop_caches
# dd if=/dev/md0 of=/dev/null bs=1M count=100k
102400+0 records in
102400+0 records out
107374182400 bytes (107 GB) copied, 263.738 s, 407 MB/s

# echo 3 > /proc/sys/vm/drop_caches
# dd if=/dev/md0 of=/dev/null bs=64k count=1600k
1638400+0 records in
1638400+0 records out
107374182400 bytes (107 GB) copied, 253.76 s, 423 MB/s

# echo 3 > /proc/sys/vm/drop_caches
# dd if=/dev/md0 of=/dev/null bs=512k count=200k
204800+0 records in
204800+0 records out
107374182400 bytes (107 GB) copied, 260.837 s, 412 MB/s

# echo 3 > /proc/sys/vm/drop_caches
# dd if=/dev/md0 of=/dev/null bs=576k count=200k
204800+0 records in
204800+0 records out
120795955200 bytes (121 GB) copied, 296.567 s, 407 MB/s
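
For reference, the block sizes above relate to the array geometry as
follows: with 9 drives in RAID5, each stripe holds 8 data chunks plus one
parity chunk, so bs=512k matches one full stripe of data and bs=576k is
9 chunks. A small sketch that recomputes this and the throughput figures
dd printed; the function names are hypothetical, and MB here means 10^6
bytes, as in dd's output:

```shell
# Hypothetical helper: KiB of data in one full RAID5 stripe
# (one chunk per stripe is parity, so only drives-1 chunks carry data).
full_stripe_kib() {
    drives=$1
    chunk_kib=$2
    echo $(( (drives - 1) * chunk_kib ))
}

# Hypothetical helper: throughput in MB/s (10^6 bytes) from bytes and seconds.
mb_per_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.0f\n", b / s / 1000000 }'
}

full_stripe_kib 9 64              # prints 512
mb_per_s 107374182400 211.467     # file on XFS:      prints 508
mb_per_s 120795955200 296.567     # md0 with bs=576k: prints 407
```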

Once again, thanks for all the help.

Kind Regards

Christoph

On 03.02.2013 22:59, Robin Hill wrote:
> On Sun Feb 03, 2013 at 04:56:35 +0100, Christoph Nelles wrote:
> 
>> Hi folks,
>>
>> the dd_rescue to the new HDD took 14 hours. It looks like ddrescue is not
>> reading and writing in parallel. In the end, 8 KB couldn't be read after
>> 10 retries.
>>
> Note that there's a difference between dd_rescue and ddrescue. GNU
> ddrescue seems to be the better option nowadays.
> 
>> I just force-assembled the RAID with the new drive, but it failed almost
>> immediately with a WRITE FPDMA QUEUED error on one of the other drives
>> (sdj, formerly sdi). I immediately tried again, and this time one disk
>> was rejected but the RAID started on 8 devices. However, xfs_repair
>> failed when one of the disks failed with a READ FPDMA QUEUED error :(
>> and md expelled the disk from the RAID.
>>
>> It looks more like a controller problem, as the messages coming from
>> the drives on the PCIe Marvell all include the line
>> ataXX: illegal qc_active transition (00000002->00000003)
>> I found only one similar report about that problem:
>> http://marc.info/?l=linux-ide&m=131475722021117
>>
>> Any recommendations for a decent and affordable SATA Controller with at
>> least 4 ports and faster than PCIe x1? Looks like there are only
>> Marvells and more expensive Enterprise RAID controllers.
>>
> 
> I can recommend the Intel RS2WC080 (or any other LSI SAS2008 based
> controller). Quite frankly, any SAS controller is almost certainly
> going to be better than the SATA equivalent (and for not a huge amount
> more), while still supporting standard SATA drives.
> 
> Cheers,
>     Robin


-- 
Christoph Nelles

E-Mail    : evilazrael@evilazrael.de
Jabber    : eazrael@evilazrael.net      ICQ       : 78819723

PGP-Key   : ID 0x424FB55B on subkeys.pgp.net
            or http://evilazrael.net/pgp.txt


