From: Neil Brown <neilb@suse.de>
To: Michael Sallaway <michael@sallaway.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: 3-way mirrors
Date: Wed, 8 Sep 2010 16:40:38 +1000
Message-ID: <20100908164038.3067cc6f@notabene>
In-Reply-To: <20100908061616.31334.qmail@s217.sureserver.com>

On Wed, 08 Sep 2010 06:16:16 +0000
"Michael Sallaway" <michael@sallaway.com> wrote:
>
> > -------Original Message-------
> > From: Neil Brown <neilb@suse.de>
> > To: Michael Sallaway <michael@sallaway.com>
> > Cc: linux-raid@vger.kernel.org
> > Subject: Re: 3-way mirrors
> > Sent: 08 Sep '10 06:02
> >
> > Hmm.... Drive B shouldn't be ejected from the array for a read error. md
> > should calculate the data for both A and B from the other devices and then
> > write that to A and B.
> > If the write fails, only then should it kick B from the array. Is that what
> > is happening?
> >
> > i.e. do you see messages like:
> > read error corrected
> > read error not correctable
> > read error NOT corrected
> >
> > in the kernel logs??
>
>
> The logs for the relevant section are below, at the bottom -- it's a "read error not correctable". So I'm guessing it's also failing a write, although I can't see the ATA error handling mentioning any writes -- it all looks like reads??

Yes, it is just reads.

It looks like you have an ancient kernel - older than April 2010 :-)
A patch went into 2.6.35 (and, I think, some 2.6.34.y releases) which fixed a
bug that caused md to drop devices from a degraded RAID6 when it could have
fixed the read error: commit 7b0bb5368a719.

So a newer kernel might fix your problem for you.

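As a rough way to check a machine against the release mentioned above, the following sketch compares a `uname -r` style string with 2.6.35. The function names are hypothetical, and the check only looks at the mainline version number: it cannot tell whether a 2.6.34.y stable kernel or a distribution kernel carries a backport of the fix, so treat a negative result as "check the changelog", not as proof.

```python
import platform
import re


def kernel_version(release):
    """Parse the leading numeric part of a `uname -r` string into a 3-tuple."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing third component (e.g. "3.0") is treated as 0.
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def mainline_has_fix(release, fixed_in=(2, 6, 35)):
    """True if the release is at least the mainline version named in the mail.

    Assumption: backports to stable or vendor kernels are NOT detected.
    """
    return kernel_version(release) >= fixed_in


if __name__ == "__main__":
    rel = platform.release()
    print(rel, "->", "2.6.35 or newer" if mainline_has_fix(rel) else "older than 2.6.35")
```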
>
>
> > If the write is failing, then you want my bad-block-log patches - only they
> > aren't really finished yet and certainly aren't tested very well. I really
> > should get back to those.
>
> Interesting -- I'm not familiar with them, where would I find these patches? And what would they do -- just allow the bad blocks (even on writes), and keep the drive in the array? That's all I'm really after, in this case, I think.

I posted them to the list for review a few months ago and haven't got back to
them:

   http://www.spinics.net/lists/raid/msg28813.html

I wouldn't recommend using them until they've seen more review and testing.

NeilBrown

>
> Thanks!
> Michael
>
>
>
> Syslog from the failure of the first drive:
>
> Sep 7 09:31:24 lechuck kernel: [51912.039892] ata13.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x0
> Sep 7 09:31:24 lechuck kernel: [51912.048227] ata13.00: irq_stat 0x40000008
> Sep 7 09:31:24 lechuck kernel: [51912.056685] ata13.00: failed command: READ FPDMA QUEUED
> Sep 7 09:31:24 lechuck kernel: [51912.065055] ata13.00: cmd 60/d8:08:00:20:d9/00:00:5d:00:00/40 tag 1 ncq 110592 in
> Sep 7 09:31:24 lechuck kernel: [51912.065061] res 51/40:35:a3:20:d9/00:00:5d:00:00/40 Emask 0x409 (media error) <F>
> Sep 7 09:31:25 lechuck kernel: [51912.098113] ata13.00: status: { DRDY ERR }
> Sep 7 09:31:25 lechuck kernel: [51912.106705] ata13.00: error: { UNC }
> Sep 7 09:31:25 lechuck kernel: [51912.128027] ata13.00: configured for UDMA/133
> Sep 7 09:31:25 lechuck kernel: [51912.128054] ata13: EH complete
> Sep 7 09:31:28 lechuck kernel: [51915.216232] ata13.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x0
> Sep 7 09:31:28 lechuck kernel: [51915.224757] ata13.00: irq_stat 0x40000008
> Sep 7 09:31:28 lechuck kernel: [51915.233283] ata13.00: failed command: READ FPDMA QUEUED
> Sep 7 09:31:28 lechuck kernel: [51915.241660] ata13.00: cmd 60/d8:38:00:20:d9/00:00:5d:00:00/40 tag 7 ncq 110592 in
> Sep 7 09:31:28 lechuck kernel: [51915.241662] res 41/40:35:a3:20:d9/00:00:5d:00:00/40 Emask 0x409 (media error) <F>
> Sep 7 09:31:28 lechuck kernel: [51915.275603] ata13.00: status: { DRDY ERR }
> Sep 7 09:31:28 lechuck kernel: [51915.284267] ata13.00: error: { UNC }
> Sep 7 09:31:28 lechuck kernel: [51915.305722] ata13.00: configured for UDMA/133
> Sep 7 09:31:28 lechuck kernel: [51915.305746] ata13: EH complete
> Sep 7 09:31:30 lechuck kernel: [51917.992164] ata13.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x0
> Sep 7 09:31:30 lechuck kernel: [51918.000791] ata13.00: irq_stat 0x40000008
> Sep 7 09:31:30 lechuck kernel: [51918.009631] ata13.00: failed command: READ FPDMA QUEUED
> Sep 7 09:31:30 lechuck kernel: [51918.018303] ata13.00: cmd 60/d8:08:00:20:d9/00:00:5d:00:00/40 tag 1 ncq 110592 in
> Sep 7 09:31:30 lechuck kernel: [51918.018305] res 41/40:35:a3:20:d9/00:00:5d:00:00/40 Emask 0x409 (media error) <F>
> Sep 7 09:31:30 lechuck kernel: [51918.054117] ata13.00: status: { DRDY ERR }
> Sep 7 09:31:30 lechuck kernel: [51918.062808] ata13.00: error: { UNC }
> Sep 7 09:31:30 lechuck kernel: [51918.084521] ata13.00: configured for UDMA/133
> Sep 7 09:31:30 lechuck kernel: [51918.084547] ata13: EH complete
> Sep 7 09:31:33 lechuck kernel: [51920.956122] ata13.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x0
> Sep 7 09:31:33 lechuck kernel: [51920.964858] ata13.00: irq_stat 0x40000008
> Sep 7 09:31:33 lechuck kernel: [51920.973829] ata13.00: failed command: READ FPDMA QUEUED
> Sep 7 09:31:33 lechuck kernel: [51920.982587] ata13.00: cmd 60/d8:38:00:20:d9/00:00:5d:00:00/40 tag 7 ncq 110592 in
> Sep 7 09:31:33 lechuck kernel: [51920.982589] res 41/40:35:a3:20:d9/00:00:5d:00:00/40 Emask 0x409 (media error) <F>
> Sep 7 09:31:33 lechuck kernel: [51921.017401] ata13.00: status: { DRDY ERR }
> Sep 7 09:31:33 lechuck kernel: [51921.026134] ata13.00: error: { UNC }
> Sep 7 09:31:33 lechuck kernel: [51921.048656] ata13.00: configured for UDMA/133
> Sep 7 09:31:33 lechuck kernel: [51921.048680] ata13: EH complete
> Sep 7 09:31:37 lechuck kernel: [51924.153414] ata13.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x0
> Sep 7 09:31:37 lechuck kernel: [51924.162178] ata13.00: irq_stat 0x40000008
> Sep 7 09:31:37 lechuck kernel: [51924.162182] ata13.00: failed command: READ FPDMA QUEUED
> Sep 7 09:31:37 lechuck kernel: [51924.162189] ata13.00: cmd 60/d8:08:00:20:d9/00:00:5d:00:00/40 tag 1 ncq 110592 in
> Sep 7 09:31:37 lechuck kernel: [51924.162190] res 41/40:35:a3:20:d9/00:00:5d:00:00/40 Emask 0x409 (media error) <F>
> Sep 7 09:31:37 lechuck kernel: [51924.162193] ata13.00: status: { DRDY ERR }
> Sep 7 09:31:37 lechuck kernel: [51924.162195] ata13.00: error: { UNC }
> Sep 7 09:31:37 lechuck kernel: [51924.175348] ata13.00: configured for UDMA/133
> Sep 7 09:31:37 lechuck kernel: [51924.175374] ata13: EH complete
> Sep 7 09:31:39 lechuck kernel: [51927.005666] ata13.00: exception Emask 0x0 SAct 0x1ff SErr 0x0 action 0x0
> Sep 7 09:31:39 lechuck kernel: [51927.014384] ata13.00: irq_stat 0x40000008
> Sep 7 09:31:39 lechuck kernel: [51927.023299] ata13.00: failed command: READ FPDMA QUEUED
> Sep 7 09:31:39 lechuck kernel: [51927.031949] ata13.00: cmd 60/d8:38:00:20:d9/00:00:5d:00:00/40 tag 7 ncq 110592 in
> Sep 7 09:31:39 lechuck kernel: [51927.031951] res 41/40:35:a3:20:d9/00:00:5d:00:00/40 Emask 0x409 (media error) <F>
> Sep 7 09:31:39 lechuck kernel: [51927.066322] ata13.00: status: { DRDY ERR }
> Sep 7 09:31:39 lechuck kernel: [51927.074946] ata13.00: error: { UNC }
> Sep 7 09:31:40 lechuck kernel: [51927.096349] ata13.00: configured for UDMA/133
> Sep 7 09:31:40 lechuck kernel: [51927.096393] sd 12:0:0:0: [sdm] Unhandled sense code
> Sep 7 09:31:40 lechuck kernel: [51927.096396] sd 12:0:0:0: [sdm] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> Sep 7 09:31:40 lechuck kernel: [51927.096401] sd 12:0:0:0: [sdm] Sense Key : Medium Error [current] [descriptor]
> Sep 7 09:31:40 lechuck kernel: [51927.096406] Descriptor sense data with sense descriptors (in hex):
> Sep 7 09:31:40 lechuck kernel: [51927.096409] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> Sep 7 09:31:40 lechuck kernel: [51927.096420] 5d d9 20 a3
> Sep 7 09:31:40 lechuck kernel: [51927.096425] sd 12:0:0:0: [sdm] Add. Sense: Unrecovered read error - auto reallocate failed
> Sep 7 09:31:40 lechuck kernel: [51927.096431] sd 12:0:0:0: [sdm] CDB: Read(10): 28 00 5d d9 20 00 00 00 d8 00
> Sep 7 09:31:40 lechuck kernel: [51927.096442] end_request: I/O error, dev sdm, sector 1574510755
> Sep 7 09:31:40 lechuck kernel: [51927.104975] raid5:md10: read error not correctable (sector 1574510752 on sdm).
> Sep 7 09:31:40 lechuck kernel: [51927.104985] raid5: Disk failure on sdm, disabling device.
> Sep 7 09:31:40 lechuck kernel: [51927.104989] raid5: Operation continuing on 10 devices.
> Sep 7 09:31:40 lechuck kernel: [51927.122210] raid5:md10: read error not correctable (sector 1574510760 on sdm).
> Sep 7 09:31:40 lechuck kernel: [51927.122214] raid5:md10: read error not correctable (sector 1574510768 on sdm).
> Sep 7 09:31:40 lechuck kernel: [51927.122218] raid5:md10: read error not correctable (sector 1574510776 on sdm).
> Sep 7 09:31:40 lechuck kernel: [51927.122222] raid5:md10: read error not correctable (sector 1574510784 on sdm).
> Sep 7 09:31:40 lechuck kernel: [51927.122225] raid5:md10: read error not correctable (sector 1574510792 on sdm).
> Sep 7 09:31:40 lechuck kernel: [51927.122229] raid5:md10: read error not correctable (sector 1574510800 on sdm).
> Sep 7 09:31:40 lechuck kernel: [51927.122242] ata13: EH complete
> Sep 7 09:31:40 lechuck kernel: [51927.142926] md: md10: recovery done.
> Sep 7 09:31:40 lechuck mdadm[3840]: Fail event detected on md device /dev/md10, component device /dev/sdm
> Sep 7 09:31:40 lechuck kernel: [51927.344026] RAID5 conf printout:
> Sep 7 09:31:40 lechuck kernel: [51927.344031] --- rd:12 wd:10
> Sep 7 09:31:40 lechuck kernel: [51927.344034] disk 0, o:1, dev:sdf
> Sep 7 09:31:40 lechuck kernel: [51927.344037] disk 1, o:1, dev:sdb
> Sep 7 09:31:40 lechuck kernel: [51927.344039] disk 2, o:1, dev:sda
> Sep 7 09:31:40 lechuck kernel: [51927.344042] disk 3, o:1, dev:sdc
> Sep 7 09:31:40 lechuck kernel: [51927.344044] disk 4, o:1, dev:sdj
> Sep 7 09:31:40 lechuck kernel: [51927.344047] disk 5, o:1, dev:sdi
> Sep 7 09:31:40 lechuck kernel: [51927.344049] disk 6, o:1, dev:sdp
> Sep 7 09:31:40 lechuck kernel: [51927.344052] disk 7, o:1, dev:sdn
> Sep 7 09:31:40 lechuck kernel: [51927.344054] disk 8, o:1, dev:sdo
> Sep 7 09:31:40 lechuck kernel: [51927.344057] disk 9, o:0, dev:sdm
> Sep 7 09:31:40 lechuck kernel: [51927.344059] disk 10, o:1, dev:sdk
> Sep 7 09:31:40 lechuck kernel: [51927.344062] disk 11, o:1, dev:sdl
> Sep 7 09:31:40 lechuck kernel: [51927.344064] RAID5 conf printout:
> Sep 7 09:31:40 lechuck kernel: [51927.344066] --- rd:12 wd:10
> Sep 7 09:31:40 lechuck kernel: [51927.344068] disk 0, o:1, dev:sdf
> Sep 7 09:31:40 lechuck kernel: [51927.344070] disk 1, o:1, dev:sdb
> Sep 7 09:31:40 lechuck kernel: [51927.344073] disk 2, o:1, dev:sda
> Sep 7 09:31:40 lechuck kernel: [51927.344075] disk 3, o:1, dev:sdc
> Sep 7 09:31:40 lechuck kernel: [51927.344077] disk 4, o:1, dev:sdj
> Sep 7 09:31:40 lechuck kernel: [51927.344080] disk 5, o:1, dev:sdi
> Sep 7 09:31:40 lechuck kernel: [51927.344082] disk 6, o:1, dev:sdp
> Sep 7 09:31:40 lechuck kernel: [51927.344084] disk 7, o:1, dev:sdn
> Sep 7 09:31:40 lechuck kernel: [51927.344087] disk 8, o:1, dev:sdo
> Sep 7 09:31:40 lechuck kernel: [51927.344089] disk 9, o:0, dev:sdm
> Sep 7 09:31:40 lechuck kernel: [51927.344091] disk 10, o:1, dev:sdk
> Sep 7 09:31:40 lechuck kernel: [51927.344093] disk 11, o:1, dev:sdl
> Sep 7 09:31:40 lechuck kernel: [51927.344095] RAID5 conf printout:
> Sep 7 09:31:40 lechuck kernel: [51927.344097] --- rd:12 wd:10
> Sep 7 09:31:40 lechuck kernel: [51927.344100] disk 0, o:1, dev:sdf
> Sep 7 09:31:40 lechuck kernel: [51927.344102] disk 1, o:1, dev:sdb
> Sep 7 09:31:40 lechuck kernel: [51927.344104] disk 2, o:1, dev:sda
> Sep 7 09:31:40 lechuck kernel: [51927.344106] disk 3, o:1, dev:sdc
> Sep 7 09:31:40 lechuck kernel: [51927.344109] disk 4, o:1, dev:sdj
> Sep 7 09:31:40 lechuck kernel: [51927.344111] disk 5, o:1, dev:sdi
> Sep 7 09:31:40 lechuck kernel: [51927.344113] disk 6, o:1, dev:sdp
> Sep 7 09:31:40 lechuck kernel: [51927.344116] disk 7, o:1, dev:sdn
> Sep 7 09:31:40 lechuck kernel: [51927.344118] disk 8, o:1, dev:sdo
> Sep 7 09:31:40 lechuck kernel: [51927.344120] disk 9, o:0, dev:sdm
> Sep 7 09:31:40 lechuck kernel: [51927.344122] disk 10, o:1, dev:sdk
> Sep 7 09:31:40 lechuck kernel: [51927.344125] disk 11, o:1, dev:sdl
> Sep 7 09:31:40 lechuck kernel: [51927.400014] RAID5 conf printout:
> Sep 7 09:31:40 lechuck kernel: [51927.400017] --- rd:12 wd:10
> Sep 7 09:31:40 lechuck kernel: [51927.400020] disk 0, o:1, dev:sdf
> Sep 7 09:31:40 lechuck kernel: [51927.400022] disk 1, o:1, dev:sdb
> Sep 7 09:31:40 lechuck kernel: [51927.400025] disk 2, o:1, dev:sda
> Sep 7 09:31:40 lechuck kernel: [51927.400027] disk 3, o:1, dev:sdc
> Sep 7 09:31:40 lechuck kernel: [51927.400029] disk 4, o:1, dev:sdj
> Sep 7 09:31:40 lechuck kernel: [51927.400032] disk 5, o:1, dev:sdi
> Sep 7 09:31:40 lechuck kernel: [51927.400034] disk 6, o:1, dev:sdp
> Sep 7 09:31:40 lechuck kernel: [51927.400036] disk 7, o:1, dev:sdn
> Sep 7 09:31:40 lechuck kernel: [51927.400039] disk 8, o:1, dev:sdo
> Sep 7 09:31:40 lechuck kernel: [51927.400041] disk 10, o:1, dev:sdk
> Sep 7 09:31:40 lechuck kernel: [51927.400043] disk 11, o:1, dev:sdl
> Sep 7 09:31:40 lechuck kernel: [51927.400138] md: recovery of RAID array md10
> Sep 7 09:31:40 lechuck kernel: [51927.400141] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> Sep 7 09:31:40 lechuck kernel: [51927.400145] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
> Sep 7 09:31:40 lechuck kernel: [51927.400155] md: using 128k window, over a total of 1465138496 blocks.
> Sep 7 09:31:40 lechuck kernel: [51927.400159] md: resuming recovery of md10 from checkpoint.
> Sep 7 09:31:40 lechuck mdadm[3840]: RebuildFinished event detected on md device /dev/md10, component device mismatches found: 477544
> Sep 7 09:31:40 lechuck mdadm[3840]: RebuildStarted event detected on md device /dev/md10
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
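
For readers following the quoted syslog, here is a minimal sketch of how the failing sector can be recovered from a libata `res` line. It assumes the usual libata field order (status/error:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and an LBA48 command; the function name is hypothetical. The grouping into 8-sector units at the end is only an observation that fits the sector numbers md printed in this log, not a statement about md internals.

```python
import re


def decode_res_lba(line):
    """Extract the 48-bit LBA from a libata 'res ...' result taskfile line."""
    m = re.search(r"\bres ([0-9a-f:/]+)", line)
    if m is None:
        raise ValueError("no 'res' taskfile found in line")
    fields = [int(x, 16) for x in re.split(r"[:/]", m.group(1))]
    if len(fields) != 12:
        raise ValueError(f"expected 12 taskfile bytes, got {len(fields)}")
    (status, error, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device) = fields
    # LBA48: the low three bytes come from the current registers,
    # the high three bytes from the "hob" (high order byte) registers.
    return ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
            | (lbah << 16) | (lbam << 8) | lbal)


line = "res 51/40:35:a3:20:d9/00:00:5d:00:00/40 Emask 0x409 (media error) <F>"
lba = decode_res_lba(line)
print(lba)            # sector reported by end_request in the log
print(lba - lba % 8)  # start of the 8-sector (4 KiB) unit containing it
```

With the line from the log, the decoded LBA is 1574510755, the sector in the `end_request: I/O error` message, and rounding down to a multiple of 8 gives 1574510752, the first sector md reports as "read error not correctable".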