From: "H. Peter Anvin" <hpa@zytor.com>
To: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: NeilBrown <neilb@suse.de>,
Joe Lawrence <joe.lawrence@stratus.com>,
Dan Williams <dan.j.williams@gmail.com>,
linux-raid <linux-raid@vger.kernel.org>
Subject: Re: RAID-10 keeps aborting
Date: Wed, 12 Jun 2013 07:45:16 -0700
Message-ID: <51B8897C.8000801@zytor.com>
In-Reply-To: <yq1txl3iejr.fsf@sermon.lab.mkp.net>
[-- Attachment #1: Type: text/plain, Size: 504 bytes --]
On 06/12/2013 07:34 AM, Martin K. Petersen wrote:
>>>>>> "hpa" == H Peter Anvin <hpa@zytor.com> writes:
>
> hpa> The second question is if we should disable WRITE SAME for raid1/10
> hpa> (what about raid0?) for 3.10/stable or if your patch really is
> hpa> sufficient... "just adds another heuristic" makes me nervous.
>
> I think we should disable 1+10 in stable until we get the recovery
> scenario sorted out.
>
> I don't believe there are any problems with raid0.
>
How does this look?
-hpa
[-- Attachment #2: 0001-raid1-10-Disable-WRITE-SAME-until-a-recovery-strateg.patch --]
[-- Type: text/x-patch, Size: 2452 bytes --]
From ac28be1574a6187f4f26bd75217059bf17b13560 Mon Sep 17 00:00:00 2001
From: "H. Peter Anvin" <hpa@zytor.com>
Date: Wed, 12 Jun 2013 07:37:43 -0700
Subject: [PATCH] raid1,10: Disable WRITE SAME until a recovery strategy is in
 place

There are cases where the kernel will believe that the WRITE SAME
command is supported by a block device which does not, in fact,
support WRITE SAME.  This currently happens for SATA drives behind a
SAS controller, but there are probably a hundred other ways that can
happen, including drive firmware bugs.

After receiving an error for WRITE SAME the block layer will retry the
request as a plain write of zeroes, but mdraid will treat the
failure as fatal and mark the drive failed.  This has the effect
that all the mirrors containing a specific set of data are each
offlined in very rapid succession, resulting in data loss.

However, just bouncing the request back up to the block layer isn't
ideal either, because the whole initial request-retry sequence should
be inside the write bitmap fence, which probably means that md needs
to do its own conversion of WRITE SAME to write zero.

Until the failure scenario has been sorted out, disable WRITE SAME for
raid1 and raid10.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
---
drivers/md/raid1.c | 4 ++--
drivers/md/raid10.c | 3 +--
2 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 5595118..914ca0a 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2780,8 +2780,8 @@ static int run(struct mddev *mddev)
 		return PTR_ERR(conf);
 
 	if (mddev->queue)
-		blk_queue_max_write_same_sectors(mddev->queue,
-						 mddev->chunk_sectors);
+		blk_queue_max_write_same_sectors(mddev->queue, 0);
+
 	rdev_for_each(rdev, mddev) {
 		if (!mddev->gendisk)
 			continue;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 59d4daa..807ace8 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3609,8 +3609,7 @@ static int run(struct mddev *mddev)
 	if (mddev->queue) {
 		blk_queue_max_discard_sectors(mddev->queue,
 					      mddev->chunk_sectors);
-		blk_queue_max_write_same_sectors(mddev->queue,
-						 mddev->chunk_sectors);
+		blk_queue_max_write_same_sectors(mddev->queue, 0);
 		blk_queue_io_min(mddev->queue, chunk_size);
 		if (conf->geo.raid_disks % conf->geo.near_copies)
 			blk_queue_io_opt(mddev->queue, chunk_size * conf->geo.raid_disks);
--
1.7.11.7
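
For readers who want the failure mode from the changelog in executable
form, here is a rough user-space sketch. It is a toy model only: the
toy_* types and functions are hypothetical names made up for this
illustration and are not md or block layer APIs. It assumes the
behaviour described above, namely that any error on a member marks that
member failed, and that a WRITE SAME limit of 0 means the array only
ever issues plain writes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: nothing here is a kernel type or function. */
enum toy_op { TOY_WRITE, TOY_WRITE_SAME };

struct toy_member {
	bool supports_write_same;	/* what the device really does */
	bool faulty;			/* set once the array kicks it out */
};

struct toy_mirror {
	struct toy_member *members;
	size_t nr_members;
	/* 0 means "never issue WRITE SAME", mirroring the patch */
	unsigned int max_write_same_sectors;
};

/* Returns true if the member completes the request. */
static bool toy_member_submit(const struct toy_member *m, enum toy_op op)
{
	return op == TOY_WRITE || m->supports_write_same;
}

/*
 * Zero a range on every working member.  Any error marks the member
 * faulty, with no retry as a plain write.  Returns the number of
 * members that are still working afterwards.
 */
static size_t toy_mirror_zero(struct toy_mirror *a)
{
	enum toy_op op;
	size_t alive = 0;
	size_t i;

	assert(a != NULL && a->members != NULL);

	op = a->max_write_same_sectors ? TOY_WRITE_SAME : TOY_WRITE;

	for (i = 0; i < a->nr_members; i++) {
		struct toy_member *m = &a->members[i];

		if (m->faulty)
			continue;
		if (toy_member_submit(m, op))
			alive++;
		else
			m->faulty = true;
	}
	return alive;
}
```

With two members that both lack real WRITE SAME support, calling
toy_mirror_zero() on an array whose limit is nonzero leaves no working
members, while the same call with the limit set to 0 leaves both
working. That is the whole argument for the stopgap; it says nothing
about how the eventual recovery path inside md should look.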