[PATCH 1/2] RAID1: ignore discard error

stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/2] RAID1: ignore discard error
@ 2016-10-06 21:20 Shaohua Li
  2016-10-06 21:20 ` [PATCH 2/2] RAID10: " Shaohua Li
  2016-10-07 10:53 ` [PATCH 1/2] RAID1: " Sitsofe Wheeler
  0 siblings, 2 replies; 4+ messages in thread
From: Shaohua Li @ 2016-10-06 21:20 UTC (permalink / raw)
  To: linux-raid; +Cc: Kernel-team, sitsofe, v3.6

Sitsofe,
could you try if this patch fixes the issue too please?

Thanks,
Shaohua


If a write error occurs, raid1 will try to rewrite the bio in small
chunk size. If the rewrite fails, raid1 will record the error in bad
block. narrow_write_error will always use WRITE for the bio, but
actually it could be a discard. Since discard bio hasn't payload, write
the bio will cause different issues. But discard error isn't fatal, we
can safely ignore it. This is what this patch does.

This issue should exist since discard is added, but only exposed with
recent arbitrary bio size feature.

Reported-by: Sitsofe Wheeler <sitsofe@gmail.com>
Cc: stable@vger.kernel.org (v3.6)
Signed-off-by: Shaohua Li <shli@fb.com>
---
 drivers/md/raid1.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 21dc00e..95bf4cd 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -407,11 +407,14 @@ static void raid1_end_write_request(struct bio *bio)
 	struct bio *to_put = NULL;
 	int mirror = find_bio_disk(r1_bio, bio);
 	struct md_rdev *rdev = conf->mirrors[mirror].rdev;
+	bool discard_error;
+
+	discard_error = bio->bi_error && bio_op(bio) == REQ_OP_DISCARD;
 
 	/*
 	 * 'one mirror IO has finished' event handler:
 	 */
-	if (bio->bi_error) {
+	if (bio->bi_error && !discard_error) {
 		set_bit(WriteErrorSeen,	&rdev->flags);
 		if (!test_and_set_bit(WantReplacement, &rdev->flags))
 			set_bit(MD_RECOVERY_NEEDED, &
@@ -448,7 +451,7 @@ static void raid1_end_write_request(struct bio *bio)
 
 		/* Maybe we can clear some bad blocks. */
 		if (is_badblock(rdev, r1_bio->sector, r1_bio->sectors,
-				&first_bad, &bad_sectors)) {
+				&first_bad, &bad_sectors) && !discard_error) {
 			r1_bio->bios[mirror] = IO_MADE_GOOD;
 			set_bit(R1BIO_MadeGood, &r1_bio->state);
 		}
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2] RAID10: ignore discard error
  2016-10-06 21:20 [PATCH 1/2] RAID1: ignore discard error Shaohua Li
@ 2016-10-06 21:20 ` Shaohua Li
  2016-10-07 10:53 ` [PATCH 1/2] RAID1: " Sitsofe Wheeler
  1 sibling, 0 replies; 4+ messages in thread
From: Shaohua Li @ 2016-10-06 21:20 UTC (permalink / raw)
  To: linux-raid; +Cc: Kernel-team, sitsofe, v3.6

This is the counterpart of raid10 fix. If a write error occurs, raid10
will try to rewrite the bio in small chunk size. If the rewrite fails,
raid10 will record the error in bad block. narrow_write_error will
always use WRITE for the bio, but actually it could be a discard. Since
discard bio hasn't payload, write the bio will cause different issues.
But discard error isn't fatal, we can safely ignore it. This is what
this patch does.

This issue should exist since discard is added, but only exposed with
recent arbitrary bio size feature.

Cc: Sitsofe Wheeler <sitsofe@gmail.com>
Cc: stable@vger.kernel.org (v3.6)
Signed-off-by: Shaohua Li <shli@fb.com>
---
 drivers/md/raid10.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index be1a9fc..39fddda 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -447,6 +447,9 @@ static void raid10_end_write_request(struct bio *bio)
 	struct r10conf *conf = r10_bio->mddev->private;
 	int slot, repl;
 	struct md_rdev *rdev = NULL;
+	bool discard_error;
+
+	discard_error = bio->bi_error && bio_op(bio) == REQ_OP_DISCARD;
 
 	dev = find_bio_disk(conf, r10_bio, bio, &slot, &repl);
 
@@ -460,7 +463,7 @@ static void raid10_end_write_request(struct bio *bio)
 	/*
 	 * this branch is our 'one mirror IO has finished' event handler:
 	 */
-	if (bio->bi_error) {
+	if (bio->bi_error && !discard_error) {
 		if (repl)
 			/* Never record new bad blocks to replacement,
 			 * just fail it.
@@ -503,7 +506,7 @@ static void raid10_end_write_request(struct bio *bio)
 		if (is_badblock(rdev,
 				r10_bio->devs[slot].addr,
 				r10_bio->sectors,
-				&first_bad, &bad_sectors)) {
+				&first_bad, &bad_sectors) && !discard_error) {
 			bio_put(bio);
 			if (repl)
 				r10_bio->devs[slot].repl_bio = IO_MADE_GOOD;
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] RAID1: ignore discard error
  2016-10-06 21:20 [PATCH 1/2] RAID1: ignore discard error Shaohua Li
  2016-10-06 21:20 ` [PATCH 2/2] RAID10: " Shaohua Li
@ 2016-10-07 10:53 ` Sitsofe Wheeler
  2016-10-07 17:42   ` Shaohua Li
  1 sibling, 1 reply; 4+ messages in thread
From: Sitsofe Wheeler @ 2016-10-07 10:53 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid, Kernel-team, v3.6

On 6 October 2016 at 22:20, Shaohua Li <shli@fb.com> wrote:
> Sitsofe,
> could you try if this patch fixes the issue too please?

As with with the previous patch the BUG_ON/crash is fixed and as
before dmesg only says this now:
[   17.884175] sd 0:0:1:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[   17.884180] sd 0:0:1:0: [sdb] tag#0 Sense Key : Illegal Request [current]
[   17.884185] sd 0:0:1:0: [sdb] tag#0 Add. Sense: Invalid command
operation code
[   17.884190] sd 0:0:1:0: [sdb] tag#0 CDB: Write same(16) 93 08 00 00
00 00 00 00 40 80 00 00 08 00 00 00
[   17.884196] blk_update_request: critical target error, dev sdb, sector 16512
[   17.884205] sd 0:0:2:0: [sdc] tag#1 FAILED Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[   17.884208] sd 0:0:2:0: [sdc] tag#1 Sense Key : Illegal Request [current]
[   17.884211] sd 0:0:2:0: [sdc] tag#1 Add. Sense: Invalid command
operation code
[   17.884215] sd 0:0:2:0: [sdc] tag#1 CDB: Write same(16) 93 08 00 00
00 00 00 00 40 80 00 00 08 00 00 00
[   17.884217] blk_update_request: critical target error, dev sdc, sector 16512
[   17.924716] Buffer I/O error on dev dm-0, logical block 0, async page read

(sd[bc] are the devices the RAID is using)

However this time around blkdiscard doesn't return an error (despite
the discard not being done) and is quiet. It's hard to say if this is
desirable given that the underlying devices switch to saying they
don't support discard but perhaps that's part of a different
discussion.

Tested-by: Sitsofe Wheeler <sitsofe@gmail.com>

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 1/2] RAID1: ignore discard error
  2016-10-07 10:53 ` [PATCH 1/2] RAID1: " Sitsofe Wheeler
@ 2016-10-07 17:42   ` Shaohua Li
  0 siblings, 0 replies; 4+ messages in thread
From: Shaohua Li @ 2016-10-07 17:42 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: Shaohua Li, linux-raid, Kernel-team, v3.6

On Fri, Oct 07, 2016 at 11:53:07AM +0100, Sitsofe Wheeler wrote:
> On 6 October 2016 at 22:20, Shaohua Li <shli@fb.com> wrote:
> > Sitsofe,
> > could you try if this patch fixes the issue too please?
> 
> As with with the previous patch the BUG_ON/crash is fixed and as
> before dmesg only says this now:
> [   17.884175] sd 0:0:1:0: [sdb] tag#0 FAILED Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> [   17.884180] sd 0:0:1:0: [sdb] tag#0 Sense Key : Illegal Request [current]
> [   17.884185] sd 0:0:1:0: [sdb] tag#0 Add. Sense: Invalid command
> operation code
> [   17.884190] sd 0:0:1:0: [sdb] tag#0 CDB: Write same(16) 93 08 00 00
> 00 00 00 00 40 80 00 00 08 00 00 00
> [   17.884196] blk_update_request: critical target error, dev sdb, sector 16512
> [   17.884205] sd 0:0:2:0: [sdc] tag#1 FAILED Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> [   17.884208] sd 0:0:2:0: [sdc] tag#1 Sense Key : Illegal Request [current]
> [   17.884211] sd 0:0:2:0: [sdc] tag#1 Add. Sense: Invalid command
> operation code
> [   17.884215] sd 0:0:2:0: [sdc] tag#1 CDB: Write same(16) 93 08 00 00
> 00 00 00 00 40 80 00 00 08 00 00 00
> [   17.884217] blk_update_request: critical target error, dev sdc, sector 16512
> [   17.924716] Buffer I/O error on dev dm-0, logical block 0, async page read
> 
> (sd[bc] are the devices the RAID is using)
> 
> However this time around blkdiscard doesn't return an error (despite
> the discard not being done) and is quiet. It's hard to say if this is
> desirable given that the underlying devices switch to saying they
> don't support discard but perhaps that's part of a different
> discussion.

Yep, I think the spec doesn't require discard IO returns error. We probably can
make this smarter later in MD

> Tested-by: Sitsofe Wheeler <sitsofe@gmail.com>

Thanks for the testing, I'll queue the two patches.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-10-07 17:42 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-10-06 21:20 [PATCH 1/2] RAID1: ignore discard error Shaohua Li
2016-10-06 21:20 ` [PATCH 2/2] RAID10: " Shaohua Li
2016-10-07 10:53 ` [PATCH 1/2] RAID1: " Sitsofe Wheeler
2016-10-07 17:42   ` Shaohua Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).