* [PATCH AUTOSEL 4.19 031/128] md/raid1: end bio when the device faulty
From: Sasha Levin @ 2019-09-22 18:52 UTC
To: linux-kernel, stable; +Cc: Yufen Yu, Song Liu, Sasha Levin, linux-raid
From: Yufen Yu <yuyufen@huawei.com>
[ Upstream commit eeba6809d8d58908b5ed1b5ceb5fcb09a98a7cad ]
When a write bio returns an error, it is added to conf->retry_list and
waits for the raid1d thread to retry the write and acknowledge badblocks.
In narrow_write_error(), the failed bio is split into units of the
badblock shift (such as one sector), and the raid1d thread issues them
one by one. Only after all of the split bios have finished can raid1d
go on to process other work, which is time consuming.

But there is a case where this error handling is unnecessary: when the
device has been set faulty, flush_bio_list() may end bios in
pending_bio_list with an error status. Since these bios were never
actually issued to the device, retrying the write and acknowledging
badblocks for them makes no sense.

Even apart from that case, when the device is faulty, badblocks
information cannot be written out to the device, so there is no need
to handle the failed IO either.
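As a quick illustration of the decision this patch arrives at, here is a
minimal userspace sketch; the function and variable names are invented
for illustration, and the real change is in the diff below.

/* Hedged model of the post-patch branch in raid1_end_write_request():
 * retry only when the device is still usable; a faulty device's bio
 * is simply ended. All identifiers here are illustrative stand-ins. */
#include <stdbool.h>
#include <stdio.h>

static const char *on_write_error(bool device_faulty)
{
	if (!device_faulty)
		return "set R1BIO_WriteError: retry write, acknowledge badblocks";
	/* Faulty device: badblocks cannot be recorded on it anyway. */
	return "end bio: finished with this branch";
}

int main(void)
{
	printf("working device: %s\n", on_write_error(false));
	printf("faulty device:  %s\n", on_write_error(true));
	return 0;
}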
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/raid1.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index fa47249fa3e42..54010675df9a5 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -434,19 +434,21 @@ static void raid1_end_write_request(struct bio *bio)
/* We never try FailFast to WriteMostly devices */
!test_bit(WriteMostly, &rdev->flags)) {
md_error(r1_bio->mddev, rdev);
- if (!test_bit(Faulty, &rdev->flags))
- /* This is the only remaining device,
- * We need to retry the write without
- * FailFast
- */
- set_bit(R1BIO_WriteError, &r1_bio->state);
- else {
- /* Finished with this branch */
- r1_bio->bios[mirror] = NULL;
- to_put = bio;
- }
- } else
+ }
+
+ /*
+ * When the device is faulty, it is not necessary to
+ * handle write error.
+ * For failfast, this is the only remaining device,
+ * We need to retry the write without FailFast.
+ */
+ if (!test_bit(Faulty, &rdev->flags))
set_bit(R1BIO_WriteError, &r1_bio->state);
+ else {
+ /* Finished with this branch */
+ r1_bio->bios[mirror] = NULL;
+ to_put = bio;
+ }
} else {
/*
* Set R1BIO_Uptodate in our master bio, so that we
--
2.20.1
* [PATCH AUTOSEL 4.19 032/128] md: don't call spare_active in md_reap_sync_thread if all member devices can't work
From: Sasha Levin @ 2019-09-22 18:52 UTC
To: linux-kernel, stable
Cc: Guoqing Jiang, Guoqing Jiang, Song Liu, Sasha Levin, linux-raid
From: Guoqing Jiang <jgq516@gmail.com>
[ Upstream commit 0d8ed0e9bf9643f27f4816dca61081784dedb38d ]
When a disk is added to an array, md_reap_sync_thread() is responsible
for activating the spare and setting the In_sync flag for the new
member via spare_active().

But suppose a raid1 array has a single member disk A, and disk B is
added to it. If A is taken offline before all the data has been
synchronized from A to B, then B obviously does not hold the latest
data that A does, yet B is still marked with the In_sync flag.

So don't call spare_active() under that condition; otherwise B is
still shown in the 'U' state, which is incorrect.
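A minimal sketch of the guard this patch adds, assuming invented
stand-ins for the mddev counters; the actual one-line change is in the
diff below.

#include <stdbool.h>
#include <stdio.h>

struct array_state {
	int raid_disks;	/* total member slots */
	int degraded;	/* failed or missing members */
};

/* spare_active() is only meaningful if at least one member still works. */
static bool may_activate_spares(const struct array_state *a)
{
	return a->degraded != a->raid_disks;
}

int main(void)
{
	struct array_state all_failed = { .raid_disks = 2, .degraded = 2 };
	struct array_state one_failed = { .raid_disks = 2, .degraded = 1 };

	printf("all members failed -> spare_active? %d\n",
	       may_activate_spares(&all_failed)); /* 0 */
	printf("one member failed  -> spare_active? %d\n",
	       may_activate_spares(&one_failed)); /* 1 */
	return 0;
}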
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/md.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index fb5d702e43b5b..73758b3679a11 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8948,7 +8948,8 @@ void md_reap_sync_thread(struct mddev *mddev)
/* resync has finished, collect result */
md_unregister_thread(&mddev->sync_thread);
if (!test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&
- !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
+ !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
+ mddev->degraded != mddev->raid_disks) {
/* success...*/
/* activate any spares */
if (mddev->pers->spare_active(mddev)) {
--
2.20.1
* [PATCH AUTOSEL 4.19 033/128] md: don't set In_sync if array is frozen
From: Sasha Levin @ 2019-09-22 18:52 UTC
To: linux-kernel, stable
Cc: Guoqing Jiang, Guoqing Jiang, Song Liu, Sasha Levin, linux-raid
From: Guoqing Jiang <jgq516@gmail.com>
[ Upstream commit 062f5b2ae12a153644c765e7ba3b0f825427be1d ]
When a disk is added to an array, the following path is taken in mdadm:
Manage_subdevs -> sysfs_freeze_array
-> Manage_add
-> sysfs_set_str(&info, NULL, "sync_action","idle")
Then, on the kernel side, Manage_add invokes the path (add_new_disk ->
validate_super = super_1_validate) to set the In_sync flag.
But In_sync means "device is in_sync with rest of array", and the newly
added disk needs the resync thread to synchronize its data; it is
md_reap_sync_thread() that finally calls spare_active() to set In_sync
for the new disk. So don't set In_sync if the array is frozen.
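A small sketch of the resulting rule, with a plain bool standing in for
the kernel's test_bit(MD_RECOVERY_FROZEN, ...) check shown in the diff
below.

#include <stdbool.h>
#include <stdio.h>

/* A frozen array cannot run resync, so a disk that still needs resync
 * must not be marked In_sync yet; spare_active() sets In_sync once the
 * synchronization actually completes. */
static bool should_set_in_sync(bool recovery_frozen)
{
	return !recovery_frozen;
}

int main(void)
{
	printf("frozen array: In_sync = %d\n", should_set_in_sync(true));  /* 0 */
	printf("normal array: In_sync = %d\n", should_set_in_sync(false)); /* 1 */
	return 0;
}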
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/md.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 73758b3679a11..277025784d6c0 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1770,8 +1770,15 @@ static int super_1_validate(struct mddev *mddev, struct md_rdev *rdev)
if (!(le32_to_cpu(sb->feature_map) &
MD_FEATURE_RECOVERY_BITMAP))
rdev->saved_raid_disk = -1;
- } else
- set_bit(In_sync, &rdev->flags);
+ } else {
+ /*
+ * If the array is FROZEN, then the device can't
+ * be in_sync with rest of array.
+ */
+ if (!test_bit(MD_RECOVERY_FROZEN,
+ &mddev->recovery))
+ set_bit(In_sync, &rdev->flags);
+ }
rdev->raid_disk = role;
break;
}
--
2.20.1
* [PATCH AUTOSEL 4.19 104/128] md/raid1: fail run raid1 array when active disk less than one
From: Sasha Levin @ 2019-09-22 18:53 UTC
To: linux-kernel, stable
Cc: Yufen Yu, NeilBrown, Song Liu, Sasha Levin, linux-raid
From: Yufen Yu <yuyufen@huawei.com>
[ Upstream commit 07f1a6850c5d5a65c917c3165692b5179ac4cb6b ]
When running the following test case:
mdadm -CR /dev/md1 -l 1 -n 4 /dev/sd[a-d] --assume-clean --bitmap=internal
mdadm -S /dev/md1
mdadm -A /dev/md1 /dev/sd[b-c] --run --force
mdadm --zero /dev/sda
mdadm /dev/md1 -a /dev/sda
echo offline > /sys/block/sdc/device/state
echo offline > /sys/block/sdb/device/state
sleep 5
mdadm -S /dev/md1
echo running > /sys/block/sdb/device/state
echo running > /sys/block/sdc/device/state
mdadm -A /dev/md1 /dev/sd[a-c] --run --force
mdadm fails to run, with kernel messages as follows:
[ 172.986064] md: kicking non-fresh sdb from array!
[ 173.004210] md: kicking non-fresh sdc from array!
[ 173.022383] md/raid1:md1: active with 0 out of 4 mirrors
[ 173.022406] md1: failed to create bitmap (-5)
In fact, when the raid1 array has fewer than one active disk, raid1_run()
needs to return failure.
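A minimal model of the check raid1_run() gains; the parameter names are
stand-ins for the conf/mddev fields used in the real hunk below.

#include <errno.h>
#include <stdio.h>

/* Refuse to start a raid1 array that has no working mirror at all. */
static int check_active_disks(int raid_disks, int degraded)
{
	if (raid_disks - degraded < 1)
		return -EINVAL;
	return 0;
}

int main(void)
{
	printf("0 of 4 active: %d\n", check_active_disks(4, 4)); /* -EINVAL */
	printf("2 of 4 active: %d\n", check_active_disks(4, 2)); /* 0 */
	return 0;
}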
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/raid1.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 54010675df9a5..6929d110d8048 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -3105,6 +3105,13 @@ static int raid1_run(struct mddev *mddev)
!test_bit(In_sync, &conf->mirrors[i].rdev->flags) ||
test_bit(Faulty, &conf->mirrors[i].rdev->flags))
mddev->degraded++;
+ /*
+ * RAID1 needs at least one disk in active
+ */
+ if (conf->raid_disks - mddev->degraded < 1) {
+ ret = -EINVAL;
+ goto abort;
+ }
if (conf->raid_disks - mddev->degraded == 1)
mddev->recovery_cp = MaxSector;
@@ -3138,8 +3145,12 @@ static int raid1_run(struct mddev *mddev)
ret = md_integrity_register(mddev);
if (ret) {
md_unregister_thread(&mddev->thread);
- raid1_free(mddev, conf);
+ goto abort;
}
+ return 0;
+
+abort:
+ raid1_free(mddev, conf);
return ret;
}
--
2.20.1
* [PATCH AUTOSEL 4.19 114/128] raid5: don't set STRIPE_HANDLE to stripe which is in batch list
From: Sasha Levin @ 2019-09-22 18:54 UTC
To: linux-kernel, stable; +Cc: Guoqing Jiang, Song Liu, Sasha Levin, linux-raid
From: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
[ Upstream commit 6ce220dd2f8ea71d6afc29b9a7524c12e39f374a ]
If a stripe in a batch list has the STRIPE_HANDLE flag set, the stripe
could be marked STRIPE_ACTIVE by the handle_stripe() function. If an
error happens to the batch_head at the same time, break_stripe_batch_list()
is called and the warning below can fire (the same report as in [1]);
it means a member of the batch list was set with STRIPE_ACTIVE.
[7028915.431770] stripe state: 2001
[7028915.431815] ------------[ cut here ]------------
[7028915.431828] WARNING: CPU: 18 PID: 29089 at drivers/md/raid5.c:4614 break_stripe_batch_list+0x203/0x240 [raid456]
[...]
[7028915.431879] CPU: 18 PID: 29089 Comm: kworker/u82:5 Tainted: G O 4.14.86-1-storage #4.14.86-1.2~deb9
[7028915.431881] Hardware name: Supermicro SSG-2028R-ACR24L/X10DRH-iT, BIOS 3.1 06/18/2018
[7028915.431888] Workqueue: raid5wq raid5_do_work [raid456]
[7028915.431890] task: ffff9ab0ef36d7c0 task.stack: ffffb72926f84000
[7028915.431896] RIP: 0010:break_stripe_batch_list+0x203/0x240 [raid456]
[7028915.431898] RSP: 0018:ffffb72926f87ba8 EFLAGS: 00010286
[7028915.431900] RAX: 0000000000000012 RBX: ffff9aaa84a98000 RCX: 0000000000000000
[7028915.431901] RDX: 0000000000000000 RSI: ffff9ab2bfa15458 RDI: ffff9ab2bfa15458
[7028915.431902] RBP: ffff9aaa8fb4e900 R08: 0000000000000001 R09: 0000000000002eb4
[7028915.431903] R10: 00000000ffffffff R11: 0000000000000000 R12: ffff9ab1736f1b00
[7028915.431904] R13: 0000000000000000 R14: ffff9aaa8fb4e900 R15: 0000000000000001
[7028915.431906] FS: 0000000000000000(0000) GS:ffff9ab2bfa00000(0000) knlGS:0000000000000000
[7028915.431907] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[7028915.431908] CR2: 00007ff953b9f5d8 CR3: 0000000bf4009002 CR4: 00000000003606e0
[7028915.431909] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[7028915.431910] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[7028915.431910] Call Trace:
[7028915.431923] handle_stripe+0x8e7/0x2020 [raid456]
[7028915.431930] ? __wake_up_common_lock+0x89/0xc0
[7028915.431935] handle_active_stripes.isra.58+0x35f/0x560 [raid456]
[7028915.431939] raid5_do_work+0xc6/0x1f0 [raid456]
Also, commit 59fc630b8b5f9f ("RAID5: batch adjacent full stripe write")
says: "If a stripe is added to batch list, then only the first stripe
of the list should be put to handle_list and run handle_stripe."

So don't set STRIPE_HANDLE on a stripe that is already in a batch list;
otherwise the stripe could be put on handle_list and run through
handle_stripe(), triggering the warning above.
[1]. https://www.spinics.net/lists/raid/msg62552.html
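A toy model of the rule with simplified types: in this sketch a NULL
batch_head simply means "not in a batch list" (the kernel's batch_head
bookkeeping, where the head can point at itself, is deliberately
glossed over).

#include <stdbool.h>
#include <stdio.h>

struct stripe {
	struct stripe *batch_head;	/* non-NULL: member of a batch list */
	bool handle;			/* stands in for STRIPE_HANDLE */
};

static void maybe_mark_for_handling(struct stripe *sh)
{
	/* Only a stripe outside a batch list is queued for handle_stripe();
	 * batch members are processed through their batch head instead. */
	if (!sh->batch_head)
		sh->handle = true;
}

int main(void)
{
	struct stripe standalone = { 0 };
	struct stripe head = { 0 };
	struct stripe member = { .batch_head = &head };

	maybe_mark_for_handling(&standalone);
	maybe_mark_for_handling(&member);
	printf("standalone: handle=%d, batch member: handle=%d\n",
	       standalone.handle, member.handle); /* 1, 0 */
	return 0;
}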
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/raid5.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a147619498dfb..d26e5e9bea427 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5721,7 +5721,8 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
do_flush = false;
}
- set_bit(STRIPE_HANDLE, &sh->state);
+ if (!sh->batch_head)
+ set_bit(STRIPE_HANDLE, &sh->state);
clear_bit(STRIPE_DELAYED, &sh->state);
if ((!sh->batch_head || sh == sh->batch_head) &&
(bi->bi_opf & REQ_SYNC) &&
--
2.20.1
* [PATCH AUTOSEL 4.19 119/128] raid5: don't increment read_errors on EILSEQ return
From: Sasha Levin @ 2019-09-22 18:54 UTC
To: linux-kernel, stable; +Cc: Nigel Croxon, Song Liu, Sasha Levin, linux-raid
From: Nigel Croxon <ncroxon@redhat.com>
[ Upstream commit b76b4715eba0d0ed574f58918b29c1b2f0fa37a8 ]
MD counts read errors returned by the lower layer. If those errors are
-EILSEQ, rather than -EIO, it should NOT increase the read_errors count.

When RAID6 is set up on a dm-integrity target that detects massive
corruption, the affected leg is ejected from the array, even if the
issue is correctable with a sector re-write and the array has the
necessary redundancy to correct it.

The leg is ejected because it runs rdev->read_errors up beyond
conf->max_nr_stripes. The return status from dm-crypt when there is
a data integrity error is -EILSEQ (BLK_STS_PROTECTION).
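The resulting accounting rule, modeled in a few lines; the enum values
are illustrative stand-ins for the kernel's blk_status_t codes.

#include <stdio.h>

enum blk_status { STS_OK, STS_IOERR, STS_PROTECTION };

/* Integrity failures (-EILSEQ / BLK_STS_PROTECTION) are not media errors,
 * so they must not push read_errors toward the ejection threshold. */
static int account_read_error(enum blk_status sts, int read_errors)
{
	if (sts != STS_PROTECTION)
		read_errors++;
	return read_errors;
}

int main(void)
{
	int errs = 0;
	errs = account_read_error(STS_IOERR, errs);      /* counted */
	errs = account_read_error(STS_PROTECTION, errs); /* skipped */
	printf("read_errors = %d\n", errs);              /* prints 1 */
	return 0;
}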
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
drivers/md/raid5.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index d26e5e9bea427..dbc4655a95768 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -2540,7 +2540,8 @@ static void raid5_end_read_request(struct bio * bi)
int set_bad = 0;
clear_bit(R5_UPTODATE, &sh->dev[i].flags);
- atomic_inc(&rdev->read_errors);
+ if (!(bi->bi_status == BLK_STS_PROTECTION))
+ atomic_inc(&rdev->read_errors);
if (test_bit(R5_ReadRepl, &sh->dev[i].flags))
pr_warn_ratelimited(
"md/raid:%s: read error on replacement device (sector %llu on %s).\n",
--
2.20.1