linux-raid.vger.kernel.org archive mirror
* [PATCH v2 0/3] raid10 bugfix
@ 2023-07-01  8:05 linan666
  2023-07-01  8:05 ` [PATCH v2 1/3] md/raid10: check replacement and rdev to prevent submit the same io twice linan666
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: linan666 @ 2023-07-01  8:05 UTC (permalink / raw)
  To: song, guoqing.jiang, xni, colyli
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

Changes in v2:
 - patch 2/3: rename the function to dereference_rdev_and_rrdev() and
   return rdev directly, which removes one output argument.

Li Nan (3):
  md/raid10: check replacement and rdev to prevent submit the same io
    twice
  md/raid10: factor out dereference_rdev_and_rrdev()
  md/raid10: use dereference_rdev_and_rrdev() to get devices

 drivers/md/raid10.c | 42 +++++++++++++++++++++++++-----------------
 1 file changed, 25 insertions(+), 17 deletions(-)

-- 
2.39.2



* [PATCH v2 1/3] md/raid10: check replacement and rdev to prevent submit the same io twice
  2023-07-01  8:05 [PATCH v2 0/3] raid10 bugfix linan666
@ 2023-07-01  8:05 ` linan666
  2023-07-01  8:05 ` [PATCH v2 2/3] md/raid10: factor out dereference_rdev_and_rrdev() linan666
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: linan666 @ 2023-07-01  8:05 UTC (permalink / raw)
  To: song, guoqing.jiang, xni, colyli
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

After commit 4ca40c2ce099 ("md/raid10: Allow replacement device to be
replace old drive."), 'rdev' and 'replacement' can point to the same
device. wait_blocked_dev() and raid10_write_request() already check for
that. Add the same check to raid10_handle_discard() so that a discard is
not submitted to the same device twice.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/raid10.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index fabc340aae4f..3e6a09aaaba6 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1811,6 +1811,8 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
 		r10_bio->devs[disk].bio = NULL;
 		r10_bio->devs[disk].repl_bio = NULL;
 
+		if (rdev == rrdev)
+			rrdev = NULL;
 		if (rdev && (test_bit(Faulty, &rdev->flags)))
 			rdev = NULL;
 		if (rrdev && (test_bit(Faulty, &rrdev->flags)))
-- 
2.39.2



* [PATCH v2 2/3] md/raid10: factor out dereference_rdev_and_rrdev()
  2023-07-01  8:05 [PATCH v2 0/3] raid10 bugfix linan666
  2023-07-01  8:05 ` [PATCH v2 1/3] md/raid10: check replacement and rdev to prevent submit the same io twice linan666
@ 2023-07-01  8:05 ` linan666
  2023-07-01  8:05 ` [PATCH v2 3/3] md/raid10: use dereference_rdev_and_rrdev() to get devices linan666
  2023-07-07  9:14 ` [PATCH v2 0/3] raid10 bugfix Song Liu
  3 siblings, 0 replies; 5+ messages in thread
From: linan666 @ 2023-07-01  8:05 UTC (permalink / raw)
  To: song, guoqing.jiang, xni, colyli
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

Factor out a helper to get 'rdev' and 'replacement' from conf->mirrors.
This makes the code cleaner and prepares for fixing the io loss that can
occur while 'replacement' replaces 'rdev'.

There is no functional change.

Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/raid10.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 3e6a09aaaba6..a6c3806be903 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1346,6 +1346,25 @@ static void raid10_write_one_disk(struct mddev *mddev, struct r10bio *r10_bio,
 	}
 }
 
+static struct md_rdev *dereference_rdev_and_rrdev(struct raid10_info *mirror,
+						  struct md_rdev **prrdev)
+{
+	struct md_rdev *rdev, *rrdev;
+
+	rrdev = rcu_dereference(mirror->replacement);
+	/*
+	 * Read replacement first to prevent reading both rdev and
+	 * replacement as NULL during replacement replace rdev.
+	 */
+	smp_mb();
+	rdev = rcu_dereference(mirror->rdev);
+	if (rdev == rrdev)
+		rrdev = NULL;
+
+	*prrdev = rrdev;
+	return rdev;
+}
+
 static void wait_blocked_dev(struct mddev *mddev, struct r10bio *r10_bio)
 {
 	int i;
@@ -1489,15 +1508,7 @@ static void raid10_write_request(struct mddev *mddev, struct bio *bio,
 		int d = r10_bio->devs[i].devnum;
 		struct md_rdev *rdev, *rrdev;
 
-		rrdev = rcu_dereference(conf->mirrors[d].replacement);
-		/*
-		 * Read replacement first to prevent reading both rdev and
-		 * replacement as NULL during replacement replace rdev.
-		 */
-		smp_mb();
-		rdev = rcu_dereference(conf->mirrors[d].rdev);
-		if (rdev == rrdev)
-			rrdev = NULL;
+		rdev = dereference_rdev_and_rrdev(&conf->mirrors[d], &rrdev);
 		if (rdev && (test_bit(Faulty, &rdev->flags)))
 			rdev = NULL;
 		if (rrdev && (test_bit(Faulty, &rrdev->flags)))
-- 
2.39.2



* [PATCH v2 3/3] md/raid10: use dereference_rdev_and_rrdev() to get devices
  2023-07-01  8:05 [PATCH v2 0/3] raid10 bugfix linan666
  2023-07-01  8:05 ` [PATCH v2 1/3] md/raid10: check replacement and rdev to prevent submit the same io twice linan666
  2023-07-01  8:05 ` [PATCH v2 2/3] md/raid10: factor out dereference_rdev_and_rrdev() linan666
@ 2023-07-01  8:05 ` linan666
  2023-07-07  9:14 ` [PATCH v2 0/3] raid10 bugfix Song Liu
  3 siblings, 0 replies; 5+ messages in thread
From: linan666 @ 2023-07-01  8:05 UTC (permalink / raw)
  To: song, guoqing.jiang, xni, colyli
  Cc: linux-raid, linux-kernel, linan122, yukuai3, yi.zhang, houtao1,
	yangerkun

From: Li Nan <linan122@huawei.com>

Commit 2ae6aaf76912 ("md/raid10: fix io loss while replacement replace
rdev") reads replacement first to prevent io loss. However, the same
issue exists in wait_blocked_dev() and raid10_handle_discard(). Fix it by
using dereference_rdev_and_rrdev() to get the devices.

Fixes: d30588b2731f ("md/raid10: improve raid10 discard request")
Fixes: f2e7e269a752 ("md/raid10: pull the code that wait for blocked dev into one function")
Signed-off-by: Li Nan <linan122@huawei.com>
---
 drivers/md/raid10.c | 15 +++++----------
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a6c3806be903..cbaec6fce1d7 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1375,11 +1375,9 @@ static void wait_blocked_dev(struct mddev *mddev, struct r10bio *r10_bio)
 	blocked_rdev = NULL;
 	rcu_read_lock();
 	for (i = 0; i < conf->copies; i++) {
-		struct md_rdev *rdev = rcu_dereference(conf->mirrors[i].rdev);
-		struct md_rdev *rrdev = rcu_dereference(
-			conf->mirrors[i].replacement);
-		if (rdev == rrdev)
-			rrdev = NULL;
+		struct md_rdev *rdev, *rrdev;
+
+		rdev = dereference_rdev_and_rrdev(&conf->mirrors[i], &rrdev);
 		if (rdev && unlikely(test_bit(Blocked, &rdev->flags))) {
 			atomic_inc(&rdev->nr_pending);
 			blocked_rdev = rdev;
@@ -1815,15 +1813,12 @@ static int raid10_handle_discard(struct mddev *mddev, struct bio *bio)
 	 */
 	rcu_read_lock();
 	for (disk = 0; disk < geo->raid_disks; disk++) {
-		struct md_rdev *rdev = rcu_dereference(conf->mirrors[disk].rdev);
-		struct md_rdev *rrdev = rcu_dereference(
-			conf->mirrors[disk].replacement);
+		struct md_rdev *rdev, *rrdev;
 
+		rdev = dereference_rdev_and_rrdev(&conf->mirrors[disk], &rrdev);
 		r10_bio->devs[disk].bio = NULL;
 		r10_bio->devs[disk].repl_bio = NULL;
 
-		if (rdev == rrdev)
-			rrdev = NULL;
 		if (rdev && (test_bit(Faulty, &rdev->flags)))
 			rdev = NULL;
 		if (rrdev && (test_bit(Faulty, &rrdev->flags)))
-- 
2.39.2



* Re: [PATCH v2 0/3] raid10 bugfix
  2023-07-01  8:05 [PATCH v2 0/3] raid10 bugfix linan666
                   ` (2 preceding siblings ...)
  2023-07-01  8:05 ` [PATCH v2 3/3] md/raid10: use dereference_rdev_and_rrdev() to get devices linan666
@ 2023-07-07  9:14 ` Song Liu
  3 siblings, 0 replies; 5+ messages in thread
From: Song Liu @ 2023-07-07  9:14 UTC (permalink / raw)
  To: linan666
  Cc: guoqing.jiang, xni, colyli, linux-raid, linux-kernel, linan122,
	yukuai3, yi.zhang, houtao1, yangerkun

On Sat, Jul 1, 2023 at 4:06 PM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> Changes in v2:
>  - patch 2/3: rename the function to dereference_rdev_and_rrdev() and
>    return rdev directly, which removes one output argument.
>
> Li Nan (3):
>   md/raid10: check replacement and rdev to prevent submit the same io
>     twice
>   md/raid10: factor out dereference_rdev_and_rrdev()
>   md/raid10: use dereference_rdev_and_rrdev() to get devices

Applied to md-next. Thanks!
Song

>
>  drivers/md/raid10.c | 42 +++++++++++++++++++++++++-----------------
>  1 file changed, 25 insertions(+), 17 deletions(-)
>
> --
> 2.39.2
>
