linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] Misc changes for md
@ 2021-10-04 15:34 Guoqing Jiang
  2021-10-04 15:34 ` [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device Guoqing Jiang
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-04 15:34 UTC (permalink / raw)
  To: song; +Cc: linux-raid

Hello,

The first patch fixes the same calltrace as commit 6607cd319b6b ("raid1:
ensure write behind bio has less than BIO_MAX_VECS sectors") tried 
before, but unfortunately the calltrace still could happen if array
without write mostly device is configured with write-behind enabled.
So the first patch is suitable for fix branch which others are materials
for next branch.

Pls review.

Thanks,
Guoqing

Guoqing Jiang (6):
  md/raid1: only allocate write behind bio for WriteMostly device
  md/bitmap: don't set max_write_behind if there is no write mostly
    device
  md/raid1: use rdev in raid1_write_request directly
  md/raid10: add 'read_err' to raid10_read_request
  md/raid5: call roundup_pow_of_two in raid5_run
  md: remove unused argument from md_new_event

 drivers/md/md-bitmap.c | 17 +++++++++++++++++
 drivers/md/md.c        | 30 +++++++++++++++---------------
 drivers/md/md.h        |  2 +-
 drivers/md/raid1.c     | 13 ++++++-------
 drivers/md/raid10.c    | 10 +++++-----
 drivers/md/raid5.c     |  7 ++-----
 6 files changed, 46 insertions(+), 33 deletions(-)

-- 
2.31.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device
  2021-10-04 15:34 [PATCH 0/6] Misc changes for md Guoqing Jiang
@ 2021-10-04 15:34 ` Guoqing Jiang
  2021-10-05  1:05   ` Guoqing Jiang
  2021-10-05  5:55   ` Jens Stutte
  2021-10-04 15:34 ` [PATCH 2/6] md/bitmap: don't set max_write_behind if there is no write mostly device Guoqing Jiang
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-04 15:34 UTC (permalink / raw)
  To: song; +Cc: linux-raid

Commit 6607cd319b6b91bff94e90f798a61c031650b514 ("raid1: ensure write
behind bio has less than BIO_MAX_VECS sectors") tried to guarantee the
size of behind bio is not bigger than BIO_MAX_VECS sectors.

Unfortunately the same calltrace still could happen since an array could
enable write-behind without write mostly device.

To match the manpage of mdadm (which says "write-behind is only attempted
on drives marked as write-mostly"), we need to check WriteMostly flag to
avoid such unexpected behavior.

[1]. https://bugzilla.kernel.org/show_bug.cgi?id=213181#c25

Cc: stable@vger.kernel.org # v5.12+
Cc: Jens Stutte <jens@chianterastutte.eu>
Reported-by: Jens Stutte <jens@chianterastutte.eu>
Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
 drivers/md/raid1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 19598bd38939..6ba12f0f0f03 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1496,7 +1496,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 		if (!r1_bio->bios[i])
 			continue;
 
-		if (first_clone) {
+		if (first_clone && test_bit(WriteMostly, &rdev->flags)) {
 			/* do behind I/O ?
 			 * Not if there are too many, or cannot
 			 * allocate memory, or a reader on WriteMostly
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/6] md/bitmap: don't set max_write_behind if there is no write mostly device
  2021-10-04 15:34 [PATCH 0/6] Misc changes for md Guoqing Jiang
  2021-10-04 15:34 ` [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device Guoqing Jiang
@ 2021-10-04 15:34 ` Guoqing Jiang
  2021-10-07  6:25   ` Song Liu
  2021-10-04 15:34 ` [PATCH 3/6] md/raid1: use rdev in raid1_write_request directly Guoqing Jiang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-04 15:34 UTC (permalink / raw)
  To: song; +Cc: linux-raid

We shouldn't set it since write behind IO should only happen to write
mostly device.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
 drivers/md/md-bitmap.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
index e29c6298ef5c..0346281b1555 100644
--- a/drivers/md/md-bitmap.c
+++ b/drivers/md/md-bitmap.c
@@ -2469,11 +2469,28 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
 {
 	unsigned long backlog;
 	unsigned long old_mwb = mddev->bitmap_info.max_write_behind;
+	struct md_rdev *rdev;
+	bool has_write_mostly = false;
 	int rv = kstrtoul(buf, 10, &backlog);
 	if (rv)
 		return rv;
 	if (backlog > COUNTER_MAX)
 		return -EINVAL;
+
+	/*
+	 * Without write mostly device, it doesn't make sense to set
+	 * backlog for max_write_behind.
+	 */
+	rdev_for_each(rdev, mddev)
+		if (test_bit(WriteMostly, &rdev->flags)) {
+			has_write_mostly = true;
+			break;
+		}
+	if (!has_write_mostly) {
+		pr_warn_ratelimited("md: No write mostly device available\n");
+		return -EINVAL;
+	}
+
 	mddev->bitmap_info.max_write_behind = backlog;
 	if (!backlog && mddev->serial_info_pool) {
 		/* serial_info_pool is not needed if backlog is zero */
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/6] md/raid1: use rdev in raid1_write_request directly
  2021-10-04 15:34 [PATCH 0/6] Misc changes for md Guoqing Jiang
  2021-10-04 15:34 ` [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device Guoqing Jiang
  2021-10-04 15:34 ` [PATCH 2/6] md/bitmap: don't set max_write_behind if there is no write mostly device Guoqing Jiang
@ 2021-10-04 15:34 ` Guoqing Jiang
  2021-10-04 15:34 ` [PATCH 4/6] md/raid10: add 'read_err' to raid10_read_request Guoqing Jiang
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-04 15:34 UTC (permalink / raw)
  To: song; +Cc: linux-raid

We already get rdev from conf->mirrors[i].rdev at the beginning of the
loop, so just use it.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
 drivers/md/raid1.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 6ba12f0f0f03..7dc8026cf6ee 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1529,13 +1529,12 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 
 		r1_bio->bios[i] = mbio;
 
-		mbio->bi_iter.bi_sector	= (r1_bio->sector +
-				   conf->mirrors[i].rdev->data_offset);
-		bio_set_dev(mbio, conf->mirrors[i].rdev->bdev);
+		mbio->bi_iter.bi_sector	= (r1_bio->sector + rdev->data_offset);
+		bio_set_dev(mbio, rdev->bdev);
 		mbio->bi_end_io	= raid1_end_write_request;
 		mbio->bi_opf = bio_op(bio) | (bio->bi_opf & (REQ_SYNC | REQ_FUA));
-		if (test_bit(FailFast, &conf->mirrors[i].rdev->flags) &&
-		    !test_bit(WriteMostly, &conf->mirrors[i].rdev->flags) &&
+		if (test_bit(FailFast, &rdev->flags) &&
+		    !test_bit(WriteMostly, &rdev->flags) &&
 		    conf->raid_disks - mddev->degraded > 1)
 			mbio->bi_opf |= MD_FAILFAST;
 		mbio->bi_private = r1_bio;
@@ -1546,7 +1545,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 			trace_block_bio_remap(mbio, disk_devt(mddev->gendisk),
 					      r1_bio->sector);
 		/* flush_pending_writes() needs access to the rdev so...*/
-		mbio->bi_bdev = (void *)conf->mirrors[i].rdev;
+		mbio->bi_bdev = (void *)rdev;
 
 		cb = blk_check_plugged(raid1_unplug, mddev, sizeof(*plug));
 		if (cb)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/6] md/raid10: add 'read_err' to raid10_read_request
  2021-10-04 15:34 [PATCH 0/6] Misc changes for md Guoqing Jiang
                   ` (2 preceding siblings ...)
  2021-10-04 15:34 ` [PATCH 3/6] md/raid1: use rdev in raid1_write_request directly Guoqing Jiang
@ 2021-10-04 15:34 ` Guoqing Jiang
  2021-10-07  6:20   ` Song Liu
  2021-10-04 15:34 ` [PATCH 5/6] md/raid5: call roundup_pow_of_two in raid5_run Guoqing Jiang
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-04 15:34 UTC (permalink / raw)
  To: song; +Cc: linux-raid

Add the paramenter since the err retry logic is only meaningful when
the caller is handle_read_error.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
 drivers/md/raid10.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index aa2636582841..29eb538db953 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1116,7 +1116,7 @@ static void regular_request_wait(struct mddev *mddev, struct r10conf *conf,
 }
 
 static void raid10_read_request(struct mddev *mddev, struct bio *bio,
-				struct r10bio *r10_bio)
+				struct r10bio *r10_bio, bool read_err)
 {
 	struct r10conf *conf = mddev->private;
 	struct bio *read_bio;
@@ -1129,7 +1129,7 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
 	struct md_rdev *err_rdev = NULL;
 	gfp_t gfp = GFP_NOIO;
 
-	if (slot >= 0 && r10_bio->devs[slot].rdev) {
+	if (read_err && slot >= 0 && r10_bio->devs[slot].rdev) {
 		/*
 		 * This is an error retry, but we cannot
 		 * safely dereference the rdev in the r10_bio,
@@ -1519,7 +1519,7 @@ static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
 			conf->geo.raid_disks);
 
 	if (bio_data_dir(bio) == READ)
-		raid10_read_request(mddev, bio, r10_bio);
+		raid10_read_request(mddev, bio, r10_bio, false);
 	else
 		raid10_write_request(mddev, bio, r10_bio);
 }
@@ -2918,7 +2918,7 @@ static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio)
 	rdev_dec_pending(rdev, mddev);
 	allow_barrier(conf);
 	r10_bio->state = 0;
-	raid10_read_request(mddev, r10_bio->master_bio, r10_bio);
+	raid10_read_request(mddev, r10_bio->master_bio, r10_bio, true);
 }
 
 static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio)
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/6] md/raid5: call roundup_pow_of_two in raid5_run
  2021-10-04 15:34 [PATCH 0/6] Misc changes for md Guoqing Jiang
                   ` (3 preceding siblings ...)
  2021-10-04 15:34 ` [PATCH 4/6] md/raid10: add 'read_err' to raid10_read_request Guoqing Jiang
@ 2021-10-04 15:34 ` Guoqing Jiang
  2021-10-04 15:34 ` [PATCH 6/6] md: remove unused argument from md_new_event Guoqing Jiang
  2021-10-07  6:32 ` [PATCH 0/6] Misc changes for md Song Liu
  6 siblings, 0 replies; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-04 15:34 UTC (permalink / raw)
  To: song; +Cc: linux-raid

Let's call roundup_pow_of_two here instead of open code.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
 drivers/md/raid5.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 02ed53b20654..4ea9e7b647b0 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7732,10 +7732,7 @@ static int raid5_run(struct mddev *mddev)
 		 * discard data disk but write parity disk
 		 */
 		stripe = stripe * PAGE_SIZE;
-		/* Round up to power of 2, as discard handling
-		 * currently assumes that */
-		while ((stripe-1) & stripe)
-			stripe = (stripe | (stripe-1)) + 1;
+		stripe = roundup_pow_of_two(stripe);
 		mddev->queue->limits.discard_alignment = stripe;
 		mddev->queue->limits.discard_granularity = stripe;
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 6/6] md: remove unused argument from md_new_event
  2021-10-04 15:34 [PATCH 0/6] Misc changes for md Guoqing Jiang
                   ` (4 preceding siblings ...)
  2021-10-04 15:34 ` [PATCH 5/6] md/raid5: call roundup_pow_of_two in raid5_run Guoqing Jiang
@ 2021-10-04 15:34 ` Guoqing Jiang
  2021-10-07  6:32 ` [PATCH 0/6] Misc changes for md Song Liu
  6 siblings, 0 replies; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-04 15:34 UTC (permalink / raw)
  To: song; +Cc: linux-raid

Actually, mddev is not used by md_new_event.

Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
---
 drivers/md/md.c     | 30 +++++++++++++++---------------
 drivers/md/md.h     |  2 +-
 drivers/md/raid10.c |  2 +-
 drivers/md/raid5.c  |  2 +-
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index c322841d4edc..90c624bec695 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -352,7 +352,7 @@ static bool create_on_open = true;
  */
 static DECLARE_WAIT_QUEUE_HEAD(md_event_waiters);
 static atomic_t md_event_count;
-void md_new_event(struct mddev *mddev)
+void md_new_event(void)
 {
 	atomic_inc(&md_event_count);
 	wake_up(&md_event_waiters);
@@ -2886,7 +2886,7 @@ static int add_bound_rdev(struct md_rdev *rdev)
 	if (mddev->degraded)
 		set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
-	md_new_event(mddev);
+	md_new_event();
 	md_wakeup_thread(mddev->thread);
 	return 0;
 }
@@ -3002,7 +3002,7 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
 					set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
 					md_wakeup_thread(mddev->thread);
 				}
-				md_new_event(mddev);
+				md_new_event();
 			}
 		}
 	} else if (cmd_match(buf, "writemostly")) {
@@ -4099,7 +4099,7 @@ level_store(struct mddev *mddev, const char *buf, size_t len)
 	if (!mddev->thread)
 		md_update_sb(mddev, 1);
 	sysfs_notify_dirent_safe(mddev->sysfs_level);
-	md_new_event(mddev);
+	md_new_event();
 	rv = len;
 out_unlock:
 	mddev_unlock(mddev);
@@ -4620,7 +4620,7 @@ new_dev_store(struct mddev *mddev, const char *buf, size_t len)
 		export_rdev(rdev);
 	mddev_unlock(mddev);
 	if (!err)
-		md_new_event(mddev);
+		md_new_event();
 	return err ? err : len;
 }
 
@@ -6041,7 +6041,7 @@ int md_run(struct mddev *mddev)
 	if (mddev->sb_flags)
 		md_update_sb(mddev, 0);
 
-	md_new_event(mddev);
+	md_new_event();
 	return 0;
 
 bitmap_abort:
@@ -6431,7 +6431,7 @@ static int do_md_stop(struct mddev *mddev, int mode,
 		if (mddev->hold_active == UNTIL_STOP)
 			mddev->hold_active = 0;
 	}
-	md_new_event(mddev);
+	md_new_event();
 	sysfs_notify_dirent_safe(mddev->sysfs_state);
 	return 0;
 }
@@ -6935,7 +6935,7 @@ static int hot_remove_disk(struct mddev *mddev, dev_t dev)
 		md_wakeup_thread(mddev->thread);
 	else
 		md_update_sb(mddev, 1);
-	md_new_event(mddev);
+	md_new_event();
 
 	return 0;
 busy:
@@ -7008,7 +7008,7 @@ static int hot_add_disk(struct mddev *mddev, dev_t dev)
 	 */
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	md_wakeup_thread(mddev->thread);
-	md_new_event(mddev);
+	md_new_event();
 	return 0;
 
 abort_export:
@@ -7982,7 +7982,7 @@ void md_error(struct mddev *mddev, struct md_rdev *rdev)
 	md_wakeup_thread(mddev->thread);
 	if (mddev->event_work.func)
 		queue_work(md_misc_wq, &mddev->event_work);
-	md_new_event(mddev);
+	md_new_event();
 }
 EXPORT_SYMBOL(md_error);
 
@@ -8866,7 +8866,7 @@ void md_do_sync(struct md_thread *thread)
 		mddev->curr_resync = 3; /* no longer delayed */
 	mddev->curr_resync_completed = j;
 	sysfs_notify_dirent_safe(mddev->sysfs_completed);
-	md_new_event(mddev);
+	md_new_event();
 	update_time = jiffies;
 
 	blk_start_plug(&plug);
@@ -8937,7 +8937,7 @@ void md_do_sync(struct md_thread *thread)
 			/* this is the earliest that rebuild will be
 			 * visible in /proc/mdstat
 			 */
-			md_new_event(mddev);
+			md_new_event();
 
 		if (last_check + window > io_sectors || j == max_sectors)
 			continue;
@@ -9161,7 +9161,7 @@ static int remove_and_add_spares(struct mddev *mddev,
 			sysfs_link_rdev(mddev, rdev);
 			if (!test_bit(Journal, &rdev->flags))
 				spares++;
-			md_new_event(mddev);
+			md_new_event();
 			set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
 		}
 	}
@@ -9195,7 +9195,7 @@ static void md_start_sync(struct work_struct *ws)
 	} else
 		md_wakeup_thread(mddev->sync_thread);
 	sysfs_notify_dirent_safe(mddev->sysfs_action);
-	md_new_event(mddev);
+	md_new_event();
 }
 
 /*
@@ -9454,7 +9454,7 @@ void md_reap_sync_thread(struct mddev *mddev)
 	/* flag recovery needed just to double check */
 	set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
 	sysfs_notify_dirent_safe(mddev->sysfs_action);
-	md_new_event(mddev);
+	md_new_event();
 	if (mddev->event_work.func)
 		queue_work(md_misc_wq, &mddev->event_work);
 }
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 4c96c36bd01a..53ea7a6961de 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -731,7 +731,7 @@ extern int sync_page_io(struct md_rdev *rdev, sector_t sector, int size,
 			struct page *page, int op, int op_flags,
 			bool metadata_op);
 extern void md_do_sync(struct md_thread *thread);
-extern void md_new_event(struct mddev *mddev);
+extern void md_new_event(void);
 extern void md_allow_write(struct mddev *mddev);
 extern void md_wait_for_blocked_rdev(struct md_rdev *rdev, struct mddev *mddev);
 extern void md_set_array_sectors(struct mddev *mddev, sector_t array_sectors);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 29eb538db953..0aee0675f18e 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4647,7 +4647,7 @@ static int raid10_start_reshape(struct mddev *mddev)
 	}
 	conf->reshape_checkpoint = jiffies;
 	md_wakeup_thread(mddev->sync_thread);
-	md_new_event(mddev);
+	md_new_event();
 	return 0;
 
 abort:
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4ea9e7b647b0..9c1a5877cf9f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -8279,7 +8279,7 @@ static int raid5_start_reshape(struct mddev *mddev)
 	}
 	conf->reshape_checkpoint = jiffies;
 	md_wakeup_thread(mddev->sync_thread);
-	md_new_event(mddev);
+	md_new_event();
 	return 0;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device
  2021-10-04 15:34 ` [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device Guoqing Jiang
@ 2021-10-05  1:05   ` Guoqing Jiang
  2021-10-05  5:55   ` Jens Stutte
  1 sibling, 0 replies; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-05  1:05 UTC (permalink / raw)
  To: song; +Cc: linux-raid, jens

Now cc Jens actually.

On 10/4/21 11:34 PM, Guoqing Jiang wrote:
> Commit 6607cd319b6b91bff94e90f798a61c031650b514 ("raid1: ensure write
> behind bio has less than BIO_MAX_VECS sectors") tried to guarantee the
> size of behind bio is not bigger than BIO_MAX_VECS sectors.
>
> Unfortunately the same calltrace still could happen since an array could
> enable write-behind without write mostly device.
>
> To match the manpage of mdadm (which says "write-behind is only attempted
> on drives marked as write-mostly"), we need to check WriteMostly flag to
> avoid such unexpected behavior.
>
> [1]. https://bugzilla.kernel.org/show_bug.cgi?id=213181#c25
>
> Cc: stable@vger.kernel.org # v5.12+
> Cc: Jens Stutte <jens@chianterastutte.eu>
> Reported-by: Jens Stutte <jens@chianterastutte.eu>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
> ---
>   drivers/md/raid1.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 19598bd38939..6ba12f0f0f03 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1496,7 +1496,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>   		if (!r1_bio->bios[i])
>   			continue;
>   
> -		if (first_clone) {
> +		if (first_clone && test_bit(WriteMostly, &rdev->flags)) {
>   			/* do behind I/O ?
>   			 * Not if there are too many, or cannot
>   			 * allocate memory, or a reader on WriteMostly


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device
  2021-10-04 15:34 ` [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device Guoqing Jiang
  2021-10-05  1:05   ` Guoqing Jiang
@ 2021-10-05  5:55   ` Jens Stutte
  1 sibling, 0 replies; 14+ messages in thread
From: Jens Stutte @ 2021-10-05  5:55 UTC (permalink / raw)
  To: Guoqing Jiang, song; +Cc: linux-raid

FWIW, I can confirm that an equivalent change (except for variable 
renaming, see [1]) is running well here now for a while now.

Thank you to work on this!

Am 04.10.21 um 17:34 schrieb Guoqing Jiang:
> Commit 6607cd319b6b91bff94e90f798a61c031650b514 ("raid1: ensure write
> behind bio has less than BIO_MAX_VECS sectors") tried to guarantee the
> size of behind bio is not bigger than BIO_MAX_VECS sectors.
>
> Unfortunately the same calltrace still could happen since an array could
> enable write-behind without write mostly device.
>
> To match the manpage of mdadm (which says "write-behind is only attempted
> on drives marked as write-mostly"), we need to check WriteMostly flag to
> avoid such unexpected behavior.
>
> [1]. https://bugzilla.kernel.org/show_bug.cgi?id=213181#c25
>
> Cc: stable@vger.kernel.org # v5.12+
> Cc: Jens Stutte <jens@chianterastutte.eu>
> Reported-by: Jens Stutte <jens@chianterastutte.eu>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
> ---
>   drivers/md/raid1.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 19598bd38939..6ba12f0f0f03 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1496,7 +1496,7 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>   		if (!r1_bio->bios[i])
>   			continue;
>   
> -		if (first_clone) {
> +		if (first_clone && test_bit(WriteMostly, &rdev->flags)) {
>   			/* do behind I/O ?
>   			 * Not if there are too many, or cannot
>   			 * allocate memory, or a reader on WriteMostly

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/6] md/raid10: add 'read_err' to raid10_read_request
  2021-10-04 15:34 ` [PATCH 4/6] md/raid10: add 'read_err' to raid10_read_request Guoqing Jiang
@ 2021-10-07  6:20   ` Song Liu
  2021-10-07 10:16     ` Guoqing Jiang
  0 siblings, 1 reply; 14+ messages in thread
From: Song Liu @ 2021-10-07  6:20 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid

On Mon, Oct 4, 2021 at 8:40 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>
> Add the paramenter since the err retry logic is only meaningful when
> the caller is handle_read_error.
>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
> ---
>  drivers/md/raid10.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index aa2636582841..29eb538db953 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1116,7 +1116,7 @@ static void regular_request_wait(struct mddev *mddev, struct r10conf *conf,
>  }
>
>  static void raid10_read_request(struct mddev *mddev, struct bio *bio,
> -                               struct r10bio *r10_bio)
> +                               struct r10bio *r10_bio, bool read_err)
>  {
>         struct r10conf *conf = mddev->private;
>         struct bio *read_bio;
> @@ -1129,7 +1129,7 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
>         struct md_rdev *err_rdev = NULL;
>         gfp_t gfp = GFP_NOIO;
>
> -       if (slot >= 0 && r10_bio->devs[slot].rdev) {
> +       if (read_err && slot >= 0 && r10_bio->devs[slot].rdev) {

How about we just move this section to a separate function?

Thanks,
Song

>                 /*
>                  * This is an error retry, but we cannot
>                  * safely dereference the rdev in the r10_bio,
> @@ -1519,7 +1519,7 @@ static void __make_request(struct mddev *mddev, struct bio *bio, int sectors)
>                         conf->geo.raid_disks);
>
>         if (bio_data_dir(bio) == READ)
> -               raid10_read_request(mddev, bio, r10_bio);
> +               raid10_read_request(mddev, bio, r10_bio, false);
>         else
>                 raid10_write_request(mddev, bio, r10_bio);
>  }
> @@ -2918,7 +2918,7 @@ static void handle_read_error(struct mddev *mddev, struct r10bio *r10_bio)
>         rdev_dec_pending(rdev, mddev);
>         allow_barrier(conf);
>         r10_bio->state = 0;
> -       raid10_read_request(mddev, r10_bio->master_bio, r10_bio);
> +       raid10_read_request(mddev, r10_bio->master_bio, r10_bio, true);
>  }
>
>  static void handle_write_completed(struct r10conf *conf, struct r10bio *r10_bio)
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] md/bitmap: don't set max_write_behind if there is no write mostly device
  2021-10-04 15:34 ` [PATCH 2/6] md/bitmap: don't set max_write_behind if there is no write mostly device Guoqing Jiang
@ 2021-10-07  6:25   ` Song Liu
  2021-10-07 10:20     ` Guoqing Jiang
  0 siblings, 1 reply; 14+ messages in thread
From: Song Liu @ 2021-10-07  6:25 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid

On Mon, Oct 4, 2021 at 8:40 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>
> We shouldn't set it since write behind IO should only happen to write
> mostly device.
>
> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
> ---
>  drivers/md/md-bitmap.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
> index e29c6298ef5c..0346281b1555 100644
> --- a/drivers/md/md-bitmap.c
> +++ b/drivers/md/md-bitmap.c
> @@ -2469,11 +2469,28 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
>  {
>         unsigned long backlog;
>         unsigned long old_mwb = mddev->bitmap_info.max_write_behind;
> +       struct md_rdev *rdev;
> +       bool has_write_mostly = false;
>         int rv = kstrtoul(buf, 10, &backlog);
>         if (rv)
>                 return rv;
>         if (backlog > COUNTER_MAX)
>                 return -EINVAL;
> +
> +       /*
> +        * Without write mostly device, it doesn't make sense to set
> +        * backlog for max_write_behind.
> +        */
> +       rdev_for_each(rdev, mddev)
> +               if (test_bit(WriteMostly, &rdev->flags)) {
> +                       has_write_mostly = true;
> +                       break;
> +               }
> +       if (!has_write_mostly) {
> +               pr_warn_ratelimited("md: No write mostly device available\n");

Most of these _store functions do not print warnings for invalid changes. So
I am not sure whether we want to add this one. If we do want it, we should
make it clear, as
"md: No write mostly device available. Cannot set backlog\n".
We may also add the device name there.

Thanks,
Song

> +               return -EINVAL;
> +       }
> +
>         mddev->bitmap_info.max_write_behind = backlog;
>         if (!backlog && mddev->serial_info_pool) {
>                 /* serial_info_pool is not needed if backlog is zero */
> --
> 2.31.1
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/6] Misc changes for md
  2021-10-04 15:34 [PATCH 0/6] Misc changes for md Guoqing Jiang
                   ` (5 preceding siblings ...)
  2021-10-04 15:34 ` [PATCH 6/6] md: remove unused argument from md_new_event Guoqing Jiang
@ 2021-10-07  6:32 ` Song Liu
  6 siblings, 0 replies; 14+ messages in thread
From: Song Liu @ 2021-10-07  6:32 UTC (permalink / raw)
  To: Guoqing Jiang; +Cc: linux-raid

On Mon, Oct 4, 2021 at 8:40 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>
> Hello,
>
> The first patch fixes the same calltrace as commit 6607cd319b6b ("raid1:
> ensure write behind bio has less than BIO_MAX_VECS sectors") tried
> before, but unfortunately the calltrace still could happen if array
> without write mostly device is configured with write-behind enabled.
> So the first patch is suitable for fix branch which others are materials
> for next branch.
>
> Pls review.
>
> Thanks,
> Guoqing
>
> Guoqing Jiang (6):
>   md/raid1: only allocate write behind bio for WriteMostly device
>   md/bitmap: don't set max_write_behind if there is no write mostly
>     device
>   md/raid1: use rdev in raid1_write_request directly
>   md/raid10: add 'read_err' to raid10_read_request
>   md/raid5: call roundup_pow_of_two in raid5_run
>   md: remove unused argument from md_new_event

Thanks for these fixes and cleanups! I applied 1, 3, 5, 6 to md-next.

Song

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/6] md/raid10: add 'read_err' to raid10_read_request
  2021-10-07  6:20   ` Song Liu
@ 2021-10-07 10:16     ` Guoqing Jiang
  0 siblings, 0 replies; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-07 10:16 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid



On 10/7/21 2:20 PM, Song Liu wrote:
> On Mon, Oct 4, 2021 at 8:40 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>> Add the paramenter since the err retry logic is only meaningful when
>> the caller is handle_read_error.
>>
>> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
>> ---
>>   drivers/md/raid10.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
>> index aa2636582841..29eb538db953 100644
>> --- a/drivers/md/raid10.c
>> +++ b/drivers/md/raid10.c
>> @@ -1116,7 +1116,7 @@ static void regular_request_wait(struct mddev *mddev, struct r10conf *conf,
>>   }
>>
>>   static void raid10_read_request(struct mddev *mddev, struct bio *bio,
>> -                               struct r10bio *r10_bio)
>> +                               struct r10bio *r10_bio, bool read_err)
>>   {
>>          struct r10conf *conf = mddev->private;
>>          struct bio *read_bio;
>> @@ -1129,7 +1129,7 @@ static void raid10_read_request(struct mddev *mddev, struct bio *bio,
>>          struct md_rdev *err_rdev = NULL;
>>          gfp_t gfp = GFP_NOIO;
>>
>> -       if (slot >= 0 && r10_bio->devs[slot].rdev) {
>> +       if (read_err && slot >= 0 && r10_bio->devs[slot].rdev) {
> How about we just move this section to a separate function?

Will add an additional patch for it.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 2/6] md/bitmap: don't set max_write_behind if there is no write mostly device
  2021-10-07  6:25   ` Song Liu
@ 2021-10-07 10:20     ` Guoqing Jiang
  0 siblings, 0 replies; 14+ messages in thread
From: Guoqing Jiang @ 2021-10-07 10:20 UTC (permalink / raw)
  To: Song Liu; +Cc: linux-raid



On 10/7/21 2:25 PM, Song Liu wrote:
> On Mon, Oct 4, 2021 at 8:40 AM Guoqing Jiang <guoqing.jiang@linux.dev> wrote:
>> We shouldn't set it since write behind IO should only happen to write
>> mostly device.
>>
>> Signed-off-by: Guoqing Jiang <guoqing.jiang@linux.dev>
>> ---
>>   drivers/md/md-bitmap.c | 17 +++++++++++++++++
>>   1 file changed, 17 insertions(+)
>>
>> diff --git a/drivers/md/md-bitmap.c b/drivers/md/md-bitmap.c
>> index e29c6298ef5c..0346281b1555 100644
>> --- a/drivers/md/md-bitmap.c
>> +++ b/drivers/md/md-bitmap.c
>> @@ -2469,11 +2469,28 @@ backlog_store(struct mddev *mddev, const char *buf, size_t len)
>>   {
>>          unsigned long backlog;
>>          unsigned long old_mwb = mddev->bitmap_info.max_write_behind;
>> +       struct md_rdev *rdev;
>> +       bool has_write_mostly = false;
>>          int rv = kstrtoul(buf, 10, &backlog);
>>          if (rv)
>>                  return rv;
>>          if (backlog > COUNTER_MAX)
>>                  return -EINVAL;
>> +
>> +       /*
>> +        * Without write mostly device, it doesn't make sense to set
>> +        * backlog for max_write_behind.
>> +        */
>> +       rdev_for_each(rdev, mddev)
>> +               if (test_bit(WriteMostly, &rdev->flags)) {
>> +                       has_write_mostly = true;
>> +                       break;
>> +               }
>> +       if (!has_write_mostly) {
>> +               pr_warn_ratelimited("md: No write mostly device available\n");
> Most of these _store functions do not print warnings for invalid changes. So
> I am not sure whether we want to add this one.

I think it is better to makes users know the reason of invalidation.

> If we do want it, we should make it clear, as
> "md: No write mostly device available. Cannot set backlog\n".
> We may also add the device name there.

Sure, I will make it clearer.

Thanks,
Guoqing

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-10-07 10:20 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-10-04 15:34 [PATCH 0/6] Misc changes for md Guoqing Jiang
2021-10-04 15:34 ` [PATCH 1/6] md/raid1: only allocate write behind bio for WriteMostly device Guoqing Jiang
2021-10-05  1:05   ` Guoqing Jiang
2021-10-05  5:55   ` Jens Stutte
2021-10-04 15:34 ` [PATCH 2/6] md/bitmap: don't set max_write_behind if there is no write mostly device Guoqing Jiang
2021-10-07  6:25   ` Song Liu
2021-10-07 10:20     ` Guoqing Jiang
2021-10-04 15:34 ` [PATCH 3/6] md/raid1: use rdev in raid1_write_request directly Guoqing Jiang
2021-10-04 15:34 ` [PATCH 4/6] md/raid10: add 'read_err' to raid10_read_request Guoqing Jiang
2021-10-07  6:20   ` Song Liu
2021-10-07 10:16     ` Guoqing Jiang
2021-10-04 15:34 ` [PATCH 5/6] md/raid5: call roundup_pow_of_two in raid5_run Guoqing Jiang
2021-10-04 15:34 ` [PATCH 6/6] md: remove unused argument from md_new_event Guoqing Jiang
2021-10-07  6:32 ` [PATCH 0/6] Misc changes for md Song Liu

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).