linux-raid.vger.kernel.org archive mirror
* atomic queue limit updates for stackable devices
@ 2024-02-23 16:12 Christoph Hellwig
  2024-02-23 16:12 ` [PATCH 1/9] block: add a queue_limits_set helper Christoph Hellwig
                   ` (9 more replies)
  0 siblings, 10 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-23 16:12 UTC (permalink / raw)
  To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai
  Cc: dm-devel, linux-block, linux-raid

Hi all,

this series adds new helpers for the atomic queue limit update
functionality and then switches dm and md over to them.  The dm switch is
pretty trivial, as it was basically implementing the model by hand
already; md is a bit more work.

I've run the mdadm testsuite, and it has the same (rather large) number
of failures as the baseline.  I still haven't managed to get the dm
testsuite running, unfortunately, but the series survives xfstests,
which exercises quite a few dm targets, as well as blktests.

drbd and nvme-multipath will be handled separately.

Diffstat:
 block/blk-settings.c   |   46 ++++++++++++------
 drivers/md/dm-table.c  |   27 ++++------
 drivers/md/md.c        |   37 ++++++++++++++
 drivers/md/md.h        |    3 +
 drivers/md/raid0.c     |   35 ++++++-------
 drivers/md/raid1.c     |   24 +++------
 drivers/md/raid10.c    |   52 +++++++++-----------
 drivers/md/raid5.c     |  123 ++++++++++++++++++++++---------------------------
 include/linux/blkdev.h |    5 +
 9 files changed, 193 insertions(+), 159 deletions(-)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/9] block: add a queue_limits_set helper
  2024-02-23 16:12 atomic queue limit updates for stackable devices Christoph Hellwig
@ 2024-02-23 16:12 ` Christoph Hellwig
  2024-02-23 16:12 ` [PATCH 2/9] block: add a queue_limits_stack_bdev helper Christoph Hellwig
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-23 16:12 UTC (permalink / raw)
  To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai
  Cc: dm-devel, linux-block, linux-raid

Add a small wrapper around queue_limits_commit_update for stacking
drivers that don't want to update existing limits but instead set an
entirely new set.
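For reference, the intended call pattern in a stacking driver looks
roughly like this (a sketch modeled on the raid0 conversion later in
this series; the mddev fields and the specific limits chosen are just
examples):

```c
/* Sketch: build a fresh set of limits and apply them atomically. */
struct queue_limits lim;
int ret;

blk_set_stacking_limits(&lim);
lim.io_min = mddev->chunk_sectors << 9;	/* example limit */
ret = queue_limits_set(mddev->queue, &lim);
if (ret)
	return ret;	/* the limits failed validation */
```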

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-settings.c   | 18 ++++++++++++++++++
 include/linux/blkdev.h |  1 +
 2 files changed, 19 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index b6bbe683d218fa..1989a177be201b 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -266,6 +266,24 @@ int queue_limits_commit_update(struct request_queue *q,
 }
 EXPORT_SYMBOL_GPL(queue_limits_commit_update);
 
+/**
+ * queue_limits_set - apply queue limits to queue
+ * @q:		queue to update
+ * @lim:	limits to apply
+ *
+ * Apply the limits in @lim that were freshly initialized to @q.
+ * To update existing limits use queue_limits_start_update() and
+ * queue_limits_commit_update() instead.
+ *
+ * Returns 0 if successful, else a negative error code.
+ */
+int queue_limits_set(struct request_queue *q, struct queue_limits *lim)
+{
+	mutex_lock(&q->limits_lock);
+	return queue_limits_commit_update(q, lim);
+}
+EXPORT_SYMBOL_GPL(queue_limits_set);
+
 /**
  * blk_queue_bounce_limit - set bounce buffer limit for queue
  * @q: the request queue for the device
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index a14ea934413850..dd510ad7ce4b45 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -889,6 +889,7 @@ queue_limits_start_update(struct request_queue *q)
 }
 int queue_limits_commit_update(struct request_queue *q,
 		struct queue_limits *lim);
+int queue_limits_set(struct request_queue *q, struct queue_limits *lim);
 
 /*
  * Access functions for manipulating queue properties
-- 
2.39.2



* [PATCH 2/9] block: add a queue_limits_stack_bdev helper
  2024-02-23 16:12 atomic queue limit updates for stackable devices Christoph Hellwig
  2024-02-23 16:12 ` [PATCH 1/9] block: add a queue_limits_set helper Christoph Hellwig
@ 2024-02-23 16:12 ` Christoph Hellwig
  2024-02-23 16:12 ` [PATCH 3/9] dm: use queue_limits_set Christoph Hellwig
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-23 16:12 UTC (permalink / raw)
  To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai
  Cc: dm-devel, linux-block, linux-raid

Add a small wrapper around blk_stack_limits that allows passing a bdev
for the bottom device and prints an error in case of a misaligned
device.  The name fits into the new queue limits API, and the intent is
to eventually replace disk_stack_limits.
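Callers are expected to iterate over all bottom devices, e.g. (sketch
following the MD helper added later in this series; note that the
offset is passed in sectors here, unlike the byte offset taken by
disk_stack_limits):

```c
/* Sketch: stack the limits of every component device into @lim. */
struct md_rdev *rdev;

rdev_for_each(rdev, mddev)
	queue_limits_stack_bdev(&lim, rdev->bdev, rdev->data_offset,
				mddev->gendisk->disk_name);
```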

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-settings.c   | 24 ++++++++++++++++++++++++
 include/linux/blkdev.h |  2 ++
 2 files changed, 26 insertions(+)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 1989a177be201b..f14d3a18f9e2f0 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -891,6 +891,30 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 }
 EXPORT_SYMBOL(blk_stack_limits);
 
+/**
+ * queue_limits_stack_bdev - adjust queue_limits for stacked devices
+ * @t:	the stacking driver limits (top device)
+ * @bdev:  the underlying block device (bottom)
+ * @offset:  offset to beginning of data within component device
+ *
+ * Description:
+ *    This function is used by stacking drivers like MD and DM to ensure
+ *    that all component devices have compatible block sizes and
+ *    alignments.  The stacking driver must provide a queue_limits
+ *    struct (top) and then iteratively call the stacking function for
+ *    all component (bottom) devices.  The stacking function will
+ *    attempt to combine the values and ensure proper alignment.
+ */
+void queue_limits_stack_bdev(struct queue_limits *t, struct block_device *bdev,
+		sector_t offset, const char *pfx)
+{
+	if (blk_stack_limits(t, &bdev_get_queue(bdev)->limits,
+			get_start_sect(bdev) + offset))
+		pr_notice("%s: Warning: Device %pg is misaligned\n",
+			pfx, bdev);
+}
+EXPORT_SYMBOL_GPL(queue_limits_stack_bdev);
+
 /**
  * disk_stack_limits - adjust queue limits for stacked drivers
  * @disk:  MD/DM gendisk (top)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index dd510ad7ce4b45..285e82723d641f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -924,6 +924,8 @@ extern void blk_set_queue_depth(struct request_queue *q, unsigned int depth);
 extern void blk_set_stacking_limits(struct queue_limits *lim);
 extern int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 			    sector_t offset);
+void queue_limits_stack_bdev(struct queue_limits *t, struct block_device *bdev,
+		sector_t offset, const char *pfx);
 extern void disk_stack_limits(struct gendisk *disk, struct block_device *bdev,
 			      sector_t offset);
 extern void blk_queue_update_dma_pad(struct request_queue *, unsigned int);
-- 
2.39.2



* [PATCH 3/9] dm: use queue_limits_set
  2024-02-23 16:12 atomic queue limit updates for stackable devices Christoph Hellwig
  2024-02-23 16:12 ` [PATCH 1/9] block: add a queue_limits_set helper Christoph Hellwig
  2024-02-23 16:12 ` [PATCH 2/9] block: add a queue_limits_stack_bdev helper Christoph Hellwig
@ 2024-02-23 16:12 ` Christoph Hellwig
  2024-02-23 17:30   ` Mike Snitzer
  2024-02-23 16:12 ` [PATCH 4/9] md: add queue limit helpers Christoph Hellwig
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-23 16:12 UTC (permalink / raw)
  To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai
  Cc: dm-devel, linux-block, linux-raid

Use queue_limits_set, which validates the limits and takes care of
updating the readahead settings, instead of directly assigning them to
the queue.  For that, make sure all limits are actually updated before
the assignment.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/dm-table.c | 27 ++++++++++++---------------
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 41f1d731ae5ac2..88114719fe187a 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1963,26 +1963,27 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	bool wc = false, fua = false;
 	int r;
 
-	/*
-	 * Copy table's limits to the DM device's request_queue
-	 */
-	q->limits = *limits;
-
 	if (dm_table_supports_nowait(t))
 		blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
 	else
 		blk_queue_flag_clear(QUEUE_FLAG_NOWAIT, q);
 
 	if (!dm_table_supports_discards(t)) {
-		q->limits.max_discard_sectors = 0;
-		q->limits.max_hw_discard_sectors = 0;
-		q->limits.discard_granularity = 0;
-		q->limits.discard_alignment = 0;
-		q->limits.discard_misaligned = 0;
+		limits->max_hw_discard_sectors = 0;
+		limits->discard_granularity = 0;
+		limits->discard_alignment = 0;
+		limits->discard_misaligned = 0;
 	}
 
+	if (!dm_table_supports_write_zeroes(t))
+		limits->max_write_zeroes_sectors = 0;
+
 	if (!dm_table_supports_secure_erase(t))
-		q->limits.max_secure_erase_sectors = 0;
+		limits->max_secure_erase_sectors = 0;
+
+	r = queue_limits_set(q, limits);
+	if (r)
+		return r;
 
 	if (dm_table_supports_flush(t, (1UL << QUEUE_FLAG_WC))) {
 		wc = true;
@@ -2007,9 +2008,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	else
 		blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
 
-	if (!dm_table_supports_write_zeroes(t))
-		q->limits.max_write_zeroes_sectors = 0;
-
 	dm_table_verify_integrity(t);
 
 	/*
@@ -2047,7 +2045,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	}
 
 	dm_update_crypto_profile(q, t);
-	disk_update_readahead(t->md->disk);
 
 	/*
 	 * Check for request-based device is left to
-- 
2.39.2



* [PATCH 4/9] md: add queue limit helpers
  2024-02-23 16:12 atomic queue limit updates for stackable devices Christoph Hellwig
                   ` (2 preceding siblings ...)
  2024-02-23 16:12 ` [PATCH 3/9] dm: use queue_limits_set Christoph Hellwig
@ 2024-02-23 16:12 ` Christoph Hellwig
  2024-02-23 16:12 ` [PATCH 5/9] md/raid0: use the atomic queue limit update APIs Christoph Hellwig
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-23 16:12 UTC (permalink / raw)
  To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai
  Cc: dm-devel, linux-block, linux-raid

Add a few helpers that wrap the block queue limits API for use in MD.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/md.c | 37 +++++++++++++++++++++++++++++++++++++
 drivers/md/md.h |  3 +++
 2 files changed, 40 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 75266c34b1f99b..23823823f80c6b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5699,6 +5699,43 @@ static const struct kobj_type md_ktype = {
 
 int mdp_major = 0;
 
+/* stack the limit for all rdevs into lim */
+void mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim)
+{
+	struct md_rdev *rdev;
+
+	rdev_for_each(rdev, mddev) {
+		queue_limits_stack_bdev(lim, rdev->bdev, rdev->data_offset,
+					mddev->gendisk->disk_name);
+	}
+}
+EXPORT_SYMBOL_GPL(mddev_stack_rdev_limits);
+
+/* apply the extra stacking limits from a new rdev into mddev */
+int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
+{
+	struct queue_limits lim = queue_limits_start_update(mddev->queue);
+
+	queue_limits_stack_bdev(&lim, rdev->bdev, rdev->data_offset,
+				mddev->gendisk->disk_name);
+	return queue_limits_commit_update(mddev->queue, &lim);
+}
+EXPORT_SYMBOL_GPL(mddev_stack_new_rdev);
+
+/* update the optimal I/O size after a reshape */
+void mddev_update_io_opt(struct mddev *mddev, unsigned int nr_stripes)
+{
+	struct queue_limits lim;
+	int ret;
+
+	blk_mq_freeze_queue(mddev->queue);
+	lim = queue_limits_start_update(mddev->queue);
+	lim.io_opt = lim.io_min * nr_stripes;
+	ret = queue_limits_commit_update(mddev->queue, &lim);
+	blk_mq_unfreeze_queue(mddev->queue);
+}
+EXPORT_SYMBOL_GPL(mddev_update_io_opt);
+
 static void mddev_delayed_delete(struct work_struct *ws)
 {
 	struct mddev *mddev = container_of(ws, struct mddev, del_work);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 8d881cc597992f..25b19614aa3239 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -860,6 +860,9 @@ void md_autostart_arrays(int part);
 int md_set_array_info(struct mddev *mddev, struct mdu_array_info_s *info);
 int md_add_new_disk(struct mddev *mddev, struct mdu_disk_info_s *info);
 int do_md_run(struct mddev *mddev);
+void mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim);
+int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev);
+void mddev_update_io_opt(struct mddev *mddev, unsigned int nr_stripes);
 
 extern const struct block_device_operations md_fops;
 
-- 
2.39.2



* [PATCH 5/9] md/raid0: use the atomic queue limit update APIs
  2024-02-23 16:12 atomic queue limit updates for stackable devices Christoph Hellwig
                   ` (3 preceding siblings ...)
  2024-02-23 16:12 ` [PATCH 4/9] md: add queue limit helpers Christoph Hellwig
@ 2024-02-23 16:12 ` Christoph Hellwig
  2024-02-23 16:12 ` [PATCH 6/9] md/raid1: " Christoph Hellwig
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-23 16:12 UTC (permalink / raw)
  To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai
  Cc: dm-devel, linux-block, linux-raid

Build the queue limits outside the queue and apply them using
queue_limits_set.  Also remove the bogus ->gendisk and ->queue NULL
checks in the area while touching it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/raid0.c | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index c50a7abda744ad..f7d78ee5338bd3 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -381,6 +381,7 @@ static void raid0_free(struct mddev *mddev, void *priv)
 
 static int raid0_run(struct mddev *mddev)
 {
+	struct queue_limits lim;
 	struct r0conf *conf;
 	int ret;
 
@@ -391,29 +392,23 @@ static int raid0_run(struct mddev *mddev)
 	if (md_check_no_bitmap(mddev))
 		return -EINVAL;
 
-	/* if private is not null, we are here after takeover */
-	if (mddev->private == NULL) {
+	/* if conf is not null, we are here after takeover */
+	if (!conf) {
 		ret = create_strip_zones(mddev, &conf);
 		if (ret < 0)
 			return ret;
 		mddev->private = conf;
 	}
-	conf = mddev->private;
-	if (mddev->queue) {
-		struct md_rdev *rdev;
-
-		blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
-		blk_queue_max_write_zeroes_sectors(mddev->queue, mddev->chunk_sectors);
-
-		blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9);
-		blk_queue_io_opt(mddev->queue,
-				 (mddev->chunk_sectors << 9) * mddev->raid_disks);
 
-		rdev_for_each(rdev, mddev) {
-			disk_stack_limits(mddev->gendisk, rdev->bdev,
-					  rdev->data_offset << 9);
-		}
-	}
+	blk_set_stacking_limits(&lim);
+	lim.max_hw_sectors = mddev->chunk_sectors;
+	lim.max_write_zeroes_sectors = mddev->chunk_sectors;
+	lim.io_min = mddev->chunk_sectors << 9;
+	lim.io_opt = lim.io_min * mddev->raid_disks;
+	mddev_stack_rdev_limits(mddev, &lim);
+	ret = queue_limits_set(mddev->queue, &lim);
+	if (ret)
+		goto out_free_conf;
 
 	/* calculate array device size */
 	md_set_array_sectors(mddev, raid0_size(mddev, 0, 0));
@@ -426,8 +421,10 @@ static int raid0_run(struct mddev *mddev)
 
 	ret = md_integrity_register(mddev);
 	if (ret)
-		free_conf(mddev, conf);
-
+		goto out_free_conf;
+	return 0;
+out_free_conf:
+	free_conf(mddev, conf);
 	return ret;
 }
 
-- 
2.39.2



* [PATCH 6/9] md/raid1: use the atomic queue limit update APIs
  2024-02-23 16:12 atomic queue limit updates for stackable devices Christoph Hellwig
                   ` (4 preceding siblings ...)
  2024-02-23 16:12 ` [PATCH 5/9] md/raid0: use the atomic queue limit update APIs Christoph Hellwig
@ 2024-02-23 16:12 ` Christoph Hellwig
  2024-02-23 16:12 ` [PATCH 7/9] md/raid10: " Christoph Hellwig
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-23 16:12 UTC (permalink / raw)
  To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai
  Cc: dm-devel, linux-block, linux-raid

Build the queue limits outside the queue and apply them using
queue_limits_set.  Also remove the bogus ->gendisk and ->queue NULL
checks in the area while touching it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/raid1.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 286f8b16c7bde7..752ff99736a636 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1791,10 +1791,9 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 	for (mirror = first; mirror <= last; mirror++) {
 		p = conf->mirrors + mirror;
 		if (!p->rdev) {
-			if (mddev->gendisk)
-				disk_stack_limits(mddev->gendisk, rdev->bdev,
-						  rdev->data_offset << 9);
-
+			err = mddev_stack_new_rdev(mddev, rdev);
+			if (err)
+				return err;
 			p->head_position = 0;
 			rdev->raid_disk = mirror;
 			err = 0;
@@ -3089,9 +3088,9 @@ static struct r1conf *setup_conf(struct mddev *mddev)
 static void raid1_free(struct mddev *mddev, void *priv);
 static int raid1_run(struct mddev *mddev)
 {
+	struct queue_limits lim;
 	struct r1conf *conf;
 	int i;
-	struct md_rdev *rdev;
 	int ret;
 
 	if (mddev->level != 1) {
@@ -3118,15 +3117,12 @@ static int raid1_run(struct mddev *mddev)
 	if (IS_ERR(conf))
 		return PTR_ERR(conf);
 
-	if (mddev->queue)
-		blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
-
-	rdev_for_each(rdev, mddev) {
-		if (!mddev->gendisk)
-			continue;
-		disk_stack_limits(mddev->gendisk, rdev->bdev,
-				  rdev->data_offset << 9);
-	}
+	blk_set_stacking_limits(&lim);
+	lim.max_write_zeroes_sectors = 0;
+	mddev_stack_rdev_limits(mddev, &lim);
+	ret = queue_limits_set(mddev->queue, &lim);
+	if (ret)
+		goto abort;
 
 	mddev->degraded = 0;
 	for (i = 0; i < conf->raid_disks; i++)
-- 
2.39.2



* [PATCH 7/9] md/raid10: use the atomic queue limit update APIs
  2024-02-23 16:12 atomic queue limit updates for stackable devices Christoph Hellwig
                   ` (5 preceding siblings ...)
  2024-02-23 16:12 ` [PATCH 6/9] md/raid1: " Christoph Hellwig
@ 2024-02-23 16:12 ` Christoph Hellwig
  2024-02-23 16:12 ` [PATCH 8/9] md/raid5: " Christoph Hellwig
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-23 16:12 UTC (permalink / raw)
  To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai
  Cc: dm-devel, linux-block, linux-raid

Build the queue limits outside the queue and apply them using
queue_limits_set.  Also remove the bogus ->gendisk and ->queue NULL
checks in the area while touching it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/raid10.c | 52 +++++++++++++++++++++------------------------
 1 file changed, 24 insertions(+), 28 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 7412066ea22c7a..21d0aced9a0725 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2130,11 +2130,9 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 				repl_slot = mirror;
 			continue;
 		}
-
-		if (mddev->gendisk)
-			disk_stack_limits(mddev->gendisk, rdev->bdev,
-					  rdev->data_offset << 9);
-
+		err = mddev_stack_new_rdev(mddev, rdev);
+		if (err)
+			return err;
 		p->head_position = 0;
 		p->recovery_disabled = mddev->recovery_disabled - 1;
 		rdev->raid_disk = mirror;
@@ -2150,10 +2148,9 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
 		clear_bit(In_sync, &rdev->flags);
 		set_bit(Replacement, &rdev->flags);
 		rdev->raid_disk = repl_slot;
-		err = 0;
-		if (mddev->gendisk)
-			disk_stack_limits(mddev->gendisk, rdev->bdev,
-					  rdev->data_offset << 9);
+		err = mddev_stack_new_rdev(mddev, rdev);
+		if (err)
+			return err;
 		conf->fullsync = 1;
 		WRITE_ONCE(p->replacement, rdev);
 	}
@@ -4002,18 +3999,18 @@ static struct r10conf *setup_conf(struct mddev *mddev)
 	return ERR_PTR(err);
 }
 
-static void raid10_set_io_opt(struct r10conf *conf)
+static unsigned int raid10_nr_stripes(struct r10conf *conf)
 {
-	int raid_disks = conf->geo.raid_disks;
+	unsigned int raid_disks = conf->geo.raid_disks;
 
-	if (!(conf->geo.raid_disks % conf->geo.near_copies))
-		raid_disks /= conf->geo.near_copies;
-	blk_queue_io_opt(conf->mddev->queue, (conf->mddev->chunk_sectors << 9) *
-			 raid_disks);
+	if (conf->geo.raid_disks % conf->geo.near_copies)
+		return raid_disks;
+	return raid_disks / conf->geo.near_copies;
 }
 
 static int raid10_run(struct mddev *mddev)
 {
+	struct queue_limits lim;
 	struct r10conf *conf;
 	int i, disk_idx;
 	struct raid10_info *disk;
@@ -4021,6 +4018,7 @@ static int raid10_run(struct mddev *mddev)
 	sector_t size;
 	sector_t min_offset_diff = 0;
 	int first = 1;
+	int ret = -EIO;
 
 	if (mddev->private == NULL) {
 		conf = setup_conf(mddev);
@@ -4047,12 +4045,6 @@ static int raid10_run(struct mddev *mddev)
 		}
 	}
 
-	if (mddev->queue) {
-		blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
-		blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9);
-		raid10_set_io_opt(conf);
-	}
-
 	rdev_for_each(rdev, mddev) {
 		long long diff;
 
@@ -4081,14 +4073,19 @@ static int raid10_run(struct mddev *mddev)
 		if (first || diff < min_offset_diff)
 			min_offset_diff = diff;
 
-		if (mddev->gendisk)
-			disk_stack_limits(mddev->gendisk, rdev->bdev,
-					  rdev->data_offset << 9);
-
 		disk->head_position = 0;
 		first = 0;
 	}
 
+	blk_set_stacking_limits(&lim);
+	lim.max_write_zeroes_sectors = 0;
+	lim.io_min = mddev->chunk_sectors << 9;
+	lim.io_opt = lim.io_min * raid10_nr_stripes(conf);
+	mddev_stack_rdev_limits(mddev, &lim);
+	ret = queue_limits_set(mddev->queue, &lim);
+	if (ret)
+		goto out_free_conf;
+
 	/* need to check that every block has at least one working mirror */
 	if (!enough(conf, -1)) {
 		pr_err("md/raid10:%s: not enough operational mirrors.\n",
@@ -4189,7 +4186,7 @@ static int raid10_run(struct mddev *mddev)
 	raid10_free_conf(conf);
 	mddev->private = NULL;
 out:
-	return -EIO;
+	return ret;
 }
 
 static void raid10_free(struct mddev *mddev, void *priv)
@@ -4966,8 +4963,7 @@ static void end_reshape(struct r10conf *conf)
 	conf->reshape_safe = MaxSector;
 	spin_unlock_irq(&conf->device_lock);
 
-	if (conf->mddev->queue)
-		raid10_set_io_opt(conf);
+	mddev_update_io_opt(conf->mddev, raid10_nr_stripes(conf));
 	conf->fullsync = 0;
 }
 
-- 
2.39.2



* [PATCH 8/9] md/raid5: use the atomic queue limit update APIs
  2024-02-23 16:12 atomic queue limit updates for stackable devices Christoph Hellwig
                   ` (6 preceding siblings ...)
  2024-02-23 16:12 ` [PATCH 7/9] md/raid10: " Christoph Hellwig
@ 2024-02-23 16:12 ` Christoph Hellwig
  2024-02-23 16:12 ` [PATCH 9/9] block: remove disk_stack_limits Christoph Hellwig
  2024-02-23 17:36 ` atomic queue limit updates for stackable devices Mike Snitzer
  9 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-23 16:12 UTC (permalink / raw)
  To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai
  Cc: dm-devel, linux-block, linux-raid

Build the queue limits outside the queue and apply them using
queue_limits_set.  Also remove the bogus ->gendisk and ->queue NULL
checks in the area while touching it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/md/raid5.c | 123 +++++++++++++++++++++------------------------
 1 file changed, 56 insertions(+), 67 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 14f2cf75abbd72..3dd7c05d3ba2ab 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7682,12 +7682,6 @@ static int only_parity(int raid_disk, int algo, int raid_disks, int max_degraded
 	return 0;
 }
 
-static void raid5_set_io_opt(struct r5conf *conf)
-{
-	blk_queue_io_opt(conf->mddev->queue, (conf->chunk_sectors << 9) *
-			 (conf->raid_disks - conf->max_degraded));
-}
-
 static int raid5_run(struct mddev *mddev)
 {
 	struct r5conf *conf;
@@ -7695,9 +7689,12 @@ static int raid5_run(struct mddev *mddev)
 	struct md_rdev *rdev;
 	struct md_rdev *journal_dev = NULL;
 	sector_t reshape_offset = 0;
+	struct queue_limits lim;
 	int i;
 	long long min_offset_diff = 0;
 	int first = 1;
+	int data_disks, stripe;
+	int ret = -EIO;
 
 	if (mddev->recovery_cp != MaxSector)
 		pr_notice("md/raid:%s: not clean -- starting background reconstruction\n",
@@ -7950,67 +7947,59 @@ static int raid5_run(struct mddev *mddev)
 			mdname(mddev));
 	md_set_array_sectors(mddev, raid5_size(mddev, 0, 0));
 
-	if (mddev->queue) {
-		int chunk_size;
-		/* read-ahead size must cover two whole stripes, which
-		 * is 2 * (datadisks) * chunksize where 'n' is the
-		 * number of raid devices
-		 */
-		int data_disks = conf->previous_raid_disks - conf->max_degraded;
-		int stripe = data_disks *
-			((mddev->chunk_sectors << 9) / PAGE_SIZE);
-
-		chunk_size = mddev->chunk_sectors << 9;
-		blk_queue_io_min(mddev->queue, chunk_size);
-		raid5_set_io_opt(conf);
-		mddev->queue->limits.raid_partial_stripes_expensive = 1;
-		/*
-		 * We can only discard a whole stripe. It doesn't make sense to
-		 * discard data disk but write parity disk
-		 */
-		stripe = stripe * PAGE_SIZE;
-		stripe = roundup_pow_of_two(stripe);
-		mddev->queue->limits.discard_granularity = stripe;
-
-		blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
-
-		rdev_for_each(rdev, mddev) {
-			disk_stack_limits(mddev->gendisk, rdev->bdev,
-					  rdev->data_offset << 9);
-			disk_stack_limits(mddev->gendisk, rdev->bdev,
-					  rdev->new_data_offset << 9);
-		}
+	/*
+	 * The read-ahead size must cover two whole stripes, which is
+	 * 2 * (datadisks) * chunksize where 'n' is the number of raid devices.
+	 */
+	data_disks = conf->previous_raid_disks - conf->max_degraded;
+	/*
+	 * We can only discard a whole stripe. It doesn't make sense to
+	 * discard data disk but write parity disk
+	 */
+	stripe = roundup_pow_of_two(data_disks * (mddev->chunk_sectors << 9));
+
+	blk_set_stacking_limits(&lim);
+	lim.io_min = mddev->chunk_sectors << 9;
+	lim.io_opt = lim.io_min * (conf->raid_disks - conf->max_degraded);
+	lim.raid_partial_stripes_expensive = 1;
+	lim.discard_granularity = stripe;
+	lim.max_write_zeroes_sectors = 0;
+	mddev_stack_rdev_limits(mddev, &lim);
+	rdev_for_each(rdev, mddev) {
+		queue_limits_stack_bdev(&lim, rdev->bdev, rdev->new_data_offset,
+	                         mddev->gendisk->disk_name);
+	}
 
-		/*
-		 * zeroing is required, otherwise data
-		 * could be lost. Consider a scenario: discard a stripe
-		 * (the stripe could be inconsistent if
-		 * discard_zeroes_data is 0); write one disk of the
-		 * stripe (the stripe could be inconsistent again
-		 * depending on which disks are used to calculate
-		 * parity); the disk is broken; The stripe data of this
-		 * disk is lost.
-		 *
-		 * We only allow DISCARD if the sysadmin has confirmed that
-		 * only safe devices are in use by setting a module parameter.
-		 * A better idea might be to turn DISCARD into WRITE_ZEROES
-		 * requests, as that is required to be safe.
-		 */
-		if (!devices_handle_discard_safely ||
-		    mddev->queue->limits.max_discard_sectors < (stripe >> 9) ||
-		    mddev->queue->limits.discard_granularity < stripe)
-			blk_queue_max_discard_sectors(mddev->queue, 0);
+	/*
+	 * Zeroing is required for discard, otherwise data could be lost.
+	 *
+	 * Consider a scenario: discard a stripe (the stripe could be
+	 * inconsistent if discard_zeroes_data is 0); write one disk of the
+	 * stripe (the stripe could be inconsistent again depending on which
+	 * disks are used to calculate parity); the disk is broken; The stripe
+	 * data of this disk is lost.
+	 *
+	 * We only allow DISCARD if the sysadmin has confirmed that only safe
+	 * devices are in use by setting a module parameter.  A better idea
+	 * might be to turn DISCARD into WRITE_ZEROES requests, as that is
+	 * required to be safe.
+	 */
+	if (!devices_handle_discard_safely ||
+	    lim.max_discard_sectors < (stripe >> 9) ||
+	    lim.discard_granularity < stripe)
+		lim.max_hw_discard_sectors = 0;
 
-		/*
-		 * Requests require having a bitmap for each stripe.
-		 * Limit the max sectors based on this.
-		 */
-		blk_queue_max_hw_sectors(mddev->queue,
-			RAID5_MAX_REQ_STRIPES << RAID5_STRIPE_SHIFT(conf));
+	/*
+	 * Requests require having a bitmap for each stripe.
+	 * Limit the max sectors based on this.
+	 */
+	lim.max_hw_sectors = RAID5_MAX_REQ_STRIPES << RAID5_STRIPE_SHIFT(conf);
 
-		/* No restrictions on the number of segments in the request */
-		blk_queue_max_segments(mddev->queue, USHRT_MAX);
-	}
+	/* No restrictions on the number of segments in the request */
+	lim.max_segments = USHRT_MAX;
+	ret = queue_limits_set(mddev->queue, &lim);
+	if (ret)
+		goto abort;
 
 	if (log_init(conf, journal_dev, raid5_has_ppl(conf)))
 		goto abort;
@@ -8022,7 +8011,7 @@ static int raid5_run(struct mddev *mddev)
 	free_conf(conf);
 	mddev->private = NULL;
 	pr_warn("md/raid:%s: failed to run raid set.\n", mdname(mddev));
-	return -EIO;
+	return ret;
 }
 
 static void raid5_free(struct mddev *mddev, void *priv)
@@ -8554,8 +8543,8 @@ static void end_reshape(struct r5conf *conf)
 		spin_unlock_irq(&conf->device_lock);
 		wake_up(&conf->wait_for_overlap);
 
-		if (conf->mddev->queue)
-			raid5_set_io_opt(conf);
+		mddev_update_io_opt(conf->mddev,
+			conf->raid_disks - conf->max_degraded);
 	}
 }
 
-- 
2.39.2



* [PATCH 9/9] block: remove disk_stack_limits
  2024-02-23 16:12 atomic queue limit updates for stackable devices Christoph Hellwig
                   ` (7 preceding siblings ...)
  2024-02-23 16:12 ` [PATCH 8/9] md/raid5: " Christoph Hellwig
@ 2024-02-23 16:12 ` Christoph Hellwig
  2024-02-23 17:36 ` atomic queue limit updates for stackable devices Mike Snitzer
  9 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-23 16:12 UTC (permalink / raw)
  To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai
  Cc: dm-devel, linux-block, linux-raid

disk_stack_limits is now unused, so remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-settings.c   | 24 ------------------------
 include/linux/blkdev.h |  2 --
 2 files changed, 26 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index f14d3a18f9e2f0..299ecc399c0e6f 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -915,30 +915,6 @@ void queue_limits_stack_bdev(struct queue_limits *t, struct block_device *bdev,
 }
 EXPORT_SYMBOL_GPL(queue_limits_stack_bdev);
 
-/**
- * disk_stack_limits - adjust queue limits for stacked drivers
- * @disk:  MD/DM gendisk (top)
- * @bdev:  the underlying block device (bottom)
- * @offset:  offset to beginning of data within component device
- *
- * Description:
- *    Merges the limits for a top level gendisk and a bottom level
- *    block_device.
- */
-void disk_stack_limits(struct gendisk *disk, struct block_device *bdev,
-		       sector_t offset)
-{
-	struct request_queue *t = disk->queue;
-
-	if (blk_stack_limits(&t->limits, &bdev_get_queue(bdev)->limits,
-			get_start_sect(bdev) + (offset >> 9)) < 0)
-		pr_notice("%s: Warning: Device %pg is misaligned\n",
-			disk->disk_name, bdev);
-
-	disk_update_readahead(disk);
-}
-EXPORT_SYMBOL(disk_stack_limits);
-
 /**
  * blk_queue_update_dma_pad - update pad mask
  * @q:     the request queue for the device
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 285e82723d641f..75c909865a8b7b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -926,8 +926,6 @@ extern int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 			    sector_t offset);
 void queue_limits_stack_bdev(struct queue_limits *t, struct block_device *bdev,
 		sector_t offset, const char *pfx);
-extern void disk_stack_limits(struct gendisk *disk, struct block_device *bdev,
-			      sector_t offset);
 extern void blk_queue_update_dma_pad(struct request_queue *, unsigned int);
 extern void blk_queue_segment_boundary(struct request_queue *, unsigned long);
 extern void blk_queue_virt_boundary(struct request_queue *, unsigned long);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/9] dm: use queue_limits_set
  2024-02-23 16:12 ` [PATCH 3/9] dm: use queue_limits_set Christoph Hellwig
@ 2024-02-23 17:30   ` Mike Snitzer
  0 siblings, 0 replies; 23+ messages in thread
From: Mike Snitzer @ 2024-02-23 17:30 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Mikulas Patocka, Song Liu, Yu Kuai, dm-devel,
	linux-block, linux-raid

On Fri, Feb 23 2024 at 11:12P -0500,
Christoph Hellwig <hch@lst.de> wrote:

> Use queue_limits_set which validates the limits and takes care of
> updating the readahead settings instead of directly assigning them to
> the queue.  For that make sure all limits are actually updated before
> the assignment.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good,

Reviewed-by: Mike Snitzer <snitzer@kernel.org>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: atomic queue limit updates for stackable devices
  2024-02-23 16:12 atomic queue limit updates for stackable devices Christoph Hellwig
                   ` (8 preceding siblings ...)
  2024-02-23 16:12 ` [PATCH 9/9] block: remove disk_stack_limits Christoph Hellwig
@ 2024-02-23 17:36 ` Mike Snitzer
  2024-02-23 17:38   ` Mike Snitzer
                     ` (2 more replies)
  9 siblings, 3 replies; 23+ messages in thread
From: Mike Snitzer @ 2024-02-23 17:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Mikulas Patocka, Song Liu, Yu Kuai, dm-devel,
	linux-block, linux-raid

On Fri, Feb 23 2024 at 11:12P -0500,
Christoph Hellwig <hch@lst.de> wrote:

> Hi all,
> 
> this series adds new helpers for the atomic queue limit update
> functionality and then switches dm and md over to it.  The dm switch is
> pretty trivial as it was basically implementing the model by hand
> already, md is a bit more work.
> 
> I've run the mdadm testsuite, and it has the same (rather large) number
> of failures as the baseline.  I've still not managed to get the dm
> testsuite running, unfortunately, but the series survives xfstests
> (which exercises quite a few dm targets) and blktests.

Which DM testsuite are you trying?  There is the old ruby-based
"device-mapper-test-suite", and a newer one written in Python which
should hopefully be less hassle to set up and run, see:
https://github.com/jthornber/dmtest-python

Mike

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: atomic queue limit updates for stackable devices
  2024-02-23 17:36 ` atomic queue limit updates for stackable devices Mike Snitzer
@ 2024-02-23 17:38   ` Mike Snitzer
  2024-02-27 15:10     ` Christoph Hellwig
  2024-02-23 17:41   ` Christoph Hellwig
  2024-02-27 15:09   ` Christoph Hellwig
  2 siblings, 1 reply; 23+ messages in thread
From: Mike Snitzer @ 2024-02-23 17:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Mikulas Patocka, Song Liu, Yu Kuai, dm-devel,
	linux-block, linux-raid

On Fri, Feb 23 2024 at 12:36P -0500,
Mike Snitzer <snitzer@kernel.org> wrote:

> On Fri, Feb 23 2024 at 11:12P -0500,
> Christoph Hellwig <hch@lst.de> wrote:
> 
> > Hi all,
> > 
> > this series adds new helpers for the atomic queue limit update
> > functionality and then switches dm and md over to it.  The dm switch is
> > pretty trivial as it was basically implementing the model by hand
> > already, md is a bit more work.
> > 
> > I've run the mdadm testsuite, and it has the same (rather large) number
> > of failures as the baseline.  I've still not managed to get the dm
> > testsuite running, unfortunately, but the series survives xfstests
> > (which exercises quite a few dm targets) and blktests.
> 
> Which DM testsuite are you trying?  There is the old ruby-based
> "device-mapper-test-suite", and a newer one written in Python which
> should hopefully be less hassle to set up and run, see:
> https://github.com/jthornber/dmtest-python

Also, you can use the lvm2 source code's testsuite to get really solid
DM test coverage (particularly for the changes in this patchset that
deal with setting limits at device creation).

Mike

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: atomic queue limit updates for stackable devices
  2024-02-23 17:36 ` atomic queue limit updates for stackable devices Mike Snitzer
  2024-02-23 17:38   ` Mike Snitzer
@ 2024-02-23 17:41   ` Christoph Hellwig
  2024-02-27 15:09   ` Christoph Hellwig
  2 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-23 17:41 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, Jens Axboe, Mikulas Patocka, Song Liu, Yu Kuai,
	dm-devel, linux-block, linux-raid

On Fri, Feb 23, 2024 at 12:36:50PM -0500, Mike Snitzer wrote:
> Which DM testsuite are you trying?  There is the old ruby-based
> "device-mapper-test-suite",

Yes.

> and a newer one written in Python which
> should hopefully be less hassle to set up and run, see:
> https://github.com/jthornber/dmtest-python

Oh, I didn't know that one.  I'll give it a spin, and maybe the
lvm2 one as well.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: atomic queue limit updates for stackable devices
  2024-02-23 17:36 ` atomic queue limit updates for stackable devices Mike Snitzer
  2024-02-23 17:38   ` Mike Snitzer
  2024-02-23 17:41   ` Christoph Hellwig
@ 2024-02-27 15:09   ` Christoph Hellwig
  2 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-27 15:09 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, Jens Axboe, Mikulas Patocka, Song Liu, Yu Kuai,
	dm-devel, linux-block, linux-raid, Joe Thornber

On Fri, Feb 23, 2024 at 12:36:50PM -0500, Mike Snitzer wrote:
> Which DM testsuite are you trying?  There is the old ruby-based
> "device-mapper-test-suite", and a newer one written in Python which
> should hopefully be less hassle to set up and run, see:
> https://github.com/jthornber/dmtest-python

I gave this a spin on the plane yesterday, but it seems to want
pip-based dependencies, which the pip packaged on Debian refuses
with a reference to Python's PEP 668.

I'll see whether the dependencies are properly packaged, and can send
patches to improve the documentation.
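A common way around that PEP 668 refusal is to install the suite's pip
dependencies into a private virtualenv instead of the Debian-managed
system Python; a minimal sketch (the directory name is illustrative):

```shell
# Create a throwaway virtualenv so pip installs don't touch the
# externally managed system Python (PEP 668).
venvdir="${TMPDIR:-/tmp}/dmtest-venv"
python3 -m venv "$venvdir"
# Install the test suite's dependencies into it, e.g.:
#   "$venvdir/bin/pip" install -r requirements.txt
"$venvdir/bin/python" --version
```

Running the suite through "$venvdir/bin/python" then picks up the
privately installed packages without tripping Debian's
externally-managed-environment check.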

> 
> Mike
---end quoted text---

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: atomic queue limit updates for stackable devices
  2024-02-23 17:38   ` Mike Snitzer
@ 2024-02-27 15:10     ` Christoph Hellwig
  2024-02-27 15:16       ` Mike Snitzer
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-27 15:10 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, Jens Axboe, Mikulas Patocka, Song Liu, Yu Kuai,
	dm-devel, linux-block, linux-raid

On Fri, Feb 23, 2024 at 12:38:46PM -0500, Mike Snitzer wrote:
> Also, you can use the lvm2 source code's testsuite to get really solid
> DM test coverage (particularly for the changes in this patchset that
> deal with setting limits at device creation).

And that one runs fine, although even with Jens' tree as a baseline
it hangs in the md code when dm tries to stop it.  Trying mainline
now...


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: atomic queue limit updates for stackable devices
  2024-02-27 15:10     ` Christoph Hellwig
@ 2024-02-27 15:16       ` Mike Snitzer
  2024-02-27 15:17         ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Mike Snitzer @ 2024-02-27 15:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Mikulas Patocka, Song Liu, Yu Kuai, dm-devel,
	linux-block, linux-raid

On Tue, Feb 27 2024 at 10:10P -0500,
Christoph Hellwig <hch@lst.de> wrote:

> On Fri, Feb 23, 2024 at 12:38:46PM -0500, Mike Snitzer wrote:
> > Also, you can use the lvm2 source code's testsuite to get really solid
> > DM test coverage (particularly for the changes in this patchset that
> > deal with setting limits at device creation).
> 
> And that one runs fine, although even with Jens' tree as a baseline
> it hangs in the md code when dm tries to stop it.  Trying mainline
> now..

That's the mainline issue that a bunch of MD (and dm-raid) oriented
engineers are working hard to fix; they have been discussing it on
linux-raid (with many iterations of proposed patches).

It regressed due to MD changes in 6.8 (maybe earlier).

Mike

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: atomic queue limit updates for stackable devices
  2024-02-27 15:16       ` Mike Snitzer
@ 2024-02-27 15:17         ` Christoph Hellwig
  2024-02-27 15:36           ` Mike Snitzer
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-27 15:17 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, Jens Axboe, Mikulas Patocka, Song Liu, Yu Kuai,
	dm-devel, linux-block, linux-raid

On Tue, Feb 27, 2024 at 10:16:39AM -0500, Mike Snitzer wrote:
> That's the mainline issue a bunch of MD (and dm-raid) oriented
> engineers are working hard to fix, they've been discussing on
> linux-raid (with many iterations of proposed patches).
> 
> It regressed due to 6.8 MD changes (maybe earlier).


Do you know if there is a way to skip specific tests, to get a useful
baseline value (and to complete the run)?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: atomic queue limit updates for stackable devices
  2024-02-27 15:17         ` Christoph Hellwig
@ 2024-02-27 15:36           ` Mike Snitzer
  2024-02-27 21:50             ` Song Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Mike Snitzer @ 2024-02-27 15:36 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Mikulas Patocka, Song Liu, Yu Kuai, dm-devel,
	linux-block, linux-raid, lvm-devel

On Tue, Feb 27 2024 at 10:17P -0500,
Christoph Hellwig <hch@lst.de> wrote:

> On Tue, Feb 27, 2024 at 10:16:39AM -0500, Mike Snitzer wrote:
> > That's the mainline issue a bunch of MD (and dm-raid) oriented
> > engineers are working hard to fix, they've been discussing on
> > linux-raid (with many iterations of proposed patches).
> > 
> > It regressed due to 6.8 MD changes (maybe earlier).
> 
> 
> Do you know if there is a way to skip specific tests to get a useful
> baseline value (and to complete the run?)

The only way I know is to sprinkle 'skip' code around to explicitly
force a test to be skipped (e.g. in test/shell/, adding 'skip' at the
top of each test as needed).
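The sprinkling step can be scripted; a hedged sketch (the file names are
illustrative, and it assumes each lvm2 test is a shell script whose
first line is the shebang):

```shell
# Prepend 'skip' right after the shebang of each listed lvm2 test so the
# harness marks it skipped instead of running it.
skip_tests="lvconvert-raid-reshape.sh integrity-caching.sh"
for t in $skip_tests; do
	f="test/shell/$t"
	[ -f "$f" ] || continue
	# Only insert once, even if this script is run repeatedly.
	grep -qx skip "$f" || sed -i '1a skip' "$f"
done
```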

But I've cc'd the lvm-devel mailing list in case there is an easier
way.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: atomic queue limit updates for stackable devices
  2024-02-27 15:36           ` Mike Snitzer
@ 2024-02-27 21:50             ` Song Liu
  2024-02-28 19:56               ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Song Liu @ 2024-02-27 21:50 UTC (permalink / raw)
  To: Mike Snitzer, Benjamin Marzinski, Xiao Ni, Zdenek Kabelac
  Cc: Christoph Hellwig, Jens Axboe, Mikulas Patocka, Yu Kuai, dm-devel,
	linux-block, linux-raid, lvm-devel

CC Benjamin, Zdenek, and Xiao, who are running the lvm tests.

On Tue, Feb 27, 2024 at 7:36 AM Mike Snitzer <snitzer@kernel.org> wrote:
>
> On Tue, Feb 27 2024 at 10:17P -0500,
> Christoph Hellwig <hch@lst.de> wrote:
>
> > On Tue, Feb 27, 2024 at 10:16:39AM -0500, Mike Snitzer wrote:
> > > That's the mainline issue a bunch of MD (and dm-raid) oriented
> > > engineers are working hard to fix, they've been discussing on
> > > linux-raid (with many iterations of proposed patches).
> > >
> > > It regressed due to 6.8 MD changes (maybe earlier).
> >
> >
> > Do you know if there is a way to skip specific tests to get a useful
> > baseline value (and to complete the run?)
>
> I only know to sprinkle 'skip' code around to explicitly force the
> test to get skipped (e.g. in test/shell/, adding 'skip' at the top of
> each test as needed).

I think we can do something like:

make check S=<list of test to skip>

I don't have a reliable skip list at the moment, as some of the tests
fail on some systems but not on others. However, per early reports,
I guess we can start with the following skip list:

shell/integrity-caching.sh
shell/lvconvert-raid-reshape-linear_to_raid6-single-type.sh
shell/lvconvert-raid-reshape.sh
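One way to keep a long skip list maintainable is to build the
comma-separated value from one test name per line; a small sketch,
assuming the "make check S=..." syntax above:

```shell
# Join the skip list into the comma-separated form "make check S=..." expects.
skiplist=$(printf '%s,' \
	shell/integrity-caching.sh \
	shell/lvconvert-raid-reshape-linear_to_raid6-single-type.sh \
	shell/lvconvert-raid-reshape.sh)
skiplist=${skiplist%,}           # drop the trailing comma
echo "$skiplist"
# make check S="$skiplist"
```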

Thanks,
Song

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: atomic queue limit updates for stackable devices
  2024-02-27 21:50             ` Song Liu
@ 2024-02-28 19:56               ` Christoph Hellwig
  2024-02-29  2:02                 ` Song Liu
  0 siblings, 1 reply; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-28 19:56 UTC (permalink / raw)
  To: Song Liu
  Cc: Mike Snitzer, Benjamin Marzinski, Xiao Ni, Zdenek Kabelac,
	Christoph Hellwig, Jens Axboe, Mikulas Patocka, Yu Kuai, dm-devel,
	linux-block, linux-raid, lvm-devel

On Tue, Feb 27, 2024 at 01:50:19PM -0800, Song Liu wrote:
> I think we can do something like:
> 
> make check S=<list of test to skip>
> 
> I don't have a reliable list to skip at the moment, as some of the tests
> fail on some systems but not on others. However, per early report,
> I guess we can start with the following skip list:
> 
> shell/integrity-caching.sh
> shell/lvconvert-raid-reshape-linear_to_raid6-single-type.sh
> shell/lvconvert-raid-reshape.sh

Thanks.  I've been iterating over it this morning, eventually growing
the list to:

make check
S=shell/integrity-caching.sh,shell/lvconvert-raid-reshape-linear_to_raid6-single-type.sh,shell/lvconvert-raid-reshape.sh,shell/lvconvert-raid-reshape-linear_to_striped-single-type.sh,shell/lvconvert-raid-reshape-linear_to_striped.sh,shell/lvchange-raid456.sh,shell/component-raid.sh,shell/lvconvert-raid-reshape-load.sh,shell/lvchange-raid-transient-failures.sh,shell/lvconvert-raid-reshape-striped_to_linear-single-type.sh,shell/lvconvert-raid-reshape-striped_to_linear.sh,shell/lvconvert-raid-reshape-stripes-load-fail.sh,shell/lvconvert-raid-reshape-stripes-load-reload.sh,shell/lvconvert-raid-reshape-stripes-load.sh,lvconvert-raid-reshape.sh,shell/lvconvert-raid-restripe-linear.sh,shell/lvconvert-raid-status-validation.sh,shell/lvconvert-raid-takeover-linear_to_raid4.sh,shell/lvconvert-raid-takeover-raid4_to_linear.sh,shell/lvconvert-raid-takeover-alloc-failure.sh

before giving up.  I then tried to run the md-6.9 branch that's
supposed to have the fixes, but I still see the same md_stop_writes
hangs.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: atomic queue limit updates for stackable devices
  2024-02-28 19:56               ` Christoph Hellwig
@ 2024-02-29  2:02                 ` Song Liu
  2024-02-29 13:20                   ` Christoph Hellwig
  0 siblings, 1 reply; 23+ messages in thread
From: Song Liu @ 2024-02-29  2:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mike Snitzer, Benjamin Marzinski, Xiao Ni, Zdenek Kabelac,
	Jens Axboe, Mikulas Patocka, Yu Kuai, dm-devel, linux-block,
	linux-raid, lvm-devel

On Wed, Feb 28, 2024 at 11:56 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Tue, Feb 27, 2024 at 01:50:19PM -0800, Song Liu wrote:
> > I think we can do something like:
> >
> > make check S=<list of test to skip>
> >
> > I don't have a reliable list to skip at the moment, as some of the tests
> > fail on some systems but not on others. However, per early report,
> > I guess we can start with the following skip list:
> >
> > shell/integrity-caching.sh
> > shell/lvconvert-raid-reshape-linear_to_raid6-single-type.sh
> > shell/lvconvert-raid-reshape.sh
>
> Thanks.  I've been iterating over it this morning, eventually growing
> to:
>
> make check
> S=shell/integrity-caching.sh,shell/lvconvert-raid-reshape-linear_to_raid6-single-type.sh,shell/lvconvert-raid-reshape.sh,shell/lvconvert-raid-reshape-linear_to_striped-single-type.sh,shell/lvconvert-raid-reshape-linear_to_striped.sh,shell/lvchange-raid456.sh,shell/component-raid.sh,shell/lvconvert-raid-reshape-load.sh,shell/lvchange-raid-transient-failures.sh,shell/lvconvert-raid-reshape-striped_to_linear-single-type.sh,shell/lvconvert-raid-reshape-striped_to_linear.sh,shell/lvconvert-raid-reshape-stripes-load-fail.sh,shell/lvconvert-raid-reshape-stripes-load-reload.sh,shell/lvconvert-raid-reshape-stripes-load.sh,lvconvert-raid-reshape.sh,shell/lvconvert-raid-restripe-linear.sh,shell/lvconvert-raid-status-validation.sh,shell/lvconvert-raid-takeover-linear_to_raid4.sh,shell/lvconvert-raid-takeover-raid4_to_linear.sh,shell/lvconvert-raid-takeover-alloc-failure.sh
>
> before giving up.  I then tried to run the md-6.9 branch that's
> supposed to have the fixes, but I still see the same md_stop_writes
> hangs.

The md-6.9 branch doesn't have all the fixes, as some recent fixes
are routed via the md-6.8 branch. You can try this branch instead,
which should provide a better baseline; the series applies cleanly
on top of it:

https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=md-6.9-for-hch

Thanks,
Song

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: atomic queue limit updates for stackable devices
  2024-02-29  2:02                 ` Song Liu
@ 2024-02-29 13:20                   ` Christoph Hellwig
  0 siblings, 0 replies; 23+ messages in thread
From: Christoph Hellwig @ 2024-02-29 13:20 UTC (permalink / raw)
  To: Song Liu
  Cc: Christoph Hellwig, Mike Snitzer, Benjamin Marzinski, Xiao Ni,
	Zdenek Kabelac, Jens Axboe, Mikulas Patocka, Yu Kuai, dm-devel,
	linux-block, linux-raid, lvm-devel

On Wed, Feb 28, 2024 at 06:02:33PM -0800, Song Liu wrote:
> md-6.9 branch doesn't have all the fixes, as some recent fixes
> are routed via the md-6.8 branch. You can try on this branch, which
> should provide a better base line. The set applies cleanly on this
> branch.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=md-6.9-for-hch

This branch crashes for me when running the lvm2 test suite:

###      running: [ndev-vanilla] shell/lvconvert-raid-reshape.sh  0:26.281[
1108.566441] md: mdX: re.
[ 1108.694826] md/raid:mdX: device dm-67 operational as raid disk 0
[ 1108.695034] md/raid:mdX: device dm-69 operational as raid disk 1
[ 1108.695360] md/raid:mdX: device dm-71 operational as raid disk 2
[ 1108.695532] md/raid:mdX: device dm-73 operational as raid disk 3
[ 1108.696468] md/raid:mdX: raid level 5 active with 4 out of 4 devices,
algorithm 2
[ 1108.696801] device-mapper: raid: raid456 discard support disabled due to
discard_zeroes_data unce.
[ 1108.697059] device-mapper: raid: Set dm-raid.devices_handle_discard_safely=Y
to override.
[ 1109.129345] md/raid:mdX: device dm-67 operational as raid disk 0
[ 1109.129550] md/raid:mdX: device dm-69 operational as raid disk 1
[ 1109.129720] md/raid:mdX: device dm-71 operational as raid disk 2
[ 1109.129887] md/raid:mdX: device dm-73 operational as raid disk 3
[ 1109.130775] md/raid:mdX: raid level 5 active with 4 out of 4 devices,
algorithm 5
[ 1109.134517] device-mapper: raid: raid456 discard support disabled due to
discard_zeroes_data unce.
[ 1109.135207] device-mapper: raid: Set dm-raid.devices_handle_discard_safely=Y
to override.
[ 1112.713392] md: reshape of RAID array mdX
[ 1112.828252] BUG: kernel NULL pointer dereference, address: 0000000000000088
[ 1112.828467] #PF: supervisor read access in kernel mode
[ 1112.828613] #PF: error_code(0x0000) - not-present page
[ 1112.828755] PGD 0 P4D 0 
[ 1112.828829] Oops: 0000 [#2] PREEMPT SMP NOPTI
[ 1112.828955] CPU: 1 PID: 1785 Comm: kworker/1:2 Tainted: G      D W
6.8.0-rc3+ #2235
[ 1112.829181] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.16.2-debian-1.16.2-1 04/014
[ 1112.829422] Workqueue: md_misc md_start_sync
[ 1112.829542] RIP: 0010:md_start_sync+0x66/0x2e0
[ 1112.829666] Code: c0 0f 85 ef 00 00 00 48 83 bb 50 fd ff ff ff 0f 84 9d 01
00 00 48 8b 83 90 fb f5
[ 1112.830197] RSP: 0018:ffffc900016dbe28 EFLAGS: 00010213
[ 1112.830337] RAX: 0000000000000000 RBX: ffff888115a224d0 RCX:
0000000000000000
[ 1112.830527] RDX: 0000000000000000 RSI: ffffffff8301a09e RDI:
00000000ffffffff
[ 1112.830717] RBP: ffff888115a222b0 R08: 0000000000000001 R09:
0000000000000000
[ 1112.830906] R10: ffffc900016dbe28 R11: 0000000000000001 R12:
ffff888115a222b1
[ 1112.831094] R13: ffff888115a22058 R14: 0000000000000000 R15:
ffffffff81190681
[ 1112.831285] FS:  0000000000000000(0000) GS:ffff8881f9d00000(0000)
knlGS:0000000000000000
[ 1112.831497] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1112.831653] CR2: 0000000000000088 CR3: 000000010426c000 CR4:
0000000000750ef0
[ 1112.831879] PKRU: 55555554
[ 1112.831954] Call Trace:
[ 1112.832024]  <TASK>
[ 1112.832085]  ? __die+0x1e/0x60
[ 1112.832173]  ? page_fault_oops+0x154/0x450
[ 1112.832286]  ? do_user_addr_fault+0x69/0x7e0
[ 1112.832403]  ? exc_page_fault+0x6d/0x1c0
[ 1112.832512]  ? asm_exc_page_fault+0x26/0x30
[ 1112.832628]  ? process_one_work+0x171/0x4a0
[ 1112.832743]  ? md_start_sync+0x66/0x2e0
[ 1112.832849]  ? md_start_sync+0x35/0x2e0
[ 1112.832957]  process_one_work+0x1d8/0x4a0
[ 1112.833066]  worker_thread+0x1ce/0x3b0
[ 1112.833169]  ? wq_sysfs_prep_attrs+0x90/0x90
[ 1112.833285]  kthread+0xf2/0x120
[ 1112.833374]  ? kthread_complete_and_exit+0x20/0x20
[ 1112.833504]  ret_from_fork+0x2c/0x40
[ 1112.833616]  ? kthread_complete_and_exit+0x20/0x20
[ 1112.833746]  ret_from_fork_asm+0x11/0x20
[ 1112.833855]  </TASK>
[ 1112.833918] Modules linked in: dm_raid i2c_i801 crc32_pclmul i2c_smbus [last
unloaded: scsi_debug]
[ 1112.834156] CR2: 0000000000000088
[ 1112.834248] ---[ end trace 0000000000000000 ]---
[ 1112.834373] RIP: 0010:remove_and_add_spares+0x72/0x2f0

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2024-02-29 13:21 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-23 16:12 atomic queue limit updates for stackable devices Christoph Hellwig
2024-02-23 16:12 ` [PATCH 1/9] block: add a queue_limits_set helper Christoph Hellwig
2024-02-23 16:12 ` [PATCH 2/9] block: add a queue_limits_stack_bdev helper Christoph Hellwig
2024-02-23 16:12 ` [PATCH 3/9] dm: use queue_limits_set Christoph Hellwig
2024-02-23 17:30   ` Mike Snitzer
2024-02-23 16:12 ` [PATCH 4/9] md: add queue limit helpers Christoph Hellwig
2024-02-23 16:12 ` [PATCH 5/9] md/raid0: use the atomic queue limit update APIs Christoph Hellwig
2024-02-23 16:12 ` [PATCH 6/9] md/raid1: " Christoph Hellwig
2024-02-23 16:12 ` [PATCH 7/9] md/raid10: " Christoph Hellwig
2024-02-23 16:12 ` [PATCH 8/9] md/raid5: " Christoph Hellwig
2024-02-23 16:12 ` [PATCH 9/9] block: remove disk_stack_limits Christoph Hellwig
2024-02-23 17:36 ` atomic queue limit updates for stackable devices Mike Snitzer
2024-02-23 17:38   ` Mike Snitzer
2024-02-27 15:10     ` Christoph Hellwig
2024-02-27 15:16       ` Mike Snitzer
2024-02-27 15:17         ` Christoph Hellwig
2024-02-27 15:36           ` Mike Snitzer
2024-02-27 21:50             ` Song Liu
2024-02-28 19:56               ` Christoph Hellwig
2024-02-29  2:02                 ` Song Liu
2024-02-29 13:20                   ` Christoph Hellwig
2024-02-23 17:41   ` Christoph Hellwig
2024-02-27 15:09   ` Christoph Hellwig
