* atomic queue limit updates for stackable devices v2
@ 2024-02-26 10:29 Christoph Hellwig
2024-02-26 10:29 ` [PATCH 01/16] block: add a queue_limits_set helper Christoph Hellwig
` (15 more replies)
0 siblings, 16 replies; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-26 10:29 UTC (permalink / raw)
To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid
Hi all,
this series adds new helpers for the atomic queue limit update
functionality and then switches dm and md over to it. The dm switch is
pretty trivial as it was basically implementing the model by hand
already, md is a bit more work.
I've run the mdadm testsuite, and it has the same (rather large) number
of failures as the baseline. I've still not managed to get the dm
testsuite running, unfortunately, but the series survives xfstests, which
exercises quite a few dm targets, as well as blktests.
nvme-multipath will be handled separately as it is too tightly integrated
with the rest of nvme.
Changes since v1:
- a few kerneldoc fixes
- fix a line accidentally removed after testing in raid0
- also convert drbd
Diffstat:
block/blk-settings.c | 47 ++++++---
drivers/block/drbd/drbd_main.c | 13 +-
drivers/block/drbd/drbd_nl.c | 210 +++++++++++++++++++----------------------
drivers/md/dm-table.c | 27 ++---
drivers/md/md.c | 37 +++++++
drivers/md/md.h | 3
drivers/md/raid0.c | 37 +++----
drivers/md/raid1.c | 24 +---
drivers/md/raid10.c | 52 ++++------
drivers/md/raid5.c | 123 ++++++++++--------------
include/linux/blkdev.h | 5
11 files changed, 305 insertions(+), 273 deletions(-)
^ permalink raw reply [flat|nested] 24+ messages in thread
* [PATCH 01/16] block: add a queue_limits_set helper
2024-02-26 10:29 atomic queue limit updates for stackable devices v2 Christoph Hellwig
@ 2024-02-26 10:29 ` Christoph Hellwig
2024-02-26 10:29 ` [PATCH 02/16] block: add a queue_limits_stack_bdev helper Christoph Hellwig
` (14 subsequent siblings)
15 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-26 10:29 UTC (permalink / raw)
To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid
Add a small wrapper around queue_limits_commit_update for stacking
drivers that don't want to update existing limits, but instead apply
an entirely new set.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-settings.c | 18 ++++++++++++++++++
include/linux/blkdev.h | 1 +
2 files changed, 19 insertions(+)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index b6bbe683d218fa..1989a177be201b 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -266,6 +266,24 @@ int queue_limits_commit_update(struct request_queue *q,
}
EXPORT_SYMBOL_GPL(queue_limits_commit_update);
+/**
+ * queue_limits_commit_set - apply queue limits to queue
+ * @q: queue to update
+ * @lim: limits to apply
+ *
+ * Apply the limits in @lim that were freshly initialized to @q.
+ * To update existing limits use queue_limits_start_update() and
+ * queue_limits_commit_update() instead.
+ *
+ * Returns 0 if successful, else a negative error code.
+ */
+int queue_limits_set(struct request_queue *q, struct queue_limits *lim)
+{
+ mutex_lock(&q->limits_lock);
+ return queue_limits_commit_update(q, lim);
+}
+EXPORT_SYMBOL_GPL(queue_limits_set);
+
/**
* blk_queue_bounce_limit - set bounce buffer limit for queue
* @q: the request queue for the device
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index a14ea934413850..dd510ad7ce4b45 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -889,6 +889,7 @@ queue_limits_start_update(struct request_queue *q)
}
int queue_limits_commit_update(struct request_queue *q,
struct queue_limits *lim);
+int queue_limits_set(struct request_queue *q, struct queue_limits *lim);
/*
* Access functions for manipulating queue properties
--
2.39.2
* [PATCH 02/16] block: add a queue_limits_stack_bdev helper
2024-02-26 10:29 atomic queue limit updates for stackable devices v2 Christoph Hellwig
2024-02-26 10:29 ` [PATCH 01/16] block: add a queue_limits_set helper Christoph Hellwig
@ 2024-02-26 10:29 ` Christoph Hellwig
2024-02-26 10:29 ` [PATCH 03/16] dm: use queue_limits_set Christoph Hellwig
` (13 subsequent siblings)
15 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-26 10:29 UTC (permalink / raw)
To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid
Add a small wrapper around blk_stack_limits that allows passing a bdev
for the bottom device and logs a warning in case of a misaligned
device. The name fits the new queue limits API, and the intent is
to eventually replace disk_stack_limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-settings.c | 25 +++++++++++++++++++++++++
include/linux/blkdev.h | 2 ++
2 files changed, 27 insertions(+)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 1989a177be201b..865fe4ebbf9b83 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -891,6 +891,31 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
}
EXPORT_SYMBOL(blk_stack_limits);
+/**
+ * queue_limits_stack_bdev - adjust queue_limits for stacked devices
+ * @t: the stacking driver limits (top device)
+ * @bdev: the underlying block device (bottom)
+ * @offset: offset to beginning of data within component device
+ * @pfx: prefix to use for warnings logged
+ *
+ * Description:
+ * This function is used by stacking drivers like MD and DM to ensure
+ * that all component devices have compatible block sizes and
+ * alignments. The stacking driver must provide a queue_limits
+ * struct (top) and then iteratively call the stacking function for
+ * all component (bottom) devices. The stacking function will
+ * attempt to combine the values and ensure proper alignment.
+ */
+void queue_limits_stack_bdev(struct queue_limits *t, struct block_device *bdev,
+ sector_t offset, const char *pfx)
+{
+ if (blk_stack_limits(t, &bdev_get_queue(bdev)->limits,
+ get_start_sect(bdev) + offset))
+ pr_notice("%s: Warning: Device %pg is misaligned\n",
+ pfx, bdev);
+}
+EXPORT_SYMBOL_GPL(queue_limits_stack_bdev);
+
/**
* disk_stack_limits - adjust queue limits for stacked drivers
* @disk: MD/DM gendisk (top)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index dd510ad7ce4b45..285e82723d641f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -924,6 +924,8 @@ extern void blk_set_queue_depth(struct request_queue *q, unsigned int depth);
extern void blk_set_stacking_limits(struct queue_limits *lim);
extern int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
sector_t offset);
+void queue_limits_stack_bdev(struct queue_limits *t, struct block_device *bdev,
+ sector_t offset, const char *pfx);
extern void disk_stack_limits(struct gendisk *disk, struct block_device *bdev,
sector_t offset);
extern void blk_queue_update_dma_pad(struct request_queue *, unsigned int);
--
2.39.2
* [PATCH 03/16] dm: use queue_limits_set
2024-02-26 10:29 atomic queue limit updates for stackable devices v2 Christoph Hellwig
2024-02-26 10:29 ` [PATCH 01/16] block: add a queue_limits_set helper Christoph Hellwig
2024-02-26 10:29 ` [PATCH 02/16] block: add a queue_limits_stack_bdev helper Christoph Hellwig
@ 2024-02-26 10:29 ` Christoph Hellwig
2024-02-26 10:29 ` [PATCH 04/16] md: add queue limit helpers Christoph Hellwig
` (12 subsequent siblings)
15 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-26 10:29 UTC (permalink / raw)
To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid
Use queue_limits_set, which validates the limits and takes care of
updating the readahead settings, instead of directly assigning them to
the queue. For that, make sure all limits are actually updated before
the assignment.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-settings.c | 2 +-
drivers/md/dm-table.c | 27 ++++++++++++---------------
2 files changed, 13 insertions(+), 16 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 865fe4ebbf9b83..13865a9f89726c 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -267,7 +267,7 @@ int queue_limits_commit_update(struct request_queue *q,
EXPORT_SYMBOL_GPL(queue_limits_commit_update);
/**
- * queue_limits_commit_set - apply queue limits to queue
+ * queue_limits_set - apply queue limits to queue
* @q: queue to update
* @lim: limits to apply
*
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 41f1d731ae5ac2..88114719fe187a 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1963,26 +1963,27 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
bool wc = false, fua = false;
int r;
- /*
- * Copy table's limits to the DM device's request_queue
- */
- q->limits = *limits;
-
if (dm_table_supports_nowait(t))
blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
else
blk_queue_flag_clear(QUEUE_FLAG_NOWAIT, q);
if (!dm_table_supports_discards(t)) {
- q->limits.max_discard_sectors = 0;
- q->limits.max_hw_discard_sectors = 0;
- q->limits.discard_granularity = 0;
- q->limits.discard_alignment = 0;
- q->limits.discard_misaligned = 0;
+ limits->max_hw_discard_sectors = 0;
+ limits->discard_granularity = 0;
+ limits->discard_alignment = 0;
+ limits->discard_misaligned = 0;
}
+ if (!dm_table_supports_write_zeroes(t))
+ limits->max_write_zeroes_sectors = 0;
+
if (!dm_table_supports_secure_erase(t))
- q->limits.max_secure_erase_sectors = 0;
+ limits->max_secure_erase_sectors = 0;
+
+ r = queue_limits_set(q, limits);
+ if (r)
+ return r;
if (dm_table_supports_flush(t, (1UL << QUEUE_FLAG_WC))) {
wc = true;
@@ -2007,9 +2008,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
else
blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
- if (!dm_table_supports_write_zeroes(t))
- q->limits.max_write_zeroes_sectors = 0;
-
dm_table_verify_integrity(t);
/*
@@ -2047,7 +2045,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
}
dm_update_crypto_profile(q, t);
- disk_update_readahead(t->md->disk);
/*
* Check for request-based device is left to
--
2.39.2
* [PATCH 04/16] md: add queue limit helpers
2024-02-26 10:29 atomic queue limit updates for stackable devices v2 Christoph Hellwig
` (2 preceding siblings ...)
2024-02-26 10:29 ` [PATCH 03/16] dm: use queue_limits_set Christoph Hellwig
@ 2024-02-26 10:29 ` Christoph Hellwig
2024-02-26 11:38 ` Yu Kuai
2024-02-26 10:29 ` [PATCH 05/16] md/raid0: use the atomic queue limit update APIs Christoph Hellwig
` (11 subsequent siblings)
15 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-26 10:29 UTC (permalink / raw)
To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid
Add a few helpers that wrap the block queue limits API for use in MD.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/md.c | 37 +++++++++++++++++++++++++++++++++++++
drivers/md/md.h | 3 +++
2 files changed, 40 insertions(+)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 75266c34b1f99b..23823823f80c6b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5699,6 +5699,43 @@ static const struct kobj_type md_ktype = {
int mdp_major = 0;
+/* stack the limit for all rdevs into lim */
+void mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim)
+{
+ struct md_rdev *rdev;
+
+ rdev_for_each(rdev, mddev) {
+ queue_limits_stack_bdev(lim, rdev->bdev, rdev->data_offset,
+ mddev->gendisk->disk_name);
+ }
+}
+EXPORT_SYMBOL_GPL(mddev_stack_rdev_limits);
+
+/* apply the extra stacking limits from a new rdev into mddev */
+int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
+{
+ struct queue_limits lim = queue_limits_start_update(mddev->queue);
+
+ queue_limits_stack_bdev(&lim, rdev->bdev, rdev->data_offset,
+ mddev->gendisk->disk_name);
+ return queue_limits_commit_update(mddev->queue, &lim);
+}
+EXPORT_SYMBOL_GPL(mddev_stack_new_rdev);
+
+/* update the optimal I/O size after a reshape */
+void mddev_update_io_opt(struct mddev *mddev, unsigned int nr_stripes)
+{
+ struct queue_limits lim;
+ int ret;
+
+ blk_mq_freeze_queue(mddev->queue);
+ lim = queue_limits_start_update(mddev->queue);
+ lim.io_opt = lim.io_min * nr_stripes;
+ ret = queue_limits_commit_update(mddev->queue, &lim);
+ blk_mq_unfreeze_queue(mddev->queue);
+}
+EXPORT_SYMBOL_GPL(mddev_update_io_opt);
+
static void mddev_delayed_delete(struct work_struct *ws)
{
struct mddev *mddev = container_of(ws, struct mddev, del_work);
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 8d881cc597992f..25b19614aa3239 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -860,6 +860,9 @@ void md_autostart_arrays(int part);
int md_set_array_info(struct mddev *mddev, struct mdu_array_info_s *info);
int md_add_new_disk(struct mddev *mddev, struct mdu_disk_info_s *info);
int do_md_run(struct mddev *mddev);
+void mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim);
+int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev);
+void mddev_update_io_opt(struct mddev *mddev, unsigned int nr_stripes);
extern const struct block_device_operations md_fops;
--
2.39.2
* [PATCH 05/16] md/raid0: use the atomic queue limit update APIs
2024-02-26 10:29 atomic queue limit updates for stackable devices v2 Christoph Hellwig
` (3 preceding siblings ...)
2024-02-26 10:29 ` [PATCH 04/16] md: add queue limit helpers Christoph Hellwig
@ 2024-02-26 10:29 ` Christoph Hellwig
2024-02-26 10:29 ` [PATCH 06/16] md/raid1: " Christoph Hellwig
` (10 subsequent siblings)
15 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-26 10:29 UTC (permalink / raw)
To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid
Build the queue limits outside the queue and apply them using
queue_limits_set. Also remove the bogus ->gendisk and ->queue NULL
checks in the area while touching it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/raid0.c | 37 +++++++++++++++++--------------------
1 file changed, 17 insertions(+), 20 deletions(-)
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index c50a7abda744ad..dd070e9b2d5643 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -381,7 +381,8 @@ static void raid0_free(struct mddev *mddev, void *priv)
static int raid0_run(struct mddev *mddev)
{
- struct r0conf *conf;
+ struct r0conf *conf = mddev->private;
+ struct queue_limits lim;
int ret;
if (mddev->chunk_sectors == 0) {
@@ -391,29 +392,23 @@ static int raid0_run(struct mddev *mddev)
if (md_check_no_bitmap(mddev))
return -EINVAL;
- /* if private is not null, we are here after takeover */
- if (mddev->private == NULL) {
+ /* if conf is not null, we are here after takeover */
+ if (!conf) {
ret = create_strip_zones(mddev, &conf);
if (ret < 0)
return ret;
mddev->private = conf;
}
- conf = mddev->private;
- if (mddev->queue) {
- struct md_rdev *rdev;
-
- blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
- blk_queue_max_write_zeroes_sectors(mddev->queue, mddev->chunk_sectors);
- blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9);
- blk_queue_io_opt(mddev->queue,
- (mddev->chunk_sectors << 9) * mddev->raid_disks);
-
- rdev_for_each(rdev, mddev) {
- disk_stack_limits(mddev->gendisk, rdev->bdev,
- rdev->data_offset << 9);
- }
- }
+ blk_set_stacking_limits(&lim);
+ lim.max_hw_sectors = mddev->chunk_sectors;
+ lim.max_write_zeroes_sectors = mddev->chunk_sectors;
+ lim.io_min = mddev->chunk_sectors << 9;
+ lim.io_opt = lim.io_min * mddev->raid_disks;
+ mddev_stack_rdev_limits(mddev, &lim);
+ ret = queue_limits_set(mddev->queue, &lim);
+ if (ret)
+ goto out_free_conf;
/* calculate array device size */
md_set_array_sectors(mddev, raid0_size(mddev, 0, 0));
@@ -426,8 +421,10 @@ static int raid0_run(struct mddev *mddev)
ret = md_integrity_register(mddev);
if (ret)
- free_conf(mddev, conf);
-
+ goto out_free_conf;
+ return 0;
+out_free_conf:
+ free_conf(mddev, conf);
return ret;
}
--
2.39.2
* [PATCH 06/16] md/raid1: use the atomic queue limit update APIs
2024-02-26 10:29 atomic queue limit updates for stackable devices v2 Christoph Hellwig
` (4 preceding siblings ...)
2024-02-26 10:29 ` [PATCH 05/16] md/raid0: use the atomic queue limit update APIs Christoph Hellwig
@ 2024-02-26 10:29 ` Christoph Hellwig
2024-02-26 11:29 ` Yu Kuai
2024-02-26 10:29 ` [PATCH 07/16] md/raid10: " Christoph Hellwig
` (9 subsequent siblings)
15 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-26 10:29 UTC (permalink / raw)
To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid
Build the queue limits outside the queue and apply them using
queue_limits_set. Also remove the bogus ->gendisk and ->queue NULL
checks in the area while touching it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/raid1.c | 24 ++++++++++--------------
1 file changed, 10 insertions(+), 14 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 286f8b16c7bde7..752ff99736a636 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1791,10 +1791,9 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev)
for (mirror = first; mirror <= last; mirror++) {
p = conf->mirrors + mirror;
if (!p->rdev) {
- if (mddev->gendisk)
- disk_stack_limits(mddev->gendisk, rdev->bdev,
- rdev->data_offset << 9);
-
+ err = mddev_stack_new_rdev(mddev, rdev);
+ if (err)
+ return err;
p->head_position = 0;
rdev->raid_disk = mirror;
err = 0;
@@ -3089,9 +3088,9 @@ static struct r1conf *setup_conf(struct mddev *mddev)
static void raid1_free(struct mddev *mddev, void *priv);
static int raid1_run(struct mddev *mddev)
{
+ struct queue_limits lim;
struct r1conf *conf;
int i;
- struct md_rdev *rdev;
int ret;
if (mddev->level != 1) {
@@ -3118,15 +3117,12 @@ static int raid1_run(struct mddev *mddev)
if (IS_ERR(conf))
return PTR_ERR(conf);
- if (mddev->queue)
- blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
-
- rdev_for_each(rdev, mddev) {
- if (!mddev->gendisk)
- continue;
- disk_stack_limits(mddev->gendisk, rdev->bdev,
- rdev->data_offset << 9);
- }
+ blk_set_stacking_limits(&lim);
+ lim.max_write_zeroes_sectors = 0;
+ mddev_stack_rdev_limits(mddev, &lim);
+ ret = queue_limits_set(mddev->queue, &lim);
+ if (ret)
+ goto abort;
mddev->degraded = 0;
for (i = 0; i < conf->raid_disks; i++)
--
2.39.2
* [PATCH 07/16] md/raid10: use the atomic queue limit update APIs
2024-02-26 10:29 atomic queue limit updates for stackable devices v2 Christoph Hellwig
` (5 preceding siblings ...)
2024-02-26 10:29 ` [PATCH 06/16] md/raid1: " Christoph Hellwig
@ 2024-02-26 10:29 ` Christoph Hellwig
2024-02-26 10:29 ` [PATCH 08/16] md/raid5: " Christoph Hellwig
` (8 subsequent siblings)
15 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-26 10:29 UTC (permalink / raw)
To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid
Build the queue limits outside the queue and apply them using
queue_limits_set. Also remove the bogus ->gendisk and ->queue NULL
checks in the area while touching it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/raid10.c | 52 +++++++++++++++++++++------------------------
1 file changed, 24 insertions(+), 28 deletions(-)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 7412066ea22c7a..21d0aced9a0725 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2130,11 +2130,9 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
repl_slot = mirror;
continue;
}
-
- if (mddev->gendisk)
- disk_stack_limits(mddev->gendisk, rdev->bdev,
- rdev->data_offset << 9);
-
+ err = mddev_stack_new_rdev(mddev, rdev);
+ if (err)
+ return err;
p->head_position = 0;
p->recovery_disabled = mddev->recovery_disabled - 1;
rdev->raid_disk = mirror;
@@ -2150,10 +2148,9 @@ static int raid10_add_disk(struct mddev *mddev, struct md_rdev *rdev)
clear_bit(In_sync, &rdev->flags);
set_bit(Replacement, &rdev->flags);
rdev->raid_disk = repl_slot;
- err = 0;
- if (mddev->gendisk)
- disk_stack_limits(mddev->gendisk, rdev->bdev,
- rdev->data_offset << 9);
+ err = mddev_stack_new_rdev(mddev, rdev);
+ if (err)
+ return err;
conf->fullsync = 1;
WRITE_ONCE(p->replacement, rdev);
}
@@ -4002,18 +3999,18 @@ static struct r10conf *setup_conf(struct mddev *mddev)
return ERR_PTR(err);
}
-static void raid10_set_io_opt(struct r10conf *conf)
+static unsigned int raid10_nr_stripes(struct r10conf *conf)
{
- int raid_disks = conf->geo.raid_disks;
+ unsigned int raid_disks = conf->geo.raid_disks;
- if (!(conf->geo.raid_disks % conf->geo.near_copies))
- raid_disks /= conf->geo.near_copies;
- blk_queue_io_opt(conf->mddev->queue, (conf->mddev->chunk_sectors << 9) *
- raid_disks);
+ if (conf->geo.raid_disks % conf->geo.near_copies)
+ return raid_disks;
+ return raid_disks / conf->geo.near_copies;
}
static int raid10_run(struct mddev *mddev)
{
+ struct queue_limits lim;
struct r10conf *conf;
int i, disk_idx;
struct raid10_info *disk;
@@ -4021,6 +4018,7 @@ static int raid10_run(struct mddev *mddev)
sector_t size;
sector_t min_offset_diff = 0;
int first = 1;
+ int ret = -EIO;
if (mddev->private == NULL) {
conf = setup_conf(mddev);
@@ -4047,12 +4045,6 @@ static int raid10_run(struct mddev *mddev)
}
}
- if (mddev->queue) {
- blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
- blk_queue_io_min(mddev->queue, mddev->chunk_sectors << 9);
- raid10_set_io_opt(conf);
- }
-
rdev_for_each(rdev, mddev) {
long long diff;
@@ -4081,14 +4073,19 @@ static int raid10_run(struct mddev *mddev)
if (first || diff < min_offset_diff)
min_offset_diff = diff;
- if (mddev->gendisk)
- disk_stack_limits(mddev->gendisk, rdev->bdev,
- rdev->data_offset << 9);
-
disk->head_position = 0;
first = 0;
}
+ blk_set_stacking_limits(&lim);
+ lim.max_write_zeroes_sectors = 0;
+ lim.io_min = mddev->chunk_sectors << 9;
+ lim.io_opt = lim.io_min * raid10_nr_stripes(conf);
+ mddev_stack_rdev_limits(mddev, &lim);
+ ret = queue_limits_set(mddev->queue, &lim);
+ if (ret)
+ goto out_free_conf;
+
/* need to check that every block has at least one working mirror */
if (!enough(conf, -1)) {
pr_err("md/raid10:%s: not enough operational mirrors.\n",
@@ -4189,7 +4186,7 @@ static int raid10_run(struct mddev *mddev)
raid10_free_conf(conf);
mddev->private = NULL;
out:
- return -EIO;
+ return ret;
}
static void raid10_free(struct mddev *mddev, void *priv)
@@ -4966,8 +4963,7 @@ static void end_reshape(struct r10conf *conf)
conf->reshape_safe = MaxSector;
spin_unlock_irq(&conf->device_lock);
- if (conf->mddev->queue)
- raid10_set_io_opt(conf);
+ mddev_update_io_opt(conf->mddev, raid10_nr_stripes(conf));
conf->fullsync = 0;
}
--
2.39.2
* [PATCH 08/16] md/raid5: use the atomic queue limit update APIs
2024-02-26 10:29 atomic queue limit updates for stackable devices v2 Christoph Hellwig
` (6 preceding siblings ...)
2024-02-26 10:29 ` [PATCH 07/16] md/raid10: " Christoph Hellwig
@ 2024-02-26 10:29 ` Christoph Hellwig
2024-02-26 10:29 ` [PATCH 09/16] block: remove disk_stack_limits Christoph Hellwig
` (7 subsequent siblings)
15 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-26 10:29 UTC (permalink / raw)
To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid
Build the queue limits outside the queue and apply them using
queue_limits_set. Also remove the bogus ->gendisk and ->queue NULL
checks in the area while touching it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/md/raid5.c | 123 +++++++++++++++++++++------------------------
1 file changed, 56 insertions(+), 67 deletions(-)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 14f2cf75abbd72..3dd7c05d3ba2ab 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7682,12 +7682,6 @@ static int only_parity(int raid_disk, int algo, int raid_disks, int max_degraded
return 0;
}
-static void raid5_set_io_opt(struct r5conf *conf)
-{
- blk_queue_io_opt(conf->mddev->queue, (conf->chunk_sectors << 9) *
- (conf->raid_disks - conf->max_degraded));
-}
-
static int raid5_run(struct mddev *mddev)
{
struct r5conf *conf;
@@ -7695,9 +7689,12 @@ static int raid5_run(struct mddev *mddev)
struct md_rdev *rdev;
struct md_rdev *journal_dev = NULL;
sector_t reshape_offset = 0;
+ struct queue_limits lim;
int i;
long long min_offset_diff = 0;
int first = 1;
+ int data_disks, stripe;
+ int ret = -EIO;
if (mddev->recovery_cp != MaxSector)
pr_notice("md/raid:%s: not clean -- starting background reconstruction\n",
@@ -7950,67 +7947,59 @@ static int raid5_run(struct mddev *mddev)
mdname(mddev));
md_set_array_sectors(mddev, raid5_size(mddev, 0, 0));
- if (mddev->queue) {
- int chunk_size;
- /* read-ahead size must cover two whole stripes, which
- * is 2 * (datadisks) * chunksize where 'n' is the
- * number of raid devices
- */
- int data_disks = conf->previous_raid_disks - conf->max_degraded;
- int stripe = data_disks *
- ((mddev->chunk_sectors << 9) / PAGE_SIZE);
-
- chunk_size = mddev->chunk_sectors << 9;
- blk_queue_io_min(mddev->queue, chunk_size);
- raid5_set_io_opt(conf);
- mddev->queue->limits.raid_partial_stripes_expensive = 1;
- /*
- * We can only discard a whole stripe. It doesn't make sense to
- * discard data disk but write parity disk
- */
- stripe = stripe * PAGE_SIZE;
- stripe = roundup_pow_of_two(stripe);
- mddev->queue->limits.discard_granularity = stripe;
-
- blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
-
- rdev_for_each(rdev, mddev) {
- disk_stack_limits(mddev->gendisk, rdev->bdev,
- rdev->data_offset << 9);
- disk_stack_limits(mddev->gendisk, rdev->bdev,
- rdev->new_data_offset << 9);
- }
+ /*
+ * The read-ahead size must cover two whole stripes, which is
+ * 2 * (datadisks) * chunksize where 'n' is the number of raid devices.
+ */
+ data_disks = conf->previous_raid_disks - conf->max_degraded;
+ /*
+ * We can only discard a whole stripe. It doesn't make sense to
+ * discard data disk but write parity disk
+ */
+ stripe = roundup_pow_of_two(data_disks * (mddev->chunk_sectors << 9));
+
+ blk_set_stacking_limits(&lim);
+ lim.io_min = mddev->chunk_sectors << 9;
+ lim.io_opt = lim.io_min * (conf->raid_disks - conf->max_degraded);
+ lim.raid_partial_stripes_expensive = 1;
+ lim.discard_granularity = stripe;
+ lim.max_write_zeroes_sectors = 0;
+ mddev_stack_rdev_limits(mddev, &lim);
+ rdev_for_each(rdev, mddev) {
+ queue_limits_stack_bdev(&lim, rdev->bdev, rdev->new_data_offset,
+ mddev->gendisk->disk_name);
+ }
- /*
- * zeroing is required, otherwise data
- * could be lost. Consider a scenario: discard a stripe
- * (the stripe could be inconsistent if
- * discard_zeroes_data is 0); write one disk of the
- * stripe (the stripe could be inconsistent again
- * depending on which disks are used to calculate
- * parity); the disk is broken; The stripe data of this
- * disk is lost.
- *
- * We only allow DISCARD if the sysadmin has confirmed that
- * only safe devices are in use by setting a module parameter.
- * A better idea might be to turn DISCARD into WRITE_ZEROES
- * requests, as that is required to be safe.
- */
- if (!devices_handle_discard_safely ||
- mddev->queue->limits.max_discard_sectors < (stripe >> 9) ||
- mddev->queue->limits.discard_granularity < stripe)
- blk_queue_max_discard_sectors(mddev->queue, 0);
+ /*
+ * Zeroing is required for discard, otherwise data could be lost.
+ *
+ * Consider a scenario: discard a stripe (the stripe could be
+ * inconsistent if discard_zeroes_data is 0); write one disk of the
+ * stripe (the stripe could be inconsistent again depending on which
+ * disks are used to calculate parity); the disk is broken; The stripe
+ * data of this disk is lost.
+ *
+ * We only allow DISCARD if the sysadmin has confirmed that only safe
+ * devices are in use by setting a module parameter. A better idea
+ * might be to turn DISCARD into WRITE_ZEROES requests, as that is
+ * required to be safe.
+ */
+ if (!devices_handle_discard_safely ||
+ lim.max_discard_sectors < (stripe >> 9) ||
+ lim.discard_granularity < stripe)
+ lim.max_hw_discard_sectors = 0;
- /*
- * Requests require having a bitmap for each stripe.
- * Limit the max sectors based on this.
- */
- blk_queue_max_hw_sectors(mddev->queue,
- RAID5_MAX_REQ_STRIPES << RAID5_STRIPE_SHIFT(conf));
+ /*
+ * Requests require having a bitmap for each stripe.
+ * Limit the max sectors based on this.
+ */
+ lim.max_hw_sectors = RAID5_MAX_REQ_STRIPES << RAID5_STRIPE_SHIFT(conf);
- /* No restrictions on the number of segments in the request */
- blk_queue_max_segments(mddev->queue, USHRT_MAX);
- }
+ /* No restrictions on the number of segments in the request */
+ lim.max_segments = USHRT_MAX;
+ ret = queue_limits_set(mddev->queue, &lim);
+ if (ret)
+ goto abort;
if (log_init(conf, journal_dev, raid5_has_ppl(conf)))
goto abort;
@@ -8022,7 +8011,7 @@ static int raid5_run(struct mddev *mddev)
free_conf(conf);
mddev->private = NULL;
pr_warn("md/raid:%s: failed to run raid set.\n", mdname(mddev));
- return -EIO;
+ return ret;
}
static void raid5_free(struct mddev *mddev, void *priv)
@@ -8554,8 +8543,8 @@ static void end_reshape(struct r5conf *conf)
spin_unlock_irq(&conf->device_lock);
wake_up(&conf->wait_for_overlap);
- if (conf->mddev->queue)
- raid5_set_io_opt(conf);
+ mddev_update_io_opt(conf->mddev,
+ conf->raid_disks - conf->max_degraded);
}
}
--
2.39.2
* [PATCH 09/16] block: remove disk_stack_limits
2024-02-26 10:29 atomic queue limit updates for stackable devices v2 Christoph Hellwig
` (7 preceding siblings ...)
2024-02-26 10:29 ` [PATCH 08/16] md/raid5: " Christoph Hellwig
@ 2024-02-26 10:29 ` Christoph Hellwig
2024-02-26 10:29 ` [PATCH 10/16] drbd: pass the max_hw_sectors limit to blk_alloc_disk Christoph Hellwig
` (6 subsequent siblings)
15 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-26 10:29 UTC (permalink / raw)
To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid
disk_stack_limits is unused now, remove it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-settings.c | 24 ------------------------
include/linux/blkdev.h | 2 --
2 files changed, 26 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 13865a9f89726c..3c7d8d638ab59d 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -916,30 +916,6 @@ void queue_limits_stack_bdev(struct queue_limits *t, struct block_device *bdev,
}
EXPORT_SYMBOL_GPL(queue_limits_stack_bdev);
-/**
- * disk_stack_limits - adjust queue limits for stacked drivers
- * @disk: MD/DM gendisk (top)
- * @bdev: the underlying block device (bottom)
- * @offset: offset to beginning of data within component device
- *
- * Description:
- * Merges the limits for a top level gendisk and a bottom level
- * block_device.
- */
-void disk_stack_limits(struct gendisk *disk, struct block_device *bdev,
- sector_t offset)
-{
- struct request_queue *t = disk->queue;
-
- if (blk_stack_limits(&t->limits, &bdev_get_queue(bdev)->limits,
- get_start_sect(bdev) + (offset >> 9)) < 0)
- pr_notice("%s: Warning: Device %pg is misaligned\n",
- disk->disk_name, bdev);
-
- disk_update_readahead(disk);
-}
-EXPORT_SYMBOL(disk_stack_limits);
-
/**
* blk_queue_update_dma_pad - update pad mask
* @q: the request queue for the device
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 285e82723d641f..75c909865a8b7b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -926,8 +926,6 @@ extern int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
sector_t offset);
void queue_limits_stack_bdev(struct queue_limits *t, struct block_device *bdev,
sector_t offset, const char *pfx);
-extern void disk_stack_limits(struct gendisk *disk, struct block_device *bdev,
- sector_t offset);
extern void blk_queue_update_dma_pad(struct request_queue *, unsigned int);
extern void blk_queue_segment_boundary(struct request_queue *, unsigned long);
extern void blk_queue_virt_boundary(struct request_queue *, unsigned long);
--
2.39.2
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 10/16] drbd: pass the max_hw_sectors limit to blk_alloc_disk
2024-02-26 10:29 atomic queue limit updates for stackable devices v2 Christoph Hellwig
` (8 preceding siblings ...)
2024-02-26 10:29 ` [PATCH 09/16] block: remove disk_stack_limits Christoph Hellwig
@ 2024-02-26 10:29 ` Christoph Hellwig
2024-02-26 10:29 ` [PATCH 11/16] drbd: refactor drbd_reconsider_queue_parameters Christoph Hellwig
` (5 subsequent siblings)
15 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-26 10:29 UTC (permalink / raw)
To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid
Pass a queue_limits structure with the max_hw_sectors limit to
blk_alloc_disk instead of updating the limit on the allocated gendisk.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/drbd/drbd_main.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index cea1e537fd56c1..113b441d4d3670 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2690,6 +2690,14 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
int id;
int vnr = adm_ctx->volume;
enum drbd_ret_code err = ERR_NOMEM;
+ struct queue_limits lim = {
+ /*
+ * Setting the max_hw_sectors to an odd value of 8kibyte here.
+ * This triggers a max_bio_size message upon first attach or
+ * connect.
+ */
+ .max_hw_sectors = DRBD_MAX_BIO_SIZE_SAFE >> 8,
+ };
device = minor_to_device(minor);
if (device)
@@ -2708,7 +2716,7 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
drbd_init_set_defaults(device);
- disk = blk_alloc_disk(NULL, NUMA_NO_NODE);
+ disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
if (IS_ERR(disk)) {
err = PTR_ERR(disk);
goto out_no_disk;
@@ -2729,9 +2737,6 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, disk->queue);
blk_queue_write_cache(disk->queue, true, true);
- /* Setting the max_hw_sectors to an odd value of 8kibyte here
- This triggers a max_bio_size message upon first attach or connect */
- blk_queue_max_hw_sectors(disk->queue, DRBD_MAX_BIO_SIZE_SAFE >> 8);
device->md_io.page = alloc_page(GFP_KERNEL);
if (!device->md_io.page)
--
2.39.2
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 11/16] drbd: refactor drbd_reconsider_queue_parameters
2024-02-26 10:29 atomic queue limit updates for stackable devices v2 Christoph Hellwig
` (9 preceding siblings ...)
2024-02-26 10:29 ` [PATCH 10/16] drbd: pass the max_hw_sectors limit to blk_alloc_disk Christoph Hellwig
@ 2024-02-26 10:29 ` Christoph Hellwig
2024-02-26 10:30 ` [PATCH 12/16] drbd: refactor the backing dev max_segments calculation Christoph Hellwig
` (4 subsequent siblings)
15 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-26 10:29 UTC (permalink / raw)
To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid
Split out a drbd_max_peer_bio_size helper for the peer I/O size,
and condense the various checks into a nested min3(..., max())
expression instead of using a lot of local variables.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/drbd/drbd_nl.c | 84 +++++++++++++++++++++---------------
1 file changed, 49 insertions(+), 35 deletions(-)
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 43747a1aae4353..9135001a8e572d 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1189,6 +1189,33 @@ static int drbd_check_al_size(struct drbd_device *device, struct disk_conf *dc)
return 0;
}
+static unsigned int drbd_max_peer_bio_size(struct drbd_device *device)
+{
+ /*
+ * We may ignore peer limits if the peer is modern enough. From 8.3.8
+ * onwards the peer can use multiple BIOs for a single peer_request.
+ */
+ if (device->state.conn < C_WF_REPORT_PARAMS)
+ return device->peer_max_bio_size;
+
+ if (first_peer_device(device)->connection->agreed_pro_version < 94)
+ return min(device->peer_max_bio_size, DRBD_MAX_SIZE_H80_PACKET);
+
+ /*
+ * Correct old drbd (up to 8.3.7) if it believes it can do more than
+ * 32KiB.
+ */
+ if (first_peer_device(device)->connection->agreed_pro_version == 94)
+ return DRBD_MAX_SIZE_H80_PACKET;
+
+ /*
+ * drbd 8.3.8 onwards, before 8.4.0
+ */
+ if (first_peer_device(device)->connection->agreed_pro_version < 100)
+ return DRBD_MAX_BIO_SIZE_P95;
+ return DRBD_MAX_BIO_SIZE;
+}
+
static void blk_queue_discard_granularity(struct request_queue *q, unsigned int granularity)
{
q->limits.discard_granularity = granularity;
@@ -1303,48 +1330,35 @@ static void drbd_setup_queue_param(struct drbd_device *device, struct drbd_backi
fixup_discard_support(device, q);
}
-void drbd_reconsider_queue_parameters(struct drbd_device *device, struct drbd_backing_dev *bdev, struct o_qlim *o)
+void drbd_reconsider_queue_parameters(struct drbd_device *device,
+ struct drbd_backing_dev *bdev, struct o_qlim *o)
{
- unsigned int now, new, local, peer;
-
- now = queue_max_hw_sectors(device->rq_queue) << 9;
- local = device->local_max_bio_size; /* Eventually last known value, from volatile memory */
- peer = device->peer_max_bio_size; /* Eventually last known value, from meta data */
+ unsigned int now = queue_max_hw_sectors(device->rq_queue) <<
+ SECTOR_SHIFT;
+ unsigned int new;
if (bdev) {
- local = queue_max_hw_sectors(bdev->backing_bdev->bd_disk->queue) << 9;
- device->local_max_bio_size = local;
- }
- local = min(local, DRBD_MAX_BIO_SIZE);
-
- /* We may ignore peer limits if the peer is modern enough.
- Because new from 8.3.8 onwards the peer can use multiple
- BIOs for a single peer_request */
- if (device->state.conn >= C_WF_REPORT_PARAMS) {
- if (first_peer_device(device)->connection->agreed_pro_version < 94)
- peer = min(device->peer_max_bio_size, DRBD_MAX_SIZE_H80_PACKET);
- /* Correct old drbd (up to 8.3.7) if it believes it can do more than 32KiB */
- else if (first_peer_device(device)->connection->agreed_pro_version == 94)
- peer = DRBD_MAX_SIZE_H80_PACKET;
- else if (first_peer_device(device)->connection->agreed_pro_version < 100)
- peer = DRBD_MAX_BIO_SIZE_P95; /* drbd 8.3.8 onwards, before 8.4.0 */
- else
- peer = DRBD_MAX_BIO_SIZE;
+ struct request_queue *b = bdev->backing_bdev->bd_disk->queue;
- /* We may later detach and re-attach on a disconnected Primary.
- * Avoid this setting to jump back in that case.
- * We want to store what we know the peer DRBD can handle,
- * not what the peer IO backend can handle. */
- if (peer > device->peer_max_bio_size)
- device->peer_max_bio_size = peer;
+ device->local_max_bio_size =
+ queue_max_hw_sectors(b) << SECTOR_SHIFT;
}
- new = min(local, peer);
- if (device->state.role == R_PRIMARY && new < now)
- drbd_err(device, "ASSERT FAILED new < now; (%u < %u)\n", new, now);
-
- if (new != now)
+ /*
+ * We may later detach and re-attach on a disconnected Primary. Avoid
+ * decreasing the value in this case.
+ *
+ * We want to store what we know the peer DRBD can handle, not what the
+ * peer IO backend can handle.
+ */
+ new = min3(DRBD_MAX_BIO_SIZE, device->local_max_bio_size,
+ max(drbd_max_peer_bio_size(device), device->peer_max_bio_size));
+ if (new != now) {
+ if (device->state.role == R_PRIMARY && new < now)
+ drbd_err(device, "ASSERT FAILED new < now; (%u < %u)\n",
+ new, now);
drbd_info(device, "max BIO size = %u\n", new);
+ }
drbd_setup_queue_param(device, bdev, new, o);
}
--
2.39.2
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 12/16] drbd: refactor the backing dev max_segments calculation
2024-02-26 10:29 atomic queue limit updates for stackable devices v2 Christoph Hellwig
` (10 preceding siblings ...)
2024-02-26 10:29 ` [PATCH 11/16] drbd: refactor drbd_reconsider_queue_parameters Christoph Hellwig
@ 2024-02-26 10:30 ` Christoph Hellwig
2024-02-26 10:30 ` [PATCH 13/16] drbd: merge drbd_setup_queue_param into drbd_reconsider_queue_parameters Christoph Hellwig
` (3 subsequent siblings)
15 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-26 10:30 UTC (permalink / raw)
To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid
Factor out a drbd_backing_dev_max_segments helper that checks the
backing device limitation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/drbd/drbd_nl.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 9135001a8e572d..0326b7322ceb48 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1295,30 +1295,39 @@ static void fixup_discard_support(struct drbd_device *device, struct request_que
}
}
+/* This is the workaround for "bio would need to, but cannot, be split" */
+static unsigned int drbd_backing_dev_max_segments(struct drbd_device *device)
+{
+ unsigned int max_segments;
+
+ rcu_read_lock();
+ max_segments = rcu_dereference(device->ldev->disk_conf)->max_bio_bvecs;
+ rcu_read_unlock();
+
+ if (!max_segments)
+ return BLK_MAX_SEGMENTS;
+ return max_segments;
+}
+
static void drbd_setup_queue_param(struct drbd_device *device, struct drbd_backing_dev *bdev,
unsigned int max_bio_size, struct o_qlim *o)
{
struct request_queue * const q = device->rq_queue;
unsigned int max_hw_sectors = max_bio_size >> 9;
- unsigned int max_segments = 0;
+ unsigned int max_segments = BLK_MAX_SEGMENTS;
struct request_queue *b = NULL;
- struct disk_conf *dc;
if (bdev) {
b = bdev->backing_bdev->bd_disk->queue;
max_hw_sectors = min(queue_max_hw_sectors(b), max_bio_size >> 9);
- rcu_read_lock();
- dc = rcu_dereference(device->ldev->disk_conf);
- max_segments = dc->max_bio_bvecs;
- rcu_read_unlock();
+ max_segments = drbd_backing_dev_max_segments(device);
blk_set_stacking_limits(&q->limits);
}
blk_queue_max_hw_sectors(q, max_hw_sectors);
- /* This is the workaround for "bio would need to, but cannot, be split" */
- blk_queue_max_segments(q, max_segments ? max_segments : BLK_MAX_SEGMENTS);
+ blk_queue_max_segments(q, max_segments);
blk_queue_segment_boundary(q, PAGE_SIZE-1);
decide_on_discard_support(device, bdev);
--
2.39.2
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 13/16] drbd: merge drbd_setup_queue_param into drbd_reconsider_queue_parameters
2024-02-26 10:29 atomic queue limit updates for stackable devices v2 Christoph Hellwig
` (11 preceding siblings ...)
2024-02-26 10:30 ` [PATCH 12/16] drbd: refactor the backing dev max_segments calculation Christoph Hellwig
@ 2024-02-26 10:30 ` Christoph Hellwig
2024-02-26 10:30 ` [PATCH 14/16] drbd: don't set max_write_zeroes_sectors in decide_on_discard_support Christoph Hellwig
` (2 subsequent siblings)
15 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-26 10:30 UTC (permalink / raw)
To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid
drbd_setup_queue_param is only called by drbd_reconsider_queue_parameters,
and there is no clear boundary of responsibilities between the two.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/drbd/drbd_nl.c | 56 ++++++++++++++----------------------
1 file changed, 22 insertions(+), 34 deletions(-)
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 0326b7322ceb48..0f40fdee089971 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1309,45 +1309,16 @@ static unsigned int drbd_backing_dev_max_segments(struct drbd_device *device)
return max_segments;
}
-static void drbd_setup_queue_param(struct drbd_device *device, struct drbd_backing_dev *bdev,
- unsigned int max_bio_size, struct o_qlim *o)
-{
- struct request_queue * const q = device->rq_queue;
- unsigned int max_hw_sectors = max_bio_size >> 9;
- unsigned int max_segments = BLK_MAX_SEGMENTS;
- struct request_queue *b = NULL;
-
- if (bdev) {
- b = bdev->backing_bdev->bd_disk->queue;
-
- max_hw_sectors = min(queue_max_hw_sectors(b), max_bio_size >> 9);
- max_segments = drbd_backing_dev_max_segments(device);
-
- blk_set_stacking_limits(&q->limits);
- }
-
- blk_queue_max_hw_sectors(q, max_hw_sectors);
- blk_queue_max_segments(q, max_segments);
- blk_queue_segment_boundary(q, PAGE_SIZE-1);
- decide_on_discard_support(device, bdev);
-
- if (b) {
- blk_stack_limits(&q->limits, &b->limits, 0);
- disk_update_readahead(device->vdisk);
- }
- fixup_write_zeroes(device, q);
- fixup_discard_support(device, q);
-}
-
void drbd_reconsider_queue_parameters(struct drbd_device *device,
struct drbd_backing_dev *bdev, struct o_qlim *o)
{
- unsigned int now = queue_max_hw_sectors(device->rq_queue) <<
- SECTOR_SHIFT;
+ struct request_queue * const q = device->rq_queue;
+ unsigned int now = queue_max_hw_sectors(q) << 9;
+ struct request_queue *b = NULL;
unsigned int new;
if (bdev) {
- struct request_queue *b = bdev->backing_bdev->bd_disk->queue;
+ b = bdev->backing_bdev->bd_disk->queue;
device->local_max_bio_size =
queue_max_hw_sectors(b) << SECTOR_SHIFT;
@@ -1369,7 +1340,24 @@ void drbd_reconsider_queue_parameters(struct drbd_device *device,
drbd_info(device, "max BIO size = %u\n", new);
}
- drbd_setup_queue_param(device, bdev, new, o);
+ if (bdev) {
+ blk_set_stacking_limits(&q->limits);
+ blk_queue_max_segments(q,
+ drbd_backing_dev_max_segments(device));
+ } else {
+ blk_queue_max_segments(q, BLK_MAX_SEGMENTS);
+ }
+
+ blk_queue_max_hw_sectors(q, new >> SECTOR_SHIFT);
+ blk_queue_segment_boundary(q, PAGE_SIZE - 1);
+ decide_on_discard_support(device, bdev);
+
+ if (bdev) {
+ blk_stack_limits(&q->limits, &b->limits, 0);
+ disk_update_readahead(device->vdisk);
+ }
+ fixup_write_zeroes(device, q);
+ fixup_discard_support(device, q);
}
/* Starts the worker thread */
--
2.39.2
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 14/16] drbd: don't set max_write_zeroes_sectors in decide_on_discard_support
2024-02-26 10:29 atomic queue limit updates for stackable devices v2 Christoph Hellwig
` (12 preceding siblings ...)
2024-02-26 10:30 ` [PATCH 13/16] drbd: merge drbd_setup_queue_param into drbd_reconsider_queue_parameters Christoph Hellwig
@ 2024-02-26 10:30 ` Christoph Hellwig
2024-02-26 10:30 ` [PATCH 15/16] drbd: split out a drbd_discard_supported helper Christoph Hellwig
2024-02-26 10:30 ` [PATCH 16/16] drbd: atomically update queue limits in drbd_reconsider_queue_parameters Christoph Hellwig
15 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-26 10:30 UTC (permalink / raw)
To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid
fixup_write_zeroes always overrides the max_write_zeroes_sectors value
a little further down the callchain, so don't bother to set up a limit
in decide_on_discard_support.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/drbd/drbd_nl.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 0f40fdee089971..a79b7fe5335de4 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1260,7 +1260,6 @@ static void decide_on_discard_support(struct drbd_device *device,
blk_queue_discard_granularity(q, 512);
max_discard_sectors = drbd_max_discard_sectors(connection);
blk_queue_max_discard_sectors(q, max_discard_sectors);
- blk_queue_max_write_zeroes_sectors(q, max_discard_sectors);
return;
not_supported:
--
2.39.2
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 15/16] drbd: split out a drbd_discard_supported helper
2024-02-26 10:29 atomic queue limit updates for stackable devices v2 Christoph Hellwig
` (13 preceding siblings ...)
2024-02-26 10:30 ` [PATCH 14/16] drbd: don't set max_write_zeroes_sectors in decide_on_discard_support Christoph Hellwig
@ 2024-02-26 10:30 ` Christoph Hellwig
2024-02-26 10:30 ` [PATCH 16/16] drbd: atomically update queue limits in drbd_reconsider_queue_parameters Christoph Hellwig
15 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-26 10:30 UTC (permalink / raw)
To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid
Add a helper to check if discard is supported for a given connection /
backing device combination.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/drbd/drbd_nl.c | 25 +++++++++++++++++--------
1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index a79b7fe5335de4..94ed2b3ea6361d 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1231,24 +1231,33 @@ static unsigned int drbd_max_discard_sectors(struct drbd_connection *connection)
return AL_EXTENT_SIZE >> 9;
}
-static void decide_on_discard_support(struct drbd_device *device,
+static bool drbd_discard_supported(struct drbd_connection *connection,
struct drbd_backing_dev *bdev)
{
- struct drbd_connection *connection =
- first_peer_device(device)->connection;
- struct request_queue *q = device->rq_queue;
- unsigned int max_discard_sectors;
-
if (bdev && !bdev_max_discard_sectors(bdev->backing_bdev))
- goto not_supported;
+ return false;
if (connection->cstate >= C_CONNECTED &&
!(connection->agreed_features & DRBD_FF_TRIM)) {
drbd_info(connection,
"peer DRBD too old, does not support TRIM: disabling discards\n");
- goto not_supported;
+ return false;
}
+ return true;
+}
+
+static void decide_on_discard_support(struct drbd_device *device,
+ struct drbd_backing_dev *bdev)
+{
+ struct drbd_connection *connection =
+ first_peer_device(device)->connection;
+ struct request_queue *q = device->rq_queue;
+ unsigned int max_discard_sectors;
+
+ if (!drbd_discard_supported(connection, bdev))
+ goto not_supported;
+
/*
* We don't care for the granularity, really.
*
--
2.39.2
^ permalink raw reply related [flat|nested] 24+ messages in thread
* [PATCH 16/16] drbd: atomically update queue limits in drbd_reconsider_queue_parameters
2024-02-26 10:29 atomic queue limit updates for stackable devices v2 Christoph Hellwig
` (14 preceding siblings ...)
2024-02-26 10:30 ` [PATCH 15/16] drbd: split out a drbd_discard_supported helper Christoph Hellwig
@ 2024-02-26 10:30 ` Christoph Hellwig
15 siblings, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-26 10:30 UTC (permalink / raw)
To: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid
Switch drbd_reconsider_queue_parameters to set up the queue parameters
in an on-stack queue_limits structure and apply them atomically. Remove
various helpers that have become so trivial that they can be folded into
drbd_reconsider_queue_parameters.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/drbd/drbd_nl.c | 119 ++++++++++++++---------------------
1 file changed, 46 insertions(+), 73 deletions(-)
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 94ed2b3ea6361d..fbd92803dc1da4 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1216,11 +1216,6 @@ static unsigned int drbd_max_peer_bio_size(struct drbd_device *device)
return DRBD_MAX_BIO_SIZE;
}
-static void blk_queue_discard_granularity(struct request_queue *q, unsigned int granularity)
-{
- q->limits.discard_granularity = granularity;
-}
-
static unsigned int drbd_max_discard_sectors(struct drbd_connection *connection)
{
/* when we introduced REQ_WRITE_SAME support, we also bumped
@@ -1247,62 +1242,6 @@ static bool drbd_discard_supported(struct drbd_connection *connection,
return true;
}
-static void decide_on_discard_support(struct drbd_device *device,
- struct drbd_backing_dev *bdev)
-{
- struct drbd_connection *connection =
- first_peer_device(device)->connection;
- struct request_queue *q = device->rq_queue;
- unsigned int max_discard_sectors;
-
- if (!drbd_discard_supported(connection, bdev))
- goto not_supported;
-
- /*
- * We don't care for the granularity, really.
- *
- * Stacking limits below should fix it for the local device. Whether or
- * not it is a suitable granularity on the remote device is not our
- * problem, really. If you care, you need to use devices with similar
- * topology on all peers.
- */
- blk_queue_discard_granularity(q, 512);
- max_discard_sectors = drbd_max_discard_sectors(connection);
- blk_queue_max_discard_sectors(q, max_discard_sectors);
- return;
-
-not_supported:
- blk_queue_discard_granularity(q, 0);
- blk_queue_max_discard_sectors(q, 0);
-}
-
-static void fixup_write_zeroes(struct drbd_device *device, struct request_queue *q)
-{
- /* Fixup max_write_zeroes_sectors after blk_stack_limits():
- * if we can handle "zeroes" efficiently on the protocol,
- * we want to do that, even if our backend does not announce
- * max_write_zeroes_sectors itself. */
- struct drbd_connection *connection = first_peer_device(device)->connection;
- /* If the peer announces WZEROES support, use it. Otherwise, rather
- * send explicit zeroes than rely on some discard-zeroes-data magic. */
- if (connection->agreed_features & DRBD_FF_WZEROES)
- q->limits.max_write_zeroes_sectors = DRBD_MAX_BBIO_SECTORS;
- else
- q->limits.max_write_zeroes_sectors = 0;
-}
-
-static void fixup_discard_support(struct drbd_device *device, struct request_queue *q)
-{
- unsigned int max_discard = device->rq_queue->limits.max_discard_sectors;
- unsigned int discard_granularity =
- device->rq_queue->limits.discard_granularity >> SECTOR_SHIFT;
-
- if (discard_granularity > max_discard) {
- blk_queue_discard_granularity(q, 0);
- blk_queue_max_discard_sectors(q, 0);
- }
-}
-
/* This is the workaround for "bio would need to, but cannot, be split" */
static unsigned int drbd_backing_dev_max_segments(struct drbd_device *device)
{
@@ -1320,8 +1259,11 @@ static unsigned int drbd_backing_dev_max_segments(struct drbd_device *device)
void drbd_reconsider_queue_parameters(struct drbd_device *device,
struct drbd_backing_dev *bdev, struct o_qlim *o)
{
+ struct drbd_connection *connection =
+ first_peer_device(device)->connection;
struct request_queue * const q = device->rq_queue;
unsigned int now = queue_max_hw_sectors(q) << 9;
+ struct queue_limits lim;
struct request_queue *b = NULL;
unsigned int new;
@@ -1348,24 +1290,55 @@ void drbd_reconsider_queue_parameters(struct drbd_device *device,
drbd_info(device, "max BIO size = %u\n", new);
}
+ lim = queue_limits_start_update(q);
if (bdev) {
- blk_set_stacking_limits(&q->limits);
- blk_queue_max_segments(q,
- drbd_backing_dev_max_segments(device));
+ blk_set_stacking_limits(&lim);
+ lim.max_segments = drbd_backing_dev_max_segments(device);
} else {
- blk_queue_max_segments(q, BLK_MAX_SEGMENTS);
+ lim.max_segments = BLK_MAX_SEGMENTS;
}
- blk_queue_max_hw_sectors(q, new >> SECTOR_SHIFT);
- blk_queue_segment_boundary(q, PAGE_SIZE - 1);
- decide_on_discard_support(device, bdev);
+ lim.max_hw_sectors = new >> SECTOR_SHIFT;
+ lim.seg_boundary_mask = PAGE_SIZE - 1;
- if (bdev) {
- blk_stack_limits(&q->limits, &b->limits, 0);
- disk_update_readahead(device->vdisk);
+ /*
+ * We don't care for the granularity, really.
+ *
+ * Stacking limits below should fix it for the local device. Whether or
+ * not it is a suitable granularity on the remote device is not our
+ * problem, really. If you care, you need to use devices with similar
+ * topology on all peers.
+ */
+ if (drbd_discard_supported(connection, bdev)) {
+ lim.discard_granularity = 512;
+ lim.max_hw_discard_sectors =
+ drbd_max_discard_sectors(connection);
+ } else {
+ lim.discard_granularity = 0;
+ lim.max_hw_discard_sectors = 0;
}
- fixup_write_zeroes(device, q);
- fixup_discard_support(device, q);
+
+ if (bdev)
+ blk_stack_limits(&lim, &b->limits, 0);
+
+ /*
+ * If we can handle "zeroes" efficiently on the protocol, we want to do
+ * that, even if our backend does not announce max_write_zeroes_sectors
+ * itself.
+ */
+ if (connection->agreed_features & DRBD_FF_WZEROES)
+ lim.max_write_zeroes_sectors = DRBD_MAX_BBIO_SECTORS;
+ else
+ lim.max_write_zeroes_sectors = 0;
+
+ if ((lim.discard_granularity >> SECTOR_SHIFT) >
+ lim.max_hw_discard_sectors) {
+ lim.discard_granularity = 0;
+ lim.max_hw_discard_sectors = 0;
+ }
+
+ if (queue_limits_commit_update(q, &lim))
+ drbd_err(device, "setting new queue limits failed\n");
}
/* Starts the worker thread */
--
2.39.2
^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: [PATCH 06/16] md/raid1: use the atomic queue limit update APIs
2024-02-26 10:29 ` [PATCH 06/16] md/raid1: " Christoph Hellwig
@ 2024-02-26 11:29 ` Yu Kuai
2024-02-27 15:26 ` Christoph Hellwig
0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2024-02-26 11:29 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, Mike Snitzer, Mikulas Patocka,
Song Liu, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid, yukuai (C)
Hi,
在 2024/02/26 18:29, Christoph Hellwig 写道:
> Build the queue limits outside the queue and apply them using
> queue_limits_set. Also remove the bogus ->gendisk and ->queue NULL
> checks in the area while touching it.
The check of mddev->gendisk can't be removed, because it is used to
distinguish dm-raid from md/raid. The same applies to the following patches.
Thanks,
Kuai
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/md/raid1.c | 24 ++++++++++--------------
> 1 file changed, 10 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 286f8b16c7bde7..752ff99736a636 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1791,10 +1791,9 @@ static int raid1_add_disk(struct mddev *mddev, struct md_rdev *rdev)
> for (mirror = first; mirror <= last; mirror++) {
> p = conf->mirrors + mirror;
> if (!p->rdev) {
> - if (mddev->gendisk)
> - disk_stack_limits(mddev->gendisk, rdev->bdev,
> - rdev->data_offset << 9);
> -
> + err = mddev_stack_new_rdev(mddev, rdev);
> + if (err)
> + return err;
> p->head_position = 0;
> rdev->raid_disk = mirror;
> err = 0;
> @@ -3089,9 +3088,9 @@ static struct r1conf *setup_conf(struct mddev *mddev)
> static void raid1_free(struct mddev *mddev, void *priv);
> static int raid1_run(struct mddev *mddev)
> {
> + struct queue_limits lim;
> struct r1conf *conf;
> int i;
> - struct md_rdev *rdev;
> int ret;
>
> if (mddev->level != 1) {
> @@ -3118,15 +3117,12 @@ static int raid1_run(struct mddev *mddev)
> if (IS_ERR(conf))
> return PTR_ERR(conf);
>
> - if (mddev->queue)
> - blk_queue_max_write_zeroes_sectors(mddev->queue, 0);
> -
> - rdev_for_each(rdev, mddev) {
> - if (!mddev->gendisk)
> - continue;
> - disk_stack_limits(mddev->gendisk, rdev->bdev,
> - rdev->data_offset << 9);
> - }
> + blk_set_stacking_limits(&lim);
> + lim.max_write_zeroes_sectors = 0;
> + mddev_stack_rdev_limits(mddev, &lim);
> + ret = queue_limits_set(mddev->queue, &lim);
> + if (ret)
> + goto abort;
>
> mddev->degraded = 0;
> for (i = 0; i < conf->raid_disks; i++)
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 04/16] md: add queue limit helpers
2024-02-26 10:29 ` [PATCH 04/16] md: add queue limit helpers Christoph Hellwig
@ 2024-02-26 11:38 ` Yu Kuai
2024-02-27 14:36 ` Christoph Hellwig
0 siblings, 1 reply; 24+ messages in thread
From: Yu Kuai @ 2024-02-26 11:38 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe, Mike Snitzer, Mikulas Patocka,
Song Liu, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder
Cc: drbd-dev, dm-devel, linux-block, linux-raid, yukuai (C)
Hi,
在 2024/02/26 18:29, Christoph Hellwig 写道:
> Add a few helpers that wrap the block queue limits API for use in MD.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/md/md.c | 37 +++++++++++++++++++++++++++++++++++++
> drivers/md/md.h | 3 +++
> 2 files changed, 40 insertions(+)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 75266c34b1f99b..23823823f80c6b 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -5699,6 +5699,43 @@ static const struct kobj_type md_ktype = {
>
> int mdp_major = 0;
>
> +/* stack the limit for all rdevs into lim */
> +void mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim)
> +{
> + struct md_rdev *rdev;
> +
> + rdev_for_each(rdev, mddev) {
> + queue_limits_stack_bdev(lim, rdev->bdev, rdev->data_offset,
> + mddev->gendisk->disk_name);
> + }
> +}
> +EXPORT_SYMBOL_GPL(mddev_stack_rdev_limits);
> +
> +/* apply the extra stacking limits from a new rdev into mddev */
> +int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev)
> +{
> + struct queue_limits lim = queue_limits_start_update(mddev->queue);
> +
> + queue_limits_stack_bdev(&lim, rdev->bdev, rdev->data_offset,
> + mddev->gendisk->disk_name);
> + return queue_limits_commit_update(mddev->queue, &lim);
> +}
> +EXPORT_SYMBOL_GPL(mddev_stack_new_rdev);
> +
> +/* update the optimal I/O size after a reshape */
> +void mddev_update_io_opt(struct mddev *mddev, unsigned int nr_stripes)
> +{
> + struct queue_limits lim;
> + int ret;
> +
> + blk_mq_freeze_queue(mddev->queue);
> + lim = queue_limits_start_update(mddev->queue);
> + lim.io_opt = lim.io_min * nr_stripes;
> + ret = queue_limits_commit_update(mddev->queue, &lim);
> + blk_mq_unfreeze_queue(mddev->queue);
Any reason to use blk_mq_freeze/unfreeze_queue? I don't think this is
meaningful for raid: it only waits for IO submission, not IO completion.
raid should already handle concurrent IO with reshape, so I think this
can just be removed.
Thanks,
Kuai
> +}
> +EXPORT_SYMBOL_GPL(mddev_update_io_opt);
> +
> static void mddev_delayed_delete(struct work_struct *ws)
> {
> struct mddev *mddev = container_of(ws, struct mddev, del_work);
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 8d881cc597992f..25b19614aa3239 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -860,6 +860,9 @@ void md_autostart_arrays(int part);
> int md_set_array_info(struct mddev *mddev, struct mdu_array_info_s *info);
> int md_add_new_disk(struct mddev *mddev, struct mdu_disk_info_s *info);
> int do_md_run(struct mddev *mddev);
> +void mddev_stack_rdev_limits(struct mddev *mddev, struct queue_limits *lim);
> +int mddev_stack_new_rdev(struct mddev *mddev, struct md_rdev *rdev);
> +void mddev_update_io_opt(struct mddev *mddev, unsigned int nr_stripes);
>
> extern const struct block_device_operations md_fops;
>
>
^ permalink raw reply [flat|nested] 24+ messages in thread
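The stacking that mddev_stack_rdev_limits performs for each rdev can be modeled in userspace C. The combination rules below (minimum of the caps, maximum of the granularities, lcm of the optimal sizes) are the usual convention for stacked block devices; this is an illustrative sketch, not the kernel's queue_limits_stack_bdev:

```c
#include <assert.h>

/*
 * Sketch of limit stacking: the top device's limits are combined
 * member by member so it never exceeds what any underlying device
 * supports.  Hypothetical model, not the kernel implementation.
 */
struct queue_limits {
	unsigned int max_sectors;	/* cap: take the minimum */
	unsigned int io_min;		/* granularity: take the maximum */
	unsigned int io_opt;		/* optimal size: take the lcm */
};

static unsigned int lcm(unsigned int a, unsigned int b)
{
	unsigned int x = a, y = b;

	if (!a || !b)
		return a | b;	/* treat zero as "no preference" */
	while (y) {		/* gcd by Euclid's algorithm */
		unsigned int t = x % y;
		x = y;
		y = t;
	}
	return (a / x) * b;
}

/* Fold the limits of one bottom device 'b' into the top device 't'. */
static void limits_stack(struct queue_limits *t,
			 const struct queue_limits *b)
{
	t->max_sectors = t->max_sectors < b->max_sectors ?
			 t->max_sectors : b->max_sectors;
	t->io_min = t->io_min > b->io_min ? t->io_min : b->io_min;
	t->io_opt = lcm(t->io_opt, b->io_opt);
}
```

In the patch above, mddev_stack_rdev_limits is the loop that applies this kind of fold once per member device.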
* Re: [PATCH 04/16] md: add queue limit helpers
2024-02-26 11:38 ` Yu Kuai
@ 2024-02-27 14:36 ` Christoph Hellwig
2024-02-28 1:38 ` Yu Kuai
0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-27 14:36 UTC (permalink / raw)
To: Yu Kuai
Cc: Christoph Hellwig, Jens Axboe, Mike Snitzer, Mikulas Patocka,
Song Liu, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, drbd-dev, dm-devel, linux-block,
linux-raid, yukuai (C)
On Mon, Feb 26, 2024 at 07:38:17PM +0800, Yu Kuai wrote:
> Any reason to use blk_mq_freeze/unfreeze_queue? I don't think this is
> meaningful for raid; it only waits for I/O submission, not I/O completion.
>
> raid should already handle concurrent I/O with reshape, so I think this
> can just be removed.
We can't just change limits under the driver if I/Os are being submitted.
That is one of the points of the whole queue limits exercise.
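The pattern Christoph describes — snapshot the current limits under a lock, modify the private copy, then validate and publish it in one step — can be modeled in userspace C. The names mirror the kernel helpers, but this is an illustration of the shape of the API, not the kernel implementation:

```c
#include <assert.h>
#include <pthread.h>

struct queue_limits {
	unsigned int io_min;
	unsigned int io_opt;
	unsigned int max_sectors;
};

struct queue {
	pthread_mutex_t limits_lock;
	struct queue_limits limits;
};

/* Lock out other updaters and hand back a private copy to edit. */
static struct queue_limits queue_limits_start_update(struct queue *q)
{
	pthread_mutex_lock(&q->limits_lock);
	return q->limits;
}

/* Validate the staged copy, publish it if sane, drop the lock. */
static int queue_limits_commit_update(struct queue *q,
				      struct queue_limits *lim)
{
	int ret = 0;

	if (!lim->max_sectors || (lim->io_min && lim->io_opt % lim->io_min))
		ret = -1;	/* reject an inconsistent combination */
	else
		q->limits = *lim;	/* publish in one assignment */
	pthread_mutex_unlock(&q->limits_lock);
	return ret;
}
```

mddev_update_io_opt in the patch follows exactly this shape: start the update, set io_opt from io_min and the stripe count, then commit.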
* Re: [PATCH 06/16] md/raid1: use the atomic queue limit update APIs
2024-02-26 11:29 ` Yu Kuai
@ 2024-02-27 15:26 ` Christoph Hellwig
2024-02-27 21:54 ` Song Liu
0 siblings, 1 reply; 24+ messages in thread
From: Christoph Hellwig @ 2024-02-27 15:26 UTC (permalink / raw)
To: Yu Kuai
Cc: Christoph Hellwig, Jens Axboe, Mike Snitzer, Mikulas Patocka,
Song Liu, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, drbd-dev, dm-devel, linux-block,
linux-raid, yukuai (C)
On Mon, Feb 26, 2024 at 07:29:08PM +0800, Yu Kuai wrote:
> Hi,
>
> On 2024/02/26 18:29, Christoph Hellwig wrote:
>> Build the queue limits outside the queue and apply them using
>> queue_limits_set. Also remove the bogus ->gendisk and ->queue NULL
>> checks in the area while touching it.
>
> The checking of mddev->gendisk can't be removed, because this is used to
> distinguish dm-raid and md/raid. And the same for following patches.
Ah. Well, we should make that more obvious then. This is what I
currently have:
http://git.infradead.org/?p=users/hch/block.git;a=shortlog;h=refs/heads/md-blk-limits
particularly:
http://git.infradead.org/?p=users/hch/block.git;a=commitdiff;h=24b2fd15f57f06629d2254ebec480e1e28b96636
* Re: [PATCH 06/16] md/raid1: use the atomic queue limit update APIs
2024-02-27 15:26 ` Christoph Hellwig
@ 2024-02-27 21:54 ` Song Liu
2024-02-28 1:42 ` Yu Kuai
0 siblings, 1 reply; 24+ messages in thread
From: Song Liu @ 2024-02-27 21:54 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Yu Kuai, Jens Axboe, Mike Snitzer, Mikulas Patocka,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder,
drbd-dev, dm-devel, linux-block, linux-raid, yukuai (C)
On Tue, Feb 27, 2024 at 7:26 AM Christoph Hellwig <hch@lst.de> wrote:
>
> On Mon, Feb 26, 2024 at 07:29:08PM +0800, Yu Kuai wrote:
> > Hi,
> >
> > On 2024/02/26 18:29, Christoph Hellwig wrote:
> >> Build the queue limits outside the queue and apply them using
> >> queue_limits_set. Also remove the bogus ->gendisk and ->queue NULL
> >> checks in the area while touching it.
> >
> > The checking of mddev->gendisk can't be removed, because this is used to
> > distinguish dm-raid and md/raid. And the same for following patches.
>
> Ah. Well, we should make that more obvious then. This is what I
> currently have:
>
> http://git.infradead.org/?p=users/hch/block.git;a=shortlog;h=refs/heads/md-blk-limits
>
> particularly:
>
> http://git.infradead.org/?p=users/hch/block.git;a=commitdiff;h=24b2fd15f57f06629d2254ebec480e1e28b96636
Yes! I was thinking about something like mddev_is_dm() to make these
checks less confusing. Thanks!
Song
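The helper Song has in mind fits in a couple of lines. This assumes, as the discussion implies, that dm-raid drives the md personalities without allocating a gendisk for the mddev, so the NULL check doubles as the dm-vs-md test. A hypothetical sketch, not the committed code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct gendisk;			/* opaque for this sketch */

struct mddev {
	struct gendisk *gendisk;	/* NULL when driven by dm-raid */
};

/*
 * Sketch of the mddev_is_dm() helper suggested above: dm-raid reuses
 * the md personalities without allocating a gendisk of its own, so
 * the NULL check distinguishes the two callers.
 */
static inline bool mddev_is_dm(const struct mddev *mddev)
{
	return mddev->gendisk == NULL;
}
```

Call sites that previously open-coded `if (mddev->gendisk)` can then state their intent directly, e.g. `if (!mddev_is_dm(mddev))`.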
* Re: [PATCH 04/16] md: add queue limit helpers
2024-02-27 14:36 ` Christoph Hellwig
@ 2024-02-28 1:38 ` Yu Kuai
0 siblings, 0 replies; 24+ messages in thread
From: Yu Kuai @ 2024-02-28 1:38 UTC (permalink / raw)
To: Christoph Hellwig, Yu Kuai
Cc: Jens Axboe, Mike Snitzer, Mikulas Patocka, Song Liu,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder,
drbd-dev, dm-devel, linux-block, linux-raid, yukuai (C)
Hi,
On 2024/02/27 22:36, Christoph Hellwig wrote:
> On Mon, Feb 26, 2024 at 07:38:17PM +0800, Yu Kuai wrote:
>> Any reason to use blk_mq_freeze/unfreeze_queue? I don't think this is
>> meaningful for raid; it only waits for I/O submission, not I/O completion.
>>
>> raid should already handle concurrent I/O with reshape, so I think this
>> can just be removed.
>
> We can't just change limits under the driver if I/Os are being submitted.
> That is one of the points of the whole queue limits exercise.
>
Agree with this; it's just that these APIs can't guarantee it in raid, there
could still be I/O in flight. Perhaps you can use:
mddev_suspend(mddev)
...
mddev_resume(mddev)
Thanks,
Kuai
> .
>
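The distinction Kuai draws here — waiting for submission to stop versus waiting for in-flight I/O to complete — can be modeled in a few lines of plain C. Counters stand in for real requests; this illustrates the semantics only and is not md or blk-mq code:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model: a "freeze" (like blk_mq_freeze_queue) only
 * gates new submissions, while full quiescence (like mddev_suspend)
 * additionally requires that I/O already in flight has completed.
 */
struct dev {
	bool gate_closed;	/* new submissions blocked */
	int inflight;		/* I/Os submitted but not yet completed */
};

static bool submit_io(struct dev *d)
{
	if (d->gate_closed)
		return false;	/* frozen: submission rejected */
	d->inflight++;
	return true;
}

static void complete_io(struct dev *d)
{
	d->inflight--;
}

/* freeze: close the gate; in-flight I/O may still be outstanding */
static void dev_freeze(struct dev *d)
{
	d->gate_closed = true;
}

/* quiesced means frozen *and* fully drained */
static bool dev_quiesced(const struct dev *d)
{
	return d->gate_closed && d->inflight == 0;
}
```

The objection in this subthread is exactly that the first state (gate closed, I/O still in flight) is not the second.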
* Re: [PATCH 06/16] md/raid1: use the atomic queue limit update APIs
2024-02-27 21:54 ` Song Liu
@ 2024-02-28 1:42 ` Yu Kuai
0 siblings, 0 replies; 24+ messages in thread
From: Yu Kuai @ 2024-02-28 1:42 UTC (permalink / raw)
To: Song Liu, Christoph Hellwig
Cc: Yu Kuai, Jens Axboe, Mike Snitzer, Mikulas Patocka,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder,
drbd-dev, dm-devel, linux-block, linux-raid, yukuai (C)
Hi,
On 2024/02/28 5:54, Song Liu wrote:
> On Tue, Feb 27, 2024 at 7:26 AM Christoph Hellwig <hch@lst.de> wrote:
>>
>> On Mon, Feb 26, 2024 at 07:29:08PM +0800, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2024/02/26 18:29, Christoph Hellwig wrote:
>>>> Build the queue limits outside the queue and apply them using
>>>> queue_limits_set. Also remove the bogus ->gendisk and ->queue NULL
>>>> checks in the area while touching it.
>>>
>>> The checking of mddev->gendisk can't be removed, because this is used to
>>> distinguish dm-raid and md/raid. And the same for following patches.
>>
>> Ah. Well, we should make that more obvious then. This is what I
>> currently have:
>>
>> http://git.infradead.org/?p=users/hch/block.git;a=shortlog;h=refs/heads/md-blk-limits
>>
>> particularly:
>>
>> http://git.infradead.org/?p=users/hch/block.git;a=commitdiff;h=24b2fd15f57f06629d2254ebec480e1e28b96636
>
> Yes! I was thinking about something like mddev_is_dm() to make these
> checks less confusing. Thanks!
Yes, this looks good.
Thanks,
Kuai
>
> Song
> .
>
end of thread, other threads:[~2024-02-28 1:42 UTC | newest]
Thread overview: 24+ messages
2024-02-26 10:29 atomic queue limit updates for stackable devices v2 Christoph Hellwig
2024-02-26 10:29 ` [PATCH 01/16] block: add a queue_limits_set helper Christoph Hellwig
2024-02-26 10:29 ` [PATCH 02/16] block: add a queue_limits_stack_bdev helper Christoph Hellwig
2024-02-26 10:29 ` [PATCH 03/16] dm: use queue_limits_set Christoph Hellwig
2024-02-26 10:29 ` [PATCH 04/16] md: add queue limit helpers Christoph Hellwig
2024-02-26 11:38 ` Yu Kuai
2024-02-27 14:36 ` Christoph Hellwig
2024-02-28 1:38 ` Yu Kuai
2024-02-26 10:29 ` [PATCH 05/16] md/raid0: use the atomic queue limit update APIs Christoph Hellwig
2024-02-26 10:29 ` [PATCH 06/16] md/raid1: " Christoph Hellwig
2024-02-26 11:29 ` Yu Kuai
2024-02-27 15:26 ` Christoph Hellwig
2024-02-27 21:54 ` Song Liu
2024-02-28 1:42 ` Yu Kuai
2024-02-26 10:29 ` [PATCH 07/16] md/raid10: " Christoph Hellwig
2024-02-26 10:29 ` [PATCH 08/16] md/raid5: " Christoph Hellwig
2024-02-26 10:29 ` [PATCH 09/16] block: remove disk_stack_limits Christoph Hellwig
2024-02-26 10:29 ` [PATCH 10/16] drbd: pass the max_hw_sectors limit to blk_alloc_disk Christoph Hellwig
2024-02-26 10:29 ` [PATCH 11/16] drbd: refactor drbd_reconsider_queue_parameters Christoph Hellwig
2024-02-26 10:30 ` [PATCH 12/16] drbd: refactor the backing dev max_segments calculation Christoph Hellwig
2024-02-26 10:30 ` [PATCH 13/16] drbd: merge drbd_setup_queue_param into drbd_reconsider_queue_parameters Christoph Hellwig
2024-02-26 10:30 ` [PATCH 14/16] drbd: don't set max_write_zeroes_sectors in decide_on_discard_support Christoph Hellwig
2024-02-26 10:30 ` [PATCH 15/16] drbd: split out a drbd_discard_supported helper Christoph Hellwig
2024-02-26 10:30 ` [PATCH 16/16] drbd: atomically update queue limits in drbd_reconsider_queue_parameters Christoph Hellwig