atomic queue limits updates

linux-block.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* atomic queue limits updates
@ 2024-01-22 17:36 Christoph Hellwig
  2024-01-22 17:36 ` [PATCH 01/15] block: move max_{open,active}_zones to struct queue_limits Christoph Hellwig
                   ` (14 more replies)
  0 siblings, 15 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-22 17:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

Hi Jens,

currently queue limits updates are a mess in that they are updated one
limit at a time, which makes both cross-checking them against other
limits hard, and also makes it hard to provide atomicy.

This series tries to change this by updating the whole set of queue
limits atomically.   This in done in two ways:

 - for the initial setup the queue_limits structure is simply passed to
   the queue/disk allocation helpers and applies there after validation.
 - for the (relatively few) cases that update limits at runtime a pair
   of helpers to take a snapshot of the current limits and to commit it
   after picking up the callers changes are provided.

As the series is big enough it only converts two drivers - virtio_blk as
a heavily used driver in virtualized setups, and loop as one that actually
does runtime updates while being fairly simple.  I plan to update most
drivers for this merge window, although SCSI will probably have to wait
for the next one given that it will need extensive API changes in the
LLDD and ULD interfaces.

Diffstat:
 arch/um/drivers/ubd_kern.c          |    2 
 block/blk-core.c                    |   29 ++-
 block/blk-mq.c                      |   26 +--
 block/blk-settings.c                |  172 +++++++++++++++++++-
 block/blk-sysfs.c                   |   59 +++----
 block/blk.h                         |    3 
 block/bsg-lib.c                     |    2 
 block/genhd.c                       |    4 
 drivers/block/amiflop.c             |    2 
 drivers/block/aoe/aoeblk.c          |    2 
 drivers/block/ataflop.c             |    2 
 drivers/block/floppy.c              |    2 
 drivers/block/loop.c                |   75 ++++-----
 drivers/block/mtip32xx/mtip32xx.c   |    2 
 drivers/block/nbd.c                 |    2 
 drivers/block/null_blk/main.c       |    2 
 drivers/block/ps3disk.c             |    2 
 drivers/block/rbd.c                 |    2 
 drivers/block/rnbd/rnbd-clt.c       |    2 
 drivers/block/sunvdc.c              |    2 
 drivers/block/swim.c                |    2 
 drivers/block/swim3.c               |    2 
 drivers/block/ublk_drv.c            |    2 
 drivers/block/virtio_blk.c          |  299 ++++++++++++++++++------------------
 drivers/block/xen-blkfront.c        |    2 
 drivers/block/z2ram.c               |    2 
 drivers/cdrom/gdrom.c               |    2 
 drivers/memstick/core/ms_block.c    |    2 
 drivers/memstick/core/mspro_block.c |    2 
 drivers/mmc/core/queue.c            |    2 
 drivers/mtd/mtd_blkdevs.c           |    2 
 drivers/mtd/ubi/block.c             |    2 
 drivers/nvme/host/apple.c           |    2 
 drivers/nvme/host/core.c            |   18 --
 drivers/s390/block/dasd_genhd.c     |    2 
 drivers/s390/block/scm_blk.c        |    2 
 drivers/scsi/scsi_scan.c            |    2 
 drivers/ufs/core/ufshcd.c           |    2 
 include/linux/blk-mq.h              |   10 -
 include/linux/blkdev.h              |   36 +++-
 40 files changed, 484 insertions(+), 305 deletions(-)

^ permalink raw reply	[flat|nested] 65+ messages in thread

* [PATCH 01/15] block: move max_{open,active}_zones to struct queue_limits
  2024-01-22 17:36 atomic queue limits updates Christoph Hellwig
@ 2024-01-22 17:36 ` Christoph Hellwig
  2024-01-23  4:39   ` Damien Le Moal
  2024-01-24  6:00   ` Hannes Reinecke
  2024-01-22 17:36 ` [PATCH 02/15] block: refactor disk_update_readahead Christoph Hellwig
                   ` (13 subsequent siblings)
  14 siblings, 2 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-22 17:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

The maximum number of open and active zones is a limit on the queue
and should be places there so that we can including it in the upcoming
queue limits batch update API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 include/linux/blkdev.h | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 99e4f5e722132c..4a2e82c7971c86 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -189,8 +189,6 @@ struct gendisk {
 	 * blk_mq_unfreeze_queue().
 	 */
 	unsigned int		nr_zones;
-	unsigned int		max_open_zones;
-	unsigned int		max_active_zones;
 	unsigned long		*conv_zones_bitmap;
 	unsigned long		*seq_zones_wlock;
 #endif /* CONFIG_BLK_DEV_ZONED */
@@ -307,6 +305,8 @@ struct queue_limits {
 	unsigned char		discard_misaligned;
 	unsigned char		raid_partial_stripes_expensive;
 	bool			zoned;
+	unsigned int		max_open_zones;
+	unsigned int		max_active_zones;
 
 	/*
 	 * Drivers that set dma_alignment to less than 511 must be prepared to
@@ -639,23 +639,23 @@ static inline bool disk_zone_is_seq(struct gendisk *disk, sector_t sector)
 static inline void disk_set_max_open_zones(struct gendisk *disk,
 		unsigned int max_open_zones)
 {
-	disk->max_open_zones = max_open_zones;
+	disk->queue->limits.max_open_zones = max_open_zones;
 }
 
 static inline void disk_set_max_active_zones(struct gendisk *disk,
 		unsigned int max_active_zones)
 {
-	disk->max_active_zones = max_active_zones;
+	disk->queue->limits.max_active_zones = max_active_zones;
 }
 
 static inline unsigned int bdev_max_open_zones(struct block_device *bdev)
 {
-	return bdev->bd_disk->max_open_zones;
+	return bdev->bd_disk->queue->limits.max_open_zones;
 }
 
 static inline unsigned int bdev_max_active_zones(struct block_device *bdev)
 {
-	return bdev->bd_disk->max_active_zones;
+	return bdev->bd_disk->queue->limits.max_active_zones;
 }
 
 #else /* CONFIG_BLK_DEV_ZONED */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 02/15] block: refactor disk_update_readahead
  2024-01-22 17:36 atomic queue limits updates Christoph Hellwig
  2024-01-22 17:36 ` [PATCH 01/15] block: move max_{open,active}_zones to struct queue_limits Christoph Hellwig
@ 2024-01-22 17:36 ` Christoph Hellwig
  2024-01-23  4:41   ` Damien Le Moal
  2024-01-24  6:01   ` Hannes Reinecke
  2024-01-22 17:36 ` [PATCH 03/15] block: add an API to atomically update queue limits Christoph Hellwig
                   ` (12 subsequent siblings)
  14 siblings, 2 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-22 17:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

Factor out a blk_apply_bdi_limits limits helper that can be used with
an explicit queue_limits argument, which will be useful later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-settings.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 06ea91e51b8b2e..e872b0e168525e 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -85,6 +85,17 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 }
 EXPORT_SYMBOL(blk_set_stacking_limits);
 
+static void blk_apply_bdi_limits(struct backing_dev_info *bdi,
+		struct queue_limits *lim)
+{
+	/*
+	 * For read-ahead of large files to be effective, we need to read ahead
+	 * at least twice the optimal I/O size.
+	 */
+	bdi->ra_pages = max(lim->io_opt * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
+	bdi->io_pages = lim->max_sectors >> (PAGE_SHIFT - 9);
+}
+
 /**
  * blk_queue_bounce_limit - set bounce buffer limit for queue
  * @q: the request queue for the device
@@ -393,15 +404,7 @@ EXPORT_SYMBOL(blk_queue_alignment_offset);
 
 void disk_update_readahead(struct gendisk *disk)
 {
-	struct request_queue *q = disk->queue;
-
-	/*
-	 * For read-ahead of large files to be effective, we need to read ahead
-	 * at least twice the optimal I/O size.
-	 */
-	disk->bdi->ra_pages =
-		max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
-	disk->bdi->io_pages = queue_max_sectors(q) >> (PAGE_SHIFT - 9);
+	blk_apply_bdi_limits(disk->bdi, &disk->queue->limits);
 }
 EXPORT_SYMBOL_GPL(disk_update_readahead);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 03/15] block: add an API to atomically update queue limits
  2024-01-22 17:36 atomic queue limits updates Christoph Hellwig
  2024-01-22 17:36 ` [PATCH 01/15] block: move max_{open,active}_zones to struct queue_limits Christoph Hellwig
  2024-01-22 17:36 ` [PATCH 02/15] block: refactor disk_update_readahead Christoph Hellwig
@ 2024-01-22 17:36 ` Christoph Hellwig
  2024-01-23  4:50   ` Damien Le Moal
                     ` (2 more replies)
  2024-01-22 17:36 ` [PATCH 04/15] block: use queue_limits_commit_update in queue_max_sectors_store Christoph Hellwig
                   ` (11 subsequent siblings)
  14 siblings, 3 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-22 17:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

Add a new queue_limits_{start,commit}_update pair of functions that
allows taking an atomic snapshot of queue limits, update it, and
commit it if it passes validity checking.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c       |   1 +
 block/blk-settings.c   | 140 +++++++++++++++++++++++++++++++++++++++++
 block/blk.h            |   1 +
 include/linux/blkdev.h |  23 +++++++
 4 files changed, 165 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 11342af420d0c4..09f4a44a4aa3cc 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -424,6 +424,7 @@ struct request_queue *blk_alloc_queue(int node_id)
 	mutex_init(&q->debugfs_mutex);
 	mutex_init(&q->sysfs_lock);
 	mutex_init(&q->sysfs_dir_lock);
+	mutex_init(&q->limits_lock);
 	mutex_init(&q->rq_qos_mutex);
 	spin_lock_init(&q->queue_lock);
 
diff --git a/block/blk-settings.c b/block/blk-settings.c
index e872b0e168525e..b6692143ccf034 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -96,6 +96,146 @@ static void blk_apply_bdi_limits(struct backing_dev_info *bdi,
 	bdi->io_pages = lim->max_sectors >> (PAGE_SHIFT - 9);
 }
 
+static int blk_validate_zoned_limits(struct queue_limits *lim)
+{
+	if (!lim->zoned) {
+		if (WARN_ON_ONCE(lim->max_open_zones) ||
+		    WARN_ON_ONCE(lim->max_active_zones) ||
+		    WARN_ON_ONCE(lim->zone_write_granularity) ||
+		    WARN_ON_ONCE(lim->max_zone_append_sectors))
+			return -EINVAL;
+		return 0;
+	}
+
+	if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_BLK_DEV_ZONED)))
+		return -EINVAL;
+
+	if (lim->zone_write_granularity < lim->logical_block_size)
+		lim->zone_write_granularity = lim->logical_block_size;
+
+	if (lim->max_zone_append_sectors) {
+		if (WARN_ON_ONCE(!lim->chunk_sectors))
+			return -EINVAL;
+		lim->max_zone_append_sectors =
+			min3(lim->max_hw_sectors,
+			     lim->max_zone_append_sectors,
+			     lim->chunk_sectors);
+	}
+
+	return 0;
+}
+
+int blk_validate_limits(struct queue_limits *lim)
+{
+	unsigned int max_hw_sectors;
+
+	if (!lim->logical_block_size)
+		lim->logical_block_size = SECTOR_SIZE;
+
+	if (!lim->physical_block_size)
+		lim->physical_block_size = SECTOR_SIZE;
+	if (lim->physical_block_size < lim->logical_block_size)
+		lim->physical_block_size = lim->physical_block_size;
+
+	if (!lim->io_min)
+		lim->io_min = SECTOR_SIZE;
+	if (lim->io_min < lim->physical_block_size)
+		lim->io_min = lim->physical_block_size;
+
+	if (!lim->max_hw_sectors)
+		lim->max_hw_sectors = BLK_SAFE_MAX_SECTORS;
+	if (WARN_ON_ONCE(lim->max_hw_sectors < PAGE_SIZE / SECTOR_SIZE))
+		return -EINVAL;
+
+	lim->max_hw_sectors = round_down(lim->max_hw_sectors,
+			lim->logical_block_size >> SECTOR_SHIFT);
+
+	max_hw_sectors = min_not_zero(lim->max_hw_sectors,
+				lim->max_dev_sectors);
+	if (lim->max_user_sectors) {
+		if (lim->max_user_sectors > max_hw_sectors ||
+		    lim->max_user_sectors < PAGE_SIZE / SECTOR_SIZE)
+			return -EINVAL;
+		lim->max_sectors = min(max_hw_sectors, lim->max_user_sectors);
+	} else {
+		lim->max_sectors = min(max_hw_sectors, BLK_DEF_MAX_SECTORS_CAP);
+	}
+
+	lim->max_sectors = round_down(lim->max_sectors,
+			lim->logical_block_size >> SECTOR_SHIFT);
+
+	if (!lim->max_segments)
+		lim->max_segments = BLK_MAX_SEGMENTS;
+
+	lim->max_discard_sectors = lim->max_hw_discard_sectors;
+	if (!lim->max_discard_segments)
+		lim->max_discard_segments = 1;
+
+	if (lim->discard_granularity < lim->physical_block_size)
+		lim->discard_granularity = lim->physical_block_size;
+
+	if (!lim->seg_boundary_mask)
+		lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
+	if (WARN_ON_ONCE(lim->seg_boundary_mask < PAGE_SIZE - 1))
+		return -EINVAL;
+
+	if (!lim->max_segment_size)
+		lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
+	if (WARN_ON_ONCE(lim->max_segment_size < PAGE_SIZE))
+		return -EINVAL;
+
+	/*
+	 * Devices that require a virtual boundary do not support scatter/gather
+	 * I/O natively, but instead require a descriptor list entry for each
+	 * page (which might not be idential to the Linux PAGE_SIZE).  Because
+	 * of that they are not limited by our notion of "segment size".
+	 */
+	if (lim->virt_boundary_mask) {
+		if (WARN_ON_ONCE(lim->max_segment_size &&
+				 lim->max_segment_size != UINT_MAX))
+			return -EINVAL;
+		lim->max_segment_size = UINT_MAX;
+	}
+
+	if (!lim->dma_alignment)
+		lim->dma_alignment = SECTOR_SIZE - 1;
+	if (WARN_ON_ONCE(lim->dma_alignment > PAGE_SIZE))
+		return -EINVAL;
+
+	if (lim->alignment_offset) {
+		lim->alignment_offset &= (lim->physical_block_size - 1);
+		lim->misaligned = 0;
+	}
+
+	return blk_validate_zoned_limits(lim);
+}
+
+/**
+ * queue_limits_commit_update - commit an atomic update of queue limits
+ * @q:		queue to update
+ * @lim:	limits to apply
+ *
+ * Apply the limits in @lim that were obtained from queue_limits_start_update()
+ * and updated by the caller to @q.
+ *
+ * Returns 0 if successful, else a negative error code.
+ */
+int queue_limits_commit_update(struct request_queue *q,
+		struct queue_limits *lim)
+	__releases(q->limits_lock)
+{
+	int error = blk_validate_limits(lim);
+
+	if (!error) {
+		q->limits = *lim;
+		if (q->disk)
+			blk_apply_bdi_limits(q->disk->bdi, lim);
+	}
+	mutex_unlock(&q->limits_lock);
+	return error;
+}
+EXPORT_SYMBOL_GPL(queue_limits_commit_update);
+
 /**
  * blk_queue_bounce_limit - set bounce buffer limit for queue
  * @q: the request queue for the device
diff --git a/block/blk.h b/block/blk.h
index 1ef920f72e0f87..58b5dbac2a487d 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -447,6 +447,7 @@ static inline void bio_release_page(struct bio *bio, struct page *page)
 		unpin_user_page(page);
 }
 
+int blk_validate_limits(struct queue_limits *lim);
 struct request_queue *blk_alloc_queue(int node_id);
 
 int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4a2e82c7971c86..5b5d3b238de1e7 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -473,6 +473,7 @@ struct request_queue {
 
 	struct mutex		sysfs_lock;
 	struct mutex		sysfs_dir_lock;
+	struct mutex		limits_lock;
 
 	/*
 	 * for reusing dead hctx instance in case of updating
@@ -861,6 +862,28 @@ static inline unsigned int blk_chunk_sectors_left(sector_t offset,
 	return chunk_sectors - (offset & (chunk_sectors - 1));
 }
 
+/**
+ * queue_limits_start_update - start an atomic update of queue limits
+ * @q:		queue to update
+ *
+ * This functions starts an atomic update of the queue limits.  It takes a lock
+ * to prevent other updates and returns a snapshot of the current limits that
+ * the caller can modify.  The caller must call queue_limits_commit_update()
+ * to finish the update.
+ *
+ * Context: process context.  The caller must have frozen the queue or ensured
+ * that there is outstanding I/O by other means.
+ */
+static inline struct queue_limits
+queue_limits_start_update(struct request_queue *q)
+	__acquires(q->limits_lock)
+{
+	mutex_lock(&q->limits_lock);
+	return q->limits;
+}
+int queue_limits_commit_update(struct request_queue *q,
+		struct queue_limits *lim);
+
 /*
  * Access functions for manipulating queue properties
  */
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 04/15] block: use queue_limits_commit_update in queue_max_sectors_store
  2024-01-22 17:36 atomic queue limits updates Christoph Hellwig
                   ` (2 preceding siblings ...)
  2024-01-22 17:36 ` [PATCH 03/15] block: add an API to atomically update queue limits Christoph Hellwig
@ 2024-01-22 17:36 ` Christoph Hellwig
  2024-01-23  5:07   ` Damien Le Moal
  2024-01-24  6:09   ` Hannes Reinecke
  2024-01-22 17:36 ` [PATCH 05/15] block: add a max_user_discard_sectors queue limit Christoph Hellwig
                   ` (10 subsequent siblings)
  14 siblings, 2 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-22 17:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

Convert queue_max_sectors_store to use queue_limits_commit_update to
check and updated the max_sectors limit and freeze the queue before
doing so to ensure we don't have request in flight while changing
the limits.

Note that this removes the previously held queue_lock that doesn't
protect against any other read or writer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-sysfs.c | 37 ++++++++++++-------------------------
 1 file changed, 12 insertions(+), 25 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 6b2429cad81af1..26607f9825cb05 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -226,35 +226,22 @@ static ssize_t queue_zone_append_max_show(struct request_queue *q, char *page)
 static ssize_t
 queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
 {
-	unsigned long var;
-	unsigned int max_sectors_kb,
-		max_hw_sectors_kb = queue_max_hw_sectors(q) >> 1,
-			page_kb = 1 << (PAGE_SHIFT - 10);
-	ssize_t ret = queue_var_store(&var, page, count);
+	unsigned long max_sectors_kb;
+	struct queue_limits lim;
+	ssize_t ret;
+	int err;
 
+	ret = queue_var_store(&max_sectors_kb, page, count);
 	if (ret < 0)
 		return ret;
 
-	max_sectors_kb = (unsigned int)var;
-	max_hw_sectors_kb = min_not_zero(max_hw_sectors_kb,
-					 q->limits.max_dev_sectors >> 1);
-	if (max_sectors_kb == 0) {
-		q->limits.max_user_sectors = 0;
-		max_sectors_kb = min(max_hw_sectors_kb,
-				     BLK_DEF_MAX_SECTORS_CAP >> 1);
-	} else {
-		if (max_sectors_kb > max_hw_sectors_kb ||
-		    max_sectors_kb < page_kb)
-			return -EINVAL;
-		q->limits.max_user_sectors = max_sectors_kb << 1;
-	}
-
-	spin_lock_irq(&q->queue_lock);
-	q->limits.max_sectors = max_sectors_kb << 1;
-	if (q->disk)
-		q->disk->bdi->io_pages = max_sectors_kb >> (PAGE_SHIFT - 10);
-	spin_unlock_irq(&q->queue_lock);
-
+	blk_mq_freeze_queue(q);
+	lim = queue_limits_start_update(q);
+	lim.max_user_sectors = max_sectors_kb << 1;
+	err = queue_limits_commit_update(q, &lim);
+	blk_mq_unfreeze_queue(q);
+	if (err)
+		return err;
 	return ret;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 05/15] block: add a max_user_discard_sectors queue limit
  2024-01-22 17:36 atomic queue limits updates Christoph Hellwig
                   ` (3 preceding siblings ...)
  2024-01-22 17:36 ` [PATCH 04/15] block: use queue_limits_commit_update in queue_max_sectors_store Christoph Hellwig
@ 2024-01-22 17:36 ` Christoph Hellwig
  2024-01-22 18:27   ` Keith Busch
  2024-01-24  6:10   ` Hannes Reinecke
  2024-01-22 17:36 ` [PATCH 06/15] nvme: remove the hack to not update the discard limits in nvme_config_discard Christoph Hellwig
                   ` (9 subsequent siblings)
  14 siblings, 2 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-22 17:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

Add a new max_user_discard_sectors limit that mirrors max_user_sectors
and stores the value that the user manually set.  This now allows
updates of the max_hw_discard_sectors to not worry about the user
limit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-settings.c   | 13 ++++++++++---
 block/blk-sysfs.c      | 18 +++++++++---------
 include/linux/blkdev.h |  1 +
 3 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index b6692143ccf034..f58da0810a00f2 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -47,6 +47,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->max_zone_append_sectors = 0;
 	lim->max_discard_sectors = 0;
 	lim->max_hw_discard_sectors = 0;
+	lim->max_user_discard_sectors = 0;
 	lim->max_secure_erase_sectors = 0;
 	lim->discard_granularity = 512;
 	lim->discard_alignment = 0;
@@ -167,7 +168,10 @@ int blk_validate_limits(struct queue_limits *lim)
 	if (!lim->max_segments)
 		lim->max_segments = BLK_MAX_SEGMENTS;
 
-	lim->max_discard_sectors = lim->max_hw_discard_sectors;
+	lim->max_discard_sectors =
+		min_not_zero(lim->max_hw_discard_sectors,
+			lim->max_user_discard_sectors);
+
 	if (!lim->max_discard_segments)
 		lim->max_discard_segments = 1;
 
@@ -328,8 +332,11 @@ EXPORT_SYMBOL(blk_queue_chunk_sectors);
 void blk_queue_max_discard_sectors(struct request_queue *q,
 		unsigned int max_discard_sectors)
 {
-	q->limits.max_hw_discard_sectors = max_discard_sectors;
-	q->limits.max_discard_sectors = max_discard_sectors;
+	struct queue_limits *lim = &q->limits;
+
+	lim->max_hw_discard_sectors = max_discard_sectors;
+	lim->max_discard_sectors = min_not_zero(max_discard_sectors,
+			lim->max_user_discard_sectors);
 }
 EXPORT_SYMBOL(blk_queue_max_discard_sectors);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 26607f9825cb05..54e10604ddb1dd 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -174,23 +174,23 @@ static ssize_t queue_discard_max_show(struct request_queue *q, char *page)
 static ssize_t queue_discard_max_store(struct request_queue *q,
 				       const char *page, size_t count)
 {
-	unsigned long max_discard;
-	ssize_t ret = queue_var_store(&max_discard, page, count);
+	unsigned long max_discard_bytes;
+	ssize_t ret;
 
+	ret = queue_var_store(&max_discard_bytes, page, count);
 	if (ret < 0)
 		return ret;
 
-	if (max_discard & (q->limits.discard_granularity - 1))
+	if (max_discard_bytes & (q->limits.discard_granularity - 1))
 		return -EINVAL;
 
-	max_discard >>= 9;
-	if (max_discard > UINT_MAX)
+	if ((max_discard_bytes >> SECTOR_SHIFT) > UINT_MAX)
 		return -EINVAL;
 
-	if (max_discard > q->limits.max_hw_discard_sectors)
-		max_discard = q->limits.max_hw_discard_sectors;
-
-	q->limits.max_discard_sectors = max_discard;
+	q->limits.max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
+	q->limits.max_discard_sectors =
+		min_not_zero(q->limits.max_hw_discard_sectors,
+			     q->limits.max_user_discard_sectors);
 	return ret;
 }
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5b5d3b238de1e7..700ec5055b668d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -290,6 +290,7 @@ struct queue_limits {
 	unsigned int		io_opt;
 	unsigned int		max_discard_sectors;
 	unsigned int		max_hw_discard_sectors;
+	unsigned int		max_user_discard_sectors;
 	unsigned int		max_secure_erase_sectors;
 	unsigned int		max_write_zeroes_sectors;
 	unsigned int		max_zone_append_sectors;
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 06/15] nvme: remove the hack to not update the discard limits in nvme_config_discard
  2024-01-22 17:36 atomic queue limits updates Christoph Hellwig
                   ` (4 preceding siblings ...)
  2024-01-22 17:36 ` [PATCH 05/15] block: add a max_user_discard_sectors queue limit Christoph Hellwig
@ 2024-01-22 17:36 ` Christoph Hellwig
  2024-01-23  5:12   ` Damien Le Moal
  2024-01-24  6:11   ` Hannes Reinecke
  2024-01-22 17:36 ` [PATCH 07/15] block: use queue_limits_commit_update in queue_discard_max_store Christoph Hellwig
                   ` (8 subsequent siblings)
  14 siblings, 2 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-22 17:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

Now that the block layer tracks a separate user max discard limit, there
is no need to prevent nvme from updating it on controller capability
changes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/nvme/host/core.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 85ab0fcf9e8864..ef70268dccbc5a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1754,16 +1754,6 @@ static void nvme_config_discard(struct nvme_ctrl *ctrl, struct gendisk *disk,
 	BUILD_BUG_ON(PAGE_SIZE / sizeof(struct nvme_dsm_range) <
 			NVME_DSM_MAX_RANGES);
 
-	/*
-	 * If discard is already enabled, don't reset queue limits.
-	 *
-	 * This works around the fact that the block layer can't cope well with
-	 * updating the hardware limits when overridden through sysfs.  This is
-	 * harmless because discard limits in NVMe are purely advisory.
-	 */
-	if (queue->limits.max_discard_sectors)
-		return;
-
 	blk_queue_max_discard_sectors(queue, max_discard_sectors);
 	if (ctrl->dmrl)
 		blk_queue_max_discard_segments(queue, ctrl->dmrl);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 07/15] block: use queue_limits_commit_update in queue_discard_max_store
  2024-01-22 17:36 atomic queue limits updates Christoph Hellwig
                   ` (5 preceding siblings ...)
  2024-01-22 17:36 ` [PATCH 06/15] nvme: remove the hack to not update the discard limits in nvme_config_discard Christoph Hellwig
@ 2024-01-22 17:36 ` Christoph Hellwig
  2024-01-23  5:16   ` Damien Le Moal
  2024-01-24  6:12   ` Hannes Reinecke
  2024-01-22 17:36 ` [PATCH 08/15] block: pass a queue_limits argument to blk_alloc_queue Christoph Hellwig
                   ` (7 subsequent siblings)
  14 siblings, 2 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-22 17:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

Convert queue_discard_max_store to use queue_limits_commit_update to
check and updated the max_sectors limit and freeze the queue before
doing so to ensure we don't have request in flight while changing
the limits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-sysfs.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 54e10604ddb1dd..8c8f69d8ba48ee 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -175,7 +175,9 @@ static ssize_t queue_discard_max_store(struct request_queue *q,
 				       const char *page, size_t count)
 {
 	unsigned long max_discard_bytes;
+	struct queue_limits lim;
 	ssize_t ret;
+	int err;
 
 	ret = queue_var_store(&max_discard_bytes, page, count);
 	if (ret < 0)
@@ -187,10 +189,14 @@ static ssize_t queue_discard_max_store(struct request_queue *q,
 	if ((max_discard_bytes >> SECTOR_SHIFT) > UINT_MAX)
 		return -EINVAL;
 
-	q->limits.max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
-	q->limits.max_discard_sectors =
-		min_not_zero(q->limits.max_hw_discard_sectors,
-			     q->limits.max_user_discard_sectors);
+	blk_mq_freeze_queue(q);
+	lim = queue_limits_start_update(q);
+	lim.max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
+	err = queue_limits_commit_update(q, &lim);
+	blk_mq_unfreeze_queue(q);
+
+	if (err)
+		return err;
 	return ret;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 08/15] block: pass a queue_limits argument to blk_alloc_queue
  2024-01-22 17:36 atomic queue limits updates Christoph Hellwig
                   ` (6 preceding siblings ...)
  2024-01-22 17:36 ` [PATCH 07/15] block: use queue_limits_commit_update in queue_discard_max_store Christoph Hellwig
@ 2024-01-22 17:36 ` Christoph Hellwig
  2024-01-23  5:17   ` Damien Le Moal
                     ` (2 more replies)
  2024-01-22 17:36 ` [PATCH 09/15] block: pass a queue_limits argument to blk_mq_init_queue Christoph Hellwig
                   ` (6 subsequent siblings)
  14 siblings, 3 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-22 17:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

Pass a queue_limits to blk_alloc_queue and apply it if non-NULL.  This
will allow allocating queues with valid queue limits instead of setting
the values one at a time later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-core.c | 28 +++++++++++++++++++++-------
 block/blk-mq.c   |  6 +++---
 block/blk.h      |  2 +-
 block/genhd.c    |  4 ++--
 4 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 09f4a44a4aa3cc..9f1af8fba4dcd2 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -393,9 +393,10 @@ static void blk_timeout_work(struct work_struct *work)
 {
 }
 
-struct request_queue *blk_alloc_queue(int node_id)
+struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id)
 {
 	struct request_queue *q;
+	int error;
 
 	q = kmem_cache_alloc_node(blk_requestq_cachep, GFP_KERNEL | __GFP_ZERO,
 				  node_id);
@@ -404,13 +405,26 @@ struct request_queue *blk_alloc_queue(int node_id)
 
 	q->last_merge = NULL;
 
+	if (lim) {
+		error = blk_validate_limits(lim);
+		if (error)
+			goto fail_q;
+		q->limits = *lim;
+	} else {
+		blk_set_default_limits(&q->limits);
+	}
+
 	q->id = ida_alloc(&blk_queue_ida, GFP_KERNEL);
-	if (q->id < 0)
+	if (q->id < 0) {
+		error = q->id;
 		goto fail_q;
+	}
 
 	q->stats = blk_alloc_queue_stats();
-	if (!q->stats)
+	if (!q->stats) {
+		error = -ENOMEM;
 		goto fail_id;
+	}
 
 	q->node = node_id;
 
@@ -435,12 +449,12 @@ struct request_queue *blk_alloc_queue(int node_id)
 	 * Init percpu_ref in atomic mode so that it's faster to shutdown.
 	 * See blk_register_queue() for details.
 	 */
-	if (percpu_ref_init(&q->q_usage_counter,
+	error = percpu_ref_init(&q->q_usage_counter,
 				blk_queue_usage_counter_release,
-				PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
+				PERCPU_REF_INIT_ATOMIC, GFP_KERNEL);
+	if (error)
 		goto fail_stats;
 
-	blk_set_default_limits(&q->limits);
 	q->nr_requests = BLKDEV_DEFAULT_RQ;
 
 	return q;
@@ -451,7 +465,7 @@ struct request_queue *blk_alloc_queue(int node_id)
 	ida_free(&blk_queue_ida, q->id);
 fail_q:
 	kmem_cache_free(blk_requestq_cachep, q);
-	return NULL;
+	return ERR_PTR(error);
 }
 
 /**
diff --git a/block/blk-mq.c b/block/blk-mq.c
index aa87fcfda1ecfc..2ddbefdeae93e4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4092,9 +4092,9 @@ static struct request_queue *blk_mq_init_queue_data(struct blk_mq_tag_set *set,
 	struct request_queue *q;
 	int ret;
 
-	q = blk_alloc_queue(set->numa_node);
-	if (!q)
-		return ERR_PTR(-ENOMEM);
+	q = blk_alloc_queue(NULL, set->numa_node);
+	if (IS_ERR(q))
+		return q;
 	q->queuedata = queuedata;
 	ret = blk_mq_init_allocated_queue(set, q);
 	if (ret) {
diff --git a/block/blk.h b/block/blk.h
index 58b5dbac2a487d..100c7a02854bfd 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -448,7 +448,7 @@ static inline void bio_release_page(struct bio *bio, struct page *page)
 }
 
 int blk_validate_limits(struct queue_limits *lim);
-struct request_queue *blk_alloc_queue(int node_id);
+struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id);
 
 int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode);
 
diff --git a/block/genhd.c b/block/genhd.c
index d74fb5b4ae6818..defcd35b421bdd 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -1396,8 +1396,8 @@ struct gendisk *__blk_alloc_disk(int node, struct lock_class_key *lkclass)
 	struct request_queue *q;
 	struct gendisk *disk;
 
-	q = blk_alloc_queue(node);
-	if (!q)
+	q = blk_alloc_queue(NULL, node);
+	if (IS_ERR(q))
 		return NULL;
 
 	disk = __alloc_disk_node(q, node, lkclass);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 09/15] block: pass a queue_limits argument to blk_mq_init_queue
  2024-01-22 17:36 atomic queue limits updates Christoph Hellwig
                   ` (7 preceding siblings ...)
  2024-01-22 17:36 ` [PATCH 08/15] block: pass a queue_limits argument to blk_alloc_queue Christoph Hellwig
@ 2024-01-22 17:36 ` Christoph Hellwig
  2024-01-23  5:19   ` Damien Le Moal
  2024-01-24  6:16   ` Hannes Reinecke
  2024-01-22 17:36 ` [PATCH 10/15] block: pass a queue_limits argument to blk_mq_alloc_disk Christoph Hellwig
                   ` (5 subsequent siblings)
  14 siblings, 2 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-22 17:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

Pass a queue_limits to blk_mq_init_queue and apply it if non-NULL.  This
will allow allocating queues with valid queue limits instead of setting
the values one at a time later.

Also rename the function to blk_mq_alloc_queue as that is a much better
name for a function that allocates a queue and always pass the queuedata
argument instead of having a separate version for the extra argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 block/blk-mq.c            | 19 +++++++------------
 block/bsg-lib.c           |  2 +-
 drivers/nvme/host/apple.c |  2 +-
 drivers/nvme/host/core.c  |  6 +++---
 drivers/scsi/scsi_scan.c  |  2 +-
 drivers/ufs/core/ufshcd.c |  2 +-
 include/linux/blk-mq.h    |  3 ++-
 7 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2ddbefdeae93e4..87fbf4a6d1dbed 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4086,13 +4086,13 @@ void blk_mq_release(struct request_queue *q)
 	blk_mq_sysfs_deinit(q);
 }
 
-static struct request_queue *blk_mq_init_queue_data(struct blk_mq_tag_set *set,
-		void *queuedata)
+struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set,
+		struct queue_limits *lim, void *queuedata)
 {
 	struct request_queue *q;
 	int ret;
 
-	q = blk_alloc_queue(NULL, set->numa_node);
+	q = blk_alloc_queue(lim, set->numa_node);
 	if (IS_ERR(q))
 		return q;
 	q->queuedata = queuedata;
@@ -4103,20 +4103,15 @@ static struct request_queue *blk_mq_init_queue_data(struct blk_mq_tag_set *set,
 	}
 	return q;
 }
-
-struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *set)
-{
-	return blk_mq_init_queue_data(set, NULL);
-}
-EXPORT_SYMBOL(blk_mq_init_queue);
+EXPORT_SYMBOL(blk_mq_alloc_queue);
 
 /**
  * blk_mq_destroy_queue - shutdown a request queue
  * @q: request queue to shutdown
  *
- * This shuts down a request queue allocated by blk_mq_init_queue(). All future
+ * This shuts down a request queue allocated by blk_mq_alloc_queue(). All future
  * requests will be failed with -ENODEV. The caller is responsible for dropping
- * the reference from blk_mq_init_queue() by calling blk_put_queue().
+ * the reference from blk_mq_alloc_queue() by calling blk_put_queue().
  *
  * Context: can sleep
  */
@@ -4143,7 +4138,7 @@ struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata,
 	struct request_queue *q;
 	struct gendisk *disk;
 
-	q = blk_mq_init_queue_data(set, queuedata);
+	q = blk_mq_alloc_queue(set, NULL, queuedata);
 	if (IS_ERR(q))
 		return ERR_CAST(q);
 
diff --git a/block/bsg-lib.c b/block/bsg-lib.c
index b3acdbdb6e7ea8..bcc7dee6abced6 100644
--- a/block/bsg-lib.c
+++ b/block/bsg-lib.c
@@ -383,7 +383,7 @@ struct request_queue *bsg_setup_queue(struct device *dev, const char *name,
 	if (blk_mq_alloc_tag_set(set))
 		goto out_tag_set;
 
-	q = blk_mq_init_queue(set);
+	q = blk_mq_alloc_queue(set, NULL, NULL);
 	if (IS_ERR(q)) {
 		ret = PTR_ERR(q);
 		goto out_queue;
diff --git a/drivers/nvme/host/apple.c b/drivers/nvme/host/apple.c
index 596bb11eeba5a9..02d01614c729dd 100644
--- a/drivers/nvme/host/apple.c
+++ b/drivers/nvme/host/apple.c
@@ -1515,7 +1515,7 @@ static int apple_nvme_probe(struct platform_device *pdev)
 		goto put_dev;
 	}
 
-	anv->ctrl.admin_q = blk_mq_init_queue(&anv->admin_tagset);
+	anv->ctrl.admin_q = blk_mq_alloc_queue(&anv->admin_tagset, NULL, NULL);
 	if (IS_ERR(anv->ctrl.admin_q)) {
 		ret = -ENOMEM;
 		goto put_dev;
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ef70268dccbc5a..9ffff9e01a8336 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4303,14 +4303,14 @@ int nvme_alloc_admin_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
 	if (ret)
 		return ret;
 
-	ctrl->admin_q = blk_mq_init_queue(set);
+	ctrl->admin_q = blk_mq_alloc_queue(set, NULL, NULL);
 	if (IS_ERR(ctrl->admin_q)) {
 		ret = PTR_ERR(ctrl->admin_q);
 		goto out_free_tagset;
 	}
 
 	if (ctrl->ops->flags & NVME_F_FABRICS) {
-		ctrl->fabrics_q = blk_mq_init_queue(set);
+		ctrl->fabrics_q = blk_mq_alloc_queue(set, NULL, NULL);
 		if (IS_ERR(ctrl->fabrics_q)) {
 			ret = PTR_ERR(ctrl->fabrics_q);
 			goto out_cleanup_admin_q;
@@ -4374,7 +4374,7 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
 		return ret;
 
 	if (ctrl->ops->flags & NVME_F_FABRICS) {
-		ctrl->connect_q = blk_mq_init_queue(set);
+		ctrl->connect_q = blk_mq_alloc_queue(set, NULL, NULL);
         	if (IS_ERR(ctrl->connect_q)) {
 			ret = PTR_ERR(ctrl->connect_q);
 			goto out_free_tag_set;
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 44680f65ea1455..9969f4e2f1c3d9 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -332,7 +332,7 @@ static struct scsi_device *scsi_alloc_sdev(struct scsi_target *starget,
 
 	sdev->sg_reserved_size = INT_MAX;
 
-	q = blk_mq_init_queue(&sdev->host->tag_set);
+	q = blk_mq_alloc_queue(&sdev->host->tag_set, NULL, NULL);
 	if (IS_ERR(q)) {
 		/* release fn is set up in scsi_sysfs_device_initialise, so
 		 * have to free and put manually here */
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 029d017fc1b66b..c502a86db16b30 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -10592,7 +10592,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 	err = blk_mq_alloc_tag_set(&hba->tmf_tag_set);
 	if (err < 0)
 		goto out_remove_scsi_host;
-	hba->tmf_queue = blk_mq_init_queue(&hba->tmf_tag_set);
+	hba->tmf_queue = blk_mq_alloc_queue(&hba->tmf_tag_set, NULL, NULL);
 	if (IS_ERR(hba->tmf_queue)) {
 		err = PTR_ERR(hba->tmf_queue);
 		goto free_tmf_tag_set;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 7a8150a5f05133..7d42c359e2ab28 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -692,7 +692,8 @@ struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata,
 })
 struct gendisk *blk_mq_alloc_disk_for_queue(struct request_queue *q,
 		struct lock_class_key *lkclass);
-struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *);
+struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set,
+		struct queue_limits *lim, void *queuedata);
 int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
 		struct request_queue *q);
 void blk_mq_destroy_queue(struct request_queue *);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 10/15] block: pass a queue_limits argument to blk_mq_alloc_disk
  2024-01-22 17:36 atomic queue limits updates Christoph Hellwig
                   ` (8 preceding siblings ...)
  2024-01-22 17:36 ` [PATCH 09/15] block: pass a queue_limits argument to blk_mq_init_queue Christoph Hellwig
@ 2024-01-22 17:36 ` Christoph Hellwig
  2024-01-23  5:22   ` Damien Le Moal
  2024-01-24  6:17   ` Hannes Reinecke
  2024-01-22 17:36 ` [PATCH 11/15] virtio_blk: split virtblk_probe Christoph Hellwig
                   ` (4 subsequent siblings)
  14 siblings, 2 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-22 17:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

Pass a queue_limits to blk_mq_alloc_disk and apply it if non-NULL.  This
will allow allocating queues with valid queue limits instead of setting
the values one at a time later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 arch/um/drivers/ubd_kern.c          | 2 +-
 block/blk-mq.c                      | 5 +++--
 drivers/block/amiflop.c             | 2 +-
 drivers/block/aoe/aoeblk.c          | 2 +-
 drivers/block/ataflop.c             | 2 +-
 drivers/block/floppy.c              | 2 +-
 drivers/block/loop.c                | 2 +-
 drivers/block/mtip32xx/mtip32xx.c   | 2 +-
 drivers/block/nbd.c                 | 2 +-
 drivers/block/null_blk/main.c       | 2 +-
 drivers/block/ps3disk.c             | 2 +-
 drivers/block/rbd.c                 | 2 +-
 drivers/block/rnbd/rnbd-clt.c       | 2 +-
 drivers/block/sunvdc.c              | 2 +-
 drivers/block/swim.c                | 2 +-
 drivers/block/swim3.c               | 2 +-
 drivers/block/ublk_drv.c            | 2 +-
 drivers/block/virtio_blk.c          | 2 +-
 drivers/block/xen-blkfront.c        | 2 +-
 drivers/block/z2ram.c               | 2 +-
 drivers/cdrom/gdrom.c               | 2 +-
 drivers/memstick/core/ms_block.c    | 2 +-
 drivers/memstick/core/mspro_block.c | 2 +-
 drivers/mmc/core/queue.c            | 2 +-
 drivers/mtd/mtd_blkdevs.c           | 2 +-
 drivers/mtd/ubi/block.c             | 2 +-
 drivers/nvme/host/core.c            | 2 +-
 drivers/s390/block/dasd_genhd.c     | 2 +-
 drivers/s390/block/scm_blk.c        | 2 +-
 include/linux/blk-mq.h              | 7 ++++---
 30 files changed, 35 insertions(+), 33 deletions(-)

diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index 92ee2697ff3984..25f1b18ce7d4e9 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -906,7 +906,7 @@ static int ubd_add(int n, char **error_out)
 	if (err)
 		goto out;
 
-	disk = blk_mq_alloc_disk(&ubd_dev->tag_set, ubd_dev);
+	disk = blk_mq_alloc_disk(&ubd_dev->tag_set, NULL, ubd_dev);
 	if (IS_ERR(disk)) {
 		err = PTR_ERR(disk);
 		goto out_cleanup_tags;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 87fbf4a6d1dbed..b103187e51cf77 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4132,13 +4132,14 @@ void blk_mq_destroy_queue(struct request_queue *q)
 }
 EXPORT_SYMBOL(blk_mq_destroy_queue);
 
-struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata,
+struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set,
+		struct queue_limits *lim, void *queuedata,
 		struct lock_class_key *lkclass)
 {
 	struct request_queue *q;
 	struct gendisk *disk;
 
-	q = blk_mq_alloc_queue(set, NULL, queuedata);
+	q = blk_mq_alloc_queue(set, lim, queuedata);
 	if (IS_ERR(q))
 		return ERR_CAST(q);
 
diff --git a/drivers/block/amiflop.c b/drivers/block/amiflop.c
index 2b98114a9fe092..a25414228e4741 100644
--- a/drivers/block/amiflop.c
+++ b/drivers/block/amiflop.c
@@ -1779,7 +1779,7 @@ static int fd_alloc_disk(int drive, int system)
 	struct gendisk *disk;
 	int err;
 
-	disk = blk_mq_alloc_disk(&unit[drive].tag_set, NULL);
+	disk = blk_mq_alloc_disk(&unit[drive].tag_set, NULL, NULL);
 	if (IS_ERR(disk))
 		return PTR_ERR(disk);
 
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index d2dbf8aaccb5b1..e346a013cc1662 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -370,7 +370,7 @@ aoeblk_gdalloc(void *vp)
 		goto err_mempool;
 	}
 
-	gd = blk_mq_alloc_disk(set, d);
+	gd = blk_mq_alloc_disk(set, NULL, d);
 	if (IS_ERR(gd)) {
 		pr_err("aoe: cannot allocate block queue for %ld.%d\n",
 			d->aoemajor, d->aoeminor);
diff --git a/drivers/block/ataflop.c b/drivers/block/ataflop.c
index 50949207798d2a..cacc4ba942a814 100644
--- a/drivers/block/ataflop.c
+++ b/drivers/block/ataflop.c
@@ -1994,7 +1994,7 @@ static int ataflop_alloc_disk(unsigned int drive, unsigned int type)
 {
 	struct gendisk *disk;
 
-	disk = blk_mq_alloc_disk(&unit[drive].tag_set, NULL);
+	disk = blk_mq_alloc_disk(&unit[drive].tag_set, NULL, NULL);
 	if (IS_ERR(disk))
 		return PTR_ERR(disk);
 
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index d0e41d52d6a9b5..6f765d221b3814 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -4515,7 +4515,7 @@ static int floppy_alloc_disk(unsigned int drive, unsigned int type)
 {
 	struct gendisk *disk;
 
-	disk = blk_mq_alloc_disk(&tag_sets[drive], NULL);
+	disk = blk_mq_alloc_disk(&tag_sets[drive], NULL, NULL);
 	if (IS_ERR(disk))
 		return PTR_ERR(disk);
 
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index f8145499da38c8..3f855cc79c29f5 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -2025,7 +2025,7 @@ static int loop_add(int i)
 	if (err)
 		goto out_free_idr;
 
-	disk = lo->lo_disk = blk_mq_alloc_disk(&lo->tag_set, lo);
+	disk = lo->lo_disk = blk_mq_alloc_disk(&lo->tag_set, NULL, lo);
 	if (IS_ERR(disk)) {
 		err = PTR_ERR(disk);
 		goto out_cleanup_tags;
diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index b200950e8fb5f9..ac08dea73552f4 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3431,7 +3431,7 @@ static int mtip_block_initialize(struct driver_data *dd)
 		goto block_queue_alloc_tag_error;
 	}
 
-	dd->disk = blk_mq_alloc_disk(&dd->tags, dd);
+	dd->disk = blk_mq_alloc_disk(&dd->tags, NULL, dd);
 	if (IS_ERR(dd->disk)) {
 		dev_err(&dd->pdev->dev,
 			"Unable to allocate request queue\n");
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 33a8f37bb6a1f5..30ae3cc12e7787 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1823,7 +1823,7 @@ static struct nbd_device *nbd_dev_add(int index, unsigned int refs)
 	if (err < 0)
 		goto out_free_tags;
 
-	disk = blk_mq_alloc_disk(&nbd->tag_set, NULL);
+	disk = blk_mq_alloc_disk(&nbd->tag_set, NULL, NULL);
 	if (IS_ERR(disk)) {
 		err = PTR_ERR(disk);
 		goto out_free_idr;
diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 36755f263e8ec0..499e0d03bffc0f 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -2136,7 +2136,7 @@ static int null_add_dev(struct nullb_device *dev)
 			goto out_cleanup_queues;
 
 		nullb->tag_set->timeout = 5 * HZ;
-		nullb->disk = blk_mq_alloc_disk(nullb->tag_set, nullb);
+		nullb->disk = blk_mq_alloc_disk(nullb->tag_set, NULL, nullb);
 		if (IS_ERR(nullb->disk)) {
 			rv = PTR_ERR(nullb->disk);
 			goto out_cleanup_tags;
diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
index 36d7b36c60c76b..dfd3860df4f880 100644
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -431,7 +431,7 @@ static int ps3disk_probe(struct ps3_system_bus_device *_dev)
 	if (error)
 		goto fail_teardown;
 
-	gendisk = blk_mq_alloc_disk(&priv->tag_set, dev);
+	gendisk = blk_mq_alloc_disk(&priv->tag_set, NULL, dev);
 	if (IS_ERR(gendisk)) {
 		dev_err(&dev->sbd.core, "%s:%u: blk_mq_alloc_disk failed\n",
 			__func__, __LINE__);
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index a999b698b131f7..46c0a5cc563dd3 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -4960,7 +4960,7 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
 	if (err)
 		return err;
 
-	disk = blk_mq_alloc_disk(&rbd_dev->tag_set, rbd_dev);
+	disk = blk_mq_alloc_disk(&rbd_dev->tag_set, NULL, rbd_dev);
 	if (IS_ERR(disk)) {
 		err = PTR_ERR(disk);
 		goto out_tag_set;
diff --git a/drivers/block/rnbd/rnbd-clt.c b/drivers/block/rnbd/rnbd-clt.c
index 4044c369d22a5f..d51be4f2df61a3 100644
--- a/drivers/block/rnbd/rnbd-clt.c
+++ b/drivers/block/rnbd/rnbd-clt.c
@@ -1408,7 +1408,7 @@ static int rnbd_client_setup_device(struct rnbd_clt_dev *dev,
 	dev->size = le64_to_cpu(rsp->nsectors) *
 			le16_to_cpu(rsp->logical_block_size);
 
-	dev->gd = blk_mq_alloc_disk(&dev->sess->tag_set, dev);
+	dev->gd = blk_mq_alloc_disk(&dev->sess->tag_set, NULL, dev);
 	if (IS_ERR(dev->gd))
 		return PTR_ERR(dev->gd);
 	dev->queue = dev->gd->queue;
diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
index 7bf4b48e2282e7..a1f74dd1eae5d5 100644
--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -824,7 +824,7 @@ static int probe_disk(struct vdc_port *port)
 	if (err)
 		return err;
 
-	g = blk_mq_alloc_disk(&port->tag_set, port);
+	g = blk_mq_alloc_disk(&port->tag_set, NULL, port);
 	if (IS_ERR(g)) {
 		printk(KERN_ERR PFX "%s: Could not allocate gendisk.\n",
 		       port->vio.name);
diff --git a/drivers/block/swim.c b/drivers/block/swim.c
index f85b6af414b431..16bdf62067d8b1 100644
--- a/drivers/block/swim.c
+++ b/drivers/block/swim.c
@@ -820,7 +820,7 @@ static int swim_floppy_init(struct swim_priv *swd)
 			goto exit_put_disks;
 
 		swd->unit[drive].disk =
-			blk_mq_alloc_disk(&swd->unit[drive].tag_set,
+			blk_mq_alloc_disk(&swd->unit[drive].tag_set, NULL,
 					  &swd->unit[drive]);
 		if (IS_ERR(swd->unit[drive].disk)) {
 			blk_mq_free_tag_set(&swd->unit[drive].tag_set);
diff --git a/drivers/block/swim3.c b/drivers/block/swim3.c
index c2bc85826358e9..a04756ac778ee8 100644
--- a/drivers/block/swim3.c
+++ b/drivers/block/swim3.c
@@ -1210,7 +1210,7 @@ static int swim3_attach(struct macio_dev *mdev,
 	if (rc)
 		goto out_unregister;
 
-	disk = blk_mq_alloc_disk(&fs->tag_set, fs);
+	disk = blk_mq_alloc_disk(&fs->tag_set, NULL, fs);
 	if (IS_ERR(disk)) {
 		rc = PTR_ERR(disk);
 		goto out_free_tag_set;
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 1dfb2e77898ba6..c5b6552707984b 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -2222,7 +2222,7 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
 		goto out_unlock;
 	}
 
-	disk = blk_mq_alloc_disk(&ub->tag_set, NULL);
+	disk = blk_mq_alloc_disk(&ub->tag_set, NULL, NULL);
 	if (IS_ERR(disk)) {
 		ret = PTR_ERR(disk);
 		goto out_unlock;
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 5bf98fd6a651a5..a23fce4eca4408 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1330,7 +1330,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 	if (err)
 		goto out_free_vq;
 
-	vblk->disk = blk_mq_alloc_disk(&vblk->tag_set, vblk);
+	vblk->disk = blk_mq_alloc_disk(&vblk->tag_set, NULL, vblk);
 	if (IS_ERR(vblk->disk)) {
 		err = PTR_ERR(vblk->disk);
 		goto out_free_tags;
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 434fab30677743..4cc2884e748463 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1136,7 +1136,7 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 	if (err)
 		goto out_release_minors;
 
-	gd = blk_mq_alloc_disk(&info->tag_set, info);
+	gd = blk_mq_alloc_disk(&info->tag_set, NULL, info);
 	if (IS_ERR(gd)) {
 		err = PTR_ERR(gd);
 		goto out_free_tag_set;
diff --git a/drivers/block/z2ram.c b/drivers/block/z2ram.c
index 11493167b0a848..7c5f4e4d9b5037 100644
--- a/drivers/block/z2ram.c
+++ b/drivers/block/z2ram.c
@@ -318,7 +318,7 @@ static int z2ram_register_disk(int minor)
 	struct gendisk *disk;
 	int err;
 
-	disk = blk_mq_alloc_disk(&tag_set, NULL);
+	disk = blk_mq_alloc_disk(&tag_set, NULL, NULL);
 	if (IS_ERR(disk))
 		return PTR_ERR(disk);
 
diff --git a/drivers/cdrom/gdrom.c b/drivers/cdrom/gdrom.c
index d668b174ace92f..1d044779f5e42a 100644
--- a/drivers/cdrom/gdrom.c
+++ b/drivers/cdrom/gdrom.c
@@ -778,7 +778,7 @@ static int probe_gdrom(struct platform_device *devptr)
 	if (err)
 		goto probe_fail_free_cd_info;
 
-	gd.disk = blk_mq_alloc_disk(&gd.tag_set, NULL);
+	gd.disk = blk_mq_alloc_disk(&gd.tag_set, NULL, NULL);
 	if (IS_ERR(gd.disk)) {
 		err = PTR_ERR(gd.disk);
 		goto probe_fail_free_tag_set;
diff --git a/drivers/memstick/core/ms_block.c b/drivers/memstick/core/ms_block.c
index 04115cd92433bf..d3277c901d16bb 100644
--- a/drivers/memstick/core/ms_block.c
+++ b/drivers/memstick/core/ms_block.c
@@ -2093,7 +2093,7 @@ static int msb_init_disk(struct memstick_dev *card)
 	if (rc)
 		goto out_release_id;
 
-	msb->disk = blk_mq_alloc_disk(&msb->tag_set, card);
+	msb->disk = blk_mq_alloc_disk(&msb->tag_set, NULL, card);
 	if (IS_ERR(msb->disk)) {
 		rc = PTR_ERR(msb->disk);
 		goto out_free_tag_set;
diff --git a/drivers/memstick/core/mspro_block.c b/drivers/memstick/core/mspro_block.c
index 5a69ed33999b4c..db0e2a42ca3c32 100644
--- a/drivers/memstick/core/mspro_block.c
+++ b/drivers/memstick/core/mspro_block.c
@@ -1138,7 +1138,7 @@ static int mspro_block_init_disk(struct memstick_dev *card)
 	if (rc)
 		goto out_release_id;
 
-	msb->disk = blk_mq_alloc_disk(&msb->tag_set, card);
+	msb->disk = blk_mq_alloc_disk(&msb->tag_set, NULL, card);
 	if (IS_ERR(msb->disk)) {
 		rc = PTR_ERR(msb->disk);
 		goto out_free_tag_set;
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index a0a2412f62a730..67ad186d132a69 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -447,7 +447,7 @@ struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card)
 		return ERR_PTR(ret);
 		
 
-	disk = blk_mq_alloc_disk(&mq->tag_set, mq);
+	disk = blk_mq_alloc_disk(&mq->tag_set, NULL, mq);
 	if (IS_ERR(disk)) {
 		blk_mq_free_tag_set(&mq->tag_set);
 		return disk;
diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
index f0526dcc216276..b8878a2457afa7 100644
--- a/drivers/mtd/mtd_blkdevs.c
+++ b/drivers/mtd/mtd_blkdevs.c
@@ -333,7 +333,7 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
 		goto out_kfree_tag_set;
 
 	/* Create gendisk */
-	gd = blk_mq_alloc_disk(new->tag_set, new);
+	gd = blk_mq_alloc_disk(new->tag_set, NULL, new);
 	if (IS_ERR(gd)) {
 		ret = PTR_ERR(gd);
 		goto out_free_tag_set;
diff --git a/drivers/mtd/ubi/block.c b/drivers/mtd/ubi/block.c
index 654bd7372cd8c0..9be87c231a2eba 100644
--- a/drivers/mtd/ubi/block.c
+++ b/drivers/mtd/ubi/block.c
@@ -393,7 +393,7 @@ int ubiblock_create(struct ubi_volume_info *vi)
 
 
 	/* Initialize the gendisk of this ubiblock device */
-	gd = blk_mq_alloc_disk(&dev->tag_set, dev);
+	gd = blk_mq_alloc_disk(&dev->tag_set, NULL, dev);
 	if (IS_ERR(gd)) {
 		ret = PTR_ERR(gd);
 		goto out_free_tags;
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9ffff9e01a8336..d0e7cf6e5899f7 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3633,7 +3633,7 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
 	if (!ns)
 		return;
 
-	disk = blk_mq_alloc_disk(ctrl->tagset, ns);
+	disk = blk_mq_alloc_disk(ctrl->tagset, NULL, ns);
 	if (IS_ERR(disk))
 		goto out_free_ns;
 	disk->fops = &nvme_bdev_ops;
diff --git a/drivers/s390/block/dasd_genhd.c b/drivers/s390/block/dasd_genhd.c
index 55e3abe94cde2f..4d5bf29215d5fc 100644
--- a/drivers/s390/block/dasd_genhd.c
+++ b/drivers/s390/block/dasd_genhd.c
@@ -58,7 +58,7 @@ int dasd_gendisk_alloc(struct dasd_block *block)
 	if (rc)
 		return rc;
 
-	gdp = blk_mq_alloc_disk(&block->tag_set, block);
+	gdp = blk_mq_alloc_disk(&block->tag_set, NULL, block);
 	if (IS_ERR(gdp)) {
 		blk_mq_free_tag_set(&block->tag_set);
 		return PTR_ERR(gdp);
diff --git a/drivers/s390/block/scm_blk.c b/drivers/s390/block/scm_blk.c
index ade95e91b3c8db..d05b2e2799a47a 100644
--- a/drivers/s390/block/scm_blk.c
+++ b/drivers/s390/block/scm_blk.c
@@ -462,7 +462,7 @@ int scm_blk_dev_setup(struct scm_blk_dev *bdev, struct scm_device *scmdev)
 	if (ret)
 		goto out;
 
-	bdev->gendisk = blk_mq_alloc_disk(&bdev->tag_set, scmdev);
+	bdev->gendisk = blk_mq_alloc_disk(&bdev->tag_set, NULL, scmdev);
 	if (IS_ERR(bdev->gendisk)) {
 		ret = PTR_ERR(bdev->gendisk);
 		goto out_tag;
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 7d42c359e2ab28..390d35fa003295 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -682,13 +682,14 @@ enum {
 
 #define BLK_MQ_NO_HCTX_IDX	(-1U)
 
-struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata,
+struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set,
+		struct queue_limits *lim, void *queuedata,
 		struct lock_class_key *lkclass);
-#define blk_mq_alloc_disk(set, queuedata)				\
+#define blk_mq_alloc_disk(set, lim, queuedata)				\
 ({									\
 	static struct lock_class_key __key;				\
 									\
-	__blk_mq_alloc_disk(set, queuedata, &__key);			\
+	__blk_mq_alloc_disk(set, lim, queuedata, &__key);		\
 })
 struct gendisk *blk_mq_alloc_disk_for_queue(struct request_queue *q,
 		struct lock_class_key *lkclass);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 11/15] virtio_blk: split virtblk_probe
  2024-01-22 17:36 atomic queue limits updates Christoph Hellwig
                   ` (9 preceding siblings ...)
  2024-01-22 17:36 ` [PATCH 10/15] block: pass a queue_limits argument to blk_mq_alloc_disk Christoph Hellwig
@ 2024-01-22 17:36 ` Christoph Hellwig
  2024-01-23  5:26   ` Damien Le Moal
                     ` (3 more replies)
  2024-01-22 17:36 ` [PATCH 12/15] virtio_blk: pass queue_limits to blk_mq_alloc_disk Christoph Hellwig
                   ` (3 subsequent siblings)
  14 siblings, 4 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-22 17:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

Split out a virtblk_read_limits helper that just reads the various
queue limits to separate it from the higher level probing logic.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/virtio_blk.c | 193 +++++++++++++++++++------------------
 1 file changed, 101 insertions(+), 92 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index a23fce4eca4408..dd46ccd9f84c7d 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1248,31 +1248,17 @@ static const struct blk_mq_ops virtio_mq_ops = {
 static unsigned int virtblk_queue_depth;
 module_param_named(queue_depth, virtblk_queue_depth, uint, 0444);
 
-static int virtblk_probe(struct virtio_device *vdev)
+static int virtblk_read_limits(struct virtio_blk *vblk)
 {
-	struct virtio_blk *vblk;
-	struct request_queue *q;
-	int err, index;
-
+	struct request_queue *q = vblk->disk->queue;
+	struct virtio_device *vdev = vblk->vdev;
 	u32 v, blk_size, max_size, sg_elems, opt_io_size;
 	u32 max_discard_segs = 0;
 	u32 discard_granularity = 0;
 	u16 min_io_size;
 	u8 physical_block_exp, alignment_offset;
-	unsigned int queue_depth;
 	size_t max_dma_size;
-
-	if (!vdev->config->get) {
-		dev_err(&vdev->dev, "%s failure: config access disabled\n",
-			__func__);
-		return -EINVAL;
-	}
-
-	err = ida_alloc_range(&vd_index_ida, 0,
-			      minor_to_index(1 << MINORBITS) - 1, GFP_KERNEL);
-	if (err < 0)
-		goto out;
-	index = err;
+	int err;
 
 	/* We need to know how many segments before we allocate. */
 	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_SEG_MAX,
@@ -1286,73 +1272,6 @@ static int virtblk_probe(struct virtio_device *vdev)
 	/* Prevent integer overflows and honor max vq size */
 	sg_elems = min_t(u32, sg_elems, VIRTIO_BLK_MAX_SG_ELEMS - 2);
 
-	vdev->priv = vblk = kmalloc(sizeof(*vblk), GFP_KERNEL);
-	if (!vblk) {
-		err = -ENOMEM;
-		goto out_free_index;
-	}
-
-	mutex_init(&vblk->vdev_mutex);
-
-	vblk->vdev = vdev;
-
-	INIT_WORK(&vblk->config_work, virtblk_config_changed_work);
-
-	err = init_vq(vblk);
-	if (err)
-		goto out_free_vblk;
-
-	/* Default queue sizing is to fill the ring. */
-	if (!virtblk_queue_depth) {
-		queue_depth = vblk->vqs[0].vq->num_free;
-		/* ... but without indirect descs, we use 2 descs per req */
-		if (!virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC))
-			queue_depth /= 2;
-	} else {
-		queue_depth = virtblk_queue_depth;
-	}
-
-	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
-	vblk->tag_set.ops = &virtio_mq_ops;
-	vblk->tag_set.queue_depth = queue_depth;
-	vblk->tag_set.numa_node = NUMA_NO_NODE;
-	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
-	vblk->tag_set.cmd_size =
-		sizeof(struct virtblk_req) +
-		sizeof(struct scatterlist) * VIRTIO_BLK_INLINE_SG_CNT;
-	vblk->tag_set.driver_data = vblk;
-	vblk->tag_set.nr_hw_queues = vblk->num_vqs;
-	vblk->tag_set.nr_maps = 1;
-	if (vblk->io_queues[HCTX_TYPE_POLL])
-		vblk->tag_set.nr_maps = 3;
-
-	err = blk_mq_alloc_tag_set(&vblk->tag_set);
-	if (err)
-		goto out_free_vq;
-
-	vblk->disk = blk_mq_alloc_disk(&vblk->tag_set, NULL, vblk);
-	if (IS_ERR(vblk->disk)) {
-		err = PTR_ERR(vblk->disk);
-		goto out_free_tags;
-	}
-	q = vblk->disk->queue;
-
-	virtblk_name_format("vd", index, vblk->disk->disk_name, DISK_NAME_LEN);
-
-	vblk->disk->major = major;
-	vblk->disk->first_minor = index_to_minor(index);
-	vblk->disk->minors = 1 << PART_BITS;
-	vblk->disk->private_data = vblk;
-	vblk->disk->fops = &virtblk_fops;
-	vblk->index = index;
-
-	/* configure queue flush support */
-	virtblk_update_cache_mode(vdev);
-
-	/* If disk is read-only in the host, the guest should obey */
-	if (virtio_has_feature(vdev, VIRTIO_BLK_F_RO))
-		set_disk_ro(vblk->disk, 1);
-
 	/* We can handle whatever the host told us to handle. */
 	blk_queue_max_segments(q, sg_elems);
 
@@ -1381,7 +1300,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 			dev_err(&vdev->dev,
 				"virtio_blk: invalid block size: 0x%x\n",
 				blk_size);
-			goto out_cleanup_disk;
+			return err;
 		}
 
 		blk_queue_logical_block_size(q, blk_size);
@@ -1455,8 +1374,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 		if (!v) {
 			dev_err(&vdev->dev,
 				"virtio_blk: secure_erase_sector_alignment can't be 0\n");
-			err = -EINVAL;
-			goto out_cleanup_disk;
+			return -EINVAL;
 		}
 
 		discard_granularity = min_not_zero(discard_granularity, v);
@@ -1470,8 +1388,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 		if (!v) {
 			dev_err(&vdev->dev,
 				"virtio_blk: max_secure_erase_sectors can't be 0\n");
-			err = -EINVAL;
-			goto out_cleanup_disk;
+			return -EINVAL;
 		}
 
 		blk_queue_max_secure_erase_sectors(q, v);
@@ -1485,8 +1402,7 @@ static int virtblk_probe(struct virtio_device *vdev)
 		if (!v) {
 			dev_err(&vdev->dev,
 				"virtio_blk: max_secure_erase_seg can't be 0\n");
-			err = -EINVAL;
-			goto out_cleanup_disk;
+			return -EINVAL;
 		}
 
 		max_discard_segs = min_not_zero(max_discard_segs, v);
@@ -1511,6 +1427,99 @@ static int virtblk_probe(struct virtio_device *vdev)
 			q->limits.discard_granularity = blk_size;
 	}
 
+	return 0;
+}
+
+static int virtblk_probe(struct virtio_device *vdev)
+{
+	struct virtio_blk *vblk;
+	struct request_queue *q;
+	int err, index;
+	unsigned int queue_depth;
+
+	if (!vdev->config->get) {
+		dev_err(&vdev->dev, "%s failure: config access disabled\n",
+			__func__);
+		return -EINVAL;
+	}
+
+	err = ida_alloc_range(&vd_index_ida, 0,
+			      minor_to_index(1 << MINORBITS) - 1, GFP_KERNEL);
+	if (err < 0)
+		goto out;
+	index = err;
+
+	vdev->priv = vblk = kmalloc(sizeof(*vblk), GFP_KERNEL);
+	if (!vblk) {
+		err = -ENOMEM;
+		goto out_free_index;
+	}
+
+	mutex_init(&vblk->vdev_mutex);
+
+	vblk->vdev = vdev;
+
+	INIT_WORK(&vblk->config_work, virtblk_config_changed_work);
+
+	err = init_vq(vblk);
+	if (err)
+		goto out_free_vblk;
+
+	/* Default queue sizing is to fill the ring. */
+	if (!virtblk_queue_depth) {
+		queue_depth = vblk->vqs[0].vq->num_free;
+		/* ... but without indirect descs, we use 2 descs per req */
+		if (!virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC))
+			queue_depth /= 2;
+	} else {
+		queue_depth = virtblk_queue_depth;
+	}
+
+	memset(&vblk->tag_set, 0, sizeof(vblk->tag_set));
+	vblk->tag_set.ops = &virtio_mq_ops;
+	vblk->tag_set.queue_depth = queue_depth;
+	vblk->tag_set.numa_node = NUMA_NO_NODE;
+	vblk->tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
+	vblk->tag_set.cmd_size =
+		sizeof(struct virtblk_req) +
+		sizeof(struct scatterlist) * VIRTIO_BLK_INLINE_SG_CNT;
+	vblk->tag_set.driver_data = vblk;
+	vblk->tag_set.nr_hw_queues = vblk->num_vqs;
+	vblk->tag_set.nr_maps = 1;
+	if (vblk->io_queues[HCTX_TYPE_POLL])
+		vblk->tag_set.nr_maps = 3;
+
+	err = blk_mq_alloc_tag_set(&vblk->tag_set);
+	if (err)
+		goto out_free_vq;
+
+	vblk->disk = blk_mq_alloc_disk(&vblk->tag_set, NULL, vblk);
+	if (IS_ERR(vblk->disk)) {
+		err = PTR_ERR(vblk->disk);
+		goto out_free_tags;
+	}
+	q = vblk->disk->queue;
+
+	virtblk_name_format("vd", index, vblk->disk->disk_name, DISK_NAME_LEN);
+
+	vblk->disk->major = major;
+	vblk->disk->first_minor = index_to_minor(index);
+	vblk->disk->minors = 1 << PART_BITS;
+	vblk->disk->private_data = vblk;
+	vblk->disk->fops = &virtblk_fops;
+	vblk->index = index;
+
+	/* configure queue flush support */
+	virtblk_update_cache_mode(vdev);
+
+	/* If disk is read-only in the host, the guest should obey */
+	if (virtio_has_feature(vdev, VIRTIO_BLK_F_RO))
+		set_disk_ro(vblk->disk, 1);
+
+	err = virtblk_read_limits(vblk);
+	if (err)
+		goto out_cleanup_disk;
+
 	virtblk_update_capacity(vblk, false);
 	virtio_device_ready(vdev);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 12/15] virtio_blk: pass queue_limits to blk_mq_alloc_disk
  2024-01-22 17:36 atomic queue limits updates Christoph Hellwig
                   ` (10 preceding siblings ...)
  2024-01-22 17:36 ` [PATCH 11/15] virtio_blk: split virtblk_probe Christoph Hellwig
@ 2024-01-22 17:36 ` Christoph Hellwig
  2024-01-23  5:27   ` Damien Le Moal
                     ` (3 more replies)
  2024-01-22 17:36 ` [PATCH 13/15] loop: cleanup loop_config_discard Christoph Hellwig
                   ` (2 subsequent siblings)
  14 siblings, 4 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-22 17:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

Call virtblk_read_limits and most of virtblk_probe_zoned_device before
allocating the gendisk and thus request_queue and make them read into
a queue_limits structure instead.  Pass this initialized queue_limits
to blk_mq_alloc_disk to set the queue up with the right parameters
from the start and only leave a few final touches for zoned devices
to be done just before adding the disk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/virtio_blk.c | 130 ++++++++++++++++++-------------------
 1 file changed, 64 insertions(+), 66 deletions(-)

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index dd46ccd9f84c7d..d8b55874cd5950 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -720,16 +720,15 @@ static int virtblk_report_zones(struct gendisk *disk, sector_t sector,
 	return ret;
 }
 
-static int virtblk_probe_zoned_device(struct virtio_device *vdev,
-				       struct virtio_blk *vblk,
-				       struct request_queue *q)
+static int virtblk_read_zoned_limits(struct virtio_blk *vblk,
+		struct queue_limits *lim)
 {
+	struct virtio_device *vdev = vblk->vdev;
 	u32 v, wg;
 
 	dev_dbg(&vdev->dev, "probing host-managed zoned device\n");
 
-	disk_set_zoned(vblk->disk);
-	blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
+	lim->zoned = true;
 
 	virtio_cread(vdev, struct virtio_blk_config,
 		     zoned.max_open_zones, &v);
@@ -747,8 +746,8 @@ static int virtblk_probe_zoned_device(struct virtio_device *vdev,
 		dev_warn(&vdev->dev, "zero write granularity reported\n");
 		return -ENODEV;
 	}
-	blk_queue_physical_block_size(q, wg);
-	blk_queue_io_min(q, wg);
+	lim->physical_block_size = wg;
+	lim->io_min = wg;
 
 	dev_dbg(&vdev->dev, "write granularity = %u\n", wg);
 
@@ -764,13 +763,13 @@ static int virtblk_probe_zoned_device(struct virtio_device *vdev,
 			vblk->zone_sectors);
 		return -ENODEV;
 	}
-	blk_queue_chunk_sectors(q, vblk->zone_sectors);
+	lim->chunk_sectors = vblk->zone_sectors;
 	dev_dbg(&vdev->dev, "zone sectors = %u\n", vblk->zone_sectors);
 
 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
 		dev_warn(&vblk->vdev->dev,
 			 "ignoring negotiated F_DISCARD for zoned device\n");
-		blk_queue_max_discard_sectors(q, 0);
+		lim->max_hw_discard_sectors = 0;
 	}
 
 	virtio_cread(vdev, struct virtio_blk_config,
@@ -785,25 +784,21 @@ static int virtblk_probe_zoned_device(struct virtio_device *vdev,
 			wg, v);
 		return -ENODEV;
 	}
-	blk_queue_max_zone_append_sectors(q, v);
+	lim->max_zone_append_sectors = v;
 	dev_dbg(&vdev->dev, "max append sectors = %u\n", v);
 
-	return blk_revalidate_disk_zones(vblk->disk, NULL);
+	return 0;
 }
-
 #else
-
 /*
- * Zoned block device support is not configured in this kernel.
- * Host-managed zoned devices can't be supported, but others are
- * good to go as regular block devices.
+ * Zoned block device support is not configured in this kernel, host-managed
+ * zoned devices can't be supported.
  */
 #define virtblk_report_zones       NULL
-
-static inline int virtblk_probe_zoned_device(struct virtio_device *vdev,
-			struct virtio_blk *vblk, struct request_queue *q)
+static inline int virtblk_read_zoned_limits(struct virtio_blk *vblk,
+		struct queue_limits *lim)
 {
-	dev_err(&vdev->dev,
+	dev_err(&vblk->vdev->dev,
 		"virtio_blk: zoned devices are not supported");
 	return -EOPNOTSUPP;
 }
@@ -1248,9 +1243,9 @@ static const struct blk_mq_ops virtio_mq_ops = {
 static unsigned int virtblk_queue_depth;
 module_param_named(queue_depth, virtblk_queue_depth, uint, 0444);
 
-static int virtblk_read_limits(struct virtio_blk *vblk)
+static int virtblk_read_limits(struct virtio_blk *vblk,
+		struct queue_limits *lim)
 {
-	struct request_queue *q = vblk->disk->queue;
 	struct virtio_device *vdev = vblk->vdev;
 	u32 v, blk_size, max_size, sg_elems, opt_io_size;
 	u32 max_discard_segs = 0;
@@ -1273,10 +1268,10 @@ static int virtblk_read_limits(struct virtio_blk *vblk)
 	sg_elems = min_t(u32, sg_elems, VIRTIO_BLK_MAX_SG_ELEMS - 2);
 
 	/* We can handle whatever the host told us to handle. */
-	blk_queue_max_segments(q, sg_elems);
+	lim->max_segments = sg_elems;
 
 	/* No real sector limit. */
-	blk_queue_max_hw_sectors(q, UINT_MAX);
+	lim->max_hw_sectors = UINT_MAX;
 
 	max_dma_size = virtio_max_dma_size(vdev);
 	max_size = max_dma_size > U32_MAX ? U32_MAX : max_dma_size;
@@ -1288,7 +1283,7 @@ static int virtblk_read_limits(struct virtio_blk *vblk)
 	if (!err)
 		max_size = min(max_size, v);
 
-	blk_queue_max_segment_size(q, max_size);
+	lim->max_segment_size = max_size;
 
 	/* Host can optionally specify the block size of the device */
 	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
@@ -1303,35 +1298,34 @@ static int virtblk_read_limits(struct virtio_blk *vblk)
 			return err;
 		}
 
-		blk_queue_logical_block_size(q, blk_size);
+		lim->logical_block_size = blk_size;
 	} else
-		blk_size = queue_logical_block_size(q);
+		blk_size = lim->logical_block_size;
 
 	/* Use topology information if available */
 	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,
 				   struct virtio_blk_config, physical_block_exp,
 				   &physical_block_exp);
 	if (!err && physical_block_exp)
-		blk_queue_physical_block_size(q,
-				blk_size * (1 << physical_block_exp));
+		lim->physical_block_size = blk_size * (1 << physical_block_exp);
 
 	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,
 				   struct virtio_blk_config, alignment_offset,
 				   &alignment_offset);
 	if (!err && alignment_offset)
-		blk_queue_alignment_offset(q, blk_size * alignment_offset);
+		lim->alignment_offset = blk_size * alignment_offset;
 
 	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,
 				   struct virtio_blk_config, min_io_size,
 				   &min_io_size);
 	if (!err && min_io_size)
-		blk_queue_io_min(q, blk_size * min_io_size);
+		lim->io_min = blk_size * min_io_size;
 
 	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,
 				   struct virtio_blk_config, opt_io_size,
 				   &opt_io_size);
 	if (!err && opt_io_size)
-		blk_queue_io_opt(q, blk_size * opt_io_size);
+		lim->io_opt = blk_size * opt_io_size;
 
 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
 		virtio_cread(vdev, struct virtio_blk_config,
@@ -1339,7 +1333,7 @@ static int virtblk_read_limits(struct virtio_blk *vblk)
 
 		virtio_cread(vdev, struct virtio_blk_config,
 			     max_discard_sectors, &v);
-		blk_queue_max_discard_sectors(q, v ? v : UINT_MAX);
+		lim->max_hw_discard_sectors = v ? v : UINT_MAX;
 
 		virtio_cread(vdev, struct virtio_blk_config, max_discard_seg,
 			     &max_discard_segs);
@@ -1348,7 +1342,7 @@ static int virtblk_read_limits(struct virtio_blk *vblk)
 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_WRITE_ZEROES)) {
 		virtio_cread(vdev, struct virtio_blk_config,
 			     max_write_zeroes_sectors, &v);
-		blk_queue_max_write_zeroes_sectors(q, v ? v : UINT_MAX);
+		lim->max_write_zeroes_sectors = v ? v : UINT_MAX;
 	}
 
 	/* The discard and secure erase limits are combined since the Linux
@@ -1391,7 +1385,7 @@ static int virtblk_read_limits(struct virtio_blk *vblk)
 			return -EINVAL;
 		}
 
-		blk_queue_max_secure_erase_sectors(q, v);
+		lim->max_secure_erase_sectors = v;
 
 		virtio_cread(vdev, struct virtio_blk_config,
 			     max_secure_erase_seg, &v);
@@ -1418,13 +1412,34 @@ static int virtblk_read_limits(struct virtio_blk *vblk)
 		if (!max_discard_segs)
 			max_discard_segs = sg_elems;
 
-		blk_queue_max_discard_segments(q,
-					       min(max_discard_segs, MAX_DISCARD_SEGMENTS));
+		lim->max_discard_segments =
+			min(max_discard_segs, MAX_DISCARD_SEGMENTS);
 
 		if (discard_granularity)
-			q->limits.discard_granularity = discard_granularity << SECTOR_SHIFT;
+			lim->discard_granularity =
+				discard_granularity << SECTOR_SHIFT;
 		else
-			q->limits.discard_granularity = blk_size;
+			lim->discard_granularity = blk_size;
+	}
+
+	if (virtio_has_feature(vdev, VIRTIO_BLK_F_ZONED)) {
+		u8 model;
+
+		virtio_cread(vdev, struct virtio_blk_config, zoned.model, &model);
+		switch (model) {
+		case VIRTIO_BLK_Z_NONE:
+		case VIRTIO_BLK_Z_HA:
+			/* treat host-aware devices as non-zoned */
+			return 0;
+		case VIRTIO_BLK_Z_HM:
+			err = virtblk_read_zoned_limits(vblk, lim);
+			if (err)
+				return err;
+			break;
+		default:
+			dev_err(&vdev->dev, "unsupported zone model %d\n", model);
+			return -EINVAL;
+		}
 	}
 
 	return 0;
@@ -1433,7 +1448,7 @@ static int virtblk_read_limits(struct virtio_blk *vblk)
 static int virtblk_probe(struct virtio_device *vdev)
 {
 	struct virtio_blk *vblk;
-	struct request_queue *q;
+	struct queue_limits lim = { };
 	int err, index;
 	unsigned int queue_depth;
 
@@ -1493,12 +1508,15 @@ static int virtblk_probe(struct virtio_device *vdev)
 	if (err)
 		goto out_free_vq;
 
-	vblk->disk = blk_mq_alloc_disk(&vblk->tag_set, NULL, vblk);
+	err = virtblk_read_limits(vblk, &lim);
+	if (err)
+		goto out_free_tags;
+
+	vblk->disk = blk_mq_alloc_disk(&vblk->tag_set, &lim, vblk);
 	if (IS_ERR(vblk->disk)) {
 		err = PTR_ERR(vblk->disk);
 		goto out_free_tags;
 	}
-	q = vblk->disk->queue;
 
 	virtblk_name_format("vd", index, vblk->disk->disk_name, DISK_NAME_LEN);
 
@@ -1516,10 +1534,6 @@ static int virtblk_probe(struct virtio_device *vdev)
 	if (virtio_has_feature(vdev, VIRTIO_BLK_F_RO))
 		set_disk_ro(vblk->disk, 1);
 
-	err = virtblk_read_limits(vblk);
-	if (err)
-		goto out_cleanup_disk;
-
 	virtblk_update_capacity(vblk, false);
 	virtio_device_ready(vdev);
 
@@ -1527,27 +1541,11 @@ static int virtblk_probe(struct virtio_device *vdev)
 	 * All steps that follow use the VQs therefore they need to be
 	 * placed after the virtio_device_ready() call above.
 	 */
-	if (virtio_has_feature(vdev, VIRTIO_BLK_F_ZONED)) {
-		u8 model;
-
-		virtio_cread(vdev, struct virtio_blk_config, zoned.model,
-				&model);
-		switch (model) {
-		case VIRTIO_BLK_Z_NONE:
-		case VIRTIO_BLK_Z_HA:
-			/* Present the host-aware device as non-zoned */
-			break;
-		case VIRTIO_BLK_Z_HM:
-			err = virtblk_probe_zoned_device(vdev, vblk, q);
-			if (err)
-				goto out_cleanup_disk;
-			break;
-		default:
-			dev_err(&vdev->dev, "unsupported zone model %d\n",
-				model);
-			err = -EINVAL;
+	if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && lim.zoned) {
+		blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, vblk->disk->queue);
+		err = blk_revalidate_disk_zones(vblk->disk, NULL);
+		if (err)
 			goto out_cleanup_disk;
-		}
 	}
 
 	err = device_add_disk(&vdev->dev, vblk->disk, virtblk_attr_groups);
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 13/15] loop: cleanup loop_config_discard
  2024-01-22 17:36 atomic queue limits updates Christoph Hellwig
                   ` (11 preceding siblings ...)
  2024-01-22 17:36 ` [PATCH 12/15] virtio_blk: pass queue_limits to blk_mq_alloc_disk Christoph Hellwig
@ 2024-01-22 17:36 ` Christoph Hellwig
  2024-01-23  5:28   ` Damien Le Moal
                     ` (2 more replies)
  2024-01-22 17:36 ` [PATCH 14/15] loop: pass queue_limits to blk_mq_alloc_disk Christoph Hellwig
  2024-01-22 17:36 ` [PATCH 15/15] loop: use the atomic queue limits update API Christoph Hellwig
  14 siblings, 3 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-22 17:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

Initialize the local variables for the discard max sectors and
granularity to zero as a sensible default, and then merge the
calls assigning them to the queue limits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/loop.c | 27 ++++++++-------------------
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 3f855cc79c29f5..7abeb586942677 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -755,7 +755,8 @@ static void loop_config_discard(struct loop_device *lo)
 	struct file *file = lo->lo_backing_file;
 	struct inode *inode = file->f_mapping->host;
 	struct request_queue *q = lo->lo_queue;
-	u32 granularity, max_discard_sectors;
+	u32 granularity = 0, max_discard_sectors = 0;
+	struct kstatfs sbuf;
 
 	/*
 	 * If the backing device is a block device, mirror its zeroing
@@ -775,29 +776,17 @@ static void loop_config_discard(struct loop_device *lo)
 	 * We use punch hole to reclaim the free space used by the
 	 * image a.k.a. discard.
 	 */
-	} else if (!file->f_op->fallocate) {
-		max_discard_sectors = 0;
-		granularity = 0;
-
-	} else {
-		struct kstatfs sbuf;
-
+	} else if (file->f_op->fallocate && !vfs_statfs(&file->f_path, &sbuf)) {
 		max_discard_sectors = UINT_MAX >> 9;
-		if (!vfs_statfs(&file->f_path, &sbuf))
-			granularity = sbuf.f_bsize;
-		else
-			max_discard_sectors = 0;
+		granularity = sbuf.f_bsize;
 	}
 
-	if (max_discard_sectors) {
+	blk_queue_max_discard_sectors(q, max_discard_sectors);
+	blk_queue_max_write_zeroes_sectors(q, max_discard_sectors);
+	if (max_discard_sectors)
 		q->limits.discard_granularity = granularity;
-		blk_queue_max_discard_sectors(q, max_discard_sectors);
-		blk_queue_max_write_zeroes_sectors(q, max_discard_sectors);
-	} else {
+	else
 		q->limits.discard_granularity = 0;
-		blk_queue_max_discard_sectors(q, 0);
-		blk_queue_max_write_zeroes_sectors(q, 0);
-	}
 }
 
 struct loop_worker {
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 14/15] loop: pass queue_limits to blk_mq_alloc_disk
  2024-01-22 17:36 atomic queue limits updates Christoph Hellwig
                   ` (12 preceding siblings ...)
  2024-01-22 17:36 ` [PATCH 13/15] loop: cleanup loop_config_discard Christoph Hellwig
@ 2024-01-22 17:36 ` Christoph Hellwig
  2024-01-23  5:29   ` Damien Le Moal
  2024-01-24  6:22   ` Hannes Reinecke
  2024-01-22 17:36 ` [PATCH 15/15] loop: use the atomic queue limits update API Christoph Hellwig
  14 siblings, 2 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-22 17:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

Pass the max_hw_sector limit loop sets at initialization time directly to
blk_mq_alloc_disk instead of updating it right after the allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/loop.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 7abeb586942677..26c8ea79086798 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1971,6 +1971,12 @@ static const struct blk_mq_ops loop_mq_ops = {
 
 static int loop_add(int i)
 {
+	struct queue_limits lim = {
+		/*
+		 * Random number picked from the historic block max_sectors cap.
+		 */
+		.max_hw_sectors		= 2560u,
+	};
 	struct loop_device *lo;
 	struct gendisk *disk;
 	int err;
@@ -2014,16 +2020,13 @@ static int loop_add(int i)
 	if (err)
 		goto out_free_idr;
 
-	disk = lo->lo_disk = blk_mq_alloc_disk(&lo->tag_set, NULL, lo);
+	disk = lo->lo_disk = blk_mq_alloc_disk(&lo->tag_set, &lim, lo);
 	if (IS_ERR(disk)) {
 		err = PTR_ERR(disk);
 		goto out_cleanup_tags;
 	}
 	lo->lo_queue = lo->lo_disk->queue;
 
-	/* random number picked from the history block max_sectors cap */
-	blk_queue_max_hw_sectors(lo->lo_queue, 2560u);
-
 	/*
 	 * By default, we do buffer IO, so it doesn't make sense to enable
 	 * merge because the I/O submitted to backing file is handled page by
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* [PATCH 15/15] loop: use the atomic queue limits update API
  2024-01-22 17:36 atomic queue limits updates Christoph Hellwig
                   ` (13 preceding siblings ...)
  2024-01-22 17:36 ` [PATCH 14/15] loop: pass queue_limits to blk_mq_alloc_disk Christoph Hellwig
@ 2024-01-22 17:36 ` Christoph Hellwig
  2024-01-23  5:31   ` Damien Le Moal
  2024-01-24  6:22   ` Hannes Reinecke
  14 siblings, 2 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-22 17:36 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

Pass the default limits to blk_mq_alloc_disk and then use the
queue_limits_{start,commit}_update API to change the limits in an
atomic way on existing loop gendisks.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/block/loop.c | 41 +++++++++++++++++++++++++----------------
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 26c8ea79086798..28a95fd366fea5 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -750,11 +750,11 @@ static void loop_sysfs_exit(struct loop_device *lo)
 				   &loop_attribute_group);
 }
 
-static void loop_config_discard(struct loop_device *lo)
+static void loop_config_discard(struct loop_device *lo,
+		struct queue_limits *lim)
 {
 	struct file *file = lo->lo_backing_file;
 	struct inode *inode = file->f_mapping->host;
-	struct request_queue *q = lo->lo_queue;
 	u32 granularity = 0, max_discard_sectors = 0;
 	struct kstatfs sbuf;
 
@@ -781,12 +781,12 @@ static void loop_config_discard(struct loop_device *lo)
 		granularity = sbuf.f_bsize;
 	}
 
-	blk_queue_max_discard_sectors(q, max_discard_sectors);
-	blk_queue_max_write_zeroes_sectors(q, max_discard_sectors);
+	lim->max_hw_discard_sectors = max_discard_sectors;
+	lim->max_write_zeroes_sectors = max_discard_sectors;
 	if (max_discard_sectors)
-		q->limits.discard_granularity = granularity;
+		lim->discard_granularity = granularity;
 	else
-		q->limits.discard_granularity = 0;
+		lim->discard_granularity = 0;
 }
 
 struct loop_worker {
@@ -975,6 +975,20 @@ loop_set_status_from_info(struct loop_device *lo,
 	return 0;
 }
 
+static int loop_reconfigure_limits(struct loop_device *lo, unsigned short bsize,
+		bool update_discard_settings)
+{
+	struct queue_limits lim;
+
+	lim = queue_limits_start_update(lo->lo_queue);
+	lim.logical_block_size = bsize;
+	lim.physical_block_size = bsize;
+	lim.io_min = bsize;
+	if (update_discard_settings)
+		loop_config_discard(lo, &lim);
+	return queue_limits_commit_update(lo->lo_queue, &lim);
+}
+
 static int loop_configure(struct loop_device *lo, blk_mode_t mode,
 			  struct block_device *bdev,
 			  const struct loop_config *config)
@@ -1072,11 +1086,10 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
 	else
 		bsize = 512;
 
-	blk_queue_logical_block_size(lo->lo_queue, bsize);
-	blk_queue_physical_block_size(lo->lo_queue, bsize);
-	blk_queue_io_min(lo->lo_queue, bsize);
+	error = loop_reconfigure_limits(lo, bsize, true);
+	if (WARN_ON_ONCE(error))
+		goto out_unlock;
 
-	loop_config_discard(lo);
 	loop_update_rotational(lo);
 	loop_update_dio(lo);
 	loop_sysfs_init(lo);
@@ -1143,9 +1156,7 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
 	lo->lo_offset = 0;
 	lo->lo_sizelimit = 0;
 	memset(lo->lo_file_name, 0, LO_NAME_SIZE);
-	blk_queue_logical_block_size(lo->lo_queue, 512);
-	blk_queue_physical_block_size(lo->lo_queue, 512);
-	blk_queue_io_min(lo->lo_queue, 512);
+	loop_reconfigure_limits(lo, 512, false);
 	invalidate_disk(lo->lo_disk);
 	loop_sysfs_exit(lo);
 	/* let user-space know about this change */
@@ -1477,9 +1488,7 @@ static int loop_set_block_size(struct loop_device *lo, unsigned long arg)
 	invalidate_bdev(lo->lo_device);
 
 	blk_mq_freeze_queue(lo->lo_queue);
-	blk_queue_logical_block_size(lo->lo_queue, arg);
-	blk_queue_physical_block_size(lo->lo_queue, arg);
-	blk_queue_io_min(lo->lo_queue, arg);
+	err = loop_reconfigure_limits(lo, arg, false);
 	loop_update_dio(lo);
 	blk_mq_unfreeze_queue(lo->lo_queue);
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH 05/15] block: add a max_user_discard_sectors queue limit
  2024-01-22 17:36 ` [PATCH 05/15] block: add a max_user_discard_sectors queue limit Christoph Hellwig
@ 2024-01-22 18:27   ` Keith Busch
  2024-01-22 18:38     ` Christoph Hellwig
  2024-01-24  6:10   ` Hannes Reinecke
  1 sibling, 1 reply; 65+ messages in thread
From: Keith Busch @ 2024-01-22 18:27 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Paolo Bonzini, Stefan Hajnoczi, Martin K. Petersen,
	Damien Le Moal, Sagi Grimberg, linux-block, linux-nvme,
	virtualization

On Mon, Jan 22, 2024 at 06:36:35PM +0100, Christoph Hellwig wrote:
> @@ -174,23 +174,23 @@ static ssize_t queue_discard_max_show(struct request_queue *q, char *page)
>  static ssize_t queue_discard_max_store(struct request_queue *q,
>  				       const char *page, size_t count)
>  {
> -	unsigned long max_discard;
> -	ssize_t ret = queue_var_store(&max_discard, page, count);
> +	unsigned long max_discard_bytes;
> +	ssize_t ret;
>  
> +	ret = queue_var_store(&max_discard_bytes, page, count);
>  	if (ret < 0)
>  		return ret;
>  
> -	if (max_discard & (q->limits.discard_granularity - 1))
> +	if (max_discard_bytes & (q->limits.discard_granularity - 1))
>  		return -EINVAL;
>  
> -	max_discard >>= 9;
> -	if (max_discard > UINT_MAX)
> +	if ((max_discard_bytes >> SECTOR_SHIFT) > UINT_MAX)
>  		return -EINVAL;
>  
> -	if (max_discard > q->limits.max_hw_discard_sectors)
> -		max_discard = q->limits.max_hw_discard_sectors;
> -
> -	q->limits.max_discard_sectors = max_discard;
> +	q->limits.max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
> +	q->limits.max_discard_sectors =
> +		min_not_zero(q->limits.max_hw_discard_sectors,
> +			     q->limits.max_user_discard_sectors);

Shouldn't writing 0 disable discards?

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 05/15] block: add a max_user_discard_sectors queue limit
  2024-01-22 18:27   ` Keith Busch
@ 2024-01-22 18:38     ` Christoph Hellwig
  2024-01-24 15:44       ` Keith Busch
  0 siblings, 1 reply; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-22 18:38 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Jens Axboe, Michael S. Tsirkin, Jason Wang,
	Xuan Zhuo, Paolo Bonzini, Stefan Hajnoczi, Martin K. Petersen,
	Damien Le Moal, Sagi Grimberg, linux-block, linux-nvme,
	virtualization

On Mon, Jan 22, 2024 at 11:27:15AM -0700, Keith Busch wrote:
> > +	q->limits.max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
> > +	q->limits.max_discard_sectors =
> > +		min_not_zero(q->limits.max_hw_discard_sectors,
> > +			     q->limits.max_user_discard_sectors);
> 
> Shouldn't writing 0 disable discards?

I mirror the max_user_sectors behavior here, where 0 disables the
user limiting.  But yes, that would be a behavior change.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 01/15] block: move max_{open,active}_zones to struct queue_limits
  2024-01-22 17:36 ` [PATCH 01/15] block: move max_{open,active}_zones to struct queue_limits Christoph Hellwig
@ 2024-01-23  4:39   ` Damien Le Moal
  2024-01-24  6:00   ` Hannes Reinecke
  1 sibling, 0 replies; 65+ messages in thread
From: Damien Le Moal @ 2024-01-23  4:39 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Keith Busch, Sagi Grimberg,
	linux-block, linux-nvme, virtualization

On 1/23/24 02:36, Christoph Hellwig wrote:
> The maximum number of open and active zones is a limit on the queue
> and should be places there so that we can including it in the upcoming
> queue limits batch update API.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 02/15] block: refactor disk_update_readahead
  2024-01-22 17:36 ` [PATCH 02/15] block: refactor disk_update_readahead Christoph Hellwig
@ 2024-01-23  4:41   ` Damien Le Moal
  2024-01-23  8:40     ` Christoph Hellwig
  2024-01-24  6:01   ` Hannes Reinecke
  1 sibling, 1 reply; 65+ messages in thread
From: Damien Le Moal @ 2024-01-23  4:41 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Keith Busch, Sagi Grimberg,
	linux-block, linux-nvme, virtualization

On 1/23/24 02:36, Christoph Hellwig wrote:
> Factor out a blk_apply_bdi_limits limits helper that can be used with
> an explicit queue_limits argument, which will be useful later.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-settings.c | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 06ea91e51b8b2e..e872b0e168525e 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -85,6 +85,17 @@ void blk_set_stacking_limits(struct queue_limits *lim)
>  }
>  EXPORT_SYMBOL(blk_set_stacking_limits);
>  
> +static void blk_apply_bdi_limits(struct backing_dev_info *bdi,
> +		struct queue_limits *lim)
> +{
> +	/*
> +	 * For read-ahead of large files to be effective, we need to read ahead
> +	 * at least twice the optimal I/O size.
> +	 */
> +	bdi->ra_pages = max(lim->io_opt * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);

Nit: while at it, you could replace that division by PAGE_SIZE with a right
shift by PAGE_SHIFT.

Other than that, looks good to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

> +	bdi->io_pages = lim->max_sectors >> (PAGE_SHIFT - 9);
> +}
> +
>  /**
>   * blk_queue_bounce_limit - set bounce buffer limit for queue
>   * @q: the request queue for the device
> @@ -393,15 +404,7 @@ EXPORT_SYMBOL(blk_queue_alignment_offset);
>  
>  void disk_update_readahead(struct gendisk *disk)
>  {
> -	struct request_queue *q = disk->queue;
> -
> -	/*
> -	 * For read-ahead of large files to be effective, we need to read ahead
> -	 * at least twice the optimal I/O size.
> -	 */
> -	disk->bdi->ra_pages =
> -		max(queue_io_opt(q) * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
> -	disk->bdi->io_pages = queue_max_sectors(q) >> (PAGE_SHIFT - 9);
> +	blk_apply_bdi_limits(disk->bdi, &disk->queue->limits);
>  }
>  EXPORT_SYMBOL_GPL(disk_update_readahead);
>  

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 03/15] block: add an API to atomically update queue limits
  2024-01-22 17:36 ` [PATCH 03/15] block: add an API to atomically update queue limits Christoph Hellwig
@ 2024-01-23  4:50   ` Damien Le Moal
  2024-01-23  8:44     ` Christoph Hellwig
  2024-01-24  6:08   ` Hannes Reinecke
  2024-01-25 10:28   ` John Garry
  2 siblings, 1 reply; 65+ messages in thread
From: Damien Le Moal @ 2024-01-23  4:50 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Keith Busch, Sagi Grimberg,
	linux-block, linux-nvme, virtualization

On 1/23/24 02:36, Christoph Hellwig wrote:
> Add a new queue_limits_{start,commit}_update pair of functions that
> allows taking an atomic snapshot of queue limits, update it, and
> commit it if it passes validity checking.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  block/blk-core.c       |   1 +
>  block/blk-settings.c   | 140 +++++++++++++++++++++++++++++++++++++++++
>  block/blk.h            |   1 +
>  include/linux/blkdev.h |  23 +++++++
>  4 files changed, 165 insertions(+)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 11342af420d0c4..09f4a44a4aa3cc 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -424,6 +424,7 @@ struct request_queue *blk_alloc_queue(int node_id)
>  	mutex_init(&q->debugfs_mutex);
>  	mutex_init(&q->sysfs_lock);
>  	mutex_init(&q->sysfs_dir_lock);
> +	mutex_init(&q->limits_lock);
>  	mutex_init(&q->rq_qos_mutex);
>  	spin_lock_init(&q->queue_lock);
>  
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index e872b0e168525e..b6692143ccf034 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -96,6 +96,146 @@ static void blk_apply_bdi_limits(struct backing_dev_info *bdi,
>  	bdi->io_pages = lim->max_sectors >> (PAGE_SHIFT - 9);
>  }
>  
> +static int blk_validate_zoned_limits(struct queue_limits *lim)
> +{
> +	if (!lim->zoned) {
> +		if (WARN_ON_ONCE(lim->max_open_zones) ||
> +		    WARN_ON_ONCE(lim->max_active_zones) ||
> +		    WARN_ON_ONCE(lim->zone_write_granularity) ||
> +		    WARN_ON_ONCE(lim->max_zone_append_sectors))
> +			return -EINVAL;
> +		return 0;
> +	}
> +
> +	if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_BLK_DEV_ZONED)))
> +		return -EINVAL;
> +
> +	if (lim->zone_write_granularity < lim->logical_block_size)
> +		lim->zone_write_granularity = lim->logical_block_size;

This check and change needs to be against the physical block size. Otherwise,
SMR drives will choke on it.

> +
> +	if (lim->max_zone_append_sectors) {
> +		if (WARN_ON_ONCE(!lim->chunk_sectors))
> +			return -EINVAL;

chunk_sectors is the zone size. So we should probably check that it is set after
the IS_ENABLED() check earlier in the function, no ?

> +		lim->max_zone_append_sectors =
> +			min3(lim->max_hw_sectors,
> +			     lim->max_zone_append_sectors,
> +			     lim->chunk_sectors);
> +	}
> +
> +	return 0;
> +}
> +
> +int blk_validate_limits(struct queue_limits *lim)
> +{
> +	unsigned int max_hw_sectors;
> +
> +	if (!lim->logical_block_size)
> +		lim->logical_block_size = SECTOR_SIZE;
> +
> +	if (!lim->physical_block_size)
> +		lim->physical_block_size = SECTOR_SIZE;
> +	if (lim->physical_block_size < lim->logical_block_size)
> +		lim->physical_block_size = lim->physical_block_size;
> +
> +	if (!lim->io_min)
> +		lim->io_min = SECTOR_SIZE;

This should be lim->logical_block_size, no ?

> +	if (lim->io_min < lim->physical_block_size)
> +		lim->io_min = lim->physical_block_size;

But then given that log <= phys, you could set io_min to physical_block_size if
it is not set.

> +
> +	if (!lim->max_hw_sectors)
> +		lim->max_hw_sectors = BLK_SAFE_MAX_SECTORS;
> +	if (WARN_ON_ONCE(lim->max_hw_sectors < PAGE_SIZE / SECTOR_SIZE))

You can use PAGE_SECTORS here.

> +		return -EINVAL;
> +
> +	lim->max_hw_sectors = round_down(lim->max_hw_sectors,
> +			lim->logical_block_size >> SECTOR_SHIFT);
> +
> +	max_hw_sectors = min_not_zero(lim->max_hw_sectors,
> +				lim->max_dev_sectors);
> +	if (lim->max_user_sectors) {
> +		if (lim->max_user_sectors > max_hw_sectors ||
> +		    lim->max_user_sectors < PAGE_SIZE / SECTOR_SIZE)

same here.

> +			return -EINVAL;
> +		lim->max_sectors = min(max_hw_sectors, lim->max_user_sectors);
> +	} else {
> +		lim->max_sectors = min(max_hw_sectors, BLK_DEF_MAX_SECTORS_CAP);
> +	}
> +
> +	lim->max_sectors = round_down(lim->max_sectors,
> +			lim->logical_block_size >> SECTOR_SHIFT);
> +
> +	if (!lim->max_segments)
> +		lim->max_segments = BLK_MAX_SEGMENTS;
> +
> +	lim->max_discard_sectors = lim->max_hw_discard_sectors;
> +	if (!lim->max_discard_segments)
> +		lim->max_discard_segments = 1;
> +
> +	if (lim->discard_granularity < lim->physical_block_size)
> +		lim->discard_granularity = lim->physical_block_size;
> +
> +	if (!lim->seg_boundary_mask)
> +		lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
> +	if (WARN_ON_ONCE(lim->seg_boundary_mask < PAGE_SIZE - 1))
> +		return -EINVAL;
> +
> +	if (!lim->max_segment_size)
> +		lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
> +	if (WARN_ON_ONCE(lim->max_segment_size < PAGE_SIZE))
> +		return -EINVAL;
> +
> +	/*
> +	 * Devices that require a virtual boundary do not support scatter/gather
> +	 * I/O natively, but instead require a descriptor list entry for each
> +	 * page (which might not be idential to the Linux PAGE_SIZE).  Because
> +	 * of that they are not limited by our notion of "segment size".
> +	 */
> +	if (lim->virt_boundary_mask) {
> +		if (WARN_ON_ONCE(lim->max_segment_size &&
> +				 lim->max_segment_size != UINT_MAX))
> +			return -EINVAL;
> +		lim->max_segment_size = UINT_MAX;
> +	}
> +
> +	if (!lim->dma_alignment)
> +		lim->dma_alignment = SECTOR_SIZE - 1;
> +	if (WARN_ON_ONCE(lim->dma_alignment > PAGE_SIZE))
> +		return -EINVAL;
> +
> +	if (lim->alignment_offset) {
> +		lim->alignment_offset &= (lim->physical_block_size - 1);
> +		lim->misaligned = 0;
> +	}
> +
> +	return blk_validate_zoned_limits(lim);
> +}
> +
> +/**
> + * queue_limits_commit_update - commit an atomic update of queue limits
> + * @q:		queue to update
> + * @lim:	limits to apply
> + *
> + * Apply the limits in @lim that were obtained from queue_limits_start_update()
> + * and updated by the caller to @q.
> + *
> + * Returns 0 if successful, else a negative error code.
> + */
> +int queue_limits_commit_update(struct request_queue *q,
> +		struct queue_limits *lim)
> +	__releases(q->limits_lock)
> +{
> +	int error = blk_validate_limits(lim);
> +
> +	if (!error) {
> +		q->limits = *lim;
> +		if (q->disk)
> +			blk_apply_bdi_limits(q->disk->bdi, lim);
> +	}
> +	mutex_unlock(&q->limits_lock);
> +	return error;
> +}
> +EXPORT_SYMBOL_GPL(queue_limits_commit_update);
> +
>  /**
>   * blk_queue_bounce_limit - set bounce buffer limit for queue
>   * @q: the request queue for the device
> diff --git a/block/blk.h b/block/blk.h
> index 1ef920f72e0f87..58b5dbac2a487d 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -447,6 +447,7 @@ static inline void bio_release_page(struct bio *bio, struct page *page)
>  		unpin_user_page(page);
>  }
>  
> +int blk_validate_limits(struct queue_limits *lim);
>  struct request_queue *blk_alloc_queue(int node_id);
>  
>  int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 4a2e82c7971c86..5b5d3b238de1e7 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -473,6 +473,7 @@ struct request_queue {
>  
>  	struct mutex		sysfs_lock;
>  	struct mutex		sysfs_dir_lock;
> +	struct mutex		limits_lock;
>  
>  	/*
>  	 * for reusing dead hctx instance in case of updating
> @@ -861,6 +862,28 @@ static inline unsigned int blk_chunk_sectors_left(sector_t offset,
>  	return chunk_sectors - (offset & (chunk_sectors - 1));
>  }
>  
> +/**
> + * queue_limits_start_update - start an atomic update of queue limits
> + * @q:		queue to update
> + *
> + * This functions starts an atomic update of the queue limits.  It takes a lock
> + * to prevent other updates and returns a snapshot of the current limits that
> + * the caller can modify.  The caller must call queue_limits_commit_update()
> + * to finish the update.
> + *
> + * Context: process context.  The caller must have frozen the queue or ensured
> + * that there is outstanding I/O by other means.
> + */
> +static inline struct queue_limits
> +queue_limits_start_update(struct request_queue *q)
> +	__acquires(q->limits_lock)
> +{
> +	mutex_lock(&q->limits_lock);
> +	return q->limits;
> +}
> +int queue_limits_commit_update(struct request_queue *q,
> +		struct queue_limits *lim);
> +
>  /*
>   * Access functions for manipulating queue properties
>   */

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 04/15] block: use queue_limits_commit_update in queue_max_sectors_store
  2024-01-22 17:36 ` [PATCH 04/15] block: use queue_limits_commit_update in queue_max_sectors_store Christoph Hellwig
@ 2024-01-23  5:07   ` Damien Le Moal
  2024-01-24  6:09   ` Hannes Reinecke
  1 sibling, 0 replies; 65+ messages in thread
From: Damien Le Moal @ 2024-01-23  5:07 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Keith Busch, Sagi Grimberg,
	linux-block, linux-nvme, virtualization

On 1/23/24 02:36, Christoph Hellwig wrote:
> Convert queue_max_sectors_store to use queue_limits_commit_update to
> check and updated the max_sectors limit and freeze the queue before

s/updated/update

> doing so to ensure we don't have request in flight while changing

s/request/requests

> the limits.
> 
> Note that this removes the previously held queue_lock that doesn't
> protect against any other read or writer.

s/read/reader

> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Other than these typos, looks good to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 06/15] nvme: remove the hack to not update the discard limits in nvme_config_discard
  2024-01-22 17:36 ` [PATCH 06/15] nvme: remove the hack to not update the discard limits in nvme_config_discard Christoph Hellwig
@ 2024-01-23  5:12   ` Damien Le Moal
  2024-01-23  8:45     ` Christoph Hellwig
  2024-01-24  6:11   ` Hannes Reinecke
  1 sibling, 1 reply; 65+ messages in thread
From: Damien Le Moal @ 2024-01-23  5:12 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Keith Busch, Sagi Grimberg,
	linux-block, linux-nvme, virtualization

On 1/23/24 02:36, Christoph Hellwig wrote:
> Now that the block layer tracks a separate user max discard limit, there
> is no need to prevent nvme from updating it on controller capability
> changes.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/nvme/host/core.c | 10 ----------
>  1 file changed, 10 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 85ab0fcf9e8864..ef70268dccbc5a 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -1754,16 +1754,6 @@ static void nvme_config_discard(struct nvme_ctrl *ctrl, struct gendisk *disk,
>  	BUILD_BUG_ON(PAGE_SIZE / sizeof(struct nvme_dsm_range) <
>  			NVME_DSM_MAX_RANGES);
>  
> -	/*
> -	 * If discard is already enabled, don't reset queue limits.
> -	 *
> -	 * This works around the fact that the block layer can't cope well with
> -	 * updating the hardware limits when overridden through sysfs.  This is
> -	 * harmless because discard limits in NVMe are purely advisory.
> -	 */
> -	if (queue->limits.max_discard_sectors)
> -		return;
> -
>  	blk_queue_max_discard_sectors(queue, max_discard_sectors);

This function references max_user_discard_sectors but that access is done
without holding the queue limits mutex. Is that safe ?

>  	if (ctrl->dmrl)
>  		blk_queue_max_discard_segments(queue, ctrl->dmrl);

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 07/15] block: use queue_limits_commit_update in queue_discard_max_store
  2024-01-22 17:36 ` [PATCH 07/15] block: use queue_limits_commit_update in queue_discard_max_store Christoph Hellwig
@ 2024-01-23  5:16   ` Damien Le Moal
  2024-01-24  6:12   ` Hannes Reinecke
  1 sibling, 0 replies; 65+ messages in thread
From: Damien Le Moal @ 2024-01-23  5:16 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Keith Busch, Sagi Grimberg,
	linux-block, linux-nvme, virtualization

On 1/23/24 02:36, Christoph Hellwig wrote:
> Convert queue_discard_max_store to use queue_limits_commit_update to
> check and updated the max_sectors limit and freeze the queue before

s/updated/update
s/max_sectors/max_discard_sectors

> doing so to ensure we don't have request in flight while changing

s/request/requests

> the limits.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

With the typos fixed, looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

> ---
>  block/blk-sysfs.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 54e10604ddb1dd..8c8f69d8ba48ee 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -175,7 +175,9 @@ static ssize_t queue_discard_max_store(struct request_queue *q,
>  				       const char *page, size_t count)
>  {
>  	unsigned long max_discard_bytes;
> +	struct queue_limits lim;
>  	ssize_t ret;
> +	int err;
>  
>  	ret = queue_var_store(&max_discard_bytes, page, count);
>  	if (ret < 0)
> @@ -187,10 +189,14 @@ static ssize_t queue_discard_max_store(struct request_queue *q,
>  	if ((max_discard_bytes >> SECTOR_SHIFT) > UINT_MAX)
>  		return -EINVAL;
>  
> -	q->limits.max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
> -	q->limits.max_discard_sectors =
> -		min_not_zero(q->limits.max_hw_discard_sectors,
> -			     q->limits.max_user_discard_sectors);
> +	blk_mq_freeze_queue(q);
> +	lim = queue_limits_start_update(q);
> +	lim.max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
> +	err = queue_limits_commit_update(q, &lim);
> +	blk_mq_unfreeze_queue(q);
> +
> +	if (err)
> +		return err;
>  	return ret;
>  }
>  

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 08/15] block: pass a queue_limits argument to blk_alloc_queue
  2024-01-22 17:36 ` [PATCH 08/15] block: pass a queue_limits argument to blk_alloc_queue Christoph Hellwig
@ 2024-01-23  5:17   ` Damien Le Moal
  2024-01-24  6:14   ` Hannes Reinecke
  2024-01-25  9:45   ` John Garry
  2 siblings, 0 replies; 65+ messages in thread
From: Damien Le Moal @ 2024-01-23  5:17 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Keith Busch, Sagi Grimberg,
	linux-block, linux-nvme, virtualization

On 1/23/24 02:36, Christoph Hellwig wrote:
> Pass a queue_limits to blk_alloc_queue and apply it if non-NULL.  This
> will allow allocating queues with valid queue limits instead of setting
> the values one at a time later.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 09/15] block: pass a queue_limits argument to blk_mq_init_queue
  2024-01-22 17:36 ` [PATCH 09/15] block: pass a queue_limits argument to blk_mq_init_queue Christoph Hellwig
@ 2024-01-23  5:19   ` Damien Le Moal
  2024-01-24  6:16   ` Hannes Reinecke
  1 sibling, 0 replies; 65+ messages in thread
From: Damien Le Moal @ 2024-01-23  5:19 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Keith Busch, Sagi Grimberg,
	linux-block, linux-nvme, virtualization

On 1/23/24 02:36, Christoph Hellwig wrote:
> Pass a queue_limits to blk_mq_init_queue and apply it if non-NULL.  This
> will allow allocating queues with valid queue limits instead of setting
> the values one at a time later.
> 
> Also rename the function to blk_mq_alloc_queue as that is a much better
> name for a function that allocates a queue and always pass the queuedata
> argument instead of having a separate version for the extra argument.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 10/15] block: pass a queue_limits argument to blk_mq_alloc_disk
  2024-01-22 17:36 ` [PATCH 10/15] block: pass a queue_limits argument to blk_mq_alloc_disk Christoph Hellwig
@ 2024-01-23  5:22   ` Damien Le Moal
  2024-01-24  6:17   ` Hannes Reinecke
  1 sibling, 0 replies; 65+ messages in thread
From: Damien Le Moal @ 2024-01-23  5:22 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Keith Busch, Sagi Grimberg,
	linux-block, linux-nvme, virtualization

On 1/23/24 02:36, Christoph Hellwig wrote:
> Pass a queue_limits to blk_mq_alloc_disk and apply it if non-NULL.  This
> will allow allocating queues with valid queue limits instead of setting
> the values one at a time later.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 11/15] virtio_blk: split virtblk_probe
  2024-01-22 17:36 ` [PATCH 11/15] virtio_blk: split virtblk_probe Christoph Hellwig
@ 2024-01-23  5:26   ` Damien Le Moal
  2024-01-23 14:16   ` Stefan Hajnoczi
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 65+ messages in thread
From: Damien Le Moal @ 2024-01-23  5:26 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Keith Busch, Sagi Grimberg,
	linux-block, linux-nvme, virtualization

On 1/23/24 02:36, Christoph Hellwig wrote:
> Split out a virtblk_read_limits helper that just reads the various
> queue limits to separate it from the higher level probing logic.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 12/15] virtio_blk: pass queue_limits to blk_mq_alloc_disk
  2024-01-22 17:36 ` [PATCH 12/15] virtio_blk: pass queue_limits to blk_mq_alloc_disk Christoph Hellwig
@ 2024-01-23  5:27   ` Damien Le Moal
  2024-01-23 14:19   ` Stefan Hajnoczi
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 65+ messages in thread
From: Damien Le Moal @ 2024-01-23  5:27 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Keith Busch, Sagi Grimberg,
	linux-block, linux-nvme, virtualization

On 1/23/24 02:36, Christoph Hellwig wrote:
> Call virtblk_read_limits and most of virtblk_probe_zoned_device before
> allocating the gendisk and thus request_queue and make them read into
> a queue_limits structure instead.  Pass this initialized queue_limits
> to blk_mq_alloc_disk to set the queue up with the right parameters
> from the start and only leave a few final touches for zoned devices
> to be done just before adding the disk.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks good to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 13/15] loop: cleanup loop_config_discard
  2024-01-22 17:36 ` [PATCH 13/15] loop: cleanup loop_config_discard Christoph Hellwig
@ 2024-01-23  5:28   ` Damien Le Moal
  2024-01-24  5:29   ` Chaitanya Kulkarni
  2024-01-24  6:21   ` Hannes Reinecke
  2 siblings, 0 replies; 65+ messages in thread
From: Damien Le Moal @ 2024-01-23  5:28 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Keith Busch, Sagi Grimberg,
	linux-block, linux-nvme, virtualization

On 1/23/24 02:36, Christoph Hellwig wrote:
> Initialize the local variables for the discard max sectors and
> granularity to zero as a sensible default, and then merge the
> calls assigning them to the queue limits.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 14/15] loop: pass queue_limits to blk_mq_alloc_disk
  2024-01-22 17:36 ` [PATCH 14/15] loop: pass queue_limits to blk_mq_alloc_disk Christoph Hellwig
@ 2024-01-23  5:29   ` Damien Le Moal
  2024-01-24  6:22   ` Hannes Reinecke
  1 sibling, 0 replies; 65+ messages in thread
From: Damien Le Moal @ 2024-01-23  5:29 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Keith Busch, Sagi Grimberg,
	linux-block, linux-nvme, virtualization

On 1/23/24 02:36, Christoph Hellwig wrote:
> Pass the max_hw_sector limit loop sets at initialization time directly to
> blk_mq_alloc_disk instead of updating it right after the allocation.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 15/15] loop: use the atomic queue limits update API
  2024-01-22 17:36 ` [PATCH 15/15] loop: use the atomic queue limits update API Christoph Hellwig
@ 2024-01-23  5:31   ` Damien Le Moal
  2024-01-24  6:22   ` Hannes Reinecke
  1 sibling, 0 replies; 65+ messages in thread
From: Damien Le Moal @ 2024-01-23  5:31 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Keith Busch, Sagi Grimberg,
	linux-block, linux-nvme, virtualization

On 1/23/24 02:36, Christoph Hellwig wrote:
> Pass the default limits to blk_mq_alloc_disk and then use the
> queue_limits_{start,commit}_update API to change the limits in an
> atomic way on existing loop gendisks.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Looks OK to me.

Reviewed-by: Damien Le Moal <dlemoal@kernel.org>

-- 
Damien Le Moal
Western Digital Research


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 02/15] block: refactor disk_update_readahead
  2024-01-23  4:41   ` Damien Le Moal
@ 2024-01-23  8:40     ` Christoph Hellwig
  0 siblings, 0 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-23  8:40 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, Jens Axboe, Michael S. Tsirkin, Jason Wang,
	Xuan Zhuo, Paolo Bonzini, Stefan Hajnoczi, Martin K. Petersen,
	Keith Busch, Sagi Grimberg, linux-block, linux-nvme,
	virtualization

On Tue, Jan 23, 2024 at 01:41:05PM +0900, Damien Le Moal wrote:
> > +{
> > +	/*
> > +	 * For read-ahead of large files to be effective, we need to read ahead
> > +	 * at least twice the optimal I/O size.
> > +	 */
> > +	bdi->ra_pages = max(lim->io_opt * 2 / PAGE_SIZE, VM_READAHEAD_PAGES);
> 
> Nit: while at it, you could replace that division by PAGE_SIZE with a right
> shift by PAGE_SHIFT.

I don't really see the point, this is everything but a fast path,
and for simple constant divisions compilers aren't actually bad at
doing optimizations.


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 03/15] block: add an API to atomically update queue limits
  2024-01-23  4:50   ` Damien Le Moal
@ 2024-01-23  8:44     ` Christoph Hellwig
  0 siblings, 0 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-23  8:44 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, Jens Axboe, Michael S. Tsirkin, Jason Wang,
	Xuan Zhuo, Paolo Bonzini, Stefan Hajnoczi, Martin K. Petersen,
	Keith Busch, Sagi Grimberg, linux-block, linux-nvme,
	virtualization

On Tue, Jan 23, 2024 at 01:50:32PM +0900, Damien Le Moal wrote:
> > +			return -EINVAL;
> > +		return 0;
> > +	}
> > +
> > +	if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_BLK_DEV_ZONED)))
> > +		return -EINVAL;
> > +
> > +	if (lim->zone_write_granularity < lim->logical_block_size)
> > +		lim->zone_write_granularity = lim->logical_block_size;
> 
> This check and change needs to be against the physical block size. Otherwise,
> SMR drives will choke on it.

It probably should, but this mirrors what is done in
blk_queue_zone_write_granularity.  And I want to match that behavior
at least for now, we can then improve and document it once this is
the only interface to validate the limits.

> > +
> > +	if (lim->max_zone_append_sectors) {
> > +		if (WARN_ON_ONCE(!lim->chunk_sectors))
> > +			return -EINVAL;
> 
> chunk_sectors is the zone size. So we should probably check that it is set after
> the IS_ENABLED() check earlier in the function, no ?

Possibly.  In fact I'm wondering where the check comes from, as we
don't seem to have it in the existing helpers.

> > +	if (!lim->logical_block_size)
> > +		lim->logical_block_size = SECTOR_SIZE;
> > +
> > +	if (!lim->physical_block_size)
> > +		lim->physical_block_size = SECTOR_SIZE;
> > +	if (lim->physical_block_size < lim->logical_block_size)
> > +		lim->physical_block_size = lim->physical_block_size;
> > +
> > +	if (!lim->io_min)
> > +		lim->io_min = SECTOR_SIZE;
> 
> This should be lim->logical_block_size, no ?

This comes from the default in blk_set_default_limits.

> 
> > +	if (lim->io_min < lim->physical_block_size)
> > +		lim->io_min = lim->physical_block_size;
> 
> But then given that log <= phys, you could set io_min to physical_block_size if
> it is not set.

Which is what we do here, so the above is actually redundant and
can be removed.

> > +	if (!lim->max_hw_sectors)
> > +		lim->max_hw_sectors = BLK_SAFE_MAX_SECTORS;
> > +	if (WARN_ON_ONCE(lim->max_hw_sectors < PAGE_SIZE / SECTOR_SIZE))
> 
> You can use PAGE_SECTORS here.

Yes.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 06/15] nvme: remove the hack to not update the discard limits in nvme_config_discard
  2024-01-23  5:12   ` Damien Le Moal
@ 2024-01-23  8:45     ` Christoph Hellwig
  0 siblings, 0 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-23  8:45 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, Jens Axboe, Michael S. Tsirkin, Jason Wang,
	Xuan Zhuo, Paolo Bonzini, Stefan Hajnoczi, Martin K. Petersen,
	Keith Busch, Sagi Grimberg, linux-block, linux-nvme,
	virtualization

On Tue, Jan 23, 2024 at 02:12:37PM +0900, Damien Le Moal wrote:
> >  	blk_queue_max_discard_sectors(queue, max_discard_sectors);
> 
> This function references max_user_discard_sectors but that access is done
> without holding the queue limits mutex. Is that safe ?

No.  But fixing nvme will be done in a follow series.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 11/15] virtio_blk: split virtblk_probe
  2024-01-22 17:36 ` [PATCH 11/15] virtio_blk: split virtblk_probe Christoph Hellwig
  2024-01-23  5:26   ` Damien Le Moal
@ 2024-01-23 14:16   ` Stefan Hajnoczi
  2024-01-23 20:23   ` Chaitanya Kulkarni
  2024-01-24  6:19   ` Hannes Reinecke
  3 siblings, 0 replies; 65+ messages in thread
From: Stefan Hajnoczi @ 2024-01-23 14:16 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Paolo Bonzini, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

[-- Attachment #1: Type: text/plain, Size: 450 bytes --]

On Mon, Jan 22, 2024 at 06:36:41PM +0100, Christoph Hellwig wrote:
> Split out a virtblk_read_limits helper that just reads the various
> queue limits to separate it from the higher level probing logic.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/block/virtio_blk.c | 193 +++++++++++++++++++------------------
>  1 file changed, 101 insertions(+), 92 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 12/15] virtio_blk: pass queue_limits to blk_mq_alloc_disk
  2024-01-22 17:36 ` [PATCH 12/15] virtio_blk: pass queue_limits to blk_mq_alloc_disk Christoph Hellwig
  2024-01-23  5:27   ` Damien Le Moal
@ 2024-01-23 14:19   ` Stefan Hajnoczi
  2024-01-24  6:21   ` Hannes Reinecke
  2024-06-28 14:25   ` John Garry
  3 siblings, 0 replies; 65+ messages in thread
From: Stefan Hajnoczi @ 2024-01-23 14:19 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Paolo Bonzini, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

[-- Attachment #1: Type: text/plain, Size: 715 bytes --]

On Mon, Jan 22, 2024 at 06:36:42PM +0100, Christoph Hellwig wrote:
> Call virtblk_read_limits and most of virtblk_probe_zoned_device before
> allocating the gendisk and thus request_queue and make them read into
> a queue_limits structure instead.  Pass this initialized queue_limits
> to blk_mq_alloc_disk to set the queue up with the right parameters
> from the start and only leave a few final touches for zoned devices
> to be done just before adding the disk.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  drivers/block/virtio_blk.c | 130 ++++++++++++++++++-------------------
>  1 file changed, 64 insertions(+), 66 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 11/15] virtio_blk: split virtblk_probe
  2024-01-22 17:36 ` [PATCH 11/15] virtio_blk: split virtblk_probe Christoph Hellwig
  2024-01-23  5:26   ` Damien Le Moal
  2024-01-23 14:16   ` Stefan Hajnoczi
@ 2024-01-23 20:23   ` Chaitanya Kulkarni
  2024-01-24  6:19   ` Hannes Reinecke
  3 siblings, 0 replies; 65+ messages in thread
From: Chaitanya Kulkarni @ 2024-01-23 20:23 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Jens Axboe, Jason Wang, Xuan Zhuo,
	Paolo Bonzini, Stefan Hajnoczi, Martin K. Petersen,
	Damien Le Moal, Keith Busch, Sagi Grimberg,
	linux-block@vger.kernel.org, linux-nvme@lists.infradead.org,
	virtualization@lists.linux.dev

On 1/22/2024 9:36 AM, Christoph Hellwig wrote:
> Split out a virtblk_read_limits helper that just reads the various
> queue limits to separate it from the higher level probing logic.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---

Thanks for doing this.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 13/15] loop: cleanup loop_config_discard
  2024-01-22 17:36 ` [PATCH 13/15] loop: cleanup loop_config_discard Christoph Hellwig
  2024-01-23  5:28   ` Damien Le Moal
@ 2024-01-24  5:29   ` Chaitanya Kulkarni
  2024-01-24  6:21   ` Hannes Reinecke
  2 siblings, 0 replies; 65+ messages in thread
From: Chaitanya Kulkarni @ 2024-01-24  5:29 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block@vger.kernel.org,
	linux-nvme@lists.infradead.org, virtualization@lists.linux.dev

On 1/22/24 09:36, Christoph Hellwig wrote:
> Initialize the local variables for the discard max sectors and
> granularity to zero as a sensible default, and then merge the
> calls assigning them to the queue limits.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck



^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 01/15] block: move max_{open,active}_zones to struct queue_limits
  2024-01-22 17:36 ` [PATCH 01/15] block: move max_{open,active}_zones to struct queue_limits Christoph Hellwig
  2024-01-23  4:39   ` Damien Le Moal
@ 2024-01-24  6:00   ` Hannes Reinecke
  1 sibling, 0 replies; 65+ messages in thread
From: Hannes Reinecke @ 2024-01-24  6:00 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

On 1/22/24 18:36, Christoph Hellwig wrote:
> The maximum number of open and active zones is a limit on the queue
> and should be places there so that we can including it in the upcoming
> queue limits batch update API.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   include/linux/blkdev.h | 12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 02/15] block: refactor disk_update_readahead
  2024-01-22 17:36 ` [PATCH 02/15] block: refactor disk_update_readahead Christoph Hellwig
  2024-01-23  4:41   ` Damien Le Moal
@ 2024-01-24  6:01   ` Hannes Reinecke
  1 sibling, 0 replies; 65+ messages in thread
From: Hannes Reinecke @ 2024-01-24  6:01 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

On 1/22/24 18:36, Christoph Hellwig wrote:
> Factor out a blk_apply_bdi_limits limits helper that can be used with
> an explicit queue_limits argument, which will be useful later.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-settings.c | 21 ++++++++++++---------
>   1 file changed, 12 insertions(+), 9 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 03/15] block: add an API to atomically update queue limits
  2024-01-22 17:36 ` [PATCH 03/15] block: add an API to atomically update queue limits Christoph Hellwig
  2024-01-23  4:50   ` Damien Le Moal
@ 2024-01-24  6:08   ` Hannes Reinecke
  2024-01-24  9:21     ` Christoph Hellwig
  2024-01-25 10:28   ` John Garry
  2 siblings, 1 reply; 65+ messages in thread
From: Hannes Reinecke @ 2024-01-24  6:08 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

On 1/22/24 18:36, Christoph Hellwig wrote:
> Add a new queue_limits_{start,commit}_update pair of functions that
> allows taking an atomic snapshot of queue limits, update it, and
> commit it if it passes validity checking.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-core.c       |   1 +
>   block/blk-settings.c   | 140 +++++++++++++++++++++++++++++++++++++++++
>   block/blk.h            |   1 +
>   include/linux/blkdev.h |  23 +++++++
>   4 files changed, 165 insertions(+)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 11342af420d0c4..09f4a44a4aa3cc 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -424,6 +424,7 @@ struct request_queue *blk_alloc_queue(int node_id)
>   	mutex_init(&q->debugfs_mutex);
>   	mutex_init(&q->sysfs_lock);
>   	mutex_init(&q->sysfs_dir_lock);
> +	mutex_init(&q->limits_lock);
>   	mutex_init(&q->rq_qos_mutex);
>   	spin_lock_init(&q->queue_lock);
>   
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index e872b0e168525e..b6692143ccf034 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -96,6 +96,146 @@ static void blk_apply_bdi_limits(struct backing_dev_info *bdi,
>   	bdi->io_pages = lim->max_sectors >> (PAGE_SHIFT - 9);
>   }
>   
> +static int blk_validate_zoned_limits(struct queue_limits *lim)
> +{
> +	if (!lim->zoned) {
> +		if (WARN_ON_ONCE(lim->max_open_zones) ||
> +		    WARN_ON_ONCE(lim->max_active_zones) ||
> +		    WARN_ON_ONCE(lim->zone_write_granularity) ||
> +		    WARN_ON_ONCE(lim->max_zone_append_sectors))
> +			return -EINVAL;
> +		return 0;
> +	}
> +
> +	if (WARN_ON_ONCE(!IS_ENABLED(CONFIG_BLK_DEV_ZONED)))
> +		return -EINVAL;
> +
> +	if (lim->zone_write_granularity < lim->logical_block_size)
> +		lim->zone_write_granularity = lim->logical_block_size;
> +
> +	if (lim->max_zone_append_sectors) {
> +		if (WARN_ON_ONCE(!lim->chunk_sectors))
> +			return -EINVAL;
> +		lim->max_zone_append_sectors =
> +			min3(lim->max_hw_sectors,
> +			     lim->max_zone_append_sectors,
> +			     lim->chunk_sectors);
> +	}
> +
> +	return 0;
> +}
> +
> +int blk_validate_limits(struct queue_limits *lim)
> +{
> +	unsigned int max_hw_sectors;
> +
> +	if (!lim->logical_block_size)
> +		lim->logical_block_size = SECTOR_SIZE;
> +
> +	if (!lim->physical_block_size)
> +		lim->physical_block_size = SECTOR_SIZE;
> +	if (lim->physical_block_size < lim->logical_block_size)
> +		lim->physical_block_size = lim->physical_block_size;
> +
> +	if (!lim->io_min)
> +		lim->io_min = SECTOR_SIZE;
> +	if (lim->io_min < lim->physical_block_size)
> +		lim->io_min = lim->physical_block_size;
> +
> +	if (!lim->max_hw_sectors)
> +		lim->max_hw_sectors = BLK_SAFE_MAX_SECTORS;
> +	if (WARN_ON_ONCE(lim->max_hw_sectors < PAGE_SIZE / SECTOR_SIZE))
> +		return -EINVAL;
> +
> +	lim->max_hw_sectors = round_down(lim->max_hw_sectors,
> +			lim->logical_block_size >> SECTOR_SHIFT);
> +
> +	max_hw_sectors = min_not_zero(lim->max_hw_sectors,
> +				lim->max_dev_sectors);
> +	if (lim->max_user_sectors) {
> +		if (lim->max_user_sectors > max_hw_sectors ||
> +		    lim->max_user_sectors < PAGE_SIZE / SECTOR_SIZE)
> +			return -EINVAL;
> +		lim->max_sectors = min(max_hw_sectors, lim->max_user_sectors);
> +	} else {
> +		lim->max_sectors = min(max_hw_sectors, BLK_DEF_MAX_SECTORS_CAP);
> +	}
> +
> +	lim->max_sectors = round_down(lim->max_sectors,
> +			lim->logical_block_size >> SECTOR_SHIFT);
> +
> +	if (!lim->max_segments)
> +		lim->max_segments = BLK_MAX_SEGMENTS;
> +
> +	lim->max_discard_sectors = lim->max_hw_discard_sectors;
> +	if (!lim->max_discard_segments)
> +		lim->max_discard_segments = 1;
> +
> +	if (lim->discard_granularity < lim->physical_block_size)
> +		lim->discard_granularity = lim->physical_block_size;
> +
> +	if (!lim->seg_boundary_mask)
> +		lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
> +	if (WARN_ON_ONCE(lim->seg_boundary_mask < PAGE_SIZE - 1))
> +		return -EINVAL;
> +
> +	if (!lim->max_segment_size)
> +		lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
> +	if (WARN_ON_ONCE(lim->max_segment_size < PAGE_SIZE))
> +		return -EINVAL;
> +
> +	/*
> +	 * Devices that require a virtual boundary do not support scatter/gather
> +	 * I/O natively, but instead require a descriptor list entry for each
> +	 * page (which might not be idential to the Linux PAGE_SIZE).  Because
> +	 * of that they are not limited by our notion of "segment size".
> +	 */
> +	if (lim->virt_boundary_mask) {
> +		if (WARN_ON_ONCE(lim->max_segment_size &&
> +				 lim->max_segment_size != UINT_MAX))
> +			return -EINVAL;
> +		lim->max_segment_size = UINT_MAX;
> +	}
> +
> +	if (!lim->dma_alignment)
> +		lim->dma_alignment = SECTOR_SIZE - 1;
> +	if (WARN_ON_ONCE(lim->dma_alignment > PAGE_SIZE))
> +		return -EINVAL;
> +
> +	if (lim->alignment_offset) {
> +		lim->alignment_offset &= (lim->physical_block_size - 1);
> +		lim->misaligned = 0;
> +	}
> +
> +	return blk_validate_zoned_limits(lim);
> +}
> +
> +/**
> + * queue_limits_commit_update - commit an atomic update of queue limits
> + * @q:		queue to update
> + * @lim:	limits to apply
> + *
> + * Apply the limits in @lim that were obtained from queue_limits_start_update()
> + * and updated by the caller to @q.
> + *
> + * Returns 0 if successful, else a negative error code.
> + */
> +int queue_limits_commit_update(struct request_queue *q,
> +		struct queue_limits *lim)
> +	__releases(q->limits_lock)
> +{
> +	int error = blk_validate_limits(lim);
> +
> +	if (!error) {
> +		q->limits = *lim;
> +		if (q->disk)
> +			blk_apply_bdi_limits(q->disk->bdi, lim);
> +	}
> +	mutex_unlock(&q->limits_lock);
> +	return error;
> +}
> +EXPORT_SYMBOL_GPL(queue_limits_commit_update);
> +
>   /**
>    * blk_queue_bounce_limit - set bounce buffer limit for queue
>    * @q: the request queue for the device
> diff --git a/block/blk.h b/block/blk.h
> index 1ef920f72e0f87..58b5dbac2a487d 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -447,6 +447,7 @@ static inline void bio_release_page(struct bio *bio, struct page *page)
>   		unpin_user_page(page);
>   }
>   
> +int blk_validate_limits(struct queue_limits *lim);
>   struct request_queue *blk_alloc_queue(int node_id);
>   
>   int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 4a2e82c7971c86..5b5d3b238de1e7 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -473,6 +473,7 @@ struct request_queue {
>   
>   	struct mutex		sysfs_lock;
>   	struct mutex		sysfs_dir_lock;
> +	struct mutex		limits_lock;
>   
>   	/*
>   	 * for reusing dead hctx instance in case of updating
> @@ -861,6 +862,28 @@ static inline unsigned int blk_chunk_sectors_left(sector_t offset,
>   	return chunk_sectors - (offset & (chunk_sectors - 1));
>   }
>   
> +/**
> + * queue_limits_start_update - start an atomic update of queue limits
> + * @q:		queue to update
> + *
> + * This functions starts an atomic update of the queue limits.  It takes a lock
> + * to prevent other updates and returns a snapshot of the current limits that
> + * the caller can modify.  The caller must call queue_limits_commit_update()
> + * to finish the update.
> + *
> + * Context: process context.  The caller must have frozen the queue or ensured
> + * that there is outstanding I/O by other means.
> + */
> +static inline struct queue_limits
> +queue_limits_start_update(struct request_queue *q)
> +	__acquires(q->limits_lock)
> +{
> +	mutex_lock(&q->limits_lock);
> +	return q->limits;
> +}

I'm slightly confused about the lifetime of the returned structure.
By my understanding, the returned 'struct queue_limit' is allocated
on the stack of the caller, right?
Shouldn't we note this somewhere such that people don't start passing
the structure around, or, worse, calling 'kfree()' on it?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 04/15] block: use queue_limits_commit_update in queue_max_sectors_store
  2024-01-22 17:36 ` [PATCH 04/15] block: use queue_limits_commit_update in queue_max_sectors_store Christoph Hellwig
  2024-01-23  5:07   ` Damien Le Moal
@ 2024-01-24  6:09   ` Hannes Reinecke
  1 sibling, 0 replies; 65+ messages in thread
From: Hannes Reinecke @ 2024-01-24  6:09 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

On 1/22/24 18:36, Christoph Hellwig wrote:
> Convert queue_max_sectors_store to use queue_limits_commit_update to
> check and updated the max_sectors limit and freeze the queue before
> doing so to ensure we don't have request in flight while changing
> the limits.
> 
> Note that this removes the previously held queue_lock that doesn't
> protect against any other read or writer.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-sysfs.c | 37 ++++++++++++-------------------------
>   1 file changed, 12 insertions(+), 25 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 05/15] block: add a max_user_discard_sectors queue limit
  2024-01-22 17:36 ` [PATCH 05/15] block: add a max_user_discard_sectors queue limit Christoph Hellwig
  2024-01-22 18:27   ` Keith Busch
@ 2024-01-24  6:10   ` Hannes Reinecke
  1 sibling, 0 replies; 65+ messages in thread
From: Hannes Reinecke @ 2024-01-24  6:10 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

On 1/22/24 18:36, Christoph Hellwig wrote:
> Add a new max_user_discard_sectors limit that mirrors max_user_sectors
> and stores the value that the user manually set.  This now allows
> updates of the max_hw_discard_sectors to not worry about the user
> limit.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-settings.c   | 13 ++++++++++---
>   block/blk-sysfs.c      | 18 +++++++++---------
>   include/linux/blkdev.h |  1 +
>   3 files changed, 20 insertions(+), 12 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 06/15] nvme: remove the hack to not update the discard limits in nvme_config_discard
  2024-01-22 17:36 ` [PATCH 06/15] nvme: remove the hack to not update the discard limits in nvme_config_discard Christoph Hellwig
  2024-01-23  5:12   ` Damien Le Moal
@ 2024-01-24  6:11   ` Hannes Reinecke
  1 sibling, 0 replies; 65+ messages in thread
From: Hannes Reinecke @ 2024-01-24  6:11 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

On 1/22/24 18:36, Christoph Hellwig wrote:
> Now that the block layer tracks a separate user max discard limit, there
> is no need to prevent nvme from updating it on controller capability
> changes.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/nvme/host/core.c | 10 ----------
>   1 file changed, 10 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 07/15] block: use queue_limits_commit_update in queue_discard_max_store
  2024-01-22 17:36 ` [PATCH 07/15] block: use queue_limits_commit_update in queue_discard_max_store Christoph Hellwig
  2024-01-23  5:16   ` Damien Le Moal
@ 2024-01-24  6:12   ` Hannes Reinecke
  1 sibling, 0 replies; 65+ messages in thread
From: Hannes Reinecke @ 2024-01-24  6:12 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

On 1/22/24 18:36, Christoph Hellwig wrote:
> Convert queue_discard_max_store to use queue_limits_commit_update to
> check and updated the max_sectors limit and freeze the queue before
> doing so to ensure we don't have request in flight while changing
> the limits.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-sysfs.c | 14 ++++++++++----
>   1 file changed, 10 insertions(+), 4 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 08/15] block: pass a queue_limits argument to blk_alloc_queue
  2024-01-22 17:36 ` [PATCH 08/15] block: pass a queue_limits argument to blk_alloc_queue Christoph Hellwig
  2024-01-23  5:17   ` Damien Le Moal
@ 2024-01-24  6:14   ` Hannes Reinecke
  2024-01-25  9:45   ` John Garry
  2 siblings, 0 replies; 65+ messages in thread
From: Hannes Reinecke @ 2024-01-24  6:14 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

On 1/22/24 18:36, Christoph Hellwig wrote:
> Pass a queue_limits to blk_alloc_queue and apply it if non-NULL.  This
> will allow allocating queues with valid queue limits instead of setting
> the values one at a time later.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-core.c | 28 +++++++++++++++++++++-------
>   block/blk-mq.c   |  6 +++---
>   block/blk.h      |  2 +-
>   block/genhd.c    |  4 ++--
>   4 files changed, 27 insertions(+), 13 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 09/15] block: pass a queue_limits argument to blk_mq_init_queue
  2024-01-22 17:36 ` [PATCH 09/15] block: pass a queue_limits argument to blk_mq_init_queue Christoph Hellwig
  2024-01-23  5:19   ` Damien Le Moal
@ 2024-01-24  6:16   ` Hannes Reinecke
  1 sibling, 0 replies; 65+ messages in thread
From: Hannes Reinecke @ 2024-01-24  6:16 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

On 1/22/24 18:36, Christoph Hellwig wrote:
> Pass a queue_limits to blk_mq_init_queue and apply it if non-NULL.  This
> will allow allocating queues with valid queue limits instead of setting
> the values one at a time later.
> 
> Also rename the function to blk_mq_alloc_queue as that is a much better
> name for a function that allocates a queue and always pass the queuedata
> argument instead of having a separate version for the extra argument.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-mq.c            | 19 +++++++------------
>   block/bsg-lib.c           |  2 +-
>   drivers/nvme/host/apple.c |  2 +-
>   drivers/nvme/host/core.c  |  6 +++---
>   drivers/scsi/scsi_scan.c  |  2 +-
>   drivers/ufs/core/ufshcd.c |  2 +-
>   include/linux/blk-mq.h    |  3 ++-
>   7 files changed, 16 insertions(+), 20 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 10/15] block: pass a queue_limits argument to blk_mq_alloc_disk
  2024-01-22 17:36 ` [PATCH 10/15] block: pass a queue_limits argument to blk_mq_alloc_disk Christoph Hellwig
  2024-01-23  5:22   ` Damien Le Moal
@ 2024-01-24  6:17   ` Hannes Reinecke
  1 sibling, 0 replies; 65+ messages in thread
From: Hannes Reinecke @ 2024-01-24  6:17 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

On 1/22/24 18:36, Christoph Hellwig wrote:
> Pass a queue_limits to blk_mq_alloc_disk and apply it if non-NULL.  This
> will allow allocating queues with valid queue limits instead of setting
> the values one at a time later.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   arch/um/drivers/ubd_kern.c          | 2 +-
>   block/blk-mq.c                      | 5 +++--
>   drivers/block/amiflop.c             | 2 +-
>   drivers/block/aoe/aoeblk.c          | 2 +-
>   drivers/block/ataflop.c             | 2 +-
>   drivers/block/floppy.c              | 2 +-
>   drivers/block/loop.c                | 2 +-
>   drivers/block/mtip32xx/mtip32xx.c   | 2 +-
>   drivers/block/nbd.c                 | 2 +-
>   drivers/block/null_blk/main.c       | 2 +-
>   drivers/block/ps3disk.c             | 2 +-
>   drivers/block/rbd.c                 | 2 +-
>   drivers/block/rnbd/rnbd-clt.c       | 2 +-
>   drivers/block/sunvdc.c              | 2 +-
>   drivers/block/swim.c                | 2 +-
>   drivers/block/swim3.c               | 2 +-
>   drivers/block/ublk_drv.c            | 2 +-
>   drivers/block/virtio_blk.c          | 2 +-
>   drivers/block/xen-blkfront.c        | 2 +-
>   drivers/block/z2ram.c               | 2 +-
>   drivers/cdrom/gdrom.c               | 2 +-
>   drivers/memstick/core/ms_block.c    | 2 +-
>   drivers/memstick/core/mspro_block.c | 2 +-
>   drivers/mmc/core/queue.c            | 2 +-
>   drivers/mtd/mtd_blkdevs.c           | 2 +-
>   drivers/mtd/ubi/block.c             | 2 +-
>   drivers/nvme/host/core.c            | 2 +-
>   drivers/s390/block/dasd_genhd.c     | 2 +-
>   drivers/s390/block/scm_blk.c        | 2 +-
>   include/linux/blk-mq.h              | 7 ++++---
>   30 files changed, 35 insertions(+), 33 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 11/15] virtio_blk: split virtblk_probe
  2024-01-22 17:36 ` [PATCH 11/15] virtio_blk: split virtblk_probe Christoph Hellwig
                     ` (2 preceding siblings ...)
  2024-01-23 20:23   ` Chaitanya Kulkarni
@ 2024-01-24  6:19   ` Hannes Reinecke
  3 siblings, 0 replies; 65+ messages in thread
From: Hannes Reinecke @ 2024-01-24  6:19 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

On 1/22/24 18:36, Christoph Hellwig wrote:
> Split out a virtblk_read_limits helper that just reads the various
> queue limits to separate it from the higher level probing logic.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/block/virtio_blk.c | 193 +++++++++++++++++++------------------
>   1 file changed, 101 insertions(+), 92 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 12/15] virtio_blk: pass queue_limits to blk_mq_alloc_disk
  2024-01-22 17:36 ` [PATCH 12/15] virtio_blk: pass queue_limits to blk_mq_alloc_disk Christoph Hellwig
  2024-01-23  5:27   ` Damien Le Moal
  2024-01-23 14:19   ` Stefan Hajnoczi
@ 2024-01-24  6:21   ` Hannes Reinecke
  2024-06-28 14:25   ` John Garry
  3 siblings, 0 replies; 65+ messages in thread
From: Hannes Reinecke @ 2024-01-24  6:21 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

On 1/22/24 18:36, Christoph Hellwig wrote:
> Call virtblk_read_limits and most of virtblk_probe_zoned_device before
> allocating the gendisk and thus request_queue and make them read into
> a queue_limits structure instead.  Pass this initialized queue_limits
> to blk_mq_alloc_disk to set the queue up with the right parameters
> from the start and only leave a few final touches for zoned devices
> to be done just before adding the disk.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/block/virtio_blk.c | 130 ++++++++++++++++++-------------------
>   1 file changed, 64 insertions(+), 66 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 13/15] loop: cleanup loop_config_discard
  2024-01-22 17:36 ` [PATCH 13/15] loop: cleanup loop_config_discard Christoph Hellwig
  2024-01-23  5:28   ` Damien Le Moal
  2024-01-24  5:29   ` Chaitanya Kulkarni
@ 2024-01-24  6:21   ` Hannes Reinecke
  2 siblings, 0 replies; 65+ messages in thread
From: Hannes Reinecke @ 2024-01-24  6:21 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

On 1/22/24 18:36, Christoph Hellwig wrote:
> Initialize the local variables for the discard max sectors and
> granularity to zero as a sensible default, and then merge the
> calls assigning them to the queue limits.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/block/loop.c | 27 ++++++++-------------------
>   1 file changed, 8 insertions(+), 19 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 14/15] loop: pass queue_limits to blk_mq_alloc_disk
  2024-01-22 17:36 ` [PATCH 14/15] loop: pass queue_limits to blk_mq_alloc_disk Christoph Hellwig
  2024-01-23  5:29   ` Damien Le Moal
@ 2024-01-24  6:22   ` Hannes Reinecke
  1 sibling, 0 replies; 65+ messages in thread
From: Hannes Reinecke @ 2024-01-24  6:22 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

On 1/22/24 18:36, Christoph Hellwig wrote:
> Pass the max_hw_sector limit loop sets at initialization time directly to
> blk_mq_alloc_disk instead of updating it right after the allocation.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/block/loop.c | 11 +++++++----
>   1 file changed, 7 insertions(+), 4 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 15/15] loop: use the atomic queue limits update API
  2024-01-22 17:36 ` [PATCH 15/15] loop: use the atomic queue limits update API Christoph Hellwig
  2024-01-23  5:31   ` Damien Le Moal
@ 2024-01-24  6:22   ` Hannes Reinecke
  1 sibling, 0 replies; 65+ messages in thread
From: Hannes Reinecke @ 2024-01-24  6:22 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

On 1/22/24 18:36, Christoph Hellwig wrote:
> Pass the default limits to blk_mq_alloc_disk and then use the
> queue_limits_{start,commit}_update API to change the limits in an
> atomic way on existing loop gendisks.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   drivers/block/loop.c | 41 +++++++++++++++++++++++++----------------
>   1 file changed, 25 insertions(+), 16 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Ivo Totev, Andrew McDonald,
Werner Knoblich


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 03/15] block: add an API to atomically update queue limits
  2024-01-24  6:08   ` Hannes Reinecke
@ 2024-01-24  9:21     ` Christoph Hellwig
  0 siblings, 0 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-24  9:21 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Jens Axboe, Michael S. Tsirkin, Jason Wang,
	Xuan Zhuo, Paolo Bonzini, Stefan Hajnoczi, Martin K. Petersen,
	Damien Le Moal, Keith Busch, Sagi Grimberg, linux-block,
	linux-nvme, virtualization

On Wed, Jan 24, 2024 at 07:08:58AM +0100, Hannes Reinecke wrote:
>> + * that there is outstanding I/O by other means.
>> + */
>> +static inline struct queue_limits
>> +queue_limits_start_update(struct request_queue *q)
>> +	__acquires(q->limits_lock)
>> +{
>> +	mutex_lock(&q->limits_lock);
>> +	return q->limits;
>> +}
>
> I'm slightly confused about the lifetime of the returned structure.
> By my understanding, the returned 'struct queue_limit' is allocated
> on the stack of the caller, right?
> Shouldn't we note this somewhere such that people don't start passing
> the structure around, or, worse, calling 'kfree()' on it?

queue_limits_start_update returns the structure by value.  So the
lifetime is that of whatever variable you assign it to, which better
be on the stack.  Tryign to kfree a non-pointer type will get the
compiler complain, as does passing a reference to it outside the
function these days.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 05/15] block: add a max_user_discard_sectors queue limit
  2024-01-22 18:38     ` Christoph Hellwig
@ 2024-01-24 15:44       ` Keith Busch
  2024-01-25  8:12         ` Christoph Hellwig
  0 siblings, 1 reply; 65+ messages in thread
From: Keith Busch @ 2024-01-24 15:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Paolo Bonzini, Stefan Hajnoczi, Martin K. Petersen,
	Damien Le Moal, Sagi Grimberg, linux-block, linux-nvme,
	virtualization

On Mon, Jan 22, 2024 at 07:38:57PM +0100, Christoph Hellwig wrote:
> On Mon, Jan 22, 2024 at 11:27:15AM -0700, Keith Busch wrote:
> > > +	q->limits.max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
> > > +	q->limits.max_discard_sectors =
> > > +		min_not_zero(q->limits.max_hw_discard_sectors,
> > > +			     q->limits.max_user_discard_sectors);
> > 
> > Shouldn't writing 0 disable discards?
> 
> I mirror the max_user_sectors behavior here, where 0 disables the
> user limiting.  But yes, that would be a behavior change.

But a user should be able to disable discards, right? Unless you really
want to match max_user_sectors, I think you could default the user
setting here to UINT_MAX and use "min" instead of "min_not_zero".

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 05/15] block: add a max_user_discard_sectors queue limit
  2024-01-24 15:44       ` Keith Busch
@ 2024-01-25  8:12         ` Christoph Hellwig
  0 siblings, 0 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-25  8:12 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, Jens Axboe, Michael S. Tsirkin, Jason Wang,
	Xuan Zhuo, Paolo Bonzini, Stefan Hajnoczi, Martin K. Petersen,
	Damien Le Moal, Sagi Grimberg, linux-block, linux-nvme,
	virtualization

On Wed, Jan 24, 2024 at 08:44:16AM -0700, Keith Busch wrote:
> > > > +		min_not_zero(q->limits.max_hw_discard_sectors,
> > > > +			     q->limits.max_user_discard_sectors);
> > > 
> > > Shouldn't writing 0 disable discards?
> > 
> > I mirror the max_user_sectors behavior here, where 0 disables the
> > user limiting.  But yes, that would be a behavior change.
> 
> But a user should be able to disable discards, right? Unless you really
> want to match max_user_sectors, I think you could default the user
> setting here to UINT_MAX and use "min" instead of "min_not_zero".

Not a fan of having different semantics for the different attributes,
but given that we've had this or 9 years we better stick to it, so
I'll change it.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 08/15] block: pass a queue_limits argument to blk_alloc_queue
  2024-01-22 17:36 ` [PATCH 08/15] block: pass a queue_limits argument to blk_alloc_queue Christoph Hellwig
  2024-01-23  5:17   ` Damien Le Moal
  2024-01-24  6:14   ` Hannes Reinecke
@ 2024-01-25  9:45   ` John Garry
  2024-01-25 14:32     ` Christoph Hellwig
  2 siblings, 1 reply; 65+ messages in thread
From: John Garry @ 2024-01-25  9:45 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization

On 22/01/2024 17:36, Christoph Hellwig wrote:
> Pass a queue_limits to blk_alloc_queue and apply it if non-NULL.  This
> will allow allocating queues with valid queue limits instead of setting
> the values one at a time later.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>   block/blk-core.c | 28 +++++++++++++++++++++-------
>   block/blk-mq.c   |  6 +++---
>   block/blk.h      |  2 +-
>   block/genhd.c    |  4 ++--
>   4 files changed, 27 insertions(+), 13 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 09f4a44a4aa3cc..9f1af8fba4dcd2 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -393,9 +393,10 @@ static void blk_timeout_work(struct work_struct *work)
>   {
>   }
>   
> -struct request_queue *blk_alloc_queue(int node_id)
> +struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id)
>   {
>   	struct request_queue *q;
> +	int error;
>   
>   	q = kmem_cache_alloc_node(blk_requestq_cachep, GFP_KERNEL | __GFP_ZERO,
>   				  node_id);
> @@ -404,13 +405,26 @@ struct request_queue *blk_alloc_queue(int node_id)

Is there actually an issue in that blk_alloc_queue() can return NULL, 
and we should be checking IS_ERR_OR_NULL() in the callers?

I don't think that IS_ERR() picks up on NULL pointers, right?

Or make this change:


diff --git a/block/blk-core.c b/block/blk-core.c
index 76cd797d9712..a447b0501e82 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -401,7 +401,7 @@ struct request_queue *blk_alloc_queue(struct
queue_limits *lim, int node_id)
        q = kmem_cache_alloc_node(blk_requestq_cachep, GFP_KERNEL | 
__GFP_ZERO,
                                  node_id);
        if (!q)
-               return NULL;
+               return ERR_PTR(-ENOMEM);

        q->last_merge = NULL;


>   
>   	q->last_merge = NULL;
>   
> +	if (lim) {
> +		error = blk_validate_limits(lim);

nit: This is only ever going to return -EINVAL or 0 by its very nature, 
right? I suppose that it could return a bool and we do the conversion to 
EINVAL here. It's a personal taste thing, I suppose.

> +		if (error)
> +			goto fail_q;
> +		q->limits = *lim;

nit: It might be neater to do this in blk_validate_limits()

> +	} else {
> +		blk_set_default_limits(&q->limits);
> +	}
> +
>   	q->id = ida_alloc(&blk_queue_ida, GFP_KERNEL);
> -	if (q->id < 0)
> +	if (q->id < 0) {
> +		error = q->id;
>   		goto fail_q;
> +	}
>   
>   	q->stats = blk_alloc_queue_stats();
> -	if (!q->stats)
> +	if (!q->stats) {
> +		error = -ENOMEM;
>   		goto fail_id;
> +	}
>   
>   	q->node = node_id;
>   
> @@ -435,12 +449,12 @@ struct request_queue *blk_alloc_queue(int node_id)
>   	 * Init percpu_ref in atomic mode so that it's faster to shutdown.
>   	 * See blk_register_queue() for details.
>   	 */
> -	if (percpu_ref_init(&q->q_usage_counter,
> +	error = percpu_ref_init(&q->q_usage_counter,
>   				blk_queue_usage_counter_release,
> -				PERCPU_REF_INIT_ATOMIC, GFP_KERNEL))
> +				PERCPU_REF_INIT_ATOMIC, GFP_KERNEL);
> +	if (error)
>   		goto fail_stats;
>   
> -	blk_set_default_limits(&q->limits);
>   	q->nr_requests = BLKDEV_DEFAULT_RQ;
>   
>   	return q;
> @@ -451,7 +465,7 @@ struct request_queue *blk_alloc_queue(int node_id)
>   	ida_free(&blk_queue_ida, q->id);
>   fail_q:
>   	kmem_cache_free(blk_requestq_cachep, q);
> -	return NULL;
> +	return ERR_PTR(error);
>   }
>   
>   /**
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index aa87fcfda1ecfc..2ddbefdeae93e4 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -4092,9 +4092,9 @@ static struct request_queue *blk_mq_init_queue_data(struct blk_mq_tag_set *set,
>   	struct request_queue *q;
>   	int ret;
>   
> -	q = blk_alloc_queue(set->numa_node);
> -	if (!q)
> -		return ERR_PTR(-ENOMEM);
> +	q = blk_alloc_queue(NULL, set->numa_node);
> +	if (IS_ERR(q))
> +		return q;
>   	q->queuedata = queuedata;
>   	ret = blk_mq_init_allocated_queue(set, q);
>   	if (ret) {
> diff --git a/block/blk.h b/block/blk.h
> index 58b5dbac2a487d..100c7a02854bfd 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -448,7 +448,7 @@ static inline void bio_release_page(struct bio *bio, struct page *page)
>   }
>   
>   int blk_validate_limits(struct queue_limits *lim);
> -struct request_queue *blk_alloc_queue(int node_id);
> +struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id);
>   
>   int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode);
>   
> diff --git a/block/genhd.c b/block/genhd.c
> index d74fb5b4ae6818..defcd35b421bdd 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -1396,8 +1396,8 @@ struct gendisk *__blk_alloc_disk(int node, struct lock_class_key *lkclass)
>   	struct request_queue *q;
>   	struct gendisk *disk;
>   
> -	q = blk_alloc_queue(node);
> -	if (!q)
> +	q = blk_alloc_queue(NULL, node);
> +	if (IS_ERR(q))
>   		return NULL;
>   
>   	disk = __alloc_disk_node(q, node, lkclass);


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH 03/15] block: add an API to atomically update queue limits
  2024-01-22 17:36 ` [PATCH 03/15] block: add an API to atomically update queue limits Christoph Hellwig
  2024-01-23  4:50   ` Damien Le Moal
  2024-01-24  6:08   ` Hannes Reinecke
@ 2024-01-25 10:28   ` John Garry
  2024-01-25 14:35     ` Christoph Hellwig
  2 siblings, 1 reply; 65+ messages in thread
From: John Garry @ 2024-01-25 10:28 UTC (permalink / raw)
  To: Christoph Hellwig, Jens Axboe
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization


> +
> +int blk_validate_limits(struct queue_limits *lim)
> +{
> +	unsigned int max_hw_sectors;
> +
> +	if (!lim->logical_block_size)
> +		lim->logical_block_size = SECTOR_SIZE;

nit: This function is doing a bit more than validating, as the function 
name suggests

> +
> +	if (!lim->physical_block_size)
> +		lim->physical_block_size = SECTOR_SIZE;
> +	if (lim->physical_block_size < lim->logical_block_size)
> +		lim->physical_block_size = lim->physical_block_size;
> +
> +	if (!lim->io_min)
> +		lim->io_min = SECTOR_SIZE;
> +	if (lim->io_min < lim->physical_block_size)
> +		lim->io_min = lim->physical_block_size;
> +
> +	if (!lim->max_hw_sectors)
> +		lim->max_hw_sectors = BLK_SAFE_MAX_SECTORS;
> +	if (WARN_ON_ONCE(lim->max_hw_sectors < PAGE_SIZE / SECTOR_SIZE))
> +		return -EINVAL;

The WARN_ON_ONCE usage is odd, as it will only ever WARN once for a 
specific q, while other queues associated with other drivers may have 
the same limit issue. But I suppose if one issue is fixed, then the 
other will make itself known...

> +
> +	lim->max_hw_sectors = round_down(lim->max_hw_sectors,
> +			lim->logical_block_size >> SECTOR_SHIFT);
> +
> +	max_hw_sectors = min_not_zero(lim->max_hw_sectors,
> +				lim->max_dev_sectors);
> +	if (lim->max_user_sectors) {
> +		if (lim->max_user_sectors > max_hw_sectors ||
> +		    lim->max_user_sectors < PAGE_SIZE / SECTOR_SIZE)
> +			return -EINVAL;
> +		lim->max_sectors = min(max_hw_sectors, lim->max_user_sectors);
> +	} else {
> +		lim->max_sectors = min(max_hw_sectors, BLK_DEF_MAX_SECTORS_CAP);
> +	}
> +
>
> +/**
> + * queue_limits_commit_update - commit an atomic update of queue limits
> + * @q:		queue to update
> + * @lim:	limits to apply
> + *
> + * Apply the limits in @lim that were obtained from queue_limits_start_update()
> + * and updated by the caller to @q.
> + *
> + * Returns 0 if successful, else a negative error code.
> + */
> +int queue_limits_commit_update(struct request_queue *q,
> +		struct queue_limits *lim)
> +	__releases(q->limits_lock)
> +{
> +	int error = blk_validate_limits(lim);
> +
> +	if (!error) {
> +		q->limits = *lim;

this is duplicating what blk_alloc_queue() does

> +		if (q->disk)
> +			blk_apply_bdi_limits(q->disk->bdi, lim);
> +	}
> +	mutex_unlock(&q->limits_lock);
> +	return error;
> +}
> +EXPORT_SYMBOL_GPL(queue_limits_commit_update);
> +
>   /**
>    * blk_queue_bounce_limit - set bounce buffer limit for queue
>    * @q: the request queue for the device
> diff --git a/block/blk.h b/block/blk.h
> index 1ef920f72e0f87..58b5dbac2a487d 100644
> --- a/block/blk.h
> +++ b/block/blk.h
> @@ -447,6 +447,7 @@ static inline void bio_release_page(struct bio *bio, struct page *page)
>   		unpin_user_page(page);
>   }
>   
> +int blk_validate_limits(struct queue_limits *lim);
>   struct request_queue *blk_alloc_queue(int node_id);
>   
>   int disk_scan_partitions(struct gendisk *disk, blk_mode_t mode);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 4a2e82c7971c86..5b5d3b238de1e7 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -473,6 +473,7 @@ struct request_queue {
>   
>   	struct mutex		sysfs_lock;
>   	struct mutex		sysfs_dir_lock;
> +	struct mutex		limits_lock;
>   
>   	/*
>   	 * for reusing dead hctx instance in case of updating
> @@ -861,6 +862,28 @@ static inline unsigned int blk_chunk_sectors_left(sector_t offset,
>   	return chunk_sectors - (offset & (chunk_sectors - 1));
>   }
>   
> +/**
> + * queue_limits_start_update - start an atomic update of queue limits
> + * @q:		queue to update
> + *
> + * This functions starts an atomic update of the queue limits.  It takes a lock
> + * to prevent other updates and returns a snapshot of the current limits that
> + * the caller can modify.  The caller must call queue_limits_commit_update()
> + * to finish the update.
> + *
> + * Context: process context.  The caller must have frozen the queue or ensured
> + * that there is outstanding I/O by other means.
> + */
> +static inline struct queue_limits
> +queue_limits_start_update(struct request_queue *q)
> +	__acquires(q->limits_lock)
> +{
> +	mutex_lock(&q->limits_lock);
> +	return q->limits;
> +}
> +int queue_limits_commit_update(struct request_queue *q,
> +		struct queue_limits *lim);

It ain't so nice that the code for queue_limits_start_update() and 
queue_limits_commit_update() pair are in separate files.

> +
>   /*
>    * Access functions for manipulating queue properties
>    */


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 08/15] block: pass a queue_limits argument to blk_alloc_queue
  2024-01-25  9:45   ` John Garry
@ 2024-01-25 14:32     ` Christoph Hellwig
  0 siblings, 0 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-25 14:32 UTC (permalink / raw)
  To: John Garry
  Cc: Christoph Hellwig, Jens Axboe, Michael S. Tsirkin, Jason Wang,
	Xuan Zhuo, Paolo Bonzini, Stefan Hajnoczi, Martin K. Petersen,
	Damien Le Moal, Keith Busch, Sagi Grimberg, linux-block,
	linux-nvme, virtualization

On Thu, Jan 25, 2024 at 09:45:20AM +0000, John Garry wrote:
>> +struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id)
>>   {
>>   	struct request_queue *q;
>> +	int error;
>>     	q = kmem_cache_alloc_node(blk_requestq_cachep, GFP_KERNEL | 
>> __GFP_ZERO,
>>   				  node_id);
>> @@ -404,13 +405,26 @@ struct request_queue *blk_alloc_queue(int node_id)
>
> Is there actually an issue in that blk_alloc_queue() can return NULL, and 
> we should be checking IS_ERR_OR_NULL() in the callers?
>
> I don't think that IS_ERR() picks up on NULL pointers, right?
>
> Or make this change:

Yes, that's the right thing to do, I'll add it.

> nit: This is only ever going to return -EINVAL or 0 by its very nature, 
> right? I suppose that it could return a bool and we do the conversion to 
> EINVAL here. It's a personal taste thing, I suppose.

I actually had that during most of the development, but then the callers
had to convert it.  Either way works, but this seemed a bit cleaner.

>> +		if (error)
>> +			goto fail_q;
>> +		q->limits = *lim;
>
> nit: It might be neater to do this in blk_validate_limits()

The limits assigment?  I'd really like to keep blk_validate_limits limited
to only look at the passed in queue_limits and never look at a live
object.

^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 03/15] block: add an API to atomically update queue limits
  2024-01-25 10:28   ` John Garry
@ 2024-01-25 14:35     ` Christoph Hellwig
  0 siblings, 0 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-01-25 14:35 UTC (permalink / raw)
  To: John Garry
  Cc: Christoph Hellwig, Jens Axboe, Michael S. Tsirkin, Jason Wang,
	Xuan Zhuo, Paolo Bonzini, Stefan Hajnoczi, Martin K. Petersen,
	Damien Le Moal, Keith Busch, Sagi Grimberg, linux-block,
	linux-nvme, virtualization

On Thu, Jan 25, 2024 at 10:28:49AM +0000, John Garry wrote:
>
>> +
>> +int blk_validate_limits(struct queue_limits *lim)
>> +{
>> +	unsigned int max_hw_sectors;
>> +
>> +	if (!lim->logical_block_size)
>> +		lim->logical_block_size = SECTOR_SIZE;
>
> nit: This function is doing a bit more than validating, as the function 
> name suggests

Well, it validates and fixes up.  But that sucks as a name.  If you
have a good naming suggestion I can pick it up, but I couldn't come
up with a better name.

>> +	if (!lim->physical_block_size)
>> +		lim->physical_block_size = SECTOR_SIZE;
>> +	if (lim->physical_block_size < lim->logical_block_size)
>> +		lim->physical_block_size = lim->physical_block_size;
>> +
>> +	if (!lim->io_min)
>> +		lim->io_min = SECTOR_SIZE;
>> +	if (lim->io_min < lim->physical_block_size)
>> +		lim->io_min = lim->physical_block_size;
>> +
>> +	if (!lim->max_hw_sectors)
>> +		lim->max_hw_sectors = BLK_SAFE_MAX_SECTORS;
>> +	if (WARN_ON_ONCE(lim->max_hw_sectors < PAGE_SIZE / SECTOR_SIZE))
>> +		return -EINVAL;
>
> The WARN_ON_ONCE usage is odd, as it will only ever WARN once for a 
> specific q, while other queues associated with other drivers may have the 
> same limit issue. But I suppose if one issue is fixed, then the other will 
> make itself known...

Yeah.  The idea is to give a loud warning for the API misuse as this
is not an error a normal user action could trigger.

>> +	int error = blk_validate_limits(lim);
>> +
>> +	if (!error) {
>> +		q->limits = *lim;
>
> this is duplicating what blk_alloc_queue() does

I originally had another helper to do the limits assignment
and the blk_apply_bdi_limits.  But as only one caller needs
the blk_apply_bdi_limits it felt a little silly.

>> +static inline struct queue_limits
>> +queue_limits_start_update(struct request_queue *q)
>> +	__acquires(q->limits_lock)
>> +{
>> +	mutex_lock(&q->limits_lock);
>> +	return q->limits;
>> +}
>> +int queue_limits_commit_update(struct request_queue *q,
>> +		struct queue_limits *lim);
>
> It ain't so nice that the code for queue_limits_start_update() and 
> queue_limits_commit_update() pair are in separate files.

We do that for a lot of APIs where part is inline.  And I really do
want queue_limits_start_update inline as passing large structs by
value generates horrible code, and for this API to work it needs
to be returned by value.  In fact it probably should be __always_inline
to avoid gcc doing stupid thing with -Os.


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 12/15] virtio_blk: pass queue_limits to blk_mq_alloc_disk
  2024-01-22 17:36 ` [PATCH 12/15] virtio_blk: pass queue_limits to blk_mq_alloc_disk Christoph Hellwig
                     ` (2 preceding siblings ...)
  2024-01-24  6:21   ` Hannes Reinecke
@ 2024-06-28 14:25   ` John Garry
  2024-06-29  5:19     ` Christoph Hellwig
  3 siblings, 1 reply; 65+ messages in thread
From: John Garry @ 2024-06-28 14:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization,
	Jens Axboe

On 22/01/2024 17:36, Christoph Hellwig wrote:>   	max_dma_size = 
virtio_max_dma_size(vdev);
>   	max_size = max_dma_size > U32_MAX ? U32_MAX : max_dma_size;
> @@ -1288,7 +1283,7 @@ static int virtblk_read_limits(struct virtio_blk *vblk)
>   	if (!err)
>   		max_size = min(max_size, v);
>   
> -	blk_queue_max_segment_size(q, max_size);
> +	lim->max_segment_size = max_size;
>   
>   	/* Host can optionally specify the block size of the device */
>   	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_BLK_SIZE,
> @@ -1303,35 +1298,34 @@ static int virtblk_read_limits(struct virtio_blk *vblk)
>   			return err;
>   		}
>   
> -		blk_queue_logical_block_size(q, blk_size);
> +		lim->logical_block_size = blk_size;
>   	} else
> -		blk_size = queue_logical_block_size(q);
> +		blk_size = lim->logical_block_size;
>   
>   	/* Use topology information if available */
>   	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,
>   				   struct virtio_blk_config, physical_block_exp,
>   				   &physical_block_exp);
>   	if (!err && physical_block_exp)
> -		blk_queue_physical_block_size(q,
> -				blk_size * (1 << physical_block_exp));
> +		lim->physical_block_size = blk_size * (1 << physical_block_exp);
>   
>   	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,
>   				   struct virtio_blk_config, alignment_offset,
>   				   &alignment_offset);

I think that we might need a change like the following change after this:

----8<----
[PATCH] virtio_blk: Fix default logical block size

If we fail to read a logical block size in virtblk_read_limits() ->
virtio_cread_feature(), then we default to what is in
lim->logical_block_size, but that would be 0.

We can deal with lim->logical_block_size = 0 later in the
blk_mq_alloc_disk(), but the subsequent code in virtblk_read_limits() 
cannot, so give a default of SECTOR_SIZE.

Signed-off-by: John Garry <john.g.garry@oracle.com>

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 6c64a67ab9c9..d60d805523d2 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1453,6 +1453,7 @@ static int virtblk_probe(struct virtio_device *vdev)
  	struct virtio_blk *vblk;
  	struct queue_limits lim = {
  		.features		= BLK_FEAT_ROTATIONAL,
+		.logical_block_size	= SECTOR_SIZE,
  	};
  	int err, index;
  	unsigned int queue_depth;

---->8----

Look ok?


^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH 12/15] virtio_blk: pass queue_limits to blk_mq_alloc_disk
  2024-06-28 14:25   ` John Garry
@ 2024-06-29  5:19     ` Christoph Hellwig
  2024-06-30  9:55       ` John Garry
  0 siblings, 1 reply; 65+ messages in thread
From: Christoph Hellwig @ 2024-06-29  5:19 UTC (permalink / raw)
  To: John Garry
  Cc: Christoph Hellwig, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Paolo Bonzini, Stefan Hajnoczi, Martin K. Petersen,
	Damien Le Moal, Keith Busch, Sagi Grimberg, linux-block,
	linux-nvme, virtualization, Jens Axboe

On Fri, Jun 28, 2024 at 03:25:02PM +0100, John Garry wrote:
> I think that we might need a change like the following change after this:
>
> ----8<----
> [PATCH] virtio_blk: Fix default logical block size
>
> If we fail to read a logical block size in virtblk_read_limits() ->
> virtio_cread_feature(), then we default to what is in
> lim->logical_block_size, but that would be 0.
>
> We can deal with lim->logical_block_size = 0 later in the
> blk_mq_alloc_disk(), but the subsequent code in virtblk_read_limits() 
> cannot, so give a default of SECTOR_SIZE.

I think the right fix would be to initialize it where the virtio code
currently uses the uninitialized lim->logical_block_size.  That assumes
that we really want to handle this case.  If reading the block size
fails, can we really trust reading other less basic attributes?  Or
should we just fail the probe?

diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 6c64a67ab9c901..5bde41505818f9 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1303,7 +1303,7 @@ static int virtblk_read_limits(struct virtio_blk *vblk,
 
 		lim->logical_block_size = blk_size;
 	} else
-		blk_size = lim->logical_block_size;
+		blk_size = SECTOR_SIZE;
 
 	/* Use topology information if available */
 	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,

^ permalink raw reply related	[flat|nested] 65+ messages in thread

* Re: [PATCH 12/15] virtio_blk: pass queue_limits to blk_mq_alloc_disk
  2024-06-29  5:19     ` Christoph Hellwig
@ 2024-06-30  9:55       ` John Garry
  2024-07-01  4:54         ` Christoph Hellwig
  0 siblings, 1 reply; 65+ messages in thread
From: John Garry @ 2024-06-30  9:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Paolo Bonzini,
	Stefan Hajnoczi, Martin K. Petersen, Damien Le Moal, Keith Busch,
	Sagi Grimberg, linux-block, linux-nvme, virtualization,
	Jens Axboe

On 29/06/2024 06:19, Christoph Hellwig wrote:
> On Fri, Jun 28, 2024 at 03:25:02PM +0100, John Garry wrote:
>> I think that we might need a change like the following change after this:
>>
>> ----8<----
>> [PATCH] virtio_blk: Fix default logical block size
>>
>> If we fail to read a logical block size in virtblk_read_limits() ->
>> virtio_cread_feature(), then we default to what is in
>> lim->logical_block_size, but that would be 0.
>>
>> We can deal with lim->logical_block_size = 0 later in the
>> blk_mq_alloc_disk(), but the subsequent code in virtblk_read_limits()
>> cannot, so give a default of SECTOR_SIZE.
> 
> I think the right fix would be to initialize it where the virtio code
> currently uses the uninitialized lim->logical_block_size.  That assumes
> that we really want to handle this case.  If reading the block size
> fails, can we really trust reading other less basic attributes?  Or
> should we just fail the probe?

According to the comment on virtio_cread_feature, it is conditional 
(which I read as optional) and that function can only fail with -ENOENT. 
So I don't think that the probe should fail. virtio people?

 From my qemu testing, it is always supplied (obvs).

> 
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 6c64a67ab9c901..5bde41505818f9 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -1303,7 +1303,7 @@ static int virtblk_read_limits(struct virtio_blk *vblk,
>   
>   		lim->logical_block_size = blk_size;
>   	} else
> -		blk_size = lim->logical_block_size;
> +		blk_size = SECTOR_SIZE;

I think that it would need to be:
		blk_size = lim->logical_block_size = SECTOR_SIZE;

Which is a big ugly, so maybe:

	if (err)
		blk_size = SECTOR_SIZE;
	lim->logical_block_size = SECTOR_SIZE;

or, alternatively, set bsize to SECTOR_SIZE when declared.

>   
>   	/* Use topology information if available */
>   	err = virtio_cread_feature(vdev, VIRTIO_BLK_F_TOPOLOGY,


^ permalink raw reply	[flat|nested] 65+ messages in thread

* Re: [PATCH 12/15] virtio_blk: pass queue_limits to blk_mq_alloc_disk
  2024-06-30  9:55       ` John Garry
@ 2024-07-01  4:54         ` Christoph Hellwig
  0 siblings, 0 replies; 65+ messages in thread
From: Christoph Hellwig @ 2024-07-01  4:54 UTC (permalink / raw)
  To: John Garry
  Cc: Christoph Hellwig, Michael S. Tsirkin, Jason Wang, Xuan Zhuo,
	Paolo Bonzini, Stefan Hajnoczi, Martin K. Petersen,
	Damien Le Moal, Keith Busch, Sagi Grimberg, linux-block,
	linux-nvme, virtualization, Jens Axboe

On Sun, Jun 30, 2024 at 10:55:12AM +0100, John Garry wrote:
> According to the comment on virtio_cread_feature, it is conditional (which 
> I read as optional) and that function can only fail with -ENOENT. So I 
> don't think that the probe should fail. virtio people?

Oh well..

> I think that it would need to be:
> 		blk_size = lim->logical_block_size = SECTOR_SIZE;
>
> Which is a big ugly, so maybe:
>
> 	if (err)
> 		blk_size = SECTOR_SIZE;
> 	lim->logical_block_size = SECTOR_SIZE;
>
> or, alternatively, set bsize to SECTOR_SIZE when declared.

Maybe just set up logical_block_size at lim declaration time as in
your patch that started this, and then kill the blk_size variable
entirely and just use lim->logical_block_size.


^ permalink raw reply	[flat|nested] 65+ messages in thread

end of thread, other threads:[~2024-07-01  4:54 UTC | newest]

Thread overview: 65+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-01-22 17:36 atomic queue limits updates Christoph Hellwig
2024-01-22 17:36 ` [PATCH 01/15] block: move max_{open,active}_zones to struct queue_limits Christoph Hellwig
2024-01-23  4:39   ` Damien Le Moal
2024-01-24  6:00   ` Hannes Reinecke
2024-01-22 17:36 ` [PATCH 02/15] block: refactor disk_update_readahead Christoph Hellwig
2024-01-23  4:41   ` Damien Le Moal
2024-01-23  8:40     ` Christoph Hellwig
2024-01-24  6:01   ` Hannes Reinecke
2024-01-22 17:36 ` [PATCH 03/15] block: add an API to atomically update queue limits Christoph Hellwig
2024-01-23  4:50   ` Damien Le Moal
2024-01-23  8:44     ` Christoph Hellwig
2024-01-24  6:08   ` Hannes Reinecke
2024-01-24  9:21     ` Christoph Hellwig
2024-01-25 10:28   ` John Garry
2024-01-25 14:35     ` Christoph Hellwig
2024-01-22 17:36 ` [PATCH 04/15] block: use queue_limits_commit_update in queue_max_sectors_store Christoph Hellwig
2024-01-23  5:07   ` Damien Le Moal
2024-01-24  6:09   ` Hannes Reinecke
2024-01-22 17:36 ` [PATCH 05/15] block: add a max_user_discard_sectors queue limit Christoph Hellwig
2024-01-22 18:27   ` Keith Busch
2024-01-22 18:38     ` Christoph Hellwig
2024-01-24 15:44       ` Keith Busch
2024-01-25  8:12         ` Christoph Hellwig
2024-01-24  6:10   ` Hannes Reinecke
2024-01-22 17:36 ` [PATCH 06/15] nvme: remove the hack to not update the discard limits in nvme_config_discard Christoph Hellwig
2024-01-23  5:12   ` Damien Le Moal
2024-01-23  8:45     ` Christoph Hellwig
2024-01-24  6:11   ` Hannes Reinecke
2024-01-22 17:36 ` [PATCH 07/15] block: use queue_limits_commit_update in queue_discard_max_store Christoph Hellwig
2024-01-23  5:16   ` Damien Le Moal
2024-01-24  6:12   ` Hannes Reinecke
2024-01-22 17:36 ` [PATCH 08/15] block: pass a queue_limits argument to blk_alloc_queue Christoph Hellwig
2024-01-23  5:17   ` Damien Le Moal
2024-01-24  6:14   ` Hannes Reinecke
2024-01-25  9:45   ` John Garry
2024-01-25 14:32     ` Christoph Hellwig
2024-01-22 17:36 ` [PATCH 09/15] block: pass a queue_limits argument to blk_mq_init_queue Christoph Hellwig
2024-01-23  5:19   ` Damien Le Moal
2024-01-24  6:16   ` Hannes Reinecke
2024-01-22 17:36 ` [PATCH 10/15] block: pass a queue_limits argument to blk_mq_alloc_disk Christoph Hellwig
2024-01-23  5:22   ` Damien Le Moal
2024-01-24  6:17   ` Hannes Reinecke
2024-01-22 17:36 ` [PATCH 11/15] virtio_blk: split virtblk_probe Christoph Hellwig
2024-01-23  5:26   ` Damien Le Moal
2024-01-23 14:16   ` Stefan Hajnoczi
2024-01-23 20:23   ` Chaitanya Kulkarni
2024-01-24  6:19   ` Hannes Reinecke
2024-01-22 17:36 ` [PATCH 12/15] virtio_blk: pass queue_limits to blk_mq_alloc_disk Christoph Hellwig
2024-01-23  5:27   ` Damien Le Moal
2024-01-23 14:19   ` Stefan Hajnoczi
2024-01-24  6:21   ` Hannes Reinecke
2024-06-28 14:25   ` John Garry
2024-06-29  5:19     ` Christoph Hellwig
2024-06-30  9:55       ` John Garry
2024-07-01  4:54         ` Christoph Hellwig
2024-01-22 17:36 ` [PATCH 13/15] loop: cleanup loop_config_discard Christoph Hellwig
2024-01-23  5:28   ` Damien Le Moal
2024-01-24  5:29   ` Chaitanya Kulkarni
2024-01-24  6:21   ` Hannes Reinecke
2024-01-22 17:36 ` [PATCH 14/15] loop: pass queue_limits to blk_mq_alloc_disk Christoph Hellwig
2024-01-23  5:29   ` Damien Le Moal
2024-01-24  6:22   ` Hannes Reinecke
2024-01-22 17:36 ` [PATCH 15/15] loop: use the atomic queue limits update API Christoph Hellwig
2024-01-23  5:31   ` Damien Le Moal
2024-01-24  6:22   ` Hannes Reinecke

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).