* [PATCH 01/26] sd: fix sd_is_zoned
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 5:46 ` Damien Le Moal
` (3 more replies)
2024-06-11 5:19 ` [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics Christoph Hellwig
` (24 subsequent siblings)
25 siblings, 4 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Since commit 7437bb73f087 ("block: remove support for the host aware zone
model"), only ZBC devices expose a zoned access model. sd_is_zoned is
used to check for that and thus should return false for host aware devices.
Fixes: 7437bb73f087 ("block: remove support for the host aware zone model")
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/scsi/sd.h | 7 ++++++-
drivers/scsi/sd_zbc.c | 7 +------
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 726f1613f6cb56..65dff3c2108926 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -222,9 +222,14 @@ static inline sector_t sectors_to_logical(struct scsi_device *sdev, sector_t sec
void sd_dif_config_host(struct scsi_disk *sdkp, struct queue_limits *lim);
+/*
+ * Check if we support a zoned model for this device.
+ *
+ * Note that host aware devices are treated as conventional by Linux.
+ */
static inline int sd_is_zoned(struct scsi_disk *sdkp)
{
- return sdkp->zoned == 1 || sdkp->device->type == TYPE_ZBC;
+ return sdkp->device->type == TYPE_ZBC;
}
#ifdef CONFIG_BLK_DEV_ZONED
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index f685838d9ed214..422eaed8457227 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -598,13 +598,8 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, struct queue_limits *lim,
u32 zone_blocks = 0;
int ret;
- if (!sd_is_zoned(sdkp)) {
- /*
- * Device managed or normal SCSI disk, no special handling
- * required.
- */
+ if (!sd_is_zoned(sdkp))
return 0;
- }
/* READ16/WRITE16/SYNC16 is mandatory for ZBC devices */
sdkp->device->use_16_for_rw = 1;
--
2.43.0
^ permalink raw reply related [flat|nested] 104+ messages in thread
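As a standalone illustration of the semantics after this patch (struct layout and the TYPE_ZBC value are mocked here as stand-ins for the kernel definitions; a sketch, not the in-tree code):

```c
#include <assert.h>

#define TYPE_DISK 0x00
#define TYPE_ZBC  0x14	/* SCSI device type for host-managed zoned block devices */

/* Minimal stand-ins for the kernel structures consulted by sd_is_zoned(). */
struct scsi_device { int type; };
struct scsi_disk {
	struct scsi_device *device;
	int zoned;	/* 1 = host-aware, from the ZONED field of the B1h VPD page */
};

/* After the patch: only ZBC (host-managed) devices are treated as zoned;
 * host-aware devices (sdkp->zoned == 1) are handled as conventional disks. */
static int sd_is_zoned(const struct scsi_disk *sdkp)
{
	return sdkp->device->type == TYPE_ZBC;
}
```

With this, a host-aware disk (zoned == 1, device type TYPE_DISK) reports non-zoned, matching the removal of the host aware zone model; before the fix the `sdkp->zoned == 1` clause made it report zoned.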
* Re: [PATCH 01/26] sd: fix sd_is_zoned
2024-06-11 5:19 ` [PATCH 01/26] sd: fix sd_is_zoned Christoph Hellwig
@ 2024-06-11 5:46 ` Damien Le Moal
2024-06-11 8:11 ` Hannes Reinecke
` (2 subsequent siblings)
3 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 5:46 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Since commit 7437bb73f087 ("block: remove support for the host aware zone
> model"), only ZBC devices expose a zoned access model. sd_is_zoned is
> used to check for that and thus should return false for host aware devices.
>
> Fixes: 7437bb73f087 ("block: remove support for the host aware zone model")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 01/26] sd: fix sd_is_zoned
2024-06-11 5:19 ` [PATCH 01/26] sd: fix sd_is_zoned Christoph Hellwig
2024-06-11 5:46 ` Damien Le Moal
@ 2024-06-11 8:11 ` Hannes Reinecke
2024-06-11 10:50 ` Johannes Thumshirn
2024-06-11 19:18 ` Bart Van Assche
3 siblings, 0 replies; 104+ messages in thread
From: Hannes Reinecke @ 2024-06-11 8:11 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 07:19, Christoph Hellwig wrote:
> Since commit 7437bb73f087 ("block: remove support for the host aware zone
> model"), only ZBC devices expose a zoned access model. sd_is_zoned is
> used to check for that and thus should return false for host aware devices.
>
> Fixes: 7437bb73f087 ("block: remove support for the host aware zone model")
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/scsi/sd.h | 7 ++++++-
> drivers/scsi/sd_zbc.c | 7 +------
> 2 files changed, 7 insertions(+), 7 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 01/26] sd: fix sd_is_zoned
2024-06-11 5:19 ` [PATCH 01/26] sd: fix sd_is_zoned Christoph Hellwig
2024-06-11 5:46 ` Damien Le Moal
2024-06-11 8:11 ` Hannes Reinecke
@ 2024-06-11 10:50 ` Johannes Thumshirn
2024-06-11 19:18 ` Bart Van Assche
3 siblings, 0 replies; 104+ messages in thread
From: Johannes Thumshirn @ 2024-06-11 10:50 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen,
linux-m68k@lists.linux-m68k.org, linux-um@lists.infradead.org,
drbd-dev@lists.linbit.com, nbd@other.debian.org,
linuxppc-dev@lists.ozlabs.org, ceph-devel@vger.kernel.org,
virtualization@lists.linux.dev, xen-devel@lists.xenproject.org,
linux-bcache@vger.kernel.org, dm-devel@lists.linux.dev,
linux-raid@vger.kernel.org, linux-mmc@vger.kernel.org,
linux-mtd@lists.infradead.org, nvdimm@lists.linux.dev,
linux-nvme@lists.infradead.org, linux-s390@vger.kernel.org,
linux-scsi@vger.kernel.org, linux-block@vger.kernel.org
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 01/26] sd: fix sd_is_zoned
2024-06-11 5:19 ` [PATCH 01/26] sd: fix sd_is_zoned Christoph Hellwig
` (2 preceding siblings ...)
2024-06-11 10:50 ` Johannes Thumshirn
@ 2024-06-11 19:18 ` Bart Van Assche
3 siblings, 0 replies; 104+ messages in thread
From: Bart Van Assche @ 2024-06-11 19:18 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/10/24 10:19 PM, Christoph Hellwig wrote:
> Since commit 7437bb73f087 ("block: remove support for the host aware zone
> model"), only ZBC devices expose a zoned access model. sd_is_zoned is
> used to check for that and thus should return false for host aware devices.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
^ permalink raw reply [flat|nested] 104+ messages in thread
* [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
2024-06-11 5:19 ` [PATCH 01/26] sd: fix sd_is_zoned Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 5:51 ` Damien Le Moal
2024-06-11 8:12 ` Hannes Reinecke
2024-06-11 5:19 ` [PATCH 03/26] loop: stop using loop_reconfigure_limits in __loop_clr_fd Christoph Hellwig
` (23 subsequent siblings)
25 siblings, 2 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Move a bit of code that sets up the zone flag and the write granularity
into sd_zbc_read_zones to be with the rest of the zoned limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/scsi/sd.c | 21 +--------------------
drivers/scsi/sd_zbc.c | 13 ++++++++++++-
2 files changed, 13 insertions(+), 21 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 85b45345a27739..5bfed61c70db8f 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3308,29 +3308,10 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp,
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
}
-
-#ifdef CONFIG_BLK_DEV_ZONED /* sd_probe rejects ZBD devices early otherwise */
- if (sdkp->device->type == TYPE_ZBC) {
- lim->zoned = true;
-
- /*
- * Per ZBC and ZAC specifications, writes in sequential write
- * required zones of host-managed devices must be aligned to
- * the device physical block size.
- */
- lim->zone_write_granularity = sdkp->physical_block_size;
- } else {
- /*
- * Host-aware devices are treated as conventional.
- */
- lim->zoned = false;
- }
-#endif /* CONFIG_BLK_DEV_ZONED */
-
if (!sdkp->first_scan)
return;
- if (lim->zoned)
+ if (sdkp->device->type == TYPE_ZBC)
sd_printk(KERN_NOTICE, sdkp, "Host-managed zoned block device\n");
else if (sdkp->zoned == 1)
sd_printk(KERN_NOTICE, sdkp, "Host-aware SMR disk used as regular disk\n");
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 422eaed8457227..e9501db0450be3 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -598,8 +598,19 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, struct queue_limits *lim,
u32 zone_blocks = 0;
int ret;
- if (!sd_is_zoned(sdkp))
+ if (!sd_is_zoned(sdkp)) {
+ lim->zoned = false;
return 0;
+ }
+
+ lim->zoned = true;
+
+ /*
+ * Per ZBC and ZAC specifications, writes in sequential write required
+ * zones of host-managed devices must be aligned to the device physical
+ * block size.
+ */
+ lim->zone_write_granularity = sdkp->physical_block_size;
/* READ16/WRITE16/SYNC16 is mandatory for ZBC devices */
sdkp->device->use_16_for_rw = 1;
--
2.43.0
^ permalink raw reply related [flat|nested] 104+ messages in thread
* Re: [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics
2024-06-11 5:19 ` [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics Christoph Hellwig
@ 2024-06-11 5:51 ` Damien Le Moal
2024-06-11 5:52 ` Christoph Hellwig
` (2 more replies)
2024-06-11 8:12 ` Hannes Reinecke
1 sibling, 3 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 5:51 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Move a bit of code that sets up the zone flag and the write granularity
> into sd_zbc_read_zones to be with the rest of the zoned limits.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/scsi/sd.c | 21 +--------------------
> drivers/scsi/sd_zbc.c | 13 ++++++++++++-
> 2 files changed, 13 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 85b45345a27739..5bfed61c70db8f 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3308,29 +3308,10 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp,
> blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
> }
>
> -
> -#ifdef CONFIG_BLK_DEV_ZONED /* sd_probe rejects ZBD devices early otherwise */
> - if (sdkp->device->type == TYPE_ZBC) {
> - lim->zoned = true;
> -
> - /*
> - * Per ZBC and ZAC specifications, writes in sequential write
> - * required zones of host-managed devices must be aligned to
> - * the device physical block size.
> - */
> - lim->zone_write_granularity = sdkp->physical_block_size;
> - } else {
> - /*
> - * Host-aware devices are treated as conventional.
> - */
> - lim->zoned = false;
> - }
> -#endif /* CONFIG_BLK_DEV_ZONED */
> -
> if (!sdkp->first_scan)
> return;
>
> - if (lim->zoned)
> + if (sdkp->device->type == TYPE_ZBC)
Nit: use sd_is_zoned() here ?
> sd_printk(KERN_NOTICE, sdkp, "Host-managed zoned block device\n");
> else if (sdkp->zoned == 1)
> sd_printk(KERN_NOTICE, sdkp, "Host-aware SMR disk used as regular disk\n");
> diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
> index 422eaed8457227..e9501db0450be3 100644
> --- a/drivers/scsi/sd_zbc.c
> +++ b/drivers/scsi/sd_zbc.c
> @@ -598,8 +598,19 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, struct queue_limits *lim,
> u32 zone_blocks = 0;
> int ret;
>
> - if (!sd_is_zoned(sdkp))
> + if (!sd_is_zoned(sdkp)) {
> + lim->zoned = false;
Maybe we should clear the other zone related limits here ? If the drive is
reformatted/converted from SMR to CMR (FORMAT WITH PRESET), the other zone
limits may be set already, no ?
> return 0;
> + }
> +
> + lim->zoned = true;
> +
> + /*
> + * Per ZBC and ZAC specifications, writes in sequential write required
> + * zones of host-managed devices must be aligned to the device physical
> + * block size.
> + */
> + lim->zone_write_granularity = sdkp->physical_block_size;
>
> /* READ16/WRITE16/SYNC16 is mandatory for ZBC devices */
> sdkp->device->use_16_for_rw = 1;
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics
2024-06-11 5:51 ` Damien Le Moal
@ 2024-06-11 5:52 ` Christoph Hellwig
2024-06-11 5:54 ` Christoph Hellwig
2024-06-11 7:20 ` Damien Le Moal
2024-06-12 4:45 ` Christoph Hellwig
2024-06-13 9:39 ` Christoph Hellwig
2 siblings, 2 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:52 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Jens Axboe, Geert Uytterhoeven,
Richard Weinberger, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, Josef Bacik, Ming Lei, Michael S. Tsirkin,
Jason Wang, Roger Pau Monné, Alasdair Kergon, Mike Snitzer,
Mikulas Patocka, Song Liu, Yu Kuai, Vineeth Vijayan,
Martin K. Petersen, linux-m68k, linux-um, drbd-dev, nbd,
linuxppc-dev, ceph-devel, virtualization, xen-devel, linux-bcache,
dm-devel, linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme,
linux-s390, linux-scsi, linux-block
On Tue, Jun 11, 2024 at 02:51:24PM +0900, Damien Le Moal wrote:
> > - if (lim->zoned)
> > + if (sdkp->device->type == TYPE_ZBC)
>
> Nit: use sd_is_zoned() here ?
Yes.
> > - if (!sd_is_zoned(sdkp))
> > + if (!sd_is_zoned(sdkp)) {
> > + lim->zoned = false;
>
> Maybe we should clear the other zone related limits here ? If the drive is
> reformatted/converted from SMR to CMR (FORMAT WITH PRESET), the other zone
> limits may be set already, no ?
blk_validate_zoned_limits already takes care of that.
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics
2024-06-11 5:52 ` Christoph Hellwig
@ 2024-06-11 5:54 ` Christoph Hellwig
2024-06-11 7:25 ` Damien Le Moal
2024-06-11 7:20 ` Damien Le Moal
1 sibling, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:54 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Jens Axboe, Geert Uytterhoeven,
Richard Weinberger, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, Josef Bacik, Ming Lei, Michael S. Tsirkin,
Jason Wang, Roger Pau Monné, Alasdair Kergon, Mike Snitzer,
Mikulas Patocka, Song Liu, Yu Kuai, Vineeth Vijayan,
Martin K. Petersen, linux-m68k, linux-um, drbd-dev, nbd,
linuxppc-dev, ceph-devel, virtualization, xen-devel, linux-bcache,
dm-devel, linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme,
linux-s390, linux-scsi, linux-block
On Tue, Jun 11, 2024 at 07:52:39AM +0200, Christoph Hellwig wrote:
> > Maybe we should clear the other zone related limits here ? If the drive is
> > reformatted/converted from SMR to CMR (FORMAT WITH PRESET), the other zone
> > limits may be set already, no ?
>
> blk_validate_zoned_limits already takes care of that.
Sorry, brainfart. The integrity code does that, but not the zoned
code. I suspect the core code might be a better place for it,
though.
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics
2024-06-11 5:54 ` Christoph Hellwig
@ 2024-06-11 7:25 ` Damien Le Moal
0 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 7:25 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Geert Uytterhoeven, Richard Weinberger,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder,
Josef Bacik, Ming Lei, Michael S. Tsirkin, Jason Wang,
Roger Pau Monné, Alasdair Kergon, Mike Snitzer, Mikulas Patocka,
Song Liu, Yu Kuai, Vineeth Vijayan, Martin K. Petersen,
linux-m68k, linux-um, drbd-dev, nbd, linuxppc-dev, ceph-devel,
virtualization, xen-devel, linux-bcache, dm-devel, linux-raid,
linux-mmc, linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:54 PM, Christoph Hellwig wrote:
> On Tue, Jun 11, 2024 at 07:52:39AM +0200, Christoph Hellwig wrote:
>>> Maybe we should clear the other zone related limits here ? If the drive is
>>> reformatted/converted from SMR to CMR (FORMAT WITH PRESET), the other zone
>>> limits may be set already, no ?
>>
>> blk_validate_zoned_limits already takes care of that.
>
> Sorry, brainfart. The integrity code does that, but not the zoned
> code. I suspect the core code might be a better place for it,
> though.
Yes. Just replied to your previous email before seeing this one.
I think that:
static int blk_validate_zoned_limits(struct queue_limits *lim)
{
if (!lim->zoned) {
if (WARN_ON_ONCE(lim->max_open_zones) ||
WARN_ON_ONCE(lim->max_active_zones) ||
WARN_ON_ONCE(lim->zone_write_granularity) ||
WARN_ON_ONCE(lim->max_zone_append_sectors))
return -EINVAL;
return 0;
}
...
could be changed into:
static int blk_validate_zoned_limits(struct queue_limits *lim)
{
if (!lim->zoned) {
lim->max_open_zones = 0;
lim->max_active_zones = 0;
lim->zone_write_granularity = 0;
lim->max_zone_append_sectors = 0;
return 0;
}
But then we would not see "bad" drivers. Could have a small
blk_clear_zoned_limits(struct queue_limits *lim)
helper too.
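A compilable sketch of that suggestion (the struct here is a mock carrying only the zoned members, not the real queue_limits, and blk_clear_zoned_limits is the hypothetical helper proposed above):

```c
#include <assert.h>
#include <stdbool.h>

/* Mock of the zoned subset of struct queue_limits. */
struct queue_limits {
	bool zoned;
	unsigned int max_open_zones;
	unsigned int max_active_zones;
	unsigned int zone_write_granularity;
	unsigned int max_zone_append_sectors;
};

/* Hypothetical helper: drop any stale zoned limits, e.g. left over after
 * a drive was reformatted from SMR to CMR. */
static void blk_clear_zoned_limits(struct queue_limits *lim)
{
	lim->max_open_zones = 0;
	lim->max_active_zones = 0;
	lim->zone_write_granularity = 0;
	lim->max_zone_append_sectors = 0;
}

/* Validation variant that silently clears instead of returning -EINVAL. */
static int blk_validate_zoned_limits(struct queue_limits *lim)
{
	if (!lim->zoned) {
		blk_clear_zoned_limits(lim);
		return 0;
	}
	/* ... the real zoned-limit validation would continue here ... */
	return 0;
}
```

The trade-off is as stated above: clearing makes stale limits harmless, but a driver that wrongly sets zoned limits on a non-zoned queue is no longer flagged.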
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics
2024-06-11 5:52 ` Christoph Hellwig
2024-06-11 5:54 ` Christoph Hellwig
@ 2024-06-11 7:20 ` Damien Le Moal
1 sibling, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 7:20 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Geert Uytterhoeven, Richard Weinberger,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder,
Josef Bacik, Ming Lei, Michael S. Tsirkin, Jason Wang,
Roger Pau Monné, Alasdair Kergon, Mike Snitzer, Mikulas Patocka,
Song Liu, Yu Kuai, Vineeth Vijayan, Martin K. Petersen,
linux-m68k, linux-um, drbd-dev, nbd, linuxppc-dev, ceph-devel,
virtualization, xen-devel, linux-bcache, dm-devel, linux-raid,
linux-mmc, linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:52 PM, Christoph Hellwig wrote:
> On Tue, Jun 11, 2024 at 02:51:24PM +0900, Damien Le Moal wrote:
>>> - if (lim->zoned)
>>> + if (sdkp->device->type == TYPE_ZBC)
>>
>> Nit: use sd_is_zoned() here ?
>
> Yes.
>
>>> - if (!sd_is_zoned(sdkp))
>>> + if (!sd_is_zoned(sdkp)) {
>>> + lim->zoned = false;
>>
>> Maybe we should clear the other zone related limits here ? If the drive is
>> reformatted/converted from SMR to CMR (FORMAT WITH PRESET), the other zone
>> limits may be set already, no ?
>
> blk_validate_zoned_limits already takes care of that.
I do not think it does:
static int blk_validate_zoned_limits(struct queue_limits *lim)
{
if (!lim->zoned) {
if (WARN_ON_ONCE(lim->max_open_zones) ||
WARN_ON_ONCE(lim->max_active_zones) ||
WARN_ON_ONCE(lim->zone_write_granularity) ||
WARN_ON_ONCE(lim->max_zone_append_sectors))
return -EINVAL;
return 0;
}
...
So setting lim->zoned to false without clearing the other limits potentially
will trigger warnings...
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics
2024-06-11 5:51 ` Damien Le Moal
2024-06-11 5:52 ` Christoph Hellwig
@ 2024-06-12 4:45 ` Christoph Hellwig
2024-06-13 9:39 ` Christoph Hellwig
2 siblings, 0 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-12 4:45 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Jens Axboe, Geert Uytterhoeven,
Richard Weinberger, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On Tue, Jun 11, 2024 at 02:51:24PM +0900, Damien Le Moal wrote:
> > - if (!sd_is_zoned(sdkp))
> > + if (!sd_is_zoned(sdkp)) {
> > + lim->zoned = false;
>
> Maybe we should clear the other zone related limits here ? If the drive is
> reformatted/converted from SMR to CMR (FORMAT WITH PRESET), the other zone
> limits may be set already, no ?
Yes, but we would not end up here. The device type is constant over
the lifetime of the scsi_device and we'd have to fully reprobe it.
So we don't need to clear any flags, including the actual zoned flag
here.
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics
2024-06-11 5:51 ` Damien Le Moal
2024-06-11 5:52 ` Christoph Hellwig
2024-06-12 4:45 ` Christoph Hellwig
@ 2024-06-13 9:39 ` Christoph Hellwig
2024-06-16 23:01 ` Damien Le Moal
2 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-13 9:39 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Jens Axboe, Geert Uytterhoeven,
Richard Weinberger, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On Tue, Jun 11, 2024 at 02:51:24PM +0900, Damien Le Moal wrote:
> > + if (sdkp->device->type == TYPE_ZBC)
>
> Nit: use sd_is_zoned() here ?
Actually - is there much point in even keeping sd_is_zoned now that the
host aware support is removed? Just open coding the type check isn't
any more code, and probably easier to follow.
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics
2024-06-13 9:39 ` Christoph Hellwig
@ 2024-06-16 23:01 ` Damien Le Moal
2024-06-17 4:53 ` Christoph Hellwig
0 siblings, 1 reply; 104+ messages in thread
From: Damien Le Moal @ 2024-06-16 23:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Geert Uytterhoeven, Richard Weinberger,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder,
Josef Bacik, Ming Lei, Michael S. Tsirkin, Jason Wang,
Roger Pau Monné, Alasdair Kergon, Mike Snitzer,
Mikulas Patocka, Song Liu, Yu Kuai, Vineeth Vijayan,
Martin K. Petersen, linux-m68k, linux-um, drbd-dev, nbd,
linuxppc-dev, ceph-devel, virtualization, xen-devel, linux-bcache,
dm-devel, linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme,
linux-s390, linux-scsi, linux-block
On 6/13/24 18:39, Christoph Hellwig wrote:
> On Tue, Jun 11, 2024 at 02:51:24PM +0900, Damien Le Moal wrote:
>>> + if (sdkp->device->type == TYPE_ZBC)
>>
>> Nit: use sd_is_zoned() here ?
>
> Actually - is there much point in even keeping sd_is_zoned now that the
> host aware support is removed? Just open coding the type check isn't
> any more code, and probably easier to follow.
Removing this helper is fine by me. There are only 2 call sites in sd.c, and
some of the 4 calls in sd_zbc.c are not really needed:
1) The call in sd_zbc_print_zones() is not needed at all since this function is
called only for a zoned drive from sd_zbc_revalidate_zones().
2) The calls in sd_zbc_report_zones() and sd_zbc_cmnd_checks() are probably
useless as these are called only for zoned drives in the first place. The checks
would be useful only for passthrough commands, but then we do not really care
about these and the user will get a failure anyway if it tries to do ZBC
commands on non-ZBC drives.
3) That leaves only the call in sd_zbc_read_zones() but that check can probably
be moved to sd.c to conditionally call sd_zbc_read_zones().
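Point 3 could look roughly like this (a sketch with mocked types; the real sd_zbc_read_zones takes the limits and a scratch buffer as further arguments):

```c
#include <assert.h>

#define TYPE_ZBC 0x14	/* SCSI device type for host-managed zoned devices */

/* Minimal stand-ins for the kernel structures. */
struct scsi_device { int type; };
struct scsi_disk { struct scsi_device *device; };

/* sd_zbc_read_zones() with its internal sd_is_zoned() check removed. */
static int sd_zbc_read_zones(struct scsi_disk *sdkp)
{
	/* ... zone size probing and zoned limits setup would go here ... */
	return 0;
}

/* The caller in sd.c decides whether to enter the ZBC path at all. */
static int sd_revalidate_zones_sketch(struct scsi_disk *sdkp)
{
	if (sdkp->device->type != TYPE_ZBC)
		return 0;	/* conventional or host-aware disk: nothing to do */
	return sd_zbc_read_zones(sdkp);
}
```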
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics
2024-06-16 23:01 ` Damien Le Moal
@ 2024-06-17 4:53 ` Christoph Hellwig
2024-06-17 6:03 ` Damien Le Moal
0 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-17 4:53 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Jens Axboe, Geert Uytterhoeven,
Richard Weinberger, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On Mon, Jun 17, 2024 at 08:01:04AM +0900, Damien Le Moal wrote:
> On 6/13/24 18:39, Christoph Hellwig wrote:
> > On Tue, Jun 11, 2024 at 02:51:24PM +0900, Damien Le Moal wrote:
> >>> + if (sdkp->device->type == TYPE_ZBC)
> >>
> >> Nit: use sd_is_zoned() here ?
> >
> > Actually - is there much point in even keeping sd_is_zoned now that the
> > host aware support is removed? Just open coding the type check isn't
> > any more code, and probably easier to follow.
>
> Removing this helper is fine by me.
FYI, I've removed it yesterday, but not done much of the cleanups suggested
here. We should probably do those in a follow-up, including removing
the !ZBC check in sd_zbc_check_zoned_characteristics.
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics
2024-06-17 4:53 ` Christoph Hellwig
@ 2024-06-17 6:03 ` Damien Le Moal
0 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-17 6:03 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Geert Uytterhoeven, Richard Weinberger,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder,
Josef Bacik, Ming Lei, Michael S. Tsirkin, Jason Wang,
Roger Pau Monné, Alasdair Kergon, Mike Snitzer,
Mikulas Patocka, Song Liu, Yu Kuai, Vineeth Vijayan,
Martin K. Petersen, linux-m68k, linux-um, drbd-dev, nbd,
linuxppc-dev, ceph-devel, virtualization, xen-devel, linux-bcache,
dm-devel, linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme,
linux-s390, linux-scsi, linux-block
On 6/17/24 13:53, Christoph Hellwig wrote:
> On Mon, Jun 17, 2024 at 08:01:04AM +0900, Damien Le Moal wrote:
>> On 6/13/24 18:39, Christoph Hellwig wrote:
>>> On Tue, Jun 11, 2024 at 02:51:24PM +0900, Damien Le Moal wrote:
>>>>> + if (sdkp->device->type == TYPE_ZBC)
>>>>
>>>> Nit: use sd_is_zoned() here ?
>>>
>>> Actually - is there much point in even keeping sd_is_zoned now that the
>>> host aware support is removed? Just open coding the type check isn't
>>> any more code, and probably easier to follow.
>>
>> Removing this helper is fine by me.
>
> FYI, I've removed it yesterday, but not done much of the cleanups suggested
> here. We should probably do those in a follow-up, including removing
> the !ZBC check in sd_zbc_check_zoned_characteristics.
OK. I will send that once your series is queued.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics
2024-06-11 5:19 ` [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics Christoph Hellwig
2024-06-11 5:51 ` Damien Le Moal
@ 2024-06-11 8:12 ` Hannes Reinecke
1 sibling, 0 replies; 104+ messages in thread
From: Hannes Reinecke @ 2024-06-11 8:12 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 07:19, Christoph Hellwig wrote:
> Move a bit of code that sets up the zone flag and the write granularity
> into sd_zbc_read_zones to be with the rest of the zoned limits.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/scsi/sd.c | 21 +--------------------
> drivers/scsi/sd_zbc.c | 13 ++++++++++++-
> 2 files changed, 13 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 85b45345a27739..5bfed61c70db8f 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -3308,29 +3308,10 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp,
> blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
> }
>
> -
> -#ifdef CONFIG_BLK_DEV_ZONED /* sd_probe rejects ZBD devices early otherwise */
> - if (sdkp->device->type == TYPE_ZBC) {
> - lim->zoned = true;
> -
> - /*
> - * Per ZBC and ZAC specifications, writes in sequential write
> - * required zones of host-managed devices must be aligned to
> - * the device physical block size.
> - */
> - lim->zone_write_granularity = sdkp->physical_block_size;
> - } else {
> - /*
> - * Host-aware devices are treated as conventional.
> - */
> - lim->zoned = false;
> - }
> -#endif /* CONFIG_BLK_DEV_ZONED */
> -
> if (!sdkp->first_scan)
> return;
>
> - if (lim->zoned)
> + if (sdkp->device->type == TYPE_ZBC)
Why not sd_is_zoned()?
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 104+ messages in thread
* [PATCH 03/26] loop: stop using loop_reconfigure_limits in __loop_clr_fd
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
2024-06-11 5:19 ` [PATCH 01/26] sd: fix sd_is_zoned Christoph Hellwig
2024-06-11 5:19 ` [PATCH 02/26] sd: move zone limits setup out of sd_read_block_characteristics Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 5:53 ` Damien Le Moal
` (2 more replies)
2024-06-11 5:19 ` [PATCH 04/26] loop: always update discard settings in loop_reconfigure_limits Christoph Hellwig
` (22 subsequent siblings)
25 siblings, 3 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
__loop_clr_fd wants to clear all settings on the device. Prepare for
moving more settings into the block limits by open coding
loop_reconfigure_limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/loop.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 93780f41646b75..93a49c40a31a71 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1133,6 +1133,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
static void __loop_clr_fd(struct loop_device *lo, bool release)
{
+ struct queue_limits lim;
struct file *filp;
gfp_t gfp = lo->old_gfp_mask;
@@ -1156,7 +1157,14 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
lo->lo_offset = 0;
lo->lo_sizelimit = 0;
memset(lo->lo_file_name, 0, LO_NAME_SIZE);
- loop_reconfigure_limits(lo, 512, false);
+
+ /* reset the block size to the default */
+ lim = queue_limits_start_update(lo->lo_queue);
+ lim.logical_block_size = 512;
+ lim.physical_block_size = 512;
+ lim.io_min = 512;
+ queue_limits_commit_update(lo->lo_queue, &lim);
+
invalidate_disk(lo->lo_disk);
loop_sysfs_exit(lo);
/* let user-space know about this change */
--
2.43.0
* Re: [PATCH 03/26] loop: stop using loop_reconfigure_limits in __loop_clr_fd
2024-06-11 5:19 ` [PATCH 03/26] loop: stop using loop_reconfigure_limits in __loop_clr_fd Christoph Hellwig
@ 2024-06-11 5:53 ` Damien Le Moal
2024-06-11 5:54 ` Christoph Hellwig
2024-06-11 8:14 ` Hannes Reinecke
2024-06-11 19:21 ` Bart Van Assche
2 siblings, 1 reply; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 5:53 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> __loop_clr_fd wants to clear all settings on the device. Prepare for
> moving more settings into the block limits by open coding
> loop_reconfigure_limits.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/block/loop.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 93780f41646b75..93a49c40a31a71 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1133,6 +1133,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
>
> static void __loop_clr_fd(struct loop_device *lo, bool release)
> {
> + struct queue_limits lim;
> struct file *filp;
> gfp_t gfp = lo->old_gfp_mask;
>
> @@ -1156,7 +1157,14 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
> lo->lo_offset = 0;
> lo->lo_sizelimit = 0;
> memset(lo->lo_file_name, 0, LO_NAME_SIZE);
> - loop_reconfigure_limits(lo, 512, false);
> +
> + /* reset the block size to the default */
> + lim = queue_limits_start_update(lo->lo_queue);
> + lim.logical_block_size = 512;
Nit: SECTOR_SIZE ? maybe ?
> + lim.physical_block_size = 512;
> + lim.io_min = 512;
> + queue_limits_commit_update(lo->lo_queue, &lim);
> +
> invalidate_disk(lo->lo_disk);
> loop_sysfs_exit(lo);
> /* let user-space know about this change */
Otherwise, looks OK to me.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 03/26] loop: stop using loop_reconfigure_limits in __loop_clr_fd
2024-06-11 5:53 ` Damien Le Moal
@ 2024-06-11 5:54 ` Christoph Hellwig
0 siblings, 0 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:54 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Jens Axboe, Geert Uytterhoeven,
Richard Weinberger, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, Josef Bacik, Ming Lei, Michael S. Tsirkin,
Jason Wang, Roger Pau Monné, Alasdair Kergon, Mike Snitzer,
Mikulas Patocka, Song Liu, Yu Kuai, Vineeth Vijayan,
Martin K. Petersen, linux-m68k, linux-um, drbd-dev, nbd,
linuxppc-dev, ceph-devel, virtualization, xen-devel, linux-bcache,
dm-devel, linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme,
linux-s390, linux-scsi, linux-block
On Tue, Jun 11, 2024 at 02:53:19PM +0900, Damien Le Moal wrote:
> > + /* reset the block size to the default */
> > + lim = queue_limits_start_update(lo->lo_queue);
> > + lim.logical_block_size = 512;
>
> Nit: SECTOR_SIZE ? maybe ?
Yes. I was following the existing code, but SECTOR_SIZE is probably
a better choice here.
* Re: [PATCH 03/26] loop: stop using loop_reconfigure_limits in __loop_clr_fd
2024-06-11 5:19 ` [PATCH 03/26] loop: stop using loop_reconfigure_limits in __loop_clr_fd Christoph Hellwig
2024-06-11 5:53 ` Damien Le Moal
@ 2024-06-11 8:14 ` Hannes Reinecke
2024-06-11 19:21 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Hannes Reinecke @ 2024-06-11 8:14 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 07:19, Christoph Hellwig wrote:
> __loop_clr_fd wants to clear all settings on the device. Prepare for
> moving more settings into the block limits by open coding
> loop_reconfigure_limits.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/block/loop.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH 03/26] loop: stop using loop_reconfigure_limits in __loop_clr_fd
2024-06-11 5:19 ` [PATCH 03/26] loop: stop using loop_reconfigure_limits in __loop_clr_fd Christoph Hellwig
2024-06-11 5:53 ` Damien Le Moal
2024-06-11 8:14 ` Hannes Reinecke
@ 2024-06-11 19:21 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Bart Van Assche @ 2024-06-11 19:21 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/10/24 10:19 PM, Christoph Hellwig wrote:
> __loop_clr_fd wants to clear all settings on the device. Prepare for
> moving more settings into the block limits by open coding
> loop_reconfigure_limits.
If Damien's comment is addressed, feel free to add:
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* [PATCH 04/26] loop: always update discard settings in loop_reconfigure_limits
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (2 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 03/26] loop: stop using loop_reconfigure_limits in __loop_clr_fd Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 5:54 ` Damien Le Moal
` (2 more replies)
2024-06-11 5:19 ` [PATCH 05/26] loop: regularize upgrading the lock size for direct I/O Christoph Hellwig
` (21 subsequent siblings)
25 siblings, 3 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Simplify loop_reconfigure_limits by always updating the discard limits.
This adds a little more work to loop_set_block_size, but doesn't change
the outcome as the discard flag won't change.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/loop.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 93a49c40a31a71..c658282454af1b 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -975,8 +975,7 @@ loop_set_status_from_info(struct loop_device *lo,
return 0;
}
-static int loop_reconfigure_limits(struct loop_device *lo, unsigned short bsize,
- bool update_discard_settings)
+static int loop_reconfigure_limits(struct loop_device *lo, unsigned short bsize)
{
struct queue_limits lim;
@@ -984,8 +983,7 @@ static int loop_reconfigure_limits(struct loop_device *lo, unsigned short bsize,
lim.logical_block_size = bsize;
lim.physical_block_size = bsize;
lim.io_min = bsize;
- if (update_discard_settings)
- loop_config_discard(lo, &lim);
+ loop_config_discard(lo, &lim);
return queue_limits_commit_update(lo->lo_queue, &lim);
}
@@ -1086,7 +1084,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
else
bsize = 512;
- error = loop_reconfigure_limits(lo, bsize, true);
+ error = loop_reconfigure_limits(lo, bsize);
if (WARN_ON_ONCE(error))
goto out_unlock;
@@ -1496,7 +1494,7 @@ static int loop_set_block_size(struct loop_device *lo, unsigned long arg)
invalidate_bdev(lo->lo_device);
blk_mq_freeze_queue(lo->lo_queue);
- err = loop_reconfigure_limits(lo, arg, false);
+ err = loop_reconfigure_limits(lo, arg);
loop_update_dio(lo);
blk_mq_unfreeze_queue(lo->lo_queue);
--
2.43.0
* Re: [PATCH 04/26] loop: always update discard settings in loop_reconfigure_limits
2024-06-11 5:19 ` [PATCH 04/26] loop: always update discard settings in loop_reconfigure_limits Christoph Hellwig
@ 2024-06-11 5:54 ` Damien Le Moal
2024-06-11 8:15 ` Hannes Reinecke
2024-06-11 19:23 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 5:54 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Simplify loop_reconfigure_limits by always updating the discard limits.
> This adds a little more work to loop_set_block_size, but doesn't change
> the outcome as the discard flag won't change.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks OK to me.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 04/26] loop: always update discard settings in loop_reconfigure_limits
2024-06-11 5:19 ` [PATCH 04/26] loop: always update discard settings in loop_reconfigure_limits Christoph Hellwig
2024-06-11 5:54 ` Damien Le Moal
@ 2024-06-11 8:15 ` Hannes Reinecke
2024-06-11 19:23 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Hannes Reinecke @ 2024-06-11 8:15 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 07:19, Christoph Hellwig wrote:
> Simplify loop_reconfigure_limits by always updating the discard limits.
> This adds a little more work to loop_set_block_size, but doesn't change
> the outcome as the discard flag won't change.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/block/loop.c | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH 04/26] loop: always update discard settings in loop_reconfigure_limits
2024-06-11 5:19 ` [PATCH 04/26] loop: always update discard settings in loop_reconfigure_limits Christoph Hellwig
2024-06-11 5:54 ` Damien Le Moal
2024-06-11 8:15 ` Hannes Reinecke
@ 2024-06-11 19:23 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Bart Van Assche @ 2024-06-11 19:23 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/10/24 10:19 PM, Christoph Hellwig wrote:
> Simplify loop_reconfigure_limits by always updating the discard limits.
> This adds a little more work to loop_set_block_size, but doesn't change
> the outcome as the discard flag won't change.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* [PATCH 05/26] loop: regularize upgrading the lock size for direct I/O
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (3 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 04/26] loop: always update discard settings in loop_reconfigure_limits Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 5:56 ` Damien Le Moal
` (2 more replies)
2024-06-11 5:19 ` [PATCH 06/26] loop: also use the default block size from an underlying block device Christoph Hellwig
` (20 subsequent siblings)
25 siblings, 3 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
The LOOP_CONFIGURE path automatically upgrades the block size to that
of the underlying file for O_DIRECT file descriptors, but the
LOOP_SET_BLOCK_SIZE path does not. Fix this by lifting the code to
pick the block size into common code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/loop.c | 25 +++++++++++++++----------
1 file changed, 15 insertions(+), 10 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index c658282454af1b..4f6d8514d19bd6 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -975,10 +975,24 @@ loop_set_status_from_info(struct loop_device *lo,
return 0;
}
+static unsigned short loop_default_blocksize(struct loop_device *lo,
+ struct block_device *backing_bdev)
+{
+ /* In case of direct I/O, match underlying block size */
+ if ((lo->lo_backing_file->f_flags & O_DIRECT) && backing_bdev)
+ return bdev_logical_block_size(backing_bdev);
+ return 512;
+}
+
static int loop_reconfigure_limits(struct loop_device *lo, unsigned short bsize)
{
+ struct file *file = lo->lo_backing_file;
+ struct inode *inode = file->f_mapping->host;
struct queue_limits lim;
+ if (!bsize)
+ bsize = loop_default_blocksize(lo, inode->i_sb->s_bdev);
+
lim = queue_limits_start_update(lo->lo_queue);
lim.logical_block_size = bsize;
lim.physical_block_size = bsize;
@@ -997,7 +1011,6 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
int error;
loff_t size;
bool partscan;
- unsigned short bsize;
bool is_loop;
if (!file)
@@ -1076,15 +1089,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
if (!(lo->lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
blk_queue_write_cache(lo->lo_queue, true, false);
- if (config->block_size)
- bsize = config->block_size;
- else if ((lo->lo_backing_file->f_flags & O_DIRECT) && inode->i_sb->s_bdev)
- /* In case of direct I/O, match underlying block size */
- bsize = bdev_logical_block_size(inode->i_sb->s_bdev);
- else
- bsize = 512;
-
- error = loop_reconfigure_limits(lo, bsize);
+ error = loop_reconfigure_limits(lo, config->block_size);
if (WARN_ON_ONCE(error))
goto out_unlock;
--
2.43.0
* Re: [PATCH 05/26] loop: regularize upgrading the lock size for direct I/O
2024-06-11 5:19 ` [PATCH 05/26] loop: regularize upgrading the lock size for direct I/O Christoph Hellwig
@ 2024-06-11 5:56 ` Damien Le Moal
2024-06-11 5:59 ` Christoph Hellwig
2024-06-11 8:16 ` Hannes Reinecke
2024-06-11 19:27 ` Bart Van Assche
2 siblings, 1 reply; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 5:56 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> The LOOP_CONFIGURE path automatically upgrades the block size to that
> of the underlying file for O_DIRECT file descriptors, but the
> LOOP_SET_BLOCK_SIZE path does not. Fix this by lifting the code to
> pick the block size into common code.
s/lock/block in the commit title.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/block/loop.c | 25 +++++++++++++++----------
> 1 file changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index c658282454af1b..4f6d8514d19bd6 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -975,10 +975,24 @@ loop_set_status_from_info(struct loop_device *lo,
> return 0;
> }
>
> +static unsigned short loop_default_blocksize(struct loop_device *lo,
> + struct block_device *backing_bdev)
> +{
> + /* In case of direct I/O, match underlying block size */
> + if ((lo->lo_backing_file->f_flags & O_DIRECT) && backing_bdev)
> + return bdev_logical_block_size(backing_bdev);
> + return 512;
Nit: SECTOR_SIZE ?
> +}
> +
> static int loop_reconfigure_limits(struct loop_device *lo, unsigned short bsize)
> {
> + struct file *file = lo->lo_backing_file;
> + struct inode *inode = file->f_mapping->host;
> struct queue_limits lim;
>
> + if (!bsize)
> + bsize = loop_default_blocksize(lo, inode->i_sb->s_bdev);
If bsize is specified and there is a backing dev used with direct IO, should it
be checked that bsize is a multiple of bdev_logical_block_size(backing_bdev) ?
> +
> lim = queue_limits_start_update(lo->lo_queue);
> lim.logical_block_size = bsize;
> lim.physical_block_size = bsize;
> @@ -997,7 +1011,6 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
> int error;
> loff_t size;
> bool partscan;
> - unsigned short bsize;
> bool is_loop;
>
> if (!file)
> @@ -1076,15 +1089,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
> if (!(lo->lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
> blk_queue_write_cache(lo->lo_queue, true, false);
>
> - if (config->block_size)
> - bsize = config->block_size;
> - else if ((lo->lo_backing_file->f_flags & O_DIRECT) && inode->i_sb->s_bdev)
> - /* In case of direct I/O, match underlying block size */
> - bsize = bdev_logical_block_size(inode->i_sb->s_bdev);
> - else
> - bsize = 512;
> -
> - error = loop_reconfigure_limits(lo, bsize);
> + error = loop_reconfigure_limits(lo, config->block_size);
> if (WARN_ON_ONCE(error))
> goto out_unlock;
>
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 05/26] loop: regularize upgrading the lock size for direct I/O
2024-06-11 5:56 ` Damien Le Moal
@ 2024-06-11 5:59 ` Christoph Hellwig
0 siblings, 0 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:59 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Jens Axboe, Geert Uytterhoeven,
Richard Weinberger, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, Josef Bacik, Ming Lei, Michael S. Tsirkin,
Jason Wang, Roger Pau Monné, Alasdair Kergon, Mike Snitzer,
Mikulas Patocka, Song Liu, Yu Kuai, Vineeth Vijayan,
Martin K. Petersen, linux-m68k, linux-um, drbd-dev, nbd,
linuxppc-dev, ceph-devel, virtualization, xen-devel, linux-bcache,
dm-devel, linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme,
linux-s390, linux-scsi, linux-block
On Tue, Jun 11, 2024 at 02:56:59PM +0900, Damien Le Moal wrote:
> > + if (!bsize)
> > + bsize = loop_default_blocksize(lo, inode->i_sb->s_bdev);
>
> If bsize is specified and there is a backing dev used with direct IO, should it
> be checked that bsize is a multiple of bdev_logical_block_size(backing_bdev) ?
For direct I/O that check would be useful. For buffered I/O we can do
read-modify-write cycles. However this series is already huge and not
primarily about improving the loop driver parameter validation, so
I'll defer this for now.
* Re: [PATCH 05/26] loop: regularize upgrading the lock size for direct I/O
2024-06-11 5:19 ` [PATCH 05/26] loop: regularize upgrading the lock size for direct I/O Christoph Hellwig
2024-06-11 5:56 ` Damien Le Moal
@ 2024-06-11 8:16 ` Hannes Reinecke
2024-06-11 19:27 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Hannes Reinecke @ 2024-06-11 8:16 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 07:19, Christoph Hellwig wrote:
> The LOOP_CONFIGURE path automatically upgrades the block size to that
> of the underlying file for O_DIRECT file descriptors, but the
> LOOP_SET_BLOCK_SIZE path does not. Fix this by lifting the code to
> pick the block size into common code.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/block/loop.c | 25 +++++++++++++++----------
> 1 file changed, 15 insertions(+), 10 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH 05/26] loop: regularize upgrading the lock size for direct I/O
2024-06-11 5:19 ` [PATCH 05/26] loop: regularize upgrading the lock size for direct I/O Christoph Hellwig
2024-06-11 5:56 ` Damien Le Moal
2024-06-11 8:16 ` Hannes Reinecke
@ 2024-06-11 19:27 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Bart Van Assche @ 2024-06-11 19:27 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/10/24 10:19 PM, Christoph Hellwig wrote:
> The LOOP_CONFIGURE path automatically upgrades the block size to that
> of the underlying file for O_DIRECT file descriptors, but the
> LOOP_SET_BLOCK_SIZE path does not. Fix this by lifting the code to
> pick the block size into common code.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* [PATCH 06/26] loop: also use the default block size from an underlying block device
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (4 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 05/26] loop: regularize upgrading the lock size for direct I/O Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 5:58 ` Damien Le Moal
` (2 more replies)
2024-06-11 5:19 ` [PATCH 07/26] loop: fold loop_update_rotational into loop_reconfigure_limits Christoph Hellwig
` (19 subsequent siblings)
25 siblings, 3 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Fix the code in loop_reconfigure_limits to pick a default block size for
O_DIRECT file descriptors to also work when the loop device sits on top
of a block device and not just on a regular file on a block device based
file system.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/loop.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 4f6d8514d19bd6..d7cf6bbbfb1b86 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -988,10 +988,16 @@ static int loop_reconfigure_limits(struct loop_device *lo, unsigned short bsize)
{
struct file *file = lo->lo_backing_file;
struct inode *inode = file->f_mapping->host;
+ struct block_device *backing_bdev = NULL;
struct queue_limits lim;
+ if (S_ISBLK(inode->i_mode))
+ backing_bdev = I_BDEV(inode);
+ else if (inode->i_sb->s_bdev)
+ backing_bdev = inode->i_sb->s_bdev;
+
if (!bsize)
- bsize = loop_default_blocksize(lo, inode->i_sb->s_bdev);
+ bsize = loop_default_blocksize(lo, backing_bdev);
lim = queue_limits_start_update(lo->lo_queue);
lim.logical_block_size = bsize;
--
2.43.0
* Re: [PATCH 06/26] loop: also use the default block size from an underlying block device
2024-06-11 5:19 ` [PATCH 06/26] loop: also use the default block size from an underlying block device Christoph Hellwig
@ 2024-06-11 5:58 ` Damien Le Moal
2024-06-11 5:59 ` Christoph Hellwig
2024-06-11 8:17 ` Hannes Reinecke
2024-06-11 19:28 ` Bart Van Assche
2 siblings, 1 reply; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 5:58 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Fix the code in loop_reconfigure_limits to pick a default block size for
> O_DIRECT file descriptors to also work when the loop device sits on top
> of a block device and not just on a regular file on a block device based
> file system.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/block/loop.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index 4f6d8514d19bd6..d7cf6bbbfb1b86 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -988,10 +988,16 @@ static int loop_reconfigure_limits(struct loop_device *lo, unsigned short bsize)
> {
> struct file *file = lo->lo_backing_file;
> struct inode *inode = file->f_mapping->host;
> + struct block_device *backing_bdev = NULL;
> struct queue_limits lim;
>
> + if (S_ISBLK(inode->i_mode))
> + backing_bdev = I_BDEV(inode);
> + else if (inode->i_sb->s_bdev)
> + backing_bdev = inode->i_sb->s_bdev;
> +
Why not move this hunk inside the below "if"? (The backing_bdev declaration can go
there too.)
> if (!bsize)
> - bsize = loop_default_blocksize(lo, inode->i_sb->s_bdev);
> + bsize = loop_default_blocksize(lo, backing_bdev);
>
> lim = queue_limits_start_update(lo->lo_queue);
> lim.logical_block_size = bsize;
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 06/26] loop: also use the default block size from an underlying block device
2024-06-11 5:58 ` Damien Le Moal
@ 2024-06-11 5:59 ` Christoph Hellwig
0 siblings, 0 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:59 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Jens Axboe, Geert Uytterhoeven,
Richard Weinberger, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, Josef Bacik, Ming Lei, Michael S. Tsirkin,
Jason Wang, Roger Pau Monné, Alasdair Kergon, Mike Snitzer,
Mikulas Patocka, Song Liu, Yu Kuai, Vineeth Vijayan,
Martin K. Petersen, linux-m68k, linux-um, drbd-dev, nbd,
linuxppc-dev, ceph-devel, virtualization, xen-devel, linux-bcache,
dm-devel, linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme,
linux-s390, linux-scsi, linux-block
On Tue, Jun 11, 2024 at 02:58:56PM +0900, Damien Le Moal wrote:
> > + if (S_ISBLK(inode->i_mode))
> > + backing_bdev = I_BDEV(inode);
> > + else if (inode->i_sb->s_bdev)
> > + backing_bdev = inode->i_sb->s_bdev;
> > +
>
> Why not move this hunk inside the below "if"? (The backing_bdev declaration can go
> there too.)
Because another use will pop up a bit later :)
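The hunk under discussion picks the backing block device in two steps. The following is a stand-alone sketch of that selection logic, not kernel code: `struct bdev` and `pick_backing_bdev` are illustrative stand-ins for `struct block_device`, `I_BDEV(inode)`, and `inode->i_sb->s_bdev`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for the kernel's struct block_device. */
struct bdev {
	unsigned int logical_block_size;
};

/*
 * Mirrors the new hunk in loop_reconfigure_limits: a loop device sitting
 * directly on a block device node uses that device; one backed by a
 * regular file falls back to the filesystem's backing device, which may
 * not exist (e.g. on tmpfs).
 */
static struct bdev *pick_backing_bdev(bool backing_file_is_blkdev,
				      struct bdev *inode_bdev,
				      struct bdev *sb_bdev)
{
	if (backing_file_is_blkdev)
		return inode_bdev;	/* S_ISBLK case: I_BDEV(inode) */
	return sb_bdev;			/* may be NULL */
}

/* Fixtures for exercising the helper. */
static struct bdev blk_node = { 512 };
static struct bdev fs_bdev = { 4096 };
```

As the reply above notes, the selection stays outside the `if (!bsize)` branch so that `backing_bdev` remains available to later users.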
* Re: [PATCH 06/26] loop: also use the default block size from an underlying block device
2024-06-11 5:19 ` [PATCH 06/26] loop: also use the default block size from an underlying block device Christoph Hellwig
2024-06-11 5:58 ` Damien Le Moal
@ 2024-06-11 8:17 ` Hannes Reinecke
2024-06-11 19:28 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Hannes Reinecke @ 2024-06-11 8:17 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 07:19, Christoph Hellwig wrote:
> Fix the code in loop_reconfigure_limits to pick a default block size for
> O_DIRECT file descriptors to also work when the loop device sits on top
> of a block device and not just on a regular file on a block device based
> file system.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/block/loop.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH 06/26] loop: also use the default block size from an underlying block device
2024-06-11 5:19 ` [PATCH 06/26] loop: also use the default block size from an underlying block device Christoph Hellwig
2024-06-11 5:58 ` Damien Le Moal
2024-06-11 8:17 ` Hannes Reinecke
@ 2024-06-11 19:28 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Bart Van Assche @ 2024-06-11 19:28 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/10/24 10:19 PM, Christoph Hellwig wrote:
> Fix the code in loop_reconfigure_limits to pick a default block size for
> O_DIRECT file descriptors to also work when the loop device sits on top
> of a block device and not just on a regular file on a block device based
> file system.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* [PATCH 07/26] loop: fold loop_update_rotational into loop_reconfigure_limits
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (5 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 06/26] loop: also use the default block size from an underlying block device Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 6:00 ` Damien Le Moal
` (2 more replies)
2024-06-11 5:19 ` [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode Christoph Hellwig
` (18 subsequent siblings)
25 siblings, 3 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
This prepares for moving the rotational flag into the queue_limits and
also fixes it for the case where the loop device is backed by a block
device.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/loop.c | 23 ++++-------------------
1 file changed, 4 insertions(+), 19 deletions(-)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index d7cf6bbbfb1b86..2c4a5eb3a6a7f9 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -916,24 +916,6 @@ static void loop_free_idle_workers_timer(struct timer_list *timer)
return loop_free_idle_workers(lo, false);
}
-static void loop_update_rotational(struct loop_device *lo)
-{
- struct file *file = lo->lo_backing_file;
- struct inode *file_inode = file->f_mapping->host;
- struct block_device *file_bdev = file_inode->i_sb->s_bdev;
- struct request_queue *q = lo->lo_queue;
- bool nonrot = true;
-
- /* not all filesystems (e.g. tmpfs) have a sb->s_bdev */
- if (file_bdev)
- nonrot = bdev_nonrot(file_bdev);
-
- if (nonrot)
- blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
- else
- blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
-}
-
/**
* loop_set_status_from_info - configure device from loop_info
* @lo: struct loop_device to configure
@@ -1003,6 +985,10 @@ static int loop_reconfigure_limits(struct loop_device *lo, unsigned short bsize)
lim.logical_block_size = bsize;
lim.physical_block_size = bsize;
lim.io_min = bsize;
+ if (!backing_bdev || bdev_nonrot(backing_bdev))
+ blk_queue_flag_set(QUEUE_FLAG_NONROT, lo->lo_queue);
+ else
+ blk_queue_flag_clear(QUEUE_FLAG_NONROT, lo->lo_queue);
loop_config_discard(lo, &lim);
return queue_limits_commit_update(lo->lo_queue, &lim);
}
@@ -1099,7 +1085,6 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
if (WARN_ON_ONCE(error))
goto out_unlock;
- loop_update_rotational(lo);
loop_update_dio(lo);
loop_sysfs_init(lo);
--
2.43.0
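The folded rotational check condenses to a single predicate. A minimal stand-alone model of it follows, with a simplified `struct bdev` in place of the kernel types and a plain field standing in for `bdev_nonrot()`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in: nonrot mirrors what bdev_nonrot() reports. */
struct bdev {
	bool nonrot;
};

/*
 * Matches the condition added to loop_reconfigure_limits: no backing
 * block device (e.g. a file on tmpfs) or a non-rotational backing
 * device means the loop device is reported as non-rotational.
 */
static bool loop_is_nonrot(const struct bdev *backing_bdev)
{
	return !backing_bdev || backing_bdev->nonrot;
}

/* Fixtures for exercising the predicate. */
static const struct bdev ssd = { .nonrot = true };
static const struct bdev hdd = { .nonrot = false };
```

Compared to the removed loop_update_rotational, the only behavioral difference is the S_ISBLK-backed case, which previously always went through `i_sb->s_bdev`.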
* Re: [PATCH 07/26] loop: fold loop_update_rotational into loop_reconfigure_limits
2024-06-11 5:19 ` [PATCH 07/26] loop: fold loop_update_rotational into loop_reconfigure_limits Christoph Hellwig
@ 2024-06-11 6:00 ` Damien Le Moal
2024-06-11 8:18 ` Hannes Reinecke
2024-06-11 19:31 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 6:00 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> This prepares for moving the rotational flag into the queue_limits and
> also fixes it for the case where the loop device is backed by a block
> device.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good to me.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 07/26] loop: fold loop_update_rotational into loop_reconfigure_limits
2024-06-11 5:19 ` [PATCH 07/26] loop: fold loop_update_rotational into loop_reconfigure_limits Christoph Hellwig
2024-06-11 6:00 ` Damien Le Moal
@ 2024-06-11 8:18 ` Hannes Reinecke
2024-06-11 19:31 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Hannes Reinecke @ 2024-06-11 8:18 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 07:19, Christoph Hellwig wrote:
> This prepares for moving the rotational flag into the queue_limits and
> also fixes it for the case where the loop device is backed by a block
> device.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/block/loop.c | 23 ++++-------------------
> 1 file changed, 4 insertions(+), 19 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH 07/26] loop: fold loop_update_rotational into loop_reconfigure_limits
2024-06-11 5:19 ` [PATCH 07/26] loop: fold loop_update_rotational into loop_reconfigure_limits Christoph Hellwig
2024-06-11 6:00 ` Damien Le Moal
2024-06-11 8:18 ` Hannes Reinecke
@ 2024-06-11 19:31 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Bart Van Assche @ 2024-06-11 19:31 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/10/24 10:19 PM, Christoph Hellwig wrote:
> This prepares for moving the rotational flag into the queue_limits and
> also fixes it for the case where the loop device is backed by a block
> device.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (6 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 07/26] loop: fold loop_update_rotational into loop_reconfigure_limits Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 7:26 ` Damien Le Moal
` (4 more replies)
2024-06-11 5:19 ` [PATCH 09/26] nbd: move setting the cache control flags to __nbd_set_size Christoph Hellwig
` (17 subsequent siblings)
25 siblings, 5 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
virtblk_update_cache_mode boils down to a single call to
blk_queue_write_cache. Remove it in preparation for moving the cache
control flags into the queue_limits.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/virtio_blk.c | 13 +++----------
1 file changed, 3 insertions(+), 10 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 2351f411fa4680..378b241911ca87 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1089,14 +1089,6 @@ static int virtblk_get_cache_mode(struct virtio_device *vdev)
return writeback;
}
-static void virtblk_update_cache_mode(struct virtio_device *vdev)
-{
- u8 writeback = virtblk_get_cache_mode(vdev);
- struct virtio_blk *vblk = vdev->priv;
-
- blk_queue_write_cache(vblk->disk->queue, writeback, false);
-}
-
static const char *const virtblk_cache_types[] = {
"write through", "write back"
};
@@ -1116,7 +1108,7 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
return i;
virtio_cwrite8(vdev, offsetof(struct virtio_blk_config, wce), i);
- virtblk_update_cache_mode(vdev);
+ blk_queue_write_cache(disk->queue, virtblk_get_cache_mode(vdev), false);
return count;
}
@@ -1528,7 +1520,8 @@ static int virtblk_probe(struct virtio_device *vdev)
vblk->index = index;
/* configure queue flush support */
- virtblk_update_cache_mode(vdev);
+ blk_queue_write_cache(vblk->disk->queue, virtblk_get_cache_mode(vdev),
+ false);
/* If disk is read-only in the host, the guest should obey */
if (virtio_has_feature(vdev, VIRTIO_BLK_F_RO))
--
2.43.0
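The sysfs string handled by cache_type_store maps directly onto the writeback flag: index 0 ("write through") disables the write cache, index 1 ("write back") enables it. Below is a stand-alone sketch of that lookup; the -1 error return is an assumption standing in for the kernel's actual string-matching error path.

```c
#include <assert.h>
#include <string.h>

static const char *const virtblk_cache_types[] = {
	"write through", "write back"
};

/*
 * Returns the index written to the wce config byte; that same value is
 * then used as the writeback argument to blk_queue_write_cache().
 */
static int cache_type_to_wce(const char *buf)
{
	for (int i = 0; i < 2; i++)
		if (strcmp(buf, virtblk_cache_types[i]) == 0)
			return i;
	return -1;	/* unknown string */
}
```

With the wrapper gone, both callers (probe and the sysfs store) feed this value straight into blk_queue_write_cache.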
* Re: [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode
2024-06-11 5:19 ` [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode Christoph Hellwig
@ 2024-06-11 7:26 ` Damien Le Moal
2024-06-11 8:19 ` Hannes Reinecke
` (3 subsequent siblings)
4 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 7:26 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> virtblk_update_cache_mode boils down to a single call to
> blk_queue_write_cache. Remove it in preparation for moving the cache
> control flags into the queue_limits.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode
2024-06-11 5:19 ` [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode Christoph Hellwig
2024-06-11 7:26 ` Damien Le Moal
@ 2024-06-11 8:19 ` Hannes Reinecke
2024-06-11 11:49 ` Johannes Thumshirn
` (2 subsequent siblings)
4 siblings, 0 replies; 104+ messages in thread
From: Hannes Reinecke @ 2024-06-11 8:19 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 07:19, Christoph Hellwig wrote:
> virtblk_update_cache_mode boils down to a single call to
> blk_queue_write_cache. Remove it in preparation for moving the cache
> control flags into the queue_limits.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/block/virtio_blk.c | 13 +++----------
> 1 file changed, 3 insertions(+), 10 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode
2024-06-11 5:19 ` [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode Christoph Hellwig
2024-06-11 7:26 ` Damien Le Moal
2024-06-11 8:19 ` Hannes Reinecke
@ 2024-06-11 11:49 ` Johannes Thumshirn
2024-06-11 15:43 ` Stefan Hajnoczi
2024-06-11 19:32 ` Bart Van Assche
4 siblings, 0 replies; 104+ messages in thread
From: Johannes Thumshirn @ 2024-06-11 11:49 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen,
linux-m68k@lists.linux-m68k.org, linux-um@lists.infradead.org,
drbd-dev@lists.linbit.com, nbd@other.debian.org,
linuxppc-dev@lists.ozlabs.org, ceph-devel@vger.kernel.org,
virtualization@lists.linux.dev, xen-devel@lists.xenproject.org,
linux-bcache@vger.kernel.org, dm-devel@lists.linux.dev,
linux-raid@vger.kernel.org, linux-mmc@vger.kernel.org,
linux-mtd@lists.infradead.org, nvdimm@lists.linux.dev,
linux-nvme@lists.infradead.org, linux-s390@vger.kernel.org,
linux-scsi@vger.kernel.org, linux-block@vger.kernel.org
Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
* Re: [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode
2024-06-11 5:19 ` [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode Christoph Hellwig
` (2 preceding siblings ...)
2024-06-11 11:49 ` Johannes Thumshirn
@ 2024-06-11 15:43 ` Stefan Hajnoczi
2024-06-11 19:32 ` Bart Van Assche
4 siblings, 0 replies; 104+ messages in thread
From: Stefan Hajnoczi @ 2024-06-11 15:43 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Geert Uytterhoeven, Richard Weinberger,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder,
Josef Bacik, Ming Lei, Michael S. Tsirkin, Jason Wang,
Roger Pau Monné, Alasdair Kergon, Mike Snitzer,
Mikulas Patocka, Song Liu, Yu Kuai, Vineeth Vijayan,
Martin K. Petersen, linux-m68k, linux-um, drbd-dev, nbd,
linuxppc-dev, ceph-devel, virtualization, xen-devel, linux-bcache,
dm-devel, linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme,
linux-s390, linux-scsi, linux-block
On Tue, Jun 11, 2024 at 07:19:08AM +0200, Christoph Hellwig wrote:
> virtblk_update_cache_mode boils down to a single call to
> blk_queue_write_cache. Remove it in preparation for moving the cache
> control flags into the queue_limits.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/block/virtio_blk.c | 13 +++----------
> 1 file changed, 3 insertions(+), 10 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode
2024-06-11 5:19 ` [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode Christoph Hellwig
` (3 preceding siblings ...)
2024-06-11 15:43 ` Stefan Hajnoczi
@ 2024-06-11 19:32 ` Bart Van Assche
4 siblings, 0 replies; 104+ messages in thread
From: Bart Van Assche @ 2024-06-11 19:32 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/10/24 10:19 PM, Christoph Hellwig wrote:
> virtblk_update_cache_mode boils down to a single call to
> blk_queue_write_cache. Remove it in preparation for moving the cache
> control flags into the queue_limits.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* [PATCH 09/26] nbd: move setting the cache control flags to __nbd_set_size
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (7 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 08/26] virtio_blk: remove virtblk_update_cache_mode Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 7:28 ` Damien Le Moal
` (3 more replies)
2024-06-11 5:19 ` [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail Christoph Hellwig
` (16 subsequent siblings)
25 siblings, 4 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Move setting the cache control flags in nbd to __nbd_set_size in
preparation for moving these flags into the queue_limits structure.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/nbd.c | 17 +++++++----------
1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index ad887d614d5b3f..44b8c671921e5c 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -342,6 +342,12 @@ static int __nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
lim.max_hw_discard_sectors = UINT_MAX;
else
lim.max_hw_discard_sectors = 0;
+ if (!(nbd->config->flags & NBD_FLAG_SEND_FLUSH))
+ blk_queue_write_cache(nbd->disk->queue, false, false);
+ else if (nbd->config->flags & NBD_FLAG_SEND_FUA)
+ blk_queue_write_cache(nbd->disk->queue, true, true);
+ else
+ blk_queue_write_cache(nbd->disk->queue, true, false);
lim.logical_block_size = blksize;
lim.physical_block_size = blksize;
error = queue_limits_commit_update(nbd->disk->queue, &lim);
@@ -1286,19 +1292,10 @@ static void nbd_bdev_reset(struct nbd_device *nbd)
static void nbd_parse_flags(struct nbd_device *nbd)
{
- struct nbd_config *config = nbd->config;
- if (config->flags & NBD_FLAG_READ_ONLY)
+ if (nbd->config->flags & NBD_FLAG_READ_ONLY)
set_disk_ro(nbd->disk, true);
else
set_disk_ro(nbd->disk, false);
- if (config->flags & NBD_FLAG_SEND_FLUSH) {
- if (config->flags & NBD_FLAG_SEND_FUA)
- blk_queue_write_cache(nbd->disk->queue, true, true);
- else
- blk_queue_write_cache(nbd->disk->queue, true, false);
- }
- else
- blk_queue_write_cache(nbd->disk->queue, false, false);
}
static void send_disconnects(struct nbd_device *nbd)
--
2.43.0
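The three-way blk_queue_write_cache chain added to __nbd_set_size reduces to a small decision table over two negotiation flags. A stand-alone model follows; the flag values match the NBD protocol's uapi definitions, and `struct cache_mode` is an illustrative stand-in for the (wc, fua) argument pair.

```c
#include <assert.h>
#include <stdbool.h>

#define NBD_FLAG_SEND_FLUSH	(1 << 2)
#define NBD_FLAG_SEND_FUA	(1 << 3)

struct cache_mode {
	bool wc;	/* write cache enabled */
	bool fua;	/* FUA supported */
};

/*
 * Without flush support the write cache is disabled entirely; FUA is
 * only honoured when flush is also advertised.
 */
static struct cache_mode nbd_cache_mode(unsigned int flags)
{
	struct cache_mode m = { false, false };

	if (flags & NBD_FLAG_SEND_FLUSH) {
		m.wc = true;
		m.fua = (flags & NBD_FLAG_SEND_FUA) != 0;
	}
	return m;
}
```

Note that a server advertising FUA without flush still ends up with the cache disabled, exactly as in the original nbd_parse_flags chain.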
* Re: [PATCH 09/26] nbd: move setting the cache control flags to __nbd_set_size
2024-06-11 5:19 ` [PATCH 09/26] nbd: move setting the cache control flags to __nbd_set_size Christoph Hellwig
@ 2024-06-11 7:28 ` Damien Le Moal
2024-06-11 8:20 ` Hannes Reinecke
` (2 subsequent siblings)
3 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 7:28 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Move setting the cache control flags in nbd in preparation for moving
> these flags into the queue_limits structure.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 09/26] nbd: move setting the cache control flags to __nbd_set_size
2024-06-11 5:19 ` [PATCH 09/26] nbd: move setting the cache control flags to __nbd_set_size Christoph Hellwig
2024-06-11 7:28 ` Damien Le Moal
@ 2024-06-11 8:20 ` Hannes Reinecke
2024-06-11 16:50 ` Josef Bacik
2024-06-11 19:34 ` Bart Van Assche
3 siblings, 0 replies; 104+ messages in thread
From: Hannes Reinecke @ 2024-06-11 8:20 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 07:19, Christoph Hellwig wrote:
> Move setting the cache control flags in nbd in preparation for moving
> these flags into the queue_limits structure.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/block/nbd.c | 17 +++++++----------
> 1 file changed, 7 insertions(+), 10 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH 09/26] nbd: move setting the cache control flags to __nbd_set_size
2024-06-11 5:19 ` [PATCH 09/26] nbd: move setting the cache control flags to __nbd_set_size Christoph Hellwig
2024-06-11 7:28 ` Damien Le Moal
2024-06-11 8:20 ` Hannes Reinecke
@ 2024-06-11 16:50 ` Josef Bacik
2024-06-11 19:34 ` Bart Van Assche
3 siblings, 0 replies; 104+ messages in thread
From: Josef Bacik @ 2024-06-11 16:50 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Geert Uytterhoeven, Richard Weinberger,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder,
Ming Lei, Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On Tue, Jun 11, 2024 at 07:19:09AM +0200, Christoph Hellwig wrote:
> Move setting the cache control flags in nbd in preparation for moving
> these flags into the queue_limits structure.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Thanks,
Josef
* Re: [PATCH 09/26] nbd: move setting the cache control flags to __nbd_set_size
2024-06-11 5:19 ` [PATCH 09/26] nbd: move setting the cache control flags to __nbd_set_size Christoph Hellwig
` (2 preceding siblings ...)
2024-06-11 16:50 ` Josef Bacik
@ 2024-06-11 19:34 ` Bart Van Assche
3 siblings, 0 replies; 104+ messages in thread
From: Bart Van Assche @ 2024-06-11 19:34 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/10/24 10:19 PM, Christoph Hellwig wrote:
> Move setting the cache control flags in nbd in preparation for moving
> these flags into the queue_limits structure.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (8 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 09/26] nbd: move setting the cache control flags to __nbd_set_size Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 7:30 ` Damien Le Moal
` (2 more replies)
2024-06-11 5:19 ` [PATCH 11/26] block: freeze the queue in queue_attr_store Christoph Hellwig
` (15 subsequent siblings)
25 siblings, 3 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
blkfront has always had a robust negotiation protocol for detecting a write
cache. Stop simply disabling cache flushes when they fail, as a failing
flush is a grave error.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/xen-blkfront.c | 29 +++++++++--------------------
1 file changed, 9 insertions(+), 20 deletions(-)
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 9b4ec3e4908cce..9794ac2d3299d1 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -982,18 +982,6 @@ static const char *flush_info(struct blkfront_info *info)
return "barrier or flush: disabled;";
}
-static void xlvbd_flush(struct blkfront_info *info)
-{
- blk_queue_write_cache(info->rq, info->feature_flush ? true : false,
- info->feature_fua ? true : false);
- pr_info("blkfront: %s: %s %s %s %s %s %s %s\n",
- info->gd->disk_name, flush_info(info),
- "persistent grants:", info->feature_persistent ?
- "enabled;" : "disabled;", "indirect descriptors:",
- info->max_indirect_segments ? "enabled;" : "disabled;",
- "bounce buffer:", info->bounce ? "enabled" : "disabled;");
-}
-
static int xen_translate_vdev(int vdevice, int *minor, unsigned int *offset)
{
int major;
@@ -1162,7 +1150,15 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
info->sector_size = sector_size;
info->physical_sector_size = physical_sector_size;
- xlvbd_flush(info);
+ blk_queue_write_cache(info->rq, info->feature_flush ? true : false,
+ info->feature_fua ? true : false);
+
+ pr_info("blkfront: %s: %s %s %s %s %s %s %s\n",
+ info->gd->disk_name, flush_info(info),
+ "persistent grants:", info->feature_persistent ?
+ "enabled;" : "disabled;", "indirect descriptors:",
+ info->max_indirect_segments ? "enabled;" : "disabled;",
+ "bounce buffer:", info->bounce ? "enabled" : "disabled;");
if (info->vdisk_info & VDISK_READONLY)
set_disk_ro(gd, 1);
@@ -1622,13 +1618,6 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
info->gd->disk_name, op_name(bret.operation));
blkif_req(req)->error = BLK_STS_NOTSUPP;
}
- if (unlikely(blkif_req(req)->error)) {
- if (blkif_req(req)->error == BLK_STS_NOTSUPP)
- blkif_req(req)->error = BLK_STS_OK;
- info->feature_fua = 0;
- info->feature_flush = 0;
- xlvbd_flush(info);
- }
fallthrough;
case BLKIF_OP_READ:
case BLKIF_OP_WRITE:
--
2.43.0
^ permalink raw reply related [flat|nested] 104+ messages in thread
* Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail
2024-06-11 5:19 ` [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail Christoph Hellwig
@ 2024-06-11 7:30 ` Damien Le Moal
2024-06-12 4:50 ` Christoph Hellwig
2024-06-11 8:21 ` Hannes Reinecke
2024-06-12 8:01 ` Roger Pau Monné
2 siblings, 1 reply; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 7:30 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> blkfront always had a robust negotiation protocol for detecting a write
> cache. Stop simply disabling cache flushes when they fail as that is
> a grave error.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good to me, but maybe mention the removal of xlvbd_flush() as well?
And it feels like the "stop disabling cache flushes when they fail" part should
be a fix patch sent separately...
Anyway, for both parts, feel free to add:
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> ---
> drivers/block/xen-blkfront.c | 29 +++++++++--------------------
> 1 file changed, 9 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 9b4ec3e4908cce..9794ac2d3299d1 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -982,18 +982,6 @@ static const char *flush_info(struct blkfront_info *info)
> return "barrier or flush: disabled;";
> }
>
> -static void xlvbd_flush(struct blkfront_info *info)
> -{
> - blk_queue_write_cache(info->rq, info->feature_flush ? true : false,
> - info->feature_fua ? true : false);
> - pr_info("blkfront: %s: %s %s %s %s %s %s %s\n",
> - info->gd->disk_name, flush_info(info),
> - "persistent grants:", info->feature_persistent ?
> - "enabled;" : "disabled;", "indirect descriptors:",
> - info->max_indirect_segments ? "enabled;" : "disabled;",
> - "bounce buffer:", info->bounce ? "enabled" : "disabled;");
> -}
> -
> static int xen_translate_vdev(int vdevice, int *minor, unsigned int *offset)
> {
> int major;
> @@ -1162,7 +1150,15 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
> info->sector_size = sector_size;
> info->physical_sector_size = physical_sector_size;
>
> - xlvbd_flush(info);
> + blk_queue_write_cache(info->rq, info->feature_flush ? true : false,
> + info->feature_fua ? true : false);
> +
> + pr_info("blkfront: %s: %s %s %s %s %s %s %s\n",
> + info->gd->disk_name, flush_info(info),
> + "persistent grants:", info->feature_persistent ?
> + "enabled;" : "disabled;", "indirect descriptors:",
> + info->max_indirect_segments ? "enabled;" : "disabled;",
> + "bounce buffer:", info->bounce ? "enabled" : "disabled;");
>
> if (info->vdisk_info & VDISK_READONLY)
> set_disk_ro(gd, 1);
> @@ -1622,13 +1618,6 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
> info->gd->disk_name, op_name(bret.operation));
> blkif_req(req)->error = BLK_STS_NOTSUPP;
> }
> - if (unlikely(blkif_req(req)->error)) {
> - if (blkif_req(req)->error == BLK_STS_NOTSUPP)
> - blkif_req(req)->error = BLK_STS_OK;
> - info->feature_fua = 0;
> - info->feature_flush = 0;
> - xlvbd_flush(info);
> - }
> fallthrough;
> case BLKIF_OP_READ:
> case BLKIF_OP_WRITE:
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail
2024-06-11 7:30 ` Damien Le Moal
@ 2024-06-12 4:50 ` Christoph Hellwig
0 siblings, 0 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-12 4:50 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Jens Axboe, Geert Uytterhoeven,
Richard Weinberger, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On Tue, Jun 11, 2024 at 04:30:39PM +0900, Damien Le Moal wrote:
> On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> > blkfront always had a robust negotiation protocol for detecting a write
> > cache. Stop simply disabling cache flushes when they fail as that is
> > a grave error.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Looks good to me but maybe mention that removal of xlvbd_flush() as well ?
> And it feels like the "stop disabling cache flushes when they fail" part should
> be a fix patch sent separately...
I'll move the patch to the front of the series to get more attention from
the maintainers, but otherwise the xlvbd_flush removal is the really
trivial part here.
* Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail
2024-06-11 5:19 ` [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail Christoph Hellwig
2024-06-11 7:30 ` Damien Le Moal
@ 2024-06-11 8:21 ` Hannes Reinecke
2024-06-12 8:01 ` Roger Pau Monné
2 siblings, 0 replies; 104+ messages in thread
From: Hannes Reinecke @ 2024-06-11 8:21 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 07:19, Christoph Hellwig wrote:
> blkfront always had a robust negotiation protocol for detecting a write
> cache. Stop simply disabling cache flushes when they fail as that is
> a grave error.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/block/xen-blkfront.c | 29 +++++++++--------------------
> 1 file changed, 9 insertions(+), 20 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail
2024-06-11 5:19 ` [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail Christoph Hellwig
2024-06-11 7:30 ` Damien Le Moal
2024-06-11 8:21 ` Hannes Reinecke
@ 2024-06-12 8:01 ` Roger Pau Monné
2024-06-12 15:00 ` Christoph Hellwig
2 siblings, 1 reply; 104+ messages in thread
From: Roger Pau Monné @ 2024-06-12 8:01 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Geert Uytterhoeven, Richard Weinberger,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder,
Josef Bacik, Ming Lei, Michael S. Tsirkin, Jason Wang,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On Tue, Jun 11, 2024 at 07:19:10AM +0200, Christoph Hellwig wrote:
> blkfront always had a robust negotiation protocol for detecting a write
> cache. Stop simply disabling cache flushes when they fail as that is
> a grave error.
It's my understanding the current code attempts to cover up for the
lack of guarantees the feature itself provides:
* feature-barrier
* Values: 0/1 (boolean)
* Default Value: 0
*
* A value of "1" indicates that the backend can process requests
* containing the BLKIF_OP_WRITE_BARRIER request opcode. Requests
* of this type may still be returned at any time with the
* BLKIF_RSP_EOPNOTSUPP result code.
*
* feature-flush-cache
* Values: 0/1 (boolean)
* Default Value: 0
*
* A value of "1" indicates that the backend can process requests
* containing the BLKIF_OP_FLUSH_DISKCACHE request opcode. Requests
* of this type may still be returned at any time with the
* BLKIF_RSP_EOPNOTSUPP result code.
So even when the feature is exposed, the backend might return
EOPNOTSUPP for the flush/barrier operations.
Such failure is tied to whether the underlying blkback storage
supports REQ_OP_WRITE with REQ_PREFLUSH operation. blkback will
expose "feature-barrier" and/or "feature-flush-cache" without knowing
whether the underlying backend supports those operations, hence the
weird fallback in blkfront.
I'm unsure whether the lack of REQ_PREFLUSH support is something we
still need to worry about; it seems like it was when the code was
introduced, but that's > 10 years ago.
Overall blkback should ensure that REQ_PREFLUSH is supported before
exposing "feature-barrier" or "feature-flush-cache", as then the
exposed features would really match what the underlying backend
supports (rather than the commands blkback knows about).
Thanks, Roger.
* Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail
2024-06-12 8:01 ` Roger Pau Monné
@ 2024-06-12 15:00 ` Christoph Hellwig
2024-06-12 15:56 ` Roger Pau Monné
0 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-12 15:00 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Christoph Hellwig, Jens Axboe, Geert Uytterhoeven,
Richard Weinberger, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Alasdair Kergon, Mike Snitzer,
Mikulas Patocka, Song Liu, Yu Kuai, Vineeth Vijayan,
Martin K. Petersen, linux-m68k, linux-um, drbd-dev, nbd,
linuxppc-dev, ceph-devel, virtualization, xen-devel, linux-bcache,
dm-devel, linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme,
linux-s390, linux-scsi, linux-block
On Wed, Jun 12, 2024 at 10:01:18AM +0200, Roger Pau Monné wrote:
> On Tue, Jun 11, 2024 at 07:19:10AM +0200, Christoph Hellwig wrote:
> > blkfront always had a robust negotiation protocol for detecting a write
> > cache. Stop simply disabling cache flushes when they fail as that is
> > a grave error.
>
> It's my understanding the current code attempts to cover up for the
> lack of guarantees the feature itself provides:
> So even when the feature is exposed, the backend might return
> EOPNOTSUPP for the flush/barrier operations.
How is this supposed to work? I mean in the worst case we could
just immediately complete the flush requests in the driver, but
we're really lying to any upper layer.
> Such failure is tied to whether the underlying blkback storage
> supports REQ_OP_WRITE with REQ_PREFLUSH operation. blkback will
> expose "feature-barrier" and/or "feature-flush-cache" without knowing
> whether the underlying backend supports those operations, hence the
> weird fallback in blkfront.
If we are just talking about the Linux blkback driver (I know there
probably are a few other implementations) it won't ever do that.
I see it has code to do so, but the Linux block layer doesn't
allow the flush operation to randomly fail if it was previously
advertised. Note that even blkfront conforms to this, as it fixes
up the return value to ok when it gets this notsupp error.
> Overall blkback should ensure that REQ_PREFLUSH is supported before
> exposing "feature-barrier" or "feature-flush-cache", as then the
> exposed features would really match what the underlying backend
> supports (rather than the commands blkback knows about).
Yes. The in-tree xen-blkback does that, but even without that the
Linux block layer actually makes sure flushes sent by upper layers
always succeed even when not supported.
* Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail
2024-06-12 15:00 ` Christoph Hellwig
@ 2024-06-12 15:56 ` Roger Pau Monné
2024-06-13 14:05 ` Christoph Hellwig
0 siblings, 1 reply; 104+ messages in thread
From: Roger Pau Monné @ 2024-06-12 15:56 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Geert Uytterhoeven, Richard Weinberger,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder,
Josef Bacik, Ming Lei, Michael S. Tsirkin, Jason Wang,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On Wed, Jun 12, 2024 at 05:00:30PM +0200, Christoph Hellwig wrote:
> On Wed, Jun 12, 2024 at 10:01:18AM +0200, Roger Pau Monné wrote:
> > On Tue, Jun 11, 2024 at 07:19:10AM +0200, Christoph Hellwig wrote:
> > > blkfront always had a robust negotiation protocol for detecting a write
> > > cache. Stop simply disabling cache flushes when they fail as that is
> > > a grave error.
> >
> > It's my understanding the current code attempts to cover up for the
> > lack of guarantees the feature itself provides:
>
> > So even when the feature is exposed, the backend might return
> > EOPNOTSUPP for the flush/barrier operations.
>
> How is this supposed to work? I mean in the worst case we could
> just immediately complete the flush requests in the driver, but
> we're really lying to any upper layer.
Right. AFAICT advertising "feature-barrier" and/or
"feature-flush-cache" could be done based on whether blkback
understands those commands, not on whether the underlying storage
supports the equivalent of them.
Worst case we can print a warning message once about the underlying
storage failing to complete flush/barrier requests, and that data
integrity might not be guaranteed going forward, and not propagate the
error to the upper layer?
What would be the consequence of propagating a flush error to the
upper layers?
> > Such failure is tied to whether the underlying blkback storage
> > supports REQ_OP_WRITE with REQ_PREFLUSH operation. blkback will
> > expose "feature-barrier" and/or "feature-flush-cache" without knowing
> > whether the underlying backend supports those operations, hence the
> > weird fallback in blkfront.
>
> If we are just talking about the Linux blkback driver (I know there
> probably are a few other implementations) it won't ever do that.
> I see it has code to do so, but the Linux block layer doesn't
> allow the flush operation to randomly fail if it was previously
> advertised. Note that even blkfront conforms to this, as it fixes
> up the return value to ok when it gets this notsupp error.
Yes, I'm afraid it's impossible to know what the multiple incarnations
of all the scattered blkback implementations possibly do (FreeBSD,
NetBSD, QEMU and blktap at least I know of).
> > Overall blkback should ensure that REQ_PREFLUSH is supported before
> > exposing "feature-barrier" or "feature-flush-cache", as then the
> > exposed features would really match what the underlying backend
> > supports (rather than the commands blkback knows about).
>
> Yes. The in-tree xen-blkback does that, but even without that the
> Linux block layer actually makes sure flushes sent by upper layers
> always succeed even when not supported.
Given the description of the feature in the blkif header, I'm afraid
we cannot guarantee that seeing the feature exposed implies barrier or
flush support, since the request could fail at any time (or even from
the start of the disk attachment) and it would still sadly be a correct
implementation given the description of the options.
Thanks, Roger.
* Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail
2024-06-12 15:56 ` Roger Pau Monné
@ 2024-06-13 14:05 ` Christoph Hellwig
2024-06-14 7:56 ` Roger Pau Monné
0 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-13 14:05 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Christoph Hellwig, Jens Axboe, Geert Uytterhoeven,
Richard Weinberger, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Alasdair Kergon, Mike Snitzer,
Mikulas Patocka, Song Liu, Yu Kuai, Vineeth Vijayan,
Martin K. Petersen, linux-m68k, linux-um, drbd-dev, nbd,
linuxppc-dev, ceph-devel, virtualization, xen-devel, linux-bcache,
dm-devel, linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme,
linux-s390, linux-scsi, linux-block
On Wed, Jun 12, 2024 at 05:56:15PM +0200, Roger Pau Monné wrote:
> Right. AFAICT advertising "feature-barrier" and/or
> "feature-flush-cache" could be done based on whether blkback
> understands those commands, not on whether the underlying storage
> supports the equivalent of them.
>
> Worst case we can print a warning message once about the underlying
> storage failing to complete flush/barrier requests, and that data
> integrity might not be guaranteed going forward, and not propagate the
> error to the upper layer?
>
> What would be the consequence of propagating a flush error to the
> upper layers?
If you propagate the error to the upper layer you will generate an
I/O error there, which usually leads to a file system shutdown.
> Given the description of the feature in the blkif header, I'm afraid
> we cannot guarantee that seeing the feature exposed implies barrier or
> flush support, since the request could fail at any time (or even from
> the start of the disk attachment) and it would still sadly be a correct
> implementation given the description of the options.
Well, then we could do something like the patch below, which keeps
the existing behavior, but isolates the block layer from it and
removes the only user of blk_queue_write_cache from interrupt
context:
---
From e6e82c769ab209a77302994c3829cf6ff7a595b8 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch@lst.de>
Date: Thu, 30 May 2024 08:58:52 +0200
Subject: xen-blkfront: don't disable cache flushes when they fail
blkfront always had a robust negotiation protocol for detecting a write
cache. Stop simply disabling cache flushes in the block layer as the
flags handling is moving to the atomic queue limits API that needs
user context to freeze the queue for that. Instead handle the case
of the feature flags cleared inside of blkfront. This removes old
debug code to check for such a mismatch which was previously impossible
to hit, including the check for passthrough requests that blkfront
never used to start with.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
drivers/block/xen-blkfront.c | 44 +++++++++++++++++++-----------------
1 file changed, 23 insertions(+), 21 deletions(-)
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 9b4ec3e4908cce..e2c92d5095ff17 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -788,6 +788,14 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
* A barrier request a superset of FUA, so we can
* implement it the same way. (It's also a FLUSH+FUA,
* since it is guaranteed ordered WRT previous writes.)
+ *
+ * Note that we can end up here with a FUA write and the
+ * flags cleared. This happens when the flag was
+ * run-time disabled and raced with I/O submission in
+ * the block layer. We submit it as a normal write
+ * here. A pure flush should never end up here with
+ * the flags cleared as they are completed earlier for
+ * the !feature_flush case.
*/
if (info->feature_flush && info->feature_fua)
ring_req->operation =
@@ -795,8 +803,6 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
else if (info->feature_flush)
ring_req->operation =
BLKIF_OP_FLUSH_DISKCACHE;
- else
- ring_req->operation = 0;
}
ring_req->u.rw.nr_segments = num_grant;
if (unlikely(require_extra_req)) {
@@ -887,16 +893,6 @@ static inline void flush_requests(struct blkfront_ring_info *rinfo)
notify_remote_via_irq(rinfo->irq);
}
-static inline bool blkif_request_flush_invalid(struct request *req,
- struct blkfront_info *info)
-{
- return (blk_rq_is_passthrough(req) ||
- ((req_op(req) == REQ_OP_FLUSH) &&
- !info->feature_flush) ||
- ((req->cmd_flags & REQ_FUA) &&
- !info->feature_fua));
-}
-
static blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
const struct blk_mq_queue_data *qd)
{
@@ -908,23 +904,30 @@ static blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
rinfo = get_rinfo(info, qid);
blk_mq_start_request(qd->rq);
spin_lock_irqsave(&rinfo->ring_lock, flags);
- if (RING_FULL(&rinfo->ring))
- goto out_busy;
- if (blkif_request_flush_invalid(qd->rq, rinfo->dev_info))
- goto out_err;
+ /*
+ * Check if the backend actually supports flushes.
+ *
+ * While the block layer won't send us flushes if we don't claim to
+ * support them, the Xen protocol allows the backend to revoke support
+ * at any time. That is of course a really bad idea and dangerous, but
+ * has been allowed for 10+ years. In that case we simply clear the
+ * flags, and directly return here for an empty flush and ignore the
+ * FUA flag later on.
+ */
+ if (unlikely(req_op(qd->rq) == REQ_OP_FLUSH && !info->feature_flush))
+ goto out;
+ if (RING_FULL(&rinfo->ring))
+ goto out_busy;
if (blkif_queue_request(qd->rq, rinfo))
goto out_busy;
flush_requests(rinfo);
+out:
spin_unlock_irqrestore(&rinfo->ring_lock, flags);
return BLK_STS_OK;
-out_err:
- spin_unlock_irqrestore(&rinfo->ring_lock, flags);
- return BLK_STS_IOERR;
-
out_busy:
blk_mq_stop_hw_queue(hctx);
spin_unlock_irqrestore(&rinfo->ring_lock, flags);
@@ -1627,7 +1630,6 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
blkif_req(req)->error = BLK_STS_OK;
info->feature_fua = 0;
info->feature_flush = 0;
- xlvbd_flush(info);
}
fallthrough;
case BLKIF_OP_READ:
--
2.43.0
* Re: [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail
2024-06-13 14:05 ` Christoph Hellwig
@ 2024-06-14 7:56 ` Roger Pau Monné
0 siblings, 0 replies; 104+ messages in thread
From: Roger Pau Monné @ 2024-06-14 7:56 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Geert Uytterhoeven, Richard Weinberger,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder,
Josef Bacik, Ming Lei, Michael S. Tsirkin, Jason Wang,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On Thu, Jun 13, 2024 at 04:05:08PM +0200, Christoph Hellwig wrote:
> On Wed, Jun 12, 2024 at 05:56:15PM +0200, Roger Pau Monné wrote:
> > Right. AFAICT advertising "feature-barrier" and/or
> > "feature-flush-cache" could be done based on whether blkback
> > understands those commands, not on whether the underlying storage
> > supports the equivalent of them.
> >
> > Worst case we can print a warning message once about the underlying
> > storage failing to complete flush/barrier requests, and that data
> > integrity might not be guaranteed going forward, and not propagate the
> > error to the upper layer?
> >
> > What would be the consequence of propagating a flush error to the
> > upper layers?
>
> If you propagate the error to the upper layer you will generate an
> I/O error there, which usually leads to a file system shutdown.
>
> > Given the description of the feature in the blkif header, I'm afraid
> > we cannot guarantee that seeing the feature exposed implies barrier or
> > flush support, since the request could fail at any time (or even from
> > the start of the disk attachment) and it would still sadly be a correct
> > implementation given the description of the options.
>
> Well, then we could do something like the patch below, which keeps
> the existing behavior, but isolates the block layer from it and
> removes the only user of blk_queue_write_cache from interrupt
> context:
LGTM, I'm not sure there's much else we can do.
> ---
> From e6e82c769ab209a77302994c3829cf6ff7a595b8 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig <hch@lst.de>
> Date: Thu, 30 May 2024 08:58:52 +0200
> Subject: xen-blkfront: don't disable cache flushes when they fail
>
> blkfront always had a robust negotiation protocol for detecting a write
> cache. Stop simply disabling cache flushes in the block layer as the
> flags handling is moving to the atomic queue limits API that needs
> user context to freeze the queue for that. Instead handle the case
> of the feature flags cleared inside of blkfront. This removes old
> debug code to check for such a mismatch which was previously impossible
> to hit, including the check for passthrough requests that blkfront
> never used to start with.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/block/xen-blkfront.c | 44 +++++++++++++++++++-----------------
> 1 file changed, 23 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 9b4ec3e4908cce..e2c92d5095ff17 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -788,6 +788,14 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
> * A barrier request a superset of FUA, so we can
> * implement it the same way. (It's also a FLUSH+FUA,
> * since it is guaranteed ordered WRT previous writes.)
> + *
> + * Note that we can end up here with a FUA write and the
> + * flags cleared. This happens when the flag was
> + * run-time disabled and raced with I/O submission in
> + * the block layer. We submit it as a normal write
Since blkfront no longer signals to the block layer that FUA has become
unavailable for the device, getting a request with FUA set is not actually
a race, I think?
> + * here. A pure flush should never end up here with
> + * the flags cleared as they are completed earlier for
> + * the !feature_flush case.
> */
> if (info->feature_flush && info->feature_fua)
> ring_req->operation =
> @@ -795,8 +803,6 @@ static int blkif_queue_rw_req(struct request *req, struct blkfront_ring_info *ri
> else if (info->feature_flush)
> ring_req->operation =
> BLKIF_OP_FLUSH_DISKCACHE;
> - else
> - ring_req->operation = 0;
> }
> ring_req->u.rw.nr_segments = num_grant;
> if (unlikely(require_extra_req)) {
> @@ -887,16 +893,6 @@ static inline void flush_requests(struct blkfront_ring_info *rinfo)
> notify_remote_via_irq(rinfo->irq);
> }
>
> -static inline bool blkif_request_flush_invalid(struct request *req,
> - struct blkfront_info *info)
> -{
> - return (blk_rq_is_passthrough(req) ||
> - ((req_op(req) == REQ_OP_FLUSH) &&
> - !info->feature_flush) ||
> - ((req->cmd_flags & REQ_FUA) &&
> - !info->feature_fua));
> -}
> -
> static blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
> const struct blk_mq_queue_data *qd)
> {
> @@ -908,23 +904,30 @@ static blk_status_t blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
> rinfo = get_rinfo(info, qid);
> blk_mq_start_request(qd->rq);
> spin_lock_irqsave(&rinfo->ring_lock, flags);
> - if (RING_FULL(&rinfo->ring))
> - goto out_busy;
>
> - if (blkif_request_flush_invalid(qd->rq, rinfo->dev_info))
> - goto out_err;
> + /*
> + * Check if the backend actually supports flushes.
> + *
> + * While the block layer won't send us flushes if we don't claim to
> + * support them, the Xen protocol allows the backend to revoke support
> + * at any time. That is of course a really bad idea and dangerous, but
> + * has been allowed for 10+ years. In that case we simply clear the
> + * flags, and directly return here for an empty flush and ignore the
> + * FUA flag later on.
> + */
> + if (unlikely(req_op(qd->rq) == REQ_OP_FLUSH && !info->feature_flush))
> + goto out;
Don't you need to complete the request here?
Thanks, Roger.
* [PATCH 11/26] block: freeze the queue in queue_attr_store
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (9 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 10/26] xen-blkfront: don't disable cache flushes when they fail Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 7:32 ` Damien Le Moal
` (2 more replies)
2024-06-11 5:19 ` [PATCH 12/26] block: remove blk_flush_policy Christoph Hellwig
` (14 subsequent siblings)
25 siblings, 3 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
queue_attr_store updates attributes used to control I/O generation, and
can cause malformed bios if they are changed with I/O in flight. Freeze
the queue in common code instead of adding it to almost every attribute.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-mq.c | 5 +++--
block/blk-sysfs.c | 9 ++-------
2 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 0d4cd39c3d25da..58b0d6c7cc34d6 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4631,13 +4631,15 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
int ret;
unsigned long i;
+ if (WARN_ON_ONCE(!q->mq_freeze_depth))
+ return -EINVAL;
+
if (!set)
return -EINVAL;
if (q->nr_requests == nr)
return 0;
- blk_mq_freeze_queue(q);
blk_mq_quiesce_queue(q);
ret = 0;
@@ -4671,7 +4673,6 @@ int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr)
}
blk_mq_unquiesce_queue(q);
- blk_mq_unfreeze_queue(q);
return ret;
}
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index f0f9314ab65c61..5c787965b7d09e 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -189,12 +189,9 @@ static ssize_t queue_discard_max_store(struct request_queue *q,
if ((max_discard_bytes >> SECTOR_SHIFT) > UINT_MAX)
return -EINVAL;
- blk_mq_freeze_queue(q);
lim = queue_limits_start_update(q);
lim.max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
err = queue_limits_commit_update(q, &lim);
- blk_mq_unfreeze_queue(q);
-
if (err)
return err;
return ret;
@@ -241,11 +238,9 @@ queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
if (ret < 0)
return ret;
- blk_mq_freeze_queue(q);
lim = queue_limits_start_update(q);
lim.max_user_sectors = max_sectors_kb << 1;
err = queue_limits_commit_update(q, &lim);
- blk_mq_unfreeze_queue(q);
if (err)
return err;
return ret;
@@ -585,13 +580,11 @@ static ssize_t queue_wb_lat_store(struct request_queue *q, const char *page,
* ends up either enabling or disabling wbt completely. We can't
* have IO inflight if that happens.
*/
- blk_mq_freeze_queue(q);
blk_mq_quiesce_queue(q);
wbt_set_min_lat(q, val);
blk_mq_unquiesce_queue(q);
- blk_mq_unfreeze_queue(q);
return count;
}
@@ -722,9 +715,11 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
if (!entry->store)
return -EIO;
+ blk_mq_freeze_queue(q);
mutex_lock(&q->sysfs_lock);
res = entry->store(q, page, length);
mutex_unlock(&q->sysfs_lock);
+ blk_mq_unfreeze_queue(q);
return res;
}
--
2.43.0
* Re: [PATCH 11/26] block: freeze the queue in queue_attr_store
2024-06-11 5:19 ` [PATCH 11/26] block: freeze the queue in queue_attr_store Christoph Hellwig
@ 2024-06-11 7:32 ` Damien Le Moal
2024-06-11 8:22 ` Hannes Reinecke
2024-06-11 19:36 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 7:32 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> queue_attr_store updates attributes used to control generating I/O, and
> can cause malformed bios if changed with I/O in flight. Freeze the queue
> in common code instead of adding it to almost every attribute.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 11/26] block: freeze the queue in queue_attr_store
2024-06-11 5:19 ` [PATCH 11/26] block: freeze the queue in queue_attr_store Christoph Hellwig
2024-06-11 7:32 ` Damien Le Moal
@ 2024-06-11 8:22 ` Hannes Reinecke
2024-06-11 19:36 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Hannes Reinecke @ 2024-06-11 8:22 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 07:19, Christoph Hellwig wrote:
> queue_attr_store updates attributes used to control generating I/O, and
> can cause malformed bios if changed with I/O in flight. Freeze the queue
> in common code instead of adding it to almost every attribute.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/blk-mq.c | 5 +++--
> block/blk-sysfs.c | 9 ++-------
> 2 files changed, 5 insertions(+), 9 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH 11/26] block: freeze the queue in queue_attr_store
2024-06-11 5:19 ` [PATCH 11/26] block: freeze the queue in queue_attr_store Christoph Hellwig
2024-06-11 7:32 ` Damien Le Moal
2024-06-11 8:22 ` Hannes Reinecke
@ 2024-06-11 19:36 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Bart Van Assche @ 2024-06-11 19:36 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/10/24 10:19 PM, Christoph Hellwig wrote:
> queue_attr_store updates attributes used to control generating I/O, and
> can cause malformed bios if changed with I/O in flight. Freeze the queue
> in common code instead of adding it to almost every attribute.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* [PATCH 12/26] block: remove blk_flush_policy
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (10 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 11/26] block: freeze the queue in queue_attr_store Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 7:33 ` Damien Le Moal
` (2 more replies)
2024-06-11 5:19 ` [PATCH 13/26] block: move cache control settings out of queue->flags Christoph Hellwig
` (13 subsequent siblings)
25 siblings, 3 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Fold blk_flush_policy into the only caller to prepare for pending changes
to it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-flush.c | 33 +++++++++++++++------------------
1 file changed, 15 insertions(+), 18 deletions(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c
index c17cf8ed8113db..2234f8b3fc05f2 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -100,23 +100,6 @@ blk_get_flush_queue(struct request_queue *q, struct blk_mq_ctx *ctx)
return blk_mq_map_queue(q, REQ_OP_FLUSH, ctx)->fq;
}
-static unsigned int blk_flush_policy(unsigned long fflags, struct request *rq)
-{
- unsigned int policy = 0;
-
- if (blk_rq_sectors(rq))
- policy |= REQ_FSEQ_DATA;
-
- if (fflags & (1UL << QUEUE_FLAG_WC)) {
- if (rq->cmd_flags & REQ_PREFLUSH)
- policy |= REQ_FSEQ_PREFLUSH;
- if (!(fflags & (1UL << QUEUE_FLAG_FUA)) &&
- (rq->cmd_flags & REQ_FUA))
- policy |= REQ_FSEQ_POSTFLUSH;
- }
- return policy;
-}
-
static unsigned int blk_flush_cur_seq(struct request *rq)
{
return 1 << ffz(rq->flush.seq);
@@ -399,12 +382,26 @@ bool blk_insert_flush(struct request *rq)
{
struct request_queue *q = rq->q;
unsigned long fflags = q->queue_flags; /* may change, cache */
- unsigned int policy = blk_flush_policy(fflags, rq);
struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
+ unsigned int policy = 0;
/* FLUSH/FUA request must never be merged */
WARN_ON_ONCE(rq->bio != rq->biotail);
+ if (blk_rq_sectors(rq))
+ policy |= REQ_FSEQ_DATA;
+
+ /*
+ * Check which flushes we need to sequence for this operation.
+ */
+ if (fflags & (1UL << QUEUE_FLAG_WC)) {
+ if (rq->cmd_flags & REQ_PREFLUSH)
+ policy |= REQ_FSEQ_PREFLUSH;
+ if (!(fflags & (1UL << QUEUE_FLAG_FUA)) &&
+ (rq->cmd_flags & REQ_FUA))
+ policy |= REQ_FSEQ_POSTFLUSH;
+ }
+
/*
* @policy now records what operations need to be done. Adjust
* REQ_PREFLUSH and FUA for the driver.
--
2.43.0
* Re: [PATCH 12/26] block: remove blk_flush_policy
2024-06-11 5:19 ` [PATCH 12/26] block: remove blk_flush_policy Christoph Hellwig
@ 2024-06-11 7:33 ` Damien Le Moal
2024-06-11 8:23 ` Hannes Reinecke
2024-06-11 19:37 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 7:33 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Fold blk_flush_policy into the only caller to prepare for pending changes
> to it.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 12/26] block: remove blk_flush_policy
2024-06-11 5:19 ` [PATCH 12/26] block: remove blk_flush_policy Christoph Hellwig
2024-06-11 7:33 ` Damien Le Moal
@ 2024-06-11 8:23 ` Hannes Reinecke
2024-06-11 19:37 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Hannes Reinecke @ 2024-06-11 8:23 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 07:19, Christoph Hellwig wrote:
> Fold blk_flush_policy into the only caller to prepare for pending changes
> to it.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> block/blk-flush.c | 33 +++++++++++++++------------------
> 1 file changed, 15 insertions(+), 18 deletions(-)
>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
* Re: [PATCH 12/26] block: remove blk_flush_policy
2024-06-11 5:19 ` [PATCH 12/26] block: remove blk_flush_policy Christoph Hellwig
2024-06-11 7:33 ` Damien Le Moal
2024-06-11 8:23 ` Hannes Reinecke
@ 2024-06-11 19:37 ` Bart Van Assche
2 siblings, 0 replies; 104+ messages in thread
From: Bart Van Assche @ 2024-06-11 19:37 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/10/24 10:19 PM, Christoph Hellwig wrote:
> Fold blk_flush_policy into the only caller to prepare for pending changes
> to it.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
* [PATCH 13/26] block: move cache control settings out of queue->flags
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (11 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 12/26] block: remove blk_flush_policy Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 7:55 ` Damien Le Moal
` (2 more replies)
2024-06-11 5:19 ` [PATCH 14/26] block: move the nonrot flag to queue_limits Christoph Hellwig
` (12 subsequent siblings)
25 siblings, 3 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Move the cache control settings into the queue_limits so that they
can be set atomically, with all I/O frozen while the flags are
changed.
Add new features and flags fields for the driver-set feature flags and
the internal (usually sysfs-controlled) flags in the block layer. Note
that we'll eventually remove enough fields from queue_limits to bring it
back to its previous size.
The disable flag is inverted compared to the previous meaning, which
means it now survives a rescan, similar to the max_sectors and
max_discard_sectors user limits.
The FLUSH and FUA flags are now inherited by blk_stack_limits, which
simplifies the code in dm a lot, but also causes a slight behavior
change in that dm-switch and dm-unstripe now advertise a write cache
despite setting num_flush_bios to 0. The I/O path will handle this
gracefully, but as far as I can tell the lack of num_flush_bios
and thus flush support is a pre-existing data integrity bug in those
targets that really needs fixing, after which a non-zero num_flush_bios
should be required in dm for targets that map to underlying devices.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
.../block/writeback_cache_control.rst | 67 +++++++++++--------
arch/um/drivers/ubd_kern.c | 2 +-
block/blk-core.c | 2 +-
block/blk-flush.c | 9 ++-
block/blk-mq-debugfs.c | 2 -
block/blk-settings.c | 29 ++------
block/blk-sysfs.c | 29 +++++---
block/blk-wbt.c | 4 +-
drivers/block/drbd/drbd_main.c | 2 +-
drivers/block/loop.c | 9 +--
drivers/block/nbd.c | 14 ++--
drivers/block/null_blk/main.c | 12 ++--
drivers/block/ps3disk.c | 7 +-
drivers/block/rnbd/rnbd-clt.c | 10 +--
drivers/block/ublk_drv.c | 8 ++-
drivers/block/virtio_blk.c | 20 ++++--
drivers/block/xen-blkfront.c | 9 ++-
drivers/md/bcache/super.c | 7 +-
drivers/md/dm-table.c | 39 +++--------
drivers/md/md.c | 8 ++-
drivers/mmc/core/block.c | 42 ++++++------
drivers/mmc/core/queue.c | 12 ++--
drivers/mmc/core/queue.h | 3 +-
drivers/mtd/mtd_blkdevs.c | 5 +-
drivers/nvdimm/pmem.c | 4 +-
drivers/nvme/host/core.c | 7 +-
drivers/nvme/host/multipath.c | 6 --
drivers/scsi/sd.c | 28 +++++---
include/linux/blkdev.h | 38 +++++++++--
29 files changed, 227 insertions(+), 207 deletions(-)
diff --git a/Documentation/block/writeback_cache_control.rst b/Documentation/block/writeback_cache_control.rst
index b208488d0aae85..9cfe27f90253c7 100644
--- a/Documentation/block/writeback_cache_control.rst
+++ b/Documentation/block/writeback_cache_control.rst
@@ -46,41 +46,50 @@ worry if the underlying devices need any explicit cache flushing and how
the Forced Unit Access is implemented. The REQ_PREFLUSH and REQ_FUA flags
may both be set on a single bio.
+Feature settings for block drivers
+----------------------------------
-Implementation details for bio based block drivers
---------------------------------------------------------------
+For devices that do not support volatile write caches there is no driver
+support required; the block layer completes empty REQ_PREFLUSH requests before
+entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
+requests that have a payload.
-These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
-directly below the submit_bio interface. For remapping drivers the REQ_FUA
-bits need to be propagated to underlying devices, and a global flush needs
-to be implemented for bios with the REQ_PREFLUSH bit set. For real device
-drivers that do not have a volatile cache the REQ_PREFLUSH and REQ_FUA bits
-on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without
-data can be completed successfully without doing any work. Drivers for
-devices with volatile caches need to implement the support for these
-flags themselves without any help from the block layer.
+For devices with volatile write caches the driver needs to tell the block layer
+that it supports flushing caches by setting the
+ BLK_FEAT_WRITE_CACHE
-Implementation details for request_fn based block drivers
----------------------------------------------------------
+flag in the queue_limits feature field. For devices that also support the FUA
+bit the block layer needs to be told to pass on the REQ_FUA bit by also setting
+the
-For devices that do not support volatile write caches there is no driver
-support required, the block layer completes empty REQ_PREFLUSH requests before
-entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
-requests that have a payload. For devices with volatile write caches the
-driver needs to tell the block layer that it supports flushing caches by
-doing::
+ BLK_FEAT_FUA
+
+flag in the features field of the queue_limits structure.
+
+Implementation details for bio based block drivers
+--------------------------------------------------
+
+For bio based drivers the REQ_PREFLUSH and REQ_FUA bits are simply passed on
+to the driver if the driver sets the BLK_FEAT_WRITE_CACHE flag, and the driver
+needs to handle them.
+
+*NOTE*: The REQ_FUA bit also gets passed on when the BLK_FEAT_FUA flag is
+_not_ set. Any bio based driver that sets BLK_FEAT_WRITE_CACHE also needs to
+handle REQ_FUA.
- blk_queue_write_cache(sdkp->disk->queue, true, false);
+For remapping drivers the REQ_FUA bits need to be propagated to underlying
+devices, and a global flush needs to be implemented for bios with the
+REQ_PREFLUSH bit set.
-and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn. Note that
-REQ_PREFLUSH requests with a payload are automatically turned into a sequence
-of an empty REQ_OP_FLUSH request followed by the actual write by the block
-layer. For devices that also support the FUA bit the block layer needs
-to be told to pass through the REQ_FUA bit using::
+Implementation details for blk-mq drivers
+-----------------------------------------
- blk_queue_write_cache(sdkp->disk->queue, true, true);
+When the BLK_FEAT_WRITE_CACHE flag is set, REQ_OP_WRITE | REQ_PREFLUSH requests
+with a payload are automatically turned into a sequence of a REQ_OP_FLUSH
+request followed by the actual write by the block layer.
-and the driver must handle write requests that have the REQ_FUA bit set
-in prep_fn/request_fn. If the FUA bit is not natively supported the block
-layer turns it into an empty REQ_OP_FLUSH request after the actual write.
+When the BLK_FEAT_FUA flag is set, the REQ_FUA bit is simply passed on for the
+REQ_OP_WRITE request; otherwise a REQ_OP_FLUSH request is sent by the block layer
+after the completion of the write request for bio submissions with the REQ_FUA
+bit set.
diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index cdcb75a68989dd..19e01691ea0ea7 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -835,6 +835,7 @@ static int ubd_add(int n, char **error_out)
struct queue_limits lim = {
.max_segments = MAX_SG,
.seg_boundary_mask = PAGE_SIZE - 1,
+ .features = BLK_FEAT_WRITE_CACHE,
};
struct gendisk *disk;
int err = 0;
@@ -882,7 +883,6 @@ static int ubd_add(int n, char **error_out)
}
blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
- blk_queue_write_cache(disk->queue, true, false);
disk->major = UBD_MAJOR;
disk->first_minor = n << UBD_SHIFT;
disk->minors = 1 << UBD_SHIFT;
diff --git a/block/blk-core.c b/block/blk-core.c
index 82c3ae22d76d88..2b45a4df9a1aa1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -782,7 +782,7 @@ void submit_bio_noacct(struct bio *bio)
if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_WRITE &&
bio_op(bio) != REQ_OP_ZONE_APPEND))
goto end_io;
- if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
+ if (!bdev_write_cache(bdev)) {
bio->bi_opf &= ~(REQ_PREFLUSH | REQ_FUA);
if (!bio_sectors(bio)) {
status = BLK_STS_OK;
diff --git a/block/blk-flush.c b/block/blk-flush.c
index 2234f8b3fc05f2..30b9d5033a2b85 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -381,8 +381,8 @@ static void blk_rq_init_flush(struct request *rq)
bool blk_insert_flush(struct request *rq)
{
struct request_queue *q = rq->q;
- unsigned long fflags = q->queue_flags; /* may change, cache */
struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
+ bool supports_fua = q->limits.features & BLK_FEAT_FUA;
unsigned int policy = 0;
/* FLUSH/FUA request must never be merged */
@@ -394,11 +394,10 @@ bool blk_insert_flush(struct request *rq)
/*
* Check which flushes we need to sequence for this operation.
*/
- if (fflags & (1UL << QUEUE_FLAG_WC)) {
+ if (blk_queue_write_cache(q)) {
if (rq->cmd_flags & REQ_PREFLUSH)
policy |= REQ_FSEQ_PREFLUSH;
- if (!(fflags & (1UL << QUEUE_FLAG_FUA)) &&
- (rq->cmd_flags & REQ_FUA))
+ if ((rq->cmd_flags & REQ_FUA) && !supports_fua)
policy |= REQ_FSEQ_POSTFLUSH;
}
@@ -407,7 +406,7 @@ bool blk_insert_flush(struct request *rq)
* REQ_PREFLUSH and FUA for the driver.
*/
rq->cmd_flags &= ~REQ_PREFLUSH;
- if (!(fflags & (1UL << QUEUE_FLAG_FUA)))
+ if (!supports_fua)
rq->cmd_flags &= ~REQ_FUA;
/*
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 770c0c2b72faaa..e8b9db7c30c455 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -93,8 +93,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(INIT_DONE),
QUEUE_FLAG_NAME(STABLE_WRITES),
QUEUE_FLAG_NAME(POLL),
- QUEUE_FLAG_NAME(WC),
- QUEUE_FLAG_NAME(FUA),
QUEUE_FLAG_NAME(DAX),
QUEUE_FLAG_NAME(STATS),
QUEUE_FLAG_NAME(REGISTERED),
diff --git a/block/blk-settings.c b/block/blk-settings.c
index f11c8676eb4c67..536ee202fcdccb 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -261,6 +261,9 @@ static int blk_validate_limits(struct queue_limits *lim)
lim->misaligned = 0;
}
+ if (!(lim->features & BLK_FEAT_WRITE_CACHE))
+ lim->features &= ~BLK_FEAT_FUA;
+
err = blk_validate_integrity_limits(lim);
if (err)
return err;
@@ -454,6 +457,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
{
unsigned int top, bottom, alignment, ret = 0;
+ t->features |= (b->features & BLK_FEAT_INHERIT_MASK);
+
t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
t->max_user_sectors = min_not_zero(t->max_user_sectors,
b->max_user_sectors);
@@ -711,30 +716,6 @@ void blk_set_queue_depth(struct request_queue *q, unsigned int depth)
}
EXPORT_SYMBOL(blk_set_queue_depth);
-/**
- * blk_queue_write_cache - configure queue's write cache
- * @q: the request queue for the device
- * @wc: write back cache on or off
- * @fua: device supports FUA writes, if true
- *
- * Tell the block layer about the write cache of @q.
- */
-void blk_queue_write_cache(struct request_queue *q, bool wc, bool fua)
-{
- if (wc) {
- blk_queue_flag_set(QUEUE_FLAG_HW_WC, q);
- blk_queue_flag_set(QUEUE_FLAG_WC, q);
- } else {
- blk_queue_flag_clear(QUEUE_FLAG_HW_WC, q);
- blk_queue_flag_clear(QUEUE_FLAG_WC, q);
- }
- if (fua)
- blk_queue_flag_set(QUEUE_FLAG_FUA, q);
- else
- blk_queue_flag_clear(QUEUE_FLAG_FUA, q);
-}
-EXPORT_SYMBOL_GPL(blk_queue_write_cache);
-
int bdev_alignment_offset(struct block_device *bdev)
{
struct request_queue *q = bdev_get_queue(bdev);
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 5c787965b7d09e..4f524c1d5e08bd 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -423,32 +423,41 @@ static ssize_t queue_io_timeout_store(struct request_queue *q, const char *page,
static ssize_t queue_wc_show(struct request_queue *q, char *page)
{
- if (test_bit(QUEUE_FLAG_WC, &q->queue_flags))
- return sprintf(page, "write back\n");
-
- return sprintf(page, "write through\n");
+ if (q->limits.flags & BLK_FLAGS_WRITE_CACHE_DISABLED)
+ return sprintf(page, "write through\n");
+ return sprintf(page, "write back\n");
}
static ssize_t queue_wc_store(struct request_queue *q, const char *page,
size_t count)
{
+ struct queue_limits lim;
+ bool disable;
+ int err;
+
if (!strncmp(page, "write back", 10)) {
- if (!test_bit(QUEUE_FLAG_HW_WC, &q->queue_flags))
- return -EINVAL;
- blk_queue_flag_set(QUEUE_FLAG_WC, q);
+ disable = false;
} else if (!strncmp(page, "write through", 13) ||
- !strncmp(page, "none", 4)) {
- blk_queue_flag_clear(QUEUE_FLAG_WC, q);
+ !strncmp(page, "none", 4)) {
+ disable = true;
} else {
return -EINVAL;
}
+ lim = queue_limits_start_update(q);
+ if (disable)
+ lim.flags |= BLK_FLAGS_WRITE_CACHE_DISABLED;
+ else
+ lim.flags &= ~BLK_FLAGS_WRITE_CACHE_DISABLED;
+ err = queue_limits_commit_update(q, &lim);
+ if (err)
+ return err;
return count;
}
static ssize_t queue_fua_show(struct request_queue *q, char *page)
{
- return sprintf(page, "%u\n", test_bit(QUEUE_FLAG_FUA, &q->queue_flags));
+ return sprintf(page, "%u\n", !!(q->limits.features & BLK_FEAT_FUA));
}
static ssize_t queue_dax_show(struct request_queue *q, char *page)
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index 64472134dd26df..1a5e4b049ecd1d 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -206,8 +206,8 @@ static void wbt_rqw_done(struct rq_wb *rwb, struct rq_wait *rqw,
*/
if (wb_acct & WBT_DISCARD)
limit = rwb->wb_background;
- else if (test_bit(QUEUE_FLAG_WC, &rwb->rqos.disk->queue->queue_flags) &&
- !wb_recent_wait(rwb))
+ else if (blk_queue_write_cache(rwb->rqos.disk->queue) &&
+ !wb_recent_wait(rwb))
limit = 0;
else
limit = rwb->wb_normal;
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 113b441d4d3670..bf42a46781fa21 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2697,6 +2697,7 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
* connect.
*/
.max_hw_sectors = DRBD_MAX_BIO_SIZE_SAFE >> 8,
+ .features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA,
};
device = minor_to_device(minor);
@@ -2736,7 +2737,6 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
disk->private_data = device;
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, disk->queue);
- blk_queue_write_cache(disk->queue, true, true);
device->md_io.page = alloc_page(GFP_KERNEL);
if (!device->md_io.page)
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 2c4a5eb3a6a7f9..0b23fdc4e2edcc 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -985,6 +985,9 @@ static int loop_reconfigure_limits(struct loop_device *lo, unsigned short bsize)
lim.logical_block_size = bsize;
lim.physical_block_size = bsize;
lim.io_min = bsize;
+ lim.features &= ~BLK_FEAT_WRITE_CACHE;
+ if (file->f_op->fsync && !(lo->lo_flags & LO_FLAGS_READ_ONLY))
+ lim.features |= BLK_FEAT_WRITE_CACHE;
if (!backing_bdev || bdev_nonrot(backing_bdev))
blk_queue_flag_set(QUEUE_FLAG_NONROT, lo->lo_queue);
else
@@ -1078,9 +1081,6 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
lo->old_gfp_mask = mapping_gfp_mask(mapping);
mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));
- if (!(lo->lo_flags & LO_FLAGS_READ_ONLY) && file->f_op->fsync)
- blk_queue_write_cache(lo->lo_queue, true, false);
-
error = loop_reconfigure_limits(lo, config->block_size);
if (WARN_ON_ONCE(error))
goto out_unlock;
@@ -1131,9 +1131,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
struct file *filp;
gfp_t gfp = lo->old_gfp_mask;
- if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags))
- blk_queue_write_cache(lo->lo_queue, false, false);
-
/*
* Freeze the request queue when unbinding on a live file descriptor and
* thus an open device. When called from ->release we are guaranteed
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 44b8c671921e5c..cb1c86a6a3fb9d 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -342,12 +342,14 @@ static int __nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
lim.max_hw_discard_sectors = UINT_MAX;
else
lim.max_hw_discard_sectors = 0;
- if (!(nbd->config->flags & NBD_FLAG_SEND_FLUSH))
- blk_queue_write_cache(nbd->disk->queue, false, false);
- else if (nbd->config->flags & NBD_FLAG_SEND_FUA)
- blk_queue_write_cache(nbd->disk->queue, true, true);
- else
- blk_queue_write_cache(nbd->disk->queue, true, false);
+ if (!(nbd->config->flags & NBD_FLAG_SEND_FLUSH)) {
+ lim.features &= ~(BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA);
+ } else if (nbd->config->flags & NBD_FLAG_SEND_FUA) {
+ lim.features |= BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA;
+ } else {
+ lim.features |= BLK_FEAT_WRITE_CACHE;
+ lim.features &= ~BLK_FEAT_FUA;
+ }
lim.logical_block_size = blksize;
lim.physical_block_size = blksize;
error = queue_limits_commit_update(nbd->disk->queue, &lim);
diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 631dca2e4e8442..73e4aecf5bb492 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -1928,6 +1928,13 @@ static int null_add_dev(struct nullb_device *dev)
goto out_cleanup_tags;
}
+ if (dev->cache_size > 0) {
+ set_bit(NULLB_DEV_FL_CACHE, &nullb->dev->flags);
+ lim.features |= BLK_FEAT_WRITE_CACHE;
+ if (dev->fua)
+ lim.features |= BLK_FEAT_FUA;
+ }
+
nullb->disk = blk_mq_alloc_disk(nullb->tag_set, &lim, nullb);
if (IS_ERR(nullb->disk)) {
rv = PTR_ERR(nullb->disk);
@@ -1940,11 +1947,6 @@ static int null_add_dev(struct nullb_device *dev)
nullb_setup_bwtimer(nullb);
}
- if (dev->cache_size > 0) {
- set_bit(NULLB_DEV_FL_CACHE, &nullb->dev->flags);
- blk_queue_write_cache(nullb->q, true, dev->fua);
- }
-
nullb->q->queuedata = nullb;
blk_queue_flag_set(QUEUE_FLAG_NONROT, nullb->q);
diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
index b810ac0a5c4b97..8b73cf459b5937 100644
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -388,9 +388,8 @@ static int ps3disk_probe(struct ps3_system_bus_device *_dev)
.max_segments = -1,
.max_segment_size = dev->bounce_size,
.dma_alignment = dev->blk_size - 1,
+ .features = BLK_FEAT_WRITE_CACHE,
};
-
- struct request_queue *queue;
struct gendisk *gendisk;
if (dev->blk_size < 512) {
@@ -447,10 +446,6 @@ static int ps3disk_probe(struct ps3_system_bus_device *_dev)
goto fail_free_tag_set;
}
- queue = gendisk->queue;
-
- blk_queue_write_cache(queue, true, false);
-
priv->gendisk = gendisk;
gendisk->major = ps3disk_major;
gendisk->first_minor = devidx * PS3DISK_MINORS;
diff --git a/drivers/block/rnbd/rnbd-clt.c b/drivers/block/rnbd/rnbd-clt.c
index b7ffe03c61606d..02c4b173182719 100644
--- a/drivers/block/rnbd/rnbd-clt.c
+++ b/drivers/block/rnbd/rnbd-clt.c
@@ -1389,6 +1389,12 @@ static int rnbd_client_setup_device(struct rnbd_clt_dev *dev,
le32_to_cpu(rsp->max_discard_sectors);
}
+ if (rsp->cache_policy & RNBD_WRITEBACK) {
+ lim.features |= BLK_FEAT_WRITE_CACHE;
+ if (rsp->cache_policy & RNBD_FUA)
+ lim.features |= BLK_FEAT_FUA;
+ }
+
dev->gd = blk_mq_alloc_disk(&dev->sess->tag_set, &lim, dev);
if (IS_ERR(dev->gd))
return PTR_ERR(dev->gd);
@@ -1397,10 +1403,6 @@ static int rnbd_client_setup_device(struct rnbd_clt_dev *dev,
blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, dev->queue);
blk_queue_flag_set(QUEUE_FLAG_SAME_FORCE, dev->queue);
- blk_queue_write_cache(dev->queue,
- !!(rsp->cache_policy & RNBD_WRITEBACK),
- !!(rsp->cache_policy & RNBD_FUA));
-
return rnbd_clt_setup_gen_disk(dev, rsp, idx);
}
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 4e159948c912c2..e45c65c1848d31 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -487,8 +487,6 @@ static void ublk_dev_param_basic_apply(struct ublk_device *ub)
struct request_queue *q = ub->ub_disk->queue;
const struct ublk_param_basic *p = &ub->params.basic;
- blk_queue_write_cache(q, p->attrs & UBLK_ATTR_VOLATILE_CACHE,
- p->attrs & UBLK_ATTR_FUA);
if (p->attrs & UBLK_ATTR_ROTATIONAL)
blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
else
@@ -2210,6 +2208,12 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
lim.max_zone_append_sectors = p->max_zone_append_sectors;
}
+ if (ub->params.basic.attrs & UBLK_ATTR_VOLATILE_CACHE) {
+ lim.features |= BLK_FEAT_WRITE_CACHE;
+ if (ub->params.basic.attrs & UBLK_ATTR_FUA)
+ lim.features |= BLK_FEAT_FUA;
+ }
+
if (wait_for_completion_interruptible(&ub->completion) != 0)
return -EINTR;
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 378b241911ca87..b1a3c293528519 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1100,6 +1100,7 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
struct gendisk *disk = dev_to_disk(dev);
struct virtio_blk *vblk = disk->private_data;
struct virtio_device *vdev = vblk->vdev;
+ struct queue_limits lim;
int i;
BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE));
@@ -1108,7 +1109,17 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
return i;
virtio_cwrite8(vdev, offsetof(struct virtio_blk_config, wce), i);
- blk_queue_write_cache(disk->queue, virtblk_get_cache_mode(vdev), false);
+
+ lim = queue_limits_start_update(disk->queue);
+ if (virtblk_get_cache_mode(vdev))
+ lim.features |= BLK_FEAT_WRITE_CACHE;
+ else
+ lim.features &= ~BLK_FEAT_WRITE_CACHE;
+ blk_mq_freeze_queue(disk->queue);
+ i = queue_limits_commit_update(disk->queue, &lim);
+ blk_mq_unfreeze_queue(disk->queue);
+ if (i)
+ return i;
return count;
}
@@ -1504,6 +1515,9 @@ static int virtblk_probe(struct virtio_device *vdev)
if (err)
goto out_free_tags;
+ if (virtblk_get_cache_mode(vdev))
+ lim.features |= BLK_FEAT_WRITE_CACHE;
+
vblk->disk = blk_mq_alloc_disk(&vblk->tag_set, &lim, vblk);
if (IS_ERR(vblk->disk)) {
err = PTR_ERR(vblk->disk);
@@ -1519,10 +1533,6 @@ static int virtblk_probe(struct virtio_device *vdev)
vblk->disk->fops = &virtblk_fops;
vblk->index = index;
- /* configure queue flush support */
- blk_queue_write_cache(vblk->disk->queue, virtblk_get_cache_mode(vdev),
- false);
-
/* If disk is read-only in the host, the guest should obey */
if (virtio_has_feature(vdev, VIRTIO_BLK_F_RO))
set_disk_ro(vblk->disk, 1);
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 9794ac2d3299d1..de38e025769b14 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -956,6 +956,12 @@ static void blkif_set_queue_limits(const struct blkfront_info *info,
lim->max_secure_erase_sectors = UINT_MAX;
}
+ if (info->feature_flush) {
+ lim->features |= BLK_FEAT_WRITE_CACHE;
+ if (info->feature_fua)
+ lim->features |= BLK_FEAT_FUA;
+ }
+
/* Hard sector size and max sectors impersonate the equiv. hardware. */
lim->logical_block_size = info->sector_size;
lim->physical_block_size = info->physical_sector_size;
@@ -1150,9 +1156,6 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
info->sector_size = sector_size;
info->physical_sector_size = physical_sector_size;
- blk_queue_write_cache(info->rq, info->feature_flush ? true : false,
- info->feature_fua ? true : false);
-
pr_info("blkfront: %s: %s %s %s %s %s %s %s\n",
info->gd->disk_name, flush_info(info),
"persistent grants:", info->feature_persistent ?
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 4d11fc664cb0b8..cb6595c8b5514e 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -897,7 +897,6 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
sector_t sectors, struct block_device *cached_bdev,
const struct block_device_operations *ops)
{
- struct request_queue *q;
const size_t max_stripes = min_t(size_t, INT_MAX,
SIZE_MAX / sizeof(atomic_t));
struct queue_limits lim = {
@@ -909,6 +908,7 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
.io_min = block_size,
.logical_block_size = block_size,
.physical_block_size = block_size,
+ .features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA,
};
uint64_t n;
int idx;
@@ -975,12 +975,7 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
d->disk->fops = ops;
d->disk->private_data = d;
- q = d->disk->queue;
-
blk_queue_flag_set(QUEUE_FLAG_NONROT, d->disk->queue);
-
- blk_queue_write_cache(q, true, true);
-
return 0;
out_bioset_exit:
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index fd789eeb62d943..fbe125d55e25b4 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1686,34 +1686,16 @@ int dm_calculate_queue_limits(struct dm_table *t,
return validate_hardware_logical_block_alignment(t, limits);
}
-static int device_flush_capable(struct dm_target *ti, struct dm_dev *dev,
- sector_t start, sector_t len, void *data)
-{
- unsigned long flush = (unsigned long) data;
- struct request_queue *q = bdev_get_queue(dev->bdev);
-
- return (q->queue_flags & flush);
-}
-
-static bool dm_table_supports_flush(struct dm_table *t, unsigned long flush)
+/*
+ * Check if a target requires flush support even if none of the underlying
+ * devices need it (e.g. to persist target-specific metadata).
+ */
+static bool dm_table_supports_flush(struct dm_table *t)
{
- /*
- * Require at least one underlying device to support flushes.
- * t->devices includes internal dm devices such as mirror logs
- * so we need to use iterate_devices here, which targets
- * supporting flushes must provide.
- */
for (unsigned int i = 0; i < t->num_targets; i++) {
struct dm_target *ti = dm_table_get_target(t, i);
- if (!ti->num_flush_bios)
- continue;
-
- if (ti->flush_supported)
- return true;
-
- if (ti->type->iterate_devices &&
- ti->type->iterate_devices(ti, device_flush_capable, (void *) flush))
+ if (ti->num_flush_bios && ti->flush_supported)
return true;
}
@@ -1855,7 +1837,6 @@ static int device_requires_stable_pages(struct dm_target *ti,
int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
struct queue_limits *limits)
{
- bool wc = false, fua = false;
int r;
if (dm_table_supports_nowait(t))
@@ -1876,12 +1857,8 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
if (!dm_table_supports_secure_erase(t))
limits->max_secure_erase_sectors = 0;
- if (dm_table_supports_flush(t, (1UL << QUEUE_FLAG_WC))) {
- wc = true;
- if (dm_table_supports_flush(t, (1UL << QUEUE_FLAG_FUA)))
- fua = true;
- }
- blk_queue_write_cache(q, wc, fua);
+ if (dm_table_supports_flush(t))
+ limits->features |= BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA;
if (dm_table_supports_dax(t, device_not_dax_capable)) {
blk_queue_flag_set(QUEUE_FLAG_DAX, q);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 67ece2cd725f50..2f4c5d1755d857 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5785,7 +5785,10 @@ struct mddev *md_alloc(dev_t dev, char *name)
int partitioned;
int shift;
int unit;
- int error ;
+ int error;
+ struct queue_limits lim = {
+ .features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA,
+ };
/*
* Wait for any previous instance of this device to be completely
@@ -5825,7 +5828,7 @@ struct mddev *md_alloc(dev_t dev, char *name)
*/
mddev->hold_active = UNTIL_STOP;
- disk = blk_alloc_disk(NULL, NUMA_NO_NODE);
+ disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
if (IS_ERR(disk)) {
error = PTR_ERR(disk);
goto out_free_mddev;
@@ -5843,7 +5846,6 @@ struct mddev *md_alloc(dev_t dev, char *name)
disk->fops = &md_fops;
disk->private_data = mddev;
- blk_queue_write_cache(disk->queue, true, true);
disk->events |= DISK_EVENT_MEDIA_CHANGE;
mddev->gendisk = disk;
error = add_disk(disk);
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 367509b5b6466c..2c9963248fcbd6 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -2466,8 +2466,7 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
struct mmc_blk_data *md;
int devidx, ret;
char cap_str[10];
- bool cache_enabled = false;
- bool fua_enabled = false;
+ unsigned int features = 0;
devidx = ida_alloc_max(&mmc_blk_ida, max_devices - 1, GFP_KERNEL);
if (devidx < 0) {
@@ -2499,7 +2498,24 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
*/
md->read_only = mmc_blk_readonly(card);
- md->disk = mmc_init_queue(&md->queue, card);
+ if (mmc_host_cmd23(card->host)) {
+ if ((mmc_card_mmc(card) &&
+ card->csd.mmca_vsn >= CSD_SPEC_VER_3) ||
+ (mmc_card_sd(card) &&
+ card->scr.cmds & SD_SCR_CMD23_SUPPORT))
+ md->flags |= MMC_BLK_CMD23;
+ }
+
+ if (md->flags & MMC_BLK_CMD23 &&
+ ((card->ext_csd.rel_param & EXT_CSD_WR_REL_PARAM_EN) ||
+ card->ext_csd.rel_sectors)) {
+ md->flags |= MMC_BLK_REL_WR;
+ features |= (BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA);
+ } else if (mmc_cache_enabled(card->host)) {
+ features |= BLK_FEAT_WRITE_CACHE;
+ }
+
+ md->disk = mmc_init_queue(&md->queue, card, features);
if (IS_ERR(md->disk)) {
ret = PTR_ERR(md->disk);
goto err_kfree;
@@ -2539,26 +2555,6 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
set_capacity(md->disk, size);
- if (mmc_host_cmd23(card->host)) {
- if ((mmc_card_mmc(card) &&
- card->csd.mmca_vsn >= CSD_SPEC_VER_3) ||
- (mmc_card_sd(card) &&
- card->scr.cmds & SD_SCR_CMD23_SUPPORT))
- md->flags |= MMC_BLK_CMD23;
- }
-
- if (md->flags & MMC_BLK_CMD23 &&
- ((card->ext_csd.rel_param & EXT_CSD_WR_REL_PARAM_EN) ||
- card->ext_csd.rel_sectors)) {
- md->flags |= MMC_BLK_REL_WR;
- fua_enabled = true;
- cache_enabled = true;
- }
- if (mmc_cache_enabled(card->host))
- cache_enabled = true;
-
- blk_queue_write_cache(md->queue.queue, cache_enabled, fua_enabled);
-
string_get_size((u64)size, 512, STRING_UNITS_2,
cap_str, sizeof(cap_str));
pr_info("%s: %s %s %s%s\n",
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 241cdc2b2a2a3b..97ff993d31570c 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -344,10 +344,12 @@ static const struct blk_mq_ops mmc_mq_ops = {
};
static struct gendisk *mmc_alloc_disk(struct mmc_queue *mq,
- struct mmc_card *card)
+ struct mmc_card *card, unsigned int features)
{
struct mmc_host *host = card->host;
- struct queue_limits lim = { };
+ struct queue_limits lim = {
+ .features = features,
+ };
struct gendisk *disk;
if (mmc_can_erase(card))
@@ -413,10 +415,12 @@ static inline bool mmc_merge_capable(struct mmc_host *host)
* mmc_init_queue - initialise a queue structure.
* @mq: mmc queue
* @card: mmc card to attach this queue
+ * @features: block layer features (BLK_FEAT_*)
*
* Initialise a MMC card request queue.
*/
-struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card)
+struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
+ unsigned int features)
{
struct mmc_host *host = card->host;
struct gendisk *disk;
@@ -460,7 +464,7 @@ struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card)
return ERR_PTR(ret);
- disk = mmc_alloc_disk(mq, card);
+ disk = mmc_alloc_disk(mq, card, features);
if (IS_ERR(disk))
blk_mq_free_tag_set(&mq->tag_set);
return disk;
diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
index 9ade3bcbb714e4..1498840a4ea008 100644
--- a/drivers/mmc/core/queue.h
+++ b/drivers/mmc/core/queue.h
@@ -94,7 +94,8 @@ struct mmc_queue {
struct work_struct complete_work;
};
-struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card);
+struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
+ unsigned int features);
extern void mmc_cleanup_queue(struct mmc_queue *);
extern void mmc_queue_suspend(struct mmc_queue *);
extern void mmc_queue_resume(struct mmc_queue *);
diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
index 3caa0717d46c01..1b9f57f231e8be 100644
--- a/drivers/mtd/mtd_blkdevs.c
+++ b/drivers/mtd/mtd_blkdevs.c
@@ -336,6 +336,8 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
lim.logical_block_size = tr->blksize;
if (tr->discard)
lim.max_hw_discard_sectors = UINT_MAX;
+ if (tr->flush)
+ lim.features |= BLK_FEAT_WRITE_CACHE;
/* Create gendisk */
gd = blk_mq_alloc_disk(new->tag_set, &lim, new);
@@ -373,9 +375,6 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
spin_lock_init(&new->queue_lock);
INIT_LIST_HEAD(&new->rq_list);
- if (tr->flush)
- blk_queue_write_cache(new->rq, true, false);
-
blk_queue_flag_set(QUEUE_FLAG_NONROT, new->rq);
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, new->rq);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 598fe2e89bda45..aff818469c114c 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -455,6 +455,7 @@ static int pmem_attach_disk(struct device *dev,
.logical_block_size = pmem_sector_size(ndns),
.physical_block_size = PAGE_SIZE,
.max_hw_sectors = UINT_MAX,
+ .features = BLK_FEAT_WRITE_CACHE,
};
int nid = dev_to_node(dev), fua;
struct resource *res = &nsio->res;
@@ -495,6 +496,8 @@ static int pmem_attach_disk(struct device *dev,
dev_warn(dev, "unable to guarantee persistence of writes\n");
fua = 0;
}
+ if (fua)
+ lim.features |= BLK_FEAT_FUA;
if (!devm_request_mem_region(dev, res->start, resource_size(res),
dev_name(&ndns->dev))) {
@@ -543,7 +546,6 @@ static int pmem_attach_disk(struct device *dev,
}
pmem->virt_addr = addr;
- blk_queue_write_cache(q, true, fua);
blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, q);
if (pmem->pfn_flags & PFN_MAP)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5a673fa5cb2612..9fc5e36fe2e55e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2056,7 +2056,6 @@ static int nvme_update_ns_info_generic(struct nvme_ns *ns,
static int nvme_update_ns_info_block(struct nvme_ns *ns,
struct nvme_ns_info *info)
{
- bool vwc = ns->ctrl->vwc & NVME_CTRL_VWC_PRESENT;
struct queue_limits lim;
struct nvme_id_ns_nvm *nvm = NULL;
struct nvme_zone_info zi = {};
@@ -2106,6 +2105,11 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
ns->head->ids.csi == NVME_CSI_ZNS)
nvme_update_zone_info(ns, &lim, &zi);
+ if (ns->ctrl->vwc & NVME_CTRL_VWC_PRESENT)
+ lim.features |= BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA;
+ else
+ lim.features &= ~(BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA);
+
/*
* Register a metadata profile for PI, or the plain non-integrity NVMe
* metadata masquerading as Type 0 if supported, otherwise reject block
@@ -2132,7 +2136,6 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
if ((id->dlfeat & 0x7) == 0x1 && (id->dlfeat & (1 << 3)))
ns->head->features |= NVME_NS_DEAC;
set_disk_ro(ns->disk, nvme_ns_is_readonly(ns, info));
- blk_queue_write_cache(ns->disk->queue, vwc, vwc);
set_bit(NVME_NS_READY, &ns->flags);
blk_mq_unfreeze_queue(ns->disk->queue);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 12c59db02539e5..3d0e23a0a4ddd8 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -521,7 +521,6 @@ static void nvme_requeue_work(struct work_struct *work)
int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
{
struct queue_limits lim;
- bool vwc = false;
mutex_init(&head->lock);
bio_list_init(&head->requeue_list);
@@ -562,11 +561,6 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
if (ctrl->tagset->nr_maps > HCTX_TYPE_POLL &&
ctrl->tagset->map[HCTX_TYPE_POLL].nr_queues)
blk_queue_flag_set(QUEUE_FLAG_POLL, head->disk->queue);
-
- /* we need to propagate up the VMC settings */
- if (ctrl->vwc & NVME_CTRL_VWC_PRESENT)
- vwc = true;
- blk_queue_write_cache(head->disk->queue, vwc, vwc);
return 0;
}
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 5bfed61c70db8f..8764ea14c9b881 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -120,17 +120,18 @@ static const char *sd_cache_types[] = {
"write back, no read (daft)"
};
-static void sd_set_flush_flag(struct scsi_disk *sdkp)
+static void sd_set_flush_flag(struct scsi_disk *sdkp,
+ struct queue_limits *lim)
{
- bool wc = false, fua = false;
-
if (sdkp->WCE) {
- wc = true;
+ lim->features |= BLK_FEAT_WRITE_CACHE;
if (sdkp->DPOFUA)
- fua = true;
+ lim->features |= BLK_FEAT_FUA;
+ else
+ lim->features &= ~BLK_FEAT_FUA;
+ } else {
+ lim->features &= ~(BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA);
}
-
- blk_queue_write_cache(sdkp->disk->queue, wc, fua);
}
static ssize_t
@@ -168,9 +169,18 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
wce = (ct & 0x02) && !sdkp->write_prot ? 1 : 0;
if (sdkp->cache_override) {
+ struct queue_limits lim;
+
sdkp->WCE = wce;
sdkp->RCD = rcd;
- sd_set_flush_flag(sdkp);
+
+ lim = queue_limits_start_update(sdkp->disk->queue);
+ sd_set_flush_flag(sdkp, &lim);
+ blk_mq_freeze_queue(sdkp->disk->queue);
+ ret = queue_limits_commit_update(sdkp->disk->queue, &lim);
+ blk_mq_unfreeze_queue(sdkp->disk->queue);
+ if (ret)
+ return ret;
return count;
}
@@ -3659,7 +3669,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
* We now have all cache related info, determine how we deal
* with flush requests.
*/
- sd_set_flush_flag(sdkp);
+ sd_set_flush_flag(sdkp, &lim);
/* Initial block count limit based on CDB TRANSFER LENGTH field size. */
dev_max = sdp->use_16_for_rw ? SD_MAX_XFER_BLOCKS : SD_DEF_XFER_BLOCKS;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c792d4d81e5fcc..4e8931a2c76b07 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -282,6 +282,28 @@ static inline bool blk_op_is_passthrough(blk_opf_t op)
return op == REQ_OP_DRV_IN || op == REQ_OP_DRV_OUT;
}
+/* flags set by the driver in queue_limits.features */
+enum {
+ /* supports a volatile write cache */
+ BLK_FEAT_WRITE_CACHE = (1u << 0),
+
+ /* supports passing on the FUA bit */
+ BLK_FEAT_FUA = (1u << 1),
+};
+
+/*
+ * Flags automatically inherited when stacking limits.
+ */
+#define BLK_FEAT_INHERIT_MASK \
+ (BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA)
+
+
+/* internal flags in queue_limits.flags */
+enum {
+ /* do not send FLUSH or FUA command despite advertised write cache */
+ BLK_FLAGS_WRITE_CACHE_DISABLED = (1u << 31),
+};
+
/*
* BLK_BOUNCE_NONE: never bounce (default)
* BLK_BOUNCE_HIGH: bounce all highmem pages
@@ -292,6 +314,8 @@ enum blk_bounce {
};
struct queue_limits {
+ unsigned int features;
+ unsigned int flags;
enum blk_bounce bounce;
unsigned long seg_boundary_mask;
unsigned long virt_boundary_mask;
@@ -536,12 +560,9 @@ struct request_queue {
#define QUEUE_FLAG_ADD_RANDOM 10 /* Contributes to random pool */
#define QUEUE_FLAG_SYNCHRONOUS 11 /* always completes in submit context */
#define QUEUE_FLAG_SAME_FORCE 12 /* force complete on same CPU */
-#define QUEUE_FLAG_HW_WC 13 /* Write back caching supported */
#define QUEUE_FLAG_INIT_DONE 14 /* queue is initialized */
#define QUEUE_FLAG_STABLE_WRITES 15 /* don't modify blks until WB is done */
#define QUEUE_FLAG_POLL 16 /* IO polling enabled if set */
-#define QUEUE_FLAG_WC 17 /* Write back caching */
-#define QUEUE_FLAG_FUA 18 /* device supports FUA writes */
#define QUEUE_FLAG_DAX 19 /* device supports DAX */
#define QUEUE_FLAG_STATS 20 /* track IO start and completion times */
#define QUEUE_FLAG_REGISTERED 22 /* queue has been registered to a disk */
@@ -951,7 +972,6 @@ void queue_limits_stack_bdev(struct queue_limits *t, struct block_device *bdev,
sector_t offset, const char *pfx);
extern void blk_queue_update_dma_pad(struct request_queue *, unsigned int);
extern void blk_queue_rq_timeout(struct request_queue *, unsigned int);
-extern void blk_queue_write_cache(struct request_queue *q, bool enabled, bool fua);
struct blk_independent_access_ranges *
disk_alloc_independent_access_ranges(struct gendisk *disk, int nr_ia_ranges);
@@ -1305,14 +1325,20 @@ static inline bool bdev_stable_writes(struct block_device *bdev)
return test_bit(QUEUE_FLAG_STABLE_WRITES, &q->queue_flags);
}
+static inline bool blk_queue_write_cache(struct request_queue *q)
+{
+ return (q->limits.features & BLK_FEAT_WRITE_CACHE) &&
+ !(q->limits.flags & BLK_FLAGS_WRITE_CACHE_DISABLED);
+}
+
static inline bool bdev_write_cache(struct block_device *bdev)
{
- return test_bit(QUEUE_FLAG_WC, &bdev_get_queue(bdev)->queue_flags);
+ return blk_queue_write_cache(bdev_get_queue(bdev));
}
static inline bool bdev_fua(struct block_device *bdev)
{
- return test_bit(QUEUE_FLAG_FUA, &bdev_get_queue(bdev)->queue_flags);
+ return bdev_get_queue(bdev)->limits.features & BLK_FEAT_FUA;
}
static inline bool bdev_nowait(struct block_device *bdev)
--
2.43.0
* Re: [PATCH 13/26] block: move cache control settings out of queue->flags
2024-06-11 5:19 ` [PATCH 13/26] block: move cache control settings out of queue->flags Christoph Hellwig
@ 2024-06-11 7:55 ` Damien Le Moal
2024-06-12 4:54 ` Christoph Hellwig
2024-06-11 9:58 ` Hannes Reinecke
2024-06-12 14:53 ` Ulf Hansson
2 siblings, 1 reply; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 7:55 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Move the cache control settings into the queue_limits so that they
> can be set atomically and all I/O is frozen when changing the
> flags.
...so that they can be set atomically with the device queue frozen when
changing the flags.
may be better.
>
> Add new features and flags field for the driver set flags, and internal
> (usually sysfs-controlled) flags in the block layer. Note that we'll
> eventually remove enough field from queue_limits to bring it back to the
> previous size.
>
> The disable flag is inverted compared to the previous meaning, which
> means it now survives a rescan, similar to the max_sectors and
> max_discard_sectors user limits.
>
> The FLUSH and FUA flags are now inherited by blk_stack_limits, which
> simplified the code in dm a lot, but also causes a slight behavior
> change in that dm-switch and dm-unstripe now advertise a write cache
> despite setting num_flush_bios to 0. The I/O path will handle this
> gracefully, but as far as I can tell the lack of num_flush_bios
> and thus flush support is a pre-existing data integrity bug in those
> targets that really needs fixing, after which a non-zero num_flush_bios
> should be required in dm for targets that map to underlying devices.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> .../block/writeback_cache_control.rst | 67 +++++++++++--------
> arch/um/drivers/ubd_kern.c | 2 +-
> block/blk-core.c | 2 +-
> block/blk-flush.c | 9 ++-
> block/blk-mq-debugfs.c | 2 -
> block/blk-settings.c | 29 ++------
> block/blk-sysfs.c | 29 +++++---
> block/blk-wbt.c | 4 +-
> drivers/block/drbd/drbd_main.c | 2 +-
> drivers/block/loop.c | 9 +--
> drivers/block/nbd.c | 14 ++--
> drivers/block/null_blk/main.c | 12 ++--
> drivers/block/ps3disk.c | 7 +-
> drivers/block/rnbd/rnbd-clt.c | 10 +--
> drivers/block/ublk_drv.c | 8 ++-
> drivers/block/virtio_blk.c | 20 ++++--
> drivers/block/xen-blkfront.c | 9 ++-
> drivers/md/bcache/super.c | 7 +-
> drivers/md/dm-table.c | 39 +++--------
> drivers/md/md.c | 8 ++-
> drivers/mmc/core/block.c | 42 ++++++------
> drivers/mmc/core/queue.c | 12 ++--
> drivers/mmc/core/queue.h | 3 +-
> drivers/mtd/mtd_blkdevs.c | 5 +-
> drivers/nvdimm/pmem.c | 4 +-
> drivers/nvme/host/core.c | 7 +-
> drivers/nvme/host/multipath.c | 6 --
> drivers/scsi/sd.c | 28 +++++---
> include/linux/blkdev.h | 38 +++++++++--
> 29 files changed, 227 insertions(+), 207 deletions(-)
>
> diff --git a/Documentation/block/writeback_cache_control.rst b/Documentation/block/writeback_cache_control.rst
> index b208488d0aae85..9cfe27f90253c7 100644
> --- a/Documentation/block/writeback_cache_control.rst
> +++ b/Documentation/block/writeback_cache_control.rst
> @@ -46,41 +46,50 @@ worry if the underlying devices need any explicit cache flushing and how
> the Forced Unit Access is implemented. The REQ_PREFLUSH and REQ_FUA flags
> may both be set on a single bio.
>
> +Feature settings for block drivers
> +----------------------------------
>
> -Implementation details for bio based block drivers
> ---------------------------------------------------------------
> +For devices that do not support volatile write caches there is no driver
> +support required, the block layer completes empty REQ_PREFLUSH requests before
> +entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
> +requests that have a payload.
>
> -These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
> -directly below the submit_bio interface. For remapping drivers the REQ_FUA
> -bits need to be propagated to underlying devices, and a global flush needs
> -to be implemented for bios with the REQ_PREFLUSH bit set. For real device
> -drivers that do not have a volatile cache the REQ_PREFLUSH and REQ_FUA bits
> -on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without
> -data can be completed successfully without doing any work. Drivers for
> -devices with volatile caches need to implement the support for these
> -flags themselves without any help from the block layer.
> +For devices with volatile write caches the driver needs to tell the block layer
> +that it supports flushing caches by setting the
>
> + BLK_FEAT_WRITE_CACHE
>
> -Implementation details for request_fn based block drivers
> ----------------------------------------------------------
> +flag in the queue_limits feature field. For devices that also support the FUA
> +bit the block layer needs to be told to pass on the REQ_FUA bit by also setting
> +the
>
> -For devices that do not support volatile write caches there is no driver
> -support required, the block layer completes empty REQ_PREFLUSH requests before
> -entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
> -requests that have a payload. For devices with volatile write caches the
> -driver needs to tell the block layer that it supports flushing caches by
> -doing::
> + BLK_FEAT_FUA
> +
> +flag in the features field of the queue_limits structure.
> +
> +Implementation details for bio based block drivers
> +--------------------------------------------------
> +
> +For bio based drivers the REQ_PREFLUSH and REQ_FUA bits are simply passed on
> +to the driver if the driver sets the BLK_FEAT_WRITE_CACHE flag, and the driver
> +needs to handle them.
> +
> +*NOTE*: The REQ_FUA bit also gets passed on when the BLK_FEAT_FUA flag is
> +_not_ set. Any bio based driver that sets BLK_FEAT_WRITE_CACHE also needs to
> +handle REQ_FUA.
>
> - blk_queue_write_cache(sdkp->disk->queue, true, false);
> +For remapping drivers the REQ_FUA bits need to be propagated to underlying
> +devices, and a global flush needs to be implemented for bios with the
> +REQ_PREFLUSH bit set.
>
> -and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn. Note that
> -REQ_PREFLUSH requests with a payload are automatically turned into a sequence
> -of an empty REQ_OP_FLUSH request followed by the actual write by the block
> -layer. For devices that also support the FUA bit the block layer needs
> -to be told to pass through the REQ_FUA bit using::
> +Implementation details for blk-mq drivers
> +-----------------------------------------
>
> - blk_queue_write_cache(sdkp->disk->queue, true, true);
> +When the BLK_FEAT_WRITE_CACHE flag is set, REQ_OP_WRITE | REQ_PREFLUSH requests
> +with a payload are automatically turned into a sequence of a REQ_OP_FLUSH
> +request followed by the actual write by the block layer.
>
> -and the driver must handle write requests that have the REQ_FUA bit set
> -in prep_fn/request_fn. If the FUA bit is not natively supported the block
> -layer turns it into an empty REQ_OP_FLUSH request after the actual write.
> +When the BLK_FEA_FUA flags is set, the REQ_FUA bit is simply passed on for the
s/BLK_FEA_FUA/BLK_FEAT_FUA
> +REQ_OP_WRITE request, else a REQ_OP_FLUSH request is sent by the block layer
> +after the completion of the write request for bio submissions with the REQ_FUA
> +bit set.
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 5c787965b7d09e..4f524c1d5e08bd 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -423,32 +423,41 @@ static ssize_t queue_io_timeout_store(struct request_queue *q, const char *page,
>
> static ssize_t queue_wc_show(struct request_queue *q, char *page)
> {
> - if (test_bit(QUEUE_FLAG_WC, &q->queue_flags))
> - return sprintf(page, "write back\n");
> -
> - return sprintf(page, "write through\n");
> + if (q->limits.flags & BLK_FLAGS_WRITE_CACHE_DISABLED)
> + return sprintf(page, "write through\n");
> + return sprintf(page, "write back\n");
> }
>
> static ssize_t queue_wc_store(struct request_queue *q, const char *page,
> size_t count)
> {
> + struct queue_limits lim;
> + bool disable;
> + int err;
> +
> if (!strncmp(page, "write back", 10)) {
> - if (!test_bit(QUEUE_FLAG_HW_WC, &q->queue_flags))
> - return -EINVAL;
> - blk_queue_flag_set(QUEUE_FLAG_WC, q);
> + disable = false;
> } else if (!strncmp(page, "write through", 13) ||
> - !strncmp(page, "none", 4)) {
> - blk_queue_flag_clear(QUEUE_FLAG_WC, q);
> + !strncmp(page, "none", 4)) {
> + disable = true;
> } else {
> return -EINVAL;
> }
I think you can drop the curly brackets for this chain of if-else-if-else.
>
> + lim = queue_limits_start_update(q);
> + if (disable)
> + lim.flags |= BLK_FLAGS_WRITE_CACHE_DISABLED;
> + else
> + lim.flags &= ~BLK_FLAGS_WRITE_CACHE_DISABLED;
> + err = queue_limits_commit_update(q, &lim);
> + if (err)
> + return err;
> return count;
> }
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index fd789eeb62d943..fbe125d55e25b4 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -1686,34 +1686,16 @@ int dm_calculate_queue_limits(struct dm_table *t,
> return validate_hardware_logical_block_alignment(t, limits);
> }
>
> -static int device_flush_capable(struct dm_target *ti, struct dm_dev *dev,
> - sector_t start, sector_t len, void *data)
> -{
> - unsigned long flush = (unsigned long) data;
> - struct request_queue *q = bdev_get_queue(dev->bdev);
> -
> - return (q->queue_flags & flush);
> -}
> -
> -static bool dm_table_supports_flush(struct dm_table *t, unsigned long flush)
> +/*
> + * Check if an target requires flush support even if none of the underlying
s/an/a
> + * devices need it (e.g. to persist target-specific metadata).
> + */
> +static bool dm_table_supports_flush(struct dm_table *t)
> {
> - /*
> - * Require at least one underlying device to support flushes.
> - * t->devices includes internal dm devices such as mirror logs
> - * so we need to use iterate_devices here, which targets
> - * supporting flushes must provide.
> - */
> for (unsigned int i = 0; i < t->num_targets; i++) {
> struct dm_target *ti = dm_table_get_target(t, i);
>
> - if (!ti->num_flush_bios)
> - continue;
> -
> - if (ti->flush_supported)
> - return true;
> -
> - if (ti->type->iterate_devices &&
> - ti->type->iterate_devices(ti, device_flush_capable, (void *) flush))
> + if (ti->num_flush_bios && ti->flush_supported)
> return true;
> }
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c792d4d81e5fcc..4e8931a2c76b07 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -282,6 +282,28 @@ static inline bool blk_op_is_passthrough(blk_opf_t op)
> return op == REQ_OP_DRV_IN || op == REQ_OP_DRV_OUT;
> }
>
> +/* flags set by the driver in queue_limits.features */
> +enum {
> + /* supports a a volatile write cache */
Repeated "a".
> + BLK_FEAT_WRITE_CACHE = (1u << 0),
> +
> + /* supports passing on the FUA bit */
> + BLK_FEAT_FUA = (1u << 1),
> +};
> +static inline bool blk_queue_write_cache(struct request_queue *q)
> +{
> + return (q->limits.features & BLK_FEAT_WRITE_CACHE) &&
> + (q->limits.flags & BLK_FLAGS_WRITE_CACHE_DISABLED);
Hmm, shouldn't this be !(q->limits.flags & BLK_FLAGS_WRITE_CACHE_DISABLED) ?
> +}
> +
> static inline bool bdev_write_cache(struct block_device *bdev)
> {
> - return test_bit(QUEUE_FLAG_WC, &bdev_get_queue(bdev)->queue_flags);
> + return blk_queue_write_cache(bdev_get_queue(bdev));
> }
>
> static inline bool bdev_fua(struct block_device *bdev)
> {
> - return test_bit(QUEUE_FLAG_FUA, &bdev_get_queue(bdev)->queue_flags);
> + return bdev_get_queue(bdev)->limits.features & BLK_FEAT_FUA;
> }
>
> static inline bool bdev_nowait(struct block_device *bdev)
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 13/26] block: move cache control settings out of queue->flags
2024-06-11 7:55 ` Damien Le Moal
@ 2024-06-12 4:54 ` Christoph Hellwig
0 siblings, 0 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-12 4:54 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Jens Axboe, Geert Uytterhoeven,
Richard Weinberger, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On Tue, Jun 11, 2024 at 04:55:04PM +0900, Damien Le Moal wrote:
> On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> > Move the cache control settings into the queue_limits so that they
> > can be set atomically and all I/O is frozen when changing the
> > flags.
>
> ...so that they can be set atomically with the device queue frozen when
> changing the flags.
>
> may be better.
Sure.
If there was anything else below, I missed it; I skipped over two
pages of full quotes.
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 13/26] block: move cache control settings out of queue->flags
2024-06-11 5:19 ` [PATCH 13/26] block: move cache control settings out of queue->flags Christoph Hellwig
2024-06-11 7:55 ` Damien Le Moal
@ 2024-06-11 9:58 ` Hannes Reinecke
2024-06-12 4:52 ` Christoph Hellwig
2024-06-12 14:53 ` Ulf Hansson
2 siblings, 1 reply; 104+ messages in thread
From: Hannes Reinecke @ 2024-06-11 9:58 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 07:19, Christoph Hellwig wrote:
> Move the cache control settings into the queue_limits so that they
> can be set atomically and all I/O is frozen when changing the
> flags.
>
> Add new features and flags field for the driver set flags, and internal
> (usually sysfs-controlled) flags in the block layer. Note that we'll
> eventually remove enough field from queue_limits to bring it back to the
> previous size.
>
> The disable flag is inverted compared to the previous meaning, which
> means it now survives a rescan, similar to the max_sectors and
> max_discard_sectors user limits.
>
> The FLUSH and FUA flags are now inherited by blk_stack_limits, which
> simplified the code in dm a lot, but also causes a slight behavior
> change in that dm-switch and dm-unstripe now advertise a write cache
> despite setting num_flush_bios to 0. The I/O path will handle this
> gracefully, but as far as I can tell the lack of num_flush_bios
> and thus flush support is a pre-existing data integrity bug in those
> targets that really needs fixing, after which a non-zero num_flush_bios
> should be required in dm for targets that map to underlying devices.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> .../block/writeback_cache_control.rst | 67 +++++++++++--------
> arch/um/drivers/ubd_kern.c | 2 +-
> block/blk-core.c | 2 +-
> block/blk-flush.c | 9 ++-
> block/blk-mq-debugfs.c | 2 -
> block/blk-settings.c | 29 ++------
> block/blk-sysfs.c | 29 +++++---
> block/blk-wbt.c | 4 +-
> drivers/block/drbd/drbd_main.c | 2 +-
> drivers/block/loop.c | 9 +--
> drivers/block/nbd.c | 14 ++--
> drivers/block/null_blk/main.c | 12 ++--
> drivers/block/ps3disk.c | 7 +-
> drivers/block/rnbd/rnbd-clt.c | 10 +--
> drivers/block/ublk_drv.c | 8 ++-
> drivers/block/virtio_blk.c | 20 ++++--
> drivers/block/xen-blkfront.c | 9 ++-
> drivers/md/bcache/super.c | 7 +-
> drivers/md/dm-table.c | 39 +++--------
> drivers/md/md.c | 8 ++-
> drivers/mmc/core/block.c | 42 ++++++------
> drivers/mmc/core/queue.c | 12 ++--
> drivers/mmc/core/queue.h | 3 +-
> drivers/mtd/mtd_blkdevs.c | 5 +-
> drivers/nvdimm/pmem.c | 4 +-
> drivers/nvme/host/core.c | 7 +-
> drivers/nvme/host/multipath.c | 6 --
> drivers/scsi/sd.c | 28 +++++---
> include/linux/blkdev.h | 38 +++++++++--
> 29 files changed, 227 insertions(+), 207 deletions(-)
>
> diff --git a/Documentation/block/writeback_cache_control.rst b/Documentation/block/writeback_cache_control.rst
> index b208488d0aae85..9cfe27f90253c7 100644
> --- a/Documentation/block/writeback_cache_control.rst
> +++ b/Documentation/block/writeback_cache_control.rst
> @@ -46,41 +46,50 @@ worry if the underlying devices need any explicit cache flushing and how
> the Forced Unit Access is implemented. The REQ_PREFLUSH and REQ_FUA flags
> may both be set on a single bio.
>
> +Feature settings for block drivers
> +----------------------------------
>
> -Implementation details for bio based block drivers
> ---------------------------------------------------------------
> +For devices that do not support volatile write caches there is no driver
> +support required, the block layer completes empty REQ_PREFLUSH requests before
> +entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
> +requests that have a payload.
>
> -These drivers will always see the REQ_PREFLUSH and REQ_FUA bits as they sit
> -directly below the submit_bio interface. For remapping drivers the REQ_FUA
> -bits need to be propagated to underlying devices, and a global flush needs
> -to be implemented for bios with the REQ_PREFLUSH bit set. For real device
> -drivers that do not have a volatile cache the REQ_PREFLUSH and REQ_FUA bits
> -on non-empty bios can simply be ignored, and REQ_PREFLUSH requests without
> -data can be completed successfully without doing any work. Drivers for
> -devices with volatile caches need to implement the support for these
> -flags themselves without any help from the block layer.
> +For devices with volatile write caches the driver needs to tell the block layer
> +that it supports flushing caches by setting the
>
> + BLK_FEAT_WRITE_CACHE
>
> -Implementation details for request_fn based block drivers
> ----------------------------------------------------------
> +flag in the queue_limits feature field. For devices that also support the FUA
> +bit the block layer needs to be told to pass on the REQ_FUA bit by also setting
> +the
>
> -For devices that do not support volatile write caches there is no driver
> -support required, the block layer completes empty REQ_PREFLUSH requests before
> -entering the driver and strips off the REQ_PREFLUSH and REQ_FUA bits from
> -requests that have a payload. For devices with volatile write caches the
> -driver needs to tell the block layer that it supports flushing caches by
> -doing::
> + BLK_FEAT_FUA
> +
> +flag in the features field of the queue_limits structure.
> +
> +Implementation details for bio based block drivers
> +--------------------------------------------------
> +
> +For bio based drivers the REQ_PREFLUSH and REQ_FUA bit are simplify passed on
> +to the driver if the drivers sets the BLK_FEAT_WRITE_CACHE flag and the drivers
> +needs to handle them.
> +
> +*NOTE*: The REQ_FUA bit also gets passed on when the BLK_FEAT_FUA flags is
> +_not_ set. Any bio based driver that sets BLK_FEAT_WRITE_CACHE also needs to
> +handle REQ_FUA.
>
> - blk_queue_write_cache(sdkp->disk->queue, true, false);
> +For remapping drivers the REQ_FUA bits need to be propagated to underlying
> +devices, and a global flush needs to be implemented for bios with the
> +REQ_PREFLUSH bit set.
>
> -and handle empty REQ_OP_FLUSH requests in its prep_fn/request_fn. Note that
> -REQ_PREFLUSH requests with a payload are automatically turned into a sequence
> -of an empty REQ_OP_FLUSH request followed by the actual write by the block
> -layer. For devices that also support the FUA bit the block layer needs
> -to be told to pass through the REQ_FUA bit using::
> +Implementation details for blk-mq drivers
> +-----------------------------------------
>
> - blk_queue_write_cache(sdkp->disk->queue, true, true);
> +When the BLK_FEAT_WRITE_CACHE flag is set, REQ_OP_WRITE | REQ_PREFLUSH requests
> +with a payload are automatically turned into a sequence of a REQ_OP_FLUSH
> +request followed by the actual write by the block layer.
>
> -and the driver must handle write requests that have the REQ_FUA bit set
> -in prep_fn/request_fn. If the FUA bit is not natively supported the block
> -layer turns it into an empty REQ_OP_FLUSH request after the actual write.
> +When the BLK_FEA_FUA flags is set, the REQ_FUA bit simplify passed on for the
> +REQ_OP_WRITE request, else a REQ_OP_FLUSH request is sent by the block layer
> +after the completion of the write request for bio submissions with the REQ_FUA
> +bit set.
> diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
> index cdcb75a68989dd..19e01691ea0ea7 100644
> --- a/arch/um/drivers/ubd_kern.c
> +++ b/arch/um/drivers/ubd_kern.c
> @@ -835,6 +835,7 @@ static int ubd_add(int n, char **error_out)
> struct queue_limits lim = {
> .max_segments = MAX_SG,
> .seg_boundary_mask = PAGE_SIZE - 1,
> + .features = BLK_FEAT_WRITE_CACHE,
> };
> struct gendisk *disk;
> int err = 0;
> @@ -882,7 +883,6 @@ static int ubd_add(int n, char **error_out)
> }
>
> blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
> - blk_queue_write_cache(disk->queue, true, false);
> disk->major = UBD_MAJOR;
> disk->first_minor = n << UBD_SHIFT;
> disk->minors = 1 << UBD_SHIFT;
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 82c3ae22d76d88..2b45a4df9a1aa1 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -782,7 +782,7 @@ void submit_bio_noacct(struct bio *bio)
> if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_WRITE &&
> bio_op(bio) != REQ_OP_ZONE_APPEND))
> goto end_io;
> - if (!test_bit(QUEUE_FLAG_WC, &q->queue_flags)) {
> + if (!bdev_write_cache(bdev)) {
> bio->bi_opf &= ~(REQ_PREFLUSH | REQ_FUA);
> if (!bio_sectors(bio)) {
> status = BLK_STS_OK;
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index 2234f8b3fc05f2..30b9d5033a2b85 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -381,8 +381,8 @@ static void blk_rq_init_flush(struct request *rq)
> bool blk_insert_flush(struct request *rq)
> {
> struct request_queue *q = rq->q;
> - unsigned long fflags = q->queue_flags; /* may change, cache */
> struct blk_flush_queue *fq = blk_get_flush_queue(q, rq->mq_ctx);
> + bool supports_fua = q->limits.features & BLK_FEAT_FUA;
Shouldn't we have a helper like blk_feat_fua() here?
> unsigned int policy = 0;
>
> /* FLUSH/FUA request must never be merged */
> @@ -394,11 +394,10 @@ bool blk_insert_flush(struct request *rq)
> /*
> * Check which flushes we need to sequence for this operation.
> */
> - if (fflags & (1UL << QUEUE_FLAG_WC)) {
> + if (blk_queue_write_cache(q)) {
> if (rq->cmd_flags & REQ_PREFLUSH)
> policy |= REQ_FSEQ_PREFLUSH;
> - if (!(fflags & (1UL << QUEUE_FLAG_FUA)) &&
> - (rq->cmd_flags & REQ_FUA))
> + if ((rq->cmd_flags & REQ_FUA) && !supports_fua)
> policy |= REQ_FSEQ_POSTFLUSH;
> }
>
> @@ -407,7 +406,7 @@ bool blk_insert_flush(struct request *rq)
> * REQ_PREFLUSH and FUA for the driver.
> */
> rq->cmd_flags &= ~REQ_PREFLUSH;
> - if (!(fflags & (1UL << QUEUE_FLAG_FUA)))
> + if (!supports_fua)
> rq->cmd_flags &= ~REQ_FUA;
>
> /*
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 770c0c2b72faaa..e8b9db7c30c455 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -93,8 +93,6 @@ static const char *const blk_queue_flag_name[] = {
> QUEUE_FLAG_NAME(INIT_DONE),
> QUEUE_FLAG_NAME(STABLE_WRITES),
> QUEUE_FLAG_NAME(POLL),
> - QUEUE_FLAG_NAME(WC),
> - QUEUE_FLAG_NAME(FUA),
> QUEUE_FLAG_NAME(DAX),
> QUEUE_FLAG_NAME(STATS),
> QUEUE_FLAG_NAME(REGISTERED),
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index f11c8676eb4c67..536ee202fcdccb 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -261,6 +261,9 @@ static int blk_validate_limits(struct queue_limits *lim)
> lim->misaligned = 0;
> }
>
> + if (!(lim->features & BLK_FEAT_WRITE_CACHE))
> + lim->features &= ~BLK_FEAT_FUA;
> +
> err = blk_validate_integrity_limits(lim);
> if (err)
> return err;
> @@ -454,6 +457,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> {
> unsigned int top, bottom, alignment, ret = 0;
>
> + t->features |= (b->features & BLK_FEAT_INHERIT_MASK);
> +
> t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
> t->max_user_sectors = min_not_zero(t->max_user_sectors,
> b->max_user_sectors);
> @@ -711,30 +716,6 @@ void blk_set_queue_depth(struct request_queue *q, unsigned int depth)
> }
> EXPORT_SYMBOL(blk_set_queue_depth);
>
> -/**
> - * blk_queue_write_cache - configure queue's write cache
> - * @q: the request queue for the device
> - * @wc: write back cache on or off
> - * @fua: device supports FUA writes, if true
> - *
> - * Tell the block layer about the write cache of @q.
> - */
> -void blk_queue_write_cache(struct request_queue *q, bool wc, bool fua)
> -{
> - if (wc) {
> - blk_queue_flag_set(QUEUE_FLAG_HW_WC, q);
> - blk_queue_flag_set(QUEUE_FLAG_WC, q);
> - } else {
> - blk_queue_flag_clear(QUEUE_FLAG_HW_WC, q);
> - blk_queue_flag_clear(QUEUE_FLAG_WC, q);
> - }
> - if (fua)
> - blk_queue_flag_set(QUEUE_FLAG_FUA, q);
> - else
> - blk_queue_flag_clear(QUEUE_FLAG_FUA, q);
> -}
> -EXPORT_SYMBOL_GPL(blk_queue_write_cache);
> -
> int bdev_alignment_offset(struct block_device *bdev)
> {
> struct request_queue *q = bdev_get_queue(bdev);
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 5c787965b7d09e..4f524c1d5e08bd 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -423,32 +423,41 @@ static ssize_t queue_io_timeout_store(struct request_queue *q, const char *page,
>
> static ssize_t queue_wc_show(struct request_queue *q, char *page)
> {
> - if (test_bit(QUEUE_FLAG_WC, &q->queue_flags))
> - return sprintf(page, "write back\n");
> -
> - return sprintf(page, "write through\n");
> + if (q->limits.features & BLK_FLAGS_WRITE_CACHE_DISABLED)
What is the difference between 'flags' and 'features'?
I.e., why is it named BLK_FEAT_FUA but BLK_FLAGS_WRITE_CACHE_DISABLED?
And if the feature is the existence of a capability, and the flag is
the setting of that capability, can you make it clear in the documentation?
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 13/26] block: move cache control settings out of queue->flags
2024-06-11 9:58 ` Hannes Reinecke
@ 2024-06-12 4:52 ` Christoph Hellwig
0 siblings, 0 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-12 4:52 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Christoph Hellwig, Jens Axboe, Geert Uytterhoeven,
Richard Weinberger, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
A friendly reminder that I've skipped over the full quote. Please
properly quote mails if you want your replies to be seen.
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 13/26] block: move cache control settings out of queue->flags
2024-06-11 5:19 ` [PATCH 13/26] block: move cache control settings out of queue->flags Christoph Hellwig
2024-06-11 7:55 ` Damien Le Moal
2024-06-11 9:58 ` Hannes Reinecke
@ 2024-06-12 14:53 ` Ulf Hansson
2 siblings, 0 replies; 104+ messages in thread
From: Ulf Hansson @ 2024-06-12 14:53 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jens Axboe, Geert Uytterhoeven, Richard Weinberger,
Philipp Reisner, Lars Ellenberg, Christoph Böhmwalder,
Josef Bacik, Ming Lei, Michael S. Tsirkin, Jason Wang,
Roger Pau Monné, Alasdair Kergon, Mike Snitzer,
Mikulas Patocka, Song Liu, Yu Kuai, Vineeth Vijayan,
Martin K. Petersen, linux-m68k, linux-um, drbd-dev, nbd,
linuxppc-dev, ceph-devel, virtualization, xen-devel, linux-bcache,
dm-devel, linux-raid, linux-mmc, linux-mtd, nvdimm, linux-nvme,
linux-s390, linux-scsi, linux-block
On Tue, 11 Jun 2024 at 07:24, Christoph Hellwig <hch@lst.de> wrote:
>
> Move the cache control settings into the queue_limits so that they
> can be set atomically and all I/O is frozen when changing the
> flags.
>
> Add new features and flags field for the driver set flags, and internal
> (usually sysfs-controlled) flags in the block layer. Note that we'll
> eventually remove enough field from queue_limits to bring it back to the
> previous size.
>
> The disable flag is inverted compared to the previous meaning, which
> means it now survives a rescan, similar to the max_sectors and
> max_discard_sectors user limits.
>
> The FLUSH and FUA flags are now inherited by blk_stack_limits, which
> simplified the code in dm a lot, but also causes a slight behavior
> change in that dm-switch and dm-unstripe now advertise a write cache
> despite setting num_flush_bios to 0. The I/O path will handle this
> gracefully, but as far as I can tell the lack of num_flush_bios
> and thus flush support is a pre-existing data integrity bug in those
> targets that really needs fixing, after which a non-zero num_flush_bios
> should be required in dm for targets that map to underlying devices.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC
FYI, for now I don't expect any other patches in my mmc tree to clash
with this for v6.11, assuming that is the target.
Kind regards
Uffe
> - blk_queue_write_cache(lo->lo_queue, true, false);
> -
> error = loop_reconfigure_limits(lo, config->block_size);
> if (WARN_ON_ONCE(error))
> goto out_unlock;
> @@ -1131,9 +1131,6 @@ static void __loop_clr_fd(struct loop_device *lo, bool release)
> struct file *filp;
> gfp_t gfp = lo->old_gfp_mask;
>
> - if (test_bit(QUEUE_FLAG_WC, &lo->lo_queue->queue_flags))
> - blk_queue_write_cache(lo->lo_queue, false, false);
> -
> /*
> * Freeze the request queue when unbinding on a live file descriptor and
> * thus an open device. When called from ->release we are guaranteed
> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
> index 44b8c671921e5c..cb1c86a6a3fb9d 100644
> --- a/drivers/block/nbd.c
> +++ b/drivers/block/nbd.c
> @@ -342,12 +342,14 @@ static int __nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
> lim.max_hw_discard_sectors = UINT_MAX;
> else
> lim.max_hw_discard_sectors = 0;
> - if (!(nbd->config->flags & NBD_FLAG_SEND_FLUSH))
> - blk_queue_write_cache(nbd->disk->queue, false, false);
> - else if (nbd->config->flags & NBD_FLAG_SEND_FUA)
> - blk_queue_write_cache(nbd->disk->queue, true, true);
> - else
> - blk_queue_write_cache(nbd->disk->queue, true, false);
> + if (!(nbd->config->flags & NBD_FLAG_SEND_FLUSH)) {
> + lim.features &= ~(BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA);
> + } else if (nbd->config->flags & NBD_FLAG_SEND_FUA) {
> + lim.features |= BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA;
> + } else {
> + lim.features |= BLK_FEAT_WRITE_CACHE;
> + lim.features &= ~BLK_FEAT_FUA;
> + }
> lim.logical_block_size = blksize;
> lim.physical_block_size = blksize;
> error = queue_limits_commit_update(nbd->disk->queue, &lim);
> diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
> index 631dca2e4e8442..73e4aecf5bb492 100644
> --- a/drivers/block/null_blk/main.c
> +++ b/drivers/block/null_blk/main.c
> @@ -1928,6 +1928,13 @@ static int null_add_dev(struct nullb_device *dev)
> goto out_cleanup_tags;
> }
>
> + if (dev->cache_size > 0) {
> + set_bit(NULLB_DEV_FL_CACHE, &nullb->dev->flags);
> + lim.features |= BLK_FEAT_WRITE_CACHE;
> + if (dev->fua)
> + lim.features |= BLK_FEAT_FUA;
> + }
> +
> nullb->disk = blk_mq_alloc_disk(nullb->tag_set, &lim, nullb);
> if (IS_ERR(nullb->disk)) {
> rv = PTR_ERR(nullb->disk);
> @@ -1940,11 +1947,6 @@ static int null_add_dev(struct nullb_device *dev)
> nullb_setup_bwtimer(nullb);
> }
>
> - if (dev->cache_size > 0) {
> - set_bit(NULLB_DEV_FL_CACHE, &nullb->dev->flags);
> - blk_queue_write_cache(nullb->q, true, dev->fua);
> - }
> -
> nullb->q->queuedata = nullb;
> blk_queue_flag_set(QUEUE_FLAG_NONROT, nullb->q);
>
> diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
> index b810ac0a5c4b97..8b73cf459b5937 100644
> --- a/drivers/block/ps3disk.c
> +++ b/drivers/block/ps3disk.c
> @@ -388,9 +388,8 @@ static int ps3disk_probe(struct ps3_system_bus_device *_dev)
> .max_segments = -1,
> .max_segment_size = dev->bounce_size,
> .dma_alignment = dev->blk_size - 1,
> + .features = BLK_FEAT_WRITE_CACHE,
> };
> -
> - struct request_queue *queue;
> struct gendisk *gendisk;
>
> if (dev->blk_size < 512) {
> @@ -447,10 +446,6 @@ static int ps3disk_probe(struct ps3_system_bus_device *_dev)
> goto fail_free_tag_set;
> }
>
> - queue = gendisk->queue;
> -
> - blk_queue_write_cache(queue, true, false);
> -
> priv->gendisk = gendisk;
> gendisk->major = ps3disk_major;
> gendisk->first_minor = devidx * PS3DISK_MINORS;
> diff --git a/drivers/block/rnbd/rnbd-clt.c b/drivers/block/rnbd/rnbd-clt.c
> index b7ffe03c61606d..02c4b173182719 100644
> --- a/drivers/block/rnbd/rnbd-clt.c
> +++ b/drivers/block/rnbd/rnbd-clt.c
> @@ -1389,6 +1389,12 @@ static int rnbd_client_setup_device(struct rnbd_clt_dev *dev,
> le32_to_cpu(rsp->max_discard_sectors);
> }
>
> + if (rsp->cache_policy & RNBD_WRITEBACK) {
> + lim.features |= BLK_FEAT_WRITE_CACHE;
> + if (rsp->cache_policy & RNBD_FUA)
> + lim.features |= BLK_FEAT_FUA;
> + }
> +
> dev->gd = blk_mq_alloc_disk(&dev->sess->tag_set, &lim, dev);
> if (IS_ERR(dev->gd))
> return PTR_ERR(dev->gd);
> @@ -1397,10 +1403,6 @@ static int rnbd_client_setup_device(struct rnbd_clt_dev *dev,
>
> blk_queue_flag_set(QUEUE_FLAG_SAME_COMP, dev->queue);
> blk_queue_flag_set(QUEUE_FLAG_SAME_FORCE, dev->queue);
> - blk_queue_write_cache(dev->queue,
> - !!(rsp->cache_policy & RNBD_WRITEBACK),
> - !!(rsp->cache_policy & RNBD_FUA));
> -
> return rnbd_clt_setup_gen_disk(dev, rsp, idx);
> }
>
> diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
> index 4e159948c912c2..e45c65c1848d31 100644
> --- a/drivers/block/ublk_drv.c
> +++ b/drivers/block/ublk_drv.c
> @@ -487,8 +487,6 @@ static void ublk_dev_param_basic_apply(struct ublk_device *ub)
> struct request_queue *q = ub->ub_disk->queue;
> const struct ublk_param_basic *p = &ub->params.basic;
>
> - blk_queue_write_cache(q, p->attrs & UBLK_ATTR_VOLATILE_CACHE,
> - p->attrs & UBLK_ATTR_FUA);
> if (p->attrs & UBLK_ATTR_ROTATIONAL)
> blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
> else
> @@ -2210,6 +2208,12 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
> lim.max_zone_append_sectors = p->max_zone_append_sectors;
> }
>
> + if (ub->params.basic.attrs & UBLK_ATTR_VOLATILE_CACHE) {
> + lim.features |= BLK_FEAT_WRITE_CACHE;
> + if (ub->params.basic.attrs & UBLK_ATTR_FUA)
> + lim.features |= BLK_FEAT_FUA;
> + }
> +
> if (wait_for_completion_interruptible(&ub->completion) != 0)
> return -EINTR;
>
> diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
> index 378b241911ca87..b1a3c293528519 100644
> --- a/drivers/block/virtio_blk.c
> +++ b/drivers/block/virtio_blk.c
> @@ -1100,6 +1100,7 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
> struct gendisk *disk = dev_to_disk(dev);
> struct virtio_blk *vblk = disk->private_data;
> struct virtio_device *vdev = vblk->vdev;
> + struct queue_limits lim;
> int i;
>
> BUG_ON(!virtio_has_feature(vblk->vdev, VIRTIO_BLK_F_CONFIG_WCE));
> @@ -1108,7 +1109,17 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
> return i;
>
> virtio_cwrite8(vdev, offsetof(struct virtio_blk_config, wce), i);
> - blk_queue_write_cache(disk->queue, virtblk_get_cache_mode(vdev), false);
> +
> + lim = queue_limits_start_update(disk->queue);
> + if (virtblk_get_cache_mode(vdev))
> + lim.features |= BLK_FEAT_WRITE_CACHE;
> + else
> + lim.features &= ~BLK_FEAT_WRITE_CACHE;
> + blk_mq_freeze_queue(disk->queue);
> + i = queue_limits_commit_update(disk->queue, &lim);
> + blk_mq_unfreeze_queue(disk->queue);
> + if (i)
> + return i;
> return count;
> }
>
> @@ -1504,6 +1515,9 @@ static int virtblk_probe(struct virtio_device *vdev)
> if (err)
> goto out_free_tags;
>
> + if (virtblk_get_cache_mode(vdev))
> + lim.features |= BLK_FEAT_WRITE_CACHE;
> +
> vblk->disk = blk_mq_alloc_disk(&vblk->tag_set, &lim, vblk);
> if (IS_ERR(vblk->disk)) {
> err = PTR_ERR(vblk->disk);
> @@ -1519,10 +1533,6 @@ static int virtblk_probe(struct virtio_device *vdev)
> vblk->disk->fops = &virtblk_fops;
> vblk->index = index;
>
> - /* configure queue flush support */
> - blk_queue_write_cache(vblk->disk->queue, virtblk_get_cache_mode(vdev),
> - false);
> -
> /* If disk is read-only in the host, the guest should obey */
> if (virtio_has_feature(vdev, VIRTIO_BLK_F_RO))
> set_disk_ro(vblk->disk, 1);
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 9794ac2d3299d1..de38e025769b14 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -956,6 +956,12 @@ static void blkif_set_queue_limits(const struct blkfront_info *info,
> lim->max_secure_erase_sectors = UINT_MAX;
> }
>
> + if (info->feature_flush) {
> + lim->features |= BLK_FEAT_WRITE_CACHE;
> + if (info->feature_fua)
> + lim->features |= BLK_FEAT_FUA;
> + }
> +
> /* Hard sector size and max sectors impersonate the equiv. hardware. */
> lim->logical_block_size = info->sector_size;
> lim->physical_block_size = info->physical_sector_size;
> @@ -1150,9 +1156,6 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
> info->sector_size = sector_size;
> info->physical_sector_size = physical_sector_size;
>
> - blk_queue_write_cache(info->rq, info->feature_flush ? true : false,
> - info->feature_fua ? true : false);
> -
> pr_info("blkfront: %s: %s %s %s %s %s %s %s\n",
> info->gd->disk_name, flush_info(info),
> "persistent grants:", info->feature_persistent ?
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 4d11fc664cb0b8..cb6595c8b5514e 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -897,7 +897,6 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
> sector_t sectors, struct block_device *cached_bdev,
> const struct block_device_operations *ops)
> {
> - struct request_queue *q;
> const size_t max_stripes = min_t(size_t, INT_MAX,
> SIZE_MAX / sizeof(atomic_t));
> struct queue_limits lim = {
> @@ -909,6 +908,7 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
> .io_min = block_size,
> .logical_block_size = block_size,
> .physical_block_size = block_size,
> + .features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA,
> };
> uint64_t n;
> int idx;
> @@ -975,12 +975,7 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
> d->disk->fops = ops;
> d->disk->private_data = d;
>
> - q = d->disk->queue;
> -
> blk_queue_flag_set(QUEUE_FLAG_NONROT, d->disk->queue);
> -
> - blk_queue_write_cache(q, true, true);
> -
> return 0;
>
> out_bioset_exit:
> diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
> index fd789eeb62d943..fbe125d55e25b4 100644
> --- a/drivers/md/dm-table.c
> +++ b/drivers/md/dm-table.c
> @@ -1686,34 +1686,16 @@ int dm_calculate_queue_limits(struct dm_table *t,
> return validate_hardware_logical_block_alignment(t, limits);
> }
>
> -static int device_flush_capable(struct dm_target *ti, struct dm_dev *dev,
> - sector_t start, sector_t len, void *data)
> -{
> - unsigned long flush = (unsigned long) data;
> - struct request_queue *q = bdev_get_queue(dev->bdev);
> -
> - return (q->queue_flags & flush);
> -}
> -
> -static bool dm_table_supports_flush(struct dm_table *t, unsigned long flush)
> +/*
> + * Check if a target requires flush support even if none of the underlying
> + * devices need it (e.g. to persist target-specific metadata).
> + */
> +static bool dm_table_supports_flush(struct dm_table *t)
> {
> - /*
> - * Require at least one underlying device to support flushes.
> - * t->devices includes internal dm devices such as mirror logs
> - * so we need to use iterate_devices here, which targets
> - * supporting flushes must provide.
> - */
> for (unsigned int i = 0; i < t->num_targets; i++) {
> struct dm_target *ti = dm_table_get_target(t, i);
>
> - if (!ti->num_flush_bios)
> - continue;
> -
> - if (ti->flush_supported)
> - return true;
> -
> - if (ti->type->iterate_devices &&
> - ti->type->iterate_devices(ti, device_flush_capable, (void *) flush))
> + if (ti->num_flush_bios && ti->flush_supported)
> return true;
> }
>
> @@ -1855,7 +1837,6 @@ static int device_requires_stable_pages(struct dm_target *ti,
> int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
> struct queue_limits *limits)
> {
> - bool wc = false, fua = false;
> int r;
>
> if (dm_table_supports_nowait(t))
> @@ -1876,12 +1857,8 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
> if (!dm_table_supports_secure_erase(t))
> limits->max_secure_erase_sectors = 0;
>
> - if (dm_table_supports_flush(t, (1UL << QUEUE_FLAG_WC))) {
> - wc = true;
> - if (dm_table_supports_flush(t, (1UL << QUEUE_FLAG_FUA)))
> - fua = true;
> - }
> - blk_queue_write_cache(q, wc, fua);
> + if (dm_table_supports_flush(t))
> + limits->features |= BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA;
>
> if (dm_table_supports_dax(t, device_not_dax_capable)) {
> blk_queue_flag_set(QUEUE_FLAG_DAX, q);
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 67ece2cd725f50..2f4c5d1755d857 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -5785,7 +5785,10 @@ struct mddev *md_alloc(dev_t dev, char *name)
> int partitioned;
> int shift;
> int unit;
> - int error ;
> + int error;
> + struct queue_limits lim = {
> + .features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA,
> + };
>
> /*
> * Wait for any previous instance of this device to be completely
> @@ -5825,7 +5828,7 @@ struct mddev *md_alloc(dev_t dev, char *name)
> */
> mddev->hold_active = UNTIL_STOP;
>
> - disk = blk_alloc_disk(NULL, NUMA_NO_NODE);
> + disk = blk_alloc_disk(&lim, NUMA_NO_NODE);
> if (IS_ERR(disk)) {
> error = PTR_ERR(disk);
> goto out_free_mddev;
> @@ -5843,7 +5846,6 @@ struct mddev *md_alloc(dev_t dev, char *name)
> disk->fops = &md_fops;
> disk->private_data = mddev;
>
> - blk_queue_write_cache(disk->queue, true, true);
> disk->events |= DISK_EVENT_MEDIA_CHANGE;
> mddev->gendisk = disk;
> error = add_disk(disk);
> diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
> index 367509b5b6466c..2c9963248fcbd6 100644
> --- a/drivers/mmc/core/block.c
> +++ b/drivers/mmc/core/block.c
> @@ -2466,8 +2466,7 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
> struct mmc_blk_data *md;
> int devidx, ret;
> char cap_str[10];
> - bool cache_enabled = false;
> - bool fua_enabled = false;
> + unsigned int features = 0;
>
> devidx = ida_alloc_max(&mmc_blk_ida, max_devices - 1, GFP_KERNEL);
> if (devidx < 0) {
> @@ -2499,7 +2498,24 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
> */
> md->read_only = mmc_blk_readonly(card);
>
> - md->disk = mmc_init_queue(&md->queue, card);
> + if (mmc_host_cmd23(card->host)) {
> + if ((mmc_card_mmc(card) &&
> + card->csd.mmca_vsn >= CSD_SPEC_VER_3) ||
> + (mmc_card_sd(card) &&
> + card->scr.cmds & SD_SCR_CMD23_SUPPORT))
> + md->flags |= MMC_BLK_CMD23;
> + }
> +
> + if (md->flags & MMC_BLK_CMD23 &&
> + ((card->ext_csd.rel_param & EXT_CSD_WR_REL_PARAM_EN) ||
> + card->ext_csd.rel_sectors)) {
> + md->flags |= MMC_BLK_REL_WR;
> + features |= (BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA);
> + } else if (mmc_cache_enabled(card->host)) {
> + features |= BLK_FEAT_WRITE_CACHE;
> + }
> +
> + md->disk = mmc_init_queue(&md->queue, card, features);
> if (IS_ERR(md->disk)) {
> ret = PTR_ERR(md->disk);
> goto err_kfree;
> @@ -2539,26 +2555,6 @@ static struct mmc_blk_data *mmc_blk_alloc_req(struct mmc_card *card,
>
> set_capacity(md->disk, size);
>
> - if (mmc_host_cmd23(card->host)) {
> - if ((mmc_card_mmc(card) &&
> - card->csd.mmca_vsn >= CSD_SPEC_VER_3) ||
> - (mmc_card_sd(card) &&
> - card->scr.cmds & SD_SCR_CMD23_SUPPORT))
> - md->flags |= MMC_BLK_CMD23;
> - }
> -
> - if (md->flags & MMC_BLK_CMD23 &&
> - ((card->ext_csd.rel_param & EXT_CSD_WR_REL_PARAM_EN) ||
> - card->ext_csd.rel_sectors)) {
> - md->flags |= MMC_BLK_REL_WR;
> - fua_enabled = true;
> - cache_enabled = true;
> - }
> - if (mmc_cache_enabled(card->host))
> - cache_enabled = true;
> -
> - blk_queue_write_cache(md->queue.queue, cache_enabled, fua_enabled);
> -
> string_get_size((u64)size, 512, STRING_UNITS_2,
> cap_str, sizeof(cap_str));
> pr_info("%s: %s %s %s%s\n",
> diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
> index 241cdc2b2a2a3b..97ff993d31570c 100644
> --- a/drivers/mmc/core/queue.c
> +++ b/drivers/mmc/core/queue.c
> @@ -344,10 +344,12 @@ static const struct blk_mq_ops mmc_mq_ops = {
> };
>
> static struct gendisk *mmc_alloc_disk(struct mmc_queue *mq,
> - struct mmc_card *card)
> + struct mmc_card *card, unsigned int features)
> {
> struct mmc_host *host = card->host;
> - struct queue_limits lim = { };
> + struct queue_limits lim = {
> + .features = features,
> + };
> struct gendisk *disk;
>
> if (mmc_can_erase(card))
> @@ -413,10 +415,12 @@ static inline bool mmc_merge_capable(struct mmc_host *host)
> * mmc_init_queue - initialise a queue structure.
> * @mq: mmc queue
> * @card: mmc card to attach this queue
> + * @features: block layer features (BLK_FEAT_*)
> *
> * Initialise a MMC card request queue.
> */
> -struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card)
> +struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
> + unsigned int features)
> {
> struct mmc_host *host = card->host;
> struct gendisk *disk;
> @@ -460,7 +464,7 @@ struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card)
> return ERR_PTR(ret);
>
>
> - disk = mmc_alloc_disk(mq, card);
> + disk = mmc_alloc_disk(mq, card, features);
> if (IS_ERR(disk))
> blk_mq_free_tag_set(&mq->tag_set);
> return disk;
> diff --git a/drivers/mmc/core/queue.h b/drivers/mmc/core/queue.h
> index 9ade3bcbb714e4..1498840a4ea008 100644
> --- a/drivers/mmc/core/queue.h
> +++ b/drivers/mmc/core/queue.h
> @@ -94,7 +94,8 @@ struct mmc_queue {
> struct work_struct complete_work;
> };
>
> -struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card);
> +struct gendisk *mmc_init_queue(struct mmc_queue *mq, struct mmc_card *card,
> + unsigned int features);
> extern void mmc_cleanup_queue(struct mmc_queue *);
> extern void mmc_queue_suspend(struct mmc_queue *);
> extern void mmc_queue_resume(struct mmc_queue *);
> diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
> index 3caa0717d46c01..1b9f57f231e8be 100644
> --- a/drivers/mtd/mtd_blkdevs.c
> +++ b/drivers/mtd/mtd_blkdevs.c
> @@ -336,6 +336,8 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
> lim.logical_block_size = tr->blksize;
> if (tr->discard)
> lim.max_hw_discard_sectors = UINT_MAX;
> + if (tr->flush)
> + lim.features |= BLK_FEAT_WRITE_CACHE;
>
> /* Create gendisk */
> gd = blk_mq_alloc_disk(new->tag_set, &lim, new);
> @@ -373,9 +375,6 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
> spin_lock_init(&new->queue_lock);
> INIT_LIST_HEAD(&new->rq_list);
>
> - if (tr->flush)
> - blk_queue_write_cache(new->rq, true, false);
> -
> blk_queue_flag_set(QUEUE_FLAG_NONROT, new->rq);
> blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, new->rq);
>
> diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
> index 598fe2e89bda45..aff818469c114c 100644
> --- a/drivers/nvdimm/pmem.c
> +++ b/drivers/nvdimm/pmem.c
> @@ -455,6 +455,7 @@ static int pmem_attach_disk(struct device *dev,
> .logical_block_size = pmem_sector_size(ndns),
> .physical_block_size = PAGE_SIZE,
> .max_hw_sectors = UINT_MAX,
> + .features = BLK_FEAT_WRITE_CACHE,
> };
> int nid = dev_to_node(dev), fua;
> struct resource *res = &nsio->res;
> @@ -495,6 +496,8 @@ static int pmem_attach_disk(struct device *dev,
> dev_warn(dev, "unable to guarantee persistence of writes\n");
> fua = 0;
> }
> + if (fua)
> + lim.features |= BLK_FEAT_FUA;
>
> if (!devm_request_mem_region(dev, res->start, resource_size(res),
> dev_name(&ndns->dev))) {
> @@ -543,7 +546,6 @@ static int pmem_attach_disk(struct device *dev,
> }
> pmem->virt_addr = addr;
>
> - blk_queue_write_cache(q, true, fua);
> blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
> blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, q);
> if (pmem->pfn_flags & PFN_MAP)
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 5a673fa5cb2612..9fc5e36fe2e55e 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2056,7 +2056,6 @@ static int nvme_update_ns_info_generic(struct nvme_ns *ns,
> static int nvme_update_ns_info_block(struct nvme_ns *ns,
> struct nvme_ns_info *info)
> {
> - bool vwc = ns->ctrl->vwc & NVME_CTRL_VWC_PRESENT;
> struct queue_limits lim;
> struct nvme_id_ns_nvm *nvm = NULL;
> struct nvme_zone_info zi = {};
> @@ -2106,6 +2105,11 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
> ns->head->ids.csi == NVME_CSI_ZNS)
> nvme_update_zone_info(ns, &lim, &zi);
>
> + if (ns->ctrl->vwc & NVME_CTRL_VWC_PRESENT)
> + lim.features |= BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA;
> + else
> + lim.features &= ~(BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA);
> +
> /*
> * Register a metadata profile for PI, or the plain non-integrity NVMe
> * metadata masquerading as Type 0 if supported, otherwise reject block
> @@ -2132,7 +2136,6 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
> if ((id->dlfeat & 0x7) == 0x1 && (id->dlfeat & (1 << 3)))
> ns->head->features |= NVME_NS_DEAC;
> set_disk_ro(ns->disk, nvme_ns_is_readonly(ns, info));
> - blk_queue_write_cache(ns->disk->queue, vwc, vwc);
> set_bit(NVME_NS_READY, &ns->flags);
> blk_mq_unfreeze_queue(ns->disk->queue);
>
> diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
> index 12c59db02539e5..3d0e23a0a4ddd8 100644
> --- a/drivers/nvme/host/multipath.c
> +++ b/drivers/nvme/host/multipath.c
> @@ -521,7 +521,6 @@ static void nvme_requeue_work(struct work_struct *work)
> int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
> {
> struct queue_limits lim;
> - bool vwc = false;
>
> mutex_init(&head->lock);
> bio_list_init(&head->requeue_list);
> @@ -562,11 +561,6 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
> if (ctrl->tagset->nr_maps > HCTX_TYPE_POLL &&
> ctrl->tagset->map[HCTX_TYPE_POLL].nr_queues)
> blk_queue_flag_set(QUEUE_FLAG_POLL, head->disk->queue);
> -
> - /* we need to propagate up the VMC settings */
> - if (ctrl->vwc & NVME_CTRL_VWC_PRESENT)
> - vwc = true;
> - blk_queue_write_cache(head->disk->queue, vwc, vwc);
> return 0;
> }
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 5bfed61c70db8f..8764ea14c9b881 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -120,17 +120,18 @@ static const char *sd_cache_types[] = {
> "write back, no read (daft)"
> };
>
> -static void sd_set_flush_flag(struct scsi_disk *sdkp)
> +static void sd_set_flush_flag(struct scsi_disk *sdkp,
> + struct queue_limits *lim)
> {
> - bool wc = false, fua = false;
> -
> if (sdkp->WCE) {
> - wc = true;
> + lim->features |= BLK_FEAT_WRITE_CACHE;
> if (sdkp->DPOFUA)
> - fua = true;
> + lim->features |= BLK_FEAT_FUA;
> + else
> + lim->features &= ~BLK_FEAT_FUA;
> + } else {
> + lim->features &= ~(BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA);
> }
> -
> - blk_queue_write_cache(sdkp->disk->queue, wc, fua);
> }
>
> static ssize_t
> @@ -168,9 +169,18 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
> wce = (ct & 0x02) && !sdkp->write_prot ? 1 : 0;
>
> if (sdkp->cache_override) {
> + struct queue_limits lim;
> +
> sdkp->WCE = wce;
> sdkp->RCD = rcd;
> - sd_set_flush_flag(sdkp);
> +
> + lim = queue_limits_start_update(sdkp->disk->queue);
> + sd_set_flush_flag(sdkp, &lim);
> + blk_mq_freeze_queue(sdkp->disk->queue);
> + ret = queue_limits_commit_update(sdkp->disk->queue, &lim);
> + blk_mq_unfreeze_queue(sdkp->disk->queue);
> + if (ret)
> + return ret;
> return count;
> }
>
> @@ -3659,7 +3669,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
> * We now have all cache related info, determine how we deal
> * with flush requests.
> */
> - sd_set_flush_flag(sdkp);
> + sd_set_flush_flag(sdkp, &lim);
>
> /* Initial block count limit based on CDB TRANSFER LENGTH field size. */
> dev_max = sdp->use_16_for_rw ? SD_MAX_XFER_BLOCKS : SD_DEF_XFER_BLOCKS;
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index c792d4d81e5fcc..4e8931a2c76b07 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -282,6 +282,28 @@ static inline bool blk_op_is_passthrough(blk_opf_t op)
> return op == REQ_OP_DRV_IN || op == REQ_OP_DRV_OUT;
> }
>
> +/* flags set by the driver in queue_limits.features */
> +enum {
> + /* supports a volatile write cache */
> + BLK_FEAT_WRITE_CACHE = (1u << 0),
> +
> + /* supports passing on the FUA bit */
> + BLK_FEAT_FUA = (1u << 1),
> +};
> +
> +/*
> + * Flags automatically inherited when stacking limits.
> + */
> +#define BLK_FEAT_INHERIT_MASK \
> + (BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA)
> +
> +
> +/* internal flags in queue_limits.flags */
> +enum {
> + /* do not send FLUSH or FUA command despite advertised write cache */
> + BLK_FLAGS_WRITE_CACHE_DISABLED = (1u << 31),
> +};
> +
> /*
> * BLK_BOUNCE_NONE: never bounce (default)
> * BLK_BOUNCE_HIGH: bounce all highmem pages
> @@ -292,6 +314,8 @@ enum blk_bounce {
> };
>
> struct queue_limits {
> + unsigned int features;
> + unsigned int flags;
> enum blk_bounce bounce;
> unsigned long seg_boundary_mask;
> unsigned long virt_boundary_mask;
> @@ -536,12 +560,9 @@ struct request_queue {
> #define QUEUE_FLAG_ADD_RANDOM 10 /* Contributes to random pool */
> #define QUEUE_FLAG_SYNCHRONOUS 11 /* always completes in submit context */
> #define QUEUE_FLAG_SAME_FORCE 12 /* force complete on same CPU */
> -#define QUEUE_FLAG_HW_WC 13 /* Write back caching supported */
> #define QUEUE_FLAG_INIT_DONE 14 /* queue is initialized */
> #define QUEUE_FLAG_STABLE_WRITES 15 /* don't modify blks until WB is done */
> #define QUEUE_FLAG_POLL 16 /* IO polling enabled if set */
> -#define QUEUE_FLAG_WC 17 /* Write back caching */
> -#define QUEUE_FLAG_FUA 18 /* device supports FUA writes */
> #define QUEUE_FLAG_DAX 19 /* device supports DAX */
> #define QUEUE_FLAG_STATS 20 /* track IO start and completion times */
> #define QUEUE_FLAG_REGISTERED 22 /* queue has been registered to a disk */
> @@ -951,7 +972,6 @@ void queue_limits_stack_bdev(struct queue_limits *t, struct block_device *bdev,
> sector_t offset, const char *pfx);
> extern void blk_queue_update_dma_pad(struct request_queue *, unsigned int);
> extern void blk_queue_rq_timeout(struct request_queue *, unsigned int);
> -extern void blk_queue_write_cache(struct request_queue *q, bool enabled, bool fua);
>
> struct blk_independent_access_ranges *
> disk_alloc_independent_access_ranges(struct gendisk *disk, int nr_ia_ranges);
> @@ -1305,14 +1325,20 @@ static inline bool bdev_stable_writes(struct block_device *bdev)
> return test_bit(QUEUE_FLAG_STABLE_WRITES, &q->queue_flags);
> }
>
> +static inline bool blk_queue_write_cache(struct request_queue *q)
> +{
> + return (q->limits.features & BLK_FEAT_WRITE_CACHE) &&
> + !(q->limits.flags & BLK_FLAGS_WRITE_CACHE_DISABLED);
> +}
> +
> static inline bool bdev_write_cache(struct block_device *bdev)
> {
> - return test_bit(QUEUE_FLAG_WC, &bdev_get_queue(bdev)->queue_flags);
> + return blk_queue_write_cache(bdev_get_queue(bdev));
> }
>
> static inline bool bdev_fua(struct block_device *bdev)
> {
> - return test_bit(QUEUE_FLAG_FUA, &bdev_get_queue(bdev)->queue_flags);
> + return bdev_get_queue(bdev)->limits.features & BLK_FEAT_FUA;
> }
>
> static inline bool bdev_nowait(struct block_device *bdev)
> --
> 2.43.0
>
>
* [PATCH 14/26] block: move the nonrot flag to queue_limits
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (12 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 13/26] block: move cache control settings out of queue->flags Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 8:02 ` Damien Le Moal
2024-06-11 5:19 ` [PATCH 15/26] block: move the add_random " Christoph Hellwig
` (11 subsequent siblings)
25 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: (same recipients as the cover letter)
Move the nonrot flag into the queue_limits feature field so that it can be
set atomically and all I/O is frozen when changing the flag.
Use the chance to switch to defaulting to non-rotational and require
the driver to opt into rotational, which matches the polarity of the
sysfs interface.
For z2ram, ps3vram, the two memstick drivers, ubiblock and dcssblk the new
rotational flag is not set, as they clearly are not rotational, even though
this is a behavior change. Some other drivers unconditionally set the
rotational flag to keep the existing behavior, as they can arguably be used
on rotational devices even if that is probably not their main use today
(e.g. virtio_blk and drbd).
The flag is automatically inherited in blk_stack_limits matching the
existing behavior in dm and md.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
arch/m68k/emu/nfblock.c | 1 +
arch/um/drivers/ubd_kern.c | 1 -
arch/xtensa/platforms/iss/simdisk.c | 5 +++-
block/blk-mq-debugfs.c | 1 -
block/blk-sysfs.c | 39 ++++++++++++++++++++++++++---
drivers/block/amiflop.c | 5 +++-
drivers/block/aoe/aoeblk.c | 1 +
drivers/block/ataflop.c | 5 +++-
drivers/block/brd.c | 2 --
drivers/block/drbd/drbd_main.c | 3 ++-
drivers/block/floppy.c | 3 ++-
drivers/block/loop.c | 8 +++---
drivers/block/mtip32xx/mtip32xx.c | 1 -
drivers/block/n64cart.c | 2 --
drivers/block/nbd.c | 5 ----
drivers/block/null_blk/main.c | 1 -
drivers/block/pktcdvd.c | 1 +
drivers/block/ps3disk.c | 3 ++-
drivers/block/rbd.c | 3 ---
drivers/block/rnbd/rnbd-clt.c | 4 ---
drivers/block/sunvdc.c | 1 +
drivers/block/swim.c | 5 +++-
drivers/block/swim3.c | 5 +++-
drivers/block/ublk_drv.c | 9 +++----
drivers/block/virtio_blk.c | 4 ++-
drivers/block/xen-blkfront.c | 1 -
drivers/block/zram/zram_drv.c | 2 --
drivers/cdrom/gdrom.c | 1 +
drivers/md/bcache/super.c | 2 --
drivers/md/dm-table.c | 12 ---------
drivers/md/md.c | 13 ----------
drivers/mmc/core/queue.c | 1 -
drivers/mtd/mtd_blkdevs.c | 1 -
drivers/nvdimm/btt.c | 1 -
drivers/nvdimm/pmem.c | 1 -
drivers/nvme/host/core.c | 1 -
drivers/nvme/host/multipath.c | 1 -
drivers/s390/block/dasd_genhd.c | 1 -
drivers/s390/block/scm_blk.c | 1 -
drivers/scsi/sd.c | 4 +--
include/linux/blkdev.h | 10 ++++----
41 files changed, 83 insertions(+), 88 deletions(-)
diff --git a/arch/m68k/emu/nfblock.c b/arch/m68k/emu/nfblock.c
index 642fb80c5c4e31..8eea7ef9115146 100644
--- a/arch/m68k/emu/nfblock.c
+++ b/arch/m68k/emu/nfblock.c
@@ -98,6 +98,7 @@ static int __init nfhd_init_one(int id, u32 blocks, u32 bsize)
{
struct queue_limits lim = {
.logical_block_size = bsize,
+ .features = BLK_FEAT_ROTATIONAL,
};
struct nfhd_device *dev;
int dev_id = id - NFHD_DEV_OFFSET;
diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
index 19e01691ea0ea7..9f1e76ddda5a26 100644
--- a/arch/um/drivers/ubd_kern.c
+++ b/arch/um/drivers/ubd_kern.c
@@ -882,7 +882,6 @@ static int ubd_add(int n, char **error_out)
goto out_cleanup_tags;
}
- blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
disk->major = UBD_MAJOR;
disk->first_minor = n << UBD_SHIFT;
disk->minors = 1 << UBD_SHIFT;
diff --git a/arch/xtensa/platforms/iss/simdisk.c b/arch/xtensa/platforms/iss/simdisk.c
index defc67909a9c74..d6d2b533a5744d 100644
--- a/arch/xtensa/platforms/iss/simdisk.c
+++ b/arch/xtensa/platforms/iss/simdisk.c
@@ -263,6 +263,9 @@ static const struct proc_ops simdisk_proc_ops = {
static int __init simdisk_setup(struct simdisk *dev, int which,
struct proc_dir_entry *procdir)
{
+ struct queue_limits lim = {
+ .features = BLK_FEAT_ROTATIONAL,
+ };
char tmp[2] = { '0' + which, 0 };
int err;
@@ -271,7 +274,7 @@ static int __init simdisk_setup(struct simdisk *dev, int which,
spin_lock_init(&dev->lock);
dev->users = 0;
- dev->gd = blk_alloc_disk(NULL, NUMA_NO_NODE);
+ dev->gd = blk_alloc_disk(&lim, NUMA_NO_NODE);
if (IS_ERR(dev->gd)) {
err = PTR_ERR(dev->gd);
goto out;
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index e8b9db7c30c455..4d0e62ec88f033 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -84,7 +84,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(NOMERGES),
QUEUE_FLAG_NAME(SAME_COMP),
QUEUE_FLAG_NAME(FAIL_IO),
- QUEUE_FLAG_NAME(NONROT),
QUEUE_FLAG_NAME(IO_STAT),
QUEUE_FLAG_NAME(NOXMERGES),
QUEUE_FLAG_NAME(ADD_RANDOM),
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 4f524c1d5e08bd..637ed3bbbfb46f 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -263,6 +263,39 @@ static ssize_t queue_dma_alignment_show(struct request_queue *q, char *page)
return queue_var_show(queue_dma_alignment(q), page);
}
+static ssize_t queue_feature_store(struct request_queue *q, const char *page,
+ size_t count, unsigned int feature)
+{
+ struct queue_limits lim;
+ unsigned long val;
+ ssize_t ret;
+
+ ret = queue_var_store(&val, page, count);
+ if (ret < 0)
+ return ret;
+
+ lim = queue_limits_start_update(q);
+ if (val)
+ lim.features |= feature;
+ else
+ lim.features &= ~feature;
+ ret = queue_limits_commit_update(q, &lim);
+ if (ret)
+ return ret;
+ return count;
+}
+
+#define QUEUE_SYSFS_FEATURE(_name, _feature) \
+static ssize_t queue_##_name##_show(struct request_queue *q, char *page) \
+{ \
+ return sprintf(page, "%u\n", !!(q->limits.features & _feature)); \
+} \
+static ssize_t queue_##_name##_store(struct request_queue *q, \
+ const char *page, size_t count) \
+{ \
+ return queue_feature_store(q, page, count, _feature); \
+}
+
#define QUEUE_SYSFS_BIT_FNS(name, flag, neg) \
static ssize_t \
queue_##name##_show(struct request_queue *q, char *page) \
@@ -289,7 +322,7 @@ queue_##name##_store(struct request_queue *q, const char *page, size_t count) \
return ret; \
}
-QUEUE_SYSFS_BIT_FNS(nonrot, NONROT, 1);
+QUEUE_SYSFS_FEATURE(rotational, BLK_FEAT_ROTATIONAL)
QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
@@ -526,7 +559,7 @@ static struct queue_sysfs_entry queue_hw_sector_size_entry = {
.show = queue_logical_block_size_show,
};
-QUEUE_RW_ENTRY(queue_nonrot, "rotational");
+QUEUE_RW_ENTRY(queue_rotational, "rotational");
QUEUE_RW_ENTRY(queue_iostats, "iostats");
QUEUE_RW_ENTRY(queue_random, "add_random");
QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes");
@@ -624,7 +657,7 @@ static struct attribute *queue_attrs[] = {
&queue_write_zeroes_max_entry.attr,
&queue_zone_append_max_entry.attr,
&queue_zone_write_granularity_entry.attr,
- &queue_nonrot_entry.attr,
+ &queue_rotational_entry.attr,
&queue_zoned_entry.attr,
&queue_nr_zones_entry.attr,
&queue_max_open_zones_entry.attr,
diff --git a/drivers/block/amiflop.c b/drivers/block/amiflop.c
index a25414228e4741..ff45701f7a5e31 100644
--- a/drivers/block/amiflop.c
+++ b/drivers/block/amiflop.c
@@ -1776,10 +1776,13 @@ static const struct blk_mq_ops amiflop_mq_ops = {
static int fd_alloc_disk(int drive, int system)
{
+ struct queue_limits lim = {
+ .features = BLK_FEAT_ROTATIONAL,
+ };
struct gendisk *disk;
int err;
- disk = blk_mq_alloc_disk(&unit[drive].tag_set, NULL, NULL);
+ disk = blk_mq_alloc_disk(&unit[drive].tag_set, &lim, NULL);
if (IS_ERR(disk))
return PTR_ERR(disk);
diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index b6dac8cee70fe1..2028795ec61cbb 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -337,6 +337,7 @@ aoeblk_gdalloc(void *vp)
struct queue_limits lim = {
.max_hw_sectors = aoe_maxsectors,
.io_opt = SZ_2M,
+ .features = BLK_FEAT_ROTATIONAL,
};
ulong flags;
int late = 0;
diff --git a/drivers/block/ataflop.c b/drivers/block/ataflop.c
index cacc4ba942a814..4ee10a742bdb93 100644
--- a/drivers/block/ataflop.c
+++ b/drivers/block/ataflop.c
@@ -1992,9 +1992,12 @@ static const struct blk_mq_ops ataflop_mq_ops = {
static int ataflop_alloc_disk(unsigned int drive, unsigned int type)
{
+ struct queue_limits lim = {
+ .features = BLK_FEAT_ROTATIONAL,
+ };
struct gendisk *disk;
- disk = blk_mq_alloc_disk(&unit[drive].tag_set, NULL, NULL);
+ disk = blk_mq_alloc_disk(&unit[drive].tag_set, &lim, NULL);
if (IS_ERR(disk))
return PTR_ERR(disk);
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 558d8e67056608..b25dc463b5e3a6 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -366,8 +366,6 @@ static int brd_alloc(int i)
strscpy(disk->disk_name, buf, DISK_NAME_LEN);
set_capacity(disk, rd_size * 2);
- /* Tell the block layer that this is not a rotational device */
- blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, disk->queue);
blk_queue_flag_set(QUEUE_FLAG_NOWAIT, disk->queue);
err = add_disk(disk);
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index bf42a46781fa21..2ef29a47807550 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2697,7 +2697,8 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
* connect.
*/
.max_hw_sectors = DRBD_MAX_BIO_SIZE_SAFE >> 8,
- .features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA,
+ .features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA |
+ BLK_FEAT_ROTATIONAL,
};
device = minor_to_device(minor);
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 25c9d85667f1a2..6d7f7df97c3a6c 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -4516,7 +4516,8 @@ static bool floppy_available(int drive)
static int floppy_alloc_disk(unsigned int drive, unsigned int type)
{
struct queue_limits lim = {
- .max_hw_sectors = 64,
+ .max_hw_sectors = 64,
+ .features = BLK_FEAT_ROTATIONAL,
};
struct gendisk *disk;
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 0b23fdc4e2edcc..6b01b30245b74a 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -985,13 +985,11 @@ static int loop_reconfigure_limits(struct loop_device *lo, unsigned short bsize)
lim.logical_block_size = bsize;
lim.physical_block_size = bsize;
lim.io_min = bsize;
- lim.features &= ~BLK_FEAT_WRITE_CACHE;
+ lim.features &= ~(BLK_FEAT_WRITE_CACHE | BLK_FEAT_ROTATIONAL);
if (file->f_op->fsync && !(lo->lo_flags & LO_FLAGS_READ_ONLY))
lim.features |= BLK_FEAT_WRITE_CACHE;
- if (!backing_bdev || bdev_nonrot(backing_bdev))
- blk_queue_flag_set(QUEUE_FLAG_NONROT, lo->lo_queue);
- else
- blk_queue_flag_clear(QUEUE_FLAG_NONROT, lo->lo_queue);
+ if (backing_bdev && !bdev_nonrot(backing_bdev))
+ lim.features |= BLK_FEAT_ROTATIONAL;
loop_config_discard(lo, &lim);
return queue_limits_commit_update(lo->lo_queue, &lim);
}
diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 43a187609ef794..1dbbf72659d549 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3485,7 +3485,6 @@ static int mtip_block_initialize(struct driver_data *dd)
goto start_service_thread;
/* Set device limits. */
- blk_queue_flag_set(QUEUE_FLAG_NONROT, dd->queue);
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, dd->queue);
dma_set_max_seg_size(&dd->pdev->dev, 0x400000);
diff --git a/drivers/block/n64cart.c b/drivers/block/n64cart.c
index 27b2187e7a6d55..b9fdeff31cafdf 100644
--- a/drivers/block/n64cart.c
+++ b/drivers/block/n64cart.c
@@ -150,8 +150,6 @@ static int __init n64cart_probe(struct platform_device *pdev)
set_capacity(disk, size >> SECTOR_SHIFT);
set_disk_ro(disk, 1);
- blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
-
err = add_disk(disk);
if (err)
goto out_cleanup_disk;
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index cb1c86a6a3fb9d..6cddf5baffe02a 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1867,11 +1867,6 @@ static struct nbd_device *nbd_dev_add(int index, unsigned int refs)
goto out_err_disk;
}
- /*
- * Tell the block layer that we are not a rotational device
- */
- blk_queue_flag_set(QUEUE_FLAG_NONROT, disk->queue);
-
mutex_init(&nbd->config_lock);
refcount_set(&nbd->config_refs, 0);
/*
diff --git a/drivers/block/null_blk/main.c b/drivers/block/null_blk/main.c
index 73e4aecf5bb492..3c521ec123ea3b 100644
--- a/drivers/block/null_blk/main.c
+++ b/drivers/block/null_blk/main.c
@@ -1948,7 +1948,6 @@ static int null_add_dev(struct nullb_device *dev)
}
nullb->q->queuedata = nullb;
- blk_queue_flag_set(QUEUE_FLAG_NONROT, nullb->q);
rv = ida_alloc(&nullb_indexes, GFP_KERNEL);
if (rv < 0)
diff --git a/drivers/block/pktcdvd.c b/drivers/block/pktcdvd.c
index 8a2ce80700109d..7cece5884b9c67 100644
--- a/drivers/block/pktcdvd.c
+++ b/drivers/block/pktcdvd.c
@@ -2622,6 +2622,7 @@ static int pkt_setup_dev(dev_t dev, dev_t* pkt_dev)
struct queue_limits lim = {
.max_hw_sectors = PACKET_MAX_SECTORS,
.logical_block_size = CD_FRAMESIZE,
+ .features = BLK_FEAT_ROTATIONAL,
};
int idx;
int ret = -ENOMEM;
diff --git a/drivers/block/ps3disk.c b/drivers/block/ps3disk.c
index 8b73cf459b5937..ff45ed76646957 100644
--- a/drivers/block/ps3disk.c
+++ b/drivers/block/ps3disk.c
@@ -388,7 +388,8 @@ static int ps3disk_probe(struct ps3_system_bus_device *_dev)
.max_segments = -1,
.max_segment_size = dev->bounce_size,
.dma_alignment = dev->blk_size - 1,
- .features = BLK_FEAT_WRITE_CACHE,
+ .features = BLK_FEAT_WRITE_CACHE |
+ BLK_FEAT_ROTATIONAL,
};
struct gendisk *gendisk;
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index 22ad704f81d8b9..ec1f1c7d4275cd 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -4997,9 +4997,6 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
disk->fops = &rbd_bd_ops;
disk->private_data = rbd_dev;
- blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
- /* QUEUE_FLAG_ADD_RANDOM is off by default for blk-mq */
-
if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q);
diff --git a/drivers/block/rnbd/rnbd-clt.c b/drivers/block/rnbd/rnbd-clt.c
index 02c4b173182719..4918b0f68b46cd 100644
--- a/drivers/block/rnbd/rnbd-clt.c
+++ b/drivers/block/rnbd/rnbd-clt.c
@@ -1352,10 +1352,6 @@ static int rnbd_clt_setup_gen_disk(struct rnbd_clt_dev *dev,
if (dev->access_mode == RNBD_ACCESS_RO)
set_disk_ro(dev->gd, true);
- /*
- * Network device does not need rotational
- */
- blk_queue_flag_set(QUEUE_FLAG_NONROT, dev->queue);
err = add_disk(dev->gd);
if (err)
put_disk(dev->gd);
diff --git a/drivers/block/sunvdc.c b/drivers/block/sunvdc.c
index 5286cb8e0824d1..2d38331ee66793 100644
--- a/drivers/block/sunvdc.c
+++ b/drivers/block/sunvdc.c
@@ -791,6 +791,7 @@ static int probe_disk(struct vdc_port *port)
.seg_boundary_mask = PAGE_SIZE - 1,
.max_segment_size = PAGE_SIZE,
.max_segments = port->ring_cookies,
+ .features = BLK_FEAT_ROTATIONAL,
};
struct request_queue *q;
struct gendisk *g;
diff --git a/drivers/block/swim.c b/drivers/block/swim.c
index 6731678f3a41db..126f151c4f2cf0 100644
--- a/drivers/block/swim.c
+++ b/drivers/block/swim.c
@@ -787,6 +787,9 @@ static void swim_cleanup_floppy_disk(struct floppy_state *fs)
static int swim_floppy_init(struct swim_priv *swd)
{
+ struct queue_limits lim = {
+ .features = BLK_FEAT_ROTATIONAL,
+ };
int err;
int drive;
struct swim __iomem *base = swd->base;
@@ -820,7 +823,7 @@ static int swim_floppy_init(struct swim_priv *swd)
goto exit_put_disks;
swd->unit[drive].disk =
- blk_mq_alloc_disk(&swd->unit[drive].tag_set, NULL,
+ blk_mq_alloc_disk(&swd->unit[drive].tag_set, &lim,
&swd->unit[drive]);
if (IS_ERR(swd->unit[drive].disk)) {
blk_mq_free_tag_set(&swd->unit[drive].tag_set);
diff --git a/drivers/block/swim3.c b/drivers/block/swim3.c
index a04756ac778ee8..90be1017f7bfcd 100644
--- a/drivers/block/swim3.c
+++ b/drivers/block/swim3.c
@@ -1189,6 +1189,9 @@ static int swim3_add_device(struct macio_dev *mdev, int index)
static int swim3_attach(struct macio_dev *mdev,
const struct of_device_id *match)
{
+ struct queue_limits lim = {
+ .features = BLK_FEAT_ROTATIONAL,
+ };
struct floppy_state *fs;
struct gendisk *disk;
int rc;
@@ -1210,7 +1213,7 @@ static int swim3_attach(struct macio_dev *mdev,
if (rc)
goto out_unregister;
- disk = blk_mq_alloc_disk(&fs->tag_set, NULL, fs);
+ disk = blk_mq_alloc_disk(&fs->tag_set, &lim, fs);
if (IS_ERR(disk)) {
rc = PTR_ERR(disk);
goto out_free_tag_set;
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index e45c65c1848d31..4fcde099935868 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -484,14 +484,8 @@ static inline unsigned ublk_pos_to_tag(loff_t pos)
static void ublk_dev_param_basic_apply(struct ublk_device *ub)
{
- struct request_queue *q = ub->ub_disk->queue;
const struct ublk_param_basic *p = &ub->params.basic;
- if (p->attrs & UBLK_ATTR_ROTATIONAL)
- blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
- else
- blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
-
if (p->attrs & UBLK_ATTR_READ_ONLY)
set_disk_ro(ub->ub_disk, true);
@@ -2214,6 +2208,9 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
lim.features |= BLK_FEAT_FUA;
}
+ if (ub->params.basic.attrs & UBLK_ATTR_ROTATIONAL)
+ lim.features |= BLK_FEAT_ROTATIONAL;
+
if (wait_for_completion_interruptible(&ub->completion) != 0)
return -EINTR;
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index b1a3c293528519..13a2f24f176628 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1451,7 +1451,9 @@ static int virtblk_read_limits(struct virtio_blk *vblk,
static int virtblk_probe(struct virtio_device *vdev)
{
struct virtio_blk *vblk;
- struct queue_limits lim = { };
+ struct queue_limits lim = {
+ .features = BLK_FEAT_ROTATIONAL,
+ };
int err, index;
unsigned int queue_depth;
diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index de38e025769b14..4fe95a2bffe91a 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1133,7 +1133,6 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
err = PTR_ERR(gd);
goto out_free_tag_set;
}
- blk_queue_flag_set(QUEUE_FLAG_VIRT, gd->queue);
strcpy(gd->disk_name, DEV_NAME);
ptr = encode_disk_name(gd->disk_name + sizeof(DEV_NAME) - 1, offset);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 3acd7006ad2ccd..aad840fc7e18e3 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2245,8 +2245,6 @@ static int zram_add(void)
/* Actual capacity set using sysfs (/sys/block/zram<id>/disksize */
set_capacity(zram->disk, 0);
- /* zram devices sort of resembles non-rotational disks */
- blk_queue_flag_set(QUEUE_FLAG_NONROT, zram->disk->queue);
blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, zram->disk->queue);
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, zram->disk->queue);
ret = device_add_disk(NULL, zram->disk, zram_disk_groups);
diff --git a/drivers/cdrom/gdrom.c b/drivers/cdrom/gdrom.c
index eefdd422ad8e9f..71cfe7a85913c4 100644
--- a/drivers/cdrom/gdrom.c
+++ b/drivers/cdrom/gdrom.c
@@ -744,6 +744,7 @@ static int probe_gdrom(struct platform_device *devptr)
.max_segments = 1,
/* set a large max size to get most from DMA */
.max_segment_size = 0x40000,
+ .features = BLK_FEAT_ROTATIONAL,
};
int err;
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index cb6595c8b5514e..baa364eedd0051 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -974,8 +974,6 @@ static int bcache_device_init(struct bcache_device *d, unsigned int block_size,
d->disk->minors = BCACHE_MINORS;
d->disk->fops = ops;
d->disk->private_data = d;
-
- blk_queue_flag_set(QUEUE_FLAG_NONROT, d->disk->queue);
return 0;
out_bioset_exit:
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index fbe125d55e25b4..3514a57c2df5d2 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1716,12 +1716,6 @@ static int device_dax_write_cache_enabled(struct dm_target *ti,
return false;
}
-static int device_is_rotational(struct dm_target *ti, struct dm_dev *dev,
- sector_t start, sector_t len, void *data)
-{
- return !bdev_nonrot(dev->bdev);
-}
-
static int device_is_not_random(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
{
@@ -1870,12 +1864,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
if (dm_table_any_dev_attr(t, device_dax_write_cache_enabled, NULL))
dax_write_cache(t->md->dax_dev, true);
- /* Ensure that all underlying devices are non-rotational. */
- if (dm_table_any_dev_attr(t, device_is_rotational, NULL))
- blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
- else
- blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
-
/*
* Some devices don't use blk_integrity but still want stable pages
* because they do their own checksumming.
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2f4c5d1755d857..c23423c51fb7c2 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6151,20 +6151,7 @@ int md_run(struct mddev *mddev)
if (!mddev_is_dm(mddev)) {
struct request_queue *q = mddev->gendisk->queue;
- bool nonrot = true;
- rdev_for_each(rdev, mddev) {
- if (rdev->raid_disk >= 0 && !bdev_nonrot(rdev->bdev)) {
- nonrot = false;
- break;
- }
- }
- if (mddev->degraded)
- nonrot = false;
- if (nonrot)
- blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
- else
- blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
blk_queue_flag_set(QUEUE_FLAG_IO_STAT, q);
/* Set the NOWAIT flags if all underlying devices support it */
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index 97ff993d31570c..b4f62fa845864c 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -387,7 +387,6 @@ static struct gendisk *mmc_alloc_disk(struct mmc_queue *mq,
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, mq->queue);
blk_queue_rq_timeout(mq->queue, 60 * HZ);
- blk_queue_flag_set(QUEUE_FLAG_NONROT, mq->queue);
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, mq->queue);
dma_set_max_seg_size(mmc_dev(host), queue_max_segment_size(mq->queue));
diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
index 1b9f57f231e8be..bf8369ce7ddf1d 100644
--- a/drivers/mtd/mtd_blkdevs.c
+++ b/drivers/mtd/mtd_blkdevs.c
@@ -375,7 +375,6 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
spin_lock_init(&new->queue_lock);
INIT_LIST_HEAD(&new->rq_list);
- blk_queue_flag_set(QUEUE_FLAG_NONROT, new->rq);
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, new->rq);
gd->queue = new->rq;
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index c5f8451b494d6c..e474afa8e9f68d 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1518,7 +1518,6 @@ static int btt_blk_init(struct btt *btt)
btt->btt_disk->fops = &btt_fops;
btt->btt_disk->private_data = btt;
- blk_queue_flag_set(QUEUE_FLAG_NONROT, btt->btt_disk->queue);
blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, btt->btt_disk->queue);
set_capacity(btt->btt_disk, btt->nlba * btt->sector_size >> 9);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index aff818469c114c..501cf226df0187 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -546,7 +546,6 @@ static int pmem_attach_disk(struct device *dev,
}
pmem->virt_addr = addr;
- blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, q);
if (pmem->pfn_flags & PFN_MAP)
blk_queue_flag_set(QUEUE_FLAG_DAX, q);
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 9fc5e36fe2e55e..0d753fe71f35b0 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3744,7 +3744,6 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
if (ctrl->opts && ctrl->opts->data_digest)
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, ns->queue);
- blk_queue_flag_set(QUEUE_FLAG_NONROT, ns->queue);
if (ctrl->ops->supports_pci_p2pdma &&
ctrl->ops->supports_pci_p2pdma(ctrl))
blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 3d0e23a0a4ddd8..58c13304e558e0 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -549,7 +549,6 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
sprintf(head->disk->disk_name, "nvme%dn%d",
ctrl->subsys->instance, head->instance);
- blk_queue_flag_set(QUEUE_FLAG_NONROT, head->disk->queue);
blk_queue_flag_set(QUEUE_FLAG_NOWAIT, head->disk->queue);
blk_queue_flag_set(QUEUE_FLAG_IO_STAT, head->disk->queue);
/*
diff --git a/drivers/s390/block/dasd_genhd.c b/drivers/s390/block/dasd_genhd.c
index 4533dd055ca8e3..1aa426b1deddc7 100644
--- a/drivers/s390/block/dasd_genhd.c
+++ b/drivers/s390/block/dasd_genhd.c
@@ -68,7 +68,6 @@ int dasd_gendisk_alloc(struct dasd_block *block)
blk_mq_free_tag_set(&block->tag_set);
return PTR_ERR(gdp);
}
- blk_queue_flag_set(QUEUE_FLAG_NONROT, gdp->queue);
/* Initialize gendisk structure. */
gdp->major = DASD_MAJOR;
diff --git a/drivers/s390/block/scm_blk.c b/drivers/s390/block/scm_blk.c
index 1d456a5a3bfb8e..2e2309fa9a0b34 100644
--- a/drivers/s390/block/scm_blk.c
+++ b/drivers/s390/block/scm_blk.c
@@ -475,7 +475,6 @@ int scm_blk_dev_setup(struct scm_blk_dev *bdev, struct scm_device *scmdev)
goto out_tag;
}
rq = bdev->rq = bdev->gendisk->queue;
- blk_queue_flag_set(QUEUE_FLAG_NONROT, rq);
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, rq);
bdev->gendisk->private_data = scmdev;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 8764ea14c9b881..254b00f896dbb4 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3314,7 +3314,7 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp,
rcu_read_unlock();
if (rot == 1) {
- blk_queue_flag_set(QUEUE_FLAG_NONROT, q);
+ lim->features &= ~BLK_FEAT_ROTATIONAL;
blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
}
@@ -3642,7 +3642,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
* cause this to be updated correctly and any device which
* doesn't support it should be treated as rotational.
*/
- blk_queue_flag_clear(QUEUE_FLAG_NONROT, q);
+ lim.features |= BLK_FEAT_ROTATIONAL;
blk_queue_flag_set(QUEUE_FLAG_ADD_RANDOM, q);
if (scsi_device_supports_vpd(sdp)) {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4e8931a2c76b07..c103f5adc17d84 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -289,14 +289,16 @@ enum {
/* supports passing on the FUA bit */
BLK_FEAT_FUA = (1u << 1),
+
+ /* rotational device (hard drive or floppy) */
+ BLK_FEAT_ROTATIONAL = (1u << 2),
};
/*
* Flags automatically inherited when stacking limits.
*/
#define BLK_FEAT_INHERIT_MASK \
- (BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA)
-
+ (BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA | BLK_FEAT_ROTATIONAL)
/* internal flags in queue_limits.flags */
enum {
@@ -553,8 +555,6 @@ struct request_queue {
#define QUEUE_FLAG_NOMERGES 3 /* disable merge attempts */
#define QUEUE_FLAG_SAME_COMP 4 /* complete on same CPU-group */
#define QUEUE_FLAG_FAIL_IO 5 /* fake timeout */
-#define QUEUE_FLAG_NONROT 6 /* non-rotational device (SSD) */
-#define QUEUE_FLAG_VIRT QUEUE_FLAG_NONROT /* paravirt device */
#define QUEUE_FLAG_IO_STAT 7 /* do disk/partitions IO accounting */
#define QUEUE_FLAG_NOXMERGES 9 /* No extended merges */
#define QUEUE_FLAG_ADD_RANDOM 10 /* Contributes to random pool */
@@ -589,7 +589,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
#define blk_queue_nomerges(q) test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags)
#define blk_queue_noxmerges(q) \
test_bit(QUEUE_FLAG_NOXMERGES, &(q)->queue_flags)
-#define blk_queue_nonrot(q) test_bit(QUEUE_FLAG_NONROT, &(q)->queue_flags)
+#define blk_queue_nonrot(q) (!((q)->limits.features & BLK_FEAT_ROTATIONAL))
#define blk_queue_io_stat(q) test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
#define blk_queue_add_random(q) test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
#define blk_queue_zone_resetall(q) \
--
2.43.0
* Re: [PATCH 14/26] block: move the nonrot flag to queue_limits
2024-06-11 5:19 ` [PATCH 14/26] block: move the nonrot flag to queue_limits Christoph Hellwig
@ 2024-06-11 8:02 ` Damien Le Moal
0 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 8:02 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Move the norot flag into the queue_limits feature field so that it can be
s/norot/nonrot
> set atomically and all I/O is frozen when changing the flag.
and... -> with the queue frozen when... ?
>
> Use the chance to switch to defaulting to non-rotational and require
> the driver to opt into rotational, which matches the polarity of the
> sysfs interface.
>
> For the z2ram, ps3vram, 2x memstick, ubiblock and dcssblk the new
> rotational flag is not set as they clearly are not rotational despite
> this being a behavior change. There are some other drivers that
> unconditionally set the rotational flag to keep the existing behavior
> as they arguably can be used on rotational devices even if that is
> probably not their main use today (e.g. virtio_blk and drbd).
>
> The flag is automatically inherited in blk_stack_limits matching the
> existing behavior in dm and md.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Other than that, looks good to me.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* [PATCH 15/26] block: move the add_random flag to queue_limits
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (13 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 14/26] block: move the nonrot flag to queue_limits Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 8:06 ` Damien Le Moal
2024-06-11 5:19 ` [PATCH 16/26] block: move the io_stat flag setting " Christoph Hellwig
` (10 subsequent siblings)
25 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Move the add_random flag into the queue_limits feature field so that it
can be set atomically, with the queue frozen when changing the flag.
Note that this also removes code from dm to clear the flag based on
the underlying devices, which can't be reached as dm devices will
always start out without the flag set.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-mq-debugfs.c | 1 -
block/blk-sysfs.c | 6 +++---
drivers/block/mtip32xx/mtip32xx.c | 1 -
drivers/md/dm-table.c | 18 ------------------
drivers/mmc/core/queue.c | 2 --
drivers/mtd/mtd_blkdevs.c | 3 ---
drivers/s390/block/scm_blk.c | 4 ----
drivers/scsi/scsi_lib.c | 3 +--
drivers/scsi/sd.c | 11 +++--------
include/linux/blkdev.h | 5 +++--
10 files changed, 10 insertions(+), 44 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 4d0e62ec88f033..6b7edb50bfd3fa 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -86,7 +86,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(FAIL_IO),
QUEUE_FLAG_NAME(IO_STAT),
QUEUE_FLAG_NAME(NOXMERGES),
- QUEUE_FLAG_NAME(ADD_RANDOM),
QUEUE_FLAG_NAME(SYNCHRONOUS),
QUEUE_FLAG_NAME(SAME_FORCE),
QUEUE_FLAG_NAME(INIT_DONE),
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 637ed3bbbfb46f..9174aca3b85526 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -323,7 +323,7 @@ queue_##name##_store(struct request_queue *q, const char *page, size_t count) \
}
QUEUE_SYSFS_FEATURE(rotational, BLK_FEAT_ROTATIONAL)
-QUEUE_SYSFS_BIT_FNS(random, ADD_RANDOM, 0);
+QUEUE_SYSFS_FEATURE(add_random, BLK_FEAT_ADD_RANDOM)
QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
#undef QUEUE_SYSFS_BIT_FNS
@@ -561,7 +561,7 @@ static struct queue_sysfs_entry queue_hw_sector_size_entry = {
QUEUE_RW_ENTRY(queue_rotational, "rotational");
QUEUE_RW_ENTRY(queue_iostats, "iostats");
-QUEUE_RW_ENTRY(queue_random, "add_random");
+QUEUE_RW_ENTRY(queue_add_random, "add_random");
QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes");
#ifdef CONFIG_BLK_WBT
@@ -665,7 +665,7 @@ static struct attribute *queue_attrs[] = {
&queue_nomerges_entry.attr,
&queue_iostats_entry.attr,
&queue_stable_writes_entry.attr,
- &queue_random_entry.attr,
+ &queue_add_random_entry.attr,
&queue_poll_entry.attr,
&queue_wc_entry.attr,
&queue_fua_entry.attr,
diff --git a/drivers/block/mtip32xx/mtip32xx.c b/drivers/block/mtip32xx/mtip32xx.c
index 1dbbf72659d549..c6ef0546ffc9d2 100644
--- a/drivers/block/mtip32xx/mtip32xx.c
+++ b/drivers/block/mtip32xx/mtip32xx.c
@@ -3485,7 +3485,6 @@ static int mtip_block_initialize(struct driver_data *dd)
goto start_service_thread;
/* Set device limits. */
- blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, dd->queue);
dma_set_max_seg_size(&dd->pdev->dev, 0x400000);
/* Set the capacity of the device in 512 byte sectors. */
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 3514a57c2df5d2..7654babc2775c1 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1716,14 +1716,6 @@ static int device_dax_write_cache_enabled(struct dm_target *ti,
return false;
}
-static int device_is_not_random(struct dm_target *ti, struct dm_dev *dev,
- sector_t start, sector_t len, void *data)
-{
- struct request_queue *q = bdev_get_queue(dev->bdev);
-
- return !blk_queue_add_random(q);
-}
-
static int device_not_write_zeroes_capable(struct dm_target *ti, struct dm_dev *dev,
sector_t start, sector_t len, void *data)
{
@@ -1876,16 +1868,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
else
blk_queue_flag_clear(QUEUE_FLAG_STABLE_WRITES, q);
- /*
- * Determine whether or not this queue's I/O timings contribute
- * to the entropy pool, Only request-based targets use this.
- * Clear QUEUE_FLAG_ADD_RANDOM if any underlying device does not
- * have it set.
- */
- if (blk_queue_add_random(q) &&
- dm_table_any_dev_attr(t, device_is_not_random, NULL))
- blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
-
/*
* For a zoned target, setup the zones related queue attributes
* and resources necessary for zone append emulation if necessary.
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index b4f62fa845864c..da00904d4a3c7e 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -387,8 +387,6 @@ static struct gendisk *mmc_alloc_disk(struct mmc_queue *mq,
blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, mq->queue);
blk_queue_rq_timeout(mq->queue, 60 * HZ);
- blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, mq->queue);
-
dma_set_max_seg_size(mmc_dev(host), queue_max_segment_size(mq->queue));
INIT_WORK(&mq->recovery_work, mmc_mq_recovery_handler);
diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c
index bf8369ce7ddf1d..47ead84407cdcf 100644
--- a/drivers/mtd/mtd_blkdevs.c
+++ b/drivers/mtd/mtd_blkdevs.c
@@ -374,9 +374,6 @@ int add_mtd_blktrans_dev(struct mtd_blktrans_dev *new)
/* Create the request queue */
spin_lock_init(&new->queue_lock);
INIT_LIST_HEAD(&new->rq_list);
-
- blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, new->rq);
-
gd->queue = new->rq;
if (new->readonly)
diff --git a/drivers/s390/block/scm_blk.c b/drivers/s390/block/scm_blk.c
index 2e2309fa9a0b34..3fcfe029db1b3a 100644
--- a/drivers/s390/block/scm_blk.c
+++ b/drivers/s390/block/scm_blk.c
@@ -439,7 +439,6 @@ int scm_blk_dev_setup(struct scm_blk_dev *bdev, struct scm_device *scmdev)
.logical_block_size = 1 << 12,
};
unsigned int devindex;
- struct request_queue *rq;
int len, ret;
lim.max_segments = min(scmdev->nr_max_block,
@@ -474,9 +473,6 @@ int scm_blk_dev_setup(struct scm_blk_dev *bdev, struct scm_device *scmdev)
ret = PTR_ERR(bdev->gendisk);
goto out_tag;
}
- rq = bdev->rq = bdev->gendisk->queue;
- blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, rq);
-
bdev->gendisk->private_data = scmdev;
bdev->gendisk->fops = &scm_blk_devops;
bdev->gendisk->major = scm_major;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ec39acc986d6ec..54f771ec8cfb5e 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -631,8 +631,7 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
if (blk_update_request(req, error, bytes))
return true;
- // XXX:
- if (blk_queue_add_random(q))
+ if (q->limits.features & BLK_FEAT_ADD_RANDOM)
add_disk_randomness(req->q->disk);
WARN_ON_ONCE(!blk_rq_is_passthrough(req) &&
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 254b00f896dbb4..6b645bec6c4a56 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3297,7 +3297,6 @@ static void sd_read_block_limits_ext(struct scsi_disk *sdkp)
static void sd_read_block_characteristics(struct scsi_disk *sdkp,
struct queue_limits *lim)
{
- struct request_queue *q = sdkp->disk->queue;
struct scsi_vpd *vpd;
u16 rot;
@@ -3313,10 +3312,8 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp,
sdkp->zoned = (vpd->data[8] >> 4) & 3;
rcu_read_unlock();
- if (rot == 1) {
- lim->features &= ~BLK_FEAT_ROTATIONAL;
- blk_queue_flag_clear(QUEUE_FLAG_ADD_RANDOM, q);
- }
+ if (rot == 1)
+ lim->features &= ~(BLK_FEAT_ROTATIONAL | BLK_FEAT_ADD_RANDOM);
if (!sdkp->first_scan)
return;
@@ -3595,7 +3592,6 @@ static int sd_revalidate_disk(struct gendisk *disk)
{
struct scsi_disk *sdkp = scsi_disk(disk);
struct scsi_device *sdp = sdkp->device;
- struct request_queue *q = sdkp->disk->queue;
sector_t old_capacity = sdkp->capacity;
struct queue_limits lim;
unsigned char *buffer;
@@ -3642,8 +3638,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
* cause this to be updated correctly and any device which
* doesn't support it should be treated as rotational.
*/
- lim.features |= BLK_FEAT_ROTATIONAL;
- blk_queue_flag_set(QUEUE_FLAG_ADD_RANDOM, q);
+ lim.features |= (BLK_FEAT_ROTATIONAL | BLK_FEAT_ADD_RANDOM);
if (scsi_device_supports_vpd(sdp)) {
sd_read_block_provisioning(sdkp);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c103f5adc17d84..e6a2382e21c4fe 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -292,6 +292,9 @@ enum {
/* rotational device (hard drive or floppy) */
BLK_FEAT_ROTATIONAL = (1u << 2),
+
+ /* contributes to the random number pool */
+ BLK_FEAT_ADD_RANDOM = (1u << 3),
};
/*
@@ -557,7 +560,6 @@ struct request_queue {
#define QUEUE_FLAG_FAIL_IO 5 /* fake timeout */
#define QUEUE_FLAG_IO_STAT 7 /* do disk/partitions IO accounting */
#define QUEUE_FLAG_NOXMERGES 9 /* No extended merges */
-#define QUEUE_FLAG_ADD_RANDOM 10 /* Contributes to random pool */
#define QUEUE_FLAG_SYNCHRONOUS 11 /* always completes in submit context */
#define QUEUE_FLAG_SAME_FORCE 12 /* force complete on same CPU */
#define QUEUE_FLAG_INIT_DONE 14 /* queue is initialized */
@@ -591,7 +593,6 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
test_bit(QUEUE_FLAG_NOXMERGES, &(q)->queue_flags)
#define blk_queue_nonrot(q) ((q)->limits.features & BLK_FEAT_ROTATIONAL)
#define blk_queue_io_stat(q) test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
-#define blk_queue_add_random(q) test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags)
#define blk_queue_zone_resetall(q) \
test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
#define blk_queue_dax(q) test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
--
2.43.0
^ permalink raw reply related [flat|nested] 104+ messages in thread
* Re: [PATCH 15/26] block: move the add_random flag to queue_limits
2024-06-11 5:19 ` [PATCH 15/26] block: move the add_random " Christoph Hellwig
@ 2024-06-11 8:06 ` Damien Le Moal
0 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 8:06 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Move the add_random flag into the queue_limits feature field so that it
> can be set atomically and all I/O is frozen when changing the flag.
Same remark as the previous patches for the end of this sentence.
>
> Note that this also removes code from dm to clear the flag based on
> the underlying devices, which can't be reached as dm devices will
> always start out without the flag set.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Other than that, looks OK to me.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 104+ messages in thread
* [PATCH 16/26] block: move the io_stat flag setting to queue_limits
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (14 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 15/26] block: move the add_random " Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 8:09 ` Damien Le Moal
2024-06-11 5:19 ` [PATCH 17/26] block: move the stable_write flag " Christoph Hellwig
` (9 subsequent siblings)
25 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Move the io_stat flag into the queue_limits feature field so that it
can be set atomically and all I/O is frozen when changing the flag.
Simplify md and dm to set the flag unconditionally instead of avoiding
setting a simple flag for cases where it already is set by other means,
which is a bit pointless.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-mq-debugfs.c | 1 -
block/blk-mq.c | 6 +++++-
block/blk-sysfs.c | 2 +-
drivers/md/dm-table.c | 12 +++++++++---
drivers/md/dm.c | 13 +++----------
drivers/md/md.c | 5 ++---
drivers/nvme/host/multipath.c | 2 +-
include/linux/blkdev.h | 9 +++++----
8 files changed, 26 insertions(+), 24 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 6b7edb50bfd3fa..cbe99444ed1a54 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -84,7 +84,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(NOMERGES),
QUEUE_FLAG_NAME(SAME_COMP),
QUEUE_FLAG_NAME(FAIL_IO),
- QUEUE_FLAG_NAME(IO_STAT),
QUEUE_FLAG_NAME(NOXMERGES),
QUEUE_FLAG_NAME(SYNCHRONOUS),
QUEUE_FLAG_NAME(SAME_FORCE),
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 58b0d6c7cc34d6..cf67dc13f7dd4c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4116,7 +4116,11 @@ struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set,
struct request_queue *q;
int ret;
- q = blk_alloc_queue(lim ? lim : &default_lim, set->numa_node);
+ if (!lim)
+ lim = &default_lim;
+ lim->features |= BLK_FEAT_IO_STAT;
+
+ q = blk_alloc_queue(lim, set->numa_node);
if (IS_ERR(q))
return q;
q->queuedata = queuedata;
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 9174aca3b85526..6f58530fb3c08e 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -324,7 +324,7 @@ queue_##name##_store(struct request_queue *q, const char *page, size_t count) \
QUEUE_SYSFS_FEATURE(rotational, BLK_FEAT_ROTATIONAL)
QUEUE_SYSFS_FEATURE(add_random, BLK_FEAT_ADD_RANDOM)
-QUEUE_SYSFS_BIT_FNS(iostats, IO_STAT, 0);
+QUEUE_SYSFS_FEATURE(iostats, BLK_FEAT_IO_STAT)
QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
#undef QUEUE_SYSFS_BIT_FNS
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 7654babc2775c1..3e3b713502f61e 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -579,6 +579,12 @@ int dm_split_args(int *argc, char ***argvp, char *input)
return 0;
}
+static void dm_set_stacking_limits(struct queue_limits *limits)
+{
+ blk_set_stacking_limits(limits);
+ limits->features |= BLK_FEAT_IO_STAT;
+}
+
/*
* Impose necessary and sufficient conditions on a devices's table such
* that any incoming bio which respects its logical_block_size can be
@@ -617,7 +623,7 @@ static int validate_hardware_logical_block_alignment(struct dm_table *t,
for (i = 0; i < t->num_targets; i++) {
ti = dm_table_get_target(t, i);
- blk_set_stacking_limits(&ti_limits);
+ dm_set_stacking_limits(&ti_limits);
/* combine all target devices' limits */
if (ti->type->iterate_devices)
@@ -1591,7 +1597,7 @@ int dm_calculate_queue_limits(struct dm_table *t,
unsigned int zone_sectors = 0;
bool zoned = false;
- blk_set_stacking_limits(limits);
+ dm_set_stacking_limits(limits);
t->integrity_supported = true;
for (unsigned int i = 0; i < t->num_targets; i++) {
@@ -1604,7 +1610,7 @@ int dm_calculate_queue_limits(struct dm_table *t,
for (unsigned int i = 0; i < t->num_targets; i++) {
struct dm_target *ti = dm_table_get_target(t, i);
- blk_set_stacking_limits(&ti_limits);
+ dm_set_stacking_limits(&ti_limits);
if (!ti->type->iterate_devices) {
/* Set I/O hints portion of queue limits */
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 13037d6a6f62a2..8a976cee448bed 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2386,22 +2386,15 @@ int dm_setup_md_queue(struct mapped_device *md, struct dm_table *t)
struct table_device *td;
int r;
- switch (type) {
- case DM_TYPE_REQUEST_BASED:
+ WARN_ON_ONCE(type == DM_TYPE_NONE);
+
+ if (type == DM_TYPE_REQUEST_BASED) {
md->disk->fops = &dm_rq_blk_dops;
r = dm_mq_init_request_queue(md, t);
if (r) {
DMERR("Cannot initialize queue for request-based dm mapped device");
return r;
}
- break;
- case DM_TYPE_BIO_BASED:
- case DM_TYPE_DAX_BIO_BASED:
- blk_queue_flag_set(QUEUE_FLAG_IO_STAT, md->queue);
- break;
- case DM_TYPE_NONE:
- WARN_ON_ONCE(true);
- break;
}
r = dm_calculate_queue_limits(t, &limits);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index c23423c51fb7c2..8db0db8d5a27ac 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5787,7 +5787,8 @@ struct mddev *md_alloc(dev_t dev, char *name)
int unit;
int error;
struct queue_limits lim = {
- .features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA,
+ .features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA |
+ BLK_FEAT_IO_STAT,
};
/*
@@ -6152,8 +6153,6 @@ int md_run(struct mddev *mddev)
if (!mddev_is_dm(mddev)) {
struct request_queue *q = mddev->gendisk->queue;
- blk_queue_flag_set(QUEUE_FLAG_IO_STAT, q);
-
/* Set the NOWAIT flags if all underlying devices support it */
if (nowait)
blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 58c13304e558e0..eea727cfa9e67d 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -538,6 +538,7 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
blk_set_stacking_limits(&lim);
lim.dma_alignment = 3;
+ lim.features |= BLK_FEAT_IO_STAT;
if (head->ids.csi != NVME_CSI_ZNS)
lim.max_zone_append_sectors = 0;
@@ -550,7 +551,6 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
ctrl->subsys->instance, head->instance);
blk_queue_flag_set(QUEUE_FLAG_NOWAIT, head->disk->queue);
- blk_queue_flag_set(QUEUE_FLAG_IO_STAT, head->disk->queue);
/*
* This assumes all controllers that refer to a namespace either
* support poll queues or not. That is not a strict guarantee,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e6a2382e21c4fe..f8e38f94fd8c9a 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -295,6 +295,9 @@ enum {
/* contributes to the random number pool */
BLK_FEAT_ADD_RANDOM = (1u << 3),
+
+ /* do disk/partitions IO accounting */
+ BLK_FEAT_IO_STAT = (1u << 4),
};
/*
@@ -558,7 +561,6 @@ struct request_queue {
#define QUEUE_FLAG_NOMERGES 3 /* disable merge attempts */
#define QUEUE_FLAG_SAME_COMP 4 /* complete on same CPU-group */
#define QUEUE_FLAG_FAIL_IO 5 /* fake timeout */
-#define QUEUE_FLAG_IO_STAT 7 /* do disk/partitions IO accounting */
#define QUEUE_FLAG_NOXMERGES 9 /* No extended merges */
#define QUEUE_FLAG_SYNCHRONOUS 11 /* always completes in submit context */
#define QUEUE_FLAG_SAME_FORCE 12 /* force complete on same CPU */
@@ -577,8 +579,7 @@ struct request_queue {
#define QUEUE_FLAG_SQ_SCHED 30 /* single queue style io dispatch */
#define QUEUE_FLAG_SKIP_TAGSET_QUIESCE 31 /* quiesce_tagset skip the queue*/
-#define QUEUE_FLAG_MQ_DEFAULT ((1UL << QUEUE_FLAG_IO_STAT) | \
- (1UL << QUEUE_FLAG_SAME_COMP) | \
+#define QUEUE_FLAG_MQ_DEFAULT ((1UL << QUEUE_FLAG_SAME_COMP) | \
(1UL << QUEUE_FLAG_NOWAIT))
void blk_queue_flag_set(unsigned int flag, struct request_queue *q);
@@ -592,7 +593,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
#define blk_queue_noxmerges(q) \
test_bit(QUEUE_FLAG_NOXMERGES, &(q)->queue_flags)
#define blk_queue_nonrot(q) ((q)->limits.features & BLK_FEAT_ROTATIONAL)
-#define blk_queue_io_stat(q) test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags)
+#define blk_queue_io_stat(q) ((q)->limits.features & BLK_FEAT_IO_STAT)
#define blk_queue_zone_resetall(q) \
test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
#define blk_queue_dax(q) test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
--
2.43.0
^ permalink raw reply related [flat|nested] 104+ messages in thread
* Re: [PATCH 16/26] block: move the io_stat flag setting to queue_limits
2024-06-11 5:19 ` [PATCH 16/26] block: move the io_stat flag setting " Christoph Hellwig
@ 2024-06-11 8:09 ` Damien Le Moal
2024-06-12 4:58 ` Christoph Hellwig
0 siblings, 1 reply; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 8:09 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Move the io_stat flag into the queue_limits feature field so that it
> can be set atomically and all I/O is frozen when changing the flag.
Why a feature ? It seems more appropriate for io_stat to be a flag rather than
a feature as that is a block layer thing rather than a device characteristic, no ?
>
> Simplify md and dm to set the flag unconditionally instead of avoiding
> setting a simple flag for cases where it already is set by other means,
> which is a bit pointless.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 16/26] block: move the io_stat flag setting to queue_limits
2024-06-11 8:09 ` Damien Le Moal
@ 2024-06-12 4:58 ` Christoph Hellwig
0 siblings, 0 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-12 4:58 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Jens Axboe, Geert Uytterhoeven,
Richard Weinberger, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On Tue, Jun 11, 2024 at 05:09:45PM +0900, Damien Le Moal wrote:
> On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> > Move the io_stat flag into the queue_limits feature field so that it
> > can be set atomically and all I/O is frozen when changing the flag.
>
> Why a feature ? It seems more appropriate for io_stat to be a flag rather than
> a feature as that is a block layer thing rather than a device characteristic, no ?
Because it must actually be supported by the driver for bio based
drivers. Then again we also support changing it through sysfs, so
we might actually need both. At least unlike say the cache it's
not actively harmful when enabled despite not being supported.
I can look into that, but I'll do it in another series after getting
all the driver changes out.
^ permalink raw reply [flat|nested] 104+ messages in thread
* [PATCH 17/26] block: move the stable_write flag to queue_limits
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (15 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 16/26] block: move the io_stat flag setting " Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 8:12 ` Damien Le Moal
2024-06-11 5:19 ` [PATCH 18/26] block: move the synchronous " Christoph Hellwig
` (8 subsequent siblings)
25 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Move the io_stat flag into the queue_limits feature field so that it can
be set atomically and all I/O is frozen when changing the flag.
The flag is now inherited by blk_stack_limits, which greatly simplifies
the code in dm, and fixes md, which previously did not pass on the flag
set on lower devices.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-mq-debugfs.c | 1 -
block/blk-sysfs.c | 29 +----------------------------
drivers/block/drbd/drbd_main.c | 5 ++---
drivers/block/rbd.c | 9 +++------
drivers/block/zram/zram_drv.c | 2 +-
drivers/md/dm-table.c | 19 -------------------
drivers/md/raid5.c | 6 ++++--
drivers/mmc/core/queue.c | 5 +++--
drivers/nvme/host/core.c | 9 +++++----
drivers/nvme/host/multipath.c | 4 ----
drivers/scsi/iscsi_tcp.c | 8 ++++----
include/linux/blkdev.h | 9 ++++++---
12 files changed, 29 insertions(+), 77 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index cbe99444ed1a54..eb73f1d348e5a9 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -88,7 +88,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(SYNCHRONOUS),
QUEUE_FLAG_NAME(SAME_FORCE),
QUEUE_FLAG_NAME(INIT_DONE),
- QUEUE_FLAG_NAME(STABLE_WRITES),
QUEUE_FLAG_NAME(POLL),
QUEUE_FLAG_NAME(DAX),
QUEUE_FLAG_NAME(STATS),
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 6f58530fb3c08e..cde525724831ef 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -296,37 +296,10 @@ static ssize_t queue_##_name##_store(struct request_queue *q, \
return queue_feature_store(q, page, count, _feature); \
}
-#define QUEUE_SYSFS_BIT_FNS(name, flag, neg) \
-static ssize_t \
-queue_##name##_show(struct request_queue *q, char *page) \
-{ \
- int bit; \
- bit = test_bit(QUEUE_FLAG_##flag, &q->queue_flags); \
- return queue_var_show(neg ? !bit : bit, page); \
-} \
-static ssize_t \
-queue_##name##_store(struct request_queue *q, const char *page, size_t count) \
-{ \
- unsigned long val; \
- ssize_t ret; \
- ret = queue_var_store(&val, page, count); \
- if (ret < 0) \
- return ret; \
- if (neg) \
- val = !val; \
- \
- if (val) \
- blk_queue_flag_set(QUEUE_FLAG_##flag, q); \
- else \
- blk_queue_flag_clear(QUEUE_FLAG_##flag, q); \
- return ret; \
-}
-
QUEUE_SYSFS_FEATURE(rotational, BLK_FEAT_ROTATIONAL)
QUEUE_SYSFS_FEATURE(add_random, BLK_FEAT_ADD_RANDOM)
QUEUE_SYSFS_FEATURE(iostats, BLK_FEAT_IO_STAT)
-QUEUE_SYSFS_BIT_FNS(stable_writes, STABLE_WRITES, 0);
-#undef QUEUE_SYSFS_BIT_FNS
+QUEUE_SYSFS_FEATURE(stable_writes, BLK_FEAT_STABLE_WRITES);
static ssize_t queue_zoned_show(struct request_queue *q, char *page)
{
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 2ef29a47807550..f92673f05c7abc 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2698,7 +2698,8 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
*/
.max_hw_sectors = DRBD_MAX_BIO_SIZE_SAFE >> 8,
.features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA |
- BLK_FEAT_ROTATIONAL,
+ BLK_FEAT_ROTATIONAL |
+ BLK_FEAT_STABLE_WRITES,
};
device = minor_to_device(minor);
@@ -2737,8 +2738,6 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
sprintf(disk->disk_name, "drbd%d", minor);
disk->private_data = device;
- blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, disk->queue);
-
device->md_io.page = alloc_page(GFP_KERNEL);
if (!device->md_io.page)
goto out_no_io_page;
diff --git a/drivers/block/rbd.c b/drivers/block/rbd.c
index ec1f1c7d4275cd..008e850555f41a 100644
--- a/drivers/block/rbd.c
+++ b/drivers/block/rbd.c
@@ -4949,7 +4949,6 @@ static const struct blk_mq_ops rbd_mq_ops = {
static int rbd_init_disk(struct rbd_device *rbd_dev)
{
struct gendisk *disk;
- struct request_queue *q;
unsigned int objset_bytes =
rbd_dev->layout.object_size * rbd_dev->layout.stripe_count;
struct queue_limits lim = {
@@ -4979,12 +4978,14 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
lim.max_write_zeroes_sectors = objset_bytes >> SECTOR_SHIFT;
}
+ if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
+ lim.features |= BLK_FEAT_STABLE_WRITES;
+
disk = blk_mq_alloc_disk(&rbd_dev->tag_set, &lim, rbd_dev);
if (IS_ERR(disk)) {
err = PTR_ERR(disk);
goto out_tag_set;
}
- q = disk->queue;
snprintf(disk->disk_name, sizeof(disk->disk_name), RBD_DRV_NAME "%d",
rbd_dev->dev_id);
@@ -4996,10 +4997,6 @@ static int rbd_init_disk(struct rbd_device *rbd_dev)
disk->minors = RBD_MINORS_PER_MAJOR;
disk->fops = &rbd_bd_ops;
disk->private_data = rbd_dev;
-
- if (!ceph_test_opt(rbd_dev->rbd_client->client, NOCRC))
- blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q);
-
rbd_dev->disk = disk;
return 0;
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index aad840fc7e18e3..f8f1b5b54795ac 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2208,6 +2208,7 @@ static int zram_add(void)
#if ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE
.max_write_zeroes_sectors = UINT_MAX,
#endif
+ .features = BLK_FEAT_STABLE_WRITES,
};
struct zram *zram;
int ret, device_id;
@@ -2246,7 +2247,6 @@ static int zram_add(void)
/* Actual capacity set using sysfs (/sys/block/zram<id>/disksize */
set_capacity(zram->disk, 0);
blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, zram->disk->queue);
- blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, zram->disk->queue);
ret = device_add_disk(NULL, zram->disk, zram_disk_groups);
if (ret)
goto out_cleanup_disk;
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 3e3b713502f61e..f4e1b50ffdcda5 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1819,13 +1819,6 @@ static bool dm_table_supports_secure_erase(struct dm_table *t)
return true;
}
-static int device_requires_stable_pages(struct dm_target *ti,
- struct dm_dev *dev, sector_t start,
- sector_t len, void *data)
-{
- return bdev_stable_writes(dev->bdev);
-}
-
int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
struct queue_limits *limits)
{
@@ -1862,18 +1855,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
if (dm_table_any_dev_attr(t, device_dax_write_cache_enabled, NULL))
dax_write_cache(t->md->dax_dev, true);
- /*
- * Some devices don't use blk_integrity but still want stable pages
- * because they do their own checksumming.
- * If any underlying device requires stable pages, a table must require
- * them as well. Only targets that support iterate_devices are considered:
- * don't want error, zero, etc to require stable pages.
- */
- if (dm_table_any_dev_attr(t, device_requires_stable_pages, NULL))
- blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q);
- else
- blk_queue_flag_clear(QUEUE_FLAG_STABLE_WRITES, q);
-
/*
* For a zoned target, setup the zones related queue attributes
* and resources necessary for zone append emulation if necessary.
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 675c68fa6c6403..e875763d69917d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7082,12 +7082,14 @@ raid5_store_skip_copy(struct mddev *mddev, const char *page, size_t len)
err = -ENODEV;
else if (new != conf->skip_copy) {
struct request_queue *q = mddev->gendisk->queue;
+ struct queue_limits lim = queue_limits_start_update(q);
conf->skip_copy = new;
if (new)
- blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, q);
+ lim.features |= BLK_FEAT_STABLE_WRITES;
else
- blk_queue_flag_clear(QUEUE_FLAG_STABLE_WRITES, q);
+ lim.features &= ~BLK_FEAT_STABLE_WRITES;
+ err = queue_limits_commit_update(q, &lim);
}
mddev_unlock_and_resume(mddev);
return err ?: len;
diff --git a/drivers/mmc/core/queue.c b/drivers/mmc/core/queue.c
index da00904d4a3c7e..d0b3ca8a11f071 100644
--- a/drivers/mmc/core/queue.c
+++ b/drivers/mmc/core/queue.c
@@ -378,13 +378,14 @@ static struct gendisk *mmc_alloc_disk(struct mmc_queue *mq,
lim.max_segments = host->max_segs;
}
+ if (mmc_host_is_spi(host) && host->use_spi_crc)
+ lim.features |= BLK_FEAT_STABLE_WRITES;
+
disk = blk_mq_alloc_disk(&mq->tag_set, &lim, mq);
if (IS_ERR(disk))
return disk;
mq->queue = disk->queue;
- if (mmc_host_is_spi(host) && host->use_spi_crc)
- blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, mq->queue);
blk_queue_rq_timeout(mq->queue, 60 * HZ);
dma_set_max_seg_size(mmc_dev(host), queue_max_segment_size(mq->queue));
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0d753fe71f35b0..5ecf762d7c8837 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3724,6 +3724,7 @@ static void nvme_ns_add_to_ctrl_list(struct nvme_ns *ns)
static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
{
+ struct queue_limits lim = { };
struct nvme_ns *ns;
struct gendisk *disk;
int node = ctrl->numa_node;
@@ -3732,7 +3733,10 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
if (!ns)
return;
- disk = blk_mq_alloc_disk(ctrl->tagset, NULL, ns);
+ if (ctrl->opts && ctrl->opts->data_digest)
+ lim.features |= BLK_FEAT_STABLE_WRITES;
+
+ disk = blk_mq_alloc_disk(ctrl->tagset, &lim, ns);
if (IS_ERR(disk))
goto out_free_ns;
disk->fops = &nvme_bdev_ops;
@@ -3741,9 +3745,6 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
ns->disk = disk;
ns->queue = disk->queue;
- if (ctrl->opts && ctrl->opts->data_digest)
- blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, ns->queue);
-
if (ctrl->ops->supports_pci_p2pdma &&
ctrl->ops->supports_pci_p2pdma(ctrl))
blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue);
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index eea727cfa9e67d..173796f2ddea9f 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -868,10 +868,6 @@ void nvme_mpath_add_disk(struct nvme_ns *ns, __le32 anagrpid)
nvme_mpath_set_live(ns);
}
- if (test_bit(QUEUE_FLAG_STABLE_WRITES, &ns->queue->queue_flags) &&
- ns->head->disk)
- blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES,
- ns->head->disk->queue);
#ifdef CONFIG_BLK_DEV_ZONED
if (blk_queue_is_zoned(ns->queue) && ns->head->disk)
ns->head->disk->nr_zones = ns->disk->nr_zones;
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 60688f18fac6f7..c708e105963833 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -1057,15 +1057,15 @@ static umode_t iscsi_sw_tcp_attr_is_visible(int param_type, int param)
return 0;
}
-static int iscsi_sw_tcp_slave_configure(struct scsi_device *sdev)
+static int iscsi_sw_tcp_device_configure(struct scsi_device *sdev,
+ struct queue_limits *lim)
{
struct iscsi_sw_tcp_host *tcp_sw_host = iscsi_host_priv(sdev->host);
struct iscsi_session *session = tcp_sw_host->session;
struct iscsi_conn *conn = session->leadconn;
if (conn->datadgst_en)
- blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES,
- sdev->request_queue);
+ lim->features |= BLK_FEAT_STABLE_WRITES;
return 0;
}
@@ -1083,7 +1083,7 @@ static const struct scsi_host_template iscsi_sw_tcp_sht = {
.eh_device_reset_handler= iscsi_eh_device_reset,
.eh_target_reset_handler = iscsi_eh_recover_target,
.dma_boundary = PAGE_SIZE - 1,
- .slave_configure = iscsi_sw_tcp_slave_configure,
+ .device_configure = iscsi_sw_tcp_device_configure,
.proc_name = "iscsi_tcp",
.this_id = -1,
.track_queue_depth = 1,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index f8e38f94fd8c9a..db14c61791e022 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -298,13 +298,17 @@ enum {
/* do disk/partitions IO accounting */
BLK_FEAT_IO_STAT = (1u << 4),
+
+ /* don't modify data until writeback is done */
+ BLK_FEAT_STABLE_WRITES = (1u << 5),
};
/*
* Flags automatically inherited when stacking limits.
*/
#define BLK_FEAT_INHERIT_MASK \
- (BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA | BLK_FEAT_ROTATIONAL)
+ (BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA | BLK_FEAT_ROTATIONAL | \
+ BLK_FEAT_STABLE_WRITES)
/* internal flags in queue_limits.flags */
enum {
@@ -565,7 +569,6 @@ struct request_queue {
#define QUEUE_FLAG_SYNCHRONOUS 11 /* always completes in submit context */
#define QUEUE_FLAG_SAME_FORCE 12 /* force complete on same CPU */
#define QUEUE_FLAG_INIT_DONE 14 /* queue is initialized */
-#define QUEUE_FLAG_STABLE_WRITES 15 /* don't modify blks until WB is done */
#define QUEUE_FLAG_POLL 16 /* IO polling enabled if set */
#define QUEUE_FLAG_DAX 19 /* device supports DAX */
#define QUEUE_FLAG_STATS 20 /* track IO start and completion times */
@@ -1324,7 +1327,7 @@ static inline bool bdev_stable_writes(struct block_device *bdev)
if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
q->limits.integrity.csum_type != 0)
return true;
- return test_bit(QUEUE_FLAG_STABLE_WRITES, &q->queue_flags);
+ return q->limits.features & BLK_FEAT_STABLE_WRITES;
}
static inline bool blk_queue_write_cache(struct request_queue *q)
--
2.43.0
^ permalink raw reply related [flat|nested] 104+ messages in thread
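[Editorial note: the BLK_FEAT_INHERIT_MASK step in the blk_stack_limits() hunk above can be modeled outside the kernel as a plain bitwise operation. The following is a hedged userspace sketch — the enum values and the stack_features() helper are illustrative stand-ins, not the kernel's actual definitions, which live in include/linux/blkdev.h and block/blk-settings.c.]

```c
#include <assert.h>

/* Illustrative stand-ins for the BLK_FEAT_* bits in this series; the
 * authoritative values are in include/linux/blkdev.h. */
enum {
	BLK_FEAT_WRITE_CACHE   = (1u << 0),
	BLK_FEAT_FUA           = (1u << 1),
	BLK_FEAT_ROTATIONAL    = (1u << 2),
	BLK_FEAT_STABLE_WRITES = (1u << 5),
};

#define BLK_FEAT_INHERIT_MASK \
	(BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA | BLK_FEAT_ROTATIONAL | \
	 BLK_FEAT_STABLE_WRITES)

/* Model of the inheritance step in blk_stack_limits(): the stacked
 * (top) device picks up any inheritable feature set on a bottom device,
 * so a stable-writes requirement propagates up through dm/md stacks. */
static unsigned int stack_features(unsigned int top, unsigned int bottom)
{
	return top | (bottom & BLK_FEAT_INHERIT_MASK);
}
```

This is why adding BLK_FEAT_STABLE_WRITES to the inherit mask fixes md: the bit now propagates automatically instead of each stacking driver having to copy it by hand.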
* Re: [PATCH 17/26] block: move the stable_write flag to queue_limits
2024-06-11 5:19 ` [PATCH 17/26] block: move the stable_write flag " Christoph Hellwig
@ 2024-06-11 8:12 ` Damien Le Moal
0 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 8:12 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Move the io_stat flag into the queue_limits feature field so that it can
s/io_stat/stable_write
> be set atomically and all I/O is frozen when changing the flag.
>
> The flag is now inherited by blk_stack_limits, which greatly simplifies
> the code in dm, and fixed md which previously did not pass on the flag
> set on lower devices.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Other than the nit above, looks OK to me.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* [PATCH 18/26] block: move the synchronous flag to queue_limits
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (16 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 17/26] block: move the stable_write flag " Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 8:13 ` Damien Le Moal
2024-06-11 5:19 ` [PATCH 19/26] block: move the nowait " Christoph Hellwig
` (7 subsequent siblings)
25 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Move the synchronous flag into the queue_limits feature field so that it
can be set atomically and all I/O is frozen when changing the flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-mq-debugfs.c | 1 -
drivers/block/brd.c | 2 +-
drivers/block/zram/zram_drv.c | 4 ++--
drivers/nvdimm/btt.c | 3 +--
drivers/nvdimm/pmem.c | 4 ++--
include/linux/blkdev.h | 7 ++++---
6 files changed, 10 insertions(+), 11 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index eb73f1d348e5a9..957774e40b1d0c 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -85,7 +85,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(SAME_COMP),
QUEUE_FLAG_NAME(FAIL_IO),
QUEUE_FLAG_NAME(NOXMERGES),
- QUEUE_FLAG_NAME(SYNCHRONOUS),
QUEUE_FLAG_NAME(SAME_FORCE),
QUEUE_FLAG_NAME(INIT_DONE),
QUEUE_FLAG_NAME(POLL),
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index b25dc463b5e3a6..d77deb571dbd06 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -335,6 +335,7 @@ static int brd_alloc(int i)
.max_hw_discard_sectors = UINT_MAX,
.max_discard_segments = 1,
.discard_granularity = PAGE_SIZE,
+ .features = BLK_FEAT_SYNCHRONOUS,
};
list_for_each_entry(brd, &brd_devices, brd_list)
@@ -366,7 +367,6 @@ static int brd_alloc(int i)
strscpy(disk->disk_name, buf, DISK_NAME_LEN);
set_capacity(disk, rd_size * 2);
- blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, disk->queue);
blk_queue_flag_set(QUEUE_FLAG_NOWAIT, disk->queue);
err = add_disk(disk);
if (err)
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index f8f1b5b54795ac..efcb8d9d274c31 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -2208,7 +2208,8 @@ static int zram_add(void)
#if ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE
.max_write_zeroes_sectors = UINT_MAX,
#endif
- .features = BLK_FEAT_STABLE_WRITES,
+ .features = BLK_FEAT_STABLE_WRITES |
+ BLK_FEAT_SYNCHRONOUS,
};
struct zram *zram;
int ret, device_id;
@@ -2246,7 +2247,6 @@ static int zram_add(void)
/* Actual capacity set using sysfs (/sys/block/zram<id>/disksize */
set_capacity(zram->disk, 0);
- blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, zram->disk->queue);
ret = device_add_disk(NULL, zram->disk, zram_disk_groups);
if (ret)
goto out_cleanup_disk;
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index e474afa8e9f68d..e79c06d65bb77b 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -1501,6 +1501,7 @@ static int btt_blk_init(struct btt *btt)
.logical_block_size = btt->sector_size,
.max_hw_sectors = UINT_MAX,
.max_integrity_segments = 1,
+ .features = BLK_FEAT_SYNCHRONOUS,
};
int rc;
@@ -1518,8 +1519,6 @@ static int btt_blk_init(struct btt *btt)
btt->btt_disk->fops = &btt_fops;
btt->btt_disk->private_data = btt;
- blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, btt->btt_disk->queue);
-
set_capacity(btt->btt_disk, btt->nlba * btt->sector_size >> 9);
rc = device_add_disk(&btt->nd_btt->dev, btt->btt_disk, NULL);
if (rc)
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 501cf226df0187..b821dcf018f6ae 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -455,7 +455,8 @@ static int pmem_attach_disk(struct device *dev,
.logical_block_size = pmem_sector_size(ndns),
.physical_block_size = PAGE_SIZE,
.max_hw_sectors = UINT_MAX,
- .features = BLK_FEAT_WRITE_CACHE,
+ .features = BLK_FEAT_WRITE_CACHE |
+ BLK_FEAT_SYNCHRONOUS,
};
int nid = dev_to_node(dev), fua;
struct resource *res = &nsio->res;
@@ -546,7 +547,6 @@ static int pmem_attach_disk(struct device *dev,
}
pmem->virt_addr = addr;
- blk_queue_flag_set(QUEUE_FLAG_SYNCHRONOUS, q);
if (pmem->pfn_flags & PFN_MAP)
blk_queue_flag_set(QUEUE_FLAG_DAX, q);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index db14c61791e022..4d908e29c760da 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -301,6 +301,9 @@ enum {
/* don't modify data until writeback is done */
BLK_FEAT_STABLE_WRITES = (1u << 5),
+
+ /* always completes in submit context */
+ BLK_FEAT_SYNCHRONOUS = (1u << 6),
};
/*
@@ -566,7 +569,6 @@ struct request_queue {
#define QUEUE_FLAG_SAME_COMP 4 /* complete on same CPU-group */
#define QUEUE_FLAG_FAIL_IO 5 /* fake timeout */
#define QUEUE_FLAG_NOXMERGES 9 /* No extended merges */
-#define QUEUE_FLAG_SYNCHRONOUS 11 /* always completes in submit context */
#define QUEUE_FLAG_SAME_FORCE 12 /* force complete on same CPU */
#define QUEUE_FLAG_INIT_DONE 14 /* queue is initialized */
#define QUEUE_FLAG_POLL 16 /* IO polling enabled if set */
@@ -1315,8 +1317,7 @@ static inline bool bdev_nonrot(struct block_device *bdev)
static inline bool bdev_synchronous(struct block_device *bdev)
{
- return test_bit(QUEUE_FLAG_SYNCHRONOUS,
- &bdev_get_queue(bdev)->queue_flags);
+ return bdev->bd_disk->queue->limits.features & BLK_FEAT_SYNCHRONOUS;
}
static inline bool bdev_stable_writes(struct block_device *bdev)
--
2.43.0
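[Editorial note: the per-driver pattern this patch and its siblings apply can be summarized as follows — instead of flipping an atomic queue flag after the disk is allocated, the feature is declared in the queue_limits passed at allocation time, so it is in place before any I/O can run. A minimal userspace model; the struct layouts, the alloc helper, and the flag value are illustrative, not the kernel API.]

```c
#include <assert.h>

enum { BLK_FEAT_SYNCHRONOUS = (1u << 6) };	/* illustrative value */

struct queue_limits { unsigned int features; };
struct gendisk { struct queue_limits limits; };

/* Stand-in for blk_alloc_disk(): the limits, features included, are
 * fixed at allocation rather than patched onto the live queue later. */
static struct gendisk alloc_disk_model(struct queue_limits lim)
{
	struct gendisk disk = { .limits = lim };
	return disk;
}

/* Stand-in for bdev_synchronous(): a plain feature-field test replaces
 * the old test_bit(QUEUE_FLAG_SYNCHRONOUS, ...) on queue_flags. */
static int disk_synchronous(const struct gendisk *disk)
{
	return !!(disk->limits.features & BLK_FEAT_SYNCHRONOUS);
}
```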
* Re: [PATCH 18/26] block: move the synchronous flag to queue_limits
2024-06-11 5:19 ` [PATCH 18/26] block: move the synchronous " Christoph Hellwig
@ 2024-06-11 8:13 ` Damien Le Moal
0 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 8:13 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Move the synchronous flag into the queue_limits feature field so that it
> can be set atomically and all I/O is frozen when changing the flag.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* [PATCH 19/26] block: move the nowait flag to queue_limits
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (17 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 18/26] block: move the synchronous " Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 8:16 ` Damien Le Moal
2024-06-11 5:19 ` [PATCH 20/26] block: move the dax " Christoph Hellwig
` (6 subsequent siblings)
25 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Move the nowait flag into the queue_limits feature field so that it
can be set atomically and all I/O is frozen when changing the flag.
Stacking drivers are simplified in that they can now simply set the
flag, and blk_stack_limits will clear it when any of the underlying
devices does not support the feature.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-mq-debugfs.c | 1 -
block/blk-mq.c | 2 +-
block/blk-settings.c | 9 +++++++++
drivers/block/brd.c | 4 ++--
drivers/md/dm-table.c | 16 ++--------------
drivers/md/md.c | 18 +-----------------
drivers/nvme/host/multipath.c | 3 +--
include/linux/blkdev.h | 9 +++++----
8 files changed, 21 insertions(+), 41 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 957774e40b1d0c..62b132e9a9ce3b 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -96,7 +96,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(ZONE_RESETALL),
QUEUE_FLAG_NAME(RQ_ALLOC_TIME),
QUEUE_FLAG_NAME(HCTX_ACTIVE),
- QUEUE_FLAG_NAME(NOWAIT),
QUEUE_FLAG_NAME(SQ_SCHED),
QUEUE_FLAG_NAME(SKIP_TAGSET_QUIESCE),
};
diff --git a/block/blk-mq.c b/block/blk-mq.c
index cf67dc13f7dd4c..43235acc87505f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4118,7 +4118,7 @@ struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set,
if (!lim)
lim = &default_lim;
- lim->features |= BLK_FEAT_IO_STAT;
+ lim->features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT;
q = blk_alloc_queue(lim, set->numa_node);
if (IS_ERR(q))
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 536ee202fcdccb..bf4622c19b5c09 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -459,6 +459,15 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
t->features |= (b->features & BLK_FEAT_INHERIT_MASK);
+ /*
+ * BLK_FEAT_NOWAIT needs to be supported both by the stacking driver
+ * and all underlying devices. The stacking driver sets the flag
+ * before stacking the limits, and this will clear the flag if any
+ * of the underlying devices does not support it.
+ */
+ if (!(b->features & BLK_FEAT_NOWAIT))
+ t->features &= ~BLK_FEAT_NOWAIT;
+
t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
t->max_user_sectors = min_not_zero(t->max_user_sectors,
b->max_user_sectors);
diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index d77deb571dbd06..a300645cd9d4a5 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -335,7 +335,8 @@ static int brd_alloc(int i)
.max_hw_discard_sectors = UINT_MAX,
.max_discard_segments = 1,
.discard_granularity = PAGE_SIZE,
- .features = BLK_FEAT_SYNCHRONOUS,
+ .features = BLK_FEAT_SYNCHRONOUS |
+ BLK_FEAT_NOWAIT,
};
list_for_each_entry(brd, &brd_devices, brd_list)
@@ -367,7 +368,6 @@ static int brd_alloc(int i)
strscpy(disk->disk_name, buf, DISK_NAME_LEN);
set_capacity(disk, rd_size * 2);
- blk_queue_flag_set(QUEUE_FLAG_NOWAIT, disk->queue);
err = add_disk(disk);
if (err)
goto out_cleanup_disk;
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index f4e1b50ffdcda5..eee43d27733f9a 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -582,7 +582,7 @@ int dm_split_args(int *argc, char ***argvp, char *input)
static void dm_set_stacking_limits(struct queue_limits *limits)
{
blk_set_stacking_limits(limits);
- limits->features |= BLK_FEAT_IO_STAT;
+ limits->features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT;
}
/*
@@ -1746,12 +1746,6 @@ static bool dm_table_supports_write_zeroes(struct dm_table *t)
return true;
}
-static int device_not_nowait_capable(struct dm_target *ti, struct dm_dev *dev,
- sector_t start, sector_t len, void *data)
-{
- return !bdev_nowait(dev->bdev);
-}
-
static bool dm_table_supports_nowait(struct dm_table *t)
{
for (unsigned int i = 0; i < t->num_targets; i++) {
@@ -1759,10 +1753,6 @@ static bool dm_table_supports_nowait(struct dm_table *t)
if (!dm_target_supports_nowait(ti->type))
return false;
-
- if (!ti->type->iterate_devices ||
- ti->type->iterate_devices(ti, device_not_nowait_capable, NULL))
- return false;
}
return true;
@@ -1825,9 +1815,7 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
int r;
if (dm_table_supports_nowait(t))
- blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
- else
- blk_queue_flag_clear(QUEUE_FLAG_NOWAIT, q);
+ limits->features &= ~BLK_FEAT_NOWAIT;
if (!dm_table_supports_discards(t)) {
limits->max_hw_discard_sectors = 0;
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 8db0db8d5a27ac..f1c7d4f281c521 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -5788,7 +5788,7 @@ struct mddev *md_alloc(dev_t dev, char *name)
int error;
struct queue_limits lim = {
.features = BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA |
- BLK_FEAT_IO_STAT,
+ BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT,
};
/*
@@ -6150,13 +6150,6 @@ int md_run(struct mddev *mddev)
}
}
- if (!mddev_is_dm(mddev)) {
- struct request_queue *q = mddev->gendisk->queue;
-
- /* Set the NOWAIT flags if all underlying devices support it */
- if (nowait)
- blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
- }
if (pers->sync_request) {
if (mddev->kobj.sd &&
sysfs_create_group(&mddev->kobj, &md_redundancy_group))
@@ -7115,15 +7108,6 @@ static int hot_add_disk(struct mddev *mddev, dev_t dev)
set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags);
if (!mddev->thread)
md_update_sb(mddev, 1);
- /*
- * If the new disk does not support REQ_NOWAIT,
- * disable on the whole MD.
- */
- if (!bdev_nowait(rdev->bdev)) {
- pr_info("%s: Disabling nowait because %pg does not support nowait\n",
- mdname(mddev), rdev->bdev);
- blk_queue_flag_clear(QUEUE_FLAG_NOWAIT, mddev->gendisk->queue);
- }
/*
* Kick recovery, maybe this spare has to be added to the
* array immediately.
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 173796f2ddea9f..61a162c9cf4e6c 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -538,7 +538,7 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
blk_set_stacking_limits(&lim);
lim.dma_alignment = 3;
- lim.features |= BLK_FEAT_IO_STAT;
+ lim.features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT;
if (head->ids.csi != NVME_CSI_ZNS)
lim.max_zone_append_sectors = 0;
@@ -550,7 +550,6 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
sprintf(head->disk->disk_name, "nvme%dn%d",
ctrl->subsys->instance, head->instance);
- blk_queue_flag_set(QUEUE_FLAG_NOWAIT, head->disk->queue);
/*
* This assumes all controllers that refer to a namespace either
* support poll queues or not. That is not a strict guarantee,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4d908e29c760da..59c2327692589b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -304,6 +304,9 @@ enum {
/* always completes in submit context */
BLK_FEAT_SYNCHRONOUS = (1u << 6),
+
+ /* supports REQ_NOWAIT */
+ BLK_FEAT_NOWAIT = (1u << 7),
};
/*
@@ -580,12 +583,10 @@ struct request_queue {
#define QUEUE_FLAG_ZONE_RESETALL 26 /* supports Zone Reset All */
#define QUEUE_FLAG_RQ_ALLOC_TIME 27 /* record rq->alloc_time_ns */
#define QUEUE_FLAG_HCTX_ACTIVE 28 /* at least one blk-mq hctx is active */
-#define QUEUE_FLAG_NOWAIT 29 /* device supports NOWAIT */
#define QUEUE_FLAG_SQ_SCHED 30 /* single queue style io dispatch */
#define QUEUE_FLAG_SKIP_TAGSET_QUIESCE 31 /* quiesce_tagset skip the queue*/
-#define QUEUE_FLAG_MQ_DEFAULT ((1UL << QUEUE_FLAG_SAME_COMP) | \
- (1UL << QUEUE_FLAG_NOWAIT))
+#define QUEUE_FLAG_MQ_DEFAULT (1UL << QUEUE_FLAG_SAME_COMP)
void blk_queue_flag_set(unsigned int flag, struct request_queue *q);
void blk_queue_flag_clear(unsigned int flag, struct request_queue *q);
@@ -1349,7 +1350,7 @@ static inline bool bdev_fua(struct block_device *bdev)
static inline bool bdev_nowait(struct block_device *bdev)
{
- return test_bit(QUEUE_FLAG_NOWAIT, &bdev_get_queue(bdev)->queue_flags);
+ return bdev->bd_disk->queue->limits.features & BLK_FEAT_NOWAIT;
}
static inline bool bdev_is_zoned(struct block_device *bdev)
--
2.43.0
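[Editorial note: the nowait stacking rule in this commit — the stacking driver opts in up front and each underlying device that lacks the feature vetoes it — can be modeled as a fold over the stack. A hedged sketch mirroring the blk_stack_limits() hunk above; the flag value and helper names are illustrative.]

```c
#include <assert.h>

enum { BLK_FEAT_NOWAIT = (1u << 7) };	/* illustrative value */

/* Model of the new blk_stack_limits() rule: if a bottom device lacks
 * BLK_FEAT_NOWAIT, the bit is cleared on the stacked (top) device. */
static unsigned int stack_nowait(unsigned int top, unsigned int bottom)
{
	if (!(bottom & BLK_FEAT_NOWAIT))
		top &= ~BLK_FEAT_NOWAIT;
	return top;
}

/* Fold over a whole stack: the top keeps nowait only if every member
 * supports it, matching the md/dm behavior this patch consolidates. */
static unsigned int stack_all(unsigned int top, const unsigned int *bottoms,
			      int n)
{
	for (int i = 0; i < n; i++)
		top = stack_nowait(top, bottoms[i]);
	return top;
}
```

This replaces the hand-rolled checks removed from md (bdev_nowait() on each rdev) and dm (device_not_nowait_capable) with one shared rule.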
* Re: [PATCH 19/26] block: move the nowait flag to queue_limits
2024-06-11 5:19 ` [PATCH 19/26] block: move the nowait " Christoph Hellwig
@ 2024-06-11 8:16 ` Damien Le Moal
2024-06-12 5:01 ` Christoph Hellwig
0 siblings, 1 reply; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 8:16 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Move the nowait flag into the queue_limits feature field so that it
> can be set atomically and all I/O is frozen when changing the flag.
>
> Stacking drivers are simplified in that they now can simply set the
> flag, and blk_stack_limits will clear it when the features is not
> supported by any of the underlying devices.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> @@ -1825,9 +1815,7 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
> int r;
>
> if (dm_table_supports_nowait(t))
> - blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
> - else
> - blk_queue_flag_clear(QUEUE_FLAG_NOWAIT, q);
> + limits->features &= ~BLK_FEAT_NOWAIT;
Shouldn't you set the flag here instead of clearing it ?
--
Damien Le Moal
Western Digital Research
* Re: [PATCH 19/26] block: move the nowait flag to queue_limits
2024-06-11 8:16 ` Damien Le Moal
@ 2024-06-12 5:01 ` Christoph Hellwig
0 siblings, 0 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-12 5:01 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Jens Axboe, Geert Uytterhoeven,
Richard Weinberger, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On Tue, Jun 11, 2024 at 05:16:37PM +0900, Damien Le Moal wrote:
> > @@ -1825,9 +1815,7 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
> > int r;
> >
> > if (dm_table_supports_nowait(t))
> > - blk_queue_flag_set(QUEUE_FLAG_NOWAIT, q);
> > - else
> > - blk_queue_flag_clear(QUEUE_FLAG_NOWAIT, q);
> > + limits->features &= ~BLK_FEAT_NOWAIT;
>
> Shouldn't you set the flag here instead of clearing it ?
No, but the dm_table_supports_nowait check needs to be inverted.
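[Editorial note: Christoph's answer implies a follow-up change to the dm hunk — keep the clear, but invert the condition so BLK_FEAT_NOWAIT is removed only when the table does not support nowait. A sketch of that corrected logic, modeled as a pure function; this is the implied v2 shape, not a posted patch, and the flag value is illustrative.]

```c
#include <assert.h>

enum { BLK_FEAT_NOWAIT = (1u << 7) };	/* illustrative value */

/* Corrected dm_table_set_restrictions() logic: dm opts in to nowait by
 * default (dm_set_stacking_limits sets the bit) and clears it here only
 * when some target type cannot support it. */
static unsigned int dm_apply_nowait(unsigned int features,
				    int table_supports_nowait)
{
	if (!table_supports_nowait)
		features &= ~BLK_FEAT_NOWAIT;
	return features;
}
```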
* [PATCH 20/26] block: move the dax flag to queue_limits
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (18 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 19/26] block: move the nowait " Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 8:17 ` Damien Le Moal
2024-06-11 5:19 ` [PATCH 21/26] block: move the poll " Christoph Hellwig
` (5 subsequent siblings)
25 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Move the dax flag into the queue_limits feature field so that it
can be set atomically and all I/O is frozen when changing the flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-mq-debugfs.c | 1 -
drivers/md/dm-table.c | 4 ++--
drivers/nvdimm/pmem.c | 7 ++-----
drivers/s390/block/dcssblk.c | 2 +-
include/linux/blkdev.h | 6 ++++--
5 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 62b132e9a9ce3b..f4fa820251ce83 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -88,7 +88,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(SAME_FORCE),
QUEUE_FLAG_NAME(INIT_DONE),
QUEUE_FLAG_NAME(POLL),
- QUEUE_FLAG_NAME(DAX),
QUEUE_FLAG_NAME(STATS),
QUEUE_FLAG_NAME(REGISTERED),
QUEUE_FLAG_NAME(QUIESCED),
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index eee43d27733f9a..d3a960aee03c6a 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1834,11 +1834,11 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
limits->features |= BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA;
if (dm_table_supports_dax(t, device_not_dax_capable)) {
- blk_queue_flag_set(QUEUE_FLAG_DAX, q);
+ limits->features |= BLK_FEAT_DAX;
if (dm_table_supports_dax(t, device_not_dax_synchronous_capable))
set_dax_synchronous(t->md->dax_dev);
} else
- blk_queue_flag_clear(QUEUE_FLAG_DAX, q);
+ limits->features &= ~BLK_FEAT_DAX;
if (dm_table_any_dev_attr(t, device_dax_write_cache_enabled, NULL))
dax_write_cache(t->md->dax_dev, true);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index b821dcf018f6ae..1dd74c969d5a09 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -465,7 +465,6 @@ static int pmem_attach_disk(struct device *dev,
struct dax_device *dax_dev;
struct nd_pfn_sb *pfn_sb;
struct pmem_device *pmem;
- struct request_queue *q;
struct gendisk *disk;
void *addr;
int rc;
@@ -499,6 +498,8 @@ static int pmem_attach_disk(struct device *dev,
}
if (fua)
lim.features |= BLK_FEAT_FUA;
+ if (is_nd_pfn(dev))
+ lim.features |= BLK_FEAT_DAX;
if (!devm_request_mem_region(dev, res->start, resource_size(res),
dev_name(&ndns->dev))) {
@@ -509,7 +510,6 @@ static int pmem_attach_disk(struct device *dev,
disk = blk_alloc_disk(&lim, nid);
if (IS_ERR(disk))
return PTR_ERR(disk);
- q = disk->queue;
pmem->disk = disk;
pmem->pgmap.owner = pmem;
@@ -547,9 +547,6 @@ static int pmem_attach_disk(struct device *dev,
}
pmem->virt_addr = addr;
- if (pmem->pfn_flags & PFN_MAP)
- blk_queue_flag_set(QUEUE_FLAG_DAX, q);
-
disk->fops = &pmem_fops;
disk->private_data = pmem;
nvdimm_namespace_disk_name(ndns, disk->disk_name);
diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c
index 6d1689a2717e5f..d5a5d11ae0dcdf 100644
--- a/drivers/s390/block/dcssblk.c
+++ b/drivers/s390/block/dcssblk.c
@@ -548,6 +548,7 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
{
struct queue_limits lim = {
.logical_block_size = 4096,
+ .features = BLK_FEAT_DAX,
};
int rc, i, j, num_of_segments;
struct dcssblk_dev_info *dev_info;
@@ -643,7 +644,6 @@ dcssblk_add_store(struct device *dev, struct device_attribute *attr, const char
dev_info->gd->fops = &dcssblk_devops;
dev_info->gd->private_data = dev_info;
dev_info->gd->flags |= GENHD_FL_NO_PART;
- blk_queue_flag_set(QUEUE_FLAG_DAX, dev_info->gd->queue);
seg_byte_size = (dev_info->end - dev_info->start + 1);
set_capacity(dev_info->gd, seg_byte_size >> 9); // size in sectors
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 59c2327692589b..c2545580c5b134 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -307,6 +307,9 @@ enum {
/* supports REQ_NOWAIT */
BLK_FEAT_NOWAIT = (1u << 7),
+
+ /* supports DAX */
+ BLK_FEAT_DAX = (1u << 8),
};
/*
@@ -575,7 +578,6 @@ struct request_queue {
#define QUEUE_FLAG_SAME_FORCE 12 /* force complete on same CPU */
#define QUEUE_FLAG_INIT_DONE 14 /* queue is initialized */
#define QUEUE_FLAG_POLL 16 /* IO polling enabled if set */
-#define QUEUE_FLAG_DAX 19 /* device supports DAX */
#define QUEUE_FLAG_STATS 20 /* track IO start and completion times */
#define QUEUE_FLAG_REGISTERED 22 /* queue has been registered to a disk */
#define QUEUE_FLAG_QUIESCED 24 /* queue has been quiesced */
@@ -602,7 +604,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
#define blk_queue_io_stat(q) ((q)->limits.features & BLK_FEAT_IO_STAT)
#define blk_queue_zone_resetall(q) \
test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
-#define blk_queue_dax(q) test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags)
+#define blk_queue_dax(q) ((q)->limits.features & BLK_FEAT_DAX)
#define blk_queue_pci_p2pdma(q) \
test_bit(QUEUE_FLAG_PCI_P2PDMA, &(q)->queue_flags)
#ifdef CONFIG_BLK_RQ_ALLOC_TIME
--
2.43.0
* Re: [PATCH 20/26] block: move the dax flag to queue_limits
2024-06-11 5:19 ` [PATCH 20/26] block: move the dax " Christoph Hellwig
@ 2024-06-11 8:17 ` Damien Le Moal
0 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 8:17 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Move the dax flag into the queue_limits feature field so that it
> can be set atomically and all I/O is frozen when changing the flag.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good.
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 104+ messages in thread
* [PATCH 21/26] block: move the poll flag to queue_limits
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (19 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 20/26] block: move the dax " Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 8:21 ` Damien Le Moal
2024-06-11 5:19 ` [PATCH 22/26] block: move the zoned flag into the feature field Christoph Hellwig
` (4 subsequent siblings)
25 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Move the poll flag into the queue_limits feature field so that it
can be set atomically and all I/O is frozen when changing the flag.
Stacking drivers are simplified in that they can now simply set the
flag, and blk_stack_limits will clear it when any of the underlying
devices does not support the feature.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-core.c | 5 ++--
block/blk-mq-debugfs.c | 1 -
block/blk-mq.c | 31 +++++++++++---------
block/blk-settings.c | 10 ++++---
block/blk-sysfs.c | 4 +--
drivers/md/dm-table.c | 54 +++++++++--------------------------
drivers/nvme/host/multipath.c | 12 +-------
include/linux/blkdev.h | 4 ++-
8 files changed, 45 insertions(+), 76 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 2b45a4df9a1aa1..8d9fbd353fc7fc 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -791,7 +791,7 @@ void submit_bio_noacct(struct bio *bio)
}
}
- if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
+ if (!(q->limits.features & BLK_FEAT_POLL))
bio_clear_polled(bio);
switch (bio_op(bio)) {
@@ -915,8 +915,7 @@ int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
return 0;
q = bdev_get_queue(bdev);
- if (cookie == BLK_QC_T_NONE ||
- !test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
+ if (cookie == BLK_QC_T_NONE || !(q->limits.features & BLK_FEAT_POLL))
return 0;
blk_flush_plug(current->plug, false);
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index f4fa820251ce83..3a21527913840d 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -87,7 +87,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(NOXMERGES),
QUEUE_FLAG_NAME(SAME_FORCE),
QUEUE_FLAG_NAME(INIT_DONE),
- QUEUE_FLAG_NAME(POLL),
QUEUE_FLAG_NAME(STATS),
QUEUE_FLAG_NAME(REGISTERED),
QUEUE_FLAG_NAME(QUIESCED),
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 43235acc87505f..e2b9710ddc5ad1 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4109,6 +4109,12 @@ void blk_mq_release(struct request_queue *q)
blk_mq_sysfs_deinit(q);
}
+static bool blk_mq_can_poll(struct blk_mq_tag_set *set)
+{
+ return set->nr_maps > HCTX_TYPE_POLL &&
+ set->map[HCTX_TYPE_POLL].nr_queues;
+}
+
struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set,
struct queue_limits *lim, void *queuedata)
{
@@ -4119,6 +4125,8 @@ struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set,
if (!lim)
lim = &default_lim;
lim->features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT;
+ if (blk_mq_can_poll(set))
+ lim->features |= BLK_FEAT_POLL;
q = blk_alloc_queue(lim, set->numa_node);
if (IS_ERR(q))
@@ -4273,17 +4281,6 @@ static void blk_mq_realloc_hw_ctxs(struct blk_mq_tag_set *set,
mutex_unlock(&q->sysfs_lock);
}
-static void blk_mq_update_poll_flag(struct request_queue *q)
-{
- struct blk_mq_tag_set *set = q->tag_set;
-
- if (set->nr_maps > HCTX_TYPE_POLL &&
- set->map[HCTX_TYPE_POLL].nr_queues)
- blk_queue_flag_set(QUEUE_FLAG_POLL, q);
- else
- blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
-}
-
int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
struct request_queue *q)
{
@@ -4311,7 +4308,6 @@ int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
q->tag_set = set;
q->queue_flags |= QUEUE_FLAG_MQ_DEFAULT;
- blk_mq_update_poll_flag(q);
INIT_DELAYED_WORK(&q->requeue_work, blk_mq_requeue_work);
INIT_LIST_HEAD(&q->flush_list);
@@ -4798,8 +4794,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
fallback:
blk_mq_update_queue_map(set);
list_for_each_entry(q, &set->tag_list, tag_set_list) {
+ struct queue_limits lim;
+
blk_mq_realloc_hw_ctxs(set, q);
- blk_mq_update_poll_flag(q);
+
if (q->nr_hw_queues != set->nr_hw_queues) {
int i = prev_nr_hw_queues;
@@ -4811,6 +4809,13 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
set->nr_hw_queues = prev_nr_hw_queues;
goto fallback;
}
+ lim = queue_limits_start_update(q);
+ if (blk_mq_can_poll(set))
+ lim.features |= BLK_FEAT_POLL;
+ else
+ lim.features &= ~BLK_FEAT_POLL;
+ if (queue_limits_commit_update(q, &lim) < 0)
+ pr_warn("updating the poll flag failed\n");
blk_mq_map_swqueue(q);
}
diff --git a/block/blk-settings.c b/block/blk-settings.c
index bf4622c19b5c09..026ba68d829856 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -460,13 +460,15 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
t->features |= (b->features & BLK_FEAT_INHERIT_MASK);
/*
- * BLK_FEAT_NOWAIT needs to be supported both by the stacking driver
- * and all underlying devices. The stacking driver sets the flag
- * before stacking the limits, and this will clear the flag if any
- * of the underlying devices does not support it.
+ * BLK_FEAT_NOWAIT and BLK_FEAT_POLL need to be supported both by the
+ * stacking driver and all underlying devices. The stacking driver sets
+ * the flags before stacking the limits, and this will clear the flags
+ * if any of the underlying devices does not support them.
*/
if (!(b->features & BLK_FEAT_NOWAIT))
t->features &= ~BLK_FEAT_NOWAIT;
+ if (!(b->features & BLK_FEAT_POLL))
+ t->features &= ~BLK_FEAT_POLL;
t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
t->max_user_sectors = min_not_zero(t->max_user_sectors,
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index cde525724831ef..da4e96d686f91e 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -394,13 +394,13 @@ static ssize_t queue_poll_delay_store(struct request_queue *q, const char *page,
static ssize_t queue_poll_show(struct request_queue *q, char *page)
{
- return queue_var_show(test_bit(QUEUE_FLAG_POLL, &q->queue_flags), page);
+ return queue_var_show(!!(q->limits.features & BLK_FEAT_POLL), page);
}
static ssize_t queue_poll_store(struct request_queue *q, const char *page,
size_t count)
{
- if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags))
+ if (!(q->limits.features & BLK_FEAT_POLL))
return -EINVAL;
pr_info_ratelimited("writes to the poll attribute are ignored.\n");
pr_info_ratelimited("please use driver specific parameters instead.\n");
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index d3a960aee03c6a..653c253b6f7f32 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -582,7 +582,7 @@ int dm_split_args(int *argc, char ***argvp, char *input)
static void dm_set_stacking_limits(struct queue_limits *limits)
{
blk_set_stacking_limits(limits);
- limits->features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT;
+ limits->features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT | BLK_FEAT_POLL;
}
/*
@@ -1024,14 +1024,13 @@ bool dm_table_request_based(struct dm_table *t)
return __table_type_request_based(dm_table_get_type(t));
}
-static bool dm_table_supports_poll(struct dm_table *t);
-
static int dm_table_alloc_md_mempools(struct dm_table *t, struct mapped_device *md)
{
enum dm_queue_mode type = dm_table_get_type(t);
unsigned int per_io_data_size = 0, front_pad, io_front_pad;
unsigned int min_pool_size = 0, pool_size;
struct dm_md_mempools *pools;
+ unsigned int bioset_flags = 0;
if (unlikely(type == DM_TYPE_NONE)) {
DMERR("no table type is set, can't allocate mempools");
@@ -1048,6 +1047,9 @@ static int dm_table_alloc_md_mempools(struct dm_table *t, struct mapped_device *
goto init_bs;
}
+ if (md->queue->limits.features & BLK_FEAT_POLL)
+ bioset_flags |= BIOSET_PERCPU_CACHE;
+
for (unsigned int i = 0; i < t->num_targets; i++) {
struct dm_target *ti = dm_table_get_target(t, i);
@@ -1060,8 +1062,7 @@ static int dm_table_alloc_md_mempools(struct dm_table *t, struct mapped_device *
io_front_pad = roundup(per_io_data_size,
__alignof__(struct dm_io)) + DM_IO_BIO_OFFSET;
- if (bioset_init(&pools->io_bs, pool_size, io_front_pad,
- dm_table_supports_poll(t) ? BIOSET_PERCPU_CACHE : 0))
+ if (bioset_init(&pools->io_bs, pool_size, io_front_pad, bioset_flags))
goto out_free_pools;
if (t->integrity_supported &&
bioset_integrity_create(&pools->io_bs, pool_size))
@@ -1404,14 +1405,6 @@ struct dm_target *dm_table_find_target(struct dm_table *t, sector_t sector)
return &t->targets[(KEYS_PER_NODE * n) + k];
}
-static int device_not_poll_capable(struct dm_target *ti, struct dm_dev *dev,
- sector_t start, sector_t len, void *data)
-{
- struct request_queue *q = bdev_get_queue(dev->bdev);
-
- return !test_bit(QUEUE_FLAG_POLL, &q->queue_flags);
-}
-
/*
* type->iterate_devices() should be called when the sanity check needs to
* iterate and check all underlying data devices. iterate_devices() will
@@ -1459,19 +1452,6 @@ static int count_device(struct dm_target *ti, struct dm_dev *dev,
return 0;
}
-static bool dm_table_supports_poll(struct dm_table *t)
-{
- for (unsigned int i = 0; i < t->num_targets; i++) {
- struct dm_target *ti = dm_table_get_target(t, i);
-
- if (!ti->type->iterate_devices ||
- ti->type->iterate_devices(ti, device_not_poll_capable, NULL))
- return false;
- }
-
- return true;
-}
-
/*
* Check whether a table has no data devices attached using each
* target's iterate_devices method.
@@ -1817,6 +1797,13 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
if (dm_table_supports_nowait(t))
limits->features &= ~BLK_FEAT_NOWAIT;
+ /*
+ * The current polling implementation does not support request based
+ * stacking.
+ */
+ if (!__table_type_bio_based(t->type))
+ limits->features &= ~BLK_FEAT_POLL;
+
if (!dm_table_supports_discards(t)) {
limits->max_hw_discard_sectors = 0;
limits->discard_granularity = 0;
@@ -1858,21 +1845,6 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
return r;
dm_update_crypto_profile(q, t);
-
- /*
- * Check for request-based device is left to
- * dm_mq_init_request_queue()->blk_mq_init_allocated_queue().
- *
- * For bio-based device, only set QUEUE_FLAG_POLL when all
- * underlying devices supporting polling.
- */
- if (__table_type_bio_based(t->type)) {
- if (dm_table_supports_poll(t))
- blk_queue_flag_set(QUEUE_FLAG_POLL, q);
- else
- blk_queue_flag_clear(QUEUE_FLAG_POLL, q);
- }
-
return 0;
}
diff --git a/drivers/nvme/host/multipath.c b/drivers/nvme/host/multipath.c
index 61a162c9cf4e6c..4933194d00e592 100644
--- a/drivers/nvme/host/multipath.c
+++ b/drivers/nvme/host/multipath.c
@@ -538,7 +538,7 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
blk_set_stacking_limits(&lim);
lim.dma_alignment = 3;
- lim.features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT;
+ lim.features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT | BLK_FEAT_POLL;
if (head->ids.csi != NVME_CSI_ZNS)
lim.max_zone_append_sectors = 0;
@@ -549,16 +549,6 @@ int nvme_mpath_alloc_disk(struct nvme_ctrl *ctrl, struct nvme_ns_head *head)
head->disk->private_data = head;
sprintf(head->disk->disk_name, "nvme%dn%d",
ctrl->subsys->instance, head->instance);
-
- /*
- * This assumes all controllers that refer to a namespace either
- * support poll queues or not. That is not a strict guarantee,
- * but if the assumption is wrong the effect is only suboptimal
- * performance but not correctness problem.
- */
- if (ctrl->tagset->nr_maps > HCTX_TYPE_POLL &&
- ctrl->tagset->map[HCTX_TYPE_POLL].nr_queues)
- blk_queue_flag_set(QUEUE_FLAG_POLL, head->disk->queue);
return 0;
}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c2545580c5b134..d0db354b12db47 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -310,6 +310,9 @@ enum {
/* supports DAX */
BLK_FEAT_DAX = (1u << 8),
+
+ /* supports I/O polling */
+ BLK_FEAT_POLL = (1u << 9),
};
/*
@@ -577,7 +580,6 @@ struct request_queue {
#define QUEUE_FLAG_NOXMERGES 9 /* No extended merges */
#define QUEUE_FLAG_SAME_FORCE 12 /* force complete on same CPU */
#define QUEUE_FLAG_INIT_DONE 14 /* queue is initialized */
-#define QUEUE_FLAG_POLL 16 /* IO polling enabled if set */
#define QUEUE_FLAG_STATS 20 /* track IO start and completion times */
#define QUEUE_FLAG_REGISTERED 22 /* queue has been registered to a disk */
#define QUEUE_FLAG_QUIESCED 24 /* queue has been quiesced */
--
2.43.0
^ permalink raw reply related [flat|nested] 104+ messages in thread
* Re: [PATCH 21/26] block: move the poll flag to queue_limits
2024-06-11 5:19 ` [PATCH 21/26] block: move the poll " Christoph Hellwig
@ 2024-06-11 8:21 ` Damien Le Moal
2024-06-12 5:03 ` Christoph Hellwig
0 siblings, 1 reply; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 8:21 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Move the poll flag into the queue_limits feature field so that it
> can be set atomically and all I/O is frozen when changing the flag.
>
> Stacking drivers are simplified in that they now can simply set the
> flag, and blk_stack_limits will clear it when the feature is not
> supported by any of the underlying devices.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Kind of the same remark as for io_stat about this not really being a device
feature. But I guess seeing "features" as a queue feature rather than just a
device feature makes it OK to have poll (and io_stat) as a feature rather than
a flag.
So:
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 104+ messages in thread
* Re: [PATCH 21/26] block: move the poll flag to queue_limits
2024-06-11 8:21 ` Damien Le Moal
@ 2024-06-12 5:03 ` Christoph Hellwig
0 siblings, 0 replies; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-12 5:03 UTC (permalink / raw)
To: Damien Le Moal
Cc: Christoph Hellwig, Jens Axboe, Geert Uytterhoeven,
Richard Weinberger, Philipp Reisner, Lars Ellenberg,
Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On Tue, Jun 11, 2024 at 05:21:07PM +0900, Damien Le Moal wrote:
> Kind of the same remark as for io_stat about this not really being a device
> feature. But I guess seeing "features" as a queue feature rather than just a
> device feature makes it OK to have poll (and io_stat) as a feature rather than
> a flag.
So unlike io_stat this very much is a feature and a feature only as
we don't even allow changing it. It purely exposes a device (or
rather driver) capability.
^ permalink raw reply [flat|nested] 104+ messages in thread
* [PATCH 22/26] block: move the zoned flag into the feature field
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (20 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 21/26] block: move the poll " Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 8:23 ` Damien Le Moal
2024-06-11 5:19 ` [PATCH 23/26] block: move the zone_resetall flag to queue_limits Christoph Hellwig
` (3 subsequent siblings)
25 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Move the boolean zoned field into the flags field to reclaim a little
bit of space.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-settings.c | 5 ++---
drivers/block/null_blk/zoned.c | 2 +-
drivers/block/ublk_drv.c | 2 +-
drivers/block/virtio_blk.c | 5 +++--
drivers/md/dm-table.c | 11 ++++++-----
drivers/md/dm-zone.c | 2 +-
drivers/md/dm-zoned-target.c | 2 +-
drivers/nvme/host/zns.c | 2 +-
drivers/scsi/sd_zbc.c | 4 ++--
include/linux/blkdev.h | 9 ++++++---
10 files changed, 24 insertions(+), 20 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 026ba68d829856..96e07f24bd9aa1 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -68,7 +68,7 @@ static void blk_apply_bdi_limits(struct backing_dev_info *bdi,
static int blk_validate_zoned_limits(struct queue_limits *lim)
{
- if (!lim->zoned) {
+ if (!(lim->features & BLK_FEAT_ZONED)) {
if (WARN_ON_ONCE(lim->max_open_zones) ||
WARN_ON_ONCE(lim->max_active_zones) ||
WARN_ON_ONCE(lim->zone_write_granularity) ||
@@ -602,8 +602,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
b->max_secure_erase_sectors);
t->zone_write_granularity = max(t->zone_write_granularity,
b->zone_write_granularity);
- t->zoned = max(t->zoned, b->zoned);
- if (!t->zoned) {
+ if (!(t->features & BLK_FEAT_ZONED)) {
t->zone_write_granularity = 0;
t->max_zone_append_sectors = 0;
}
diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
index f118d304f31080..ca8e739e76b981 100644
--- a/drivers/block/null_blk/zoned.c
+++ b/drivers/block/null_blk/zoned.c
@@ -158,7 +158,7 @@ int null_init_zoned_dev(struct nullb_device *dev,
sector += dev->zone_size_sects;
}
- lim->zoned = true;
+ lim->features |= BLK_FEAT_ZONED;
lim->chunk_sectors = dev->zone_size_sects;
lim->max_zone_append_sectors = dev->zone_append_max_sectors;
lim->max_open_zones = dev->zone_max_open;
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 4fcde099935868..69c16018cbb19a 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -2196,7 +2196,7 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED))
return -EOPNOTSUPP;
- lim.zoned = true;
+ lim.features |= BLK_FEAT_ZONED;
lim.max_active_zones = p->max_active_zones;
lim.max_open_zones = p->max_open_zones;
lim.max_zone_append_sectors = p->max_zone_append_sectors;
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 13a2f24f176628..cea45b296f8bec 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -728,7 +728,7 @@ static int virtblk_read_zoned_limits(struct virtio_blk *vblk,
dev_dbg(&vdev->dev, "probing host-managed zoned device\n");
- lim->zoned = true;
+ lim->features |= BLK_FEAT_ZONED;
virtio_cread(vdev, struct virtio_blk_config,
zoned.max_open_zones, &v);
@@ -1546,7 +1546,8 @@ static int virtblk_probe(struct virtio_device *vdev)
* All steps that follow use the VQs therefore they need to be
* placed after the virtio_device_ready() call above.
*/
- if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && lim.zoned) {
+ if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
+ (lim.features & BLK_FEAT_ZONED)) {
blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, vblk->disk->queue);
err = blk_revalidate_disk_zones(vblk->disk);
if (err)
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 653c253b6f7f32..48ccd9a396d8e6 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1605,12 +1605,12 @@ int dm_calculate_queue_limits(struct dm_table *t,
ti->type->iterate_devices(ti, dm_set_device_limits,
&ti_limits);
- if (!zoned && ti_limits.zoned) {
+ if (!zoned && (ti_limits.features & BLK_FEAT_ZONED)) {
/*
* After stacking all limits, validate all devices
* in table support this zoned model and zone sectors.
*/
- zoned = ti_limits.zoned;
+ zoned = (ti_limits.features & BLK_FEAT_ZONED);
zone_sectors = ti_limits.chunk_sectors;
}
@@ -1658,12 +1658,12 @@ int dm_calculate_queue_limits(struct dm_table *t,
* zoned model on host-managed zoned block devices.
* BUT...
*/
- if (limits->zoned) {
+ if (limits->features & BLK_FEAT_ZONED) {
/*
* ...IF the above limits stacking determined a zoned model
* validate that all of the table's devices conform to it.
*/
- zoned = limits->zoned;
+ zoned = limits->features & BLK_FEAT_ZONED;
zone_sectors = limits->chunk_sectors;
}
if (validate_hardware_zoned(t, zoned, zone_sectors))
@@ -1834,7 +1834,8 @@ int dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
* For a zoned target, setup the zones related queue attributes
* and resources necessary for zone append emulation if necessary.
*/
- if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) && limits->zoned) {
+ if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
+ (limits->features & BLK_FEAT_ZONED)) {
r = dm_set_zones_restrictions(t, q, limits);
if (r)
return r;
diff --git a/drivers/md/dm-zone.c b/drivers/md/dm-zone.c
index 5d66d916730efa..88d313229b43ff 100644
--- a/drivers/md/dm-zone.c
+++ b/drivers/md/dm-zone.c
@@ -263,7 +263,7 @@ int dm_set_zones_restrictions(struct dm_table *t, struct request_queue *q,
if (nr_conv_zones >= ret) {
lim->max_open_zones = 0;
lim->max_active_zones = 0;
- lim->zoned = false;
+ lim->features &= ~BLK_FEAT_ZONED;
clear_bit(DMF_EMULATE_ZONE_APPEND, &md->flags);
disk->nr_zones = 0;
return 0;
diff --git a/drivers/md/dm-zoned-target.c b/drivers/md/dm-zoned-target.c
index 12236e6f46f39c..cd0ee144973f9f 100644
--- a/drivers/md/dm-zoned-target.c
+++ b/drivers/md/dm-zoned-target.c
@@ -1009,7 +1009,7 @@ static void dmz_io_hints(struct dm_target *ti, struct queue_limits *limits)
limits->max_sectors = chunk_sectors;
/* We are exposing a drive-managed zoned block device */
- limits->zoned = false;
+ limits->features &= ~BLK_FEAT_ZONED;
}
/*
diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
index 77aa0f440a6d2a..06f2417aa50de7 100644
--- a/drivers/nvme/host/zns.c
+++ b/drivers/nvme/host/zns.c
@@ -108,7 +108,7 @@ int nvme_query_zone_info(struct nvme_ns *ns, unsigned lbaf,
void nvme_update_zone_info(struct nvme_ns *ns, struct queue_limits *lim,
struct nvme_zone_info *zi)
{
- lim->zoned = 1;
+ lim->features |= BLK_FEAT_ZONED;
lim->max_open_zones = zi->max_open_zones;
lim->max_active_zones = zi->max_active_zones;
lim->max_zone_append_sectors = ns->ctrl->max_zone_append;
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index e9501db0450be3..26b6e92350cda9 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -599,11 +599,11 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, struct queue_limits *lim,
int ret;
if (!sd_is_zoned(sdkp)) {
- lim->zoned = false;
+ lim->features &= ~BLK_FEAT_ZONED;
return 0;
}
- lim->zoned = true;
+ lim->features |= BLK_FEAT_ZONED;
/*
* Per ZBC and ZAC specifications, writes in sequential write required
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d0db354b12db47..c0e06ff1b24a3d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -313,6 +313,9 @@ enum {
/* supports I/O polling */
BLK_FEAT_POLL = (1u << 9),
+
+ /* is a zoned device */
+ BLK_FEAT_ZONED = (1u << 10),
};
/*
@@ -320,7 +323,7 @@ enum {
*/
#define BLK_FEAT_INHERIT_MASK \
(BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA | BLK_FEAT_ROTATIONAL | \
- BLK_FEAT_STABLE_WRITES)
+ BLK_FEAT_STABLE_WRITES | BLK_FEAT_ZONED)
/* internal flags in queue_limits.flags */
enum {
@@ -372,7 +375,6 @@ struct queue_limits {
unsigned char misaligned;
unsigned char discard_misaligned;
unsigned char raid_partial_stripes_expensive;
- bool zoned;
unsigned int max_open_zones;
unsigned int max_active_zones;
@@ -654,7 +656,8 @@ static inline enum rpm_status queue_rpm_status(struct request_queue *q)
static inline bool blk_queue_is_zoned(struct request_queue *q)
{
- return IS_ENABLED(CONFIG_BLK_DEV_ZONED) && q->limits.zoned;
+ return IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
+ (q->limits.features & BLK_FEAT_ZONED);
}
#ifdef CONFIG_BLK_DEV_ZONED
--
2.43.0
^ permalink raw reply related [flat|nested] 104+ messages in thread
* Re: [PATCH 22/26] block: move the zoned flag into the feature field
2024-06-11 5:19 ` [PATCH 22/26] block: move the zoned flag into the feature field Christoph Hellwig
@ 2024-06-11 8:23 ` Damien Le Moal
0 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 8:23 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Move the boolean zoned field into the flags field to reclaim a little
> bit of space.
Nit: flags -> feature flags
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 104+ messages in thread
* [PATCH 23/26] block: move the zone_resetall flag to queue_limits
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (21 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 22/26] block: move the zoned flag into the feature field Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 8:24 ` Damien Le Moal
2024-06-11 5:19 ` [PATCH 24/26] block: move the pci_p2pdma " Christoph Hellwig
` (2 subsequent siblings)
25 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Move the zone_resetall flag into the queue_limits feature field so that
it can be set atomically and all I/O is frozen when changing the flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-mq-debugfs.c | 1 -
drivers/block/null_blk/zoned.c | 3 +--
drivers/block/ublk_drv.c | 4 +---
drivers/block/virtio_blk.c | 3 +--
drivers/nvme/host/zns.c | 3 +--
drivers/scsi/sd_zbc.c | 5 +----
include/linux/blkdev.h | 6 ++++--
7 files changed, 9 insertions(+), 16 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 3a21527913840d..f2fd72f4414ae8 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -91,7 +91,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(REGISTERED),
QUEUE_FLAG_NAME(QUIESCED),
QUEUE_FLAG_NAME(PCI_P2PDMA),
- QUEUE_FLAG_NAME(ZONE_RESETALL),
QUEUE_FLAG_NAME(RQ_ALLOC_TIME),
QUEUE_FLAG_NAME(HCTX_ACTIVE),
QUEUE_FLAG_NAME(SQ_SCHED),
diff --git a/drivers/block/null_blk/zoned.c b/drivers/block/null_blk/zoned.c
index ca8e739e76b981..b42c00f1313254 100644
--- a/drivers/block/null_blk/zoned.c
+++ b/drivers/block/null_blk/zoned.c
@@ -158,7 +158,7 @@ int null_init_zoned_dev(struct nullb_device *dev,
sector += dev->zone_size_sects;
}
- lim->features |= BLK_FEAT_ZONED;
+ lim->features |= BLK_FEAT_ZONED | BLK_FEAT_ZONE_RESETALL;
lim->chunk_sectors = dev->zone_size_sects;
lim->max_zone_append_sectors = dev->zone_append_max_sectors;
lim->max_open_zones = dev->zone_max_open;
@@ -171,7 +171,6 @@ int null_register_zoned_dev(struct nullb *nullb)
struct request_queue *q = nullb->q;
struct gendisk *disk = nullb->disk;
- blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
disk->nr_zones = bdev_nr_zones(disk->part0);
pr_info("%s: using %s zone append\n",
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 69c16018cbb19a..4fdff13fc23b8a 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -248,8 +248,6 @@ static int ublk_dev_param_zoned_validate(const struct ublk_device *ub)
static void ublk_dev_param_zoned_apply(struct ublk_device *ub)
{
- blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, ub->ub_disk->queue);
-
ub->ub_disk->nr_zones = ublk_get_nr_zones(ub);
}
@@ -2196,7 +2194,7 @@ static int ublk_ctrl_start_dev(struct ublk_device *ub, struct io_uring_cmd *cmd)
if (!IS_ENABLED(CONFIG_BLK_DEV_ZONED))
return -EOPNOTSUPP;
- lim.features |= BLK_FEAT_ZONED;
+ lim.features |= BLK_FEAT_ZONED | BLK_FEAT_ZONE_RESETALL;
lim.max_active_zones = p->max_active_zones;
lim.max_open_zones = p->max_open_zones;
lim.max_zone_append_sectors = p->max_zone_append_sectors;
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index cea45b296f8bec..6c64a67ab9c901 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -728,7 +728,7 @@ static int virtblk_read_zoned_limits(struct virtio_blk *vblk,
dev_dbg(&vdev->dev, "probing host-managed zoned device\n");
- lim->features |= BLK_FEAT_ZONED;
+ lim->features |= BLK_FEAT_ZONED | BLK_FEAT_ZONE_RESETALL;
virtio_cread(vdev, struct virtio_blk_config,
zoned.max_open_zones, &v);
@@ -1548,7 +1548,6 @@ static int virtblk_probe(struct virtio_device *vdev)
*/
if (IS_ENABLED(CONFIG_BLK_DEV_ZONED) &&
(lim.features & BLK_FEAT_ZONED)) {
- blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, vblk->disk->queue);
err = blk_revalidate_disk_zones(vblk->disk);
if (err)
goto out_cleanup_disk;
diff --git a/drivers/nvme/host/zns.c b/drivers/nvme/host/zns.c
index 06f2417aa50de7..99bb89c2495ae3 100644
--- a/drivers/nvme/host/zns.c
+++ b/drivers/nvme/host/zns.c
@@ -108,13 +108,12 @@ int nvme_query_zone_info(struct nvme_ns *ns, unsigned lbaf,
void nvme_update_zone_info(struct nvme_ns *ns, struct queue_limits *lim,
struct nvme_zone_info *zi)
{
- lim->features |= BLK_FEAT_ZONED;
+ lim->features |= BLK_FEAT_ZONED | BLK_FEAT_ZONE_RESETALL;
lim->max_open_zones = zi->max_open_zones;
lim->max_active_zones = zi->max_active_zones;
lim->max_zone_append_sectors = ns->ctrl->max_zone_append;
lim->chunk_sectors = ns->head->zsze =
nvme_lba_to_sect(ns->head, zi->zone_size);
- blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, ns->queue);
}
static void *nvme_zns_alloc_report_buffer(struct nvme_ns *ns,
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 26b6e92350cda9..8c79f588f80d8b 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -592,8 +592,6 @@ int sd_zbc_revalidate_zones(struct scsi_disk *sdkp)
int sd_zbc_read_zones(struct scsi_disk *sdkp, struct queue_limits *lim,
u8 buf[SD_BUF_SIZE])
{
- struct gendisk *disk = sdkp->disk;
- struct request_queue *q = disk->queue;
unsigned int nr_zones;
u32 zone_blocks = 0;
int ret;
@@ -603,7 +601,7 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, struct queue_limits *lim,
return 0;
}
- lim->features |= BLK_FEAT_ZONED;
+ lim->features |= BLK_FEAT_ZONED | BLK_FEAT_ZONE_RESETALL;
/*
* Per ZBC and ZAC specifications, writes in sequential write required
@@ -632,7 +630,6 @@ int sd_zbc_read_zones(struct scsi_disk *sdkp, struct queue_limits *lim,
sdkp->early_zone_info.zone_blocks = zone_blocks;
/* The drive satisfies the kernel restrictions: set it up */
- blk_queue_flag_set(QUEUE_FLAG_ZONE_RESETALL, q);
if (sdkp->zones_max_open == U32_MAX)
lim->max_open_zones = 0;
else
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index c0e06ff1b24a3d..ffb7a42871b4ed 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -316,6 +316,9 @@ enum {
/* is a zoned device */
BLK_FEAT_ZONED = (1u << 10),
+
+ /* supports Zone Reset All */
+ BLK_FEAT_ZONE_RESETALL = (1u << 11),
};
/*
@@ -586,7 +589,6 @@ struct request_queue {
#define QUEUE_FLAG_REGISTERED 22 /* queue has been registered to a disk */
#define QUEUE_FLAG_QUIESCED 24 /* queue has been quiesced */
#define QUEUE_FLAG_PCI_P2PDMA 25 /* device supports PCI p2p requests */
-#define QUEUE_FLAG_ZONE_RESETALL 26 /* supports Zone Reset All */
#define QUEUE_FLAG_RQ_ALLOC_TIME 27 /* record rq->alloc_time_ns */
#define QUEUE_FLAG_HCTX_ACTIVE 28 /* at least one blk-mq hctx is active */
#define QUEUE_FLAG_SQ_SCHED 30 /* single queue style io dispatch */
@@ -607,7 +609,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
#define blk_queue_nonrot(q) ((q)->limits.features & BLK_FEAT_ROTATIONAL)
#define blk_queue_io_stat(q) ((q)->limits.features & BLK_FEAT_IO_STAT)
#define blk_queue_zone_resetall(q) \
- test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags)
+ ((q)->limits.features & BLK_FEAT_ZONE_RESETALL)
#define blk_queue_dax(q) ((q)->limits.features & BLK_FEAT_DAX)
#define blk_queue_pci_p2pdma(q) \
test_bit(QUEUE_FLAG_PCI_P2PDMA, &(q)->queue_flags)
--
2.43.0
^ permalink raw reply related [flat|nested] 104+ messages in thread
* Re: [PATCH 23/26] block: move the zone_resetall flag to queue_limits
2024-06-11 5:19 ` [PATCH 23/26] block: move the zone_resetall flag to queue_limits Christoph Hellwig
@ 2024-06-11 8:24 ` Damien Le Moal
0 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 8:24 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Move the zone_resetall flag into the queue_limits feature field so that
> it can be set atomically and all I/O is frozen when changing the flag.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 104+ messages in thread
* [PATCH 24/26] block: move the pci_p2pdma flag to queue_limits
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (22 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 23/26] block: move the zone_resetall flag to queue_limits Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 8:24 ` Damien Le Moal
2024-06-11 5:19 ` [PATCH 25/26] block: move the skip_tagset_quiesce " Christoph Hellwig
2024-06-11 5:19 ` [PATCH 26/26] block: move the bounce flag into the feature field Christoph Hellwig
25 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Move the pci_p2pdma flag into the queue_limits feature field so that it
can be set atomically and all I/O is frozen when changing the flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-mq-debugfs.c | 1 -
drivers/nvme/host/core.c | 8 +++-----
include/linux/blkdev.h | 7 ++++---
3 files changed, 7 insertions(+), 9 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index f2fd72f4414ae8..8b5a68861c119b 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -90,7 +90,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(STATS),
QUEUE_FLAG_NAME(REGISTERED),
QUEUE_FLAG_NAME(QUIESCED),
- QUEUE_FLAG_NAME(PCI_P2PDMA),
QUEUE_FLAG_NAME(RQ_ALLOC_TIME),
QUEUE_FLAG_NAME(HCTX_ACTIVE),
QUEUE_FLAG_NAME(SQ_SCHED),
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5ecf762d7c8837..31e752e8d632cd 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3735,6 +3735,9 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
if (ctrl->opts && ctrl->opts->data_digest)
lim.features |= BLK_FEAT_STABLE_WRITES;
+ if (ctrl->ops->supports_pci_p2pdma &&
+ ctrl->ops->supports_pci_p2pdma(ctrl))
+ lim.features |= BLK_FEAT_PCI_P2PDMA;
disk = blk_mq_alloc_disk(ctrl->tagset, &lim, ns);
if (IS_ERR(disk))
@@ -3744,11 +3747,6 @@ static void nvme_alloc_ns(struct nvme_ctrl *ctrl, struct nvme_ns_info *info)
ns->disk = disk;
ns->queue = disk->queue;
-
- if (ctrl->ops->supports_pci_p2pdma &&
- ctrl->ops->supports_pci_p2pdma(ctrl))
- blk_queue_flag_set(QUEUE_FLAG_PCI_P2PDMA, ns->queue);
-
ns->ctrl = ctrl;
kref_init(&ns->kref);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index ffb7a42871b4ed..cc4f6e64e8e3f5 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -319,6 +319,9 @@ enum {
/* supports Zone Reset All */
BLK_FEAT_ZONE_RESETALL = (1u << 11),
+
+ /* supports PCI(e) p2p requests */
+ BLK_FEAT_PCI_P2PDMA = (1u << 12),
};
/*
@@ -588,7 +591,6 @@ struct request_queue {
#define QUEUE_FLAG_STATS 20 /* track IO start and completion times */
#define QUEUE_FLAG_REGISTERED 22 /* queue has been registered to a disk */
#define QUEUE_FLAG_QUIESCED 24 /* queue has been quiesced */
-#define QUEUE_FLAG_PCI_P2PDMA 25 /* device supports PCI p2p requests */
#define QUEUE_FLAG_RQ_ALLOC_TIME 27 /* record rq->alloc_time_ns */
#define QUEUE_FLAG_HCTX_ACTIVE 28 /* at least one blk-mq hctx is active */
#define QUEUE_FLAG_SQ_SCHED 30 /* single queue style io dispatch */
@@ -611,8 +613,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
#define blk_queue_zone_resetall(q) \
((q)->limits.features & BLK_FEAT_ZONE_RESETALL)
#define blk_queue_dax(q) ((q)->limits.features & BLK_FEAT_DAX)
-#define blk_queue_pci_p2pdma(q) \
- test_bit(QUEUE_FLAG_PCI_P2PDMA, &(q)->queue_flags)
+#define blk_queue_pci_p2pdma(q) ((q)->limits.features & BLK_FEAT_PCI_P2PDMA)
#ifdef CONFIG_BLK_RQ_ALLOC_TIME
#define blk_queue_rq_alloc_time(q) \
test_bit(QUEUE_FLAG_RQ_ALLOC_TIME, &(q)->queue_flags)
--
2.43.0
* Re: [PATCH 24/26] block: move the pci_p2pdma flag to queue_limits
2024-06-11 5:19 ` [PATCH 24/26] block: move the pci_p2pdma " Christoph Hellwig
@ 2024-06-11 8:24 ` Damien Le Moal
0 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 8:24 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Move the pci_p2pdma flag into the queue_limits feature field so that it
> can be set atomically and all I/O is frozen when changing the flag.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* [PATCH 25/26] block: move the skip_tagset_quiesce flag to queue_limits
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (23 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 24/26] block: move the pci_p2pdma " Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 8:25 ` Damien Le Moal
2024-06-11 5:19 ` [PATCH 26/26] block: move the bounce flag into the feature field Christoph Hellwig
25 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Move the skip_tagset_quiesce flag into the queue_limits feature field so
that it can be set atomically and all I/O is frozen when changing the
flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-mq-debugfs.c | 1 -
drivers/nvme/host/core.c | 8 +++++---
include/linux/blkdev.h | 6 ++++--
3 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 8b5a68861c119b..344f9e503bdb32 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -93,7 +93,6 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(RQ_ALLOC_TIME),
QUEUE_FLAG_NAME(HCTX_ACTIVE),
QUEUE_FLAG_NAME(SQ_SCHED),
- QUEUE_FLAG_NAME(SKIP_TAGSET_QUIESCE),
};
#undef QUEUE_FLAG_NAME
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 31e752e8d632cd..bf410d10b12006 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -4489,13 +4489,15 @@ int nvme_alloc_io_tag_set(struct nvme_ctrl *ctrl, struct blk_mq_tag_set *set,
return ret;
if (ctrl->ops->flags & NVME_F_FABRICS) {
- ctrl->connect_q = blk_mq_alloc_queue(set, NULL, NULL);
+ struct queue_limits lim = {
+ .features = BLK_FEAT_SKIP_TAGSET_QUIESCE,
+ };
+
+ ctrl->connect_q = blk_mq_alloc_queue(set, &lim, NULL);
if (IS_ERR(ctrl->connect_q)) {
ret = PTR_ERR(ctrl->connect_q);
goto out_free_tag_set;
}
- blk_queue_flag_set(QUEUE_FLAG_SKIP_TAGSET_QUIESCE,
- ctrl->connect_q);
}
ctrl->tagset = set;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index cc4f6e64e8e3f5..d7ad25def6e50b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -322,6 +322,9 @@ enum {
/* supports PCI(e) p2p requests */
BLK_FEAT_PCI_P2PDMA = (1u << 12),
+
+ /* skip this queue in blk_mq_(un)quiesce_tagset */
+ BLK_FEAT_SKIP_TAGSET_QUIESCE = (1u << 13),
};
/*
@@ -594,7 +597,6 @@ struct request_queue {
#define QUEUE_FLAG_RQ_ALLOC_TIME 27 /* record rq->alloc_time_ns */
#define QUEUE_FLAG_HCTX_ACTIVE 28 /* at least one blk-mq hctx is active */
#define QUEUE_FLAG_SQ_SCHED 30 /* single queue style io dispatch */
-#define QUEUE_FLAG_SKIP_TAGSET_QUIESCE 31 /* quiesce_tagset skip the queue*/
#define QUEUE_FLAG_MQ_DEFAULT (1UL << QUEUE_FLAG_SAME_COMP)
@@ -629,7 +631,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q);
#define blk_queue_registered(q) test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags)
#define blk_queue_sq_sched(q) test_bit(QUEUE_FLAG_SQ_SCHED, &(q)->queue_flags)
#define blk_queue_skip_tagset_quiesce(q) \
- test_bit(QUEUE_FLAG_SKIP_TAGSET_QUIESCE, &(q)->queue_flags)
+ ((q)->limits.features & BLK_FEAT_SKIP_TAGSET_QUIESCE)
extern void blk_set_pm_only(struct request_queue *q);
extern void blk_clear_pm_only(struct request_queue *q);
--
2.43.0
* Re: [PATCH 25/26] block: move the skip_tagset_quiesce flag to queue_limits
2024-06-11 5:19 ` [PATCH 25/26] block: move the skip_tagset_quiesce " Christoph Hellwig
@ 2024-06-11 8:25 ` Damien Le Moal
0 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 8:25 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Move the skip_tagset_quiesce flag into the queue_limits feature field so
> that it can be set atomically and all I/O is frozen when changing the
> flag.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research
* [PATCH 26/26] block: move the bounce flag into the feature field
2024-06-11 5:19 move features flags into queue_limits Christoph Hellwig
` (24 preceding siblings ...)
2024-06-11 5:19 ` [PATCH 25/26] block: move the skip_tagset_quiesce " Christoph Hellwig
@ 2024-06-11 5:19 ` Christoph Hellwig
2024-06-11 8:26 ` Damien Le Moal
25 siblings, 1 reply; 104+ messages in thread
From: Christoph Hellwig @ 2024-06-11 5:19 UTC (permalink / raw)
To: Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
Move the bounce field into the flags field to reclaim a little bit of
space.
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
block/blk-settings.c | 1 -
block/blk.h | 2 +-
drivers/scsi/scsi_lib.c | 2 +-
include/linux/blkdev.h | 6 ++++--
4 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 96e07f24bd9aa1..d0e9096f93ca8a 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -479,7 +479,6 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
b->max_write_zeroes_sectors);
t->max_zone_append_sectors = min(queue_limits_max_zone_append_sectors(t),
queue_limits_max_zone_append_sectors(b));
- t->bounce = max(t->bounce, b->bounce);
t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
b->seg_boundary_mask);
diff --git a/block/blk.h b/block/blk.h
index 79e8d5d4fe0caf..fa32f7fad5d7e6 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -394,7 +394,7 @@ struct bio *__blk_queue_bounce(struct bio *bio, struct request_queue *q);
static inline bool blk_queue_may_bounce(struct request_queue *q)
{
return IS_ENABLED(CONFIG_BOUNCE) &&
- q->limits.bounce == BLK_BOUNCE_HIGH &&
+ (q->limits.features & BLK_FEAT_BOUNCE_HIGH) &&
max_low_pfn >= max_pfn;
}
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 54f771ec8cfb5e..e2f7bfb2b9e450 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1986,7 +1986,7 @@ void scsi_init_limits(struct Scsi_Host *shost, struct queue_limits *lim)
shost->dma_alignment, dma_get_cache_alignment() - 1);
if (shost->no_highmem)
- lim->bounce = BLK_BOUNCE_HIGH;
+ lim->features |= BLK_FEAT_BOUNCE_HIGH;
dma_set_seg_boundary(dev, shost->dma_boundary);
dma_set_max_seg_size(dev, shost->max_segment_size);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d7ad25def6e50b..d1d9787e76ce73 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -325,6 +325,9 @@ enum {
/* skip this queue in blk_mq_(un)quiesce_tagset */
BLK_FEAT_SKIP_TAGSET_QUIESCE = (1u << 13),
+
+ /* bounce all highmem pages */
+ BLK_FEAT_BOUNCE_HIGH = (1u << 14),
};
/*
@@ -332,7 +335,7 @@ enum {
*/
#define BLK_FEAT_INHERIT_MASK \
(BLK_FEAT_WRITE_CACHE | BLK_FEAT_FUA | BLK_FEAT_ROTATIONAL | \
- BLK_FEAT_STABLE_WRITES | BLK_FEAT_ZONED)
+ BLK_FEAT_STABLE_WRITES | BLK_FEAT_ZONED | BLK_FEAT_BOUNCE_HIGH)
/* internal flags in queue_limits.flags */
enum {
@@ -352,7 +355,6 @@ enum blk_bounce {
struct queue_limits {
unsigned int features;
unsigned int flags;
- enum blk_bounce bounce;
unsigned long seg_boundary_mask;
unsigned long virt_boundary_mask;
--
2.43.0
* Re: [PATCH 26/26] block: move the bounce flag into the feature field
2024-06-11 5:19 ` [PATCH 26/26] block: move the bounce flag into the feature field Christoph Hellwig
@ 2024-06-11 8:26 ` Damien Le Moal
0 siblings, 0 replies; 104+ messages in thread
From: Damien Le Moal @ 2024-06-11 8:26 UTC (permalink / raw)
To: Christoph Hellwig, Jens Axboe
Cc: Geert Uytterhoeven, Richard Weinberger, Philipp Reisner,
Lars Ellenberg, Christoph Böhmwalder, Josef Bacik, Ming Lei,
Michael S. Tsirkin, Jason Wang, Roger Pau Monné,
Alasdair Kergon, Mike Snitzer, Mikulas Patocka, Song Liu, Yu Kuai,
Vineeth Vijayan, Martin K. Petersen, linux-m68k, linux-um,
drbd-dev, nbd, linuxppc-dev, ceph-devel, virtualization,
xen-devel, linux-bcache, dm-devel, linux-raid, linux-mmc,
linux-mtd, nvdimm, linux-nvme, linux-s390, linux-scsi,
linux-block
On 6/11/24 2:19 PM, Christoph Hellwig wrote:
> Move the bounce field into the flags field to reclaim a little bit of
s/flags/feature
> space.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
--
Damien Le Moal
Western Digital Research