From: Damien Le Moal <dlemoal@kernel.org>
To: Bart Van Assche <bvanassche@acm.org>, Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org, linux-scsi@vger.kernel.org,
Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v23 01/16] block: Support block devices that preserve the order of write requests
Date: Tue, 12 Aug 2025 11:12:44 +0900
Message-ID: <7570f60f-932b-4b76-a87d-8f3f0760c44f@kernel.org>
In-Reply-To: <20250811200851.626402-2-bvanassche@acm.org>
On 8/12/25 5:08 AM, Bart Van Assche wrote:
> Some storage controllers preserve the request order per hardware queue.
> Some but not all device mapper drivers preserve the bio order. Introduce
> the feature flag BLK_FEAT_ORDERED_HWQ to allow block drivers and stacked
> drivers to indicate that the order of write commands is preserved per
> hardware queue and hence that serialization of writes per zone is not
> required if all pending writes are submitted to the same hardware queue.
> Add a sysfs attribute for controlling write pipelining support.
Why? Why would you want to disable write pipelining, given that it gives
better performance?
The commit message also does not describe BLK_FEAT_PIPELINE_ZWR, but I do not
think this enable/disable flag is needed in the first place.
>
> Cc: Damien Le Moal <dlemoal@kernel.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> ---
> Documentation/ABI/stable/sysfs-block | 15 +++++++++++++++
> block/blk-settings.c | 10 ++++++++++
> block/blk-sysfs.c | 7 +++++++
> include/linux/blkdev.h | 9 +++++++++
> 4 files changed, 41 insertions(+)
>
> diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
> index 803f578dc023..5a42d99cf39a 100644
> --- a/Documentation/ABI/stable/sysfs-block
> +++ b/Documentation/ABI/stable/sysfs-block
> @@ -637,6 +637,21 @@ Description:
> I/O size is reported this file contains 0.
>
>
> +What: /sys/block/<disk>/queue/pipeline_zoned_writes
> +Date: August 2025
> +Contact: Bart Van Assche <bvanassche@acm.org>
> +Description:
> + [RW] If this attribute is present it means that the block driver
> + and the storage controller both support preserving the order of
> + zoned writes per hardware queue. This attribute controls whether
> + or not pipelining zoned writes is enabled. If the value of this
> + attribute is zero, the block layer restricts the queue depth for
> + sequential writes per zone to one (zone append operations are
> + not affected). If the value of this attribute is one, the block
> + layer does not restrict the queue depth of sequential writes per
> + zone to one.
> +
> +
> What: /sys/block/<disk>/queue/physical_block_size
> Date: May 2009
> Contact: Martin K. Petersen <martin.petersen@oracle.com>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 07874e9b609f..01c0edf2308a 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -119,6 +119,14 @@ static int blk_validate_zoned_limits(struct queue_limits *lim)
> 	lim->max_zone_append_sectors =
> 		min_not_zero(lim->max_hw_zone_append_sectors,
> 			min(lim->chunk_sectors, lim->max_hw_sectors));
> +
> +	/*
> +	 * If both the block driver and the block device preserve the write
> +	 * order per hwq, enable zoned write pipelining.
> +	 */
> +	if (lim->features & BLK_FEAT_ORDERED_HWQ)
> +		lim->features |= BLK_FEAT_PIPELINE_ZWR;
> +
> 	return 0;
> }
>
> @@ -780,6 +788,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
> 		t->features &= ~BLK_FEAT_NOWAIT;
> 	if (!(b->features & BLK_FEAT_POLL))
> 		t->features &= ~BLK_FEAT_POLL;
> +	if (!(b->features & BLK_FEAT_ORDERED_HWQ))
> +		t->features &= ~BLK_FEAT_ORDERED_HWQ;
>
> 	t->flags |= (b->flags & BLK_FLAG_MISALIGNED);
>
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index 78ee8d324c7f..4bf0b663f25d 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -270,6 +270,7 @@ QUEUE_SYSFS_FEATURE(rotational, BLK_FEAT_ROTATIONAL)
> QUEUE_SYSFS_FEATURE(add_random, BLK_FEAT_ADD_RANDOM)
> QUEUE_SYSFS_FEATURE(iostats, BLK_FEAT_IO_STAT)
> QUEUE_SYSFS_FEATURE(stable_writes, BLK_FEAT_STABLE_WRITES);
> +QUEUE_SYSFS_FEATURE(pipeline_zwr, BLK_FEAT_PIPELINE_ZWR);
>
> #define QUEUE_SYSFS_FEATURE_SHOW(_name, _feature) \
> static ssize_t queue_##_name##_show(struct gendisk *disk, char *page) \
> @@ -554,6 +555,7 @@ QUEUE_LIM_RO_ENTRY(queue_dax, "dax");
> QUEUE_RW_ENTRY(queue_io_timeout, "io_timeout");
> QUEUE_LIM_RO_ENTRY(queue_virt_boundary_mask, "virt_boundary_mask");
> QUEUE_LIM_RO_ENTRY(queue_dma_alignment, "dma_alignment");
> +QUEUE_LIM_RW_ENTRY(queue_pipeline_zwr, "pipeline_zoned_writes");
>
> /* legacy alias for logical_block_size: */
> static struct queue_sysfs_entry queue_hw_sector_size_entry = {
> @@ -700,6 +702,7 @@ static struct attribute *queue_attrs[] = {
> 	&queue_dax_entry.attr,
> 	&queue_virt_boundary_mask_entry.attr,
> 	&queue_dma_alignment_entry.attr,
> +	&queue_pipeline_zwr_entry.attr,
> 	&queue_ra_entry.attr,
>
> /*
> @@ -746,6 +749,10 @@ static umode_t queue_attr_visible(struct kobject *kobj, struct attribute *attr,
> 	    !blk_queue_is_zoned(q))
> 		return 0;
>
> +	if (attr == &queue_pipeline_zwr_entry.attr &&
> +	    !(q->limits.features & BLK_FEAT_ORDERED_HWQ))
> +		return 0;
> +
> 	return attr->mode;
> }
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 95886b404b16..79d14b3d3309 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -338,6 +338,15 @@ typedef unsigned int __bitwise blk_features_t;
> /* skip this queue in blk_mq_(un)quiesce_tagset */
> #define BLK_FEAT_SKIP_TAGSET_QUIESCE ((__force blk_features_t)(1u << 13))
>
> +/*
> + * The request order is preserved per hardware queue by the block driver and by
> + * the block device. Set by the block driver.
> + */
> +#define BLK_FEAT_ORDERED_HWQ ((__force blk_features_t)(1u << 14))
> +
> +/* Whether to pipeline zoned writes. Controlled by the block layer. */
> +#define BLK_FEAT_PIPELINE_ZWR ((__force blk_features_t)(1u << 15))
> +
> /* undocumented magic for bcache */
> #define BLK_FEAT_RAID_PARTIAL_STRIPES_EXPENSIVE \
> 	((__force blk_features_t)(1u << 15))
--
Damien Le Moal
Western Digital Research