From: Zhang Yi <yi.zhang@huaweicloud.com>
To: linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-block@vger.kernel.org, dm-devel@lists.linux.dev,
linux-nvme@lists.infradead.org, linux-scsi@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org,
hch@lst.de, tytso@mit.edu, djwong@kernel.org,
john.g.garry@oracle.com, bmarzins@redhat.com,
chaitanyak@nvidia.com, shinichiro.kawasaki@wdc.com,
brauner@kernel.org, yi.zhang@huawei.com,
yi.zhang@huaweicloud.com, chengzhihao1@huawei.com,
yukuai3@huawei.com, yangerkun@huawei.com
Subject: [RFC PATCH v4 01/11] block: introduce BLK_FEAT_WRITE_ZEROES_UNMAP to queue limits features
Date: Mon, 21 Apr 2025 10:14:59 +0800
Message-ID: <20250421021509.2366003-2-yi.zhang@huaweicloud.com>
In-Reply-To: <20250421021509.2366003-1-yi.zhang@huaweicloud.com>
From: Zhang Yi <yi.zhang@huawei.com>

Currently, disks primarily implement the write zeroes command (aka
REQ_OP_WRITE_ZEROES) through two mechanisms: the first involves
physically writing zeroes to the disk media (e.g., HDDs), while the
second performs an unmap operation on the logical blocks, effectively
putting them into a deallocated state (e.g., SSDs). The first method is
generally slow, while the second method is typically very fast.

For example, on certain NVMe SSDs that support NVME_NS_DEAC, submitting
REQ_OP_WRITE_ZEROES requests with the NVME_WZ_DEAC bit can accelerate
the write zeroes operation by placing disk blocks into a deallocated
state. This is best effort rather than a mandatory requirement; some
devices may partially fall back to writing physical zeroes due to
factors such as receiving unaligned commands. However, it is difficult
to determine whether a storage device supports unmap write zeroes.
Querying bdev_limits(bdev)->max_write_zeroes_sectors does not help,
because it only reports whether write zeroes is supported at all, not
how the device implements it.

Therefore, add a new queue limit feature, BLK_FEAT_WRITE_ZEROES_UNMAP,
and the corresponding sysfs entry, to indicate whether the block device
explicitly supports the unmap write zeroes command. Each device driver
should set this bit if it is certain that the attached disk supports
this command. If the bit is not set, the disk either does not support
it, or its support status is unknown.

For stacked devices, BLK_FEAT_WRITE_ZEROES_UNMAP must be supported by
both the stacking driver and all underlying devices.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
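The stacking rule from the commit message can be illustrated with a small
userspace model. This is only a sketch: struct model_limits and the model_*
helpers are hypothetical stand-ins, not kernel types, and the min() step on
max_write_zeroes_sectors is assumed from the existing blk_stack_limits()
code rather than taken from this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel definitions (assumption, not the real types). */
typedef uint32_t blk_features_t;
#define BLK_FEAT_WRITE_ZEROES_UNMAP ((blk_features_t)(1u << 17))

/* Hypothetical reduction of struct queue_limits to the two relevant fields. */
struct model_limits {
	blk_features_t features;
	unsigned int max_write_zeroes_sectors;
};

/* Fold the limits of one bottom device (b) into the top-level limits (t). */
static void model_stack_limits(struct model_limits *t,
			       const struct model_limits *b)
{
	/* Assumed existing behaviour: the stacked limit is the smaller one. */
	if (b->max_write_zeroes_sectors < t->max_write_zeroes_sectors)
		t->max_write_zeroes_sectors = b->max_write_zeroes_sectors;

	/* One bottom device without the feature clears it for the stack. */
	if (!(b->features & BLK_FEAT_WRITE_ZEROES_UNMAP))
		t->features &= ~BLK_FEAT_WRITE_ZEROES_UNMAP;

	/* Without write zeroes support at all, the feature is meaningless. */
	if (!t->max_write_zeroes_sectors)
		t->features &= ~BLK_FEAT_WRITE_ZEROES_UNMAP;
}

/* Mirrors the shape of bdev_write_zeroes_unmap() on the model type. */
static bool model_write_zeroes_unmap(const struct model_limits *l)
{
	return l->features & BLK_FEAT_WRITE_ZEROES_UNMAP;
}
```

Because the bit is only ever cleared while stacking, the stacking driver has
to set it on the top-level limits first; it then survives only if every
underlying device also advertises it.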
Documentation/ABI/stable/sysfs-block | 18 ++++++++++++++++++
block/blk-settings.c | 6 ++++++
block/blk-sysfs.c | 3 +++
include/linux/blkdev.h | 8 ++++++++
4 files changed, 35 insertions(+)
diff --git a/Documentation/ABI/stable/sysfs-block b/Documentation/ABI/stable/sysfs-block
index 3879963f0f01..6531cdfcaacf 100644
--- a/Documentation/ABI/stable/sysfs-block
+++ b/Documentation/ABI/stable/sysfs-block
@@ -763,6 +763,24 @@ Description:
0, write zeroes is not supported by the device.
+What: /sys/block/<disk>/queue/write_zeroes_unmap
+Date: January 2025
+Contact: Zhang Yi <yi.zhang@huawei.com>
+Description:
+ [RO] Indicates whether the device explicitly supports the unmap
+ write zeroes operation, in which a single write zeroes request
+ with the unmap bit set zeroes out a range of contiguous blocks
+ by deallocating them rather than by writing physical zeroes to
+ the media. If write_zeroes_unmap is set to 1, the device
+ explicitly supports the unmap write zeroes command. However,
+ this may be a best-effort optimization rather than a mandatory
+ requirement; some devices may partially fall back to writing
+ physical zeroes due to factors such as receiving unaligned
+ commands. If write_zeroes_unmap is set to 0, the device either
+ does not support this operation, or its support status is
+ unknown.
+
+
What: /sys/block/<disk>/queue/zone_append_max_bytes
Date: May 2020
Contact: linux-block@vger.kernel.org
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 6b2dbe645d23..3331d07bd5d9 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -697,6 +697,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
t->features &= ~BLK_FEAT_NOWAIT;
if (!(b->features & BLK_FEAT_POLL))
t->features &= ~BLK_FEAT_POLL;
+ if (!(b->features & BLK_FEAT_WRITE_ZEROES_UNMAP))
+ t->features &= ~BLK_FEAT_WRITE_ZEROES_UNMAP;
t->flags |= (b->flags & BLK_FLAG_MISALIGNED);
@@ -819,6 +821,10 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
t->zone_write_granularity = 0;
t->max_zone_append_sectors = 0;
}
+
+ if (!t->max_write_zeroes_sectors)
+ t->features &= ~BLK_FEAT_WRITE_ZEROES_UNMAP;
+
blk_stack_atomic_writes_limits(t, b, start);
return ret;
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index a2882751f0d2..7a9c20bd3779 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -261,6 +261,7 @@ static ssize_t queue_##_name##_show(struct gendisk *disk, char *page) \
QUEUE_SYSFS_FEATURE_SHOW(fua, BLK_FEAT_FUA);
QUEUE_SYSFS_FEATURE_SHOW(dax, BLK_FEAT_DAX);
+QUEUE_SYSFS_FEATURE_SHOW(write_zeroes_unmap, BLK_FEAT_WRITE_ZEROES_UNMAP);
static ssize_t queue_poll_show(struct gendisk *disk, char *page)
{
@@ -510,6 +511,7 @@ QUEUE_LIM_RO_ENTRY(queue_atomic_write_unit_min, "atomic_write_unit_min_bytes");
QUEUE_RO_ENTRY(queue_write_same_max, "write_same_max_bytes");
QUEUE_LIM_RO_ENTRY(queue_max_write_zeroes_sectors, "write_zeroes_max_bytes");
+QUEUE_LIM_RO_ENTRY(queue_write_zeroes_unmap, "write_zeroes_unmap");
QUEUE_LIM_RO_ENTRY(queue_max_zone_append_sectors, "zone_append_max_bytes");
QUEUE_LIM_RO_ENTRY(queue_zone_write_granularity, "zone_write_granularity");
@@ -656,6 +658,7 @@ static struct attribute *queue_attrs[] = {
&queue_atomic_write_unit_min_entry.attr,
&queue_atomic_write_unit_max_entry.attr,
&queue_max_write_zeroes_sectors_entry.attr,
+ &queue_write_zeroes_unmap_entry.attr,
&queue_max_zone_append_sectors_entry.attr,
&queue_zone_write_granularity_entry.attr,
&queue_rotational_entry.attr,
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e39c45bc0a97..7c8752578e36 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -342,6 +342,9 @@ typedef unsigned int __bitwise blk_features_t;
#define BLK_FEAT_ATOMIC_WRITES \
((__force blk_features_t)(1u << 16))
+/* supports unmap write zeroes command */
+#define BLK_FEAT_WRITE_ZEROES_UNMAP ((__force blk_features_t)(1u << 17))
+
/*
* Flags automatically inherited when stacking limits.
*/
@@ -1341,6 +1344,11 @@ static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev)
return bdev_limits(bdev)->max_write_zeroes_sectors;
}
+static inline bool bdev_write_zeroes_unmap(struct block_device *bdev)
+{
+ return bdev_limits(bdev)->features & BLK_FEAT_WRITE_ZEROES_UNMAP;
+}
+
static inline bool bdev_nonrot(struct block_device *bdev)
{
return blk_queue_nonrot(bdev_get_queue(bdev));
--
2.46.1
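
For reference, userspace could consume the new sysfs attribute roughly as
follows. This is a sketch under assumptions: the helper names are
hypothetical, and the attribute exists only on kernels carrying this RFC
patch, so a missing file is reported as "unknown" (-1) rather than as 0.

```c
#include <stdio.h>

/*
 * Build the attribute path for a disk name such as "nvme0n1".
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int format_attr_path(char *buf, size_t len, const char *disk)
{
	int n = snprintf(buf, len, "/sys/block/%s/queue/write_zeroes_unmap",
			 disk);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read the attribute at attr_path.
 * Returns 1 or 0 as reported by the kernel, or -1 if the file is missing
 * (for example on a kernel without this patch) or cannot be parsed.
 */
static int read_write_zeroes_unmap(const char *attr_path)
{
	FILE *f = fopen(attr_path, "r");
	int value;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &value) != 1)
		value = -1;
	fclose(f);

	return (value == 0 || value == 1) ? value : -1;
}
```

A return value of 1 still only means that unmap write zeroes is advertised;
as the ABI text notes, a device may partially fall back to writing physical
zeroes for some requests.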