* [PATCH] btrfs-progs: use zero-range ioctl to verify discard support
@ 2025-07-28 14:36 Anand Jain
2025-07-30 17:09 ` Boris Burkov
2025-07-31 2:56 ` Martin K. Petersen
0 siblings, 2 replies; 6+ messages in thread
From: Anand Jain @ 2025-07-28 14:36 UTC (permalink / raw)
To: linux-btrfs
The discard_supported() function currently checks
/sys/block/sda/queue/discard_granularity to determine if discard
is supported, assuming a non-zero value means support is available.
However, this isn't always reliable. For example, on a virtual device,
discard_granularity may be non-zero (e.g., 512), but an actual
ioctl(BLKDISCARD) call still fails with EOPNOTSUPP.
Example:
$ cat /sys/block/sda/queue/discard_granularity
512
$ ./mkfs.btrfs -vvv -f /dev/sda
...
Performing full device TRIM /dev/sda (3.00GiB) ...
discard_range ret -1 errno Operation not supported
...
One workaround is to also check discard_max_bytes for a non-zero value.
$ cat /sys/block/sda/queue/discard_max_bytes
0
But a better and more direct way is to test discard support by issuing
a BLKDISCARD ioctl with a zero-length range. If this call fails with
EOPNOTSUPP, discard isn't supported.
This patch changes discard_supported() to use that method.
Signed-off-by: Anand Jain anand.jain@oracle.com
---
common/device-utils.c | 26 ++++++++++----------------
1 file changed, 10 insertions(+), 16 deletions(-)
diff --git a/common/device-utils.c b/common/device-utils.c
index 783d79555446..d95d406d9676 100644
--- a/common/device-utils.c
+++ b/common/device-utils.c
@@ -53,30 +53,24 @@
*/
static int discard_range(int fd, u64 start, u64 len)
{
+ int ret;
u64 range[2] = { start, len };
- if (ioctl(fd, BLKDISCARD, &range) < 0)
- return errno;
+ ret = ioctl(fd, BLKDISCARD, &range);
+ if (ret < 0)
+ return -errno;
return 0;
}
-static int discard_supported(const char *device)
+static bool discard_supported(int fd)
{
int ret;
- char buf[128] = {};
- ret = device_get_queue_param(device, "discard_granularity", buf, sizeof(buf));
- if (ret == 0) {
- pr_verbose(3, "cannot read discard_granularity for %s\n", device);
- return 0;
- } else {
- if (atoi(buf) == 0) {
- pr_verbose(3, "%s: discard_granularity %s", device, buf);
- return 0;
- }
- }
+ ret = discard_range(fd, 0, 0);
+ if (ret == -EOPNOTSUPP)
+ return false;
- return 1;
+ return true;
}
/*
@@ -278,7 +272,7 @@ int btrfs_prepare_device(int fd, const char *file, u64 *byte_count_ret,
* is not necessary for the mkfs functionality but just an
* optimization.
*/
- if (discard_supported(file)) {
+ if (discard_supported(fd)) {
if (opflags & PREP_DEVICE_VERBOSE)
printf("Performing full device TRIM %s (%s) ...\n",
file, pretty_size(byte_count));
--
2.50.0
^ permalink raw reply related [flat|nested] 6+ messages in thread* Re: [PATCH] btrfs-progs: use zero-range ioctl to verify discard support
2025-07-28 14:36 [PATCH] btrfs-progs: use zero-range ioctl to verify discard support Anand Jain
@ 2025-07-30 17:09 ` Boris Burkov
2025-07-31 2:56 ` Martin K. Petersen
1 sibling, 0 replies; 6+ messages in thread
From: Boris Burkov @ 2025-07-30 17:09 UTC (permalink / raw)
To: Anand Jain; +Cc: linux-btrfs
On Mon, Jul 28, 2025 at 10:36:52PM +0800, Anand Jain wrote:
> The discard_supported() function currently checks
> /sys/block/sda/queue/discard_granularity to determine if discard
> is supported, assuming a non-zero value means support is available.
>
> However, this isn't always reliable. For example, on a virtual device,
> discard_granularity may be non-zero (e.g., 512), but an actual
> ioctl(BLKDISCARD) call still fails with EOPNOTSUPP.
>
> Example:
>
> $ cat /sys/block/sda/queue/discard_granularity
> 512
>
> $ ./mkfs.btrfs -vvv -f /dev/sda
> ...
> Performing full device TRIM /dev/sda (3.00GiB) ...
> discard_range ret -1 errno Operation not supported
> ...
>
> One workaround is to also check discard_max_bytes for a non-zero value.
>
> $ cat /sys/block/sda/queue/discard_max_bytes
> 0
>
> But a better and more direct way is to test discard support by issuing
> a BLKDISCARD ioctl with a zero-length range. If this call fails with
> EOPNOTSUPP, discard isn't supported.
Looking at the kernel code for the discard ioctl, it's basically just
checking discard_max_bytes but in the kernel.
After it succeeds or fails there, it will always fail with EINVAL in
blk_validate_discard_range() so at least it should be safe to issue the
0 length discard (for now?)
To me, it makes more sense to just read the other file.
I like that your approach handles the kernel logic changing how it
decides EOPNOTSUPP. I don't like that your approach might result in
actually doing some unexpected zero length discard in the future if
that takes on a different semantic meaning.
I dunno, both seem unlikely, but it seems easiest to just read both
files.
Thanks,
Boris
>
> This patch changes discard_supported() to use that method.
>
> Signed-off-by: Anand Jain anand.jain@oracle.com
> ---
> common/device-utils.c | 26 ++++++++++----------------
> 1 file changed, 10 insertions(+), 16 deletions(-)
>
> diff --git a/common/device-utils.c b/common/device-utils.c
> index 783d79555446..d95d406d9676 100644
> --- a/common/device-utils.c
> +++ b/common/device-utils.c
> @@ -53,30 +53,24 @@
> */
> static int discard_range(int fd, u64 start, u64 len)
> {
> + int ret;
> u64 range[2] = { start, len };
>
> - if (ioctl(fd, BLKDISCARD, &range) < 0)
> - return errno;
> + ret = ioctl(fd, BLKDISCARD, &range);
> + if (ret < 0)
> + return -errno;
> return 0;
> }
>
> -static int discard_supported(const char *device)
> +static bool discard_supported(int fd)
> {
> int ret;
> - char buf[128] = {};
>
> - ret = device_get_queue_param(device, "discard_granularity", buf, sizeof(buf));
> - if (ret == 0) {
> - pr_verbose(3, "cannot read discard_granularity for %s\n", device);
> - return 0;
> - } else {
> - if (atoi(buf) == 0) {
> - pr_verbose(3, "%s: discard_granularity %s", device, buf);
> - return 0;
> - }
> - }
> + ret = discard_range(fd, 0, 0);
> + if (ret == -EOPNOTSUPP)
> + return false;
>
> - return 1;
> + return true;
> }
>
> /*
> @@ -278,7 +272,7 @@ int btrfs_prepare_device(int fd, const char *file, u64 *byte_count_ret,
> * is not necessary for the mkfs functionality but just an
> * optimization.
> */
> - if (discard_supported(file)) {
> + if (discard_supported(fd)) {
> if (opflags & PREP_DEVICE_VERBOSE)
> printf("Performing full device TRIM %s (%s) ...\n",
> file, pretty_size(byte_count));
> --
> 2.50.0
>
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH] btrfs-progs: use zero-range ioctl to verify discard support
2025-07-28 14:36 [PATCH] btrfs-progs: use zero-range ioctl to verify discard support Anand Jain
2025-07-30 17:09 ` Boris Burkov
@ 2025-07-31 2:56 ` Martin K. Petersen
2025-07-31 14:25 ` Christoph Hellwig
2025-08-01 2:02 ` Anand Jain
1 sibling, 2 replies; 6+ messages in thread
From: Martin K. Petersen @ 2025-07-31 2:56 UTC (permalink / raw)
To: Anand Jain; +Cc: linux-btrfs
Hi Anand!
> However, this isn't always reliable. For example, on a virtual device,
> discard_granularity may be non-zero (e.g., 512), but an actual
> ioctl(BLKDISCARD) call still fails with EOPNOTSUPP.
Regrettably this has changed as a result of block layer code shuffling.
Documentation/ABI/stable/sysfs-block states:
What: /sys/block/<disk>/queue/discard_granularity
[...]
A discard_granularity of 0 means that the device does not support
discard functionality.
The semantics visible to userspace were inadvertently changed and I
would actually like to see that addressed. Reporting a default
granularity of 512 when discard is unsupported has broken other apps
too.
> One workaround is to also check discard_max_bytes for a non-zero value.
>
> $ cat /sys/block/sda/queue/discard_max_bytes
> 0
>
> But a better and more direct way is to test discard support by issuing
> a BLKDISCARD ioctl with a zero-length range. If this call fails with
> EOPNOTSUPP, discard isn't supported.
In SCSI, a WRITE SAME operation with a transfer length of 0 translates
to "entire device". And in NVMe, a WRITE ZEROES block count of 0
translates to "1 logical block".
While we currently catch zero-length discards in the block layer before
they get passed down the stack, I am concerned about userland
applications relying on these operations being non-destructive in the
zero-length case. Something which can't be expressed at the storage
protocol level.
So I would prefer for you to validate that both discard_granularity and
discover_max_bytes are non-zero. Obviously we have no plans to remove
the existing check in blk_ioctl_discard(). But if we mess something up
by accident, data loss could result. A zero transfer length write
operation is not inherently safe. Whereas inspecting the queue topology
is entirely innocuous.
--
Martin K. Petersen
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] btrfs-progs: use zero-range ioctl to verify discard support
2025-07-31 2:56 ` Martin K. Petersen
@ 2025-07-31 14:25 ` Christoph Hellwig
2025-07-31 15:42 ` Martin K. Petersen
2025-08-01 2:02 ` Anand Jain
1 sibling, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2025-07-31 14:25 UTC (permalink / raw)
To: Martin K. Petersen; +Cc: Anand Jain, linux-btrfs
On Wed, Jul 30, 2025 at 10:56:12PM -0400, Martin K. Petersen wrote:
> The semantics visible to userspace were inadvertently changed and I
> would actually like to see that addressed. Reporting a default
> granularity of 512 when discard is unsupported has broken other apps
> too.
Next time you see a change that breaks interfaces, a timely bug report
would be really helpful.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] btrfs-progs: use zero-range ioctl to verify discard support
2025-07-31 14:25 ` Christoph Hellwig
@ 2025-07-31 15:42 ` Martin K. Petersen
0 siblings, 0 replies; 6+ messages in thread
From: Martin K. Petersen @ 2025-07-31 15:42 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Martin K. Petersen, Anand Jain, linux-btrfs
Hi Christoph!
> Next time you see a change that breaks interfaces, a timely bug report
> would be really helpful.
I wasn't the one that tripped over this. Anand's patch just made me
remember that somebody else had identified the same change in behavior a
while back. I'm thinking maybe it was in the context of the loop driver
or maybe nbd or drbd. My recollection is that it was in that general
area but I'm not sure. Since this wasn't caused by anything I did, I
didn't pay too much attention at the time. I just remember seeing the
discussion fly by...
--
Martin K. Petersen
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH] btrfs-progs: use zero-range ioctl to verify discard support
2025-07-31 2:56 ` Martin K. Petersen
2025-07-31 14:25 ` Christoph Hellwig
@ 2025-08-01 2:02 ` Anand Jain
1 sibling, 0 replies; 6+ messages in thread
From: Anand Jain @ 2025-08-01 2:02 UTC (permalink / raw)
To: Martin K. Petersen, Boris Burkov; +Cc: linux-btrfs
Boris, Martin,
Sorry for the delayed response, I was OOO.
> So I would prefer for you to validate that both discard_granularity and
> discover_max_bytes are non-zero.
Thanks for the feedback - understood. I'll switch to the alternative
check: whether either discard_granularity or discard_max_bytes is zero.
> Obviously we have no plans to remove
> the existing check in blk_ioctl_discard(). But if we mess something up
> by accident, data loss could result. A zero transfer length write
> operation is not inherently safe. Whereas inspecting the queue topology
> is entirely innocuous.
Got it.
Thanks, Anand
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2025-08-01 2:02 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-07-28 14:36 [PATCH] btrfs-progs: use zero-range ioctl to verify discard support Anand Jain
2025-07-30 17:09 ` Boris Burkov
2025-07-31 2:56 ` Martin K. Petersen
2025-07-31 14:25 ` Christoph Hellwig
2025-07-31 15:42 ` Martin K. Petersen
2025-08-01 2:02 ` Anand Jain
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.