From: "Martin K. Petersen" <martin.petersen@oracle.com>
To: Anand Jain <anand.jain@oracle.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs-progs: use zero-range ioctl to verify discard support
Date: Wed, 30 Jul 2025 22:56:12 -0400 [thread overview]
Message-ID: <yq1ldo5i0jn.fsf@ca-mkp.ca.oracle.com> (raw)
In-Reply-To: <2f9687740a9f9d60bdea8d24f215c6c0e2a9657b.1753713395.git.anand.jain@oracle.com> (Anand Jain's message of "Mon, 28 Jul 2025 22:36:52 +0800")
Hi Anand!
> However, this isn't always reliable. For example, on a virtual device,
> discard_granularity may be non-zero (e.g., 512), but an actual
> ioctl(BLKDISCARD) call still fails with EOPNOTSUPP.
Regrettably this has changed as a result of block layer code shuffling.
Documentation/ABI/stable/sysfs-block states:
What: /sys/block/<disk>/queue/discard_granularity
[...]
A discard_granularity of 0 means that the device does not support
discard functionality.
The semantics visible to userspace were inadvertently changed and I
would actually like to see that addressed. Reporting a default
granularity of 512 when discard is unsupported has broken other apps
too.
> One workaround is to also check discard_max_bytes for a non-zero value.
>
> $ cat /sys/block/sda/queue/discard_max_bytes
> 0
>
> But a better and more direct way is to test discard support by issuing
> a BLKDISCARD ioctl with a zero-length range. If this call fails with
> EOPNOTSUPP, discard isn't supported.
In SCSI, a WRITE SAME operation with a transfer length of 0 translates
to "entire device". And in NVMe, a WRITE ZEROES block count of 0
translates to "1 logical block".
While we currently catch zero-length discards in the block layer before
they get passed down the stack, I am concerned about userland
applications relying on these operations being non-destructive in the
zero-length case. Something which can't be expressed at the storage
protocol level.
So I would prefer for you to validate that both discard_granularity and
discover_max_bytes are non-zero. Obviously we have no plans to remove
the existing check in blk_ioctl_discard(). But if we mess something up
by accident, data loss could result. A zero transfer length write
operation is not inherently safe. Whereas inspecting the queue topology
is entirely innocuous.
--
Martin K. Petersen
next prev parent reply other threads:[~2025-07-31 2:56 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-28 14:36 [PATCH] btrfs-progs: use zero-range ioctl to verify discard support Anand Jain
2025-07-30 17:09 ` Boris Burkov
2025-07-31 2:56 ` Martin K. Petersen [this message]
2025-07-31 14:25 ` Christoph Hellwig
2025-07-31 15:42 ` Martin K. Petersen
2025-08-01 2:02 ` Anand Jain
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=yq1ldo5i0jn.fsf@ca-mkp.ca.oracle.com \
--to=martin.petersen@oracle.com \
--cc=anand.jain@oracle.com \
--cc=linux-btrfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.