[PATCH v3 2/2] exfat: EXFAT_IOC_GET_VALID

public inbox for linux-fsdevel@vger.kernel.org

* [PATCH v3 2/2] exfat: EXFAT_IOC_GET_VALID_DATA ioctl
@ 2026-03-11 22:26 David Timber
  2026-03-12  3:23 ` Matthew Wilcox
  0 siblings, 1 reply; 11+ messages in thread
From: David Timber @ 2026-03-11 22:26 UTC (permalink / raw)
  To: linkinjeon, sj1557.seo
  Cc: yuezhang.mo, linux-fsdevel, linux-kernel, David Timber

When a file in exfat fs gets truncated up or fallocate()'d up, only
additional clusters are allocated and isize is updated, while the VDL
(valid data length) remains unchanged. If an application writes to the
file past the VDL, significant IO delay can occur as the skipped range
[VDL, offset) has to be zeroed out before returning to userspace. Some
users may find this caveat unacceptable.

Some niche applications (especially embedded systems) may want to query
the discrepancy between the VDL and isize before doing lseek() and
write() to estimate the delay from implicit zeroing.

This commit introduces a new ioctl, EXFAT_IOC_GET_VALID_DATA, which
corresponds to `fsutil file queryvaliddata ...` available on Windows.
With the new ioctl, applications can assess the delay that may be
incurred and make decisions accordingly before seeking past the VDL to
write.

Signed-off-by: David Timber <dxdt@dev.snart.me>
---
 fs/exfat/file.c            | 22 ++++++++++++++++++++++
 include/uapi/linux/exfat.h |  6 ++++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index 90cd540afeaa..a13044a7065a 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -449,6 +449,22 @@ static int exfat_ioctl_set_attributes(struct file *file, u32 __user *user_attr)
 	return err;
 }
 
+static int exfat_ioctl_get_valid_data(struct inode *inode, unsigned long arg)
+{
+	u64 valid_size;
+
+	/*
+	 * Taking the lock for a plain getter is arguably unnecessary, but
+	 * we stay consistent with the existing precedent,
+	 * exfat_ioctl_get_attributes().
+	 */
+	inode_lock(inode);
+	valid_size = EXFAT_I(inode)->valid_size;
+	inode_unlock(inode);
+
+	return put_user(valid_size, (__u64 __user *)arg);
+}
+
 static int exfat_ioctl_fitrim(struct inode *inode, unsigned long arg)
 {
 	struct fstrim_range range;
@@ -544,10 +560,15 @@ long exfat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 	u32 __user *user_attr = (u32 __user *)arg;
 
 	switch (cmd) {
+	/* inode-specific ops */
 	case FAT_IOCTL_GET_ATTRIBUTES:
 		return exfat_ioctl_get_attributes(inode, user_attr);
 	case FAT_IOCTL_SET_ATTRIBUTES:
 		return exfat_ioctl_set_attributes(filp, user_attr);
+	case EXFAT_IOC_GET_VALID_DATA:
+		return exfat_ioctl_get_valid_data(inode, arg);
+
+	/* fs-wide ops */
 	case EXFAT_IOC_SHUTDOWN:
 		return exfat_ioctl_shutdown(inode->i_sb, arg);
 	case FITRIM:
@@ -556,6 +577,7 @@ long exfat_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
 		return exfat_ioctl_get_volume_label(inode->i_sb, arg);
 	case FS_IOC_SETFSLABEL:
 		return exfat_ioctl_set_volume_label(inode->i_sb, arg);
+
 	default:
 		return -ENOTTY;
 	}
diff --git a/include/uapi/linux/exfat.h b/include/uapi/linux/exfat.h
index 050dcea0aa12..cbc10458122e 100644
--- a/include/uapi/linux/exfat.h
+++ b/include/uapi/linux/exfat.h
@@ -12,8 +12,10 @@
  * exfat-specific ioctl commands
  */
 
-#define EXFAT_IOCTL_MAGIC	0xEF			/* shared with ntfs3 */
-#define EXFAT_IOC_SHUTDOWN	_IOR('X', 125, __u32)
+#define EXFAT_IOCTL_MAGIC		0xEF			/* shared with ntfs */
+#define EXFAT_IOC_SHUTDOWN		_IOR('X', 125, __u32)
+/* Get the current valid data length(VDL) of a file */
+#define EXFAT_IOC_GET_VALID_DATA	_IOR(EXFAT_IOCTL_MAGIC, 0x01, __u64)
 
 /*
  * Flags used by EXFAT_IOC_SHUTDOWN
-- 
2.53.0.1.ga224b40d3f.dirty


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v3 2/2] exfat: EXFAT_IOC_GET_VALID_DATA ioctl
  2026-03-11 22:26 [PATCH v3 2/2] exfat: EXFAT_IOC_GET_VALID_DATA ioctl David Timber
@ 2026-03-12  3:23 ` Matthew Wilcox
  2026-03-12 14:02   ` David Timber
  0 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2026-03-12  3:23 UTC (permalink / raw)
  To: David Timber
  Cc: linkinjeon, sj1557.seo, yuezhang.mo, linux-fsdevel, linux-kernel

On Thu, Mar 12, 2026 at 07:26:13AM +0900, David Timber wrote:
> When a file in exfat fs gets truncated up or fallocate()'d up, only
> additional clusters are allocated and isize is updated, while the VDL
> (valid data length) remains unchanged. If an application writes to the
> file past the VDL, significant IO delay can occur as the skipped range
> [VDL, offset) has to be zeroed out before returning to userspace. Some
> users may find this caveat unacceptable.
> 
> Some niche applications (especially embedded systems) may want to query
> the discrepancy between the VDL and isize before doing lseek() and
> write() to estimate the delay from implicit zeroing.
> 
> This commit introduces a new ioctl, EXFAT_IOC_GET_VALID_DATA, which
> corresponds to `fsutil file queryvaliddata ...` available on Windows.
> With the new ioctl, applications can assess the delay that may be
> incurred and make decisions accordingly before seeking past the VDL to
> write.

We already have two interfaces for this on Linux.  One is SEEK_HOLE /
SEEK_DATA and the other is fiemap (Documentation/filesystems/fiemap.rst)
Why are both of these interfaces unsuitable?

* Re: [PATCH v3 2/2] exfat: EXFAT_IOC_GET_VALID_DATA ioctl
  2026-03-12  3:23 ` Matthew Wilcox
@ 2026-03-12 14:02   ` David Timber
  2026-03-12 14:57     ` Matthew Wilcox
  0 siblings, 1 reply; 11+ messages in thread
From: David Timber @ 2026-03-12 14:02 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linkinjeon, sj1557.seo, yuezhang.mo, linux-fsdevel, linux-kernel

On 3/12/26 12:23, Matthew Wilcox wrote:
> We already have two interfaces for this on Linux.  One is SEEK_HOLE /
> SEEK_DATA and the other is fiemap (Documentation/filesystems/fiemap.rst)
> Why are both of these interfaces unsuitable?
https://lore.kernel.org/linux-fsdevel/cf6c2b08-b7ff-4f70-95f4-cdb12ef5a666@dev.snart.me/

Because exFAT is not a sparse file system. The VDL is only a shorthand
for fast cluster allocation without writing actual data to them. In
other words, the range between the VDL and isize is not actually a hole.
The blocks in the range are actually allocated, filled with garbage data
on the disk. The kernel has to be careful not to return it to userspace,
which is something the Linux kernel actually does.

Supporting SEEK_DATA in exFAT would certainly give the wrong impression
that it is a sparse fs. On the other hand, FALLOC_FL_PUNCH_HOLE on an
arbitrary range (other than one running up to EOF) is not possible.
Therefore, it wouldn't make sense for a fs to support SEEK_HOLE but not
hole punching. That would most likely confuse many userland programs
like cp. That's why I thought lseek() is not a good fit for querying VDL.

Truncating up a file on sparse file systems does not change st_blocks,
but on exFAT it does. I have to say this is a rather unfortunate
divergence, certainly not POSIX compliant, but the behaviour has existed
for a long time now and no one has come forward to say that anything
has caught fire yet.
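One way to observe the allocation difference described above is to extend a file with ftruncate() and compare the space actually allocated (st_blocks, which Linux reports in 512-byte units) against st_size. A sketch; `truncate_up_and_measure()` is a hypothetical helper. On a sparse filesystem such as ext4 the allocated figure is expected to stay near zero, whereas on exFAT it would be expected to cover the new size, since clusters are allocated up front.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create/open `path`, extend it to `new_size` with
 * ftruncate(), and report how many bytes the filesystem allocated.
 * Returns 0 on success, -1 on error (errno set).
 */
static int truncate_up_and_measure(const char *path, off_t new_size,
				   long long *allocated)
{
	struct stat st;
	int fd, ret = -1;

	assert(path != NULL && allocated != NULL);
	fd = open(path, O_RDWR | O_CREAT, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, new_size) == 0 && fstat(fd, &st) == 0) {
		/* st_blocks counts 512-byte units on Linux. */
		*allocated = (long long)st.st_blocks * 512;
		ret = 0;
	}
	close(fd);
	return ret;
}
```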

I don't understand why people keep bringing up FIEMAP. It is a debugging
interface and it should stay that way. It's simply not the right tool to
find holes in files. SEEK_DATA and SEEK_HOLE became POSIX, so using
FIEMAP to find holes and data shouldn't even be an option at this point.
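For comparison, this is the portable way to enumerate data and holes with SEEK_DATA/SEEK_HOLE that the paragraph favours over FIEMAP. A sketch; `list_data_segments()` is a hypothetical helper. Filesystems without native support fall back to reporting the whole file as data followed by the implicit hole at EOF, so the loop terminates either way.

```c
#define _GNU_SOURCE /* SEEK_DATA and SEEK_HOLE in glibc's <unistd.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: print every data segment of `path` as reported
 * by lseek(SEEK_DATA/SEEK_HOLE) and return how many there were, or -1
 * on error.
 */
static int list_data_segments(const char *path)
{
	off_t pos = 0, end, data, hole;
	int fd, n = 0;

	assert(path != NULL);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	end = lseek(fd, 0, SEEK_END);
	if (end < 0)
		n = -1;
	while (end >= 0 && pos < end) {
		data = lseek(fd, pos, SEEK_DATA);
		if (data < 0) {
			/* ENXIO: no more data between pos and EOF. */
			if (errno != ENXIO)
				n = -1;
			break;
		}
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0) {
			n = -1;
			break;
		}
		printf("data [%lld, %lld)\n", (long long)data, (long long)hole);
		n++;
		pos = hole;
	}
	close(fd);
	return n;
}
```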

Davo

* Re: [PATCH v3 2/2] exfat: EXFAT_IOC_GET_VALID_DATA ioctl
  2026-03-12 14:02   ` David Timber
@ 2026-03-12 14:57     ` Matthew Wilcox
  2026-03-12 15:06       ` Darrick J. Wong
  0 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2026-03-12 14:57 UTC (permalink / raw)
  To: David Timber
  Cc: linkinjeon, sj1557.seo, yuezhang.mo, linux-fsdevel, linux-kernel

On Thu, Mar 12, 2026 at 11:02:29PM +0900, David Timber wrote:
> On 3/12/26 12:23, Matthew Wilcox wrote:
> > We already have two interfaces for this on Linux.  One is SEEK_HOLE /
> > SEEK_DATA and the other is fiemap (Documentation/filesystems/fiemap.rst)
> > Why are both of these interfaces unsuitable?
> https://lore.kernel.org/linux-fsdevel/cf6c2b08-b7ff-4f70-95f4-cdb12ef5a666@dev.snart.me/
> 
> Because exFAT is not a sparse file system. The VDL is only a shorthand
> for fast cluster allocation without writing actual data to them. In
> other words, the range between the VDL and isize is not actually a hole.
> The blocks in the range are actually allocated, filled with garbage data
> on the disk. The kernel has to be careful not to return it to userspace,
> which is something the Linux kernel actually does.

Uh, no it's not.  If you try to read from the file at positions mapped
to those blocks, Linux will return zeroes.

You seem to be under the impression that SEEK_HOLE only finds blocks
which have not been allocated.  That's not the behaviour of any
filesystem which uses iomap.  Look:

static int iomap_seek_hole_iter(struct iomap_iter *iter,
                loff_t *hole_pos)
{
        loff_t length = iomap_length(iter);

        switch (iter->iomap.type) {
        case IOMAP_UNWRITTEN:
                *hole_pos = mapping_seek_hole_data(iter->inode->i_mapping,
                                iter->pos, iter->pos + length, SEEK_HOLE);
                if (*hole_pos == iter->pos + length)
                        return iomap_iter_advance(iter, length);
                return 0;
        case IOMAP_HOLE:
                *hole_pos = iter->pos;
                return 0;

Yes, if there's literally a hole, that counts as a hole, but what you're
talking about is an unwritten extent.  And that counts as a hole
*unless* we've written to the page cache covering the hole.
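The classification rule above can be captured in a toy C model. The enum and struct are invented for illustration; they loosely mirror IOMAP_HOLE and IOMAP_UNWRITTEN but are not the kernel's types, and unlike mapping_seek_hole_data(), which checks the page cache folio by folio, this sketch uses a single `cached` flag per extent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified extent types (hypothetical, not kernel definitions). */
enum ext_type { EXT_HOLE, EXT_UNWRITTEN, EXT_MAPPED };

struct extent {
	long long start, len;
	enum ext_type type;
	bool cached; /* page cache holds data over this whole extent */
};

/*
 * Toy model: a real hole is always a hole; an unwritten extent is a hole
 * unless the page cache covers it; mapped extents are data. Extents must
 * be sorted and contiguous from 0 to isize.
 */
static long long model_seek_hole(const struct extent *ext, size_t n,
				 long long isize, long long offset)
{
	assert(offset >= 0 && offset < isize);
	for (size_t i = 0; i < n; i++) {
		long long end = ext[i].start + ext[i].len;
		bool is_hole = ext[i].type == EXT_HOLE ||
			       (ext[i].type == EXT_UNWRITTEN && !ext[i].cached);

		if (end <= offset)
			continue;
		if (is_hole)
			return offset > ext[i].start ? offset : ext[i].start;
	}
	return isize; /* implicit hole at EOF */
}
```

For a file laid out as mapped [0, 4096), cached unwritten [4096, 12288) and uncached unwritten [12288, 16384), SEEK_HOLE from 0 lands on 12288 in this model.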


* Re: [PATCH v3 2/2] exfat: EXFAT_IOC_GET_VALID_DATA ioctl
  2026-03-12 14:57     ` Matthew Wilcox
@ 2026-03-12 15:06       ` Darrick J. Wong
  2026-03-13  7:59         ` David Timber
  0 siblings, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2026-03-12 15:06 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: David Timber, linkinjeon, sj1557.seo, yuezhang.mo, linux-fsdevel,
	linux-kernel

On Thu, Mar 12, 2026 at 02:57:04PM +0000, Matthew Wilcox wrote:
> On Thu, Mar 12, 2026 at 11:02:29PM +0900, David Timber wrote:
> > On 3/12/26 12:23, Matthew Wilcox wrote:
> > > We already have two interfaces for this on Linux.  One is SEEK_HOLE /
> > > SEEK_DATA and the other is fiemap (Documentation/filesystems/fiemap.rst)
> > > Why are both of these interfaces unsuitable?
> > https://lore.kernel.org/linux-fsdevel/cf6c2b08-b7ff-4f70-95f4-cdb12ef5a666@dev.snart.me/
> > 
> > Because exFAT is not a sparse file system. The VDL is only a shorthand
> > for fast cluster allocation without writing actual data to them. In
> > other words, the range between the VDL and isize is not actually a hole.
> > The blocks in the range are actually allocated, filled with garbage data
> > on the disk. The kernel has to be careful not to return it to userspace,
> > which is something the Linux kernel actually does.
> 
> Uh, no it's not.  If you try to read from the file at positions mapped
> to those blocks, Linux will return zeroes.
> 
> You seem to be under the impression that SEEK_HOLE only finds blocks
> which have not been allocated.  That's not the behaviour of any
> filesystem which uses iomap.  Look:
> 
> static int iomap_seek_hole_iter(struct iomap_iter *iter,
>                 loff_t *hole_pos)
> {
>         loff_t length = iomap_length(iter);
> 
>         switch (iter->iomap.type) {
>         case IOMAP_UNWRITTEN:
>                 *hole_pos = mapping_seek_hole_data(iter->inode->i_mapping,
>                                 iter->pos, iter->pos + length, SEEK_HOLE);
>                 if (*hole_pos == iter->pos + length)
>                         return iomap_iter_advance(iter, length);
>                 return 0;
>         case IOMAP_HOLE:
>                 *hole_pos = iter->pos;
>                 return 0;
> 
> Yes, if there's literally a hole, that counts as a hole, but what you're
> talking about is an unwritten extent.  And that counts as a hole
> *unless* we've written to the page cache covering the hole.

/me wonders if the problem here is that the "unwritten" post-VDL range
is worse than a regular unwritten range in the sense that you have to
write zeroes to all the space between the VDL and wherever your write()
starts, e.g. if the VDL is set to 1G and I pwrite a single byte at 8GB,
that turns into a 7-billion-x write amplification.

OTOH it's exfat so nobody's expecting it to be fast, so we could just
treat the post-vdl area as unwritten, as far as SEEK_HOLE is concerned.

Also FIEMAP is evil because (a) the filesystem can change/optimize
mappings in the background so the results can be obsolete by the time
the syscall returns and (b) it doesn't tell you about dirty pagecache
fronting unwritten areas.

--D

* Re: [PATCH v3 2/2] exfat: EXFAT_IOC_GET_VALID_DATA ioctl
  2026-03-12 15:06       ` Darrick J. Wong
@ 2026-03-13  7:59         ` David Timber
  2026-03-13 13:59           ` Namjae Jeon
  2026-03-13 16:24           ` Darrick J. Wong
  0 siblings, 2 replies; 11+ messages in thread
From: David Timber @ 2026-03-13  7:59 UTC (permalink / raw)
  To: Darrick J. Wong, Matthew Wilcox
  Cc: linkinjeon, sj1557.seo, yuezhang.mo, linux-fsdevel, linux-kernel

On 3/13/26 00:06, Darrick J. Wong wrote:
> /me wonders if the problem here is that the "unwritten" post-VDL range
> is worse than a regular unwritten range in the sense that you have to
> write zeroes to all the space between the VDL and wherever your write()
> starts, e.g. if the VDL is set to 1G and I pwrite a single byte at 8GB,
> that turns into a 7-billion-x write amplification.
Yes. That's the problem being addressed here.

Again, imho, VDL is NOT a hole. It should be treated differently. VDL is
not required to be aligned to the block/extent/cluster size (holes in
sparse files tend to be, although that is not a requirement). You can't
punch or dig holes in exFAT. Using SEEK_*HOLE* to find *VDL* doesn't
make any sense (to me, at least).

tbh, I don't really care whether the ioctl patch is accepted or not. If
SEEK_HOLE/SEEK_DATA is really what the maintainers see fit despite these
deviations, I'd have to honour that decision and bring iomap to exFAT, I
guess. That'd be a lot of work, though, because it would mean reworking
the entire exFAT code base. Not saying exFAT doesn't deserve iomap, but
that might be a little above my paygrade since there are thousands of
embedded devices already using the in-kernel exFAT driver.

I retract the patches regarding the ioctl. For the time being, the focus
should be on making exFAT and NTFS usable and stable before introducing
ioctls. Sorry for my poor judgement.

Davo

* Re: [PATCH v3 2/2] exfat: EXFAT_IOC_GET_VALID_DATA ioctl
  2026-03-13  7:59         ` David Timber
@ 2026-03-13 13:59           ` Namjae Jeon
  2026-03-13 16:24           ` Darrick J. Wong
  1 sibling, 0 replies; 11+ messages in thread
From: Namjae Jeon @ 2026-03-13 13:59 UTC (permalink / raw)
  To: David Timber
  Cc: Darrick J. Wong, Matthew Wilcox, sj1557.seo, yuezhang.mo,
	linux-fsdevel, linux-kernel

> tbh, I don't really care whether the ioctl patch is accepted or not. If
> SEEK_HOLE/SEEK_DATA is really what the maintainers see fit despite these
> deviations, I'd have to honour that decision and bring iomap to exFAT, I
> guess. That'd be a lot of work, though, because it would mean reworking
> the entire exFAT code base. Not saying exFAT doesn't deserve iomap, but
> that might be a little above my paygrade since there are thousands of
> embedded devices already using the in-kernel exFAT driver.
I'm working on adding iomap support to exFAT, and I think SEEK_HOLE
will be able to address the requirements we discussed. I will bring
this up again once the iomap work is complete.

* Re: [PATCH v3 2/2] exfat: EXFAT_IOC_GET_VALID_DATA ioctl
  2026-03-13  7:59         ` David Timber
  2026-03-13 13:59           ` Namjae Jeon
@ 2026-03-13 16:24           ` Darrick J. Wong
  2026-03-16 22:37             ` David Timber
  1 sibling, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2026-03-13 16:24 UTC (permalink / raw)
  To: David Timber
  Cc: Matthew Wilcox, linkinjeon, sj1557.seo, yuezhang.mo,
	linux-fsdevel, linux-kernel

On Fri, Mar 13, 2026 at 04:59:27PM +0900, David Timber wrote:
> On 3/13/26 00:06, Darrick J. Wong wrote:
> > /me wonders if the problem here is that the "unwritten" post-VDL range
> > is worse than a regular unwritten range in the sense that you have to
> > write zeroes to all the space between the VDL and wherever your write()
> > starts, e.g. if the VDL is set to 1G and I pwrite a single byte at 8GB,
> > that turns into a 7-billion-x write amplification.
> Yes. That's the problem being addressed here.

Ah, ok.  Does it do that zeroing at write() time, or only when you're
initiating writeback from the pagecache?  I'm guessing write() time,
since otherwise you're signing the kernel up for initiating a lot of IO
at a time when memory could be scarce.

> Again, imho, VDL is NOT a hole. It should be treated differently. VDL is
> not required to be aligned to the block/extent/cluster size (holes in
> sparse files tend to be, although that is not a requirement). You can't
> punch or dig holes in exFAT. Using SEEK_*HOLE* to find *VDL* doesn't
> make any sense (to me, at least).

<remembers to put on his language lawyering glasses>

The lseek manpage states:

   lseek() allows the file offset to be set beyond the end of the file
   (but this does not change the size of the file).  If data is later
   written at this  point,  subsequent  reads of the data in the gap (a
   "hole") return null bytes ('\0') until data is actually written into
   the gap.

IOWs, a "hole" is defined here to be a region of a file which has never
been written to, and therefore reads will return all zeroes.  It doesn't
say anything about the storage that may or may not be backing that
range.

Mild logic leap here: ftruncate() to increase file size also creates a
gap where reads will return null bytes.  From ftruncate:

   If the file previously was larger than this  size,  the  extra  data
   is lost.   If  the file previously was shorter, it is extended, and
   the extended part reads as null bytes ('\0').

In contrast, a lot of people call readable file regions not backed by any
space "sparse holes".  Unfortunately the fallocate manpage muddies
things up by saying:

   Deallocating file space
       Specifying  the FALLOC_FL_PUNCH_HOLE flag (available since Linux
       2.6.38) in mode deallocates space (i.e., creates  a  hole)  in
       the byte  range starting  at  offset and continuing for len
       bytes.  Within the specified range, partial filesystem blocks are
       zeroed, and whole filesystem blocks are removed from the file.
       After a successful  call,  subsequent  reads from this range will
       return zeros.

Note that it doesn't say "sparse hole", just "hole".  This ambiguity wrt
fallocate is very unfortunate, because it regularly causes confusion on
fsdevel and other places.  Note that classic Unixy filesystems will, in
creating the "hole" as part of writing at a point past the end of the
file, also create a "sparse hole" by not mapping blocks into the gap.

Also, I sorta lie, because XFS and ext4 are Unixy filesystems, but they
both have modes (large rt extent size and bigalloc) where they actually
can map unwritten blocks into a never-written hole.

exfat doesn't support "sparse holes".  But it does support holes per the
lseek definition, because you can increase the file size via ftruncate,
it'll allocate clusters to back the whole range, and [VDL, i_size) is
the part that exfat knows has never been written and will always return
null bytes in response to a read.
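The read semantics relied on here can be written down as a toy model: bytes below the VDL come from the allocated clusters, while bytes in [VDL, i_size) read back as zeroes even though clusters back them. `model_exfat_read()` is a hypothetical illustration, not the exfat read path.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not the exfat read path): the bytes a read of
 * `len` bytes at `pos` returns for a file whose allocated clusters
 * currently hold `disk` (possibly stale garbage), given its VDL and
 * i_size. Returns the number of bytes stored in `out`.
 */
static size_t model_exfat_read(const unsigned char *disk, size_t vdl,
			       size_t isize, size_t pos, size_t len,
			       unsigned char *out)
{
	size_t n, i;

	assert(vdl <= isize);
	if (pos >= isize)
		return 0; /* at or past EOF */
	n = len < isize - pos ? len : isize - pos;
	for (i = 0; i < n; i++) {
		size_t off = pos + i;

		/* [0, vdl) comes from disk; [vdl, isize) always reads as zero. */
		out[i] = off < vdl ? disk[off] : 0;
	}
	return n;
}
```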

Ok, back to lseek:

  SEEK_HOLE
      Adjust  the file offset to the next hole in the file greater than
      or equal to offset.  If offset points into the middle of a  hole,
      then  the file offset is set to offset.  If there is no hole past
      offset, then the file offset is adjusted to the end of  the  file
      (i.e., there is an implicit hole at the end of any file).

So, SEEK_HOLE jumps to the next file range that has never been written.
It doesn't say anything about backing storage at all.  For exfat, that
would be the VDL itself, where the never-written range begins.  This is
what willy was getting at.

It's too bad that lseek, being a generic interface, has no way to convey
that writes to an lseek hole will be potentially very expensive.  A
program could infer that by the existence of SEEK_HOLE holes and
FALLOC_FL_PUNCH_HOLE returning EOPNOTSUPP.

OTOH I guess they could confirm that by calling the VDL ioctl and
getting a non-error response.  But if we've solved finding the VDL by
making SEEK_HOLE return values below EOF, then why do we need the ioctl?
What if we added a statx flag to advertise sparse hole support on a
file?  And then didn't set it for exfat?
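The EOPNOTSUPP inference mentioned above can be approximated from userspace by probing FALLOC_FL_PUNCH_HOLE on a scratch file in the same directory, since punching a hole in the real file would destroy data. A sketch; `probe_punch_hole()` is a hypothetical helper, and treating EOPNOTSUPP as "no sparse holes" is the heuristic from the discussion, not a guarantee.

```c
#define _GNU_SOURCE /* fallocate() and mkstemp() declarations */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a scratch file in `dir`, write two 4 KiB
 * blocks and try to punch out the first one. Returns 1 if the filesystem
 * accepted FALLOC_FL_PUNCH_HOLE, 0 if it reported EOPNOTSUPP, and -1 on
 * any other error. The scratch file is unlinked right after creation.
 */
static int probe_punch_hole(const char *dir)
{
	char path[4096], buf[8192];
	int fd, ret = -1;

	assert(dir != NULL);
	if (snprintf(path, sizeof(path), "%s/.punch-probe-XXXXXX", dir) >=
	    (int)sizeof(path))
		return -1;
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path); /* the open fd keeps the inode alive */
	memset(buf, 'x', sizeof(buf));
	if (write(fd, buf, sizeof(buf)) == (ssize_t)sizeof(buf)) {
		if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			      0, 4096) == 0)
			ret = 1;
		else if (errno == EOPNOTSUPP)
			ret = 0;
	}
	close(fd);
	return ret;
}
```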

> tbh, I don't really care whether the ioctl patch is accepted or not. If
> SEEK_HOLE/SEEK_DATA is really what the maintainers see fit despite these
> deviations, I'd have to honour that decision and bring iomap to exFAT, I
> guess. That'd be a lot of work, though, because it would mean reworking
> the entire exFAT code base. Not saying exFAT doesn't deserve iomap, but
> that might be a little above my paygrade since there are thousands of
> embedded devices already using the in-kernel exFAT driver.

/me notes that you can implement iomap only for lseek.

--D

> I retract the patches regarding the ioctl. For the time being, the focus
> should be on making exFAT and NTFS usable and stable before introducing
> ioctls. Sorry for my poor judgement.
> 
> Davo
> 
> 

* Re: [PATCH v3 2/2] exfat: EXFAT_IOC_GET_VALID_DATA ioctl
  2026-03-13 16:24           ` Darrick J. Wong
@ 2026-03-16 22:37             ` David Timber
  2026-03-19  0:51               ` Namjae Jeon
  0 siblings, 1 reply; 11+ messages in thread
From: David Timber @ 2026-03-16 22:37 UTC (permalink / raw)
  To: Darrick J. Wong, linkinjeon
  Cc: Matthew Wilcox, linkinjeon, sj1557.seo, yuezhang.mo,
	linux-fsdevel, linux-kernel

On 3/13/26 22:59, Namjae Jeon wrote:
> I'm working on adding iomap support to exFAT, and I think SEEK_HOLE
> will be able to address the requirements we discussed. I will bring
> this up again once the iomap work is complete.
Good! exFAT not being built on iomap, and the catastrophic tech debt
that would eventually cause, has been one of my concerns. Since you're
working on it, you'd naturally implement FIEMAP as well, right? I was
cooking up some quick and dirty functions for FIEMAP... Not sure if
you're interested in them, but I'd be happy to submit them in the
future. It's only a kernel-specific interface that almost no userspace
program uses, but it's a really insightful tool for analyzing the
fragmentation issue I'm trying to solve. It might come in handy while
the new iomap implementation is settling down.

On 3/14/26 01:24, Darrick J. Wong wrote:
> Ah, ok.  Does it do that zeroing at write() time, or only when you're
> initiating writeback from the pagecache?  I'm guessing write() time,
> since otherwise you're signing the kernel up for initiating a lot of IO
> at a time when memory could be scarce.
NO! That'd be insane! I'm pretty sure faults are handled "sparsely".

> In contrast, lot of people call readable file regions not backed by any
> space "sparse holes".  Unfortunately the fallocate manpage muddies
> things up by saying:
Yes, a confusion I myself have fallen victim to. Thank you for such
great insight from a filesystem guru, btw.

> OTOH I guess they could confirm that by calling the VDL ioctl and
> getting a non-error response.  But if we've solved finding the VDL by
> making SEEK_HOLE return values below EOF, then why do we need the ioctl?
> What if we added a statx flag to advertise sparse hole support on a
> file?  And then didn't set it for exfat?
> /me notes that you can implement iomap only for lseek.
Yes, I'm aware. I was just expressing my concerns regarding the tech
debt in exFAT mentioned earlier. It just came out differently from what
I had in mind.

Anyway, admitting my recent defeat, I've been toying with the idea of
using SEEK_DATA and SEEK_HOLE for detecting the [VDL, isize)
discrepancy (patch attached at the end of this email). I'm not making an
official submission yet because I've been testing whether it breaks any
critical userland utils. So far, so good. A bunch of xfstests fail,
though: the ones written on the assumption that any filesystem that
supports SEEK_DATA and SEEK_HOLE can always have a hole or data in the
middle of a file (which I'd like to call a "hole sandwich/burger" and a
"data sandwich/burger", respectively). This is what I was worried about,
and it seems to have actually happened.

generic/285:
> 05.15 SEEK_HOLE expected -1 with errno -6, got -6.               
> succ  ERROR 28: Failed to write 524288 bytes
generic/490:
> File system does not support punch hole.
>   ERROR 28: Failed to write 32768 bytes
Not sure why all the units report "succ" while the program still exits
with a non-zero code, printing:
> seek sanity check failed!
I might have to fix the test program to catch misbehaving filesystems
that return other errnos.

Davo

diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index 2daf0dbabb24..99ba1f5f9a57 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -799,8 +799,80 @@ static ssize_t exfat_splice_read(struct file *in, loff_t *ppos,
     return filemap_splice_read(in, ppos, pipe, len, flags);
 }
 
+/*
+ * A special SEEK_DATA and SEEK_HOLE handler that treats the unwritten range
+ * between the VDL(valid data length) and EOF as a hole. Since the VDL in exFAT
+ * is not required to be aligned to any block boundary and holes in extent-based
+ * filesystems are typically aligned to a certain block size, we try our best to
+ * align the VDL to the device block size as not to confuse any userland
+ * programs that may depend on that assumption.
+ *
+ * The function will treat the last block containing data as the last data block
+ * and the block that follows immediately after as the start of the hole leading
+ * up to EOF. The last data block may have some unwritten bytes, but that's only
+ * O(1) write amplification.
+ */
+static loff_t exfat_vdl_llseek(struct file *file, loff_t offset, int whence)
+{
+    struct inode *inode = file->f_mapping->host;
+    struct super_block *sb = inode->i_sb;
+    struct exfat_inode_info *ei = EXFAT_I(inode);
+    loff_t maxbytes = inode->i_sb->s_maxbytes;
+    loff_t datasize;
+    loff_t size;
+
+    inode_lock(inode);
+
+    size = i_size_read(inode);
+
+    datasize = EXFAT_B_TO_BLK_ROUND_UP(ei->valid_size, sb);
+    datasize = EXFAT_BLK_TO_B(datasize, sb);
+    if (datasize > size)
+        datasize = size;
+
+    /* Same check found in iomap_seek_*() */
+    if (offset < 0 || offset >= size) {
+        offset = -ENXIO;
+        goto out;
+    }
+
+    if (whence == SEEK_DATA) {
+        /*
+         * As exFAT does not support sparse files, SEEK_DATA is pretty
+         * much useless. But still, to be compliant, SEEK_DATA shouldn't
+         * work if the offset is in a hole.
+         */
+        if (offset >= datasize)
+            offset = -ENXIO;
+    }
+    else if (whence == SEEK_HOLE) {
+        if (offset < datasize)
+            offset = datasize;
+    }
+    else
+        BUG();
+
+out:
+    inode_unlock(inode);
+
+    if (offset < 0) {
+        return offset;
+    }
+
+    return vfs_setpos(file, offset, maxbytes);
+}
+
+static loff_t exfat_file_llseek(struct file *file, loff_t offset, int whence)
+{
+    if (whence == SEEK_DATA || whence == SEEK_HOLE) {
+        return exfat_vdl_llseek(file, offset, whence);
+    }
+
+    return generic_file_llseek(file, offset, whence);
+}
+
 const struct file_operations exfat_file_operations = {
-    .llseek        = generic_file_llseek,
+    .llseek        = exfat_file_llseek,
     .read_iter    = exfat_file_read_iter,
     .write_iter    = exfat_file_write_iter,
     .unlocked_ioctl = exfat_ioctl,
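To experiment with the decision logic of the exfat_vdl_llseek() sketch above without rebuilding a kernel, it can be modelled as a pure userspace function. `model_vdl_llseek()` is a hypothetical name; `blksize` stands in for the block size used by EXFAT_B_TO_BLK_ROUND_UP(), and the locking and vfs_setpos() call are left out.

```c
#define _GNU_SOURCE /* SEEK_DATA / SEEK_HOLE in glibc's <unistd.h> */
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Userspace model (hypothetical) of the decision logic in the
 * exfat_vdl_llseek() sketch. Returns the new offset or -ENXIO.
 */
static long long model_vdl_llseek(long long offset, int whence,
				  long long vdl, long long isize,
				  long long blksize)
{
	long long datasize;

	assert(blksize > 0 && (whence == SEEK_DATA || whence == SEEK_HOLE));

	/* Round the VDL up to a block boundary, but never past EOF. */
	datasize = (vdl + blksize - 1) / blksize * blksize;
	if (datasize > isize)
		datasize = isize;

	/* Same range check as the patch: offsets at or past EOF fail. */
	if (offset < 0 || offset >= isize)
		return -ENXIO;

	if (whence == SEEK_DATA)
		/* Everything below datasize is data; the rest is the hole. */
		return offset < datasize ? offset : -ENXIO;

	/* SEEK_HOLE: jump to datasize, or stay put if already in the hole. */
	return offset < datasize ? datasize : offset;
}
```

With a 5000-byte VDL, 4096-byte blocks and a 100000-byte file, SEEK_HOLE from 0 lands on 8192, the first block boundary at or past the VDL; the up to one block of unwritten bytes below it is the O(1) write amplification the patch comment mentions.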



* Re: [PATCH v3 2/2] exfat: EXFAT_IOC_GET_VALID_DATA ioctl
  2026-03-16 22:37             ` David Timber
@ 2026-03-19  0:51               ` Namjae Jeon
  2026-03-19  3:29                 ` David Timber
  0 siblings, 1 reply; 11+ messages in thread
From: Namjae Jeon @ 2026-03-19  0:51 UTC (permalink / raw)
  To: David Timber
  Cc: Darrick J. Wong, Matthew Wilcox, sj1557.seo, yuezhang.mo,
	linux-fsdevel, linux-kernel

On Tue, Mar 17, 2026 at 7:37 AM David Timber <dxdt@dev.snart.me> wrote:
>
> On 3/13/26 22:59, Namjae Jeon wrote:
> > I'm working on adding iomap support to exFAT, and I think SEEK_HOLE
> > will be able to address the requirements we discussed. I will bring
> > this up again once the iomap work is complete.
> Good! exFAT not being built on iomap, and the catastrophic tech debt
> that would eventually cause, has been one of my concerns. Since you're
> working on it, you'd naturally implement FIEMAP as well, right? I was
> cooking up some quick and dirty functions for FIEMAP... Not sure if
> you're interested in them, but I'd be happy to submit them in the
> future. It's only a kernel-specific interface that almost no userspace
> program uses, but it's a really insightful tool for analyzing the
> fragmentation issue I'm trying to solve. It might come in handy while
> the new iomap implementation is settling down.
Since SEEK_HOLE will be supported as part of the upcoming iomap work,
your patch below can be dropped and I won't apply it. And I would be
very grateful if you could submit your FIEMAP patch after iomap
support lands.
Thanks!

* Re: [PATCH v3 2/2] exfat: EXFAT_IOC_GET_VALID_DATA ioctl
  2026-03-19  0:51               ` Namjae Jeon
@ 2026-03-19  3:29                 ` David Timber
  0 siblings, 0 replies; 11+ messages in thread
From: David Timber @ 2026-03-19  3:29 UTC (permalink / raw)
  To: Namjae Jeon; +Cc: sj1557.seo, yuezhang.mo, linux-fsdevel, linux-kernel

On 3/19/26 09:51, Namjae Jeon wrote:
> Since SEEK_HOLE will be supported as part of the upcoming iomap work,
> your patch below can be dropped and I won't apply it. And I would be
> very grateful if you could submit your FIEMAP patch after iomap
> support lands.
> Thanks!
Fair enough. As mentioned previously, a bunch of xfstests will fail
because exFAT's SEEK_HOLE doesn't behave like that of sparse file
systems. Part or all of the lseek() test cases will have to be disabled.

I had another idea: using FALLOC_FL_ZERO_RANGE as a means of truncating
the VDL. I'll make that one official and send it through. Or, if you've
already thought of that too, well, sorry to bother you.

Cheers,
Davo

Thread overview: 11+ messages
2026-03-11 22:26 [PATCH v3 2/2] exfat: EXFAT_IOC_GET_VALID_DATA ioctl David Timber
2026-03-12  3:23 ` Matthew Wilcox
2026-03-12 14:02   ` David Timber
2026-03-12 14:57     ` Matthew Wilcox
2026-03-12 15:06       ` Darrick J. Wong
2026-03-13  7:59         ` David Timber
2026-03-13 13:59           ` Namjae Jeon
2026-03-13 16:24           ` Darrick J. Wong
2026-03-16 22:37             ` David Timber
2026-03-19  0:51               ` Namjae Jeon
2026-03-19  3:29                 ` David Timber
