public inbox for linux-fsdevel@vger.kernel.org
From: David Timber <dxdt@dev.snart.me>
To: linux-fsdevel@vger.kernel.org
Cc: David Timber <dxdt@dev.snart.me>
Subject: [PATCH v1 1/1] exfat: add fallocate mode 0 support
Date: Sat, 28 Feb 2026 17:44:14 +0900
Message-ID: <20260228084542.485615-2-dxdt@dev.snart.me>
In-Reply-To: <20260228084542.485615-1-dxdt@dev.snart.me>

Currently, the Linux (ex)FAT drivers do not employ any cluster
allocation strategy to keep fragmentation at bay. As a result, when
multiple processes compete simultaneously for new clusters to extend
files on an exfat filesystem, the files end up heavily fragmented.
HDDs are affected the most, but fragmentation can also hurt various
forms of flash memory, depending on the underlying technology.

For instance, modern digital cameras produce multiple media files for a
single video stream. If the application does not take fragmentation
into account, or the system is under memory pressure, the kernel ends
up allocating clusters to these files in an interleaved manner.
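The interleaving can be illustrated with a toy model. This is a hypothetical simulation with made-up names, not the exfat allocator: four files grow one cluster at a time, taking turns, and each new cluster is simply the next free one on the volume, with no per-file allocation goal.

```c
/*
 * Hypothetical toy model: NFILES writers append NCLUSTERS clusters each,
 * taking turns, and every allocation takes the next free cluster from one
 * shared cursor (no per-file allocation goal).
 */
#define NFILES 4
#define NCLUSTERS 16
#define NTOTAL (NFILES * NCLUSTERS)

static int owner[NTOTAL];	/* file that owns each cluster */

static void allocate_interleaved(void)
{
	int next_free = 0;

	for (int round = 0; round < NCLUSTERS; round++)
		for (int f = 0; f < NFILES; f++)
			owner[next_free++] = f;
}

/* An extent is a maximal run of consecutive clusters owned by one file. */
static int count_extents(int file)
{
	int extents = 0;

	for (int c = 0; c < NTOTAL; c++)
		if (owner[c] == file && (c == 0 || owner[c - 1] != file))
			extents++;
	return extents;
}
```

After allocate_interleaved(), count_extents() returns 16 for every file: each cluster is its own extent, the same effect that inflates the extent counts in the filefrag results below.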

Demo script:

	# Start four concurrent writers, each producing a 64 MiB file
	for (( i = 0; i < 4; i += 1 )); do
	    dd if=/dev/urandom iflag=fullblock bs=1M count=64 of=frag-$i &
	done
	# wait with no arguments blocks until all background jobs have exited
	wait

	filefrag frag-*

Result - Linux kernel native exfat, async mount:
	780 extents found
	740 extents found
	809 extents found
	712 extents found

Result - Linux kernel native exfat, sync mount:
	1852 extents found
	1836 extents found
	1846 extents found
	1881 extents found

Result - Windows XP:
	3 extents found
	3 extents found
	3 extents found
	2 extents found

The Windows kernel, on the other hand, seems to space out the clusters
of each file regardless of the underlying storage interface or medium.
A similar strategy has to be employed by the Linux FAT filesystems for
efficient utilisation of the storage backend.
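One possible shape of such a strategy, sketched as the same kind of toy model: each file gets its own allocation goal, spread evenly over the volume, and keeps allocating right after its previous cluster. This is only a hypothetical illustration of "spacing out" clusters with made-up names; it is not the Windows algorithm, whose details are not known here.

```c
/*
 * Hypothetical toy model: NFILES writers append NCLUSTERS clusters each,
 * taking turns, but every file has its own allocation goal.
 */
#define NFILES 4
#define NCLUSTERS 16
#define NTOTAL (NFILES * NCLUSTERS)

static int owner[NTOTAL];	/* file that owns each cluster, -1 if free */

/* First free cluster at or after start, wrapping around; -1 if full. */
static int find_free(int start)
{
	for (int i = 0; i < NTOTAL; i++) {
		int c = (start + i) % NTOTAL;

		if (owner[c] < 0)
			return c;
	}
	return -1;
}

static void allocate_spaced(void)
{
	int goal[NFILES];

	for (int c = 0; c < NTOTAL; c++)
		owner[c] = -1;
	/* Space the files out: file f starts at f * (NTOTAL / NFILES). */
	for (int f = 0; f < NFILES; f++)
		goal[f] = f * (NTOTAL / NFILES);

	for (int round = 0; round < NCLUSTERS; round++)
		for (int f = 0; f < NFILES; f++) {
			int c = find_free(goal[f]);

			if (c < 0)
				return;		/* volume full */
			owner[c] = f;
			goal[f] = (c + 1) % NTOTAL;	/* continue right after */
		}
}

/* An extent is a maximal run of consecutive clusters owned by one file. */
static int count_extents(int file)
{
	int extents = 0;

	for (int c = 0; c < NTOTAL; c++)
		if (owner[c] == file && (c == 0 || owner[c - 1] != file))
			extents++;
	return extents;
}
```

With the same four writers taking turns, count_extents() returns 1 for every file. A real allocator would also have to deal with goals that run into other files' clusters or into full regions; find_free() merely falls back to the next free cluster.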

In the meantime, userspace applications such as rsync may use
fallocate() to combat this issue.
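A minimal sketch of that approach, assuming POSIX and glibc (write_preallocated() is a hypothetical helper, not rsync's code): reserve the final size first, then write the data sequentially. glibc implements posix_fallocate() by calling fallocate() with mode 0, which is the mode this patch adds to exfat, and emulates it slowly when the filesystem has no fallocate support.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a file, reserve its final size with
 * posix_fallocate() (fallocate mode 0 under glibc) and only then write
 * the data front to back. Returns 0 on success, -1 on failure.
 */
static int write_preallocated(const char *path, const char *buf, size_t len)
{
	size_t done = 0;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;

	/* posix_fallocate() rejects len == 0 with EINVAL, so skip it then. */
	if (len > 0 && posix_fallocate(fd, 0, (off_t)len) != 0)
		goto fail;

	/* The space is already reserved; fill it sequentially. */
	while (done < len) {
		ssize_t n = write(fd, buf + done, len - done);

		if (n < 0)
			goto fail;
		done += (size_t)n;
	}
	return close(fd);

fail:
	close(fd);
	return -1;
}
```

Note that posix_fallocate() returns an error number directly instead of setting errno, which is why the sketch only tests it against zero.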

This patch may introduce regression-like behaviour in some niche,
filesystem-agnostic applications that call fallocate() and then write
to the file non-sequentially. Examples:

 - libtorrent, which calls posix_fallocate(), when the first fragment
   received from a peer lies near the end of the file
 - "Download accelerators" that issue partial content requests (HTTP 206)
   from multiple threads writing to the same file

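The delay incurred in such use cases is documented for WinAPI: a write
that starts beyond the valid data length (VDL) first has to zero-fill
the range between the VDL and the write offset. Patches that add ioctl
equivalents of the WinAPI function SetFileValidData() and of
`fsutil file queryvaliddata ...` will follow.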
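To make the delay concrete, here is a hypothetical userspace model of how the VDL advances (not the exfat driver's code; struct model_file and model_write() are made up for illustration). It assumes that fallocate mode 0 grows only the file size, and that a write starting beyond the VDL must first zero-fill the range between the VDL and the write offset.

```c
#include <stdint.h>

/*
 * Hypothetical model of a file after fallocate mode 0: size grows
 * immediately, valid_size (the VDL) stays behind.
 */
struct model_file {
	uint64_t size;		/* file size, as extended by fallocate */
	uint64_t valid_size;	/* VDL: bytes known to hold written data */
};

/*
 * Returns how many bytes must be zero-filled before a write of len bytes
 * at pos can proceed, and advances the VDL accordingly.
 * Assumes pos + len <= f->size (the write stays within the preallocation).
 */
static uint64_t model_write(struct model_file *f, uint64_t pos, uint64_t len)
{
	uint64_t zeroed = 0;

	if (pos > f->valid_size)
		zeroed = pos - f->valid_size;	/* gap [VDL, pos) */
	if (pos + len > f->valid_size)
		f->valid_size = pos + len;
	return zeroed;
}
```

For a file preallocated to 1 MiB, a sequential 4 KiB write at offset 0 needs no zeroing, while a following write to the last 4 KiB has to zero 1040384 bytes first, which is the libtorrent case listed above.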

Signed-off-by: David Timber <dxdt@dev.snart.me>
---
 fs/exfat/file.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index 90cd540afeaa..4ab7e7e90ae6 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -13,6 +13,7 @@
 #include <linux/msdos_fs.h>
 #include <linux/writeback.h>
 #include <linux/filelock.h>
+#include <linux/falloc.h>
 
 #include "exfat_raw.h"
 #include "exfat_fs.h"
@@ -90,6 +91,45 @@ static int exfat_cont_expand(struct inode *inode, loff_t size)
 	return -EIO;
 }
 
+/*
+ * Preallocate space for a file. This implements exfat's fallocate file
+ * operation, which is called from the sys_fallocate system call. User space
+ * requests len bytes at offset. Unlike fat, which also supports
+ * FALLOC_FL_KEEP_SIZE, we only support "mode 0": since the valid data
+ * length (VDL) field is left unchanged, the new clusters need not be zeroed.
+ */
+static long exfat_fallocate(struct file *file, int mode,
+			  loff_t offset, loff_t len)
+{
+	struct inode *inode = file->f_mapping->host;
+	loff_t newsize = offset + len;
+	int err = 0;
+
+	/* No support for other modes */
+	if (mode != 0)
+		return -EOPNOTSUPP;
+
+	/* No support for dir */
+	if (!S_ISREG(inode->i_mode))
+		return -EOPNOTSUPP;
+
+	if (unlikely(exfat_forced_shutdown(inode->i_sb)))
+		return -EIO;
+
+	inode_lock(inode);
+
+	if (newsize <= i_size_read(inode))
+		goto error;
+
+	/* This is just an expanding truncate */
+	err = exfat_cont_expand(inode, newsize);
+
+error:
+	inode_unlock(inode);
+
+	return err;
+}
+
 static bool exfat_allow_set_time(struct mnt_idmap *idmap,
 				 struct exfat_sb_info *sbi, struct inode *inode)
 {
@@ -771,6 +811,7 @@ const struct file_operations exfat_file_operations = {
 	.fsync		= exfat_file_fsync,
 	.splice_read	= exfat_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= exfat_fallocate,
 	.setlease	= generic_setlease,
 };
 
-- 
2.53.0.1.ga224b40d3f.dirty


Thread overview: 4+ messages
2026-02-28  8:44 [PATCH v1 0/1] exfat: add fallocate mode 0 support David Timber
2026-02-28  8:44 ` David Timber [this message]
2026-03-03  6:13   ` [PATCH v1 1/1] " Namjae Jeon
2026-03-04 10:30   ` Namjae Jeon
